Site Reliability Engineer, Technical Referent

6 days ago

Lazio, Italy dLocal Full-time

Why should you join dLocal?

dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As both a payments processor and a merchant of record where we operate, we make it possible for our merchants to make inroads into the world's fastest-growing emerging markets.

By joining us you will be a part of an amazing global team that makes it all happen, in a flexible, remote-first dynamic culture with travel, health and learning benefits, among others. Being a part of dLocal means working with ****+ teammates from 30+ different nationalities and developing an international career that impacts millions of people's daily lives. We are builders, we never run from a challenge, we are customer-centric, and if this sounds like you, we know you will thrive in our team.

What's the opportunity?

We are looking for a Site Reliability Engineer (SRE) to join our team. As our SRE, you will focus on the design, implementation and continuous maintenance of our centralized observability platform, which uses OpenTelemetry (OTEL) as its backend. You will be part of a talented team that works on mission-critical applications with big customers like Netflix, Amazon, Nike, Facebook and more.

As a Site Reliability Engineer, you are always expected to ask the necessary questions:
- What data do we need to understand how our systems are performing?
- How do we collect this data?
- What patterns are we looking for in the data and what do they mean?
- Who should be notified when a certain system is not working properly?
- Do we have any systems that we need more data for?

An SRE designs systems and processes to answer the questions above and to provide automated support and response where possible.

What will you do?

- Own OpenTelemetry Pipelines: Design, implement, and maintain observability pipelines across the three main signals (logs, metrics, and traces), ensuring standardized, scalable, and efficient data ingestion. Optimize ingestion strategies to balance cost, performance, and usability.
- Empower Engineering Teams: Build self-service automation and tooling that enables development teams to instrument and leverage observability without requiring manual intervention from the SRE team. Drive adoption of best practices while ensuring teams own their telemetry.
- Support Incident Management: Be the Engineering side of our Incident Management Team, designing the processes, playbooks, checklists, and automations for them and other engineers to follow during an incident.
- Collaborate Across Teams: Interact with members from almost all teams across the business to understand their monitoring, alerting and SLO/SLA requirements, and design systems and processes that ensure we meet or exceed these requirements. Influence architectural decisions during initial design stages to ensure resiliency and scale at the outset of software development.
- Automate Observability Infrastructure: Leverage Infrastructure-as-Code (IaC) to provision and manage monitoring tools, alerting rules, and our observability configurations across OTEL pipelines.
- Define Baseline Observability Standards: Design base-level requirements for new and existing services to ensure that all dLocal infrastructure and code are monitored consistently and accurately at a basic level.
- Own Technical and Security Health: Take full ownership of dLocal's infrastructure reliability, ensuring adherence to key availability and security KPIs.
- Optimize Alerting Systems: Continuously refine alerting signals to minimize noise and ensure they are always actionable, reducing fatigue and improving response efficiency.

Which skills do you need?

- Over 4 years of experience as an SRE or in a very similar role focused on observability
- Expertise in Kubernetes, including its core components, deployment methodologies, and monitoring best practices
- Some understanding of OpenTelemetry, including setting up OTEL collectors, instrumentation, and pipeline optimization
- Proficiency with monitoring and logging tools such as Grafana, Prometheus, Loki, New Relic, or Datadog
- Hands-on experience with IaC tools (Terraform) and GitOps CI/CD solutions (Argo CD, GitHub Actions, or similar)
- Experience integrating incident management platforms (PagerDuty, Jira) with automated alerting workflows
- Strong scripting abilities (Python, Go, or similar) for automating observability tasks
- A problem-solving mindset, with the ability to collaborate across multi-functional teams to drive reliability improvements

You will stand out if you have:

- Cloud experience, especially AWS and ECS-based workloads
- Experience managing observability pipelines at scale in high-throughput environments
- Familiarity with Configuration-as-Code (Ansible, Chef, or SaltStack) for managing configurations across legacy instances
- Database performance monitoring experience, particularly in large-scale distributed environments

What do we offer?

Besides the tailored benefits we have for each country, dLocal will help you thrive and go that extra mile by offering you:

- Remote work: work from anywhere or one of our offices around the globe*
- Flexibility: we have flexible schedules and we are driven by performance
- Fintech industry: work in a dynamic and ever-evolving environment, with plenty to build and boost your creativity
- Referral bonus program: our internal talents are the best recruiters; refer someone ideal for a role and get rewarded
- Learning & development: get access to a Premium Coursera subscription
- Language classes: we provide free English, Spanish, or Portuguese classes
- Social budget: you'll get a monthly budget to chill out with your team (in person or remotely) and deepen your connections
- dLocal Houses: want to rent a house to spend one week anywhere in the world coworking with your team? We've got your back

*For people based in Montevideo (Uruguay) applying to non-IT roles, 55% monthly attendance to the office is required.

Site Reliability Engineer, Technical Referent

3 days ago

Lazio, Italy Other Full-time

Why should you join dLocal? dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As...
Site Reliability Engineer, Technical Referent

4 days ago

Lazio, Italy dLocal Full-time

Why should you join dLocal? dLocal enables the biggest companies in the world to collect payments in 40 countries in emerging markets. Global brands rely on us to increase conversion rates and simplify payment expansion effortlessly. As...
Site Reliability Engineer

2 weeks ago

Lazio, Italy Dab Sistemi Integrati Srl Full-time

Position: Site Reliability Engineer - Kubernetes and Docker. DAB Sistemi Integrati is a company founded in ****, active throughout Italy and a leader in delivering integrated active security and building automation solutions. The process covers all phases of analysis and design within our Engineering division, and then...
Senior Site Reliability Engineer

2 weeks ago

Lazio, Italy Circle Full-time

Circle is a financial technology company at the epicenter of the emerging internet of money, where value can travel globally, nearly instantly and less expensively than legacy settlement systems. Our infrastructure – including USDC, a blockchain-based dollar – helps businesses, institutions and developers harness these breakthroughs and capitalize on this...
Senior Site Reliability Engineer

6 days ago

Lazio, Italy Other Full-time

Circle is a financial technology company at the epicenter of the emerging internet of money, where value can travel globally, nearly instantly and less expensively than legacy settlement systems. Our infrastructure – including USDC, a blockchain-based dollar – helps businesses, institutions and developers harness these breakthroughs and capitalize on this...
Senior Site Reliability Engineer

4 days ago

Lazio, Italy Atlas Reply Roma Full-time

Circle is a financial technology company at the epicenter of the emerging internet of money, where value can travel globally, nearly instantly and less expensively than legacy settlement systems. Our infrastructure – including USDC, a blockchain-based dollar – helps businesses, institutions and developers harness these breakthroughs and capitalize on this...
Remote Site Reliability

12 hours ago

Lazio, Italy Canonical Full-time

A pioneering tech firm is looking for a Site Reliability / GitOps Engineer to enhance automation and cloud operations. This role requires a Linux enthusiast who can develop infrastructure as code and maintain core services across global teams. Ideal candidates will have a strong engineering background, experience in software development and Linux...
Site Reliability Engineer

2 weeks ago

Lazio, Italy Canonical Full-time

A leading provider of open source software is hiring a Site Reliability Engineer to enhance enterprise infrastructure through DevOps practices. The role focuses on deploying and managing OpenStack, Kubernetes, and various open-source applications. Ideal candidates will have a software engineering or computer science degree and strong experience in Python...
Remote Site Reliability Engineer

7 days ago

Lazio, Italy Canonical Full-time

A leading open source software company is seeking a Site Reliability Engineer based in Pisa, Italy. The role focuses on deploying and maintaining OpenStack and Kubernetes, while enhancing DevOps practices. Candidates should possess a degree in software engineering or computer science, with proficiency in Python. This position also offers a global remote role...
Remote Senior Site Reliability Engineer

7 days ago

Lazio, Italy Canonical Full-time

A leading software provider is looking for a Senior Site Reliability Engineer to manage next-gen operations globally. This fully remote role emphasizes automation and involves running scalable Kubernetes and private cloud solutions. Candidates should have a background in Software Engineering or Computer Science, experience with Linux, and expertise in...
