Lead Site Reliability Engineer

2 weeks ago

Milan, Italy Pragmatike Full-time

Job Description

Location: Fully remote, EU timezone (CET ±2h)
Start date: ASAP
Languages: Fluent English is mandatory
Industry: Cloud Computing

We are hiring at Pragmatike to expand our team and drive the growth of our internal projects. Our focus is on developing cutting-edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies. If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you.

Responsibilities
- Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
- Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
- Oversee full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
- Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
- Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
- Build automated deployment workflows (PXE boot, Preseed, cloud-init).
- Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
- Lead incident response and escalation activities across the platform.
- Improve system availability and reduce latency at all levels.
- Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
- Optimize alerting and monitoring pipelines to provide actionable insights.
- Establish and maintain on-call schedules to ensure coverage across timezones.
- Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
- Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
- Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
- Help develop and maintain overall architecture across all products.
- Plan resources for future initiatives, accounting for demand and growth projections.
- Work with development teams to improve overall quality and optimize resource utilization.
- Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).

Requirements
- Expert-level, hands-on experience operating Kubernetes in production environments.
- Strong network engineering skills (VLANs, L2/L3 routing, VPNs, multi-site connectivity); this is essential for the role.
- Strong proficiency with Linux systems administration (Debian/Ubuntu).
- Solid understanding of networking fundamentals and ability to design complex network architectures.
- Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
- Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
- Background with virtualization technologies (OpenStack, Proxmox, VMware).
- Experience with bare-metal provisioning and MAAS (Metal as a Service).
- Strong understanding of distributed systems and container orchestration.
- Process-oriented mindset with ability to develop SOPs and operational procedures from scratch.
- Experience with incident response, escalation procedures, and on-call rotations.
- Ability to work autonomously in a fast-paced, engineering-driven environment.
- Strong technical skills combined with alignment to team values.

Nice To Have
- Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
- Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
- Experience with GPU infrastructure, node preparation, or resource scheduling.
- Familiarity with security best practices (RBAC, firewalls, network policies).
- Exposure to IT asset management or license tracking workflows.
- Experience working in multi-timezone environments and coordinating across distributed teams.
- Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us:
- 100% remote work with flexible hours
- High-impact role with autonomy and ownership
- Collaborative and international engineering team
- Cutting-edge tech stack with strong focus on reliability and automation.

Site Reliability Engineer

1 week ago

Milan, Italy Other Full-time

About the job: Site Reliability Engineer (SRE). Job Description. Location: Full remote, EU timezone (CET +/- 2 hours). Start Date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the...
Site Reliability Engineer

2 weeks ago

Milan, Italy Blackfluo.Ai Full-time

About the job: Site Reliability Engineer (SRE). Job Description. Location: Full remote, EU timezone (CET +/- 2 hours). Start Date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the...
Site Reliability Engineer

1 week ago

Milan, Lombardy, Italy Blackfluo Full-time

Job Description. Location: Full remote, EU timezone (CET +/- 2 hours). Start Date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the reliability, performance, and scalability of...
Senior Site Reliability Engineer

2 days ago

Milan, Italy Canonical Full-time

Senior Site Reliability Engineer Join Canonical’s leading open source software and operating systems platform as a Senior Site Reliability Engineer. Overview Canonical is a global provider of open source software and the platform for AI, IoT and the cloud. Our team runs hundreds of private cloud, Kubernetes and application clusters for customers across the...
Site Reliability Engineer II

3 weeks ago

Milan, Italy Agile Lab Full-time

Agile Lab is a company founded in 2014 with the mission to create value for its customers in data-intensive environments through customisable solutions that establish performance-driven processes, sustainable architectures and automated platforms based on data governance best practices. Having delivered over 100 successful Elite Data Engineering initiatives,...
Lead Site Reliability Engineer

2 weeks ago

Milan, Lombardy, Italy Pragmatike Full-time

Job Description. Location: Fully remote, EU timezone (CET ±2h). Start date: ASAP. Languages: Fluent English is mandatory. Industry: Cloud Computing. We are hiring at Pragmatike to expand our team and drive the growth of our internal projects. Our focus is on developing cutting-edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation....
Site Reliability Engineer

5 days ago

Milan, Italy Michael Page Full-time

Michael Page Provide day-to-day operational support for production environments, ensuring high availability and reliability of critical services. Develop, maintain and enhance automation scripts and tools using Bash, Python and Ansible to streamline operational tasks and incident response. Monitor system performance, proactively identify issues and...
Site Reliability Engineer

1 week ago

Milan, Italy Immobiliare.It Full-time

The Site Reliability Engineer will work on system reliability, automation, and critical incident management to ensure the uptime and performance of the platforms.
Senior Site Reliability Engineer – Automation

1 week ago

Milan, Italy Canonical Full-time

A leading open source software provider is seeking a Senior Site Reliability Engineer in Milan, Italy. In this role, you will design and maintain infrastructure automation for cloud and Kubernetes environments. The ideal candidate should have a degree in Software Engineering or Computer Science and strong skills in Linux, Python, and networking. Enjoy benefits...
Senior Site Reliability Engineer – Automation

7 days ago

Milan, Italy Canonical Full-time

A leading open source software provider is seeking a Senior Site Reliability Engineer in Milan, Italy. In this role, you will design and maintain infrastructure automation for cloud and Kubernetes environments. The ideal candidate should have a degree in Software Engineering or Computer Science and strong skills in Linux, Python, and networking. Enjoy...