Lead Site Reliability Engineer

2 weeks ago


Milano, Lombardia, Italia | Pragmatike | Full-time

Job Description
Location:
Fully remote, EU timezone (CET ±2h)

Start date:
ASAP

Languages:
Fluent English is mandatory

Industry:
Cloud Computing

We are hiring at Pragmatike to expand our team and drive the growth of our internal projects.

Our focus is on developing cutting-edge solutions in Cloud Computing, while fostering a culture of collaboration and innovation. Joining us means being part of a passionate team where your ideas and skills directly contribute to shaping tomorrow's technologies.

If you're excited about working on ambitious projects in a dynamic and flexible environment, we'd love to hear from you.

Responsibilities

  • Operate and maintain Linux-based infrastructure (Debian/Ubuntu).
  • Deploy, manage, and scale Kubernetes clusters across bare-metal, virtualized, and on-prem environments.
  • Oversee full cluster lifecycle: upgrades, node pools, networking, storage, and security hardening.
  • Implement automation for provisioning and operations using Ansible, Bash/Python, and GitOps workflows.
  • Design and maintain networking architecture including VLANs, L2/L3 routing, VPNs, and multi-site connectivity.
  • Build automated deployment workflows (PXE boot, Preseed, cloud-init).
  • Deploy and maintain observability stacks (Prometheus/Grafana, Loki, ELK, Graylog).
  • Lead incident response and escalation activities across the platform.
  • Improve system availability and reduce latency at all levels.
  • Define and implement SLOs/SLIs at multiple infrastructure levels (physical network/hardware, platform virtualization, software services).
  • Optimize alerting and monitoring pipelines to provide actionable insights.
  • Establish and maintain on-call schedules to ensure coverage across timezones.
  • Develop Standard Operating Procedures (SOPs) for repeatable operations and maintenance tasks.
  • Coordinate physical maintenance for Policlouds (periodic maintenance, hardware issues, DC-Ops).
  • Manage virtualization and orchestration layers (OpenStack, Proxmox, VMware).
  • Help develop and maintain overall architecture across all products.
  • Plan resources for future initiatives, accounting for demand and growth projections.
  • Work with development teams to improve overall quality and optimize resource utilization.
  • Collaborate with cross-functional stakeholders (Hivenet, Policloud, Customer Success teams).

Requirements

  • Expert-level, hands-on experience operating Kubernetes in production environments.
  • Strong network engineering skills (VLANs, L2/L3 routing, VPNs, multi-site connectivity); this is essential for the role.
  • Strong proficiency with Linux systems administration (Debian/Ubuntu).
  • Solid understanding of networking fundamentals and ability to design complex network architectures.
  • Experience building and maintaining automation workflows (Ansible, Bash/Python, Git-based).
  • Experience with observability stacks such as Prometheus, Grafana, ELK, Loki, or Graylog.
  • Background with virtualization technologies (OpenStack, Proxmox, VMware).
  • Experience with bare-metal provisioning and MAAS (Metal as a Service).
  • Strong understanding of distributed systems and container orchestration.
  • Process-oriented mindset with ability to develop SOPs and operational procedures from scratch.
  • Experience with incident response, escalation procedures, and on-call rotations.
  • Ability to work autonomously in a fast-paced, engineering-driven environment.
  • Strong technical skills combined with alignment to team values.

Nice To Have

  • Experience with service mesh (Istio, Linkerd) or advanced CNI implementations.
  • Knowledge of Cloudflare APIs, DNS automation, or tunnel configurations.
  • Experience with GPU infrastructure, node preparation, or resource scheduling.
  • Familiarity with security best practices (RBAC, firewalls, network policies).
  • Exposure to IT asset management or license tracking workflows.
  • Experience working in multi-timezone environments and coordinating across distributed teams.
  • Background establishing reliability practices and SRE frameworks in growing organizations.

Why Join Us:

  • 100% remote work with flexible hours
  • High-impact role with autonomy and ownership
  • Collaborative and international engineering team
  • Cutting-edge tech stack with strong focus on reliability and automation

  • Site Reliability Engineer

    1 week ago


    Milano, Lombardia, Italia | Blackfluo | Full-time

    Job Description. Location: Full remote, EU timezone (CET +/- 2 hours). Start Date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the reliability, performance, and scalability of...


  • Milano, Lombardia, Italia | Agile Lab | Full-time

    Agile Lab is a company founded in 2014 with the mission to create value for its customers in data-intensive environments through customisable solutions that establish performance-driven processes, sustainable architectures and automated platforms based on data governance best practices. Having delivered over 100 successful Elite Data Engineering initiatives,...


  • Milano, Lombardia, Italia | Prima | Full-time

    Are you looking for a new challenge? Fancy helping us shape the future of motor insurance? Prima could be the place for you. Since 2015, we've been using our love of data and tech to rethink motor insurance and bring drivers a great experience at a great price. Our story began in Italy, where we've quickly become the number one online motor insurance provider...


  • Milano, Lombardia, Italia | Prima | Full-time

    Are you looking for a new challenge? Fancy helping us shape the future of motor insurance? Prima could be the place for you. Since 2015, we've been using our love of data and tech to rethink motor insurance and bring drivers a great experience at a great price. Our story began in Italy, where we've quickly become the number one online motor insurance...


  • Milano, Lombardia, Italia | Worldline | Full-time

    Job Description. Network Reliability Engineer, Milan, Italy. This is Worldline: we are the innovators at the heart of the payments technology industry, shaping how the world pays and gets paid. The solutions our people build today power the growth of millions of businesses tomorrow. From your local coffee shop to unicorns and international...

  • Site Engineer

    4 days ago


    Milano, Lombardia, Italia | agap2 Italia | Full-time

    WHO WE ARE: AGAP2 is a European engineering and operations consulting group, part of the MoOngy Group. Present in 14 European countries with more than 8,500 employees, we opened our first Italian office in Milan six years ago. Given our steady growth, we are strengthening and expanding our team with people who share our values...

  • Site Development Engineer

    1 week ago


    Milano, Lombardia, Italia | Arup | Full-time

    Shape a future with purpose at Arup in Milan. Arup's purpose, shared values and collaborative approach have set us apart for over 75 years, guiding how we shape a better world. The Opportunity: Our infrastructure team provides a range of design services on projects in both the public and private sectors. We work very closely with other Arup teams as part of...


  • Milano, Lombardia, Italia | Arup | Full-time

    Civil Engineering | Europe Region | MIL00009E | Senior Site Development Engineer. Shape a future with purpose at Arup in Milan. Arup's purpose, shared values and collaborative approach have set us apart for over 75 years, guiding how we shape a better world. The Opportunity: Our infrastructure team provides a range of design services on projects in both the public and...

  • QA/QC Site Lead

    1 week ago


    Milano, Lombardia, Italia | CET Connect | Full-time

    Position: QA/QC Site Lead. Location: Milan, Italy. Role Overview: As a QA/QC Site Lead, you will play a crucial role in ensuring the quality and compliance of structured cabling and fibre installations on-site. Reporting to the QA Manager, you will be responsible for leading inspection, monitoring, and verification activities to meet client requirements and project...

  • Site Engineer FTTH Milano

    1 week ago


    Milano, Lombardia, Italia | Transtec services | Full-time

    Transtec Services Srl, a services and consulting company operating in the ICT and technological innovation sector, is looking for a Site Engineer FTTH in Milan for one of its client companies. Main duties: supervise and coordinate the construction team; manage coordination among clients, engineering teams, and technicians; supervise and update...