Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training
1 day ago
Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world.

AWS Neuron is the complete software stack for the AWS Trainium (Trn1/Trn2) and Inferentia (Inf1/Inf2) cloud-scale machine learning accelerators. This role is for a Senior Machine Learning Engineer in the Distributed Training team for AWS Neuron, responsible for development, enablement, and performance tuning of a wide variety of ML model families, including large-scale LLMs such as GPT and Llama, as well as Stable Diffusion and Vision Transformers (ViT).

The ML Distributed Training team works side by side with chip architects, compiler engineers, and runtime engineers to create, build, and tune distributed training solutions with Trainium instances. Experience training these large models using Python is a must. FSDP (Fully Sharded Data Parallel), DeepSpeed, NeMo, and other distributed training libraries are central to this work; extending them for the Neuron-based system is key.

Key job responsibilities
Lead efforts to build distributed training support into PyTorch and JAX using XLA, the Neuron compiler, and runtime stacks.
Optimize models to achieve peak performance and maximize efficiency on AWS custom silicon (Trainium and Inferentia) and on Trn1, Trn2, Inf1, and Inf2 servers.
Apply strong software development skills, deep dive into complex problems, and work effectively with cross-functional teams.
Build a solid foundation in machine learning to deliver high-quality solutions.

About the team
Annapurna Labs was a startup company acquired by AWS in 2015 and is now fully integrated. The team operates across silicon engineering, hardware design and verification, software, and operations, supporting AWS Neuron, Inferentia, and Trainium ML accelerators. We foster a collaborative environment with mentorship, thorough code reviews, and opportunities for career growth.

Basic Qualifications
Bachelor's degree in computer science or equivalent.
5+ years of non-internship professional software development experience.
5+ years of programming experience with at least one programming language.
5+ years of experience leading design or architecture (design patterns, reliability, and scaling) of new and existing systems.
5+ years of full software development life cycle experience, including coding standards, code reviews, source control management, build processes, testing, and operations.
Experience as a mentor, tech lead, or leader of an engineering team.
Experience in machine learning, data mining, information retrieval, statistics, or natural language processing.

Preferred Qualifications
Master's degree in computer science or equivalent.
Experience in computer architecture.
Previous software engineering experience with PyTorch, JAX/TensorFlow, distributed libraries and frameworks, and end-to-end model training.

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
-
Torino, Italy. Amazon. Full-time. Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training. Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have...
-
Sant'Ambrogio di Torino, Italy. Amazon. Full-time. Sr. Software Engineer - AI/ML, AWS Neuron Distributed Training. Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago, even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that...