AI Engineer
4 days ago
Aieng is a high-growth, innovative startup focused on delivering next-generation engineering solutions. We bridge the gap between ambitious vision and technical reality by investing heavily in research, development, and top-tier talent. Currently in a dynamic phase of expansion, we pride ourselves on our agility and our ability to navigate complex industrial challenges. At Aieng, we don't just follow industry trends; we aim to set them. Join us as we build the infrastructure of tomorrow.
Your Role
As an AI Engineer (Inference & RAG Architect), you will design, optimize, and operate local, production-grade LLM systems, owning the full lifecycle from low-level inference performance to high-level semantic memory and agent orchestration.
Key Responsibilities
- Architect and optimize high-throughput LLM inference pipelines
- Design and implement enterprise-grade RAG systems
- Benchmark, validate, and fine-tune open-source models for domain-specific workloads
- Build agentic AI systems with deterministic, auditable behavior
- Ensure scalability, observability, and reliability of AI systems in production
Technical Skills (Hard Skills)
LLM Inference & Systems Optimization
- Advanced configuration and tuning of vLLM, Ollama, and TGI (a configuration sketch follows this list)
- Deep understanding of PagedAttention, continuous batching, and KV-cache optimization
- Model quantization techniques (INT8, INT4, GPTQ, AWQ, GGUF)
- GPU scheduling, VRAM optimization, multi-GPU and multi-node inference
- CUDA-aware performance tuning (conceptual and practical)
- Deployment of LLMs in on-prem, edge, and air-gapped environments
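To give a concrete flavour of this area, here is a minimal sketch of offline vLLM inference with quantization and multi-GPU sharding; it assumes a recent vLLM release, an AWQ-quantized Llama 3 checkpoint, and a two-GPU node, and the model name and settings are illustrative rather than a prescribed configuration.

```python
# Illustrative only: offline vLLM inference with an AWQ-quantized model
# sharded across two GPUs. Model name and values are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed checkpoint
    quantization="awq",            # INT4 AWQ weights (requires an AWQ export)
    tensor_parallel_size=2,        # shard across 2 GPUs
    gpu_memory_utilization=0.90,   # cap VRAM used for weights + KV cache
    max_model_len=8192,            # bounds per-request KV-cache growth
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarise the incident report in three bullets."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same flags apply when exposing an OpenAI-compatible endpoint via `vllm serve`; PagedAttention and continuous batching are active by default in vLLM.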
Retrieval-Augmented Generation (RAG) & Knowledge Systems
- Design of multi-stage RAG pipelines
- Integration with vector databases (Qdrant, Weaviate, FAISS)
- Hybrid retrieval strategies (dense, sparse, BM25)
- Re-ranking using cross-encoders and LLM-based rankers (see the sketch after this list)
- Metadata-driven access control and document-level security
- Chunking, embedding strategy design, and context window optimization
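As a sketch of the dense-retrieval and re-ranking stages only, assuming a local Qdrant instance with a pre-populated `docs` collection whose payload carries a `text` field, plus sentence-transformers models; all names are placeholders, not a mandated stack.

```python
# Illustrative two-stage retrieval: dense search in Qdrant, then
# cross-encoder re-ranking. Collection, field, and model names are assumptions.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
client = QdrantClient(url="http://localhost:6333")

def retrieve(query: str, k: int = 20, top_n: int = 5) -> list[str]:
    # Stage 1: dense retrieval over the vector index.
    hits = client.search(
        collection_name="docs",
        query_vector=embedder.encode(query).tolist(),
        limit=k,
    )
    candidates = [h.payload["text"] for h in hits]
    # Stage 2: re-rank candidates with a cross-encoder and keep the best.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(scores, candidates), reverse=True)
    return [doc for _, doc in ranked[:top_n]]

print(retrieve("What is our data-retention policy?"))
```

A production pipeline would add sparse/BM25 retrieval, metadata filters for access control, and context-window budgeting on top of this skeleton.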
Model Lifecycle & Evaluation
- Evaluation and benchmarking of models such as Llama 3, Mistral, Phi, Mixtral
- Domain adaptation via LoRA / QLoRA (a setup sketch follows this list)
- Prompt and system prompt engineering with reproducibility guarantees
- Offline and online evaluation frameworks (faithfulness, groundedness, latency, cost)
- Versioning and rollback strategies for models and prompts
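A minimal sketch of the QLoRA side, assuming the Hugging Face transformers + peft + bitsandbytes stack; the base model, target modules, and hyperparameters are illustrative choices, not requirements of the role.

```python
# Illustrative QLoRA setup: 4-bit base model plus LoRA adapters.
# Model name, target modules, and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.3", quantization_config=bnb, device_map="auto"
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # adapters are a small fraction of the base
```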
Agentic Architectures & Orchestration
- Design of agent-based systems with tool use, memory, and planning
- Development using Semantic Kernel, LangGraph, or custom agent frameworks (a LangGraph skeleton follows this list)
- Deterministic execution, guardrails, and fallback strategies
- Implementation in Python and C#
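One possible shape of a small, auditable agent loop in LangGraph (one of the frameworks named above); the node names, state schema, and routing rule are assumptions used only to illustrate deterministic control flow with a hard stop condition.

```python
# Illustrative LangGraph skeleton: plan -> act loop with an explicit,
# auditable stop condition. Node names and state fields are assumptions.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    steps: list[str]
    done: bool

def plan(state: AgentState) -> AgentState:
    # Deterministic planning stub; a real node would call an LLM or a tool.
    state["steps"].append(f"plan for: {state['task']}")
    return state

def act(state: AgentState) -> AgentState:
    state["steps"].append("execute step")
    state["done"] = len(state["steps"]) >= 3  # hard guardrail on loop length
    return state

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_conditional_edges("act", lambda s: "stop" if s["done"] else "loop",
                            {"loop": "act", "stop": END})
app = graph.compile()
print(app.invoke({"task": "summarise logs", "steps": [], "done": False}))
```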
MLOps, DevOps & Observability
- Containerization with Docker and orchestration via Kubernetes
- CI/CD for AI systems
- Monitoring of latency, throughput, hallucination rates, and failures (a metrics sketch follows this list)
- Logging, tracing, and observability for LLM pipelines
- Infrastructure-as-Code (Terraform or equivalent)
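A small sketch of the metrics side, assuming the prometheus_client library; the metric names and the generation stub are placeholders standing in for a real inference call.

```python
# Illustrative LLM-serving metrics: latency histogram and failure counter
# exposed for Prometheus scraping. Metric names are assumptions.
import time
from prometheus_client import Counter, Histogram, start_http_server

LATENCY = Histogram("llm_request_latency_seconds", "End-to-end request latency")
FAILURES = Counter("llm_request_failures_total", "Failed generation requests")

def generate(prompt: str) -> str:
    with LATENCY.time():          # records wall-clock latency per call
        try:
            time.sleep(0.05)      # stand-in for the real inference call
            return f"response to: {prompt}"
        except Exception:
            FAILURES.inc()
            raise

if __name__ == "__main__":
    start_http_server(9100)       # metrics served at :9100/metrics
    print(generate("healthcheck"))
```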
Soft Skills
- Strong system-level thinking and architectural mindset
- Obsession with performance, reliability, and correctness
- Ability to translate business requirements into technical architectures
- Clear communicator in cross-functional, high-complexity environments
- Ownership mentality and engineering rigor
Experience & Education
- 3+ years of experience in AI Engineering, ML Systems, or Platform Engineering
- Strong academic background in Computer Science, Engineering, or related fields
- Proven experience deploying self-hosted LLMs in production
- Exposure to enterprise constraints (security, compliance, scalability)