Evaluation Scenario Writer
4 days ago
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
What we do
The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.
About the Role
We're looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You'll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You'll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You'll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Although every project is unique, you might typically:
- Create structured test cases that simulate complex human workflows.
- Define gold-standard behavior and scoring logic to evaluate agent actions.
- Analyze agent logs, failure modes, and decision paths.
- Work with code repositories and test frameworks to validate your scenarios.
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
- Ensure that scenarios are production-ready, easy to run, and reusable.
How to get started
Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you'll help shape the future of AI while ensuring technology benefits everyone.
Requirements
- Bachelor's and/or Master's Degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems or other related fields.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfort with structured formats such as JSON/YAML for describing scenarios.
- Ability to define expected agent behaviors (gold paths) and scoring logic.
- Basic experience with Python and JavaScript.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.
Nice to Have
- Experience in writing manual or automated test cases.
- Familiarity with LLM capabilities and typical failure modes.
- Understanding of scoring metrics (precision, recall, coverage, reward functions).
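As a rough illustration of what "gold paths" and scoring metrics can mean in practice, here is a minimal sketch. The scenario fields (`name`, `gold_path`), the action names, and the set-based scoring are assumptions made for this example; they are not Mindrift's actual format or method.

```python
import json

# Hypothetical scenario description; the field names are illustrative assumptions.
SCENARIO_JSON = """
{
  "name": "refund_request",
  "gold_path": ["open_ticket", "lookup_order", "issue_refund", "close_ticket"]
}
"""


def score_actions(gold_path, agent_actions):
    """Compare the agent's actions with the gold path as unordered sets."""
    gold = set(gold_path)
    taken = set(agent_actions)
    matched = gold & taken
    # Precision: share of the agent's actions that appear in the gold path.
    precision = len(matched) / len(taken) if taken else 0.0
    # Recall: share of the gold path that the agent actually covered.
    recall = len(matched) / len(gold) if gold else 0.0
    return {"precision": precision, "recall": recall}


scenario = json.loads(SCENARIO_JSON)
# Hypothetical agent log: the agent skipped the refund step.
agent_actions = ["open_ticket", "lookup_order", "close_ticket"]
result = score_actions(scenario["gold_path"], agent_actions)
print(result)  # {'precision': 1.0, 'recall': 0.75}
```

This sketch ignores the order of actions; a scenario where sequence matters would likely need an order-aware comparison instead.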
Benefits
Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:
- Get paid for your expertise, with rates that can go up to $30/hour depending on your skills, experience, and project needs.
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
- Influence how future AI models understand and communicate in your field of expertise.
-
Senior Manager, Project Planning
1 week ago
Roma, Lazio, Italia | Eco World Development Group Berhad | Full-time
It's fun to work in a company where people truly BELIEVE in what they're doing. The Senior Manager, Project Planning & Development (PPD) provides strategic leadership in the end-to-end planning and development lifecycle of property projects. The role involves setting project vision and planning frameworks, formulating development strategies, and overseeing the...
-
HR Strategic Planning Manager
2 days ago
Roma, Lazio, Italia | PRAXI | Full-time
Mission: The role ensures organizational development and workforce and labour cost planning through advanced HR analysis, adopting a strategic planning and forecasting approach to align workforce actions with business needs. Main activities: workforce and headcount planning aligned with strategic and business priorities; development of HR plans and labour cost...
-
Lead Civil Engineer
1 week ago
Roma, Lazio, Italia | Iberdrola | Full-time
Iberdrola, a world-leading renewable energy utility, is looking for a Lead Civil Engineer for Grid Infrastructure Construction to connect solar, wind, and BESS (Battery Energy Storage System) plants in Italy. The role of the Lead Civil Engineer for Grid Construction is to guarantee the correct execution of the construction of Electrical Substations (High...
-
Enterprise & IT Architect | CIO Advisory
14 hours ago
Roma, Lazio, Italia | KPMG Italy | Full-time
The chance to build a better future is right in front of you. Do Work That Matters: your role and responsibilities. Do you want to do work that is truly meaningful and impactful? At KPMG you will have the opportunity to help clients, society, and colleagues face and solve the most current and complex challenges. The experts of Nolan Norton,...
-
Programme Policy Officer
1 week ago
Roma, Lazio, Italia | World Food Programme | Full-time
Deadline for applications: 22 January :59-GMT+01:00 Central European Time (Rome). WFP celebrates and embraces diversity. It is committed to the principle of equal employment opportunity for all its employees and encourages qualified candidates to apply irrespective of race, colour, national origin, ethnic or social background, genetic information, gender, gender...
-
Evaluation Scenario Writer
3 days ago
Roma, Italia | Mindrift | Full-time
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. What We Do: The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the...
-
Roma, Italia | Medtronic | Full-time
A leading medical technology company in Milan seeks a Clinical Evaluation Medical Writer to analyze clinical evidence supporting regulatory submissions. The role requires strong technical writing skills and proficiency in clinical research to ensure comprehensive literature reviews. Successful candidates will be well-versed in FDA regulations and able to...
-
Evaluation Scenario Writer
3 days ago
Giuliano di Roma, Italia | Mindrift | Full-time
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English. At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to...
-
Giuliano di Roma, Italia | Medtronic | Full-time
A leading medical technology company in Milan seeks a Clinical Evaluation Medical Writer to analyze clinical evidence supporting regulatory submissions. The role requires strong technical writing skills and proficiency in clinical research to ensure comprehensive literature reviews. Successful candidates will be well-versed in FDA regulations and able to...
-
Remote AI Evaluation Scenario Architect
7 days ago
Roma, Italia | Mindrift | Full-time
A cutting-edge AI consultancy is seeking an Entry-Level Tester to design evaluation scenarios for LLM-based agents. Responsibilities include creating structured test cases to simulate human workflows, ensuring clarity and effectiveness of scenarios, and analyzing agent behaviors. This flexible part-time freelance role allows you to work around your academic...
-
AI Agent Evaluation Analyst
5 days ago
Roma, Italia | Mindrift | Full-time
Important Notice: This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency. About Mindrift: At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to...
-
Freelance Agent Evaluation Analyst
2 weeks ago
Roma, Italia | Mindrift | Full-time
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency. At Mindrift, innovation meets opportunity. We believe in using the power of collective human...
-
Freelance Agent Evaluation Analyst
3 weeks ago
Roma, Italia | Mindrift | Full-time
This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency. At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of...
-
Space Technical Writer
1 week ago
Roma, Italia | Serco | Full-time
With this in mind, we are seeking European-based Technical Writers who will be responsible for developing elements of bid-winning solutions for the European Space division. You would contribute to the development of valuable propositions that meet our customers’ requirements, leading to successful outcomes. You would have the chance to work and win on...
-
Assessment Specialist, 2 Hour Learning
4 days ago
Roma, Italia | Crossover | Full-time
Ready to revolutionize education with AI-powered assessments? Join 2 Hour Learning as our Assessment Specialist and shape the future of personalized learning. Blend your expertise in psychometrics with cutting-edge AI to create assessments that don't just measure; they inspire.
What you will be doing
- Craft AI-driven test content that sets new...