Freelance Agent Evaluation Analyst

2 weeks ago

Rome, Italy · Mindrift · Full-time

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English proficiency.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What We Do
The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting-edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into real-world expertise from across the globe.

Who We're Looking For
We're looking for curious and intellectually proactive contributors: the kind of person who double-checks assumptions and plays devil's advocate. Are you comfortable with ambiguity and complexity? Does an async, remote, flexible opportunity sound exciting? Would you like to learn how modern AI systems are tested and evaluated?

This is a flexible, project-based opportunity well suited for:
- Analysts, researchers, or consultants with strong critical-thinking skills
- Students (senior undergraduates or graduate students) looking for an intellectually interesting gig
- People open to a part-time, non-permanent opportunity

About the Project
We're looking for QA specialists for autonomous AI agents to join a new project focused on validating and improving complex task structures, policy logic, and agent evaluation frameworks. Throughout the project, you'll balance quality assurance, research, and logical problem-solving. The project is ideal for people who enjoy looking at systems holistically and thinking through scenarios, implications, and edge cases.

You do not need a coding background, but you must be curious, intellectually rigorous, and able to evaluate the soundness and consistency of complex setups.

What You'll Be Doing
- Reviewing evaluation tasks and scenarios for logic, completeness, and realism
- Identifying inconsistencies, missing assumptions, or unclear decision points
- Helping define clear expected behaviors (gold standards) for AI agents
- Annotating cause-effect relationships, reasoning paths, and plausible alternatives
- Thinking through complex systems and policies as a human would, to ensure agents are tested properly
- Working closely with QA engineers, writers, or developers to suggest refinements or edge-case coverage

How to Get Started
Apply to this post, qualify, and get the chance to contribute to a project aligned with your skills, on your own schedule. Shape the future of AI while building tools that benefit everyone.

Requirements
- Excellent analytical thinking: you can reason about complex systems, scenarios, and logical implications
- Strong attention to detail: you can spot contradictions, ambiguities, and vague requirements
- Familiarity with structured data formats: you can read (not necessarily write) JSON/YAML
- Ability to assess scenarios holistically: you know what's missing, what's unrealistic, and what might break
- Good communication and clear writing in English to document your findings

We Also Value Applicants Who Have
- Experience with policy evaluation, logic puzzles, case studies, or structured scenario design
- A background in consulting, academia, olympiads (e.g., logic/math/informatics), or research
- Exposure to LLMs, prompt engineering, or AI-generated content
- Familiarity with QA or test-case thinking (edge cases, failure modes, "what could go wrong")
- Some understanding of how scoring or evaluation works in agent testing (precision, coverage, etc.)

Benefits
- Get paid for your expertise, with rates of up to $50/hour depending on your skills, experience, and project needs
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio
- Influence how future AI models understand and communicate in your field of expertise

Remote AI Evaluation Scenario Architect

5 days ago

Rome, Italy · Mindrift · Full-time

A cutting-edge AI consultancy is seeking an Entry-Level Tester to design evaluation scenarios for LLM-based agents. Responsibilities include creating structured test cases to simulate human workflows, ensuring clarity and effectiveness of scenarios, and analyzing agent behaviors. This flexible part-time freelance role allows you to work around your academic...
Evaluation Scenario Writer

5 days ago

Rome, Italy · Mindrift · Full-time

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. What We Do: The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real‑world expertise from across the...
Data Analyst Power BI

3 weeks ago

Rome, Italy · Collective.work · Full-time

Data Analyst Power BI - Fluent Italian - Freelance. Budget: 600. Context: global industrial leader, Italian subsidiary. As part of a new plan and recent acquisitions, the group is looking for a Data Analyst for its Italian subsidiary. Desired profile: 5 to 10 years of experience as a Data Analyst; Power BI expertise; command of the Italian language; experience...
MCP & Tools Python Developer - Agent Evaluation Infrastructure

5 days ago

Rome, Italy · Mindrift · Full-time

Overview: At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI. The Mindrift platform, launched and powered by Toloka, connects domain experts with cutting‑edge AI projects from innovative tech clients. Our mission is to unlock the potential of GenAI by tapping into...
Remote Python Engineer for MCP Tooling

5 days ago

Rome, Italy · Mindrift · Full-time

An innovative AI consultancy in Italy seeks hands-on Python engineers for a part-time remote role. In this position, you will develop and maintain MCP-compatible evaluation servers, implement logic for verifying agent actions, and extend tools for testing agents. Ideal candidates have 4+ years of Python experience, solid API building skills, and familiarity...
Quantitative Statistics Expert

2 weeks ago

Rome, Italy · Other · Full-time

Quantitative Statistics Expert - Freelance AI Trainer. Located in Italy. Submit resume in English. What We Do: At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI. About the Role: GenAI models are improving very quickly, and one of our goals is to...
Remote AI Agent Evaluation Analyst

3 days ago

Rome, Italy · Mindrift · Full-time

A technology consultancy is seeking QAs for autonomous AI agents to work on a project focused on validating complex task structures. Candidates should possess excellent analytical thinking, strong attention to detail, and be comfortable documenting findings in English. This flexible and remote position offers competitive hourly rates up to $50, making it...
