We're big believers in the power of IRL, so for most roles we ask Campers to work from their local Culture Amp office an average of 2 days a week to unlock connection, pace and culture together.
Join us on our mission to make a better world of work.
Culture Amp is the world's leading employee experience platform, revolutionizing how 25 million employees across more than 6,000 companies create a better world of work. Culture Amp empowers companies of all sizes and industries to transform employee engagement, drive performance management, and develop high-performing teams. Powered by people science and the most comprehensive employee dataset in the world, the most innovative companies including Canva, On, Asana, Dolby, McDonald's and Nasdaq depend on Culture Amp every day.
Culture Amp is backed by leading venture capital funds and has offices in the US, UK, Germany and Australia. Culture Amp has been recognized as one of the world's top private cloud companies by Forbes and most innovative companies by Fast Company.
For more information visit cultureamp.com.
How you can help make a better world of work
Shipping an AI product is only the beginning. The harder challenge, one few teams have mastered, is continuous production evaluation: diagnosing performance shifts in real time, decoding 'why' they occur, and driving measurable quality improvements at scale. We are looking for a Staff-level Applied AI Scientist with a strong AI Engineering background to solve this problem for our Coach AI system, establishing the observability and evaluation frameworks that turn early production releases into robust, high-performance products, and then to make this sustainable by enabling the rest of our engineering org to do the same.
As part of this team of amazing humans,
You will
- Own the end-to-end feedback loop: establish a rigorous cycle of prompt engineering, evaluation at scale, and continuous improvement. You will build LLM-powered analysis tools that diagnose performance shifts, provide deep-dive insights, and automate recommendations for prompt or system-level enhancements.
- Contribute to context engineering: design and optimise what actually enters the model's context, including retrieval, memory across sessions, context assembly and compression, and managing context budget in long or multi-turn agentic flows. Validate each change against evals rather than opinion or ad hoc testing.
- Design and run evals: sampling, LLM-as-a-judge, and labelling systems over de-identified production traces (for example, with Langfuse) to build longitudinal evaluation monitoring and alerting.
- Eval-driven agentic orchestration: contribute to the agent architecture (planning, tool use, routing, decomposition, verification/critique steps) and let eval findings drive structural changes — e.g. when a failure mode surfaces, add a self-check step, change tool selection, or re-route.
- Model and provider selection: make and own model/routing decisions against quality, latency and cost trade-offs, including when to prompt vs fine-tune vs swap models.
- Create and monitor guardrails and safety in production: given sensitive coaching and people data, design input/output guardrails, PII handling, content-safety and jailbreak resistance as part of the system.
- Enable others: through reusable frameworks, tooling and documentation so product and engineering teams run their own evaluations. Lead from the front, then hand over.
- Partner closely: with the AI Coach team, product, data science and people science so measured quality maps to real customer value.
- Stay current: with the latest evaluation, observability and LLMOps research and provider offerings.
You have
- Experience building and tuning production agentic systems, including context engineering, RAG, memory, cost, model selection and performance.
- Proven experience analysing the performance of AI or data products in production and turning it into changes that maintained and improved the product.
- Hands-on LLM evaluation in production: LLM-as-judge, eval datasets, human-in-the-loop labelling, scoring against thresholds.
- Experience with observability tooling for LLM and agentic systems (traces, sampling, prompt management, and production monitoring, using Langfuse or a comparable tool).
- Experience with longitudinal measurement: metrics and baselines, regression detection, quality tracking over time.
- AI-native daily practice: comfortable using agentic coding tools (Claude Code, Cursor, Codex or similar) on multi-step tasks, with clear judgment on when to direct an agent versus write code yourself.
- Strong technical writing and communication, and a track record of building capability into systems and teaching others to own it.
- Strong signals: built or scaled an eval and observability practice across multiple teams; evolved existing enterprise codebases with AI; production agentic systems (orchestration, RAG); a postgraduate degree in ML, CS, Applied Maths or related; public writing, talks or open-source work in eval, observability or LLMOps.
You are
- Motivated by the effective scaling of AI system performance and adoption in production with the humility to learn in public and the resilience to be a self-starter.
- Motivated by enablement. Your biggest wins come from teaching others and building this into our systems, which can mean you do not own what you build forever.
The way we build at Culture Amp
At Culture Amp, our engineers are increasingly orchestrating agents that write code, rather than just writing it directly themselves. We guide, plan, build, and review loops where AI takes the initiative on routine work, allowing you to steer architecture, trade-offs, and quality. We're investing in a shared "harness" of tooling and standards so agents can do real product work safely, and we all embrace these capabilities as a core part of how we ship.
Please note: candidates must be legally authorised to work in Australia for the duration of employment. The role is based out of our Melbourne or Sydney hubs.
Perks & Benefits
At Culture Amp, our people are at the heart of our success. We offer competitive pay and a total rewards package designed to support you at work and in life. This includes:
- Equity through our Employee Share Option Program, so you can share in our long-term success
- Learning programs and coaching to help you thrive and grow
- Quarterly refresh days, an extended end-of-year break and a monthly allowance to support your wellbeing and lifestyle
- Inclusive parental leave from day one
- A MacBook and budget to set up your home workspace, enabling flexibility
- Five annual social impact days to give back to causes that matter to you
- Medical insurance coverage for you and your family (Available for US & UK only)
Our rewards are designed to support different needs and life stages, recognising that what matters most can vary from person to person.
Research shows that candidates from underrepresented backgrounds may hesitate to apply if they don't meet every requirement, but your unique experience matters. If you're interested in joining us, we strongly encourage you to apply and help us build a more diverse and impactful team.
Accommodations & Data Privacy
If you require reasonable accommodations or adjustments due to a disability to complete the online application or to participate in the interview process, please contact [email protected] and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. The Reasonable Accommodations team will respond to your email promptly.
Culture Amp will retain your CV & personal information for a period of two years (four years for the US) from the date of your application process completion. Culture Amp may contact you in relation to future job opportunities during this time period. For further information please see our privacy policy here or contact [email protected].