Socialtrait builds AI communities that allow organisations to interact with statistically representative target audiences and test ideas, products, messages, and strategies at unprecedented speed. The platform features behavioural simulation of thousands of AI personas across diverse environments, including surveys, focus group discussions, creative evaluation, social media simulation and UX testing.
We are seeking a highly analytical Lead Data Scientist to lead the research, design, development, evaluation, and operation of our core audience simulation technology stack. In this senior position based at our Sydney office, you will own the algorithmic and statistical foundations of our platform. The role spans the full lifecycle of our AI systems, from research and methodology design through implementation, evaluation, and production operation. You will combine deep individual contributor work on the most technically demanding statistical and machine learning problems with the leadership of a small team of engineers and data scientists. As a technical leader, you will drive our core AI roadmap, direct team workflows, and set the long-term technical direction for Socialtrait's data science and AI simulation capabilities.
The successful candidate must be based in, or willing to relocate to, Sydney, NSW.
Key Responsibilities:
- Lead the research, design, development and evaluation of Socialtrait’s machine learning and artificial intelligence frameworks, including machine learning-based systems for simulating realistic human-like behaviour across AI persona communities.
- Own the algorithmic and statistical foundations of Socialtrait’s audience simulation platform, including persona sampling, behavioural simulation methodology, synthetic panel design and market research simulation workflows.
- Prepare, clean, transform and validate complex datasets drawn from public and proprietary sources and synthetic panel outputs, including resolving anomalies, missingness and other data quality issues, and apply statistical modelling, mathematical analysis, machine learning and natural language processing to discover trends and extract actionable insights.
- Develop and optimise algorithms, features and analytical pipelines for persona generation, behavioural simulation, response evaluation, ranking, clustering, dimensionality reduction, distributional calibration and insight comparison.
- Build, validate, monitor and improve predictive, statistical and machine learning models, including models that compare AI-generated behavioural patterns against real human reference data.
- Architect, implement and maintain production AI systems, data pipelines and cloud infrastructure supporting Socialtrait’s audience simulation platform, including model serving, LLM orchestration, retrieval-augmented generation, structured outputs, monitoring and observability.
- Translate ambiguous market research problems, client requirements and product objectives into technical specifications, datasets, experiments, model configurations, evaluation objectives and product roadmap items.
- Create data visualisations, dashboards, analytical reports and technical documentation to communicate model behaviour, statistical findings, evaluation results and simulation insights to technical and non-technical stakeholders.
- Provide strategic input into Socialtrait’s data science, machine learning and AI roadmap, including evaluation methodology, model improvement priorities, technical architecture and product-facing AI capabilities.
- Lead and mentor engineers and data scientists by setting technical priorities, breaking down complex R&D work, coordinating experiments, reviewing code and technical designs, and establishing implementation standards.
- Collaborate with founders, product management, market research experts, platform engineering and client-facing teams to align core AI capabilities with product strategy, client requirements and commercial priorities.
Minimum Requirements:
- Bachelor's degree or higher in computer science, data science, statistics, mathematics, physics or another strongly quantitative discipline.
- At least 2 years of professional experience in data science, applied AI, machine learning, or AI engineering, including hands-on work on production LLM systems and AI agent systems.
- Advanced Python capability and strong working knowledge of data science and machine learning tooling (e.g., Pandas, NumPy, Jupyter, PyTorch, scikit-learn, Hugging Face Transformers, or equivalent frameworks).
- Strong data science capability, including applied statistics, data preparation, cleaning, exploratory data analysis, statistical modelling, model training and evaluation, data visualisation and presentation of findings to non-technical stakeholders.
- Practical experience preparing, cleaning, transforming and validating complex datasets, including identifying and resolving missingness, anomalies, outliers, inconsistencies and other data quality issues.
- Experience designing algorithms, statistical models or machine learning models to analyse complex datasets and generate insights for business, product or research decision-making.
- Hands-on experience fine-tuning, evaluating, deploying or operating LLM-based systems, including LLMOps, observability and orchestration tooling such as Langfuse, LangSmith, LangChain or equivalent frameworks.
- Working knowledge of Retrieval-Augmented Generation (RAG) workflows, text embeddings, or vector databases to support language model performance.
- Practical experience designing and maintaining data pipelines and working with production data stores such as SQL or NoSQL databases.
- Proven track record of building, deploying, and maintaining end-to-end machine learning or AI systems in a production environment (from initial architecture and data pipelining to cloud deployment and monitoring).
- Proven capability to lead technical projects and mentor junior engineers, with a demonstrated ability to set technical direction, conduct code and work reviews, and guide team workflows and implementation standards.
- Practical experience with cloud platforms such as AWS, GCP, or Azure.
- Excellent verbal and written English communication skills, with a demonstrated ability to translate complex technical AI and data science concepts into clear business insights for cross-functional stakeholders.
Preferred Skills & Attributes (Highly Regarded):
- Postgraduate degree (Master's or PhD) in computer science, data science, statistics, mathematics, physics or another strongly quantitative discipline.
- Proven experience operating within an early-stage startup environment, with demonstrated flexibility, resilience, and a proactive, autonomous work style requiring minimal supervision.
- Practical or theoretical expertise in LLM-based behavioural simulation, synthetic audiences, synthetic panels, agent-based simulation, computational social science or adjacent AI simulation systems.
- Familiarity with traditional market research methodologies, social sciences, or behavioural modelling, including a proven ability to interpret and apply relevant academic and industry literature.
- Exceptional ability to translate ambiguous, high-level business objectives into robust, production-grade technical solutions and execute them efficiently while balancing speed, reliability, maintainability and product impact.
- Extensive experience leveraging advanced agentic engineering systems and AI-assisted development workflows (e.g., Anthropic’s Claude Code, OpenAI Codex, or other advanced LLM-driven coding agents) to optimise engineering velocity and code delivery.
- Strong practical experience managing enterprise-grade system resilience, including workflow orchestration frameworks (e.g., Temporal), message queuing systems, load balancing, and distributed systems architecture.
- Experience with serverless deployment platforms such as RunPod, Modal, or Baseten.
Pay: $150,000.00 – $170,000.00 per year
Work Location: Hybrid remote in North Sydney NSW 2060