Data Engineer — Job Description
Summary
Designs, builds, and maintains scalable data pipelines and infrastructure to collect, store, process, and serve reliable data for analytics, BI, and machine learning.
Key Responsibilities
- Design, develop, and maintain ETL/ELT pipelines to ingest structured and unstructured data from multiple sources.
- Build and operate data warehouses, data lakes, and lakehouses; model data for analytical and operational use.
- Implement batch and real-time streaming data processing using appropriate frameworks.
- Ensure data quality through validation, track provenance and lineage, and implement monitoring and alerting.
- Optimize data storage, partitioning, and query performance for cost and speed.
- Collaborate with data scientists, analysts, product, and engineering teams to define schemas, APIs, and SLAs.
- Develop and maintain orchestration workflows for ingestion and transformation jobs, including scheduling, retries, and backfills.
- Implement security, access controls, and data governance practices (masking, encryption, auditing).
- Troubleshoot production incidents, perform root‑cause analysis, and implement long‑term fixes.
- Automate deployments and infrastructure provisioning for data platforms using CI/CD pipelines.
- Evaluate and pilot new data technologies and tools; drive migrations and upgrades.
Required Qualifications
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field (or equivalent experience).
- 2+ years of experience building production data pipelines and data platforms (adjust by seniority).
- Strong programming skills in Python, Scala, or Java.
- Experience with SQL and data modeling for analytics (star/snowflake schemas, dimensional modeling).
- Familiarity with big‑data frameworks and ecosystems (Spark, Flink, Hadoop) and query engines (Presto/Trino, Hive).
- Experience with cloud data services on AWS, GCP, or Azure, including data warehouses (Redshift, BigQuery, Snowflake) and object storage (S3, GCS, Azure Blob Storage).
- Experience with orchestration tools (Airflow, Prefect, Dagster) and message/streaming systems (Kafka, Pub/Sub).
- Knowledge of data quality tools/practices, monitoring, and observability.
- Strong problem‑solving, debugging, and communication skills.
Preferred Qualifications
- Experience with MLOps/DataOps and feature stores.
- Familiarity with dbt, data catalogs, lineage tools (e.g., Amundsen, DataHub), and governance platforms.
- Experience with Infrastructure as Code (Terraform, CloudFormation) and containerization (Docker, Kubernetes).
- Advanced knowledge of performance tuning for large datasets and cost optimization.
- Certifications in cloud platforms or data engineering.
Pay: $93.21 – $132.32 per hour
Benefits:
- Dental insurance
- Health insurance
- Life insurance
- Salary packaging
Work Location: In person