Data Engineer
📝 Main Description / Introduction
Data Engineer – Databricks
The Role
We are hiring a Data Engineer with strong Databricks expertise to design and scale our clients' Lakehouse architecture, which supports AI-driven clinical and real-world data workflows. You will work closely with AI engineers, biostatisticians, and client-facing strategy teams to build robust, high-performance data pipelines that power agentic AI systems, simulation environments, and advanced analytics. This is infrastructure for AI, not just reporting.
Key Responsibilities
- Design and implement scalable data pipelines using Databricks (Delta Lake) and PySpark.
- Architect medallion-style (Bronze/Silver/Gold) workflows optimized for AI and analytics.
- Build ingestion pipelines for structured and unstructured clinical datasets.
- Optimize Spark clusters for performance, cost efficiency, and reliability.
- Partner with AI engineers to structure data for RAG pipelines, simulations, and agentic workflows.
- Implement Unity Catalog best practices for governance, lineage, and access control.
- Build data validation, monitoring, and observability into pipelines.
- Deploy and manage infrastructure within AWS (S3, Glue, IAM, Redshift).
- Monitor Spark jobs and troubleshoot performance bottlenecks.
Required Qualifications
- Strong hands-on experience with Databricks (Delta Lake).
- Deep knowledge of Spark / PySpark.
- Experience with AWS cloud services.
- Strong SQL skills and experience with modern data modeling.
- Experience building scalable Lakehouse architectures.
- Experience working with large, complex datasets.
Preferred Experience (Life Sciences Context)
- Experience with clinical trial data, real-world data/evidence (RWD/RWE), or healthcare datasets.
- Familiarity with IQVIA, Veeva, or pharma commercial data ecosystems.
- Experience working in regulated environments (GxP).
- Exposure to ML workflows or AI platform engineering.
Preferred Certifications
- Databricks Certified Data Engineer Associate.
- Databricks Certified Data Engineer Professional.
- Databricks Certified Machine Learning Associate (nice to have).
- AWS Certified Data Analytics – Specialty (nice to have).
Education
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related technical field.
Why This Role Matters
You will help build the data backbone for clinically grounded AI systems used in global life sciences environments. This is an opportunity to work on a modern Lakehouse architecture that powers agentic AI and simulation systems, not just dashboards.