Data Engineer
iXceed Solutions • Kraków
Salary range not disclosed
📝 Main description / Introduction
Job Title: Data Engineer
Location: Krakow, Poland (Hybrid)
Employment Type: Contract
Role Summary
We are seeking an experienced Data Engineer with strong expertise in Azure Databricks (ADB) and Azure Data Factory (ADF) to design, build, and optimize scalable data pipelines across batch and streaming workloads. The ideal candidate is hands-on with PySpark and Python, has a solid grasp of SQL and database concepts, and can drive performance and memory optimizations in cloud-native data platforms.
Key Responsibilities
- Design, build, and maintain data pipelines using ADF (or Synapse Pipelines) and Databricks for batch and real-time ingestion and transformations.
- Develop scalable data processing solutions using PySpark on Azure Databricks (Delta Lake preferred).
- Implement streaming pipelines (e.g., Structured Streaming, Event Hubs, Kafka) and robust batch workflows with monitoring and alerting.
- Optimize Spark jobs for performance (partitioning, caching, broadcast joins, AQE, file size tuning) and memory management (executor/driver sizing, shuffle tuning, persistence strategies).
- Model data for analytics and ML use cases (medallion architecture—bronze/silver/gold).
- Implement CI/CD and DevOps practices for data engineering (branching, testing, deployment).
- Ensure data quality, lineage, and governance (e.g., Great Expectations/Deequ, Purview desirable).
- Collaborate with data architects, analysts, and product teams to translate requirements into technical solutions.
- Produce high-quality documentation and knowledge transfer materials.
Must-Have Skills
- Azure Databricks (ADB): PySpark, Spark SQL, Delta Lake, job clusters vs. all-purpose clusters, workflows.
- Azure Data Factory (ADF): pipelines, data flows, triggers, integration runtime, parameterization, and orchestration.
- Programming: PySpark and Python (modular code, error handling, unit testing).
- Streaming & Batch: Structured Streaming, checkpointing, watermarking; batch orchestration patterns.
- Performance & Memory Optimization: Spark tuning, partitioning strategies, file formats (Parquet/Delta), caching/persistence, adaptive query execution, skew handling, shuffle and join optimization, cluster sizing.
- SQL: Strong SQL for complex transformations, performance tuning, and debugging.
- Databases/Data Stores: Working knowledge of relational databases (e.g., SQL Server, PostgreSQL) and data lake storage on ADLS Gen2.
- Azure Fundamentals: ADLS, Key Vault, Event Hubs/Service Bus, Azure Functions (nice to know), networking basics, IAM/Role assignments.
- Version Control & CI/CD: Git (GitHub/Azure DevOps), build/release pipelines for Databricks/ADF assets.
Good-to-Have
- Delta Live Tables (DLT), Unity Catalog, Databricks Workflows.
- Data Quality/Observability: Great Expectations, Monte Carlo (or similar).
- Data Modeling: Dimensional modeling, star/snowflake schemas.
- Orchestration: Airflow.
- Messaging/Streaming: Kafka (Confluent), Event Hubs, Spark-Kafka integration.
- Infra as Code: Terraform/Bicep for Azure resources.
- Cost Optimization in Azure analytics stack.
- Exposure to ML pipelines (MLOps) and feature stores (nice to have).
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent experience).
- 8–10 years total experience in data engineering with 4–5 years hands-on in ADB/ADF.
- Proven experience delivering production-grade batch and streaming data solutions at scale.
📡 Stats metadata
Source: linkedin
Slug / ID: linkedin-data-engineer-at-ixceed-solutions-4389476803
Published: 24 March 2026
Expires: —
Ingested: 24 March 2026
🗂 Offer details
📋 Information
Location: Kraków
Work mode: —
Employment basis: —
Experience level: Unknown
Min. years of experience: 10+
Contract type: Other
Category: it