Principal Data Engineer

🔒 Confidential Employer
Posted 24 March 2026
Location: London
Type: Full-time
Level: Mid-Senior level
Category: Technology
This employer holds a UK Home Office sponsor licence; sponsorship for this specific role is at the employer's discretion.

SKILLS

ETL/ELT data pipelines · Python · Scala/Spark · Microsoft Azure · Databricks · Terraform · Data Warehouse Design · SQL

FULL DESCRIPTION

Lead [Employer hidden — view at passion-project.co.uk]’s data transformation as Principal Data Engineer: build a robust, scalable platform powering scientific insights, business intelligence, and global supply chain integrity.

Key Responsibilities:

Data Architecture & Strategy

  • Platform Leadership: Define and own the technical strategy and architecture for our entire data platform, covering ingestion, storage, processing, governance, and consumption. This includes use cases supporting Operations, Data Science, customer-facing portals, and Business Intelligence.
  • Pipeline Design: Design and implement highly scalable, performant, and reliable ETL/ELT data pipelines to handle diverse data sources, including complex scientific datasets and supply chain inputs alongside business information.
  • Technology Selection: Evaluate, recommend, and drive the adoption of new data services and modern data tools to ensure we have a future-proof data ecosystem.
  • Data Modeling: Lead the design of canonical data models for our data warehouse and operational data stores, ensuring data quality, consistency, and integrity across the platform.
  • Single Source of Truth: Define and maintain canonical identifiers for clients, suppliers, and transactions to ensure consistency across systems (e.g. Salesforce, NetSuite, internal databases) and portals.

Implementation & Technical Excellence

  • Hands-on Development: Serve as the most senior, hands-on developer, writing high-quality, production-grade code (primarily Python and/or Scala/Spark) to build initial pipelines and core data services.
  • Data Governance & Security: Architect data security and governance policies, ensuring compliance and best practices around data access, masking, and retention, particularly for sensitive origin data.
  • Data Quality: Implement automated deduplication, conflict resolution and anomaly detection to maintain data integrity across ingestion sources.
  • Operational Health: Implement robust monitoring, logging, and alerting for all data pipelines and infrastructure, ensuring high data reliability and performance.
  • Infrastructure as Code (IaC): Work closely with the Infrastructure team to define and automate the provisioning of all Azure data resources using Terraform or similar IaC tools.

Cross-Functional Leadership

  • Scientific Collaboration: Partner closely with the Science teams to understand the structure, complexity, and requirements of raw scientific data, ensuring accurate data translation and ingestion.
  • Mentorship: Provide technical guidance and mentorship to software engineers on best practices for interacting with and consuming data services.
  • Product Partnership: Collaborate with the Product Director to understand commercial and user-facing data requirements, translating these needs into actionable data infrastructure features.

Skills & Experience

  • Principal/Lead Expertise: Extensive experience (typically 7+ years) focused on data engineering, including significant time spent in a Principal, Lead, or Architect role defining data strategy from the ground up.
  • Databricks: Deep practical and architectural experience with the Databricks platform.
  • Azure Data Stack: Hands-on experience building and operating workloads within the Microsoft Azure data ecosystem (e.g. Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure SQL/Cosmos DB).
  • Coding Proficiency: Expert-level proficiency in Python (or Scala) and SQL, with a strong focus on writing clean, tested, and highly performant data processing code.
  • Data Warehouse Design: Proven track record designing and implementing scalable data warehouses/data marts for analytical and operational use cases.
  • Pipeline Automation: Strong experience with workflow orchestration tools and implementing CI/CD for data pipelines.
  • Cloud Infrastructure: Familiarity with Infrastructure as Code (Terraform) and containerisation.