Principal Data Engineer

🔒 Confidential Employer
Posted 24 March 2026
Location: London
Type: Full-time
Level: Mid-Senior level
Category: Technology
This employer holds a UK Home Office sponsor licence; sponsorship for this specific role is at the employer's discretion.

SKILLS

ETL/ELT data pipelines · Python · Scala/Spark · Microsoft Azure · Databricks · Terraform · Data Warehouse Design · SQL

FULL DESCRIPTION

Lead [Employer hidden — view at passion-project.co.uk]’s data transformation as Principal Data Engineer: build a robust, scalable platform powering scientific insights, business intelligence, and global supply chain integrity.

Key Responsibilities:

Data Architecture & Strategy

  • Platform Leadership: Define and own the technical strategy and architecture for our entire data platform, covering ingestion, storage, processing, governance, and consumption. This includes use cases supporting Operations, Data Science, customer-facing portals, and Business Intelligence.
  • Pipeline Design: Design and implement highly scalable, performant, and reliable ETL/ELT data pipelines to handle diverse data sources, including complex scientific datasets and supply chain inputs alongside business information.
  • Technology Selection: Evaluate, recommend, and drive the adoption of new data services and modern data tools to ensure we have a future-proof data ecosystem.
  • Data Modeling: Lead the design of canonical data models for our data warehouse and operational data stores, ensuring data quality, consistency, and integrity across the platform.
  • Single Source of Truth: Define and maintain canonical identifiers for clients, suppliers, and transactions to ensure consistency across systems (e.g. Salesforce, NetSuite, internal databases) and portals.

Implementation & Technical Excellence

  • Hands-on Development: Serve as the most senior, hands-on developer, writing high-quality, production-grade code (primarily Python and/or Scala/Spark) to build initial pipelines and core data services.
  • Data Governance & Security: Architect data security and governance policies, ensuring compliance and best practices around data access, masking, and retention, particularly for sensitive origin data.
  • Data Quality: Implement automated deduplication, conflict resolution and anomaly detection to maintain data integrity across ingestion sources.
  • Operational Health: Implement robust monitoring, logging, and alerting for all data pipelines and infrastructure, ensuring high data reliability and performance.
  • Infrastructure as Code (IaC): Work closely with the Infrastructure team to define and automate the provisioning of all Azure data resources using Terraform or similar IaC tools.

Cross-Functional Leadership

  • Scientific Collaboration: Partner closely with the Science teams to understand the structure, complexity, and requirements of raw scientific data, ensuring accurate data translation and ingestion.
  • Mentorship: Provide technical guidance and mentorship to software engineers on best practices for interacting with and consuming data services.
  • Product Partnership: Collaborate with the Product Director to understand commercial and user-facing data requirements, translating these needs into actionable data infrastructure features.

Skills & Experience

  • Principal/Lead Expertise: Extensive experience (typically 7+ years) focused on data engineering, including significant time spent in a Principal, Lead, or Architect role defining data strategy from the ground up.
  • Databricks: Deep practical and architectural experience with the Databricks platform.
  • Azure Data Stack: Hands-on experience building and operating workloads within the Microsoft Azure data ecosystem (e.g. Azure Data Factory, Azure Data Lake, Azure Synapse Analytics, Azure SQL/Cosmos DB).
  • Coding Proficiency: Expert-level proficiency in Python (or Scala) and SQL, with a strong focus on writing clean, tested, and highly performant data processing code.
  • Data Warehouse Design: Proven track record designing and implementing scalable data warehouses/data marts for analytical and operational use cases.
  • Pipeline Automation: Strong experience with workflow orchestration tools and implementing CI/CD for data pipelines.
  • Cloud Infrastructure: Familiarity with Infrastructure as Code (Terraform) and containerisation.