Principal Machine Learning Engineer – Production Systems
Employer: Confidential
Posted 7 May 2026
Location: Bristol
Type: Full-time
Level: Mid-Senior level
Category: Software Engineering
This employer holds a UK Home Office sponsor licence; sponsorship for this specific role is at the employer's discretion.
SKILLS
TensorFlow
ONNX Runtime
Python
gRPC
MLOps
Docker
Kubernetes
CUDA
FULL DESCRIPTION
We are seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance across a mixed Python/TensorFlow and C#/.NET technology stack.
Responsibilities
- Architect the ML Solver Platform: Define a modular architecture for data preprocessing, model execution, and post-processing. Establish clear API contracts between Python/TensorFlow and C# services.
- Productionize ML Workflows: Convert research code into robust, testable, and observable services. Implement CI/CD pipelines, automated testing, and reproducibility standards.
- Integration & Interoperability: Design REST/gRPC endpoints for cross-language communication. Ensure compatibility with C#/.NET services.
- Performance & Scalability: Optimize GPU/CPU utilization, batching strategies, and memory management. Plan for multi-model and multi-tenant scenarios.
- MLOps & Lifecycle Management: Implement model versioning, artifact registries, and deployment workflows. Set up monitoring, logging, and alerting for solver performance.
- Security & Compliance: Apply best practices for secrets management, dependency scanning, and secure artifact storage.
Required Skills & Experience
- Expert in TensorFlow (TF2/Keras), with experience using ONNX Runtime for inference.
- Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
- Proven experience designing scalable ML systems for production.
- Proficiency in gRPC/Protobuf and REST for cross-language integration.
- Experience with CI/CD pipelines, containerization (Docker/Kubernetes), model registries, and reproducibility practices.
- Experience with GPU acceleration (CUDA/cuDNN), mixed precision, XLA, and performance profiling.
- Observability: metrics, tracing, structured logging, and dashboards.
- Supply-chain and platform security: SBOMs, image signing, role-based access control, and vulnerability scanning.
Preferred Qualifications
- Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
- Familiarity with distributed training strategies and multi-GPU setups.
- Knowledge of feature stores and data validation frameworks.
- Exposure to regulated environments and compliance frameworks.
Tools & Technologies
- ML: TensorFlow, ONNX Runtime, tf2onnx.
- APIs: FastAPI, gRPC.
- DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
- Monitoring: Prometheus, Grafana, OpenTelemetry.
- Security: HashiCorp Vault, Sigstore.
Why Join Us?
- Work on cutting-edge ML solutions integrated into commercial engineering software.
- Define architecture that scales across global deployments.
- Collaborate with a team of experts in ML, software engineering, and UI development.
To apply: send your resume and a brief cover letter to [Employer hidden — sign up to reveal]