Early Career Health Data Scientist - BHF DSC
FULL DESCRIPTION
The Early Career Health Data Scientist will join our Health Data Science team and contribute to the development of scalable, reusable resources that support researchers during the data curation phase of their research projects, helping them produce high-quality, analysis-ready data. These resources may include:
- Data dictionaries, dataset summaries, and shared exploratory analyses and insights that inform researchers about datasets and how they can be used in research projects.
- Coding tutorials, guidance notes, and worked examples to help researchers develop the technical skills needed to curate data within Secure Data Environments.
- Reusable code, functions, and data curation pipelines that researchers can adapt for their own projects, reducing duplication and accelerating the data curation phase. An example data curation pipeline for research projects undertaken within the NHS England SDE can be found on the Centre’s GitHub.
- Curated data methods: methods that produce cleaned and enhanced views of datasets, designed to integrate with our data curation pipelines so that equivalent logic is not repeatedly reimplemented.
The post-holder will also provide direct, hands-on support to researchers, either by offering guidance and signposting to existing data curation resources relevant to their project, or by carrying out targeted, bespoke development of data curation pipelines to generate analysis-ready data. The post-holder will also be required to analyse data for quality control purposes and to help improve understanding of the utility of the data and how it can be used appropriately for research.
Main responsibilities
- Providing data engineering and data curation support in secure data environments (SDEs) and trusted research environments (TREs) to produce robust, analysis-ready datasets.
- Contributing to the development, testing, and maintenance of data curation pipelines and shared resources under the supervision of senior colleagues.
- Developing and applying expertise in assessing the quality, completeness, and utility of the various routinely collected health datasets across the four UK nations, including contributing to early feasibility and exploratory assessments to inform study design.
- Summarising and disseminating findings and lessons from data quality and data utility assessments to inform research design and appropriate use of routinely collected data.
- Under the supervision of senior colleagues, writing, organising, and maintaining support documentation for linked data resources (e.g. data dictionaries, variable mapping tables, data access process documentation, and Git repositories).
- Carrying out technical validation checks on linked data sources (e.g. duplicates, linkage errors, temporal inconsistencies) and developing reusable functions to check these data rigorously for errors and inconsistencies.
- Working with relevant researchers to identify and apply appropriate existing and novel phenotype definitions and algorithms from linked national health data.
- Preparing clear numerical summaries and visualisations to communicate findings (e.g. data characteristics, quality, and decision making) to researchers when curating data.
- Preparing and presenting results in oral and written reports, technical notes, and academic publications.
- Attending and actively participating in regular Centre and project meetings, reporting on progress and presenting analytical results.
- Demonstrating a strong commitment to open-source, transparent, and reproducible research, as the post will involve releasing tools, code, and documentation under an open-source licence.