Notice
- Important: This guidance is under active development by NHS England and content may be added or updated on a regular basis.
- This Implementation Guide is currently in Draft and SHOULD NOT be used for development or active implementation without express direction from the NHS England Genomics Unit.
UGR Integration
Background
The Unified Genomic Record (UGR) is a key technological component designed to unify patient genomic data into a single, patient-centric record. The UGR is intended to standardise the collection, storage, and sharing of genomic data across the NHS, ensuring that genomic information is interoperable and accessible across all care settings.
The UGR architecture is structured into three key layers:
- End User Functionality – Integrating with existing NHS services to facilitate direct interaction with the UGR.
- Interoperability – Serving as the primary gateway for accessing and sharing genomic data across different healthcare systems.
- Data Storage – Implementing a hybrid centralised and federated data storage model, leveraging the Patient Data Manager (PDM) as the central data entry point.
The key problems the UGR aims to address in support of delivering the Genomic Medicine Service are:
- Fragmented and Inconsistent Genomic Data Management, by bringing the genomic data into a single genomic record framework and enforcing UK Core FHIR interoperability standards. Access is centralised through APIM and MNS. Centralising genomic data access will help promote standards adoption by suppliers.
- Limited Reusability of Genomic Test Reports, by bringing the genomic data into a single genomic record framework and enforcing GA4GH and UK Core FHIR interoperability standards.
- Inefficient Data Use for Research and Population Health, by providing a centralised unified point of access for genomic data that is readily available for population health, management information and research, reducing the burden on GLHs and enabling more robust data analysis and insights.
- Limited Integration of Genomic Data with Clinical Pathways, by decoupling genomic data from the systems involved with the originating test request and making it available to any other provider alongside other clinical data, enabling more comprehensive patient management and facilitating the use of precision medicine across the NHS.
- High Administrative Burden and Operational Inefficiencies, through use of the genomic order management service. Adoption of standards and a unified point of data access will significantly reduce administrative workloads and improve the overall efficiency of genomic services.
- Inadequate Data Linkage for Inherited and Rare Diseases, by providing a simple mechanism for securely linking genomic records, respecting patient preferences, and enabling the implementation of targeted, family-based care strategies.
- Lack of Centralised System for Identifying Clinical Trial Eligibility, by potentially serving as a centralised virtual repository that enables researchers to identify eligible patient cohorts, enhancing patient access to innovative therapies and supporting the growth of clinical research.
- Absence of Centralised Access Control, by providing an opportunity to streamline access control processes, allowing patient-based policies instead of organisation or system-based policies. With the data decoupled from end-user systems, enforcement of policies at source is assured and includes comprehensive audit and transparency for patients, improving confidence and trust.
- Inconsistent Access to Pharmacogenomic (PGx) Data Across NHS, by providing the master PGx record for patients and making the data available nationally to any clinical decision support system involved in prescribing.
- Inability to Provide Comprehensive Patient Access and Transparency, by centralising access to the genomic record via the APIM. This significantly simplifies the integration of the UGR with the NHS App.
Genomic Data Reuse
One of the key aims of the UGR is to enable reuse of data across different health contexts: Direct Care; Population Health Management; Management Information; and Research. To support each context, slightly different data access approaches are needed.
Genomic data in direct care is mostly used as a diagnostic source. Data is shared using national standards and provider systems directly interact with the UGR via APIs. All direct care data is shared through legitimate access, managed by national RBAC, although data redaction and other minimisation approaches can be built into the API.
Population health management and management information require access to groups of patient genomic records. This can be achieved in two ways:
- Separate API calls to the PDM via APIM per patient record.
- Bulk Query API calls to the PDM using GA4GH Beacon API requesting sets of data across multiple patients.
Purpose-Based Access Control (PBAC) is implemented for each patient, respecting their preferences for secondary data sharing. Data returned via the API is pseudonymised using NHS England's national pseudonymisation system, or returned in the clear if used for commissioning purposes.
Using genomic data for research requires request-specific data processing. The UGR utilises Purpose-Based Access Control to ensure legitimacy of all data requests regardless of the clinical context. For research, the request must be supported by a PBAC policy within each patient’s UGR for the ability to query their data as part of a cohort search. The PBAC rules can be defined to be as explicit as required. For example, a patient may permit their data to be used for cancer research, but request to be asked for permission for other types of research. The rules are defined, and patient preferences managed via the NHS App.
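The cancer-research example above can be sketched as a small per-patient policy lookup. This is an illustration only, not the UGR policy model: the `PbacPolicy` class, the `Decision` values, and the purpose strings such as `research/cancer` are hypothetical names introduced here, and real policies would be expressed in whatever format the UGR adopts.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    ASK_PATIENT = "ask-patient"
    DENY = "deny"


@dataclass
class PbacPolicy:
    """Hypothetical per-patient policy: explicit rules per purpose plus a default."""
    rules: dict[str, Decision] = field(default_factory=dict)
    default: Decision = Decision.DENY

    def decide(self, purpose: str) -> Decision:
        # An explicit rule wins; any purpose not listed falls back to the default.
        return self.rules.get(purpose, self.default)


def cohort_members(policies: dict[str, PbacPolicy], purpose: str) -> list[str]:
    """Return the patient keys whose policy permits this purpose outright."""
    return [pid for pid, p in policies.items() if p.decide(purpose) is Decision.PERMIT]


# Patient permits cancer research and wants to be asked about anything else.
policy = PbacPolicy(rules={"research/cancer": Decision.PERMIT}, default=Decision.ASK_PATIENT)
print(policy.decide("research/cancer"))          # Decision.PERMIT
print(policy.decide("research/cardiovascular"))  # Decision.ASK_PATIENT
```

In this sketch a cohort search only includes patients whose policy returns `PERMIT`; how an `ASK_PATIENT` outcome would be followed up is outside what the example covers.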
The returned data may be modified depending on the PBAC policies and the context:
- Data minimisation – only return what is necessary for the functionality required.
- Data redaction – make the requester aware that data exists but respond with ‘data redacted’ in fields as defined in the PBAC rules.
- Pseudonymisation – remove all patient identifiable information and provide an NHS England derived pseudonym for the patient ID.
- Anonymisation – remove all patient identifiable information and ensure the patient's identity cannot be derived from the patient data returned.
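The first three modifications listed above can be sketched as simple transformations over a flat record. This is a hedged illustration: the field names are made up, and the keyed HMAC-SHA256 hash is only a stand-in for pseudonym generation, since the national pseudonymisation system is not specified in this guide. Anonymisation is left out because it cannot be reduced to a field-level transform.

```python
import hashlib
import hmac

REDACTED = "data redacted"


def minimise(record: dict, needed: set[str]) -> dict:
    """Data minimisation: keep only the fields the caller needs."""
    return {k: v for k, v in record.items() if k in needed}


def redact(record: dict, fields: set[str]) -> dict:
    """Data redaction: keep the key so the requester sees data exists, hide the value."""
    return {k: (REDACTED if k in fields else v) for k, v in record.items()}


def pseudonymise(record: dict, identifying: set[str], id_field: str, key: bytes) -> dict:
    """Drop identifying fields and add a pseudonym derived from the patient ID.

    HMAC-SHA256 is an assumption made for this sketch, not the national method.
    """
    out = {k: v for k, v in record.items() if k not in identifying and k != id_field}
    out["pseudonym"] = hmac.new(key, str(record[id_field]).encode(), hashlib.sha256).hexdigest()
    return out


# Hypothetical flat record; real UGR data would be FHIR resources.
record = {"nhs_number": "9449307539", "name": "Pheobe Smitham", "report_status": "final"}
print(minimise(record, {"report_status"}))
print(redact(record, {"name"}))
print(pseudonymise(record, {"name"}, "nhs_number", key=b"demo-key-not-for-real-use"))
```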
Scope
- Order Management: The UGR will surface all test order requests and reports, and associated metadata provided by the order management services.
- Test Directory Services: The UGR may reference this service for data validation when storing test order requests and reports.
- Specimen Data: The UGR will record sample metadata and reference any sample storage resources used as part of genomic diagnostics.
- Sequence Data: The UGR will record and provide access to all data created from the DNA sequencing process. This applies to all genomic diagnostic modalities, including whole genome sequencing (WGS).
- Family Linking: Linking UGR records is a key requirement and is within scope to support inherited disease diagnostics.
- Purpose Based Access Control (PBAC): The UGR implements purpose-based access control at the data layer. Storing the access control policies and enforcement via data sharing processes are in scope.
- Secondary uses: Storage and sharing of UGR data for population health management, management information and research.
- National Genomic Research Library: Providing the structured data and transport into the NGRL from the UGR is in scope. Specific implementations must honour appropriate information governance.
The data output formats in scope are:
- OMOP
- FHIR
- PLCM
- GA4GH
The data transports in scope for sharing data are:
- APIM
- MNS
- Genomic Order Management Service API
- GA4GH DRS
- GA4GH HTSGET
Architecture
Each patient will have a self-contained Unified Genomic Record. A simplified logical model can be expressed as a folder tree. Each folder holds relevant data and metadata.
The data structure, storage type, and location within each folder may vary depending on the data requirements. The table below is an example of possible content of the UGR.
| Folder | Data Structure | Storage Type | Location |
|---|---|---|---|
| Demographics | JSON (FHIR) | FHIR Repository | NHS England |
| Test Data | JSON (FHIR) | FHIR Repository | NHS England |
| Test Reports | JSON (FHIR) | FHIR Repository | NHS England |
| Genomic Data | BAM, CRAM, VCF | Local/Cloud File Store | GLH or GEL |
| Family History | JSON (FHIR) | FHIR Repository | NHS England |
| Purpose Based Access Controls | JSON (CEDAR/FHIR) | FHIR Repository | NHS England |
| Audit | Logfile/FHIR | Cloud File Store/PARS | NHS England |
Genomic data is highly reusable, and it is possible to perform new genomic tests upon existing genomic data, negating the need to repeat a specimen collection and wet laboratory process. For this reason, test data can reference existing genomic data. The genomic data can be hosted in multiple places, and a FHIR DocumentReference resource can refer to a GA4GH Data Repository Service (DRS) location.
Data Storage Components
The UGR contains three primary classes of data, each necessitating a distinct approach to data management and storage:
- Structured data that captures the details of genomic test orders, sample processing, bioinformatics analyses, test reports, etc. This data is complex and highly structured, based on the HL7 FHIR standard, but low in volume compared to the primary genomic data.
- Unstructured data held in other general file formats such as CSV and PDF to support legacy systems incapable of consuming structured data.
- The large-scale data generated by DNA genotyping and sequencing technologies, such as the primary sequencing reads (in SAM, BAM or CRAM formats) and derived data such as variant calls (in VCF) and other data produced by bioinformatics analyses.
To accommodate this diversity and maximise functionality, multiple data repositories are utilised rather than a single physical repository. The diagram below shows the separate components required to support the UGR.
Central FHIR Store
It is expected that all FHIR resources will be stored in a national FHIR store, available through RESTful API access via the national API platform.
Federated Object Store
Not all genomics-related data is suitable for storage within a FHIR repository. To accommodate different data classes, multiple Federated Object Stores can be utilised, leveraging cloud object stores. To ensure consistency, the GA4GH Data Repository Service (DRS) protocol will be adopted as the standard retrieval mechanism for all data accessible via the UGR. The use of DRS enables the UGR to support both centralised and federated storage models, or a combination thereof. DRS URIs stored within FHIR DocumentReference resources for each patient will serve as the logical identifiers for all files hosted within the Federated Object Stores.
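As a rough sketch of how a DRS URI held in a DocumentReference could be resolved, the example below applies the GA4GH DRS rule for hostname-based URIs, where `drs://<hostname>/<id>` maps to `https://<hostname>/ga4gh/drs/v1/objects/<id>`. The hostname `drs.example.org`, the object ID, and the minimal DocumentReference shape are assumptions for illustration; compact-identifier DRS URIs need a separate resolver and are rejected here, and the UGR profile may require further elements.

```python
from urllib.parse import quote, urlsplit

NHS_NUMBER_SYSTEM = "https://fhir.nhs.uk/Id/nhs-number"


def drs_to_https(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the DRS object endpoint URL."""
    parts = urlsplit(drs_uri)
    object_id = parts.path.lstrip("/")
    # Compact identifiers (drs://prefix:accession) have no path and are not handled.
    if parts.scheme != "drs" or not parts.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def document_reference(nhs_number: str, drs_uri: str, content_type: str) -> dict:
    """Build a minimal DocumentReference pointing at a file in an object store."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "subject": {"identifier": {"system": NHS_NUMBER_SYSTEM, "value": nhs_number}},
        "content": [{"attachment": {"contentType": content_type, "url": drs_uri}}],
    }


# Hypothetical DRS location for a patient's genomic data file.
doc = document_reference("9449307539", "drs://drs.example.org/abc123", "application/octet-stream")
print(drs_to_https(doc["content"][0]["attachment"]["url"]))
# https://drs.example.org/ga4gh/drs/v1/objects/abc123
```

Keeping the `drs://` form in the FHIR resource, and resolving it only at retrieval time, is what lets the same record point at either a central or a federated store.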
Bulk Query Interface
For population health management and research use cases, the UGR requires the ability to perform queries across larger sets of data. The APIs supporting these queries are typically read-only and may operate asynchronously to facilitate population-level analysis. The Bulk Query Interface component is designed to support these use cases, with the data made available potentially being transformed and reformatted to optimise bulk querying of an RDBMS source.
FHIR API
Composition
The main resource type supporting implementation of the UGR will be the UKCore-Composition resource. This resource is used to align with existing Summary Care Record implementations and mirrors the EU Patient Summary guidance, whereby sections are defined for the data categories, which contain references to the data, e.g. Lab reports, Demographics etc.
It is expected the UGR could be represented as a section under a more general patient summary.
The sections included within the UGR are coded using https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings, as follows:
| Section Title | Code | Entry Resource Type |
|---|---|---|
| Patient demographics | patient-demographics | Patient (may be NHS Identifier if registered on PDS) |
| Investigations and procedures requested | investigations-and-procedures-requested | ServiceRequest |
| Investigation results | investigation-results | DiagnosticReport (this resource will link off to the various Genomic Data Files and Observations, Note: a separate section for genomic data and observations irrespective of the report/request which generated this is currently being investigated, e.g. for on demand CDS) |
| Consent for information sharing | consent-for-information-sharing | Consent |
| Family history | family-history | RelatedPerson/FamilyMemberHistory |
An example of a UGR record can be found at Composition-UGR-Example
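The section table above can be turned into a small helper that builds one Composition section per heading. This is a sketch under assumptions: `make_section` and `SECTION_TITLES` are names introduced here, the narrative is plain text that gets HTML-escaped, and the entry shown uses an NHS number identifier reference as one possible entry form.

```python
from html import escape

HEADINGS = "https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings"

# Codes and titles copied from the section table.
SECTION_TITLES = {
    "patient-demographics": "Patient demographics",
    "investigations-and-procedures-requested": "Investigations and procedures requested",
    "investigation-results": "Investigation results",
    "consent-for-information-sharing": "Consent for information sharing",
    "family-history": "Family history",
}


def make_section(code: str, narrative: str, entries: list[dict]) -> dict:
    """Build a Composition.section for one UGR heading; unknown codes raise KeyError."""
    title = SECTION_TITLES[code]
    div = f'<div xmlns="http://www.w3.org/1999/xhtml">{escape(narrative)}</div>'
    return {
        "title": title,
        "code": {"coding": [{"system": HEADINGS, "code": code, "display": title}]},
        "text": {"status": "generated", "div": div},
        "entry": entries,
    }


section = make_section(
    "patient-demographics",
    "Pheobe Smitham, Female, DOB: 2013-09-27",
    [{"identifier": {"system": "https://fhir.nhs.uk/Id/nhs-number", "value": "9449307539"}}],
)
print(section["code"]["coding"][0]["code"])  # patient-demographics
```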
Type and Category
To appropriately categorise the UGR alongside other Compositions and Documents, the type and category SHALL be fixed to the below.
    "type": {
      "coding": [
        {
          "system": "http://snomed.info/sct",
          "code": "824321000000109",
          "display": "Summary record"
        }
      ]
    },
    "category": [
      {
        "coding": [
          {
            "system": "http://snomed.info/sct",
            "code": "321401000000106",
            "display": "Genomics"
          }
        ]
      }
    ],
Author and Custodian
As the UGR is created and managed by NHS England, the author and custodian elements will be fixed to the X26 ODS code.
    "author": [
      {
        "identifier": {
          "system": "https://fhir.nhs.uk/Id/ods-organization-code",
          "value": "X26"
        }
      }
    ],
    "custodian": {
      "identifier": {
        "system": "https://fhir.nhs.uk/Id/ods-organization-code",
        "value": "X26"
      }
    },
Section
To better conform to the EU Patient Summary (EPS) Implementation Guide, section.text has been added to provide a human readable/HTML representation of the UGR sections. An example is provided below.
    "section": [
      {
        "title": "Patient demographics",
        "code": {
          "coding": [
            {
              "system": "https://fhir.hl7.org.uk/CodeSystem/UKCore-RecordStandardHeadings",
              "code": "patient-demographics",
              "display": "Patient demographics"
            }
          ]
        },
        "text": {
          "status": "generated",
          "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">Pheobe Smitham, Female, DOB: 2013-09-27</div>"
        },
        "entry": [
          {
            "identifier": {
              "system": "https://fhir.nhs.uk/Id/nhs-number",
              "value": "9449307539"
            }
          }
        ]
      },
Other Fixed Elements
- status SHALL have a fixed value of final
- title SHALL have a fixed value of Unified Genomic Record Summary
- confidentiality SHALL have a fixed value of R due to the sensitivity of the data within the UGR
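The fixed elements described in this section (type, category, author, custodian, status, title and confidentiality) can be collected into one table and checked mechanically. The sketch below is an assumption-laden illustration: `FIXED`, `ugr_skeleton` and `fixed_element_problems` are hypothetical names, the check uses exact equality, which is stricter than a FHIR validator would necessarily be, and required Composition elements not covered by this section (for example date and subject) are deliberately omitted.

```python
import copy

SNOMED = "http://snomed.info/sct"
ODS = "https://fhir.nhs.uk/Id/ods-organization-code"

# Fixed values taken from this section of the guide.
FIXED = {
    "status": "final",
    "title": "Unified Genomic Record Summary",
    "confidentiality": "R",
    "type": {"coding": [{"system": SNOMED, "code": "824321000000109", "display": "Summary record"}]},
    "category": [{"coding": [{"system": SNOMED, "code": "321401000000106", "display": "Genomics"}]}],
    "author": [{"identifier": {"system": ODS, "value": "X26"}}],
    "custodian": {"identifier": {"system": ODS, "value": "X26"}},
}


def ugr_skeleton() -> dict:
    """Return a partial Composition carrying only the fixed UGR elements."""
    return {"resourceType": "Composition", **copy.deepcopy(FIXED)}


def fixed_element_problems(composition: dict) -> list[str]:
    """List the fixed elements that are missing or differ from the expected value."""
    return [name for name, expected in FIXED.items() if composition.get(name) != expected]


comp = ugr_skeleton()
print(fixed_element_problems(comp))  # []
comp["confidentiality"] = "N"
print(fixed_element_problems(comp))  # ['confidentiality']
```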
RelatedPerson
To link UGRs to UGR entries for family members, UKCore-RelatedPerson resources will be used, following profiling used for the Genomic Order Management Service.
TODO: Elaborate on difference between FMH/RelatedPerson and PersonalRelationship resources as well as link to GA4GH Pedigree standard/kinship ontology
Consent and Permission
Consent will be used in the UGR to support sharing of genomic information with clinicians to support testing/interpretation of family members.
The expected sequence for adding a familial relationship asserted by a patient, and subsequent verification/consent from the family member is as below:
TODO: Add sequence diagram for requesting to add relationship and consent approval/rejection
Consent will also be used to capture patient consent for their UGR, and elements within their UGR to be used for particular purposes, e.g. research, population health management etc. Permission resources will be used to