Abstract
The proliferation of digital data, particularly sensitive personal information such as health records, has created unprecedented opportunities for research, policy formulation, and service improvement. It has also intensified the need for robust data governance frameworks that safeguard individual privacy and maintain public trust. The Five Safes Framework, originally developed at the UK Office for National Statistics (ONS) and since adopted by data services worldwide, has emerged as a widely recognized set of principles for facilitating secure and ethical access to sensitive data for research and statistical purposes. This report examines the framework in detail, describing each of its five components (Safe People, Safe Projects, Safe Settings, Safe Data, and Safe Outputs), their interdependencies, and their collective contribution to a resilient data governance ecosystem. It then considers the framework’s practical application in real-world healthcare contexts and its adaptability to emerging technologies and evolving regulatory environments. A comparative analysis against other prominent data governance models is also presented, highlighting the distinctive strengths and holistic nature of the Five Safes in balancing data utility with privacy protection.
Many thanks to our sponsor Esdebe who helped us prepare this research report.
1. Introduction
The 21st century has witnessed an explosion in data generation, transforming nearly every sector, from commerce to public health. In the domain of healthcare, the digitization of patient records, the advent of genomic sequencing, and the proliferation of wearable health technologies have created vast repositories of incredibly rich and sensitive information. This wealth of data holds immense potential to unlock groundbreaking medical discoveries, optimize treatment protocols, predict disease outbreaks, and inform evidence-based public health policies. However, the inherent sensitivity of health data—encompassing diagnoses, treatments, genetic information, and lifestyle details—renders its protection a paramount ethical and legal obligation. Unauthorized access, misuse, or inadvertent disclosure of such data can precipitate severe consequences, including significant privacy breaches, erosion of individual autonomy, legal ramifications such as hefty fines and litigation, and, critically, a profound degradation of public trust in healthcare institutions and research endeavors.
Recognizing this delicate balance between maximizing the utility of data for societal benefit and rigorously protecting individual privacy, the Five Safes Framework was developed as a pragmatic and comprehensive risk management approach. Unlike frameworks that focus solely on technical security measures or legal compliance, the Five Safes offers a multi-dimensional lens, addressing the human element, the purpose of data use, the environment of access, the intrinsic nature of the data itself, and the outputs generated. This report aims to provide a deeply analytical and comprehensive understanding of the framework, tracing its conceptual underpinnings, detailing its operational components, illustrating its diverse applications—particularly within the sensitive domain of healthcare—and critically assessing its enduring relevance in an increasingly data-driven and technologically advanced world.
2. The Five Safes Framework: An Overview and Historical Context
The Five Safes Framework originated at the UK Office for National Statistics (ONS) in the early 2000s, where it was developed (notably by Felix Ritchie and colleagues) to structure decisions about researcher access to sensitive data; it has since been adopted and refined by the UK Data Service, the UK Anonymisation Network, and data services internationally. Its development was driven by a pressing need to facilitate access to sensitive administrative and survey data for legitimate research and statistical purposes while rigorously protecting the privacy of individuals. Traditional approaches often relied heavily on anonymization techniques, which, while crucial, proved insufficient on their own to address the complex ecosystem of data access and use. The framework’s creators recognized that data security is not solely a technical problem but a multifaceted challenge involving people, purpose, environment, and the nature of the information itself.
The framework posits five interrelated principles that, when collectively and robustly implemented, create a secure and ethical environment for sensitive data utilization. These principles are not hierarchical but form a synergistic system, where weaknesses in one ‘safe’ can compromise the integrity of the entire framework. Its strength lies in its holistic, risk-based approach, moving beyond mere compliance to foster a culture of responsible data stewardship.
- Safe People: This pillar focuses on the individuals granted access to the sensitive data, emphasizing their trustworthiness, competency, and adherence to ethical and legal obligations. It recognizes that even the most secure technical systems can be undermined by human error or malicious intent.
- Safe Projects: This principle ensures that the intended use of the data is legitimate, ethical, and aligned with public benefit, validating that the research questions or statistical analyses are appropriate and justified, particularly when involving sensitive personal information.
- Safe Settings: This component addresses the physical and technical environments where data is accessed and processed, demanding secure infrastructures that prevent unauthorized access, breaches, and data exfiltration.
- Safe Data: This dimension scrutinizes the data itself, assessing and mitigating the inherent risk of re-identification or disclosure. It mandates that the data made available is appropriate for the approved project and has undergone necessary transformations to minimize privacy risks.
- Safe Outputs: This final safe ensures that any results, findings, or analyses derived from the sensitive data do not inadvertently disclose confidential information about individuals or small groups, safeguarding against inferential disclosure in disseminated knowledge.
Each of these components represents a critical control point in the data lifecycle, from initial request to final dissemination of results, collectively ensuring that the pursuit of knowledge does not come at the expense of individual privacy and trust.
3. Safe People: Ensuring Trustworthy and Competent Data Stewards
The ‘Safe People’ principle is foundational, acknowledging that the human element is often the weakest link in any security chain. It asserts that secure and ethical data handling hinges upon the integrity, knowledge, and accountability of individuals who interact with sensitive data. This principle extends beyond mere authorization, encompassing comprehensive measures to cultivate a highly trustworthy and competent user base.
3.1. Rigorous Training and Accreditation
Organizations must implement multi-layered training programs that are mandatory for all individuals seeking or having access to sensitive data. These programs should cover:
- Data Ethics and Privacy Principles: Deep dives into ethical considerations surrounding data use, the moral imperative of privacy protection, and the potential societal impact of data misuse. This includes discussions on concepts like data discrimination and algorithmic bias.
- Legal and Regulatory Frameworks: Thorough instruction on relevant data protection laws and regulations, such as the General Data Protection Regulation (GDPR) in Europe, the Health Insurance Portability and Accountability Act (HIPAA) in the United States, national data governance acts, and specific research governance guidelines. This ensures researchers understand their legal obligations and the consequences of non-compliance.
- Data Handling Protocols and Security Best Practices: Practical training on secure data storage, transmission, processing, and destruction. This covers password hygiene, phishing awareness, secure communication channels, proper use of secure environments, and incident reporting procedures.
- Statistical Disclosure Control (SDC): For researchers who will analyze data and produce outputs, specialized training in SDC techniques is crucial. This equips them with the knowledge to identify and mitigate disclosure risks in their analyses and outputs, preventing inadvertent identification.
- Domain-Specific Context: Training tailored to the specific nature of the data (e.g., health data sensitivity, specific terminology, ethical considerations unique to patient information).
Beyond initial training, continuous professional development and refresher courses are essential to keep personnel abreast of evolving threats, technologies, and regulatory changes. Formal accreditation or certification processes, often involving examinations or practical assessments, serve to validate a researcher’s understanding and commitment to these principles. Many data providers require researchers to sign legally binding data access agreements and oaths of confidentiality, reinforcing personal accountability.
3.2. Comprehensive Background Checks and Vetting
Before granting access, thorough vetting procedures are indispensable. These typically include:
- Identity Verification: Robust checks to confirm the identity of the applicant.
- Institutional Affiliation and Credentials: Verification of employment with a recognized research institution or organization, academic qualifications, and professional standing. This helps to establish a legitimate research purpose and institutional accountability.
- Criminal Record Checks: Depending on the sensitivity of the data and national regulations, criminal background checks may be conducted to assess trustworthiness and mitigate risks associated with past offenses, particularly those related to fraud or data misuse.
- Professional Conduct and Disciplinary History: Review of any prior history of professional misconduct or breaches of ethical guidelines, which could indicate a propensity for irresponsible data handling.
These checks aim to establish a baseline of trust, ensuring that individuals entrusted with sensitive data possess the integrity and reliability requisite for such responsibility.
3.3. Granular Access Control and Principle of Least Privilege
Even after vetting and training, access to sensitive data must be strictly controlled and proportionate. Key measures include:
- Role-Based Access Control (RBAC): Assigning data access privileges based on an individual’s specific role within a project. This ensures that a researcher only has access to the data elements and functionalities strictly necessary for their approved research objectives.
- Multi-Factor Authentication (MFA): Implementing MFA mechanisms (e.g., something you know, something you have, something you are) to verify the identity of individuals attempting to access data environments, adding an extra layer of security beyond traditional passwords.
- Principle of Least Privilege (PoLP): This fundamental security tenet dictates that users should be granted the minimum necessary permissions to perform their authorized tasks and no more. Access privileges should be regularly reviewed and revoked immediately upon project completion or changes in roles.
- Secure Credential Management: Implementing policies for strong passwords, regular password rotation, and secure storage of access credentials.
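As an illustration of role-based access control combined with least privilege, the sketch below grants each role only the dataset columns it needs and refuses any request that exceeds its grant. All role, dataset, and column names are invented for this example:

```python
# Minimal role-based access control sketch. Role names, datasets, and
# column grants are illustrative, not drawn from any real system.

ROLE_GRANTS = {
    # Each role is granted only the columns needed for its task (least privilege).
    "analyst": {"admissions": {"age_band", "diagnosis_code", "region"}},
    "auditor": {"admissions": {"access_log_id", "timestamp"}},
}

def authorise(role: str, dataset: str, columns: set) -> bool:
    """Allow access only if every requested column is granted to the role."""
    granted = ROLE_GRANTS.get(role, {}).get(dataset, set())
    return columns <= granted

# A request confined to the role's grant succeeds; anything beyond it is refused.
assert authorise("analyst", "admissions", {"age_band", "region"})
assert not authorise("analyst", "admissions", {"age_band", "nhs_number"})
```

In a real deployment such checks would sit behind MFA and an identity provider; the point of the sketch is simply that authorisation is evaluated per column rather than per dataset, which is what makes least privilege enforceable.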
By meticulously implementing these ‘Safe People’ measures, organizations significantly reduce the risk of unauthorized data access, misuse, or inadvertent disclosure stemming from human factors, thereby fostering a culture of accountability and responsibility among data users.
4. Safe Projects: Ensuring Ethical, Lawful, and Beneficial Data Use
The ‘Safe Projects’ principle establishes the critical necessity that any use of sensitive data must be legitimate, ethically sound, legally compliant, and demonstrably for the public good. It acts as a gatekeeper, ensuring that the inherent privacy risks associated with data access are justified by the potential benefits derived from the research or analysis. This component is pivotal in maintaining public trust, as citizens are more likely to consent to data sharing if they are confident their information is used for beneficial and ethical purposes.
4.1. Rigorous Ethical Review Processes
Every project involving sensitive data must undergo a stringent ethical review. This typically involves:
- Institutional Review Boards (IRBs) / Research Ethics Committees (RECs): Independent bodies composed of experts from diverse fields (e.g., medical, legal, ethical, lay representatives) who scrutinize research proposals. They assess the project’s scientific validity, methodological soundness, potential risks to data subjects, and the adequacy of privacy safeguards.
- Proportionality Assessment: A key aspect of ethical review is determining whether the proposed use of sensitive data is proportionate to the research question’s importance. This involves considering if the research objectives could be achieved with less sensitive data or through alternative methodologies.
- Public Involvement and Engagement (PIE): Increasingly, ethical review processes incorporate the perspectives of patients and the public. Involving lay representatives in ethics committees or conducting public consultations on research proposals helps ensure that the research aligns with societal values and addresses concerns from those whose data is being used.
- Dynamic Ethical Oversight: Ethical approval is not a one-time event. Projects may require periodic review, especially if significant changes occur in methodology, data scope, or if new ethical considerations arise during the research lifecycle.
4.2. Public Benefit Assessment and Justification
Beyond mere ethical compliance, projects must demonstrate a clear and compelling public benefit to justify the use of sensitive data. This involves:
- Societal Impact: Articulating how the research findings will contribute to improved public health outcomes, better policy decisions, scientific advancement, or economic benefits. This requires a robust research question and a clear hypothesis.
- Scientific Merit: The project must be scientifically sound, employing appropriate methodologies and analytical techniques to ensure that reliable and valid conclusions can be drawn. Research lacking scientific merit cannot justify the use of sensitive data.
- Necessity and Sufficiency: Researchers must explicitly demonstrate that the requested data is necessary to achieve the stated public benefit and that no less sensitive or aggregated data would suffice. This aligns with the data minimization principle (discussed under Safe Data).
- Transparency: Clear communication about the project’s aims, anticipated benefits, and data protection measures to stakeholders and the public enhances trust and demonstrates accountability.
4.3. Compliance with Legal and Regulatory Frameworks
‘Safe Projects’ mandates strict adherence to the multifaceted legal and regulatory landscape governing data use. This includes:
- Data Protection Legislation: Compliance with comprehensive data protection laws such as GDPR, HIPAA, UK Data Protection Act, and other national privacy regulations. This covers requirements for lawful basis of processing, data subject rights, security safeguards, and data breach notification.
- Research Governance Frameworks: Adherence to specific national or institutional research governance guidelines that outline the responsibilities of researchers and organizations when conducting research involving human participants or their data.
- Sector-Specific Regulations: In healthcare, this includes regulations pertaining to clinical trials, pharmaceutical research, and the ethical conduct of medical studies.
- Data Sharing Agreements (DSAs): Formal, legally binding agreements between data providers and data users, meticulously detailing the terms and conditions of data access, permissible uses, security requirements, data retention, and responsibilities of all parties. These agreements often specify the duration of data access and the conditions for renewal or termination.
By robustly implementing ‘Safe Projects’, organizations ensure that sensitive data is not merely secured but is also purposefully and ethically harnessed for legitimate societal advancement, reinforcing the foundational principle of responsible data stewardship.
5. Safe Settings: Securing the Data Access Environment
The ‘Safe Settings’ principle mandates that the physical and technical environments where sensitive data is accessed, processed, and stored be strongly resistant to unauthorized access and a broad range of threats. It focuses on creating a secure ‘data safe haven’ or ‘Trusted Research Environment (TRE)’: a controlled, monitored, and isolated infrastructure designed specifically for sensitive data analysis. This approach significantly reduces the risk of data exfiltration or inadvertent exposure, moving away from data copies being held on individual researchers’ machines.
5.1. Advanced Physical Security Measures
Physical security is the first line of defense, preventing unauthorized individuals from gaining direct access to hardware or data storage facilities. Key measures include:
- Secure Data Centres: Housing servers and storage devices in purpose-built, highly secure data centers with limited entry points, reinforced walls, and robust environmental controls (e.g., climate control, fire suppression).
- Access Controls: Multi-layered physical access controls, such as biometric scanners (fingerprint, iris recognition), key card systems, mantrap entries, and strict visitor logging and escort policies. Entry should be restricted to authorized personnel only.
- Surveillance Systems: Continuous CCTV monitoring of all access points, server rooms, and critical infrastructure, with recordings securely stored for audit purposes.
- Segregation of Duties: Ensuring that different personnel are responsible for different aspects of physical and logical security to prevent single points of failure or malicious collusion.
- Clean Desk Policy: Implementing and enforcing policies that prevent sensitive information from being left visible or unattended on desks or workstations.
5.2. Robust Technical Security Controls
Technical security forms the backbone of a safe setting, protecting data in transit and at rest within the digital environment. This involves a comprehensive suite of technologies and protocols:
- Trusted Research Environments (TREs) / Data Safe Havens: These are isolated, secure computing environments (often virtualized) where data can be accessed and analyzed. Data never leaves the TRE, and researchers can only access it via secure, audited remote connections. TREs typically prevent data download, screen capturing, and external internet access.
- Data Encryption: Implementing strong encryption for data both at rest (e.g., full disk encryption, database encryption) and in transit (e.g., TLS/SSL for network communications). This renders data unintelligible to unauthorized parties even if it is intercepted or stolen.
- Firewalls and Intrusion Detection/Prevention Systems (IDS/IPS): Sophisticated network security devices that monitor and control incoming and outgoing network traffic, blocking suspicious activity and preventing unauthorized access to the TRE.
- Segregated Networks and Virtual Private Networks (VPNs): Using logically separated networks for sensitive data and requiring secure VPN connections for remote access, creating encrypted tunnels for data transmission.
- Endpoint Security: Implementing anti-malware, anti-virus software, and host-based intrusion detection on all workstations and servers within the secure environment.
- Vulnerability Management: Regularly conducting vulnerability assessments, penetration testing, and security audits to identify and remediate weaknesses in the system before they can be exploited.
- Secure Software Development Lifecycle (SSDLC): For custom-built TRE platforms, ensuring security considerations are integrated into every stage of software development.
- Data Loss Prevention (DLP) Systems: Deploying DLP tools that monitor, detect, and block sensitive data from being moved, copied, or transmitted in violation of organizational security policies.
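As a toy illustration of the DLP idea in the last bullet, the sketch below refuses to release text that appears to contain a 10-digit, NHS-number-style identifier. Real DLP systems use far richer rule sets (checksums, dictionaries, machine learning), so the pattern here is purely illustrative:

```python
import re

# Toy data-loss-prevention check: flag text that appears to contain a
# 10-digit identifier (e.g. an NHS-number-like string) before it leaves
# the secure environment. The pattern is illustrative, not production-grade.
IDENTIFIER_PATTERN = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b")

def outbound_allowed(text: str) -> bool:
    """Permit release only if no identifier-like string is present."""
    return IDENTIFIER_PATTERN.search(text) is None

assert outbound_allowed("Mean length of stay was 4.2 days (n=1,830).")
assert not outbound_allowed("Patient 943 476 5919 was readmitted twice.")
```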
5.3. Continuous Monitoring, Auditing, and Incident Response
Establishing a secure setting is an ongoing process that requires constant vigilance:
- Audit Trails and Logging: Comprehensive logging of all user activities within the TRE, including data access, queries executed, files opened, and output requests. These logs are immutable and regularly reviewed for suspicious patterns.
- Anomaly Detection: Employing automated systems to detect unusual user behavior or system activity that might indicate a security breach or policy violation.
- Incident Response Plan: Developing and regularly testing a robust incident response plan to effectively identify, contain, eradicate, recover from, and learn from security incidents. This includes clear communication protocols for data breaches.
- Regular Security Reviews and Compliance Audits: Independent third-party audits and internal reviews to assess compliance with security policies, regulations, and the Five Safes framework itself. These reviews help identify areas for improvement and ensure the effectiveness of controls.
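One way to make the audit trail described above tamper-evident is to hash-chain log entries so that each record's hash covers both its event and its predecessor's hash; any retroactive edit then breaks verification. A minimal sketch, with invented field names and events:

```python
import hashlib
import json

# Tamper-evident audit trail sketch: each entry's hash covers the event
# and the previous entry's hash, so retroactive edits break the chain.
# Field names and events are illustrative.

def _entry_hash(event: dict, prev_hash: str) -> str:
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_entry(log: list, event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else "0" * 64
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(event, prev_hash)})

def verify_chain(log: list) -> bool:
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _entry_hash(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"user": "analyst01", "action": "open", "file": "cohort.csv"})
append_entry(log, {"user": "analyst01", "action": "query", "table": "admissions"})
assert verify_chain(log)

log[0]["event"]["action"] = "delete"   # retroactive tampering...
assert not verify_chain(log)           # ...is detected on verification
```

Production systems would also ship logs to write-once storage or an external service; the chaining above only makes tampering detectable, not impossible.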
The integration of ‘Safe Settings’ principles ensures that sensitive data resides within a highly fortified and continuously monitored environment, dramatically reducing the risk of unauthorized access or disclosure, and thereby underpinning the integrity of the entire data governance system.
6. Safe Data: Mitigating Intrinsic Disclosure Risks
The ‘Safe Data’ principle focuses on the intrinsic characteristics of the data itself, aiming to minimize the risk of re-identification or disclosure while retaining sufficient utility for research purposes. This is often the most technically complex aspect of the framework, requiring a nuanced understanding of privacy-preserving techniques and a careful balancing act between privacy protection and data utility.
6.1. Data Anonymization and Pseudonymization
These techniques are central to reducing disclosure risk:
- Pseudonymization: Replacing direct identifiers (e.g., name, NHS number) with artificial identifiers (pseudonyms). The original identifiers are kept separately and securely, making it possible, under strictly controlled conditions and with proper authorization, to link data back to individuals. Pseudonymized data remains personal data under regulations like GDPR due to the possibility of re-identification.
- Anonymization: Processing personal data so that it can no longer be attributed to a specific individual. True anonymization aims for irreversible de-identification, making it practically impossible to identify data subjects. Techniques include:
- Generalization: Broadening categories of data (e.g., replacing exact age with age ranges like ‘30-39’, or specific postcodes with broader geographic regions).
- Suppression: Removing highly identifying data points (e.g., rare diseases, specific dates) entirely from the dataset.
- Perturbation: Adding noise or making small alterations to the data to obscure individual values while preserving statistical properties. This includes techniques like microaggregation or data swapping.
- K-anonymity: Ensuring that for any combination of quasi-identifiers (attributes that can be combined to uniquely identify individuals, e.g., age, sex, postcode), there are at least ‘k’ individuals sharing those same attributes, making it difficult to pinpoint a single person.
- L-diversity: An extension of k-anonymity, which ensures that for each group of k individuals, there are at least ‘l’ distinct sensitive values (e.g., different diagnoses) to prevent attribute disclosure.
- T-closeness: Further refines l-diversity by ensuring that the distribution of a sensitive attribute within each group is close to the distribution of that attribute in the overall dataset, preventing inference based on skewed distributions.
- Differential Privacy: A more recent and robust mathematical approach that adds carefully calibrated noise to data or query results to obscure individual contributions, providing a strong privacy guarantee regardless of attacker knowledge, often at the cost of some data utility.
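The k-anonymity property described above can be checked directly by counting equivalence classes over the chosen quasi-identifiers. A minimal sketch, using invented records and field names:

```python
from collections import Counter

# Check k-anonymity over a set of quasi-identifiers.
# Records and field names are invented for illustration.

def k_anonymity(records: list, quasi_identifiers: list) -> int:
    """Return the smallest equivalence-class size: the dataset is
    k-anonymous for any k up to this value."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

records = [
    {"age_band": "30-39", "sex": "F", "region": "North", "diagnosis": "J45"},
    {"age_band": "30-39", "sex": "F", "region": "North", "diagnosis": "E11"},
    {"age_band": "30-39", "sex": "F", "region": "North", "diagnosis": "I10"},
    {"age_band": "40-49", "sex": "M", "region": "South", "diagnosis": "J45"},
]

# The (40-49, M, South) class contains a single record, so k = 1:
# that individual is unique on these quasi-identifiers.
assert k_anonymity(records, ["age_band", "sex", "region"]) == 1
```

Generalising a quasi-identifier (for instance, collapsing region into a single category) enlarges the equivalence classes and raises k, at a corresponding cost in analytical utility.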
The choice of technique depends on the nature of the data, the specific research question, and the acceptable level of risk. It’s crucial to understand that complete, irreversible anonymization, particularly for rich datasets, is often challenging, if not impossible, to achieve while retaining sufficient data utility. Therefore, anonymization is often viewed as a spectrum rather than a binary state, and its effectiveness must be assessed in conjunction with the other ‘safes’.
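For differential privacy, the canonical Laplace mechanism adds noise drawn from a Laplace distribution with scale sensitivity/ε to the true answer. The sketch below applies it to a counting query (sensitivity 1, since one person changes a count by at most 1); the ε value and the count are illustrative, and the u = -0.5 edge case of the inverse-CDF sampler is ignored for brevity:

```python
import math
import random

# Laplace mechanism sketch for a differentially private count.
# Epsilon and the query are illustrative; a counting query has sensitivity 1.

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    scale = 1.0 / epsilon                  # sensitivity / epsilon
    u = rng.random() - 0.5                 # u in [-0.5, 0.5); u = -0.5 edge ignored
    # Inverse-CDF sample from Laplace(0, scale):
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

rng = random.Random(0)                     # fixed seed so the sketch is reproducible
noisy = dp_count(1830, epsilon=0.5, rng=rng)
# The released value is close to, but not exactly, the true count.
assert noisy != 1830 and abs(noisy - 1830) < 20
```

Smaller ε means a larger noise scale and stronger privacy but lower utility, which is the trade-off the surrounding text describes.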
6.2. Data Minimization and Purpose Limitation
- Collect Only What’s Necessary: The principle dictates that organizations should collect and retain only the minimum amount of personal data strictly required to achieve the specified purpose. This reduces the ‘attack surface’ and limits potential harm in case of a breach.
- Granular Data Access: Researchers should only be granted access to the specific variables and records necessary for their approved project. Access to entire raw datasets should be exceptional and heavily justified.
- Data Retention Policies: Implementing clear policies for the retention and secure destruction of data. Data should not be kept longer than necessary for its intended purpose, as defined in data sharing agreements.
6.3. Rigorous Risk Assessment Methodologies
Before any data is shared or made accessible, a comprehensive risk assessment must be performed. This involves:
- Re-identification Risk Assessment: Systematically evaluating the likelihood that an individual could be re-identified from the anonymized or pseudonymized dataset, often considering external data sources that could be linked.
- Utility-Risk Trade-off Analysis: A careful evaluation of how different privacy-preserving transformations (e.g., aggregation, suppression) impact the analytical utility of the data, ensuring that the data remains fit for purpose while meeting privacy requirements.
- Expert Review Panels: Engaging independent experts in statistics, privacy, and the specific data domain to review the data and proposed anonymization strategies, providing an objective assessment of disclosure risk.
- Data Curation and Documentation: Maintaining detailed metadata, data dictionaries, and provenance information for all datasets. This helps researchers understand the data’s limitations, ethical context, and transformation history, which is critical for both utility and risk management.
By diligently applying the ‘Safe Data’ principles, data custodians strive to create datasets that strike an optimal balance: sufficiently private to protect individuals, yet rich enough to yield valuable insights for public benefit, all while acknowledging the inherent complexities and continuous evolution of re-identification risks.
7. Safe Outputs: Ensuring Non-Disclosive Results and Publications
‘Safe Outputs’ is the final and often overlooked pillar of the framework, focusing on ensuring that the results, analyses, statistics, or any other information derived from sensitive data do not inadvertently disclose confidential information about individuals or small groups. This principle safeguards against inferential disclosure, where seemingly innocuous aggregate statistics can, when combined with external information, lead to re-identification. It reinforces public trust by guaranteeing that the findings shared with the wider world uphold the same high standards of privacy as the raw data itself.
7.1. Robust Output Checking Procedures
All outputs generated from sensitive data analysis must undergo a rigorous review process before they can be released or disseminated. This process typically involves:
- Trained Statistical Disclosure Control (SDC) Experts: Reviewers, often independent from the research team, with specialized knowledge in SDC techniques, examine all proposed outputs (tables, graphs, regression coefficients, model parameters) for potential disclosure risks.
- Minimum Cell Counts: A common SDC rule is to prohibit the release of any cell in a table that represents fewer than a predefined number of individuals (e.g., 3 or 5). Cells below this threshold are typically suppressed or aggregated.
- Thresholds for Small Numbers: Rules are similarly applied to prevent the disclosure of exact small counts. For example, if a table cell indicates ‘0’ or ‘1’ instances of a rare event, it might be suppressed or reported as ‘<3’ so that exact counts of sensitive attributes are not released.
- Upper and Lower Bounding: For continuous variables, output checking may involve ensuring that minimum and maximum values are not so extreme as to identify an individual, or that ranges are sufficiently broad.
- Resistance to Differencing: Outputs are checked to ensure that it is not possible to derive sensitive information by subtracting one published statistic from another (e.g., if a total and a sub-total are published, the difference should not disclose a small group).
- Microdata vs. Aggregate Outputs: Strict policies differentiate between what can be released as aggregate statistics versus what is considered microdata (even if perturbed), with the latter almost universally prohibited from public release.
- Review of Visualisations: Graphs, charts, and maps are also scrutinized, as visual representations can sometimes inadvertently reveal patterns or outliers that lead to disclosure, especially in small geographic areas or unique demographic groups.
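A minimum-cell-count rule of the kind described above is straightforward to automate. The sketch below suppresses any frequency-table cell below a threshold of 5; the threshold, categories, and counts are all illustrative:

```python
# Minimum-cell-count check on a frequency table before release.
# The threshold of 5 and the table contents are illustrative.

THRESHOLD = 5

def suppress_small_cells(table: dict, threshold: int = THRESHOLD) -> dict:
    """Replace any count below the threshold with a '<threshold' marker."""
    return {category: (str(n) if n >= threshold else f"<{threshold}")
            for category, n in table.items()}

released = suppress_small_cells(
    {"Asthma": 241, "Rare condition X": 2, "Diabetes": 187})
assert released == {"Asthma": "241", "Rare condition X": "<5", "Diabetes": "187"}
```

A full output checker would also apply complementary suppression so that the hidden value cannot be recovered from row or column totals, which a per-cell rule alone does not guarantee.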
7.2. Application of Statistical Disclosure Control (SDC) Methods
Researchers and output checkers employ various SDC methods to prevent disclosure in outputs:
- Cell Suppression: The most common method, where sensitive cells in a table are replaced with a symbol (e.g., ‘*’) to indicate suppressed information. Complementary suppression might be used to prevent derivation of suppressed values.
- Rounding: Rounding reported figures (e.g., to the nearest 5 or 10) to obscure exact counts or values, particularly for small numbers.
- Top-coding and Bottom-coding: Grouping extreme values in continuous variables (e.g., incomes above £100,000 reported as ‘£100,000+’) to protect outliers.
- Perturbation Techniques: Introducing small amounts of noise to data before aggregation (e.g., adding random error to values) or to output statistics directly to make it harder to infer exact individual values.
- Aggregation: Combining categories or groups to ensure that each reported category meets minimum size requirements.
- Restricted Access to Complex Models: In some cases, particularly with machine learning models, the model itself (e.g., its parameters or coefficients) might be considered an ‘output’. Access to these might be restricted, or their interpretability might be limited to prevent inferential disclosure.
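Rounding and top-coding, the two most mechanical of these methods, can be sketched as simple transformations; the base of 5 and the £100,000 cap mirror the illustrative figures above:

```python
# SDC transformation sketches: rounding counts to a base, and top-coding
# a continuous variable. The base and cap values are illustrative.

def round_to_base(n: int, base: int = 5) -> int:
    """Round a count to the nearest multiple of `base` to obscure exact values."""
    return base * round(n / base)

def top_code(value: float, cap: float = 100_000) -> str:
    """Report values above the cap as an open-ended band to protect outliers."""
    return f"£{cap:,.0f}+" if value > cap else f"£{value:,.0f}"

assert round_to_base(17) == 15
assert round_to_base(18) == 20
assert top_code(250_000) == "£100,000+"
assert top_code(42_500) == "£42,500"
```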
7.3. Feedback Mechanisms and Continuous Improvement
- Appeals Processes: Establishing clear processes for researchers to appeal output checking decisions, providing justification for why certain outputs are necessary and proposing alternative SDC methods if needed.
- Transparency and Guidance: Providing clear guidelines to researchers on acceptable output formats and SDC expectations from the outset of the project. This minimizes frustration and ensures researchers design their analyses with disclosure control in mind.
- Post-Publication Monitoring: In rare cases, if concerns arise about a published output, mechanisms should be in place to review and potentially retract or amend the information. This underscores the continuous nature of data protection.
- Learning and Adaptation: Feedback from output checking processes should inform improvements in training, guidelines, and SDC tools, fostering a cycle of continuous learning and adaptation within the data governance framework.
By diligently implementing ‘Safe Outputs’, organizations complete the privacy protection lifecycle, ensuring that the valuable insights gained from sensitive data are disseminated responsibly, thereby upholding ethical standards and reinforcing the public’s trust in data-driven research.
8. Application of the Five Safes Framework in Healthcare
The healthcare sector holds vast quantities of highly sensitive personal data, making the Five Safes Framework particularly salient. Its application enables health data custodians to facilitate vital research and innovation while rigorously safeguarding patient privacy. The framework provides a structured approach to navigating the ethical, legal, and technical complexities inherent in health data sharing.
8.1. National Health Service (NHS) Data Sharing in the UK
The UK’s National Health Service (NHS) is a prominent adopter of the Five Safes Framework, particularly through its data sharing initiatives for health and social care data. NHS Digital (now part of NHS England) and various Secure Data Environments (SDEs) across the UK leverage the framework to govern access to de-identified patient data for approved research and public health purposes. For instance, the North West SDE explicitly outlines its adherence to the Five Safes, stating that it ensures ‘only authorized personnel can access de-identified data for approved research projects’.
- Safe People: Researchers seeking access to NHS data through SDEs must complete mandatory data security awareness training, sign data access agreements, and often undergo vetting by their host institutions. Access is strictly role-based.
- Safe Projects: All projects require rigorous ethical approval from NHS Research Ethics Committees and must demonstrate a clear public benefit. The proposed research questions are scrutinized for scientific validity and necessity.
- Safe Settings: Data is accessed within highly controlled, secure virtual environments (the SDEs themselves), which prevent data download, direct printing, or connection to the open internet. These environments are subject to strict technical and physical security measures, including audit trails of all user activity.
- Safe Data: Data provided to researchers is typically pseudonymized or de-identified at source. Aggregation, generalization, and other anonymization techniques are applied to reduce re-identification risk, particularly for rare conditions or demographics.
- Safe Outputs: All statistical outputs, reports, and analyses generated by researchers within the SDE are subjected to an independent manual and automated review by trained statistical disclosure control experts before they can be extracted from the secure environment. This ensures no small numbers or potentially identifying information are released. (northwestsde.nhs.uk)
This comprehensive approach has significantly enhanced data security for millions of patient records, enabling valuable epidemiological studies, clinical effectiveness research, and policy evaluation while maintaining a high level of patient privacy protection.
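The automated portion of the output review described above can be sketched as a simple pre-release gate that flags small cells before anything leaves the secure environment. The threshold of 5 is a hypothetical example; real SDEs pair automated gates like this with manual review by trained output checkers:

```python
# Sketch of an automated small-cell check for outputs leaving a
# secure environment. The threshold is a hypothetical example.

MIN_CELL_COUNT = 5

def small_cells(table):
    """Return the names of non-zero cells whose counts are too small to release."""
    return [name for name, count in table.items() if 0 < count < MIN_CELL_COUNT]

def clear_for_release(table):
    """An output passes only if no non-zero cell falls below the threshold."""
    return not small_cells(table)

output = {"age_18_25": 12, "age_26_40": 3, "age_41_plus": 58}
print(small_cells(output))        # ['age_26_40'] would block release
print(clear_for_release(output))  # False
```

A researcher whose output is blocked would typically apply suppression or aggregate the offending category with a neighbouring one and resubmit.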
8.2. University College London Hospitals (UCLH) Example
University College London Hospitals (UCLH) NHS Foundation Trust provides another compelling example of the framework’s operationalization within a major academic healthcare institution. UCLH has formalized its ‘Principles for the use of UCLH patient data in research’, which are explicitly guided by the Five Safes framework.
- Safe People: UCLH emphasizes that only ‘trained and authorised researchers’ are granted access to patient data, underlining the importance of formal agreements and adherence to a strict code of conduct.
- Safe Projects: Research projects must undergo ‘ethical approval and demonstrate clear scientific merit and public benefit’, aligning with national research governance standards.
- Safe Settings: Patient data for research is managed within ‘secure computing environments’ that meet stringent NHS security standards, preventing unauthorized access and ensuring data remains within controlled boundaries.
- Safe Data: UCLH prioritizes ‘de-identified data’ wherever possible, employing techniques to minimize the risk of patient re-identification. They ensure that the data provided is ‘proportionate’ to the research need.
- Safe Outputs: Before publication or dissemination, all research outputs derived from UCLH patient data are ‘checked for confidentiality and statistical disclosure risk’, preventing any inadvertent release of sensitive patient information. (uclh.nhs.uk)
These examples demonstrate the framework’s effectiveness in balancing data accessibility with privacy protection in the highly sensitive context of healthcare research. The consistent application of the Five Safes principles helps build trust among patients and the public, which is crucial for the continued success of data-driven healthcare advancements.
8.3. Broader Applications in Health Research
Beyond national health services and individual hospitals, the Five Safes Framework is increasingly adopted by:
- Research Consortia and Collaborations: Large-scale research initiatives, often involving multiple institutions and international partners (e.g., genomics consortia), use the framework to standardize data access and security protocols across diverse settings.
- Public Health Agencies: Agencies responsible for population health surveillance and outbreak management utilize the framework to access and analyze aggregated health data for public good while respecting individual privacy.
- Pharmaceutical and Biotech Companies: While often dealing with proprietary data, many companies collaborating with academic institutions or accessing real-world evidence (RWE) from health systems are required to adhere to Five Safes-like principles for ethical data use.
The framework’s systematic nature provides a clear roadmap for health data custodians, fostering a culture of responsible data stewardship and enabling the safe secondary use of health data for public benefit.
9. Comparative Analysis: Five Safes vs. Other Data Governance Models
While the Five Safes Framework offers a robust and holistic approach to data governance, it is essential to understand its position relative to other prominent frameworks and regulations. Each model has a distinct scope and focus, and in practice, organizations often adopt a hybrid approach, integrating elements from multiple frameworks.
9.1. Five Safes vs. FAIR Principles (Findable, Accessible, Interoperable, Reusable)
- Scope and Focus: The core distinction lies in their primary objectives. The Five Safes Framework is fundamentally a risk management framework designed to enable secure and ethical access to sensitive data while mitigating disclosure risks. Its focus is on privacy protection, trustworthiness, and responsible use. In contrast, the FAIR principles (Findable, Accessible, Interoperable, Reusable) primarily focus on enhancing the usability and discoverability of data and digital assets for computational and human agents, promoting open science and data sharing. FAIR emphasizes metadata standards, persistent identifiers, and standardized formats to maximize data reuse.
- Implementation Complexity: Implementing the Five Safes requires significant organizational, legal, ethical, and technical infrastructure. It demands comprehensive training, ethical review boards, secure computing environments (TREs), and sophisticated statistical disclosure control. This can be resource-intensive. FAIR principles, while also requiring effort (e.g., in metadata creation, API development), tend to be more technically focused on data description and delivery mechanisms rather than the strict access and security controls inherent in Five Safes.
- Complementarity: The frameworks are not mutually exclusive but highly complementary. FAIR data is ‘findable’ and ‘accessible’ in the sense that it can be located and potentially accessed under appropriate conditions. The Five Safes defines what those ‘appropriate conditions’ are for sensitive data. An ideal scenario involves making sensitive data FAIR-compliant (i.e., discoverable and technically accessible) while ensuring that actual access to the content of the sensitive data is governed by the Five Safes. For example, metadata about a sensitive health dataset could be FAIR, allowing researchers to discover its existence, but access to the actual data would then be subject to Five Safes protocols.
9.2. Five Safes vs. GDPR (General Data Protection Regulation) / HIPAA (Health Insurance Portability and Accountability Act)
- Nature of the Framework: GDPR and HIPAA are legal and regulatory frameworks. They set out mandatory legal obligations for processing personal data (GDPR) or protected health information (HIPAA), including principles of data processing, data subject rights, security requirements, and accountability mechanisms. The Five Safes is an operational and risk management framework that provides a practical methodology for meeting many of these legal obligations, particularly concerning secondary use of data.
- Granularity: GDPR and HIPAA are high-level legislative instruments. They mandate ‘appropriate technical and organizational measures’ without prescribing the exact methods. The Five Safes offers concrete, actionable steps and control points (People, Projects, Settings, Data, Outputs) that organizations can implement to demonstrate compliance with these broader legal mandates, especially for sensitive data access.
- Scope: GDPR is broad, covering all personal data processing within the EU/EEA. HIPAA is specific to protected health information in the US healthcare sector. The Five Safes, while often applied to health data, is conceptually applicable to any sensitive data where controlled access is required.
- Synergy: The Five Safes can be seen as a practical tool for operationalizing GDPR principles like ‘purpose limitation’, ‘data minimization’, ‘integrity and confidentiality’, and ‘accountability’. For HIPAA, the ‘Safe Settings’ principle directly supports the security rule, and ‘Safe Data’ helps manage de-identification requirements. Adherence to the Five Safes often provides strong evidence of ‘appropriate technical and organizational measures’ under these regulations.
9.3. Five Safes vs. NIST Cybersecurity Framework / ISO 27001
- Primary Focus: The NIST Cybersecurity Framework and ISO 27001 are comprehensive information security management frameworks. They provide guidelines and standards for establishing, implementing, maintaining, and continually improving an Information Security Management System (ISMS). Their focus is broadly on enterprise-wide information security across all data types and business functions.
- Data Sensitivity vs. Security: While both frameworks are critical for security, the Five Safes places a unique emphasis on the specific challenges of sensitive data sharing and analysis. It adds layers of control beyond generic IT security, focusing on the ethical justification (Safe Projects), the trustworthiness of individuals (Safe People), and the inherent disclosure risk of the data and its outputs (Safe Data, Safe Outputs) that are often less central to pure cybersecurity standards.
- Integration: Organizations typically use ISO 27001 or NIST CSF to establish their foundational IT security posture, providing a secure environment (part of ‘Safe Settings’). The Five Safes then builds upon this foundation, adding specific controls and processes tailored for the ethical and privacy-preserving use of sensitive data, particularly for research and statistical analysis. They are complementary layers of security and governance.
In conclusion, while other frameworks provide crucial legal mandates, technical security standards, or data usability guidelines, the Five Safes Framework stands out for its holistic, risk-based approach specifically tailored to the complex challenge of securely and ethically enabling access to sensitive data for public benefit. It acts as a bridge, translating broad principles and security standards into actionable control points for data custodians and researchers.
10. Challenges and Best Practices in Implementing the Five Safes Framework
Implementing a robust Five Safes Framework is a complex undertaking, requiring significant commitment and strategic planning. Organizations often encounter several challenges, but these can be mitigated through the adoption of best practices.
10.1. Challenges
- Resource Allocation: Establishing and maintaining a full Five Safes infrastructure is resource-intensive. This includes financial investment for secure physical and technical environments (e.g., Trusted Research Environments), skilled personnel for data curation, security monitoring, ethical review, and statistical disclosure control, as well as ongoing training programs. Smaller institutions or those with limited funding may struggle to meet the high standards.
- The Utility-Privacy Trade-off: Striking the right balance between making data accessible enough for meaningful research and maintaining stringent security and privacy protection is a perpetual challenge. Overly restrictive measures hinder research innovation and utility, while insufficient controls create unacceptable risks. This tension often surfaces in debates over the level of anonymization (Safe Data) and the stringency of output checks (Safe Outputs), where greater privacy typically comes at the cost of some data granularity or detail.
- Evolving Regulations and Technologies: The landscape of data privacy laws (e.g., updates to GDPR, new sector-specific legislation) and technological advancements (e.g., quantum computing, sophisticated AI for re-identification) is constantly shifting. Keeping data governance practices, security protocols, and training curricula up-to-date requires continuous adaptation, which can be demanding for organizations.
- Cultural Resistance and User Acceptance: Researchers may perceive the stringent controls of the Five Safes (e.g., secure settings, output checks) as burdensome, slowing down their work or limiting their analytical freedom. Overcoming this resistance requires clear communication, demonstrating the benefits of responsible data use, and fostering a culture of privacy awareness.
- Complexity of Data Integration and Harmonization: When bringing together multiple datasets from different sources, each with its own governance and technical standards, applying the Five Safes consistently can be incredibly challenging. Harmonizing data for ‘Safe Data’ transformations and ensuring ‘Safe Settings’ across disparate systems requires significant effort.
- Lack of Skilled Expertise: There is a global shortage of professionals with expertise in statistical disclosure control, privacy engineering, and secure data environment management. This makes it difficult for organizations to recruit and retain the necessary talent.
10.2. Best Practices
- Strong Leadership and Governance Structures: Top-level commitment is crucial. Establishing clear governance structures, assigning accountability, and providing continuous executive support are vital. This includes defining clear roles and responsibilities for data custodians, ethics committees, security teams, and researchers.
- Stakeholder Engagement and Co-creation: Involve all relevant stakeholders—researchers, data subjects (e.g., patient groups), ethicists, legal experts, IT security, and policymakers—from the initial design and implementation phases. This fosters buy-in, addresses concerns proactively, and ensures the framework is practical and relevant. Public involvement can enhance trust and identify unforeseen ethical considerations.
- Continuous Training and Education: Beyond initial onboarding, provide ongoing, specialized training for all personnel involved with data access. This includes regular refreshers on data ethics, legal updates, new security threats, and advanced SDC techniques. Foster a culture where continuous learning and adherence to best practices are incentivized.
- Robust and Scalable Technological Solutions (TREs): Invest in state-of-the-art Trusted Research Environments (TREs) that are designed from the ground up to support the Five Safes. These should be scalable, user-friendly, and integrate automated security features, audit trails, and output checking capabilities. Ensure TREs are regularly updated and subjected to independent security audits (e.g., penetration testing).
- Proactive Risk Management and Regular Audits: Implement a continuous risk management framework. Regularly conduct internal and external audits of all Five Safes components to identify weaknesses, assess compliance, and validate the effectiveness of controls. Learn from incidents and near-misses to continuously refine practices and protocols.
- Transparency and Communication: Be transparent with data subjects and the public about how data is being protected under the Five Safes. Clearly communicate the public benefits of research enabled by the framework. Maintain open channels for feedback and address concerns promptly. Transparency builds and maintains public trust.
- Standardization and Documentation: Develop clear, comprehensive, and accessible policies, procedures, and guidelines for each of the Five Safes. Standardize data access request processes, data sharing agreements, and output checking protocols. Good documentation is critical for consistency, training, and auditing.
- International Collaboration and Harmonization: Participate in international forums and collaborate with other data custodians globally to share best practices, develop common standards, and address the challenges of cross-border data sharing in a Five Safes compliant manner.
By proactively addressing these challenges with these best practices, organizations can successfully implement and sustain a highly effective Five Safes Framework, maximizing the utility of sensitive data for societal good while upholding the highest standards of privacy and trust.
11. Adapting the Five Safes Framework to Emerging Technologies and Privacy Regulations
The digital landscape is characterized by relentless innovation in technology and continuous evolution in regulatory frameworks. The Five Safes Framework, by virtue of its principle-based rather than prescriptive nature, possesses inherent adaptability. However, its continued effectiveness hinges on active and intelligent adaptation to new challenges.
11.1. Incorporating New Data Types and Sources
The definition of ‘sensitive data’ is expanding beyond traditional health records to include:
- Genomic and Multi-omics Data: This highly granular and predictive data presents unique re-identification challenges, as an individual’s genome is inherently unique. ‘Safe Data’ principles need to evolve to consider the inherent identifiability of genetic sequences and apply specialized anonymization or access controls.
- Real-time Sensor Data and Wearables: Data from fitness trackers, continuous glucose monitors, and other IoT devices provide continuous, fine-grained insights into health and behavior, increasing the volume and velocity of sensitive data. ‘Safe Settings’ must accommodate real-time data streams, and ‘Safe Data’ needs to address the re-identification risks from movement patterns or physiological signals.
- Social Media and Unstructured Data: Textual data from electronic health records (EHRs), social media posts, or patient forums, while rich, often contains implicit identifiers. ‘Safe Data’ techniques must advance to redact and de-identify unstructured text effectively, potentially using Natural Language Processing (NLP) tools, while ‘Safe Outputs’ need to control for textual inference.
- Neurotechnologies: Emerging brain-computer interfaces and neuroimaging data pose new ethical and privacy dilemmas, requiring ‘Safe Projects’ to consider the profound implications of accessing and analyzing neural activity.
Adapting involves developing new anonymization techniques for these data types, updating risk assessment methodologies, and incorporating domain-specific ethical guidelines into ‘Safe Projects’.
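The redaction of unstructured text mentioned above can be illustrated with a simple pattern-based pass. The patterns below are deliberately simplified, hypothetical examples (an NHS-number-like digit pattern, dates, and email addresses); production de-identification relies on validated NLP pipelines, not regexes alone:

```python
import re

# Illustrative redaction of direct identifiers in free text.
# The patterns are simplified examples, not a clinical-grade pipeline.

PATTERNS = {
    "NHS_NUMBER": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text):
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Patient 943 476 5919 (seen 03/11/2023) emailed j.doe@example.com."
print(redact(note))
# Patient [NHS_NUMBER] (seen [DATE]) emailed [EMAIL].
```

Even after such redaction, indirect identifiers in narrative text (rare conditions, occupations, place names) can remain, which is why ‘Safe Outputs’ review still applies to analyses of free text.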
11.2. Updating Security Measures for Emerging Technologies
Technological advancements introduce both new threats and new defensive capabilities:
- Artificial Intelligence (AI) and Machine Learning (ML): While AI/ML can enhance data analysis, it also poses new risks. Sophisticated AI models can be trained to re-identify individuals from seemingly anonymized data (e.g., linking health records with publicly available demographic data). ‘Safe Data’ must account for these enhanced re-identification capabilities, and ‘Safe Outputs’ must consider risks from model memorization or inferential capabilities of released AI models. Conversely, AI can also be used for anomaly detection in ‘Safe Settings’ or to automate aspects of ‘Safe Outputs’ checking.
- Federated Learning and Homomorphic Encryption: These privacy-enhancing technologies (PETs) allow models to be trained on decentralized datasets without the raw data ever leaving its source, or enable computations on encrypted data. ‘Safe Settings’ can be extended to incorporate these architectures, allowing for distributed analysis while significantly reducing data transfer risks. This represents a significant shift in how data can be ‘accessed’ without being ‘moved’ in the traditional sense.
- Blockchain for Data Provenance: Blockchain technology can provide immutable, transparent records of data access and use. This can enhance accountability under ‘Safe People’ and ‘Safe Projects’ by providing an unalterable audit trail, strengthening the integrity of ‘Safe Settings’.
- Quantum Computing: Although still nascent, quantum computing could eventually break many current encryption standards. ‘Safe Settings’ must therefore track developments in post-quantum cryptography to ensure long-term data security.
Continuous updating of ‘Safe Settings’ protocols, investing in research into new PETs, and integrating advanced threat intelligence are crucial.
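The federated learning pattern described above can be sketched in miniature: each site computes a local update behind its own firewall and shares only that update, never raw records. The single shared statistic here (a weighted mean) is a deliberately simple stand-in for a model update; the site data are hypothetical:

```python
# Minimal federated-aggregation sketch: raw records never leave a site;
# only (local_statistic, count) pairs are shared with the coordinator.
# Site data and the one-statistic "model" are hypothetical illustrations.

def local_mean(site_records):
    """Each site's 'update': a statistic computed on its own data."""
    return sum(site_records) / len(site_records), len(site_records)

def federated_aggregate(site_updates):
    """The coordinator combines weighted updates without seeing raw data."""
    total_n = sum(n for _, n in site_updates)
    return sum(mean * n for mean, n in site_updates) / total_n

site_a = [120, 130, 125]   # e.g. systolic blood pressure readings at site A
site_b = [140, 150]        # readings held separately at site B
updates = [local_mean(site_a), local_mean(site_b)]
print(federated_aggregate(updates))  # 133.0, identical to the pooled mean
```

The aggregate equals the mean that would be computed on the pooled data, yet no patient-level record crosses an organizational boundary, which is why such architectures can be treated as an extension of ‘Safe Settings’.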
11.3. Aligning with Evolving Global Privacy Regulations and Standards
The regulatory landscape is becoming increasingly fragmented and complex, particularly with the rise of new national data protection laws (e.g., CCPA in California, various Asian data privacy laws). The Five Safes must continuously align with:
- Cross-border Data Flows: Global collaborations often involve data transfer across jurisdictions with differing legal standards. The framework must guide organizations in establishing robust international data sharing agreements that satisfy the most stringent applicable regulations (e.g., GDPR’s adequacy decisions, standard contractual clauses) while maintaining the integrity of the Five Safes.
- Sector-Specific Regulations: As new industries or data uses emerge (e.g., autonomous vehicles generating health-related data), new sector-specific regulations will arise. The Five Safes provides a flexible lens through which to interpret and implement these new mandates.
- International Harmonization Efforts: Organizations like the OECD and UN continue to develop principles for privacy and data governance. The Five Safes, being a widely recognized set of principles, can contribute to and benefit from these harmonization efforts, facilitating global data sharing and collaboration while maintaining robust ethical and privacy standards.
Adapting the Five Safes Framework requires a proactive, forward-looking strategy that integrates technological foresight with ethical considerations and legal expertise. By continuously evolving its application, the framework can remain a cornerstone of responsible data governance in an ever-changing world.
12. Conclusion
The digital age, while offering transformative potential for health research and public policy, concurrently presents unprecedented challenges in safeguarding sensitive personal information. The Five Safes Framework stands as an exemplary model for navigating this complex duality, providing a comprehensive, structured, and inherently adaptable approach to managing sensitive data. Its enduring relevance lies in its holistic nature, recognizing that data security and privacy are not merely technical problems but multi-dimensional challenges encompassing human trustworthiness, ethical purpose, secure environments, data characteristics, and responsible dissemination of knowledge.
By meticulously addressing each of its five interrelated principles—Safe People, Safe Projects, Safe Settings, Safe Data, and Safe Outputs—the framework provides a robust blueprint for organizations, particularly within the highly sensitive healthcare sector, to unlock the immense value of data while rigorously upholding individual privacy and fostering public trust. Case studies from national health services and leading academic medical centers underscore its practical effectiveness in real-world scenarios, demonstrating its capacity to facilitate groundbreaking research under stringent ethical and security controls.
While implementation presents challenges such as resource allocation, balancing data utility with privacy, and adapting to a dynamic technological and regulatory landscape, best practices centered on strong governance, continuous training, stakeholder engagement, and investment in secure environments can effectively mitigate these hurdles. Furthermore, the framework’s principle-based design positions it uniquely for continuous adaptation, allowing it to integrate emerging technologies like federated learning and address new data types such as genomics, thereby ensuring its ongoing efficacy in the face of future challenges.
In conclusion, the Five Safes Framework transcends a mere checklist; it embodies a philosophy of responsible data stewardship. Its continued application and evolution are critical for building and sustaining the societal trust essential for harnessing the full potential of data-driven innovation, ensuring that progress in research and policy goes hand-in-hand with an unwavering commitment to individual privacy and ethical conduct.
References
- Bailie, J., & Gong, R. (2025). The Five Safes as a Privacy Context. arXiv preprint. arxiv.org.
- GOV.UK. (n.d.). The Five Safes Framework. Retrieved from gov.uk.
- North West SDE. (n.d.). How data is protected. Retrieved from northwestsde.nhs.uk.
- Office for National Statistics (ONS). (n.d.). The Five Safes: A Framework for Secure Data Access. Retrieved from various ONS publications and guidance documents.
- Research Data Scotland. (n.d.). What is the Five Safes framework? Retrieved from researchdata.scot.
- University College London Hospitals NHS Foundation Trust. (n.d.). Principles for the use of UCLH patient data in research. Retrieved from uclh.nhs.uk.
