Synthetic data is emerging as a critical innovation to improve access to real-world data (RWD) while protecting patient privacy. Amgen has been applying and evaluating synthetic data methods to responsibly unlock insights from sensitive healthcare data. Through projects that use, generate and evaluate synthetic data across different data sources, Amgen has built expertise that can now be applied within the Innovative Health Initiative (IHI) SEARCH project to advance solutions for synthetic data generation.
The Simulacrum is a synthetic version of the Cancer Analysis System (CAS) database, which includes a national cancer registry and linked healthcare datasets in England 1. Amgen worked in collaboration with healthcare data partners to establish an operating model that uses the Simulacrum for hypothesis generation, feasibility analyses and programming code development. This work illustrates how synthetic data can be used to reduce reliance on custodians of restricted patient-level data, increase transparency, and accelerate research while safeguarding privacy 2.
Amgen has been involved in research applying synthetic data generation methods to German health insurance claims, developing a framework for assessing key dimensions including privacy, fidelity, scalability and utility. Using systemic lupus erythematosus (SLE) as a case study, several synthetic data generation approaches were compared. The study demonstrated that synthetic data could provide strong privacy protection and support exploratory analyses and programming code development. However, fidelity remained variable, with differences in distributions and outcomes across methods. These findings highlight the importance of continued methodological research to improve the quality and reliability of synthetic data 3.
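To make the fidelity dimension concrete, the sketch below compares the distribution of one categorical variable in a real and a synthetic dataset using total variation distance (0 means identical distributions, 1 means no overlap). This is an illustrative assumption, not the metric or code used in the study cited above; the function name and the toy data are hypothetical.

```python
from collections import Counter


def total_variation_distance(real, synthetic):
    """Total variation distance between two empirical categorical distributions.

    Returns a value between 0 (identical proportions) and 1 (no shared categories).
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    real_counts = Counter(real)
    synth_counts = Counter(synthetic)
    # Compare proportions over every category seen in either sample.
    categories = set(real_counts) | set(synth_counts)
    return 0.5 * sum(
        abs(real_counts[c] / len(real) - synth_counts[c] / len(synthetic))
        for c in categories
    )


# Hypothetical toy data: recorded sex in a real and a synthetic cohort.
real_sex = ["F"] * 9 + ["M"] * 1
synthetic_sex = ["F"] * 7 + ["M"] * 3

tvd = total_variation_distance(real_sex, synthetic_sex)
print(f"Total variation distance: {tvd:.2f}")
```

A per-variable check of this kind only covers marginal distributions. Assessing whether relationships between variables and analytical outcomes are preserved, as well as privacy, would need separate measures.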
Amgen is now extending this expertise through active participation in the IHI SEARCH consortium, which is focused on building a European ecosystem for synthetic data and federated learning across different therapeutic areas. Within this initiative, Amgen will focus on the standardisation and scalability of synthetic data generation. The aim is to establish a standardised methodology for synthetic data generation that can be applied consistently across RWD sources.
Synthetic data represents a transformative approach to reconciling the need for data access with the protection of patient privacy. Amgen’s work with Simulacrum and German claims data has demonstrated the feasibility and value of synthetic data generation, while also highlighting areas requiring further refinement.
As part of IHI SEARCH, Amgen is committed to translating these insights into improved synthetic data generation approaches, with the goal of making synthetic data more widely available across Europe. Expanding access to high-quality synthetic datasets will enable deeper insights from RWD and accelerate healthcare research, while maintaining strong privacy protections.
1 Bright CJ, Lawton S, Benson S, Bomb M, Dodwell D, Henson KE, McPhail S, Miller L, Rashbass J, Turnbull A, Smittenaar R (2020) Data Resource Profile: The Systemic Anti-Cancer Therapy (SACT) dataset. International Journal of Epidemiology 49(1): 15-15l. doi: 10.1093/ije/dyz137.
2 Kafatos G, Levy J, Jose S, Hindocha P, Archangelidi O, Vernon S, Frayling L (2025) Leveraging Synthetic Data to Facilitate Research: A Collaborative Model for Analyzing Sensitive National Cancer Registry Data in England. Therapeutic Innovation & Regulatory Science. doi: 10.1007/s43441-025-00820-z. Epub ahead of print.
3 Heidler T, Schultze M, Kafatos G, Behera B, Lienau C, Unger A, Balko V, Brandenburg J, Wang Z, Großer P, Hilbert A, Kossack N, Pignot M (2024) Improving Access to German Health Claims Data through Synthetic Data Generation: A methodological comparison of different approaches. ISPOR Europe 2024; Barcelona, Spain.