Harmonising and linking biomedical and clinical data across disparate data archives to enable integrative cross-biobank research.

Author

Ola Spjuth
Maria Krestyaninova
Janna Hastings
Huei-Yi Shen
Jani Heikkinen
Melanie Waldenberger
Arnulf Langhammer
Claes Ladenvall
Tõnu Esko
Mats-Åke Persson
Jon Heggland
Joern Dietrich
Sandra Ose
Christian Gieger
Janina S Ried
Annette Peters
Isabel Fortier
Eco Jc de Geus
Janis Klovins
Linda Zaharenko
Gonneke Willemsen
Jouke-Jan Hottenga
Jan-Eric Litton
Juha Karvanen
Dorret I Boomsma
Leif Groop
Johan Rung
Juni Palmgren
Nancy L Pedersen
Mark I McCarthy
Cornelia M van Duijn
Kristian Hveem
Andres Metspalu
Samuli Ripatti
Inga Prokopenko
Jennifer R Harris

Summary, in English

A wealth of biospecimen samples are stored in modern globally distributed biobanks. Biomedical researchers worldwide need to be able to combine the available resources to improve the power of large-scale studies. A prerequisite for this effort is to be able to search and access, in an integrated manner, phenotypic, clinical and other information about samples that are currently stored at biobanks. However, privacy issues together with heterogeneous information systems and the lack of agreed-upon vocabularies have made specimen searching across multiple biobanks extremely challenging. We describe three case studies where we have linked samples and sample descriptions in order to facilitate global searching of available samples for research. The use cases include the ENGAGE (European Network for Genetic and Genomic Epidemiology) consortium comprising at least 39 cohorts, the SUMMIT (surrogate markers for micro- and macro-vascular hard endpoints for innovative diabetes tools) consortium and a pilot for data integration between a Swedish clinical health registry and a biobank. We used the Sample avAILability (SAIL) method for data linking: we first created harmonised variables, and then annotated and made searchable information on the number of specimens available in individual biobanks for various phenotypic categories. By operating on this categorised availability data we sidestep many obstacles related to privacy that arise when handling real values, and we show that harmonised and annotated records about data availability across disparate biomedical archives provide a key methodological advance in pre-analysis exchange of information between biobanks, that is, during the project planning phase.

European Journal of Human Genetics advance online publication, 26 August 2015; doi:10.1038/ejhg.2015.165.

Publishing year

2015-08-26

Language

English

Publication/Series

European Journal of Human Genetics

Document type

Journal article

Publisher

Nature Publishing Group

Topic

Endocrinology and Diabetes

Status

Published

Research group

Genomics, Diabetes and Endocrinology

ISBN/ISSN/Other

ISSN: 1476-5438
