
A Public Resource Facilitating Clinical Use of Genomes

July 14, 2012

I’m proud to announce the open access publication of the first major research paper for the Personal Genome Project, published as an Inaugural Article in the Proceedings of the National Academy of Sciences: “A Public Resource Facilitating the Clinical Use of Genomes” by Ball, Thakuria, Zaranek, et al.

This paper ties together several related stories in the PGP’s research. It is an introduction to the PGP participant data as a public resource, it discusses some of our experiences with the pilot PGP-10 genomes, and it details our development of GET-Evidence in response to these experiences.

PGP participant data

Many reading this post already know it: the PGP has an exciting group of participants who have volunteered to make some highly personal data public for the sake of research. Our paper reviews the innovative open consent method that made this possible — by publicly sharing our human subjects research protocols, we hope to encourage other projects to adopt similar methods for publicly shared genomes and other re-identifiable research data. We highlight the many participants who have demonstrated their commitment by publicly sharing data from electronic health records and describe the PGP-10 genome data we have released.

Experiences with the pilot PGP-10 genomes

As we release genome data publicly, we also return that data to participants. To give participants some understanding of what they are making public, we try to provide an interpretation of their genome data (albeit a very rudimentary, tentative interpretation that almost certainly contains gaps and errors). Genome interpretation has therefore become one of the PGP’s core areas of focus. The field is still in its infancy, though, and so far other groups’ whole-genome interpretation efforts have focused on discovering new disease-causing variants in people believed to have genetic disorders. Whole genomes have been effective for that kind of discovery, but most PGP participants (and most people!) aren’t believed to have serious undiscovered diseases. What happens when you interpret the genomes of presumed-healthy people?

We found several rare variants that predicted diseases our participants didn’t have. Sometimes these were scary! The variant MYL2-A13T in PGP6 (hu04FD18, Steven Pinker) was predicted to cause hypertrophic cardiomyopathy. The variant SCN5A-G615E in PGP9 (hu034DB1, Rosalynn Gill) was predicted to cause long QT syndrome. Both are late-onset diseases that a participant could be unaware of, and both can cause sudden death.

Some variants predicting severe effects in the PGP-10

| Participant | Variant | Putative effect |
| --- | --- | --- |
| PGP5 (hu9385BA) | PKD1-R4276W | Autosomal dominant polycystic kidney disease |
| PGP6 (hu04FD18) | MYL2-A13T | Hypertrophic cardiomyopathy |
| PGP9 (hu034DB1) | SCN5A-G615E | Long QT syndrome |
| PGP10 (hu604D39) | PKD2-S804N | Autosomal dominant polycystic kidney disease |
| PGP10 (hu604D39) | RHO-G51A | Autosomal dominant retinitis pigmentosa |

These predictions couldn’t all be correct; the PGP-10 couldn’t possibly have all of these diseases. In the process of interpreting these genomes and reviewing genetic variants, we developed a system for reviewing variants that critically examines the evidence for the variant — not merely how bad the putative effect is, but how strong the evidence is supporting that hypothesis.

GET-Evidence: a system for personal genome interpretation

To facilitate genome interpretation, we created the Genome-Environment-Trait Evidence (GET-Evidence) system. GET-Evidence supports genome analysis in a two-step process: first, variants are prioritized for review; then each variant’s review is recorded and used to create genome reports.
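
The two-step flow can be sketched in Python. This is a minimal illustration, not GET-Evidence’s code: the `Review` record, its `evaluated` flag, and the `genome_report` function are hypothetical names. What it shows is that reviews live in a shared store keyed by variant, so each genome’s report is assembled from reviews already on record, and only unreviewed variants join the review queue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """Hypothetical, simplified record of one variant's review."""
    variant: str      # variant name, e.g. "GENE-A123T"
    effect: str       # short description of the putative effect
    evaluated: bool   # True once the review is complete enough to report


def genome_report(genome_variants, reviews):
    """Split one genome's variants into report entries and a review queue.

    `reviews` maps variant name -> Review and is shared across genomes,
    so a variant reviewed once is reused for every genome that carries it.
    """
    report, queue = [], []
    for variant in genome_variants:
        review = reviews.get(variant)
        if review is not None and review.evaluated:
            report.append(review)   # already-reviewed finding goes in the report
        else:
            queue.append(variant)   # not yet reviewed, or review incomplete
    return report, queue
```

Because the store is keyed by variant rather than by genome, the effort spent reviewing a variant carries over to every later genome that shares it.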

Prioritization combines two reasons to pay special attention to a variant: published information associating the variant with an effect, and a computational prediction that the variant is disruptive and therefore more likely to cause disease. As a result, the system pairs interpretation based on existing knowledge with the potential to discover new disease-causing variants.
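
A minimal sketch of the prioritization step, assuming a simple variant record. The fields, the set of consequence types treated as disruptive, and the 0.85 cutoff on a generic "damaging" prediction score are illustrative assumptions, not GET-Evidence’s actual rules. A variant is flagged if it has published reports, if it is predicted to be disruptive, or both, and variants flagged for both reasons are listed first.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence types treated as disruptive in this sketch (an assumption,
# not GET-Evidence's actual rule set).
DISRUPTIVE_CONSEQUENCES = {"nonsense", "frameshift"}


@dataclass(frozen=True)
class Variant:
    name: str
    consequence: str                          # e.g. "missense", "nonsense"
    pubmed_ids: tuple = ()                    # publications linking it to an effect
    damaging_score: Optional[float] = None    # hypothetical 0-1 predictor output


def priority_reasons(v, score_cutoff=0.85):
    """Return the reasons (possibly none) to put a variant in the review queue."""
    reasons = []
    if v.pubmed_ids:
        reasons.append("published")
    if v.consequence in DISRUPTIVE_CONSEQUENCES or (
        v.damaging_score is not None and v.damaging_score >= score_cutoff
    ):
        reasons.append("predicted disruptive")
    return reasons


def prioritize(variants):
    """Keep only flagged variants; those with more reasons sort first (stable)."""
    flagged = [(v, priority_reasons(v)) for v in variants]
    flagged = [(v, r) for v, r in flagged if r]
    return sorted(flagged, key=lambda pair: -len(pair[1]))
```

Keeping the two reasons separate, rather than merging them into one score, lets a reviewer see whether a variant is in the queue because of the literature, because of a prediction, or both.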

Variants are then interpreted through variant pages, which gather resources that assist the review process: variant frequency, computational predictions, and links to external databases. An editor can add information to a variant’s page, including the variant’s effect, its inheritance pattern, links to relevant articles (via PubMed IDs), and a summary of the effect. Most importantly, the editor can enter scores for the variant in a series of categories related to evidence and clinical effect. These scores enable automatic sorting and filtering of variants: once the scores are entered, a variant is considered “sufficiently evaluated” and can be used to produce genome reports automatically.
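
The scoring and filtering might look like the sketch below. The category names, the 0 to 5 scale, and the rule that a page counts as sufficiently evaluated once every category has a score are assumptions made for illustration; the actual GET-Evidence categories and thresholds may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Illustrative category names; each score is 0-5 in this sketch (assumption).
EVIDENCE_CATEGORIES = ("computational", "functional", "case_control", "familial")
CLINICAL_CATEGORIES = ("severity", "treatability")


@dataclass
class VariantPage:
    name: str
    summary: Optional[str] = None
    inheritance: Optional[str] = None          # e.g. "dominant", "recessive"
    pubmed_ids: list = field(default_factory=list)
    scores: Dict[str, int] = field(default_factory=dict)

    def sufficiently_evaluated(self):
        """Hypothetical rule: every evidence and clinical category has a score."""
        needed = EVIDENCE_CATEGORIES + CLINICAL_CATEGORIES
        return all(c in self.scores for c in needed)

    def evidence(self):
        return sum(self.scores.get(c, 0) for c in EVIDENCE_CATEGORIES)

    def clinical(self):
        return sum(self.scores.get(c, 0) for c in CLINICAL_CATEGORIES)


def report_rows(pages, min_evidence=0):
    """Keep evaluated pages with enough evidence; sort by clinical, then evidence."""
    usable = [p for p in pages
              if p.sufficiently_evaluated() and p.evidence() >= min_evidence]
    return sorted(usable, key=lambda p: (p.clinical(), p.evidence()), reverse=True)
```

Sorting on clinical importance before evidence strength is one plausible ordering for a report; raising `min_evidence` drops findings whose supporting evidence is weak, which is exactly the kind of check that separates a frightening prediction from a well-supported one.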

In keeping with the public sharing of genome and trait data, variant interpretations in GET-Evidence are freely shared, dedicated to the public domain under CC0. GET-Evidence follows a “peer production” model in which all users can edit variants. Allowing others to edit means mistakes can be corrected easily, updates in understanding from new literature can be applied more rapidly, and consensus can form as multiple editors combine their knowledge and perspectives.

A public resource

We’re thrilled to have this paper published, formally introducing the PGP as a resource for researchers. We believe publicly shared data are invaluable for research and a key component of the scientific method. We also hope that GET-Evidence and our experiences with genome interpretation help others in the development of methods for genome interpretation. In publicly sharing data, the PGP has adopted a bold new method for human subjects research: an educated cohort consenting to the unforeseeable risks involved and a highly participatory ongoing relationship. A big thanks goes out not only to the coauthors on this paper but also to our many participants, for making this dream of a public resource a reality.

3 Comments
  1. July 24, 2012 11:05 am

    Heh. I just tried to send an email to the list.
    What it shows: http://screencast.com/t/p4RggghQHygO
    What I sent: http://screencast.com/t/7c1BKXEkjT

    I cannot explain this….er, sorry…

  2. July 17, 2012 12:23 pm

    Thanks for the post Madeleine. I’ve been trying out the GET-Evidence database and I have some questions that aren’t addressed in the paper or the documentation. Where’s the best place to bring those?

    • July 19, 2012 1:13 pm

      I’m sorry it’s taken me a little while to respond; it’s a busy time! I think the GET-Evidence mailing lists are the ideal place to ask questions, because answers are shared with everyone subscribed and archived publicly. Your request to join the mailing list has now been approved, and you’re welcome to email me privately if you’d prefer.

