Genomics England is the company that will carry out the sequencing for the United Kingdom’s 100K Genomes Project. In response to a question raised during a recent “Town Hall” event, they stated that participants will have access to their data:
Q: Can I have access to my data? And how soon?
A: A patient can have access to their data if they wish and this can be provided to them in the appropriate format. The patient will receive the feedback from the sequencing and analysis of their genome via the clinician who is providing them with on-going care for their disease or condition.
We hope this means that participants will have access to the same “raw data” about their genomes that researchers will. If so, this represents an excellent step forward for both participants and researchers.
In this survey of GWAS studies by Ramoni et al., 4% of studies surveyed had returned individual results to participants. An NIH policy mandating data access for participants, as we recommended last week, would greatly improve this statistic. We hope providing participants access to their personal and identifiable study data becomes the norm rather than the exception.
Yesterday (11/20/13) we submitted to the United States National Institutes of Health (NIH) our public comments on their draft Genomic Data Sharing Policy. This policy will impact numerous participants, mandating the sharing of genetic data – data we know to be identifiable and meaningful. Please read our recommendations below, tell the NIH if you have similar concerns, and share this with others.
The Personal Genome Project (PGP) is a global network of research studies with thousands of participants dedicated to the creation of public resources composed of genome and phenotype data. The first PGP research study was founded at Harvard Medical School in 2005, and international sites now exist in three additional countries.1 The PGP has been at the forefront of participatory research in genome sequencing and has extensive experience with the ethical, privacy, and consent issues involved. We welcome this opportunity to publicly comment on the NIH draft Genomic Data Sharing (GDS) Policy and make recommendations for improvements.
Our recommendations can be summarized as two areas for improvement in section IV.C. of the draft policy: (1) to adequately inform researchers and participants of the inherent identifiability of genetic data, and (2) to require that researchers share with participants their personal research data, both to establish reciprocity and to increase data sharing.
The inherent identifiability of genetic data
The draft GDS Policy makes no mention of the inherent identifiability of genetic data. All genetic and phenotype data shared is mandated to be “de-identified”. Footnote eight of the draft states: “‘De-identified’ refers to removing information that could be used to associate a dataset or record with a human individual. Under this Policy, data should be de-identified according to the standards set forth in the HHS Regulations for the Protection of Human Subjects and the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.”
This definition of “de-identified” is inconsistent: genetic data is inherently identifiable. Using nothing more than genetic data and other publicly available data, researchers were able to identify nearly 50 individuals whose samples were “de-identified” (i.e. all public data met the same standards mandated by this draft).2 It is now a documented fact that this type of genetic data, even if scrubbed of personal information as described in this draft, “could be used to associate a dataset or record with a human individual”. Genetic data itself violates the draft’s definition of “de-identified”.
In the past, de-identification of samples or data sets by stripping personal data (name, social security number, date of birth, etc.) was sufficient to avoid re-identification of a particular subject. Genetic data was not seen as an equivalently identifiable piece of information. This has been demonstrated to no longer be the case, and the identifiability of genetic data is likely to increase and may eventually become trivial. Ancestry databases currently link genetic elements to surnames and in the future are likely to link genetic elements to individual ancestors. Controlled-access databases create a legal barrier to re-identification, but data security breaches are possible and have been an increasingly high-profile issue in recent years. If the NIH is to mandate that all participants in NIH-funded studies producing large-scale genetic data agree to broad sharing of their genetic and phenotypic data, it is mandating an exposure of many participants to a known re-identification risk.
If the NIH wishes to uphold the public trust in biomedical research, it must respect the right of research participants to be informed of relevant risks. If all potential participants in these studies are asked to agree that their “genomic and phenotypic data may be shared broadly for future research use”, they must also be adequately informed regarding the identifiability of that data.
We recommend this draft be amended to:
- Add language that acknowledges the inherent identifiability of human genetic data.
- Add to section IV.C.4 instructions for researchers to inform participants regarding the potential identifiability of the genomic data they are sharing (despite planned de-identification procedures) and, in the case of controlled-access data sets, the potential for data security breaches.
Sharing research data with participants
The draft GDS Policy mandates that all NIH-funded research studies that wish to produce “large-scale”3 human genetic data require all participants from whom samples are collected to consent that their “genomic and phenotypic data may be shared broadly for future research use”. Broad sharing is elsewhere defined as sharing through NIH-designated controlled-access or open-access databases (the latter only if participants “have provided explicit consent for sharing their data through open-access mechanisms”).
What is not addressed in this draft is a statement about genomic data sharing with the participants themselves. We strongly recommend the NIH consider including such a requirement for two reasons.
The first reason is to establish reciprocity in the data sharing mandate. This draft mandates that all participants in NIH-funded studies generating large-scale genetic data allow unknown individuals broad access to their genomic and phenotypic data, without ever having access to that data themselves. Participants’ genetic data is sensitive, meaningful, and identifiable. Participants deserve the reciprocal mandate that their personal data being shared with others also be shared with them.
The second reason is that this is a significant opportunity to further the NIH’s data sharing goals. Participant-managed data sharing is a promising mechanism for open-access data sharing. Even if participants would not have agreed to open-access at the outset of a study, their attitudes may change. Additionally, participants may wish to share their data with future studies in a selective manner. Participant access to data enables an additional participant-managed model for data sharing, and we can imagine a future where numerous studies benefit from participant-donated data.
We recommend the following:
- For participants consented after the effective date of this policy, add a requirement for researchers to give these participants access to their personal data that is shared with other researchers.
- Because some researchers may be unable to comply with this requirement, also allow researchers to instead provide specific reasons why this data sharing cannot be performed. Some mechanism should also be provided for participants to access these reasons in a study-specific manner (such as in a public database).
1) In section IV.C.4: “If there are compelling scientific reasons that necessitate the use of cell lines or clinical specimens that were created or collected after the effective date of this Policy and that lack consent for research use and data sharing, investigators should provide a justification for the use of any such materials in the funding request.” We suggest clarification of whether the lack of informed consent automatically exempts the researcher from data sharing, or if data sharing is expected to occur despite the exemption.
2) We suggest clarification confirming that “sample identification” using genomic data or other genotypic assays which are not intended to identify individual human participants is acceptable (e.g. detection of duplicate samples across different studies for statistical validity or for quality assurance).
3) “Binary alignment matrix (BAM)” should probably be “Binary Alignment/Map (BAM)”. Assuming this is a reference to SAM and BAM files, there is no clear definition of what the BAM acronym abbreviates (“B” could potentially mean “BGZF” or “Binary”), but a SAM file is defined here as a “Sequence Alignment/Map”: http://samtools.sourceforge.net/SAMv1.pdf
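To illustrate the kind of “sample identification” mentioned in item 2, here is a minimal Python sketch of duplicate detection by genotype concordance. It is an illustration under stated assumptions, not a description of any method used by the NIH or the PGP: the sample IDs, marker names, genotype calls, and the 0.99 threshold are all hypothetical, and a real pipeline would also have to handle genotyping error, missing calls, and close relatives such as identical twins.

```python
from itertools import combinations


def concordance(genotypes_a, genotypes_b):
    """Fraction of shared markers at which two samples carry the same genotype."""
    shared = set(genotypes_a) & set(genotypes_b)
    if not shared:
        return 0.0
    matches = sum(1 for marker in shared if genotypes_a[marker] == genotypes_b[marker])
    return matches / len(shared)


def find_likely_duplicates(samples, threshold=0.99):
    """Return pairs of sample IDs whose concordance meets the (assumed) threshold."""
    pairs = []
    for (id_a, g_a), (id_b, g_b) in combinations(sorted(samples.items()), 2):
        if concordance(g_a, g_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical toy data: sample IDs, marker names, and calls are invented.
samples = {
    "studyA-001": {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AG"},
    "studyB-417": {"rs1": "AG", "rs2": "CC", "rs3": "TT", "rs4": "AG"},
    "studyB-512": {"rs1": "AA", "rs2": "CT", "rs3": "TT", "rs4": "GG"},
}

duplicates = find_likely_duplicates(samples)
print(duplicates)
```

Note that this comparison flags that two records probably came from the same person without attaching a name to either one, which is the distinction item 2 asks the NIH to confirm.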
Many thanks to the Harvard PGP staff who contributed to these recommendations: Madeleine Ball, Jason Bobe, Michael Chou, George Church, Tom Clegg, Preston Estep, Jeantine Lunshof, and Alexander Wait Zaranek.
1. Three PGP sites currently exist outside the United States: (1) PGP-Canada, based out of the McLaughlin Centre, University of Toronto & Sick Kids Hospital, (2) PGP-UK, based out of University College London, and (3) another site in the EU with ethics approval, set to launch in early 2014. The PGP Global network is coordinated by PersonalGenomes.org, a 501(c)(3) nonprofit based in Boston, Massachusetts. To learn more please visit: http://www.personalgenomes.org/mission
2. Gymrek M, McGuire AL, Golan D, Halperin E, Erlich Y. “Identifying personal genomes by surname inference.” Science. 2013 Jan 18;339(6117):321-4.
3. Defined as more than 100 participants for genotyping or multi-gene sequence data, or whole genome sequence from a single participant.
Link to PDF version of these public comments: NIH_PGP_Public_Comments_GDS_Policy_11202013.pdf
We are pleased to announce the launch of the Personal Genome Project UK (PGP-UK), which is the third site in our global network and the first in Europe! The PGP-UK team is composed of Stephan Beck (University College London), Jane Kaye (University of Oxford), Rifat Hamoudi (University College London) and a great group of advisors.
Nearly 500 people have written to PersonalGenomes.org over the past few years requesting that we launch a PGP research study for residents of the United Kingdom. Until today the PGP was available only to people in the USA (Harvard Medical School) and Canada (McLaughlin Centre, University of Toronto). We’re working on growing the global PGP network to include dozens of countries where individuals and researchers are interested in working to create public data resources.
A press briefing was held at the Wellcome Trust yesterday that included:
- George Church, Professor of Genetics, Harvard Medical School, Director of PersonalGenomes.org
- Stephan Beck, Professor of Medical Genomics, University College London
- Jane Kaye, Director HeLEX – Centre for Health, Law and Emerging Technologies, University of Oxford
- Richard Durbin, Joint Head of Human Genetics and Head of Computational Genomics, Wellcome Trust Sanger Institute
A summary of press so far:
- Science Magazine online, 11/06/2013 – U.K. Researchers Launch Open-Access Genomes Project
- Nature News Blog, 11/07/2013 - Open-access genome project lands in UK
- The Telegraph online, 11/07/2013 - Volunteers sought for ‘open access’ genome project
- also: Daily Telegraph (printed version), p4
- Reuters online, 11/07/2013 - Britons invited to post their genomes online for science
- Press Association (via Yahoo News), 11/07/2013 - Volunteer hope for genome project
- The Guardian online, 11/07/2013 - Critics urge caution as UK genome project hunts for volunteers
- Financial Times online, 11/07/2013 - UK volunteers sought to put their genomes under public microscope
- Jersey Evening Post online, 11/07/2013 - Volunteer hope for genome project
- The Times online – It’s one giant leap into the genetic unknown
- BBC News Health online, 11/07/2013 - Massive DNA volunteer hunt begins
- phgfoundation, 11/07/2013 - New player enters UK race to sequence 100,000 human genomes
- Topnews US online, 11/07/2013 - 100,000 Volunteers for DNA Sequencing
This weekend we switched www.personalgenomes.org to a nice new site. The new design is cleaner and hopefully easier to use.
It also features a redesign which represents the Harvard Personal Genome Project as one of multiple PGP sites. The Harvard PGP has been a pilot site and is still the origin of almost all current PGP public data and cell lines, but our project only enrolls volunteers with United States citizenship or permanent residency. The world is larger than that! Right now the only other site is the Canadian PGP site, but PersonalGenomes.org looks forward to seeing many other PGP sites around the world.
Abigail Wark is a research fellow in the Tabin Laboratory in the Department of Genetics at Harvard Medical School. Her research focuses on understanding the causes and consequences of variation in uniquely human traits. She is the Project Director for Circles in Human Evolution: the Areola, a citizen science collaboration with the PGP that is gearing up to provide the world’s first genetic study of diversity in human areolas.
PersonalGenomes.org has partnered with the Tabin Laboratory and Circles in Human Evolution to create a third party research opportunity for PGP volunteers. Abby recently sat down with us to answer questions about this project.
Why study the genetics of human diversity?
We are living in the golden age of human genetics. The bulk of what we know about genetics so far has to do with genes that have gone awry. Focusing on genetic dysfunction makes sense because we want to prevent and cure human disease. But strongly detrimental genetic variants are a very small part of human genetic diversity. The truth is that most of the variants we carry in our genomes do not cause us catastrophic problems. These variants help make each of us who we are; they make us biologically unique. But while we have learned so much about genetic origins of disease, we know almost nothing about the genetic signature of healthy diversity. The tools are available now to update our view on this, to examine and understand a bigger slice of human biodiversity.
Are there meaningful consequences to this genetic diversity?
Of course! Nature is full of examples where genetic diversity can have real functional significance. Sensory systems provide dramatic examples of this because they can lead animals to interpret the world in completely different ways. For example, think of an insect with a gene that enables it to see UV light. That insect has access to information that you and I don’t have. The same thing is true for much more subtle cases of diversity. If you are a fish, subtle changes to your sensory systems can affect your likelihood to school, which has big ramifications for how you live your life as a fish, how you respond to predators, etc. I think we all know intuitively that this is true for humans too. We vary in all kinds of traits, from physical traits like height and hair color to sensory traits like taste or odor perception. And there’s reason to believe that these traits can impact how we live our lives, from what foods we eat to how well we deal with hot weather. The era of personal genomics offers the chance to understand what this diversity is made of and what role it plays in our lives.
Why did you choose to study the areola?
Areolas probably seem like such a strange topic! But they turn out to be pretty fascinating. Areolas are the circular markings that surround the human nipple. Did you know that no other animals have these spots? They are defining marks of our species! The fact that areolas are circular is really significant for this study. Circles are one of nature’s simplest forms. Many interesting human traits are the opposite of this – they are extremely complex. The genetic recipe for building a human hand or brain is complicated, which makes it very challenging to draw a line from even the simplest genetic changes to effects on human traits. But circles are simple to build and simple to change. It turns out that the simplicity of circles might give us a foothold for discovering the genetic pathways that define human traits.
So that is why areolas are a practical, though somewhat unusual, topic for studying human genetics. That said, one thing that has made this project really fascinating for us is that areolas are much more than just circular markings. In women, the pigmentation of the areola is believed to signal fertility and some evidence suggests that this may be one of the first indications of pregnancy. Areolas play an important role in nursing, providing both a visible target and a pheromonal attractant for newborn infants. In fact, the number of areolar glands, which differs from person to person, has been associated with infant weight gain. Developmentally, these glands are related to the mammary glands and one of the really exciting angles of our work is to see whether the areola might provide a window into the inner biology of the breast. If it does, the results could be really important for many aspects of breast health, from lactation to cancer.
Why are you using a citizen science approach? What role do participants play?
Citizen science is a movement to give non-scientists an active role in scientific discovery. I like to think that we are all potential scientists, just with different amounts of training. As children, many of us took delight in the act of discovery. We explored our backyards, tried to grow plants, observed animals, and mixed toothpaste and ketchup to see what we would get. As I see it, some of those kids became scientists and most did not, but there’s no reason to exclude non-scientists from that sense of discovery. In our project, participants become part of our field research team. They are given the tools and information they need to make observations about their own bodies. They share that data with us and our job is to compile and analyze the findings for the whole community. The whole project is a partnership.
Can participants sign up?
Yes, please do sign up! This is a PGP-specific study, so only PGP participants can join. This limits the number of people who can help, so we need you and all of your PGP friends! I want to emphasize that both men and women are encouraged to join our study. Submitting photographs through our server is very helpful, but is completely optional. Your data is still useful to us even if you choose not to include photographs.
To read more about the study or to sign up, please go to: https://my.personalgenomes.org/third_party/12
The data and samples that participants share in the Personal Genome Project (PGP) are considered highly identifiable. One of the key aspects for defining what it means to be an implementation of the Personal Genome Project is an absence of anonymity:
From our guidelines for PGP implementations:
“Non-anonymous. The risks of participant re-identification are addressed up front, as an integral part of the consent and enrollment process; neither anonymity nor confidentiality of participant identities or their data are promised to research participants.”
We have designed a consent process that includes many layers of upfront and ongoing education about the unique nature of public genomics research studies like the PGP. One of the important messages to participants is that their data are highly identifiable and therefore not “anonymous”. For example, the study guide that accompanies our mandatory entrance exam provides one of the more famous examples of how only a few pieces of demographic data can reveal a person’s identity:
From the PGP Study Guide:
“Identities can be discovered with surprisingly little information — for example, the combination of sex, birth date and ZIP code is specific enough to be uniquely identifying information for 87% of people!”
We know that hands-on demonstrations of otherwise abstract concepts can be extremely valuable for learning. Talking about a “personal genome” in the abstract can be a far different experience compared to wading through millions of variants contained in your very own personal genome sequence! So to enhance understanding of identifiability, we invited two research groups to demonstrate how re-identification is possible using public PGP data during GET Labs in Boston (April 25-26).
Latanya Sweeney’s Data Privacy Lab drew upon her pioneering work on the identifiability of demographic data to show how these techniques can be applied to public PGP profiles containing sex, birth date, and ZIP code. It was no surprise to find that many PGP participants are, in fact, identifiable. Indeed, all PGP participants should expect this potential outcome.
This is important, considering Harvard PGP participants are able to add ZIP codes to their public profiles in anticipation of research activities that explore how geographic location (and all the associated chemical exposures, microbes, viruses, air quality, allergens, etc.) impacts health. For anyone who was not at the GET Conference, Sweeney’s group has created a tool showing how identifiable you are in your own ZIP code. Check it out here: http://aboutmyinfo.org/
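The idea behind this kind of demographic re-identification can be sketched in a few lines of Python. This is a toy illustration only, not the Data Privacy Lab’s actual method or tool: the profiles below are invented, and the function simply measures what fraction of records are the only one with their particular combination of sex, birth date, and ZIP code. Anyone in that unique fraction could, in principle, be matched against another data set (such as a voter list) that carries the same three fields alongside a name.

```python
from collections import Counter

# Hypothetical toy profiles: (sex, birth_date, zip_code). All values are invented.
profiles = [
    ("F", "1975-03-14", "02138"),
    ("M", "1975-03-14", "02138"),
    ("F", "1982-11-02", "02139"),
    ("F", "1982-11-02", "02139"),
    ("M", "1990-07-21", "02115"),
]


def unique_fraction(records):
    """Fraction of records whose combination of fields occurs exactly once."""
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)


fraction = unique_fraction(profiles)
print(f"{fraction:.0%} of toy profiles are unique on sex, birth date, and ZIP code")
```

In this toy set, only the two profiles that share all three fields are hidden behind each other; every other profile is unique.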
A word of caution is required here about the best way for PGP participants to respond: we strongly advise any participant concerned about the identifiability of their data to reconsider their participation in the Personal Genome Project. Another approach, one that we find worrisome, is for participants in the PGP to deploy clever tricks for reducing the identifiability of their public data. As part of their demonstration, the Data Privacy Lab is providing tools to participants that “scrub” their data (e.g. replacing a 5-digit ZIP code with a 3-digit ZIP code). This may create the impression of privacy, but it will not make participants anonymous.
Earlier this year an exciting study published by Gymrek et al. in Yaniv Erlich’s lab forcefully demonstrated that genome data alone is extremely identifying. Melissa Gymrek also had a table at the GET Conference this year where she demonstrated the technique to participants. Their research matched whole genome Y-chromosome data to ancestry databases, which link surnames with Y-chromosome markers. With these surname clues and just a few other pieces of publicly available data, their group was able to identify specific individuals and families from their highly distributed “anonymous” cell lines.
Thus, all participants should believe that they are identifiable: there is no such thing as an “anonymous” genome!
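To make the limits of “scrubbing” concrete, here is a small Python sketch that truncates ZIP codes from five digits to three and counts how many records are still unique afterwards. It is an illustration under stated assumptions, not the Data Privacy Lab’s scrubbing tool: the profiles are invented and the helper names are hypothetical. It also only looks at demographic fields, whereas the genome data attached to a profile remains identifying no matter how the ZIP code is coarsened.

```python
from collections import Counter


def generalize_zip(record, digits=3):
    """Keep only the first `digits` characters of the ZIP code in a record."""
    sex, birth_date, zip_code = record
    return (sex, birth_date, zip_code[:digits])


def count_unique(records):
    """Number of records whose combination of fields occurs exactly once."""
    counts = Counter(records)
    return sum(1 for record in records if counts[record] == 1)


# Hypothetical toy profiles: (sex, birth_date, zip_code). All values are invented.
profiles = [
    ("F", "1975-03-14", "02138"),
    ("F", "1975-03-14", "02139"),
    ("M", "1960-01-30", "02138"),
    ("M", "1990-07-21", "02115"),
]

unique_before = count_unique(profiles)
scrubbed = [generalize_zip(profile) for profile in profiles]
unique_after = count_unique(scrubbed)

print(f"unique before scrubbing: {unique_before}, after: {unique_after}")
```

In this toy set, truncation merges two of the four profiles into one group, but the other two remain unique on sex and birth date alone.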
In our experience, many participants want to be identified and are very open about which public profile is theirs. The PGP does not require participants to reveal their names, but with the media coverage of the Sweeney group’s work we realize that the project appears to outsiders as “anonymous” — even though participants, after passing our enrollment exam, know better (or should)! To meet the desires of some participants and to further clarify the non-anonymous nature of the PGP, we’re going to work on allowing participants to add their photos and/or name to their public PGP profiles. I expect it will make PGP profile pages much more “personal” and create a provocatively different scientific database!