We are delighted to announce the launch yesterday of Genom Austria, the fourth member of the Global Network of Personal Genome Projects! This research study is a joint project of the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, the Medical University of Vienna, and PersonalGenomes.org. Check out the team.
They launched having already sequenced the whole genomes of two volunteers and plan to enroll and sequence a total of 20 volunteers in the first year. With the addition of Genom Austria, the global network now has member sites at leading institutions in the United States, Canada, United Kingdom and Austria!
Read the press release (PDF).
PersonalGenomes.org is hiring! We are a start-up nonprofit, transforming big ideas about participatory research and open data into resources that can benefit everyone’s health. We are looking for people who are passionate about our mission and excited by the opportunity to work with amazing people all over the globe. We have several open positions; please check them out and share them with family and friends who are looking for new opportunities:
PGP Harvard is planning another weekend blood collection event in Boston. The event will take place at Harvard Medical School Saturday, September 20, 10am-4pm.
PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Importantly, this event is NOT for those who already have a genome or gave blood at GET2013, GET2014, or at recent Boston or Mountain View collection events.
To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.
The following is a copy of our comments as submitted through the online interface at genomicsandhealth.org.
These comments pertain to the International Code of Conduct for Genomic and Health-Related Data Sharing – DRAFT # 6, produced by the Regulatory and Ethics Working Group of the Global Alliance for Genomics and Health. That draft document can be found at this URL: http://genomicsandhealth.org/our-work/work-products/international-code-conduct-genomic-and-health-related-data-sharing-draft-6
Our most important points are the first two. The first suggests an explicit mandate to inform individuals, families, and communities regarding identifiability of their data. The second suggests individuals, families, and communities from whom data is derived also be considered as potential data sharing recipients.
These comments come from the following Personal Genome Project (PGP)-associated contributors:
- Misha Angrist (PersonalGenomes.org Board Member)
- Madeleine P Ball (PGP Harvard, Director of Research & PersonalGenomes.org staff member)
- Stephan Beck (PGP United Kingdom, Director)
- Jason R Bobe (PersonalGenomes.org Executive Director & PGP Harvard, Director of Community)
- Michael F Chou (PGP Harvard, Director of Human Subjects Research)
- George M Church (PGP Harvard, Principal Investigator & PersonalGenomes.org President)
- Preston W Estep (PGP Harvard, Director of Gerontology and Director of Collections)
- Rifat Hamoudi (PGP United Kingdom, Computational Analysis and Development Leader)
- Ryan Phelan (PersonalGenomes.org Board Member)
- Jane Kaye (PGP United Kingdom, Ethics and Social Implications Leader)
- Jeantine E Lunshof (PGP Harvard, Ethics Consultant)
- Michelle N Meyer (PersonalGenomes.org Board Member)
- Stephen W Scherer (PGP Canada, Principal Investigator)
- Alexander Wait Zaranek (PGP Harvard, Director of Informatics)
1. We strongly suggest explicitly stating participants be informed about identifiability.
(Section 4, Guidelines 4.2)
To respect individuals, families, and communities, and to foster trust and integrity, we strongly believe the foundational principles should mean that individuals, families, and communities be informed about the identifiability of data relating to them. In particular, participants should be informed of the inherent identifiability of an individual from their genome, or from genotype profiling of multiple loci in their genome. To make this clear, section 4.2 of the guidelines:
4.2 Informing individuals, families and communities about the use and exchange of data relating to them, depending on the nature of the data.
Could be changed to specifically mention identifiability:
4.2 Informing individuals, families and communities about the use and exchange of data relating to them, including its identifiability, depending on the nature of the data.
2. We strongly suggest reciprocal consideration of data sharing to and from individuals.
(Section 4, Guidelines 5.2)
To respect individuals, families, and communities, and to foster trust and reciprocity, we strongly believe the foundational principles should mean that individuals, families, and communities from whom data are derived also be considered as potential data sharing recipients. To reflect this, section 5.2 of the guidelines could be updated to also describe consideration of the risks of data sharing to/with individuals, families, and communities (in addition to on/about):
5.2 Considering the realistic harms and benefits of data sharing on individuals, families and communities, including opportunity costs.
To also state “with” individuals, families, and communities:
5.2 Considering the realistic harms and benefits of data sharing on and with individuals, families and communities, including opportunity costs associated with both sharing and not sharing.
3. We suggest avoiding some terms with markedly variable legal meaning.
(Preamble & Section 1)
There are a couple of terms in the draft that have meanings that vary considerably depending on country and legal context. Because this document is intended to convey global policy, we suggest avoiding these terms and, if appropriate, replacing them with terms which avoid unintended or inconsistent legal interpretation.
The first of these is the phrase “moral interests”. One interpretation of this is as “moral rights”, a term that, to our knowledge, varies markedly in its legal meaning. While we recognize the phrase “moral interests” reflects language in Article 27 of the UDHR, we recommend possibly avoiding it to reduce divergent understandings of the meaning of this document.
The other phrase with variable legal meaning is the term “good faith”. As with “moral rights”, in some countries and legal contexts “good faith” has a concrete legal meaning and can be breached. In other contexts, it is an appeal for fair behavior with no legal force.
4. We wonder whether this code is expected to be binding beyond its signees.
If not generally binding or enforceable, we suggest changing the phrase:
This code applies to
To instead be:
This code can potentially be applied to
5. We suggest wording changes to the founding principles.
The third foundational principle combines what seem like two principles that aren’t strongly related: “advancing research” and “fair distribution of [research] benefits”. Also, because genomics research is often not related to health (e.g. ancestry), emphasizing “health and wellbeing” as a principle in itself (the first principle) could be seen as implicitly excluding these fields of research. We suggest that a stronger emphasis on “research and scientific knowledge” would be more inclusive. Because “health and wellbeing” seem more related to “fair distribution of benefits”, we suggest rewording the foundational principles from:
1. Promote Health and Wellbeing
2. Respect Individuals, Families and Communities
3. Advance Research and the Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity
To instead be:
1. Advance Research and Scientific Knowledge
2. Respect Individuals, Families and Communities
3. Promote Health, Wellbeing, and Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity
6. We suggest explicitly recognizing donors as actors in consent.
In keeping with the second foundational principle (respect for individuals, families, and communities), we suggest explicitly naming “donors” as those who are giving consent in this sentence:
This Code applies to data that has been consented to for use and/or approved therefor by competent authorities.
To instead be:
This Code applies to data that has been consented to by donors (or their legal representatives) for use and/or approved therefor by competent authorities.
7. We suggest specifying data provenance trace to the data source.
(Section 4, Guidelines 2.1)
To enable investigators to ensure that their data has been generated from well-consented sources, we recommend updating the phrase:
…tracking the chain of data exchange.
To instead be:
…tracking the chain of data exchange to its source.
8. We suggest avoiding potentially implying that perfect data security can be achieved.
(Section 4, Guidelines 3.3)
Because perfect data security is not achievable, we recommend changing the phrase:
Installing strict data security measures to prevent unauthorized access, data loss and misuse….
To instead be:
Installing strict data security measures to mitigate the risk of unauthorized access, data loss and misuse….
9. We suggest clarifying Part 5 of the Guidelines to communicate balancing of risk and benefit.
(Section 4, Guidelines 5)
The title for this section, “Minimizing Harm and Maximizing Benefits”, refers to two very different extremes in decision-making. To communicate balanced consideration, we recommend changing the phrase and title:
minimizing harm and maximizing benefits
To instead be:
It was also unclear to us what outcomes would be considered as potential harms or benefits; it might also be helpful to give examples of these.
We’ve collected blood in Boston before at the GET conference, but attending the event isn’t always possible for local residents, so we’ve decided to hold a blood collection event on a weekend. We’re planning a sample collection at Harvard Medical School next Saturday, June 21st, 10am-4pm.
PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Also, this is for folks that aren’t already in the sequencing pipeline – no need to attend if you already have a genome or gave blood at GET2013 or GET2014. To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.
At PGP Harvard our participants are, by and large, very enthusiastic about understanding genetics and their own genomes. Many participants are programmers, researchers, and often both! It should come as no surprise that our staff are often asked “can I see more of the raw data?”
We’ve always wanted the entire “raw data” to be public, for participants and researchers alike. One issue that stymied us was the intractable size of the data: this sort of data is typically shipped on terabyte disks. I’m happy to share that we now have an answer and a place to find the data, although accessing it requires some familiarity with a command line interface and maybe a smidge of programming.
The full data sets PGP Harvard received from Complete Genomics are now shared on a public bucket on Google Cloud Storage, using credits generously donated by Google. Data is organized by huID.
The bucket: gs://pgp-harvard-data-public
To access the bucket, you should read about installing and using gsutil.
Some example commands
List contents of bucket top level:
gsutil ls gs://pgp-harvard-data-public
Download/copy the var file from hu011C57 Complete Genomics data to your current directory (234 MB):
gsutil cp gs://pgp-harvard-data-public/hu011C57/GS000018120-DID/GS000015172-ASM/GS01669-DNA_B05/ASM/var-GS000015172-ASM.tsv.bz2 .
With multi-threading and recursion, copy the hu011C57 directory to your current directory (40.8 GB):
gsutil -m cp -R gs://pgp-harvard-data-public/hu011C57 .
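If you want to script the copy command above for several participants, a small Python helper can assemble the same gsutil invocation from a huID. This is only a sketch: the function name gsutil_copy_command and its parameters are hypothetical, and the only things taken from this post are the bucket name and the gsutil flags already shown. It builds the command without running it, so nothing is downloaded until you decide to execute it.

```python
import shlex

# Bucket name as given in this post.
BUCKET = "gs://pgp-harvard-data-public"


def gsutil_copy_command(hu_id, dest=".", parallel=True):
    """Return the argument list for copying one participant's directory.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if not hu_id.startswith("hu"):
        raise ValueError(f"expected a huID such as 'hu011C57', got {hu_id!r}")
    cmd = ["gsutil"]
    if parallel:
        cmd.append("-m")  # multi-threaded transfer, as in the example above
    cmd += ["cp", "-R", f"{BUCKET}/{hu_id}", dest]
    return cmd


if __name__ == "__main__":
    cmd = gsutil_copy_command("hu011C57")
    # Show the command as it would be typed in a shell.
    print(shlex.join(cmd))
    # To actually run it (needs gsutil installed and enough free disk space):
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if the destination path contains spaces.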
Use a Google Compute Engine VM to analyze the data
You can also access this data using virtual machines in the Google Compute Engine – this could save you a lot of disk space! Once you have a virtual machine you can, for example, use the Python Client Library to automatically access data.
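On the theme of saving disk space: whether you work on a virtual machine or your own computer, a downloaded .bz2 file can be read as a stream with Python's standard bz2 module, without first decompressing it to disk. The sketch below counts data lines in such a file. The function name count_data_lines is hypothetical, and treating lines that start with "#" as comments is an assumption for illustration, not a description of the Complete Genomics file format.

```python
import bz2


def count_data_lines(path):
    """Count lines in a bz2-compressed text file, skipping blanks and '#' lines.

    Assumption: lines starting with '#' are comments or metadata. Adjust this
    to match the actual layout of the file you are reading.
    """
    count = 0
    # mode="rt" decompresses on the fly and yields text one line at a time,
    # so the uncompressed file never has to exist on disk.
    with bz2.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count
```

For example, after downloading the var file with the gsutil cp command above, you could call count_data_lines("var-GS000015172-ASM.tsv.bz2") from the same directory.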