
Comments on GA4GH Data Sharing Draft

June 30, 2014

The following is a copy of our comments as submitted through the online interface at genomicsandhealth.org.

These comments pertain to the International Code of Conduct for Genomic and Health-Related Data Sharing – DRAFT # 6, produced by the Regulatory and Ethics Working Group of the Global Alliance for Genomics and Health. That draft document can be found at this URL: http://genomicsandhealth.org/our-work/work-products/international-code-conduct-genomic-and-health-related-data-sharing-draft-6

Our most important points are the first two. The first suggests an explicit mandate to inform individuals, families, and communities about the identifiability of their data. The second suggests that individuals, families, and communities from whom data are derived also be considered as potential data sharing recipients.

These comments come from the following Personal Genome Project (PGP)-associated contributors:

  • Misha Angrist (PersonalGenomes.org Board Member)
  • Madeleine P Ball (PGP Harvard, Director of Research & PersonalGenomes.org staff member)
  • Stephan Beck (PGP United Kingdom, Director)
  • Jason R Bobe (PersonalGenomes.org Executive Director & PGP Harvard, Director of Community)
  • Michael F Chou (PGP Harvard, Director of Human Subjects Research)
  • George M Church (PGP Harvard, Principal Investigator & PersonalGenomes.org President)
  • Preston W Estep (PGP Harvard, Director of Gerontology and Director of Collections)
  • Rifat Hamoudi (PGP United Kingdom, Computational Analysis and Development Leader)
  • Ryan Phelan (PersonalGenomes.org Board Member)
  • Jane Kaye (PGP United Kingdom, Ethics and Social Implications Leader)
  • Jeantine E Lunshof (PGP Harvard, Ethics Consultant)
  • Michelle N Meyer (PersonalGenomes.org Board Member)
  • Stephen W Scherer (PGP Canada, Principal Investigator)
  • Alexander Wait Zaranek (PGP Harvard, Director of Informatics)

1. We strongly suggest explicitly stating that participants be informed about identifiability.

(Section 4, Guidelines 4.2)

To respect individuals, families, and communities, and to foster trust and integrity, we strongly believe the foundational principles should require that individuals, families, and communities be informed about the identifiability of data relating to them. In particular, participants should be informed of the inherent identifiability of an individual from their genome, or from genotype profiling of multiple loci in their genome. To make this explicit, section 4.2 of the guidelines:

4.2 Informing individuals, families and communities about the use and exchange of data relating to them, depending on the nature of the data.

Could be changed to specifically mention identifiability:

4.2 Informing individuals, families and communities about the use and exchange of data relating to them, including its identifiability, depending on the nature of the data.

2. We strongly suggest reciprocal consideration of data sharing to and from individuals.

(Section 4, Guidelines 5.2)

To respect individuals, families, and communities, and to foster trust and reciprocity, we strongly believe the foundational principles should require that individuals, families, and communities from whom data are derived also be considered as potential data sharing recipients. To reflect this, section 5.2 of the guidelines could be updated to also describe consideration of the risks of data sharing to/with individuals, families, and communities (in addition to on/about them):

5.2 Considering the realistic harms and benefits of data sharing on individuals, families and communities, including opportunity costs.

To also state “with” individuals, families, and communities:

5.2 Considering the realistic harms and benefits of data sharing on and with individuals, families and communities, including opportunity costs associated with both sharing and not sharing.

Additional Recommendations

3. We suggest avoiding some terms with markedly variable legal meaning.
(Preamble & Section 1)

There are a couple of terms in the draft whose meanings vary considerably depending on country and legal context. Because this document is intended to convey global policy, we suggest avoiding these terms and, where appropriate, replacing them with terms that avoid unintended or inconsistent legal interpretation.

The first of these is the phrase “moral interests”. One interpretation of this is as “moral rights”, a term that, to our knowledge, varies markedly in its legal meaning. While we recognize that the phrase “moral interests” reflects language in Article 27 of the UDHR, we recommend considering alternative wording to reduce divergent understandings of the meaning of this document.

The other phrase with variable legal meaning is the term “good faith”. As with “moral rights”, in some countries and legal contexts “good faith” has a concrete legal meaning and can be breached. In other contexts, it is an appeal for fair behavior with no legal force.

4. Is there an expectation that this code may be binding beyond its signatories?
(Section 2)

If the code is not intended to be generally binding or enforceable, we suggest changing the phrase:

This code applies to

To state:

This code can potentially be applied to

5. We suggest wording changes to the founding principles.
(Section 3)

The third foundational principle seems to combine two principles that aren’t strongly related: “advancing research” and “fair distribution of [research] benefits”. Also, because genomics research is often not related to health (e.g., ancestry), emphasizing “health and wellbeing” as a principle in itself (the first principle) could be seen as implicitly excluding these fields of research. We suggest that a stronger emphasis on “research and scientific knowledge” would be more inclusive. Because “health and wellbeing” seem more closely related to “fair distribution of benefits”, we suggest rewording the foundational principles from:

1. Promote Health and Wellbeing
2. Respect Individuals, Families and Communities
3. Advance Research and the Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity

To instead be:

1. Advance Research and Scientific Knowledge
2. Respect Individuals, Families and Communities
3. Promote Health, Wellbeing, and Fair Distribution of Benefits
4. Foster Trust, Integrity and Reciprocity

6. We suggest explicitly recognizing donors as actors in consent.
(Section 4)

In keeping with the second foundational principle (respect for individuals, families, and communities), we suggest explicitly naming “donors” as those who are giving consent in this sentence:

This Code applies to data that has been consented to for use and/or approved therefor by competent authorities.

To state:

This Code applies to data that has been consented to by donors (or their legal representatives) for use and/or approved therefor by competent authorities.

7. We suggest specifying that data provenance be traced to the data source.
(Section 4, Guidelines 2.1)

To enable investigators to ensure that their data has been generated from well-consented sources, we recommend updating the phrase:

…tracking the chain of data exchange.

to state:

…tracking the chain of data exchange to its source.

8. We suggest avoiding the implication that perfect data security can be achieved.
(Section 4, Guidelines 3.3)

Because perfect data security is not achievable, we recommend changing the phrase:

Installing strict data security measures to prevent unauthorized access, data loss and misuse….

To state:

Installing strict data security measures to mitigate the risk of unauthorized access, data loss and misuse….

9. We suggest clarifying Part 5 of the Guidelines to communicate balancing of risk and benefit.
(Section 4, Guidelines 5)

The title for this section, “Minimizing Harm and Maximizing Benefits”, refers to two very different extremes in decision-making. To communicate a balanced consideration of both, we recommend changing the phrase and title:

minimizing harm and maximizing benefits

To instead be:

risk-benefit analysis

It was also unclear to us what outcomes would be considered potential harms or benefits; giving examples of these might also be helpful.

June 21 (Sat) Boston: PGP Harvard blood sampling

June 13, 2014

We’ve collected blood in Boston before at the GET conference, but attending the event isn’t always possible for local residents, so we’ve decided to hold a blood collection event on a weekend. We’re planning a sample collection at Harvard Medical School next Saturday, June 21st, 10am-4pm.

PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. Also, this is for folks that aren’t already in the sequencing pipeline – no need to attend if you already have a genome or gave blood at GET2013 or GET2014. To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.

PGP Harvard data in Google Cloud Storage

May 30, 2014

At PGP Harvard our participants are, by and large, very enthusiastic about understanding genetics and their own genomes. Many participants are programmers, researchers, and often both! It should come as no surprise that our staff are often asked “can I see more of the raw data?”

Some drives our genomes arrived on. Porsche design! That’s how you know it’s quality. © 2012 Alexander Wait Zaranek, released as CC-BY.

We’ve always wanted the entire “raw data” to be public, for participants and researchers alike. One issue that stymied us was the intractable size of the data: this sort of data is typically shipped on terabyte disks. I’m happy to share that we now have an answer and a place to find the data, although accessing it requires some familiarity with the command line and maybe a smidge of programming.

The full data sets PGP Harvard received from Complete Genomics are now shared on a public bucket on Google Cloud Storage, using credits generously donated by Google. Data is organized by huID.

The bucket: gs://pgp-harvard-data-public

To access the bucket, you should read about installing and using gsutil.

Some example commands

List contents of bucket top level:
gsutil ls gs://pgp-harvard-data-public

Recursively list contents of hu011C57 directory, with date and file size details:
gsutil ls -Rl gs://pgp-harvard-data-public/hu011C57

Download/copy the var file from hu011C57 Complete Genomics data to your current directory (234 MB):
gsutil cp gs://pgp-harvard-data-public/hu011C57/GS000018120-DID/GS000015172-ASM/GS01669-DNA_B05/ASM/var-GS000015172-ASM.tsv.bz2 .

With multi-threading and recursion, copy the hu011C57 directory to your current directory (40.8 GB):
gsutil -m cp -R gs://pgp-harvard-data-public/hu011C57 .
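If you would rather script this than call gsutil, here is a minimal Python sketch that does roughly what “gsutil ls -l” does, using only the standard library. It is an illustration under assumptions, not an official tool: it calls the Cloud Storage JSON API’s objects-list endpoint over HTTPS (rather than gsutil or Google’s client libraries), it assumes the bucket still allows anonymous reads as it did when this was posted, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Bucket named in this post; continued public read access is an assumption.
BUCKET = "pgp-harvard-data-public"
API = "https://storage.googleapis.com/storage/v1/b/{bucket}/o"


def list_url(bucket, prefix="", page_token=None):
    """Build a JSON API 'objects list' URL for one page of results."""
    params = {"prefix": prefix}
    if page_token:
        params["pageToken"] = page_token
    base = API.format(bucket=urllib.parse.quote(bucket, safe=""))
    return base + "?" + urllib.parse.urlencode(params)


def parse_listing(text):
    """Return ([(object name, size in bytes), ...], next page token or None)."""
    data = json.loads(text)
    # The API reports object sizes as strings, so convert them to int.
    items = [(item["name"], int(item["size"])) for item in data.get("items", [])]
    return items, data.get("nextPageToken")


def list_objects(bucket, prefix=""):
    """Yield (name, size) for every object under prefix, following pagination."""
    token = None
    while True:
        with urllib.request.urlopen(list_url(bucket, prefix, token)) as resp:
            items, token = parse_listing(resp.read().decode("utf-8"))
        yield from items
        if not token:  # no nextPageToken means this was the last page
            break


if __name__ == "__main__":
    total = 0
    for name, size in list_objects(BUCKET, prefix="hu011C57/"):
        print(f"{size:>14,}  gs://{BUCKET}/{name}")
        total += size
    print(f"total: {total / 1e9:.1f} GB")
```

Each listing page includes a nextPageToken while more results remain, which is why list_objects loops. Under the same public-access assumption, an individual object can be fetched over plain HTTPS at https://storage.googleapis.com/ followed by the bucket name, a slash, and the object name.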

Use a Google Compute Engine VM to analyze the data

You can also access this data using virtual machines in the Google Compute Engine – this could save you a lot of disk space! Once you have a virtual machine you can, for example, use the Python Client Library to automatically access data.
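On a VM (or your own machine), a few lines of Python are enough to start exploring a downloaded file. The sketch below streams the compressed var file from the gsutil cp example and counts rows by variant type. It assumes the Complete Genomics var-file layout as we understand it: metadata lines beginning with “#”, a single “>”-prefixed, tab-separated column header that includes a “varType” column, then data rows. Check the format documentation that accompanies the data before relying on it; the function names are hypothetical.

```python
import bz2
from collections import Counter


def var_type_counts(lines):
    """Tally the 'varType' column of a Complete Genomics var file.

    Assumed layout: '#' lines are metadata, one '>'-prefixed line holds the
    tab-separated column names, and later non-empty lines are data rows.
    """
    counts = Counter()
    col = None  # index of the varType column, set once the header is seen
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata
        if line.startswith(">"):
            col = line[1:].split("\t").index("varType")
            continue
        if col is None:
            raise ValueError("data row found before the '>' column header")
        counts[line.split("\t")[col]] += 1
    return counts


def count_file(path):
    """Stream a .tsv.bz2 var file, decompressing on the fly rather than to disk."""
    with bz2.open(path, "rt") as handle:
        return var_type_counts(handle)


if __name__ == "__main__":
    # File name taken from the gsutil cp example; download it first.
    for var_type, n in count_file("var-GS000015172-ASM.tsv.bz2").most_common():
        print(f"{var_type}\t{n}")
```

Streaming with bz2.open keeps memory use small even for large files. As we understand the format, var files list one row per allele at each locus, so these are row counts rather than counts of distinct variants.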

Annual Group Photo of PGP Participants at GET Labs 2014

May 16, 2014

Taking a group photo of Harvard Personal Genome Project participants in attendance at the GET Conference has become a fun annual tradition (2014, 2013, 2012). This year, the group photo was taken on April 29, 2014 at the GET Labs event held at the IBM Innovation Center in Cambridge MA:

GET Labs 2014 Group Photo

Group photo of Harvard Personal Genome Project participants who attended GET Labs on April 29, 2014 in Cambridge MA. Photo credit: Aurelien Dailly for PersonalGenomes.org, CC-BY.

We were lucky to have French photographer Aurelien Dailly, who snapped this photo. He is traveling throughout the United States for three months exploring people and places involved in open innovation and DIYbio. Check out his portfolio of photos from his journey thus far.

Next Wednesday (May 7): Blood Collection in Mountain View CA

May 1, 2014
Blood samples in EDTA tubes, our current favored DNA source for whole genome sequencing. License: CC-BY-SA, by Lennart B.

Blood is our current best source of DNA for whole genome sequencing. For PGP Harvard, GET conference blood collections in Boston have been a great success. But we know not all participants can travel to Boston for these events, so we want to pilot blood collection events in other cities. Our next event will be in Mountain View next week, on Wednesday, May 7th, from 1pm to 5pm.

PGP Harvard participants who have completed the PGP Participant Survey and all twelve trait surveys are invited to apply to donate blood. To apply, please log in to your participant account at my.pgp-hms.org and visit the collection event page. You can complete surveys (or check if you’ve already done them) by visiting the trait surveys page.

GET Labs featured in the New York Times

April 28, 2014

Tomorrow we are bringing together over 100 Harvard Personal Genome Project participants and 20 research groups who wish to collaborate with them and make some science together! We’re thrilled that the New York Times is featuring a profile of the event and its attendees (i.e., “omic astronauts”) in tomorrow’s print edition. Check it out!

This is the fifth year that the nonprofit PersonalGenomes.org has organized the GET Conference, and it is going to be the best year yet. One really exciting aspect is GET Labs, which we made into a standalone event the day before the regular conference. The focus is on *doing* science, not talking about it. We bring together a cohort of extremely well-characterized and well-consented individuals enrolled in the Harvard Personal Genome Project and researchers who wish to study them. Around 20 research groups signed up to attend this year and will be performing a wide range of activities, from armpit microbiomes to adult stem cell establishment. Everyone gets a little passport for documenting their adventures in health research.

Agenda and map of GET Labs; see a high-resolution PDF.

You can read more about the research groups participating in GET Labs here:

http://www.getconference.org/GET2014/labs.html

GET Labs Passports!

April 24, 2014

GET Labs and the GET Conference next week are going to be a blast! We had a really cool idea that I’m excited to see in action: GET Labs passports!

Jason got some little notebooks for attendees to use during the day, with the plan of stamping the front with a GET Conference stamp:

Then Mike Chou suggested: why not a stamp for every activity? Then participants can collect stamps from each activity through the day! So we designed those and they just arrived – and they look awesome:

Check out the GET conference site to learn more about the participants and researchers that will attend GET Labs!
