The First Rule of Genomic Identifiability

February 8, 2013

This is a guest blog post from Misha Angrist, Ph.D., an author, assistant professor at Duke University, and PGP-4.

In the 7 February 2013 edition of Nature I have a commentary on genomic privacy arguing that it is time to re-frame how we think about this issue. I wrote this partly in response to the Science paper by Gymrek et al in which the authors used a combination of public genetic and genealogical data to re-identify the surnames of supposedly anonymous—or at least “de-identified”—people.

The initial reaction to my commentary from the senior author of the paper, Yaniv Erlich, was anger. He felt that I was denigrating his work, that I was implying that it had been done before and therefore it was not a big deal. Rereading my piece, I can see how he would think that; given a do-over and more space I reckon I would phrase some things differently. But I did not mean anything of the sort.

Yaniv and I had an honest and respectful exchange. His revised and measured response is here; I admire and appreciate his willingness to reconsider his initial reaction. I did my best to assure him that my goal was never to minimize what he and his team had done. On the contrary: Academic “privacy hackers” like Yaniv Erlich, Latanya Sweeney and Brad Malin are essential to understanding how secure our genomic data might—and might not—be.

What I was really trying to convey was frustration: we now have a decade’s worth of data demonstrating that genetic information is identifying. NIH (disclosure: I am a recipient of NIH funding), in its commentary that appears alongside the Erlich group’s paper, says it is concerned about this issue, but doesn’t seem all that willing to entertain policy alternatives that fundamentally challenge the status quo. The PGP, for example, is not mentioned.

Folks who study human beings are nervous. Indeed, in a phone conversation today Yaniv told me that a senior colleague of his said that his paper should not even be published lest it lead to a shutdown of public sequencing resources.

Really? So genomic identifiability is like Fight Club?


Three clarifications:

Genome re-identification in the news

January 17, 2013

Since its founding, the Personal Genome Project has only accepted participants who understand and acknowledge re-identification as a potential risk. This “open consent” approach arose from our argument that privacy may be over-promised and that re-identification is increasingly possible as technology advances.

Dramatic progress in re-identification has been published today in Science (Gymrek et al.), and is reported on in Wired (“Scientists Discover How to Identify People From ‘Anonymous’ Genomes”). Wired’s article features some quotes from George Church and highlights our project.

PGP Forum and Wiki

December 14, 2012

This is a guest blog post from James M. Turner, a Boston-area software engineer, freelance journalist, author, and PGP-65. James recently created a forum and wiki for discussion of topics related to the Personal Genome Project. While our staff isn’t responsible for these sites, we plan to contribute to them and hope they provide an additional place for PGP participants to find useful information and answers to their questions.

When I was 8, I read “The Andromeda Strain”, by Michael Crichton (yes, the geek force was strong in this one, even at a young age.) The book left a strong impression on me about the future of genetics, to the degree that I was writing programs in high school to convert DNA sequences to amino acids. Mind you, this was in the late 1970s, when you had to walk uphill both ways to school to save your BASIC programs on paper tape, but tell that to the kids today, I tell you…

For a while, I was sure that my future would be in the biosciences, perhaps as a geneticist. Unfortunately (or perhaps fortunately) for the biosciences, I was distracted from base pairs by the other up and coming technology of the time, computers. Although I continued to avidly follow the life sciences, I fell in love with software and have spent the last 35 years making hardware jump through hoops with clever code.

But a funny thing happened on the way to 2013. Genomics, and the obscenely steep slope of the $/genome price slide, has created another example of what trendy geeks like to call ‘Big Data.’ Big data is a challenge, because it strains the computational and storage limits of computers to analyze, but it’s also an opportunity to correlate and draw new insights from datasets that used to live in their own private silos. The PGP, microbiome atlases, health data, exercise records, phenotypic traits, diet and much more are now digital and starting to hang around in the same neighborhoods. Between the years of 8 and 50, my two passions became “two great tastes, that taste great together.”

I first became aware of the new field of personal genomics when I researched 23andMe and deCODEme for an article I wrote in 2009. As part of the research for the piece, I got my Single Nucleotide Polymorphism (SNP) data from 23andMe, while my wife tried out deCODEme. As anyone who has used the SNP services knows, there’s interesting data to be looked at, but it’s just a tiny fraction of what goes on in the entire genome. I had experienced a taste of my genome, now I wanted the real deal.

That’s what led me to the PGP, drooling into test tubes, hanging out at GET 2012, and finally receiving my whole genome sequence a few months ago. Once I had all those lovely base pairs to play with, it immediately became clear that there’s not a really good user manual for the data, a “Genome Interpretation for Dummies.” I’m a pretty tech-savvy guy, and know enough about biology to be dangerous, but I quickly found myself dealing with the subtleties of GFF vs GTF vs BED format, comparison shopping genome browsers, and coming to the realization that a “whole” genome has small holes scattered throughout it (this must be why they call it shotgun sequencing…)

One of the things I know well from software is that crowdsourcing works. That’s the entire model behind GET-Evidence, many eyes and fingers building up a larger and more useful database of gene to phenotype relationships, so that eventually a newcomer will have a wealth of information about their genome. But what’s missing right now is a place to talk about the process, learn from each other, and share what works and what doesn’t.

I started talking to the folks at the PGP a few months ago about the idea of setting up a forum and Wiki for PGP participants (and researchers, and anyone else who wants to join in) to share information, look for help, or just chat. For a number of reasons, it was decided that it would be better to have them hosted and administered outside of the formal PGP organization, and I was asked if I was interested in setting them up. I was, and have.

At http://forum.personal-genome.org/, you’ll find the Personal Genome Project Forum. It’s a place to introduce yourself, discuss the PGP, GET-Evidence, genomics, and anything else that you want to. The PGP is a community as much as a project, and people in a community should have a town square to mill around and chat.

For more formalized knowledge transfer, there’s also http://wiki.personal-genome.org/, the PGP Wiki. Hopefully, this will grow into a fount of information about the how and why of genomes. I expect there will be a fair amount of cross-pollination between the Wiki and the forums, with forum discussions turning into Wiki articles, and people discussing wiki topics on the forums. There aren’t a lot of rules at the moment (beyond the obvious about spam and privacy and civility), so the personality of the fraternal sites will evolve as people use them.

So, they’re there, they’re open for business, and with this blog posting, they’re announced. A good first start would be to drop by the forum, register as a user, and put up an introduction in the appropriate topic. There’s only two posts there right now, and it’s a little lonely. We could also really use some articles on the basics of genomic data in the Wiki. I’m going to try to contribute as I have time, and some of the PGP staff have indicated there’s stuff they’d like to write, but the more the merrier!

Personal Genome Project Canada Launches

December 9, 2012

Canadian flag. Thanks to flickr user: ianalexandermartin (CC-BY-NC)

After several years of work, I am very happy to say that PGP-Canada has officially launched!  I have had the great pleasure of working with the team led by Stephen Scherer at University of Toronto and the Hospital for Sick Kids to help organize a Canadian Personal Genome Project (PGP-Canada).  This story stretches all the way back to July 2006 when George Church and his wife Ting Wu went to Toronto to speak about the Personal Genome Project. Read the press release from University of Toronto.

The Toronto Globe and Mail has created an amazing series about PGP-Canada and personal genomics generally called “Our Time to Lead: The DNA Dilemma”: http://www.theglobeandmail.com/news/national/time-to-lead/

The content is really impressive in its scope, detail, balance and emotion. Especially touching are the videos exploring the human condition through the lens of individuals coping with disease and genetic risk. Altogether the series includes articles, videos, a poll, a digital game and other interactive material. A live debate about the risks and rewards of genetics research is scheduled for December 18th.

For ease of navigating, I organized the content for you into four sets of links:

Series homepage:

Personal Genome Project and Public Genomics

Case Studies: Genetics in the Real World

These stories are touching and inspiring. I have to say that I’m compelled to give hugs to all of these people.

Background Information:

Presentation on PGP to the Duke University School of Medicine Class of 1972

December 6, 2012

Clifford Andrew (PGP-84) hiking on the Appalachian Trail in North Carolina (Photo posted with permission)

This is a guest blog post from Clifford G. Andrew, M.D., Ph.D., Duke University School of Medicine, Class of 1972.  Adjunct Assistant Professor of Neurology, Johns Hopkins University and PGP-84.

As a practicing physician, health protagonist and amateur genealogist, I have been an enthusiastic participant in the Personal Genome Project since enrolling in October 2010. This past fall, I returned to my alma mater, Duke University, for my Medical School Class of 1972 40th reunion. As in the past, we had set up a “mini-symposium” for discussion of various medical topics of interest to lay persons including spouses and families of our graduating class.

I decided that it would be appropriate to give a presentation on PGP to our group. As it turned out, our reunion was taking place within weeks after Robert Lefkowitz was announced as the 2012 Nobel Laureate in Chemistry for his research while we were at Duke on “G-Protein-Coupled Receptors.” (1) In an effort to put my talk in some context, I went back and did some research on previous Nobel laureates, and found to my surprise something which tied directly into the Personal Genome Project.

It turns out that 50 years earlier, the Nobel Prize in Physiology or Medicine for 1962 (while our group was still in high school) was awarded jointly to Francis Crick, James Watson, and Maurice Wilkins “for their discoveries concerning the molecular structure of nucleic acids and its significance for information transfer in living material.” (2)

In his Stockholm banquet acceptance speech, Watson was quoted as having said, “With our discovery of the structure of DNA, we knew that a new world had been opened and an old world, which seemed rather mystical was gone. At that time some biologists were not very sympathetic with us because we wanted to solve a biological truth by physical means. But fortunately some understood that by using the techniques of physics and chemistry, a real contribution to biology could be made.”(3)

I went on to describe the Human Genome Project, first begun in 1990 with working draft in 2000 and final published draft in 2003, mapping for the first time the entire haploid reference genome for Homo sapiens containing 3 billion base pairs and 25,000 identified genes (but accounting for only 1% of the base pairs). (4) In 2007 James Watson was the second person to publish his fully sequenced genome online, stating that he did this “to encourage the development of an era of personalized medicine in which information contained in our genomes can be used to identify and prevent disease and to create individualized medical therapies.” (5)

Indeed, that is what the Personal Genome Project is all about: using an innovative method for “open consent” in human subjects research, wherein all of the genomic data, as well as personalized details about environmental factors and traits, are posted on the internet, achieving a critical mass (100,000 individuals) of associated data, and allowing open access to physicians, medical scientists, and researchers for advancement of knowledge on how DNA sequences combine with environmental factors to result in clinical factors, traits, health and disease. (6)

In 1997, I was contacted by a Harvard researcher to participate in a 15-year study with 15,000 other physicians wherein we took four pills daily: vitamin C, vitamin E, beta-carotene, and a multivitamin; or placebo for each. We reported annually on environmental traits, medications, and diseases. When the code was broken for the first three, I learned that I had been on PLACEBO for all, and that none of them influenced to any significant degree the incidence of cardiovascular disease or cancer. The study was concluded the month of the Duke mini-symposium, and I learned not only that I had been on the REAL multivitamin, but also that this had resulted in a small but significant 12% decrease in the incidence of overall cancer. (7)

The Physicians’ Health Study II was a demonstration of how evidence-based medicine is supposed to work. As meaningful as PHSII was, I expect the PGP to contribute several orders of magnitude more in terms of significant and long-term advancement of our understanding of how genetics and environment contribute to physical characteristics and health.

At Duke I passed out copies of the PGP pamphlet (8) and encouraged participants to enroll in the project. I am now doing the same with select patients of mine. I would suggest that each of us reach out to family and friends to raise our numbers and get PGP to the 100,000 participant goal. Can you imagine what the GET Conference will look like in 10 years with numbers like that?

REFERENCES:

(1)  http://www.nobelprize.org/nobel_prizes/chemistry/laureates/2012/

(2)  http://www.nobelprize.org/nobel_prizes/medicine/laureates/1962/

(3)  http://www.nobelprize.org/nobel_prizes/medicine/laureates/1962/watson-speech.html

(4)  http://www.ornl.gov/sci/techresources/Human_Genome/project/about.shtml

(5)  http://www.cshl.edu/Archive/watson-genotype-viewer-now-on-line

(6)  http://www.personalgenomes.org/

(7)  http://jama.jamanetwork.com/article.aspx?articleid=1380451

(8)  http://www.personalgenomes.org/newsletter/pgp_flyer_longedgeflip.pdf

You may download a copy of the presentation here (1.2 MB PDF):

Personal Genome Project Duke Med School 40th Reunion 2012

American Gut: Q&A with Jeff Leach

November 30, 2012

The American Gut Project.  Copyright Human Food Project, with permission.

Jeff Leach is an anthropologist by background (now an eager, albeit gray-haired, graduate student at the London School of Hygiene & Tropical Medicine) and the founder of the Human Food Project. In collaboration with Rob Knight at the University of Colorado-Boulder and Jack Gilbert at Argonne National Laboratory, he has launched a large-scale citizen science project to document the diversity of the American gut microbial ecosystem, giving anyone in the U.S. the opportunity to participate and to compare their microbes with thousands of others, with the hope of revealing patterns in diet and lifestyle that shape our microbial communities.

PersonalGenomes.org has partnered with American Gut to create a third party research opportunity for PGP volunteers. We recently sat down with Jeff for a brief Q & A:

Where did the idea for American Gut come from and why use crowd sourcing?

The idea was hatched out of some pilot research we were doing among traditional groups in Southern Africa. In 2011, we started laying the groundwork to characterize the gut microbiome of hunter-gatherers and subsistence farmers living on the edge of urbanization with the hopes of getting a glimpse at what our ancestral microbiome may have looked like before it ran gut first into the buzz saw of globalization. In other words, if we ever want to understand what a ‘normal’ or ‘healthy’ gut microbiome looks like – and how we might achieve that ancestral gut, again – we will need to look outside our modern world. So the idea was to assemble a large and diverse dataset of ‘modern’ microbiomes with extensive metadata (the American Gut Project) to get a handle on the diets and lifestyles that were potentially driving variability, to compare with more ‘untainted’ microbiomes around the world. Jeff Gordon, Maria Dominguez-Bello and others are doing important work in this regard as well.

Crowd sourcing was a great platform to launch this effort – quickly and efficiently. Citizen science has a long history and the gut microbiome lends itself well to public participation. As I believe food and health is this generation’s civil rights movement, social and crowd-like platforms for building a community, awareness and actionable solutions will emerge as great tools.

I see that a pretty impressive team has been assembled for the project.

Yeah. The credit for that all goes to Rob Knight. He and Jack, along with others, launched the Earth Microbiome Project to analyze the microbial diversity of the entire earth. It’s the big-idea-can-do thinking that makes the American Gut project. The list of collaborators that have signed on to the project will participate via inter-lab/university agreements in the data interpretation. Some labs may also do some select portions of the analysis. The depth of the ‘collaborator bench’ will allow for some pretty interesting insight into the data as it becomes available.

How can the PGP volunteers contribute or benefit from American Gut?

No doubt your genetic makeup plays a significant role in shaping your second genome (your microbiome). I think Ruth Ley from Cornell, who is also a collaborator on this project, is doing some interesting research in that regard. The PGP community is obviously very interested in their health and well-being and participating in the process. The detailed metadata collected with American Gut, coupled with the existing data from PGP, will make for some interesting insight. And PGP volunteers who join the American Gut project will be able to ‘claim’ their results, thus sharing computable datasets between the two projects. This will be unprecedented.

We are very excited about this effort, but there is a window of opportunity that closes on January 7. So anyone who thinks they might want to help with American Gut will need to do so quickly. Note also that a first-in-first-out rule applies: those who sign up the earliest will receive the quickest results.

Learn more and sign-up at www.indiegogo.com/americangut

12/1: There is a problem with the indiegogo site. They have been notified so hopefully it will be fixed soon.

12/3: Fixed!  The American Gut page is live again after being down for the weekend due to some sort of administrative hiccup that required IndieGogo staff to intervene and nobody was available Saturday or Sunday.  I guess I’m happy that IndieGogo staff got to take off for the weekend!  :)

Seeking Diversity (Especially Families)

November 29, 2012

As Alex blogged earlier, the Personal Genome Project (PGP) is hoping to work with the National Institute of Standards and Technology (NIST) to use PGP materials (cell lines and DNA) for NIST’s “Genome in a Bottle” reference material. One of the things NIST is looking for, and that we’d love to see more of, is diversity.

Seeking diversity

Because the PGP is self-recruiting, we don’t have a very balanced set of participants. “Self-recruitment” means that all participants have enrolled in our project through word of mouth, finding our website and enrolling online. To put it bluntly, that means we mostly end up with young white men. Here are some graphs from our recent paper:

Researchers would love to see more diversity in PGP data. However, the self-recruitment model is ideally suited for the PGP: self-recruited participants are more likely to have a good understanding of the goals and risks involved. And so we’ll simply put out the word here for people already following us: underrepresented groups are especially appreciated. Research within one or two racial/ethnic categories isn’t necessarily a virtue; biracial and multiracial heritage may be even more interesting to some researchers and can open more areas for future research.

Why NIST wants PGP material

What does it mean for NIST to be considering the PGP for genome reference material? Major advances have been occurring in DNA sequencing and personal genomics; it’s a competitive and rapidly evolving market. Manufacturers of instruments need standard human genome material to use for calibration of their machines, and others would like to use it to compare different devices and create common quality metrics. For example, the Food and Drug Administration may use this material for certification of sequencing instruments. It is possible this reference material will become ubiquitous — spread far and wide, in a variety of commercial devices, with little ability to protect or regulate the uses of it.

To visualize the potential widespread usage of reference material, I’ve made this informal sketch of the sequencing process. Not to scale, lasers may not be included in all models.

Why use PGP samples? Even though NIST’s genome reference material will be manufactured using cell lines, those cell lines originally come from a person, and society is realizing that tissues and DNA are very personal things! In the wake of the experiences of Henrietta Lacks and HeLa cells (as documented in Rebecca Skloot’s recent book), NIST wants to make sure the material they use comes from people who understood and agreed to potentially widespread usage. The PGP’s “open consent” is a gold standard for careful consent to broad usage: PGP participants acknowledge and agree to things other subjects have not, including the risk of re-identification and commercial uses of their material. [1]

Parents and children

In particular, NIST is looking for “trios”: two parents and a child. Researchers like to use samples from trios because they know every piece of DNA in the child comes from one of the parents. This makes it easier to assess error rates — and that sort of quality control is what NIST expects the genome material to be used for. We think all such family groups are valuable, but current trios in the PGP haven’t been the most diverse…

Trio                                 Self-reported race/ethnicity
hu5D9DE3/huFE1569/huA8BCB0           White
hu91BD69/hu38168C/huCA017E           Asian
hu16360E/hu28DA07/hu1A7894           White
huAA53E0/hu8E87A9/hu6E4515           White
huB4E01A/hu39790F/hu781C4E           White
huCDC3B8/huFE01E1/hu1E8957/hu961968  White
huAA8CF9/hu7DB29E/hu2ED134           White
hu620F18/huD4BF17/huD62596           White, American Indian / Alaskan Native [2]
hu1053CC/huFAF1FE/hu40D515           Unknown
huC434ED/huD44B2B/hu25DE85           White
hu36CDF1/hu210C97/huCFD87D           White
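
The reason trios help with error rates can be sketched in a few lines of Python. This is only an illustration of the idea, not NIST's or the PGP's actual pipeline: the function names, the two-letter genotype encoding (one autosomal site per entry), and the sample calls are all assumptions made up for the example. A child genotype that cannot be assembled from one allele of each parent points to a likely sequencing or calling error (or, rarely, a new mutation).

```python
from itertools import product


def is_mendelian_consistent(mother, father, child):
    """Return True if the child's genotype could be inherited from the parents.

    Each genotype is a 2-character string of alleles at one autosomal site,
    e.g. "AG". Allele order within a genotype is ignored.
    """
    # Every possible child genotype: one allele from the mother, one from the father.
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose calls violate Mendelian inheritance.

    trio_calls is a list of (mother, father, child) genotype tuples.
    """
    if not trio_calls:
        return 0.0
    errors = sum(
        1 for m, f, c in trio_calls if not is_mendelian_consistent(m, f, c)
    )
    return errors / len(trio_calls)


# Hypothetical calls at four sites (made-up data, for illustration only).
calls = [
    ("AA", "AG", "AG"),  # consistent
    ("CC", "CC", "CT"),  # inconsistent: neither parent carries T
    ("GT", "GT", "TT"),  # consistent
    ("AA", "GG", "AG"),  # consistent
]
print(mendelian_error_rate(calls))  # 0.25
```

With real data the same check would run over millions of sites, and the fraction of inconsistent sites gives a rough lower bound on the calling error rate, since some errors happen to remain consistent with inheritance.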

NIST’s team has told us they would like to have samples representing the breadth of human genetic diversity — various ethnicities and multiracial heritages. Our project would love to enable that, but we are sensitive to the history of minorities in human subjects research. Participation in the PGP has many acknowledged risks and no promised benefits — it definitely isn’t for everyone. I can’t even promise that NIST will use your samples (many would see that as a benefit rather than a risk). I’m simply going to write that NIST and other researchers wish they could have more diversity, and about the lack of it in the PGP — maybe, if we’re lucky, it will inspire some new participants to self-recruit.


Footnotes

1: More specifically, the PGP promises not to seek financial gain or commercial profit from materials (although cost recovery is allowed), but may “permit your cell lines to be used for research, patient care, commercial or other purposes”. We don’t expect anyone’s genome to be uniquely valuable, but blocking any and all commercial uses of shared material is often viewed as overly restrictive. If a company wants to include NIST’s DNA standard in a commercial machine, that would be a commercial usage of the material.
2: I have observed a high rate of people reporting both White and Native American ancestry in PGP participants (it’s the second most common category, see the table below). While not questioning any specific individual, some genealogists have cast doubts on the high frequency of these cases. Elizabeth Warren’s experience may be a common one.

Self-reported race/ethnicity                                       # of PGP participants
White                                                              1285
American Indian / Alaska Native, White                             40
Asian                                                              38
Hispanic or Latino, White                                          24
Hispanic or Latino                                                 21
Black or African American                                          12
Asian, White                                                       11
Black or African American, White                                   7
American Indian / Alaska Native, Black or African American, White  4
Hispanic or Latino, Black or African American, White               3
American Indian / Alaska Native, Hispanic or Latino, White         3

 
