
Trait survey data download and analysis

February 25, 2013

A couple of months ago I posted about our new trait surveys. Thank you to all the participants who have completed them so far! I’m following up now with links to the data, a bit of Python code for interpreting them, and a little analysis.

The website has been updated to make the CSV files containing results from the Google surveys publicly available. Here are links to the PGP participant survey and the twelve trait surveys:

Cancer
Respiratory system
Endocrine/Metabolic/Nutritional/Immunity
Digestive system
Blood
Genitourinary systems
Nervous system
Skin and subcutaneous tissue
Vision and hearing
Musculoskeletal and connective tissue
Circulatory system
Congenital Traits and Anomalies

To help anyone interested in parsing the data, I’ve shared the Python code I used on GitHub. The repository also includes a copy of the survey data as of Feb 23, some demo code, and a README.
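As a rough illustration of what parsing these files involves, here is a minimal sketch that turns a survey CSV into a mapping from participant ID to the set of conditions they reported. The post doesn’t describe the column layout of the Google survey exports, so the layout assumed here (one row per response, an ID column, and answer columns holding comma-separated lists of checked conditions) and the column names `Participant` and `Conditions` are hypothetical; the function `load_trait_sets` is likewise made up for illustration and is not the code in the GitHub repository, whose README describes the real format.

```python
import csv
import io


def load_trait_sets(csv_file, id_column, answer_columns):
    """Map each participant ID to the set of conditions they reported.

    Hypothetical layout: one row per survey response, `id_column` holds the
    participant ID, and each column in `answer_columns` holds a
    comma-separated list of checked conditions. Condition names that
    themselves contain commas would need smarter splitting.
    """
    trait_sets = {}
    for row in csv.DictReader(csv_file):
        pid = (row.get(id_column) or "").strip()
        if not pid:
            continue  # skip rows without a participant ID
        # setdefault keeps participants who reported nothing, so they still
        # count in later denominators; repeat submissions are merged.
        reported = trait_sets.setdefault(pid, set())
        for column in answer_columns:
            for item in (row.get(column) or "").split(","):
                if item.strip():
                    reported.add(item.strip())
    return trait_sets


# Made-up sample in the assumed layout (not real survey data).
sample = io.StringIO(
    "Participant,Conditions\n"
    'p1,"Myopia (Nearsightedness), Astigmatism"\n'
    "p2,Nasal polyps\n"
    "p3,\n"
)
print(load_trait_sets(sample, "Participant", ["Conditions"]))
```

Keeping participants with an empty set matters for the analysis below: people who reported nothing still belong in the “others” group.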

Finally, I parsed the trait survey data and combined it with some features from the participant survey (age and sex) to see whether anything interesting turned up. The top 20 pairwise correlations below don’t look terribly surprising, but I learned some new things. For example, I didn’t know that TMJ disorder is much more common in women. (Of course, a quick web search confirms that this is a well-known association.) This type of analysis isn’t my forte; maybe someone with more machine learning experience can do cool things with this data!
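One way to set up this kind of pairwise comparison, sketched under assumptions: represent each participant as a set of reported traits, fold sex and an age bracket (such as “Female” or “60+ years”) in as extra pseudo-traits, and count the four cells of a 2×2 table for every pair. The percentages then match the last two columns of the table: the share of people with trait 1 who also report trait 2, versus the share of everyone else who reports trait 2. The function names are hypothetical and this is not the code used to produce the table.

```python
from itertools import combinations


def candidate_pairs(trait_sets):
    """Every unordered pair of traits seen in the data (one test per pair)."""
    all_traits = sorted(set().union(*trait_sets.values()))
    return list(combinations(all_traits, 2))


def pair_table(trait_sets, trait1, trait2):
    """Count the 2x2 table for trait1 vs. trait2 across all participants.

    Returns (a, b, c, d): a = both traits, b = trait1 only,
    c = trait2 only, d = neither.
    """
    a = b = c = d = 0
    for traits in trait_sets.values():
        has1, has2 = trait1 in traits, trait2 in traits
        if has1 and has2:
            a += 1
        elif has1:
            b += 1
        elif has2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def percentages(a, b, c, d):
    """'% of trait 1 with trait 2' and '% of others with trait 2'."""
    pct_with = 100.0 * a / (a + b) if a + b else float("nan")
    pct_others = 100.0 * c / (c + d) if c + d else float("nan")
    return pct_with, pct_others


# Made-up example data (not real PGP responses); "Female" is a pseudo-trait.
demo = {
    "p1": {"Female", "Urinary tract infection (UTI)"},
    "p2": {"Female"},
    "p3": {"Urinary tract infection (UTI)"},
    "p4": set(),
}
for t1, t2 in candidate_pairs(demo):
    counts = pair_table(demo, t1, t2)
    print(t1, "|", t2, counts, percentages(*counts))
    # prints: Female | Urinary tract infection (UTI) (1, 1, 1, 1) (50.0, 50.0)
```

Because every pair of traits gets its own test, the number of pairs returned by `candidate_pairs` is also the number of hypotheses a multiple-testing correction has to account for.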

(To read the table below: the first row indicates “60% of females reported a UTI, while only 12% of others reported one”.)

Trait 1 | Trait 2 | p-value¹ | % of Trait 1 with Trait 2 | % of others with Trait 2
--- | --- | --- | --- | ---
Female | Urinary tract infection (UTI) | 3.1e-23 | 60.3% | 12.0%
High cholesterol (hypercholesterolemia) | High triglycerides (hypertriglyceridemia) | 2.6e-15 | 36.9% | 3.1%
Female | Ovarian cysts | 1.9e-13 | 21.9% | 0.4%²
60+ years | Age-related cataract | 9.1e-13 | 36.8% | 2.8%
Female | Iron deficiency anemia | 2.4e-11 | 28.5% | 4.0%
Myopia (Nearsightedness) | Astigmatism | 5.7e-11 | 59.2% | 25.5%
High cholesterol (hypercholesterolemia) | Hypertension | 1.6e-09 | 42.9% | 11.6%
Iron deficiency anemia | Urinary tract infection (UTI) | 1.9e-09 | 69.2% | 25.3%
60+ years | Age-related hearing loss | 6.5e-09 | 31.6% | 4.1%
Urinary tract infection (UTI) | Ovarian cysts | 1.7e-08 | 22.0% | 3.1%
Polycystic ovary syndrome (PCOS) | Ovarian cysts | 2.1e-08 | 66.7% | 6.6%
Hypothyroidism | Hashimoto’s thyroiditis | 2.7e-08 | 23.1% | 0.6%
Temporomandibular joint (TMJ) disorder | Fibrocystic breast disease | 5.3e-08 | 28.6% | 2.4%
Nasal polyps | Chronic sinusitis | 7.0e-08 | 61.1% | 7.8%
Osteoarthritis | Bone spurs | 1.1e-07 | 29.8% | 3.6%
Female | Temporomandibular joint (TMJ) disorder | 1.4e-07 | 21.9% | 4.0%
Female | Fibrocystic breast disease | 1.9e-07 | 12.6% | 0.4%²
Male | Hair loss (includes female and male pattern baldness) | 3.0e-07 | 29.5% | 8.3%
Carpal tunnel syndrome | Temporomandibular joint (TMJ) disorder | 4.4e-07 | 52.2% | 8.5%
Urinary tract infection (UTI) | Fibrocystic breast disease | 5.4e-07 | 14.4% | 1.2%


¹ Calculated using Fisher’s exact test. These p-values are not corrected for multiple hypothesis testing; I think a pessimistic Bonferroni correction would demand a p-value of around 1e-6 to meet the magic “p = 0.05” cutoff.
² I didn’t look closely, but I suspect these non-zero numbers arise because some of our transgender participants have a sex at birth that differs from the gender they identify with, and the latter is what we recorded on the participant survey.
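The statistics behind the first footnote can be sketched in plain Python. This is a minimal two-sided Fisher’s exact test: it sums the hypergeometric probabilities of every 2×2 table with the same margins that is no more likely than the observed one, comparing integer numerators so ties are handled exactly. It also shows the Bonferroni threshold α/m for m tests. This is an illustration, not the code used for the table above; `scipy.stats.fisher_exact` is the usual library route. A threshold of about 1e-6 corresponds to roughly 50,000 tests, since 0.05 / 50,000 = 1e-6; for instance, about 300 traits would give 300 × 299 / 2 = 44,850 pairs. Those counts are illustrative only, because the post doesn’t say how many pairs were tested.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Numerator of P(top-left cell == x) under fixed margins; every table
        # shares the denominator comb(n, col1).
        return comb(row1, x) * comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Fisher's classic "lady tasting tea" table: p is about 0.4857 (= 34/70).
print(fisher_exact_two_sided(3, 1, 1, 3))
# Hypothetical count of 50,000 tests: cutoff of about 1e-6.
print(bonferroni_threshold(0.05, 50_000))
```

The counts (a, b, c, d) are the four cells of a trait-pair table: both traits, trait 1 only, trait 2 only, and neither.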
