Trait survey data download and analysis
A couple of months ago I posted about our new trait surveys. Thank you to all the participants who have completed them so far! I’m following up now with links to the data, a bit of Python code for interpreting them, and a little analysis.
To help anyone interested in parsing the data, I’ve shared the Python code I’ve been using on GitHub. The repository also includes a copy of the survey data as of Feb 23, some demo code, and a README.
Finally, I parsed the trait survey data and combined it with two features from the participant survey data (age and sex) to see if I could find anything interesting to share. The top 20 pairwise associations below aren’t terribly surprising, but I did learn some new things. For example, I didn’t know that TMJ disorder is much more common in women. (Of course, a quick web search confirms this is a well-known association.) This type of analysis isn’t my forte; maybe someone with more machine learning experience can do cool stuff with this data!
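For anyone who wants to try something similar, here is a minimal sketch of a pairwise scan that produces the same kind of rows as the table below. It is not the code from the GitHub repository. It assumes a hypothetical in-memory layout (a dict mapping each participant ID to the set of traits they reported, with sex and age bracket folded in as pseudo-traits such as "Female" or "60+ years"), and it implements the two-sided Fisher’s exact test with the standard library, using the same "sum every table no more likely than the observed one" definition that `scipy.stats.fisher_exact` uses.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row and
    column totals that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small relative tolerance so floating-point ties with p_obs still count.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def pairwise_scan(traits_by_participant: dict[str, set[str]]) -> list[tuple]:
    """Test every unordered trait pair; return rows sorted by p-value.

    Each row is (trait1, trait2, p_value, pct_trait1_with_trait2,
    pct_others_with_trait2), mirroring the columns of the table.
    """
    people = list(traits_by_participant.values())
    all_traits = sorted(set().union(*people))
    rows = []
    for t1, t2 in combinations(all_traits, 2):
        a = sum(1 for s in people if t1 in s and t2 in s)      # both traits
        b = sum(1 for s in people if t1 in s and t2 not in s)  # t1 only
        c = sum(1 for s in people if t1 not in s and t2 in s)  # t2 only
        d = len(people) - a - b - c                            # neither
        if c + d == 0:
            continue  # everyone has t1, so there is no "others" group
        p = fisher_exact_two_sided(a, b, c, d)
        rows.append((t1, t2, p, 100 * a / (a + b), 100 * c / (c + d)))
    rows.sort(key=lambda row: row[2])
    return rows


# Hypothetical toy data, not real survey responses.
toy = {
    "p1": {"Female", "UTI"},
    "p2": {"Female", "UTI"},
    "p3": {"Female"},
    "p4": {"UTI"},
    "p5": set(),
    "p6": set(),
}
for row in pairwise_scan(toy):
    print(row)
print(fisher_exact_two_sided(3, 1, 1, 3))  # ≈ 0.4857 (= 34/70)
```

Each unordered pair is reported once, with percentages computed in alphabetical (Trait 1, Trait 2) order; the p-value is the same in either direction, but the percentages are not, and how the post’s table chose a direction for each row isn’t stated.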
(To read the table below: the first row indicates “60% of females reported a UTI, while only 12% of others reported one”.)
| Trait 1 | Trait 2 | p-value¹ | % of Trait 1 with Trait 2 | % of others with Trait 2 |
|---|---|---|---|---|
| Female | Urinary tract infection (UTI) | 3.1e-23 | 60.3% | 12.0% |
| High cholesterol (hypercholesterolemia) | High triglycerides | | | |
| 60+ years | Age-related cataract | 9.1e-13 | 36.8% | 2.8% |
| Female | Iron deficiency anemia | 2.4e-11 | 28.5% | 4.0% |
| High cholesterol (hypercholesterolemia) | Hypertension | 1.6e-09 | 42.9% | 11.6% |
| Iron deficiency anemia | Urinary tract infection (UTI) | 1.9e-09 | 69.2% | 25.3% |
| 60+ years | Age-related hearing loss | 6.5e-09 | 31.6% | 4.1% |
| Urinary tract infection (UTI) | Ovarian cysts | 1.7e-08 | 22.0% | 3.1% |
| Polycystic ovary syndrome (PCOS) | Ovarian cysts | 2.1e-08 | 66.7% | 6.6% |
| Temporomandibular joint (TMJ) disorder | Fibrocystic breast disease | 5.3e-08 | 28.6% | 2.4% |
| Nasal polyps | Chronic sinusitis | 7.0e-08 | 61.1% | 7.8% |
| Female | Temporomandibular joint (TMJ) disorder | 1.4e-07 | 21.9% | 4.0% |
| Female | Fibrocystic breast disease | 1.9e-07 | 12.6% | 0.4%² |
| Male | Hair loss (includes female and male pattern baldness) | | | |
| Carpal tunnel syndrome | Temporomandibular joint (TMJ) disorder | 4.4e-07 | 52.2% | 8.5% |
| Urinary tract infection (UTI) | Fibrocystic breast disease | 5.4e-07 | 14.4% | 1.2% |
¹ Calculated using Fisher’s exact test. Note that these p-values are not corrected for multiple hypothesis testing. I think a pessimistic Bonferroni correction (dividing 0.05 by the number of tests performed) would demand around 1e-6 to reach the magic ‘p = 0.05’ cutoff.
² I didn’t look closely, but I suspect these non-zero numbers arise because we have some transgender participants whose sex at birth differs from the gender they identify with (and the participant survey records the latter).
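The Bonferroni estimate in footnote 1 comes from dividing the 0.05 cutoff by the number of tests. Here is a minimal sketch, assuming every unordered pair of traits (including the age and sex pseudo-traits) is tested exactly once. The post doesn’t say how many traits were included, so the 300 below is a placeholder, not the real count.

```python
def bonferroni_threshold(n_traits: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff when testing all unordered pairs of n_traits."""
    n_tests = n_traits * (n_traits - 1) // 2  # number of unordered pairs
    return alpha / n_tests


# Placeholder trait count: 300 traits give 44,850 pairs.
print(bonferroni_threshold(300))  # roughly 1.1e-06
```

Working backward, a cutoff of 1e-6 corresponds to 0.05 / 1e-6 = 50,000 tests, which is roughly the number of pairs among about 316 items. The correction is conservative because it ignores any dependence between the tests, which is why it reads as "pessimistic" here.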