promonk t1_ish7g8b wrote
Reply to comment by adc34 in When it's said 99.9% of human DNA is the same in all humans, is this referring to only coding DNA or both coding and non-coding DNA combined? by PeanutSalsa
Now I'm curious: whose genome is the human reference genome?
Kandiru t1_ishbuwl wrote
It's no one person's. It's a mishmash of several different high quality genomes, and then over time it's been changed to have the more common variants as the reference rather than the reference being a rare mutation for some genes.
promonk t1_ishdkr1 wrote
When you say "more common variants," common in what way?
I'm fascinated by the idea of a "reference human."
Kandiru t1_ishg2ww wrote
Say a certain position is an A for 90% of people, but a C for 10%. The A variant is more common than the C.
So if the reference previously had a C there, a later version has often changed it to the most frequent base, A in this case.
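A minimal sketch of that idea in Python, assuming you already have one observed base per sampled person at a single position. The function names and the 90/10 sample are hypothetical and for illustration only; this is not how any reference consortium's actual pipeline works.

```python
from collections import Counter

def major_allele(bases):
    """Return (base, frequency) for the most common base at one position.

    Ties are broken by whichever base was seen first (Counter ordering).
    """
    counts = Counter(bases)
    base, n = counts.most_common(1)[0]
    return base, n / len(bases)

def update_reference(ref_base, observed_bases):
    """Return the base a 'major allele' reference would carry at this position.

    ref_base is the base in the older reference; it is replaced only if the
    most frequent observed base differs from it.
    """
    base, _ = major_allele(observed_bases)
    return base if base != ref_base else ref_base

# Hypothetical sample: 90 people carry A, 10 carry C.
observed = ["A"] * 90 + ["C"] * 10
print(major_allele(observed))           # most common base and its frequency
print(update_reference("C", observed))  # old reference had the minor allele C
```

Which base counts as "most frequent" depends entirely on who is in `observed`, which is why the sampling question matters.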
promonk t1_ishu3eb wrote
I get that. What I'm curious about is sampling. 90% of which population? Is it 90 of some college-age kids being paid a hundred bucks for a cheek swab? Or is it drawn from a broad swathe of demographics and locations?
emfts t1_isiaa4s wrote
The first human reference genome (from the Human Genome Project) was assembled from the DNA of a group of random volunteers from all over, not from one person.
You can read all about it here:
https://www.genome.gov/12513430/2004-release-ihgsc-describes-finished-human-sequence
Kandiru t1_isimfb1 wrote
The 1000 Genomes Project sampled populations from around the world.
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/README_populations.md
That file has a list of the populations used.
tsunamisurfer t1_ishsh54 wrote
Originally, though, the reference genome was that of the first sequenced human genome, which I believe belonged to J. Craig Venter.
Kandiru t1_isim3ny wrote
Actually, there were two competing approaches at the beginning. Venter sequenced his own genome with shotgun sequencing, while the high-fidelity approach, Sanger sequencing of BAC clones, was done on a range of different individuals whose clones together spanned the genome.
So the first version of the reference was a mixture of them all.
Angdrambor t1_isjtnw2 wrote
What makes a genome "High quality"?
danby t1_islg7bo wrote
Though I only spent a handful of years in genome sequencing, I suspect what is meant here is that the sequence was based on several genomes for which they were able to prepare high-quality genomic libraries.
Angdrambor t1_ismpsxm wrote
What makes a genomic library high or low quality? Few errors? Faithful representation of the original?
Splatulance t1_isia4bj wrote
Typically the question of variance comes down to an aggregate statistic. The most common is the "maximum likelihood estimate", which for a normal enough distribution (bell curve) is the sample mean.
It's called maximum likelihood because it is the parameter value under which the data you actually observed are most likely. For a bell curve, where most of x is close to the mean, that value is the sample mean.
The more samples you have, the more genomes in this case, the better you can estimate the actual average. With enough samples the actual population mean is overwhelmingly likely to be very close to your estimate.
If the vast majority of people have 99% identical whatever, that's a very tightly grouped distribution around the mean with very low variance. It's practically a vertical line instead of a curve.
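To make the "more samples, better estimate" point concrete, here is a rough Python sketch. It assumes a single position where the true frequency of the common base is 0.9 (a made-up number) and simulates sampling people at random. For a yes/no trait like this, the maximum likelihood estimate of the frequency is just the observed proportion, and its standard error shrinks like 1/sqrt(n). The function names are hypothetical.

```python
import math
import random

def estimate_frequency(true_freq, n_samples, rng):
    """Simulate n_samples people and return the observed proportion carrying
    the common base. This proportion is the maximum likelihood estimate."""
    hits = sum(1 for _ in range(n_samples) if rng.random() < true_freq)
    return hits / n_samples

def standard_error(freq, n_samples):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(freq * (1 - freq) / n_samples)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 1_000, 100_000):
    est = estimate_frequency(0.9, n, rng)
    # Print sample size, estimate, and the theoretical standard error.
    print(n, round(est, 4), round(standard_error(0.9, n), 4))
```

The estimate wanders noticeably at n = 10 and settles close to 0.9 as n grows. The simulation only draws from one idealized population, so it says nothing about whether the sampled people are representative.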