promonk t1_ish7g8b wrote

Kandiru t1_ishbuwl wrote

It's no one person's. It's a mishmash of several different high-quality genomes, and over time it has been changed so that, for some genes, the reference carries the more common variant rather than a rare one.

promonk t1_ishdkr1 wrote

When you say "more common variants," common in what way?

I'm fascinated by the idea of a "reference human."

Kandiru t1_ishg2ww wrote

Say a certain position is an A in 90% of people but a C in 10%. The A variant is more common than the C.

So where the reference previously had a C, later versions have often changed it to the most frequent base.
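
The per-position update described above can be sketched as picking the major (most frequent) base from a set of sampled genomes. This is a minimal illustration assuming a plain list of observed bases at one position; `major_allele` and the sample data are hypothetical, and real reference updates involve far more than a per-position frequency vote.

```python
from collections import Counter


def major_allele(bases: list[str]) -> tuple[str, float]:
    """Return the most frequent base at one position and its frequency (hypothetical helper)."""
    counts = Counter(b.upper() for b in bases)
    base, n = counts.most_common(1)[0]
    return base, n / len(bases)


# Hypothetical position: 9 of 10 sampled genomes carry A, 1 carries C.
observed = ["A"] * 9 + ["C"]
old_reference = "C"  # the old reference happened to contain the rarer base

base, freq = major_allele(observed)
if base != old_reference:
    print(f"swap reference {old_reference} -> {base} (frequency {freq:.0%})")
```

Note that `Counter.most_common` breaks ties by first appearance, so a real pipeline would need an explicit rule for positions where two bases are equally common.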

promonk t1_ishu3eb wrote

I get that. What I'm curious about is sampling. 90% of which population? Is it 90% of some college-age kids being paid a hundred bucks for a cheek swab? Or is it drawn from a broad swathe of demographics and locations?

tsunamisurfer t1_ishsh54 wrote

Originally, though, the reference genome was that of the first sequenced human genome, which I believe belonged to J. Craig Venter.

Kandiru t1_isim3ny wrote

Actually, there were two competing approaches at the beginning. Venter did sequence his own DNA with shotgun sequencing, while the high-fidelity approach, Sanger sequencing of BAC clones, was done on DNA from a range of different individuals, with the clones together spanning the genome.

So the first version of the reference was a mixture of them all.

Angdrambor t1_isjtnw2 wrote

What makes a genome "High quality"?

danby t1_islg7bo wrote

Though I only spent a handful of years in genome sequencing, I suspect what is meant here is that the reference was based on several genomes from which high-quality genomic libraries could be prepared.

Angdrambor t1_ismpsxm wrote

What makes a genomic library high or low quality? Few errors? Faithful representation of the original?

Splatulance t1_isia4bj wrote

Typically the question of variation comes down to an aggregate statistic. The most common is the "maximum likelihood estimate", which for a roughly normal (bell-curve) distribution is the sample mean.

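It's called maximum likelihood because it's the parameter value under which the observed data are most likely. For a bell curve, that value is the mean, which is also where the data cluster most densely.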
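
To make the maximum-likelihood idea concrete, here is a brute-force check that the mean maximizing a normal log-likelihood is the sample mean. It assumes a normal model with the spread fixed at sigma = 1 (an assumption for simplicity); the data and the name `normal_log_likelihood` are illustrative only. In practice the maximum is found analytically rather than by grid search.

```python
import math


def normal_log_likelihood(mu: float, data: list[float], sigma: float = 1.0) -> float:
    """Log-likelihood of the data under a normal distribution with mean mu and fixed sigma."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
        for x in data
    )


data = [2.1, 1.9, 2.4, 2.0, 1.6]  # hypothetical measurements
sample_mean = sum(data) / len(data)

# Search candidate means from 0.000 to 4.000 in steps of 0.001
# and keep the one that makes the data most likely.
candidates = [i / 1000 for i in range(0, 4001)]
mle = max(candidates, key=lambda mu: normal_log_likelihood(mu, data))

print(f"sample mean = {sample_mean:.3f}, MLE of mu = {mle:.3f}")
```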

The more samples you have (more genomes, in this case), the better you can estimate the actual average. With enough samples, your estimate is overwhelmingly likely to be very close to the true population mean.
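
A quick simulation shows the estimate tightening as the sample grows. It assumes a hypothetical allele carried by 90% of the population (echoing the A/C example earlier in the thread) and genomes drawn independently at random; `estimate_frequency` is an illustrative name. The shrinking standard error, sqrt(p(1 - p)/n), only holds under that random-sampling assumption: a biased sample converges just as confidently, but to the value for whatever population it was actually drawn from.

```python
import random


def estimate_frequency(true_p: float, n: int, rng: random.Random) -> float:
    """Estimate an allele frequency from n simulated, independently sampled genomes."""
    carriers = sum(rng.random() < true_p for _ in range(n))
    return carriers / n


rng = random.Random(42)  # fixed seed so the run is reproducible
true_p = 0.9  # hypothetical: 90% of the population carries the A allele

for n in (10, 100, 1_000, 10_000, 100_000):
    est = estimate_frequency(true_p, n, rng)
    # Standard error of a sample proportion: sqrt(p * (1 - p) / n).
    se = (true_p * (1 - true_p) / n) ** 0.5
    print(f"n={n:>7}: estimate={est:.4f}  error={abs(est - true_p):.4f}  std. error={se:.4f}")
```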

If the vast majority of people are 99% identical in whatever you're measuring, the distribution is grouped very tightly around the mean, with very low variance. It's practically a vertical line instead of a curve.
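
One way to put numbers on "tightly grouped" is to treat a single position as a yes/no trait (a person carries the common base or not) shared by a fraction p of people. Such a 0/1 variable has variance p(1 - p), which peaks at 0.25 when p = 0.5 and collapses as p approaches 1. This framing is an illustrative assumption rather than something stated in the comment, and the frequencies below are hypothetical.

```python
import math


def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 trait carried by a fraction p of the population."""
    return p * (1 - p)


# Hypothetical frequencies; p = 0.5 gives the largest possible variance (0.25).
for p in (0.5, 0.9, 0.99, 0.999):
    var = bernoulli_variance(p)
    print(f"p={p}: variance={var:.4f}, std. dev.={math.sqrt(var):.4f}")
```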
