RebornSage t1_j69hlhe wrote on January 28, 2023 at 6:40 PM

Well, that's a question I can answer! I have been working in the field of ML and omics data for more than ten years.

First, definitions (aka nitpicking):

Machine Learning is a part of Artificial Intelligence.
Genomics is the study of the DNA and all genes (the genome) of an organism. There are other "omics" such as Transcriptomics and Metabolomics.

One straightforward example is the detection of changes in the DNA (mutations) that cause a certain disease or lead to a phenotype of interest. Possible research questions are: Which mutations allow modern humans to digest milk as adults? Or: What is the ideal combination of gene variants for breeding a drought-resistant corn variety?

A popular approach to find those mutations is called a genome-wide association study (GWAS). Basically, you collect a big table of every mutation you can detect in thousands of individuals. Additionally, you record the disease state (digest milk yes/no) or the observed phenotype (corn yield under moderate drought):

Individual	Mutation 1	Mutation 2	...	Phenotype
A123	A	A	...	23
B456	G	A	...	42
...	...	...	...	...
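
Before any model can use a table like this, the letters have to become numbers. Here is a minimal sketch of one way to do that, assuming a known reference allele per mutation and a simple 0/1 "differs from reference" encoding. The `reference` list, the `encode` helper and the third individual are made up for illustration and are not from a real dataset.

```python
# Toy rows mirroring the table above: (individual, alleles per mutation, phenotype).
# "C789" is an invented extra row so the example has a bit more variety.
rows = [
    ("A123", ["A", "A"], 23.0),
    ("B456", ["G", "A"], 42.0),
    ("C789", ["G", "G"], 40.0),
]

# Hypothetical reference allele for each mutation column (an assumption).
reference = ["A", "A"]


def encode(alleles, reference):
    """Return 0 where the allele matches the reference, 1 where it differs."""
    return [0 if allele == ref else 1 for allele, ref in zip(alleles, reference)]


# X is the numeric mutation matrix, y the phenotype vector.
X = [encode(alleles, reference) for _, alleles, _ in rows]
y = [phenotype for _, _, phenotype in rows]

print(X)
print(y)
```

Real pipelines often count copies of the variant allele (0, 1 or 2) instead, but the idea is the same: one numeric column per mutation, one row per individual.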

One option is to run a separate statistical test for each mutation and see whether it has a significant association with the phenotype. This has been done in the past with success. However, this approach misses cases where two or more mutations are necessary to produce a phenotype. Say humans have two copies of a gene, and for an observable disease to occur both need to be damaged by a mutation (e.g. the introduction of a stop codon).

To catch such cases, we can use (simple) Machine Learning models. Again, there are many options. One is a regularized (constrained) linear model called ridge regression. We encode the mutations as numbers and train a model to predict the phenotype from those numbers. During training, the model finds patterns of mutations that predict the phenotype as accurately as possible. There are many caveats I am skipping over here.

Afterwards, we can inspect the model to extract these patterns and thereby find the responsible mutations (and the genes they affect). However, these may or may not be the causal mutations. They are "just" predictive, or correlated with the phenotype if you like. Nevertheless, these genes could be a starting point for developing a new drug or serve as biomarkers for a diagnostic test.
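
To make the two approaches a bit more concrete, here is a rough sketch on simulated data, assuming NumPy is available. It runs a one-mutation-at-a-time correlation scan and then fits a ridge regression over all mutations jointly, using the closed-form solution. Everything here is an assumption for illustration: the genotypes are random, the "true" effects on mutations 3 and 17 are invented, and the `penalty` value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_individuals, n_mutations = 200, 50

# Simulated genotypes: copies (0, 1 or 2) of the variant allele per mutation.
X = rng.binomial(2, 0.3, size=(n_individuals, n_mutations)).astype(float)

# Toy assumption: only mutations 3 and 17 influence the phenotype.
true_effects = np.zeros(n_mutations)
true_effects[[3, 17]] = [2.0, 1.5]
y = X @ true_effects + rng.normal(0.0, 1.0, size=n_individuals)

# Center features and phenotype so no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Single-marker scan: Pearson correlation of each mutation with the
# phenotype, computed separately per mutation.
single_marker_r = (Xc.T @ yc) / (
    np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
)

# Ridge regression over all mutations at once (closed form):
# coef = (X^T X + penalty * I)^-1 X^T y
penalty = 1.0
coef = np.linalg.solve(Xc.T @ Xc + penalty * np.eye(n_mutations), Xc.T @ yc)

# "Inspecting the model": rank mutations by absolute coefficient size.
top = np.argsort(-np.abs(coef))[:2]
print("strongest ridge coefficients at mutations:", sorted(top.tolist()))
```

In this toy setup the largest coefficients should land on the mutations that were simulated to matter. Note that the sketch keeps all effects additive; a case like "both copies must be damaged" would need a suitable encoding or an extra interaction feature, which is left out here.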

That's very briefly ML in genomics.

Throwing in an example paper for good measure: Genome-wide association study and genomic selection for yield and related traits in soybean. This is basically the corn example from above, just in a different crop.

physics_defector t1_j6od47a wrote on January 31, 2023 at 7:26 PM

Great work on this answer, and even beyond its quality I'm always pleased to see people acknowledging the difference between AI and ML. The former gets bandied about by so many folks in medicine and bioscience because they don't recognize that it's some collection of ML, logic processing, or other systems combined with an agent. I understand it's just a knowledge gap, but I think it's analogous to someone from outside biomed fields conflating basic bioscience and translational bioscience.

In practice I've even seen it hamper collaborations with true AI folks in other fields who might be great partners if they didn't have to be responsible for bridging the fields' "language" gap. Just a specific example of that more general issue on the biomed side, though, and something the education side of my work is focused on addressing.

asap_einstein t1_j67xjc8 wrote on January 28, 2023 at 10:44 AM

There are many answers. Overall, general concepts like neural networks are tailored and packaged for specific data situations and purposes. Perhaps one good example is drug response prediction, where such tools are trained on genomic features of model systems such as cancer cell lines to predict patient-specific cancer drug response (very broadly speaking here).

varontron t1_j687izh wrote on January 28, 2023 at 12:51 PM

Similarly, for biomarker discovery

Beginning_Cat_4972 t1_j691ep9 wrote on January 28, 2023 at 4:51 PM

Similarly, for identifying genetic variations linked to diseases and how other genes may be related to or interact with those genes.

[deleted] t1_j696vvo wrote on January 28, 2023 at 5:28 PM

[removed]

[deleted] t1_j6c93bn wrote on January 29, 2023 at 8:17 AM

[removed]

[deleted] t1_j6oblf0 wrote on January 31, 2023 at 7:16 PM

[removed]

[deleted] t1_j6pc2qc wrote on January 31, 2023 at 11:05 PM

[removed]

How are scientists using AI and machine learning to analyze large datasets in the field of genomics?
