Comments
physics_defector t1_j6od47a wrote
Great work on this answer, and even beyond its quality I'm always pleased to see people acknowledging the difference between AI and ML. The former gets bandied about by so many folks in medicine and bioscience because they don't recognize that it's some collection of ML, logic processing, or other systems combined with an agent. I understand it's just a knowledge gap, but I think it's analogous to someone from outside biomed fields conflating basic bioscience and translational bioscience.
In practice I've even seen it hamper collaborations with true AI folks in other fields who might be great partners if they didn't have to be responsible for bridging the fields' "language" gap. Just a specific example of that more general issue on the biomed side, though, and something the education side of my work is focused on addressing.
asap_einstein t1_j67xjc8 wrote
There are many answers. Overall, general concepts like neural networks are tailored and packaged to specific data situations and purposes. Perhaps one good example is drug response prediction, where such tools are trained on genomic features of model systems such as cancer cell lines to predict patient-specific cancer drug response (very broadly speaking here).
varontron t1_j687izh wrote
Similarly, for biomarker discovery
Beginning_Cat_4972 t1_j691ep9 wrote
Similarly, for identifying genetic variations linked to diseases and how other genes may be related or interact with those genes.
RebornSage t1_j69hlhe wrote
Well, that's a question I can answer! I have been working for more than ten years in the field of ML and omics data.
First, definitions (aka nit picking):
One straightforward example is the detection of changes in the DNA (mutations) that cause a certain disease or lead to a phenotype of interest. Possible research questions are: Which mutations allow modern humans to digest milk as adults? Or: What is the ideal combination of gene variations to choose to breed a drought resistant corn variety?
A popular approach to find those mutations is called a genome-wide association study (GWAS). Basically, you collect a big table of every mutation you can detect in thousands of individuals. Additionally, you record the disease state (digest milk yes/no) or observed phenotype (corn yield under moderate drought).
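To make the "big table" concrete, here is a minimal sketch of what such a table might look like. Everything in it is made up for illustration: the individual and variant names (`ind_1`, `var_A`, ...), the genotype values, and the phenotype labels are hypothetical, and the 0/1/2 coding (number of copies of the alternative allele) is one common convention, assumed here rather than taken from the text above.

```python
# Hypothetical toy data, not real genotypes.
# Rows are individuals, columns are variant sites.
# Each genotype is coded as the number of copies of the
# alternative allele an individual carries: 0, 1, or 2.
variants = ["var_A", "var_B", "var_C"]
genotypes = {
    "ind_1": [0, 1, 2],
    "ind_2": [2, 0, 0],
    "ind_3": [1, 1, 0],
    "ind_4": [0, 2, 1],
}
# One recorded phenotype per individual (e.g. digests milk: 1 = yes, 0 = no).
phenotype = {"ind_1": 1, "ind_2": 0, "ind_3": 0, "ind_4": 1}


def format_table(variants, genotypes, phenotype):
    """Return the genotype/phenotype table as tab-separated text."""
    header = "\t".join(["individual"] + variants + ["phenotype"])
    rows = [
        "\t".join([ind] + [str(g) for g in calls] + [str(phenotype[ind])])
        for ind, calls in genotypes.items()
    ]
    return "\n".join([header] + rows)


table = format_table(variants, genotypes, phenotype)
print(table)
```

A real study would have thousands of rows and possibly millions of columns, but the shape is the same: one row per individual, one column per variant, plus the phenotype.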
One option is to run a statistical test separately for each mutation and see if there is any significant relation to the phenotype. This has been done in the past with success. However, this approach misses cases where two or more mutations are necessary to produce a phenotype. Say humans have two copies of a gene, and for an observable disease to occur both need to be damaged by a mutation (e.g. introduction of a stop codon). To catch such cases, we can use (simple) Machine Learning models. Again, there are many options. One is a constrained linear model called Ridge regression. We encode the mutations using numbers and train a model to predict the phenotype based on those numbers. During the training, the model finds patterns of mutations that are best suited to predict the phenotype as accurately as possible. There are many caveats I skip over here. Afterwards, we can inspect the model to extract these patterns and thereby find the responsible mutations (and the genes they affect). However, these may or may not be the causal mutations. They are "just" predictive, or correlating with the phenotype if you like. Nevertheless, these genes could be a starting point to develop a new drug or serve as biomarkers for a diagnostic test.
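The train-then-inspect idea can be sketched in a few lines. This is only a toy under stated assumptions, not how a real analysis is run: the data are simulated, the two "responsible" variants (columns 3 and 17) and their effect sizes are invented, the effects are purely additive (so it does not model the two-hit case mentioned above), and `fit_ridge` is a hypothetical helper that solves ridge regression in closed form with NumPy instead of using a dedicated library.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression on centered data.

    Solves (X^T X + alpha * I) w = X^T y and returns one weight per variant.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n_features = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)


# Simulated genotypes (assumption): 500 individuals, 50 variants, coded 0/1/2.
rng = np.random.default_rng(0)
n_individuals, n_variants = 500, 50
X = rng.integers(0, 3, size=(n_individuals, n_variants)).astype(float)

# Simulated phenotype (assumption): only variants 3 and 17 matter, plus noise.
y = 1.5 * X[:, 3] - 1.0 * X[:, 17] + rng.normal(0.0, 0.5, size=n_individuals)

# Train, then inspect: the largest absolute weights point at candidate variants.
weights = fit_ridge(X, y, alpha=1.0)
top_variants = np.argsort(-np.abs(weights))[:2]
print(sorted(top_variants.tolist()))
```

In this simulation the ranking recovers the two planted variants because they were built to be causal. On real data the top-weighted variants are only predictive, which is exactly the caveat above.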
That's very briefly ML in genomics.
Throwing in an example paper for good measure: Genome-wide association study and genomic selection for yield and related traits in soybean - this is basically the corn example from above.