apfejes

apfejes t1_j7qablb wrote

> In my mind, biology isn’t even a biology problem as much as it is a particle physics problem.

Emergence is a thing, but 3.7 billion years of emergent property evolution has created levels of complexity that are far, FAR beyond the level of the simple software tools that can mimic the surface-level complexity you see in "computer life" simulations.

The computer complexity you're talking about, with wonky solutions and poorly documented code, is on average about 40 years old.

The biological equivalent would be to continue building the same way for about 100,000,000x longer.

I don't dispute the analogy, but it's a bit of Dunning-Kruger, again. The level of complexity isn't going to be obvious to you until you start trying to solve the problems. 3.7 billion years of wonky solutions layered on top of each other is a lot different from 40 years.

2

apfejes t1_j7q4thu wrote

Actually, I don't have a recommendation, unfortunately. There are many different fields in biology, and learning each one can be a few years of work, plus the common foundations - so the question isn't how do you learn but "How much do you need to know to do a specific job?"

Unfortunately, biology is the opposite of programming. Programming is a logical set of tools that build on each other. If you learn arrays, or dictionaries or data structures, you can go out and apply them logically. You can figure out which one will have the best performance in a given situation, and optimization is a logical extension of what you know. You can spend a lifetime learning, but the basics don't change.

In biology, EVERYTHING is an exception to something else. Learn the entire "biochemical pathway" chart, and then you'll discover that some animals do things differently, or short circuit pieces of it, or just get a specific chemical from their diet and don't need to do a certain part of it. It's all chaos. Biology is the mad hatter's perspective and there's no real guarantee that something is going to work the way you think it should, or the way you were taught. E.g., translation of RNA to protein always begins with a methionine (AUG codon)... except that sometimes it doesn't. Sometimes organisms have found a way to get things started with a missing base, or sometimes things are just wobbly... or maybe sometimes it's just not at all what you think it's going to be.

That's the rambly way of saying that you'll never know what you need to know until it's too late and you discover something was wrong. For my Master's thesis, I worked on a really slow-growing bacterium, and spent months trying to convince it to do something (take up a plasmid so I could knock out a gene). I worked on that system for about a year, and never got it to work. A couple of years later, working on a different project, I discovered that the post-doc who set up the system had missed a critical detail: the half-life of one of the antibiotics, around which the entire system had been built, was shorter than the incubation time of the bacteria we were growing. The system could never have worked on that organism, and no amount of work would ever have changed it. I wasted months on that, and never once thought to validate the actual system that had been used by the guy for a year before I started. Who knows what to make of the data he'd recorded... is it all garbage? I really don't know.

How deep would I have had to study to know to look at the half-life of kanamycin? I haven't a clue. In biology, it's not what you know that gets you - it's what you don't know.
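
To make the half-life point concrete, here is a minimal sketch of the sanity check that would have caught it. It assumes simple first-order (exponential) decay of the antibiotic in the growth medium, and every number in the usage note is invented for illustration: none of them are measured values for kanamycin or for any real organism. The function and parameter names are hypothetical.

```python
def fraction_remaining(elapsed_hours: float, half_life_hours: float) -> float:
    """Fraction of the starting concentration left, assuming first-order decay."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    # Each half-life that passes cuts the concentration in half.
    return 0.5 ** (elapsed_hours / half_life_hours)


def selection_holds(
    initial_conc: float,
    minimum_effective_conc: float,
    incubation_hours: float,
    half_life_hours: float,
) -> bool:
    """True if the antibiotic is still at a selective level when incubation ends.

    Concentrations can be in any unit, as long as both use the same one.
    """
    remaining = initial_conc * fraction_remaining(incubation_hours, half_life_hours)
    return remaining >= minimum_effective_conc
```

With made-up values, `selection_holds(50, 10, 24, 24)` is `True` (one half-life leaves 25, still above 10), while `selection_holds(50, 10, 240, 24)` is `False` (ten half-lives leave less than 0.05). The second case is the shape of the problem: a slow grower outlasts the selection pressure, so the plate stops telling you anything.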

2

apfejes t1_j7pw9ls wrote

Feel free to join the crowd of people who are trying to do that.

I've spent the last year talking with people in this space, and all of the big pharmaceutical companies are now saying they won't work with AI-based companies because their algorithms don't work on complex biology data. Too many people have made the claim that they could use machine learning to mine patterns out of biology data sets and failed.

It's not a knock on ML or AI. How would your algorithm know that the data it's working on is unreliable, that biology data often has 50% false positive rates on yeast two-hybrid screens, or that a given SNP may be a miscall that has propagated through 10 generations of reference genomes? Or that the assay that generated the data you're looking at used a promiscuous antibody that's triggered on a related protein that happens to be expressed in the lab culture you're working on? If the data you're working on isn't clean, how are you planning on getting a clean signal out?

Rubik's cubes are child's play compared to the networks that Recursion is working on.

4

apfejes t1_j7pr60b wrote

Well, it is not my job to stop you from trying. I just wanted to explain the issue, to save you some pain and surprise.

Please prove me wrong - that’s how it’s done in this field. Let me know when you have solved the problems that have stumped us for the last 60-odd years without a deep understanding of what those problems are.

Edit: can you design a Rubik’s cube solver without understanding how to solve a Rubik’s cube?

2

apfejes t1_j7pm3bk wrote

It’s not being “stuck up” - it’s just that the field doesn’t really lack for your skill set. There are plenty of people who know what you know who also know the biology side.

I’m not trying to gate keep, or tell you that your skills are useless. I am just trying to tell you the harsh reality that the skill sets that make you so valuable in your own field aren’t sufficient for you to solve difficult problems in my field.

Do you know the Dunning-Kruger curve? It basically translates to people having way more confidence in their own skills than warranted when venturing into areas they know little about, and usually facing a shocking “wake up call” when they start learning the complexity of the problems.

I’m not saying you can’t contribute, but I am saying that the low-hanging “let’s apply AI to all these problems!” days are already here for biology. What’s needed isn’t more programmers, it’s programmers who understand the complexity of the data they’re working on.

I’ve watched two decades of programmers volunteer to solve biology’s greatest problems, and all of them fall short. That shouldn’t stop you from trying, but keep in mind that your skills are necessary - but not sufficient - to accomplish your goals.

4

apfejes t1_j7oofx7 wrote

Let me take a crack at this. It’s not my AMA, but it’s a question that comes up periodically in bioinformatics - the cross disciplinary field that deals with data science/programming and biology.

Most importantly, the field already exists, and the low-hanging fruit was mostly picked 30 years ago, when it was reasonably possible for a programmer to work on a problem that hadn’t been tackled yet, and automate something that the biologists hadn’t gotten around to.

Alas, those days are gone. Bioinformaticians are usually very competent programmers, and rarely can make use of people from computer science without training them in biology first. Biology, after all, is the field in which nature has evolved solutions to problems, and exceptions are more common than the rules they break.

Thus, time may be short in this field, but insight is truly the valuable commodity. Understanding how to interpret the biological data is far far more important than automation. While we do see machine learning helping somewhat, pattern finding and the patterns themselves are useless without someone to interpret them and decide if they’re real. Or worth following up on. Usually they aren’t. Biology data is inherently very noisy.

So, all of that is the long way of saying that longevity or curing cancer isn’t going to be a question of automating our way to a solution. If you want to understand the complexity of the problem, you would need to understand more about the problem itself. There are no simple solutions here, and time is only part of the missing piece needed to make real progress.

5