IHaque_Recursion t1_j7mn89n wrote

So, data sharing in industrial science is complicated. I’ve spent my career in biotech driving for greater openness and data release in the companies where I’ve been. The “natural” state of data is to be siloed. This isn’t just an industrial thing – I’ve read plenty of papers from academic groups with “data available on request” (lol nope, I tried) – and the driver is always the same: a fear that “we spent this money to make the data, how do we get value out of it?”

One of the reasons I joined Recursion in 2019 was that Chris and the team shared that commitment to sharing learnings back to the world. The balance we’ve struck, between supporting open science and using this data to drive internal research and develop therapeutics as a public company, is to share a huge dataset that is partially blinded. In RxRx3 we are revealing ~700 genes and 1600 compounds. We’ve sometimes chosen different points on that balance; for example, our COVID datasets RxRx19a and RxRx19b were released completely openly (CC-BY) because we thought the public health crisis was more important than any commercial interest we might have in the data. Our current aim is to continue unblinding parts of the RxRx3 dataset over time, so please stay tuned for additional releases.

We have also contributed to open science by releasing not just datasets, but tools. Alongside our COVID datasets, we released a data explorer that lets folks explore the results from our COVID screens. Along with RxRx3, we released a tool (MolRec) where people outside of Recursion can explore some of the same insights that our scientists use to generate novel therapeutic hypotheses and advance new discovery programs, and get a look at how Recursion is turning drug discovery from a trial-and-error process into a search problem.