Today we launched the Vesuvius Challenge, an open competition to read a set of charred papyrus scrolls that were buried by the eruption of Mount Vesuvius nearly 2,000 years ago. The scrolls can't be physically opened, but we have released 3D tomographic X-ray scans of two of them at 8µm resolution. The scans were made at a particle accelerator.

A team at the University of Kentucky led by Prof. Brent Seales has very recently demonstrated the ability to detect ink inside the CT scans using CNNs, so we believe that it is possible, for the first time in history, to read what's in these scrolls without opening them. There are hundreds of carbonized scrolls that we could read once the technique works – enough to more than double our total corpus of literature from antiquity.

Many of us are fans of /r/MachineLearning and we thought this group would be interested in hearing about it!

Comments

blablanonymous t1_jce5h65 wrote on March 16, 2023 at 4:24 AM

I bet you $249.99k it’s just a bunch of dad jokes

mostancient t1_jcfg2md wrote on March 16, 2023 at 1:32 PM

Or just plain accounting, like most recovered ancient written documents tend to be.

janpaul123 t1_jcg2kqk wrote on March 16, 2023 at 4:04 PM

Given that the villa was likely owned by a Roman consul and senator, that could make for some exciting accounting!

IntelArtiGen t1_jcdxtrb wrote on March 16, 2023 at 3:15 AM

The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.

4.7 TB (for two scrolls) seems like a lot, but I get that it's due to the resolution required to detect ink. I guess people can test their algorithms on the other datasets first and find a way to process these 4.7 TB if they need to. Perhaps the task would be more accessible if people could easily access 1/8 to 1/4 of one scroll (0.5 to 1 TB).

nat_friedman OP t1_jcdzndc wrote on March 16, 2023 at 3:31 AM

You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120MB and represents an 8µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a millimeter slice through the scroll. A centimeter is about 150GB. Still big, but more manageable.
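As a rough sanity check on those numbers, here is a minimal Python sketch. It only uses the approximate figures quoted above (about 120MB per file, 8µm per slice); the constant and function names are made up for illustration and are not part of the download scripts.

```python
# Back-of-the-envelope estimate using the approximate figures from the comment above.
SLICE_THICKNESS_UM = 8   # each file is one 8 µm horizontal slice
SLICE_SIZE_MB = 120      # approximate size of one slice file (assumed constant)


def slices_for_height(height_mm: float) -> int:
    """Number of slice files needed to cover a given height of scroll."""
    return round(height_mm * 1000 / SLICE_THICKNESS_UM)


def download_size_gb(height_mm: float) -> float:
    """Approximate download size in GB, taking 1 GB = 1000 MB."""
    return slices_for_height(height_mm) * SLICE_SIZE_MB / 1000


if __name__ == "__main__":
    for height_mm in (1, 10):
        files = slices_for_height(height_mm)
        size = download_size_gb(height_mm)
        print(f"{height_mm} mm: {files} files, ~{size:.0f} GB")
```

With these inputs, 1 mm works out to 125 files (about 15GB) and 1 cm to 1250 files (about 150GB), which matches the figures above.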

IntelArtiGen t1_jce09i7 wrote on March 16, 2023 at 3:36 AM

Oh nice! Thanks for the clarification. I thought it was just one big archive, but yeah it makes much more sense that way

nat_friedman OP t1_jce2uq5 wrote on March 16, 2023 at 3:59 AM

It's good feedback to know this wasn't clear! I will edit the scrollprize.org/data page to be even more explicit about this.

WaterslideOfSuccess t1_jce9fcg wrote on March 16, 2023 at 5:06 AM

Brent was working on this when I was at UK in 2014. I might waste some time on this, since I just lost my job and have disposable time lol

Disastrous_Elk_6375 t1_jcem481 wrote on March 16, 2023 at 7:53 AM

Has there been any attempt to replicate the condition of these scrolls with replicas containing known text? (i.e. take the best papyrus analogue, paint it with the best ink analogue, and burn it in a way that would be a good guess as to what's actually inside?)

janpaul123 t1_jcg1w91 wrote on March 16, 2023 at 3:59 PM

Yes, see for example the "carbon phantom scroll" used in this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215775

Though I don't think attempts at the same resolution (4-8µm) have been made.

noxiousmomentum t1_jce35h7 wrote on March 16, 2023 at 4:02 AM

so i can do this. but is the prize real? who funds this?

WH7EVR t1_jce6k91 wrote on March 16, 2023 at 4:35 AM

nat friedman is a multi-millionaire tech entrepreneur, since he uh -- didn't really introduce himself.

/u/nat_friedman not everyone knows who you are, or that you're loaded bro.

NamerNotLiteral t1_jcf8eo1 wrote on March 16, 2023 at 12:29 PM

Former CEO of GitHub as well.

dataclinician t1_jcia118 wrote on March 17, 2023 at 12:46 AM

Lmao

nat_friedman OP t1_jce40o0 wrote on March 16, 2023 at 4:10 AM

I am funding it, together with Daniel Gross.

codename_failure t1_jchft1r wrote on March 16, 2023 at 9:16 PM

Thanks for funding this, it looks like a cool project.

banuk_sickness_eater t1_jcwz87s wrote on March 20, 2023 at 5:02 AM

Thank you for doing this, doubling the corpus of literature from antiquity is absolutely a net positive for humanity.

Kronod1le t1_jch97js wrote on March 16, 2023 at 8:32 PM

https://en.m.wikipedia.org/wiki/Nat_Friedman

Username912773 t1_jcdu408 wrote on March 16, 2023 at 2:45 AM

Well, is there an existing dataset to actually train a model off of?

IntelArtiGen t1_jcdw6ih wrote on March 16, 2023 at 3:02 AM

It seems that everything is explained quite clearly on the website. The challenge is a mix of data processing and machine learning, and the hardest part is probably the data processing. There are two steps: (1) flatten the scroll, (2) detect the ink. They provided a dataset for the ink task on Kaggle.

janpaul123 t1_jcdw68q wrote on March 16, 2023 at 3:02 AM

Yes! We've released the CT scans (model input) and binary ink mask (ground truth) for 3 fragments of scrolls.

Balance- t1_jcete3n wrote on March 16, 2023 at 9:39 AM

Thanks for organizing and funding this!

londons_explorer t1_jcer3ci wrote on March 16, 2023 at 9:06 AM

Seems this can be cleanly split into 'unrolling' and 'ink recognition'.

Unrolling at first seems like the easy bit... But it could be made complex if there are fragments of material which have internally become detached and fallen.

nat_friedman OP t1_jchujo7 wrote on March 16, 2023 at 10:54 PM

That's what I think too, but obviously people are free to solve this any way they want!

DamienLasseur t1_jcezbpd wrote on March 16, 2023 at 10:56 AM

This is actually really cool! I've been demotivated by how much progress is happening in areas of ML that I would've liked to contribute to. I'll give it a shot!

Additionally, if anyone would like to collaborate on this challenge, feel free to shoot me a PM and I'll set up a Discord or something.

supreme_harmony t1_jcieyt3 wrote on March 17, 2023 at 1:23 AM

There was a recent attempt at reading hieroglyphs from temple walls in Egypt using ML, but that failed spectacularly.

Despite having tons of high-quality training data available, and being announced with much fanfare and ample funding in 2018, it has since been completely pulled, and even its website has been erased.

I am struggling to find any results apart from some of the initial marketing material:

https://www.psycle.com/casestudy/hieroglyphics-initiative

https://www.youtube.com/watch?v=TfdWNY7priQ

I have briefly interacted with some people involved, and the consensus was that it's not realistically doable.

Therefore, although I do not doubt the good intention behind this prize, I am quite sceptical any results will come of it, as a seemingly simpler project with more resources failed to deliver.

nat_friedman OP t1_jcijnjx wrote on March 17, 2023 at 1:59 AM

Well you definitely won't solve it with that attitude!

supreme_harmony t1_jcjjx9p wrote on March 17, 2023 at 8:33 AM

I definitely hope someone proves me wrong and I wish all the people attempting the challenge the best.

davorrunje t1_jcjjzqi wrote on March 17, 2023 at 8:34 AM

Wow!!! This is fantastic!

geminy123 t1_jchso71 wrote on March 16, 2023 at 10:42 PM

You spent more money on the website than on the project itself…

nat_friedman OP t1_jci2wcd wrote on March 16, 2023 at 11:53 PM

definitely not.

AmandaBines t1_jci9bne wrote on March 17, 2023 at 12:40 AM

bro just open the scroll bam two fiddy grand plz

(lol jk this is really cool i think i remember being in high school watching a doc about this and at the time they had like hardly any data)