Today we launched the Vesuvius Challenge, an open competition to read a set of charred papyrus scrolls that were buried by the eruption of Mount Vesuvius 2000 years ago. The scrolls can't be physically opened, but we have released 3D tomographic X-ray scans of two of them at 8µm resolution. The scans were made at a particle accelerator.

A team at UKY led by Prof Brent Seales has very recently demonstrated the ability to detect ink inside the CT scans using CNNs, and so we believe that it is possible for the first time in history to read what's in these scrolls without opening them. There are hundreds of carbonized scrolls that we could read once the technique works – enough to more than double our total corpus of literature from antiquity.

Many of us are fans of /r/MachineLearning and we thought this group would be interested in hearing about it!

Comments

Username912773 t1_jcdu408 wrote on March 16, 2023 at 2:45 AM

#2,241,327

Well, is there an existing dataset to actually train a model off of?

janpaul123 t1_jcdw68q wrote on March 16, 2023 at 3:02 AM

#2,241,441

Replying to Username912773 (#2,241,327)

Yes! We've released the CT scans (model input) and binary ink mask (ground truth) for 3 fragments of scrolls.

IntelArtiGen t1_jcdw6ih wrote on March 16, 2023 at 3:02 AM

#2,241,443

Replying to Username912773 (#2,241,327)

It seems that everything is explained quite clearly on the website. The challenge is a mix of data processing and machine learning, and the hardest part is probably the data processing. There are two steps: (1) flatten, (2) detect ink. They gave a dataset for the ink task on Kaggle.

IntelArtiGen t1_jcdxtrb wrote on March 16, 2023 at 3:15 AM

#2,241,517

The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.

4.7 TB (for two scrolls) seems like a lot, but I get that it's due to the resolution required to detect ink. I guess people can test their algorithms first on the other datasets and find a way to process these 4.7 TB if they need to. Perhaps the task could be more accessible if people could easily access 1/4~1/8 of one scroll (0.5/1 TB).

nat_friedman OP t1_jcdzndc wrote on March 16, 2023 at 3:31 AM

#2,241,597

Replying to IntelArtiGen (#2,241,517)

You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120MB and represents an 8µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a millimeter slice through the scroll. A centimeter is about 150GB. Still big, but more manageable.
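As a rough check of that arithmetic, here is a minimal sketch. The function and constant names are made up for illustration and are not part of the download scripts; the only inputs are the 8µm slice thickness and the approximate 120MB file size quoted above, with decimal units assumed (1 GB = 1000 MB).

```python
# Back-of-the-envelope download sizing. Names are hypothetical;
# the two constants come from the figures quoted in the comment above.
SLICE_THICKNESS_UM = 8   # each file is one 8 µm horizontal slice
SLICE_SIZE_MB = 120      # approximate size of one slice file


def slices_for_height(height_mm: float) -> int:
    """Number of slice files needed to cover a given height of scroll."""
    return round(height_mm * 1000 / SLICE_THICKNESS_UM)


def download_size_gb(height_mm: float) -> float:
    """Approximate download size in GB (decimal) for that height."""
    return slices_for_height(height_mm) * SLICE_SIZE_MB / 1000


print(slices_for_height(1))    # 125 files per millimeter
print(download_size_gb(1))     # 15.0 GB per millimeter
print(download_size_gb(10))    # 150.0 GB per centimeter
```

So a single millimeter is about 15GB, which is a reasonable size for a first experiment.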

IntelArtiGen t1_jce09i7 wrote on March 16, 2023 at 3:36 AM

#2,241,624

Replying to nat_friedman (#2,241,597)

Oh nice! Thanks for the clarification. I thought it was just one big archive, but yeah it makes much more sense that way

nat_friedman OP t1_jce2uq5 wrote on March 16, 2023 at 3:59 AM

#2,241,729

Replying to IntelArtiGen (#2,241,624)

It's good feedback to know this wasn't clear! I will edit the scrollprize.org/data page to be even more explicit about this.

noxiousmomentum t1_jce35h7 wrote on March 16, 2023 at 4:02 AM

#2,241,739

so i can do this. but is the prize real? who funds this?

nat_friedman OP t1_jce40o0 wrote on March 16, 2023 at 4:10 AM

#2,241,765

Replying to noxiousmomentum (#2,241,739)

I am funding it, together with Daniel Gross.

blablanonymous t1_jce5h65 wrote on March 16, 2023 at 4:24 AM

#2,241,813

I bet you $249.99k it’s just a bunch of dad jokes

WH7EVR t1_jce6k91 wrote on March 16, 2023 at 4:35 AM

#2,241,862

Replying to noxiousmomentum (#2,241,739)

nat friedman is a multi-millionaire tech entrepreneur, since he uh -- didn't really introduce himself.

/u/nat_friedman not everyone knows who you are, or that you're loaded bro.

WaterslideOfSuccess t1_jce9fcg wrote on March 16, 2023 at 5:06 AM

#2,241,973

Brent was working on this when I was at UK in 2014. I might waste some time on this since I just lost my job and have disposable time lol

Disastrous_Elk_6375 t1_jcem481 wrote on March 16, 2023 at 7:53 AM

#2,242,432

Has there been any attempt to replicate the condition of these scrolls with replicas containing known text? (i.e. take the best papyrus analogue, paint it with the best ink analogue, burn it? in a way that would be a good guess as to what's actually inside)

londons_explorer t1_jcer3ci wrote on March 16, 2023 at 9:06 AM

#2,242,606

Seems this can be cleanly split into 'unrolling' and 'ink recognition'.

Unrolling at first seems like the easy bit... But it could be made complex if there are fragments of material which have internally become detached and fallen

Balance- t1_jcete3n wrote on March 16, 2023 at 9:39 AM

#2,242,678

Thanks for organizing and funding this!

DamienLasseur t1_jcezbpd wrote on March 16, 2023 at 10:56 AM

#2,242,920

This is actually really cool! I've been demotivated about how much progress is occurring in the field of ML that I would've liked to contribute to. I'll give it a shot!

Additionally, if anyone would like to collaborate on this challenge, feel free to shoot me a PM and I'll set up a Discord or something.

NamerNotLiteral t1_jcf8eo1 wrote on March 16, 2023 at 12:29 PM

#2,243,368

Replying to WH7EVR (#2,241,862)

Former CEO of GitHub as well.

mostancient t1_jcfg2md wrote on March 16, 2023 at 1:32 PM

#2,243,816

Replying to blablanonymous (#2,241,813)

Or just plain accounting. Like what most recovered ancient written documents tend to be.

janpaul123 t1_jcg1w91 wrote on March 16, 2023 at 3:59 PM

#2,245,260

Replying to Disastrous_Elk_6375 (#2,242,432)

Yes, see for example the "carbon phantom scroll" used in this paper: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215775

Though I don't think attempts at the same resolution (4-8µm) have been made.

janpaul123 t1_jcg2kqk wrote on March 16, 2023 at 4:04 PM

#2,245,289

Replying to mostancient (#2,243,816)

Given that the villa was likely owned by a Roman consul and senator, that could make for some exciting accounting!

codename_failure t1_jchft1r wrote on March 16, 2023 at 9:16 PM

#2,247,830

Replying to nat_friedman (#2,241,765)

Thanks for funding this, it looks like a cool project.

geminy123 t1_jchso71 wrote on March 16, 2023 at 10:42 PM

#2,248,487

You spent more money on the website than the project itself…

nat_friedman OP t1_jchujo7 wrote on March 16, 2023 at 10:54 PM

#2,248,601

Replying to londons_explorer (#2,242,606)

That's what I think too, but obviously people are free to solve this any way they want!

nat_friedman OP t1_jci2wcd wrote on March 16, 2023 at 11:53 PM

#2,249,085

Replying to geminy123 (#2,248,487)

definitely not.

[deleted] t1_jci47j9 wrote on March 17, 2023 at 12:03 AM

#2,249,154

Replying to DamienLasseur (#2,242,920)

[removed]

AmandaBines t1_jci9bne wrote on March 17, 2023 at 12:40 AM

#2,249,433

bro just open the scroll bam two fiddy grand plz

(lol jk this is really cool i think i remember being in high school watching a doc about this and at the time they had like hardly any data)

supreme_harmony t1_jcieyt3 wrote on March 17, 2023 at 1:23 AM

#2,249,737

There was a recent attempt at reading hieroglyphs from temple walls in Egypt using ML, but that failed spectacularly.

Despite having tons of high-quality training data available, and being announced with much fanfare and ample funding in 2018, it has been completely pulled by now and even its website has been erased.

I am struggling to find any results apart from some of the initial marketing material:

https://www.psycle.com/casestudy/hieroglyphics-initiative

https://www.youtube.com/watch?v=TfdWNY7priQ

I have briefly interacted with some people involved and the consensus was that it's not realistically doable.

Therefore, although I do not doubt the good intention behind this prize, I am quite sceptical any results will come of it, as a seemingly simpler project with more resources failed to deliver.

nat_friedman OP t1_jcijnjx wrote on March 17, 2023 at 1:59 AM

#2,250,003

Replying to supreme_harmony (#2,249,737)

Well you definitely won't solve it with that attitude!

supreme_harmony t1_jcjjx9p wrote on March 17, 2023 at 8:33 AM

#2,251,617

Replying to nat_friedman (#2,250,003)

I definitely hope someone proves me wrong and I wish all the people attempting the challenge the best.

davorrunje t1_jcjjzqi wrote on March 17, 2023 at 8:34 AM

#2,251,620

Wow!!! This is fantastic!

banuk_sickness_eater t1_jcwz87s wrote on March 20, 2023 at 5:02 AM

#2,277,045