Submitted by nat_friedman t3_11sgn67 in MachineLearning

Today we launched the Vesuvius Challenge, an open competition to read a set of charred papyrus scrolls that were buried by the eruption of Mount Vesuvius 2,000 years ago. The scrolls can't be physically opened, but we have released 3D tomographic X-ray scans of two of them at 8µm resolution. The scans were made at a particle accelerator.

A team at UKY led by Prof Brent Seales has very recently demonstrated the ability to detect ink inside the CT scans using CNNs, and so we believe that it is possible for the first time in history to read what's in these scrolls without opening them. There are hundreds of carbonized scrolls that we could read once the technique works – enough to more than double our total corpus of literature from antiquity.

Many of us are fans of /r/MachineLearning and we thought this group would be interested in hearing about it!

277

Comments

IntelArtiGen t1_jcdxtrb wrote

The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.

4.7 TB (for two scrolls) seems like a lot, but I get that it's due to the resolution required to detect ink. I guess people can test their algorithms on the other datasets first and then find a way to process these 4.7 TB if they need to. Perhaps the task would be more accessible if people could easily download 1/4 to 1/8 of one scroll (0.5 to 1 TB).

35

nat_friedman OP t1_jcdzndc wrote

You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120MB and represents an 8µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a millimeter slice through the scroll. A centimeter is about 150GB. Still big, but more manageable.
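For anyone who wants to sanity-check those numbers before downloading, here is a rough sketch of the arithmetic. It only uses the figures quoted above (8µm per slice, about 120MB per file). The function names, the decimal units (1 GB = 1000 MB), and the zero-based bottom-up slice index are assumptions for illustration; this is not the download script from the site.

```python
# Figures quoted in the comment above; everything else is an assumption.
SLICE_THICKNESS_UM = 8   # each file is one 8 µm horizontal slice
SLICE_SIZE_MB = 120      # approximate size of one slice file


def slices_for_height(height_mm: float) -> int:
    """Number of slice files needed to cover height_mm of scroll."""
    return round(height_mm * 1000 / SLICE_THICKNESS_UM)


def download_size_gb(height_mm: float) -> float:
    """Approximate download size in GB, assuming 1 GB = 1000 MB."""
    return slices_for_height(height_mm) * SLICE_SIZE_MB / 1000


def slice_indices(start_mm: float, height_mm: float) -> range:
    """Hypothetical zero-based slice indices, counted from the bottom."""
    first = round(start_mm * 1000 / SLICE_THICKNESS_UM)
    return range(first, first + slices_for_height(height_mm))


if __name__ == "__main__":
    for mm in (1, 10):
        print(f"{mm} mm: {slices_for_height(mm)} files, "
              f"about {download_size_gb(mm):.0f} GB")
```

With these figures, 1 mm works out to 125 files (about 15 GB) and 1 cm to 1250 files (about 150 GB), which matches the numbers in the comment.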

26

IntelArtiGen t1_jce09i7 wrote

Oh nice! Thanks for the clarification. I thought it was just one big archive, but yeah it makes much more sense that way

8

WaterslideOfSuccess t1_jce9fcg wrote

Brent was working on this when I was at UK in 2014. I might waste some time on this, since I just lost my job and have disposable time lol

29

Disastrous_Elk_6375 t1_jcem481 wrote

Has there been any attempt to replicate the condition of these scrolls with replicas containing known text? (i.e. take the best papyrus analogue, paint it with the best ink analogue, and burn it in a way that would be a good guess at what's actually inside)

17

noxiousmomentum t1_jce35h7 wrote

so i can do this. but is the prize real? who funds this?

11

nat_friedman OP t1_jce40o0 wrote

I am funding it, together with Daniel Gross.

31

banuk_sickness_eater t1_jcwz87s wrote

Thank you for doing this, doubling the corpus of literature from antiquity is absolutely a net positive for humanity.

1

Username912773 t1_jcdu408 wrote

Well, is there an existing dataset to actually train a model off of?

7

IntelArtiGen t1_jcdw6ih wrote

It seems that everything is explained quite clearly on the website. The challenge is a mix of data processing and machine learning, and the hardest part is probably the data processing. There are two steps: (1) flatten, (2) detect ink. They provide a dataset for the ink task on Kaggle.

16

janpaul123 t1_jcdw68q wrote

Yes! We've released the CT scans (model input) and binary ink mask (ground truth) for 3 fragments of scrolls.
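To make the input/ground-truth pairing concrete, here is a minimal sketch of how small CT subvolumes could be matched with labels from a binary ink mask. It runs on synthetic data. The array layout (`depth, height, width` for the scan, `height, width` for the mask), the patch size, and the name `sample_patches` are all assumptions for illustration, not the actual format of the released fragments.

```python
import numpy as np


def sample_patches(volume, ink_mask, half=2, n=4, seed=0):
    """Pair random CT subvolumes with the ink label at their centre pixel.

    volume:   (depth, H, W) array of CT intensities (assumed layout).
    ink_mask: (H, W) binary array, 1 where ink is present (assumed layout).
    Returns patches of shape (n, depth, 2*half+1, 2*half+1) and n labels.
    """
    rng = np.random.default_rng(seed)
    _, h, w = volume.shape
    # Keep centres far enough from the border that every patch fits.
    ys = rng.integers(half, h - half, size=n)
    xs = rng.integers(half, w - half, size=n)
    patches = np.stack([
        volume[:, y - half:y + half + 1, x - half:x + half + 1]
        for y, x in zip(ys, xs)
    ])
    labels = ink_mask[ys, xs].astype(np.float32)
    return patches, labels


# Synthetic stand-ins for one fragment: random "CT" data and a square of "ink".
volume = np.random.default_rng(1).random((8, 32, 32), dtype=np.float32)
ink_mask = np.zeros((32, 32), dtype=np.uint8)
ink_mask[10:20, 10:20] = 1

patches, labels = sample_patches(volume, ink_mask)
print(patches.shape, labels.shape)
```

Each patch keeps the full depth of the stack around one surface pixel, so a classifier (a CNN, as mentioned in the post) would see the material above and below that pixel and predict whether the mask marks ink there.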

10

Balance- t1_jcete3n wrote

Thanks for organizing and funding this!

6

londons_explorer t1_jcer3ci wrote

Seems this can be cleanly split into 'unrolling' and 'ink recognition'.

Unrolling seems like the easy bit at first... But it could get complex if there are fragments of material which have become detached internally and fallen.

5

nat_friedman OP t1_jchujo7 wrote

That's what I think too, but obviously people are free to solve this any way they want!

4

DamienLasseur t1_jcezbpd wrote

This is actually really cool! I've been demotivated by how much progress is happening in areas of ML that I would've liked to contribute to. I'll give it a shot!

Additionally, if anyone would like to collaborate on this challenge, feel free to shoot me a PM and I'll set up a Discord or something.

5

supreme_harmony t1_jcieyt3 wrote

There was a recent attempt at reading hieroglyphs from temple walls in Egypt using ML, but that failed spectacularly.

Despite having tons of high-quality training data available, and being announced with much fanfare and ample funding in 2018, it has since been pulled completely, and even its website has been erased.

I am struggling to find any results apart from some of the initial marketing material:

https://www.psycle.com/casestudy/hieroglyphics-initiative

https://www.youtube.com/watch?v=TfdWNY7priQ

I have briefly interacted with some people involved and the consensus was that it's not realistically doable.

Therefore, although I do not doubt the good intention behind this prize, I am quite sceptical any results will come of it, as a seemingly simpler project with more resources failed to deliver.

3

nat_friedman OP t1_jcijnjx wrote

Well you definitely won't solve it with that attitude!

4

supreme_harmony t1_jcjjx9p wrote

I definitely hope someone proves me wrong and I wish all the people attempting the challenge the best.

3

davorrunje t1_jcjjzqi wrote

Wow!!! This is fantastic!

2

geminy123 t1_jchso71 wrote

You spent more money on the website than on the project itself…

1

AmandaBines t1_jci9bne wrote

bro just open the scroll bam two fiddy grand plz

(lol jk this is really cool i think i remember being in high school watching a doc about this and at the time they had like hardly any data)

1