Submitted by nat_friedman t3_11sgn67 in MachineLearning

Today we launched the Vesuvius Challenge, an open competition to read a set of charred papyrus scrolls that were buried by the eruption of Mount Vesuvius 2,000 years ago. The scrolls can't be physically opened, but we have released 3D tomographic X-ray scans of two of them at 8µm resolution. The scans were made at a particle accelerator.

A team at UKY led by Prof Brent Seales has very recently demonstrated the ability to detect ink inside the CT scans using CNNs, and so we believe that it is possible for the first time in history to read what's in these scrolls without opening them. There are hundreds of carbonized scrolls that we could read once the technique works – enough to more than double our total corpus of literature from antiquity.

Many of us are fans of /r/MachineLearning and we thought this group would be interested in hearing about it!

277

Comments


Username912773 t1_jcdu408 wrote

Well, is there an existing dataset to actually train a model off of?

7

IntelArtiGen t1_jcdw6ih wrote

It seems that everything is explained quite clearly on the website. The challenge is a mix of data processing and machine learning, and the hardest part is probably the data processing. There are two steps: (1) flatten the scroll, (2) detect ink. They provide a dataset for the ink task on Kaggle.

16

IntelArtiGen t1_jcdxtrb wrote

The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.

4.7 TB (for two scrolls) seems like a lot, but I get that it's due to the resolution required to detect ink. I guess people can test their algorithms first on the other datasets and find a way to process these 4.7 TB if they need to. Perhaps the task could be more accessible if people could easily access 1/4~1/8 of 1 scroll (0.5/1 TB).

35

nat_friedman OP t1_jcdzndc wrote

You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120MB and represents an 8µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a millimeter slice through the scroll. A centimeter is about 150GB. Still big, but more manageable.
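For anyone sizing a download, the arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the 8µm slice thickness and ~120MB file size come from the comment above, while the function names are made up for illustration and have nothing to do with the official download scripts.

```python
import math

SLICE_THICKNESS_UM = 8   # from the comment above: each file is one 8 µm slice
SLICE_SIZE_MB = 120      # from the comment above: approximate size per file


def slices_for_height(height_mm: float) -> int:
    """Number of slice files needed to cover height_mm of scroll (hypothetical helper)."""
    return math.ceil(height_mm * 1000 / SLICE_THICKNESS_UM)


def download_size_gb(height_mm: float) -> float:
    """Approximate download size in GB for height_mm of scroll (hypothetical helper)."""
    return slices_for_height(height_mm) * SLICE_SIZE_MB / 1000


if __name__ == "__main__":
    for height_mm in (1, 10):
        print(height_mm, "mm:", slices_for_height(height_mm), "files,",
              download_size_gb(height_mm), "GB")
```

With these numbers, 1 mm works out to 125 files (about 15GB) and 1 cm to 1,250 files (about 150GB), which matches the figures quoted above.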

26

WaterslideOfSuccess t1_jce9fcg wrote

Brent was working on this when I was at UK in 2014. I might waste some time on this since I just lost my job and have disposable time lol

29

Disastrous_Elk_6375 t1_jcem481 wrote

Has there been any attempt to replicate the condition of these scrolls with replicas containing known text? (i.e. take the best papyrus analogue, paint it with the best ink analogue, and burn it in a way that would be a good guess as to what's actually inside?)

17

londons_explorer t1_jcer3ci wrote

Seems this can be cleanly split into 'unrolling' and 'ink recognition'.

Unrolling at first seems like the easy bit... But it could be made complex if there are fragments of material which have internally become detached and fallen

5

Balance- t1_jcete3n wrote

Thanks for organizing and funding this!

6

DamienLasseur t1_jcezbpd wrote

This is actually really cool! I've been demotivated about how much progress is occurring in the field of ML that I would've liked to contribute to. I'll give it a shot!

Additionally, if anyone would like to collaborate on this challenge, feel free to shoot me a PM and I'll set up a Discord or something.

5

geminy123 t1_jchso71 wrote

You spent more money on the website than on the project itself…

1

AmandaBines t1_jci9bne wrote

bro just open the scroll bam two fiddy grand plz

(lol jk this is really cool i think i remember being in high school watching a doc about this and at the time they had like hardly any data)

1

supreme_harmony t1_jcieyt3 wrote

There was a recent attempt at reading hieroglyphs from temple walls in Egypt using ML, but that failed spectacularly.

Despite having tons of high quality training data available, and being announced with much fanfare and ample funding in 2018, it has since been pulled completely and even its website has been taken down.

I am struggling to find any results apart from some of the initial marketing material:

https://www.psycle.com/casestudy/hieroglyphics-initiative

https://www.youtube.com/watch?v=TfdWNY7priQ

I have briefly interacted with some people involved and the consensus was that it's not realistically doable.

Therefore, although I do not doubt the good intention behind this prize, I am quite sceptical any results will come of it, as a seemingly simpler project with more resources failed to deliver.

3