IntelArtiGen t1_jcdxtrb wrote on March 16, 2023 at 3:15 AM

The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.

4.7 TB (for two scrolls) seems like a lot, but I get that it's due to the resolution required to detect ink. I guess people can test their algorithms on the other datasets first and then find a way to process these 4.7 TB if they need to. Perhaps the task would be more accessible if people could easily access 1/8 to 1/4 of one scroll (roughly 0.5 to 1 TB).

nat_friedman OP t1_jcdzndc wrote on March 16, 2023 at 3:31 AM

You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120 MB and represents an 8 µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a one-millimeter slice through the scroll, or about 15 GB. A centimeter is about 150 GB. Still big, but more manageable.
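
For anyone who wants to size a download before starting, the arithmetic above can be sketched roughly as follows. This assumes exactly 8 µm per slice and about 120 MB per file, as quoted; real file sizes will vary somewhat, and the function name `estimate_download` is made up for illustration, not part of the provided download scripts.

```python
import math
from typing import Tuple

# Figures quoted in this comment; both are approximations.
SLICE_THICKNESS_UM = 8   # each file covers an 8 µm horizontal slice
FILE_SIZE_MB = 120       # each file is about 120 MB


def estimate_download(height_mm: float) -> Tuple[int, float]:
    """Return (number of slice files, approximate size in GB)
    needed to cover `height_mm` millimeters of scroll height."""
    # Round up so a partial slice still counts as a whole file.
    n_files = math.ceil(height_mm * 1000 / SLICE_THICKNESS_UM)
    size_gb = n_files * FILE_SIZE_MB / 1000  # 1 GB taken as 1000 MB
    return n_files, size_gb


if __name__ == "__main__":
    for mm in (1, 10):
        files, gb = estimate_download(mm)
        print(f"{mm} mm -> {files} files, ~{gb:.0f} GB")
```

With these assumptions, 1 mm comes out to 125 files and 10 mm to 1250 files, matching the numbers above.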

IntelArtiGen t1_jce09i7 wrote on March 16, 2023 at 3:36 AM

Oh nice! Thanks for the clarification. I thought it was just one big archive, but yeah, it makes much more sense that way.

nat_friedman OP t1_jce2uq5 wrote on March 16, 2023 at 3:59 AM

It's good feedback to know this wasn't clear! I will edit the scrollprize.org/data page to be even more explicit about this.