Submitted by nat_friedman t3_11sgn67 in MachineLearning
IntelArtiGen t1_jcdxtrb wrote
The challenge looks very cool but also quite hard. However, if it's truly possible to read that ink and unfold these scrolls, I'm sure ML and data processing will be able to do it.
4.7 TB (for two scrolls) seems like a lot, but I understand it's due to the resolution required to detect ink. I guess people can test their algorithms on the other datasets first and then find a way to process these 4.7 TB if they need to. Perhaps the task would be more accessible if people could easily access 1/4 to 1/8 of one scroll (0.5 to 1 TB).
nat_friedman OP t1_jcdzndc wrote
You can download arbitrary subsets of the scroll, and we provide scripts to do so on the download page. Each file is about 120 MB and represents an 8 µm horizontal slice (stacked from bottom to top). So if you download 125 of these files, that's a millimeter slice through the scroll. A centimeter is about 150 GB. Still big, but more manageable.
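The arithmetic behind those figures can be checked with a minimal sketch. It assumes only the two numbers quoted in the comment (about 120 MB per file, 8 µm per slice) and decimal units (1 GB = 1000 MB). The function names are hypothetical and are not part of the official download scripts.

```python
import math

# Figures quoted in the comment; the file size is approximate.
SLICE_THICKNESS_UM = 8   # each file covers an 8 µm horizontal slice
SLICE_SIZE_MB = 120      # each file is about 120 MB


def slices_for_thickness(thickness_mm: float) -> int:
    """Number of slice files needed to cover a given thickness in mm."""
    # Round before ceil to guard against floating-point noise.
    return math.ceil(round(thickness_mm * 1000 / SLICE_THICKNESS_UM, 6))


def download_size_gb(thickness_mm: float) -> float:
    """Approximate download size in GB (decimal) for a given thickness in mm."""
    return slices_for_thickness(thickness_mm) * SLICE_SIZE_MB / 1000


if __name__ == "__main__":
    for mm in (1, 10):
        print(f"{mm} mm: {slices_for_thickness(mm)} files, "
              f"about {download_size_gb(mm):.0f} GB")
```

Under these assumptions, a 1 mm section needs 125 files and a 1 cm section needs 1250 files, which matches the "about 150 GB per centimeter" estimate.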
IntelArtiGen t1_jce09i7 wrote
Oh nice! Thanks for the clarification. I thought it was just one big archive, but yeah, it makes much more sense that way.
nat_friedman OP t1_jce2uq5 wrote
It's good feedback to know this wasn't clear! I will edit the scrollprize.org/data page to be even more explicit about this.