nagumi

nagumi t1_jefe184 wrote

This will change soon. Datasets are starting to include video (YouTube, etc.) and audio (radio, podcasts...).

One thing to think about is that the majority of content online is either content that people wanted to put online (a blog post, a YouTube video of their kid, etc.) or content that was recorded/published to document something notable (a YouTube video of a person freaking out, security camera footage of a crime...).

What about mundanity? The 99.99% of human life that goes unrecorded because it is boring and unremarkable? Surely there's data there, and leaving it out seems like it would poison the well.
