I need to speed up the labelling of a collection of sounds. To do so, I would like to build a representation from the audio data, then compute distances between sounds or find clusters. Do you know the most commonly used representations and/or the possible embedding techniques?
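
To make the goal concrete, here is a minimal sketch of the pipeline I have in mind. It assumes a deliberately crude stand-in representation (log energy in a few frequency bands, from a naive DFT) in place of whatever representation turns out to be the right one, plus a simple distance threshold for grouping. All function names and parameter values (`band_energies`, `threshold_clusters`, `sine`, the 800 Hz sample rate, the threshold of 5.0) are hypothetical, and the sounds are synthetic sine tones, not real data.

```python
import cmath
import math


def band_energies(samples, n_bands=4):
    """Placeholder representation: log(1 + energy) in n_bands equal-width bands."""
    n = len(samples)
    half = n // 2
    # Naive O(n^2) DFT power for the non-negative frequency bins.
    power = []
    for k in range(half):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    return [math.log1p(sum(power[b * width:(b + 1) * width])) for b in range(n_bands)]


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(features):
    """Pairwise distances between all feature vectors."""
    return [[euclidean(a, b) for b in features] for a in features]


def threshold_clusters(features, threshold):
    """Greedy grouping: join the first cluster whose first member is within threshold."""
    clusters = []
    for i, feat in enumerate(features):
        for cluster in clusters:
            if euclidean(features[cluster[0]], feat) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def sine(freq, sample_rate=800, n=128):
    """Synthetic test tone standing in for a real recording."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


# Two low tones and two high tones (synthetic, for illustration only).
sounds = [sine(50), sine(75), sine(350), sine(375)]
features = [band_energies(s) for s in sounds]
distances = distance_matrix(features)
clusters = threshold_clusters(features, threshold=5.0)
print(clusters)
```

With these toy tones, the two low sounds and the two high sounds should end up in separate groups, so I would only need to label one sound per group. What I am really asking is what to put in place of `band_energies`.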