Submitted by camaradorjk t3_xu9jvt in deeplearning
camaradorjk OP t1_iqvmzun wrote
Reply to comment by camaradorjk in What is the best audio classification model to use? by camaradorjk
Not really sheet music, just the musical notes. I will display the proper finger position on a flute based on the predicted notes. I'm trying to create a learning tool.
beingsubmitted t1_iqvs8ix wrote
All you need to know is the pitch being played over time, which is a frequency. The audio file represents a waveform, and all you need to extract is the frequency of that waveform over time. There's no need for anything sequential: 440 Hz is an "A" no matter where it falls in a sequence. It's an A if it comes after a C, and it's an A if it comes after an F#.
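For example, here's a minimal sketch of that frequency-to-note mapping. It uses the standard equal-temperament convention (A4 = 440 Hz = MIDI note 69); the function name and test frequencies are just illustrative:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz: float) -> str:
    """Return the nearest equal-tempered note name for a frequency."""
    # Standard MIDI convention: A4 = 440 Hz = note 69, 12 semitones per octave.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

print(freq_to_note(440.0))   # "A4" -- regardless of what note came before
print(freq_to_note(261.63))  # "C4"
```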
A sequential model might be useful for natural language, for example, because meaning is carried between words. "Very tall" and "not tall" are different things; "He was quite tall" and "No one ever accused him of not being tall" are remarkably similar things. Transcribing music, by contrast, is just charting frequency over time.
That said, you cannot get a frequency from a single data point, so there is a somewhat sequential nature to things, but it only means you need to transform the waveform's amplitude over time into its frequency content, which is exactly what the Fourier transform does. When music visualizations show you an EQ (equalizer) chart alongside your music, this is what they're doing: showing how much of each frequency is present at a given moment, computed with an FFT. A digital equalizer similarly transforms the audio into a frequency spectrum, lets you adjust that spectrum, and then transforms the result back into a waveform.
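As a rough sketch, pulling the dominant frequency out of successive windows with numpy's FFT might look like this. It assumes a mono signal `audio` at sample rate `sr`; the window and hop sizes are arbitrary assumptions, not tuned values:

```python
import numpy as np

def dominant_freqs(audio: np.ndarray, sr: int, win: int = 4096, hop: int = 1024):
    """Yield (time_sec, peak_frequency_hz) for successive windows."""
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)  # frequency of each FFT bin
    for start in range(0, len(audio) - win, hop):
        frame = audio[start:start + win] * np.hanning(win)  # taper to reduce leakage
        spectrum = np.abs(np.fft.rfft(frame))
        yield start / sr, freqs[np.argmax(spectrum)]

# Usage (e.g. with a file loaded via librosa):
# import librosa
# audio, sr = librosa.load("flute.wav", sr=None, mono=True)
# for t, f in dominant_freqs(audio, sr):
#     print(f"{t:6.2f}s  {f:7.1f} Hz")
```

Picking the single loudest bin is the simplest possible pitch estimator; for a real instrument with strong harmonics you'd likely want a proper pitch-tracking method, but the windowed-FFT idea above is the core of all of them.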