Submitted by camaradorjk t3_xu9jvt in deeplearning

Hello everyone, I am an undergraduate student currently working on musical note detection for the flute (specifically the recorder). The model will accept a full instrumental recording (1-3 minutes) as input and predict all the musical notes in sequence. What would be the best model or algorithm for this project? If you want me to clarify anything, please do. Thanks!

3

Comments


beingsubmitted t1_iqvdo7x wrote

I'm a little unclear - there are three different things you might be trying to do here. The first would be transcription: taking an audio file and interpreting it into notes. That wouldn't typically require deep learning on its own, just a Fourier transform. The second would be isolating a specific instrument in an ensemble - finding just the recorder in a collection of different instruments all playing different things. The third would be generation: inferring unplayed future notes from previous notes.

Are you wanting to transcribe, isolate, generate, or some combination?

I'm thinking you want to transcribe. If that's the case, the FFT (fast Fourier transform) would be the algorithm to choose. If you google "FFT music transcription" you'll get a lot of info. https://ryan-mah.com/files/posts.amt_part_2.main.pdf
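To make that concrete, here's a minimal numpy sketch of FFT pitch detection on a synthetic tone (the sample rate and signal are made up for illustration, not taken from the linked paper):

```python
# A minimal sketch of FFT-based pitch detection on a synthetic signal.
import numpy as np

sr = 22050                               # sample rate in Hz (assumed)
t = np.arange(sr) / sr                   # one second of sample times
signal = np.sin(2 * np.pi * 440.0 * t)   # a pure 440 Hz tone ("A4")

# Real FFT: magnitudes of the positive-frequency bins
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

# The loudest bin gives the dominant pitch
pitch = freqs[np.argmax(spectrum)]
print(f"Detected pitch: {pitch:.1f} Hz")  # ~440.0 Hz
```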

3

camaradorjk OP t1_iqvje9f wrote

Thank you so much for taking the time to answer my question. You're right on the first one: my goal is to transcribe flute music into notes. But I'm a little confused about why there's no need for a deep learning model, because I initially thought I could use sequential models. Could you elaborate on that for me? Thank you so much.

PS: I will surely look into your recommendation about FFT.

1

camaradorjk OP t1_iqvmzun wrote

Not really sheet music, just the musical notes. I will display the proper finger position on the flute based on the predicted notes. I'm trying to create a learning tool.

1

beingsubmitted t1_iqvs8ix wrote

All you need to know over time is the pitch being played, which is a frequency. The audio file represents a waveform, and all you need to know is the frequency of that waveform over time. There's no need for anything sequential: 440 Hz is "A" no matter where it comes in a sequence. It's A if it comes after C, and it's A if it comes after F#.

A sequential model might be useful for natural language, for example, because meaning is carried between words. "Very tall" and "not tall" are different things; "He was quite tall" and "No one ever accused him of not being tall" are remarkably similar things. Transcribing music is just charting the frequency over time.

That said, you cannot get the frequency from a single data point, so there is a somewhat sequential nature to things, but it's really just that you need to transform the positions of the waveform over time into frequencies, which the Fourier transform does. When music visualizations show you an EQ (equalizer) chart to go with your music, this is what they're doing: showing how much of each frequency is present at a given moment in the music, using an FFT. A digital equalizer similarly transforms audio into a frequency spectrum, lets you adjust that spectrum, and then transforms back into a waveform.
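In code, that frame-by-frame frequency tracking looks roughly like this (a sketch - the frame and hop sizes are arbitrary choices, and the two-note test signal is synthetic):

```python
# A rough sketch of the "frequency over time" idea: slice the audio into
# short frames and take an FFT of each one (a simple STFT).
import numpy as np

def dominant_pitches(signal, sr, frame=2048, hop=512):
    """Return the loudest frequency in each frame of `signal`."""
    window = np.hanning(frame)            # taper to reduce spectral leakage
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    pitches = []
    for start in range(0, len(signal) - frame, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        pitches.append(freqs[np.argmax(spectrum)])
    return np.array(pitches)

# Two notes back to back: A4 (440 Hz) then C5 (~523 Hz)
sr = 22050
t = np.arange(sr) / sr
audio = np.concatenate([np.sin(2 * np.pi * 440.0 * t),
                        np.sin(2 * np.pi * 523.25 * t)])
print(dominant_pitches(audio, sr))  # ~440 Hz for the first half, ~523 Hz after
```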

2

mr_birrd t1_iqxl8ol wrote

If it's pure recorder, and since a recorder can only play one note at a time, you could find the frequency using the Fourier transform - best if you apply denoising first. Then you can just map the loudest frequency to an actual note.
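That frequency-to-note mapping is just the standard equal-temperament / MIDI pitch formula, 69 + 12 * log2(f / 440); a small sketch of my own (not a library call):

```python
# Map a detected frequency to the nearest equal-tempered note name.
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz: float) -> str:
    """Map a frequency in Hz to the nearest note name with octave."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))  # MIDI pitch number
    octave = midi // 12 - 1       # MIDI 60 -> C4 ("middle C")
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(freq_to_note(440.0))   # A4
print(freq_to_note(523.25))  # C5
```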

1

BobDope t1_iqwqnaw wrote

JOMAMA Classification Model 192

0