Submitted by camaradorjk t3_xu9jvt in deeplearning

Hello everyone, I am an undergraduate student currently working on musical note detection for the flute (specifically the recorder). The model will accept a full instrumental recording (1-3 minutes) as input and predict all the musical notes in sequence. What would be the best model or algorithm for this project? If you want me to clarify anything, please do. Thanks!

3

Comments


beingsubmitted t1_iqvdo7x wrote

I'm a little unclear - there are three different things you might be trying to do here. The first would be transcription: taking an audio file and interpreting it into notes. That wouldn't typically require deep learning on its own, just a Fourier transform. The second would be isolating a specific instrument in an ensemble - finding just the recorder in a collection of different instruments all playing different things. The third would be generation: inferring unplayed future notes from previous notes.

Are you wanting to transcribe, isolate, generate, or some combination?

I'm thinking you want to transcribe. If that's the case, the FFT (fast Fourier transform) would be the algorithm to choose. If you google "FFT music transcription" you'll get a lot of info. https://ryan-mah.com/files/posts.amt_part_2.main.pdf
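To make that concrete, here's a minimal numpy sketch of FFT pitch detection on a synthetic tone (the sample rate and signal are made up for illustration, not taken from the linked paper):

```python
# A minimal sketch of FFT-based pitch detection on a synthetic signal.
import numpy as np

sr = 22050                               # sample rate in Hz (assumed)
t = np.arange(sr) / sr                   # one second of sample times
signal = np.sin(2 * np.pi * 440.0 * t)   # a pure 440 Hz tone ("A4")

# Real FFT: magnitudes of the positive-frequency bins
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)

# The loudest bin gives the dominant pitch
pitch = freqs[np.argmax(spectrum)]
print(f"Detected pitch: {pitch:.1f} Hz")  # ~440.0 Hz
```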

3

camaradorjk OP t1_iqvje9f wrote

Thank you so much for taking the time to answer my question. You're right on the first one: my goal is to transcribe flute music into notes. But I'm a little confused about why there's no need for a deep learning model, because I initially thought I could use sequential models. Could you elaborate on that for me? Thank you so much.

PS: I will surely look into your recommendation about FFT.

1

camaradorjk OP t1_iqvmzun wrote

Not really sheet music, just the musical notes. I will display the proper finger position on the flute based on the predicted notes. I'm trying to create a learning tool.

1

beingsubmitted t1_iqvs8ix wrote

All you need to know over time is the pitch being played, which is a frequency. The audio file represents a waveform, and all you need to know is the frequency of that waveform over time. There's no need for anything sequential: 440 Hz is "A" no matter where it comes in a sequence. It's A if it comes after C, and it's A if it comes after F#.

A sequential model might be useful for natural language, for example, because meaning is carried between words. "Very tall" and "not tall" are different things; "He was quite tall" and "No one ever accused him of not being tall" are remarkably similar things. Transcribing music is just charting the frequency over time.

That said, you cannot get the frequency from a single data point, so there is a somewhat sequential nature to things, but it's really just that you need to transform the positions of the waveform over time into frequencies, which the Fourier transform does. When music visualizations show you an EQ (equalizer) chart to go with your music, this is what they're doing: showing how much of each frequency is present at a given moment in the music, using an FFT. A digital equalizer similarly transforms audio into a frequency spectrum, lets you adjust that spectrum, and then transforms back into a waveform.
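In code, that frame-by-frame frequency tracking looks roughly like this (a sketch - the frame and hop sizes are arbitrary choices, and the two-note test signal is synthetic):

```python
# A rough sketch of the "frequency over time" idea: slice the audio into
# short frames and take an FFT of each one (a simple STFT).
import numpy as np

def dominant_pitches(signal, sr, frame=2048, hop=512):
    """Return the loudest frequency in each frame of `signal`."""
    window = np.hanning(frame)            # taper to reduce spectral leakage
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    pitches = []
    for start in range(0, len(signal) - frame, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame] * window))
        pitches.append(freqs[np.argmax(spectrum)])
    return np.array(pitches)

# Two notes back to back: A4 (440 Hz) then C5 (~523 Hz)
sr = 22050
t = np.arange(sr) / sr
audio = np.concatenate([np.sin(2 * np.pi * 440.0 * t),
                        np.sin(2 * np.pi * 523.25 * t)])
print(dominant_pitches(audio, sr))  # ~440 Hz for the first half, ~523 Hz after
```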

2

mr_birrd t1_iqxl8ol wrote

If it's pure recorder, and since a recorder can only play one note at a time, you could find the frequency using the Fourier transform - best if you apply denoising first. Then you can just map the loudest frequency to an actual note.
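That frequency-to-note mapping is just the standard equal-temperament / MIDI pitch formula, 69 + 12 * log2(f / 440); a small sketch of my own (not a library call):

```python
# Map a detected frequency to the nearest equal-tempered note name.
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def freq_to_note(freq_hz: float) -> str:
    """Map a frequency in Hz to the nearest note name with octave."""
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))  # MIDI pitch number
    octave = midi // 12 - 1       # MIDI 60 -> C4 ("middle C")
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(freq_to_note(440.0))   # A4
print(freq_to_note(523.25))  # C5
```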

1

BobDope t1_iqwqnaw wrote

JOMAMA Classification Model 192

0