Submitted by AutoModerator t3_11pgj86 in MachineLearning
RainbowRedditForum t1_jd64o4e wrote
A CRNN is trained with logmel as input, calculated as follows:
the input audio is split in 30ms frames with 10ms hop size, and 40 logmel are calculated for each frame.
The CRNN performs a binary classification.
With this setup, are these two considerations true?
- two consecutive output labels generated by the CRNN are associated with two overlapped audio frames (each of size 30ms (0.03s) and hop size 10ms);
- for 10 minutes audio the CRNN should generate about 30000 output labels, each one associated with a 30ms frame with 10ms of overlap
Viewing a single comment thread. View all comments