Comments

Eresbonitaguey t1_j6qic3n wrote

Possibly not the ideal solution, but I would suggest taking sections of the spectrogram as images (perhaps with overlap) and feeding them into a multi-label classifier. If you're after a bounding box, the upper and lower bounds should be apparent from where your classes sit within the spectrogram, i.e. each class's sound intensity occurs around a similar frequency. If transfer learning from a general image model, I would advise against using false colour to generate the three channels; instead, generate a different type of spectrogram for each channel (reassignment method, multi-taper, etc.). Due to the nature of spectrograms you don't really want scale invariance, so segmentation models that use feature pyramids can be problematic. I found decent success using Compact Convolutional Transformers, but that may not be what you need for your task.
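
For concreteness, a minimal sketch of the windowed multi-label setup, assuming PyTorch; the architecture, patch width, hop, and class count are all placeholders to tune:

```python
import torch
import torch.nn as nn

N_CLASSES, PATCH_W, HOP = 4, 128, 64    # placeholder sizes

class PatchClassifier(nn.Module):
    """Tiny CNN that emits one logit per class for a spectrogram patch."""
    def __init__(self, n_classes=N_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x):                # x: (batch, 1, freq_bins, PATCH_W)
        return self.head(self.features(x).flatten(1))   # raw logits

spec = torch.randn(1, 1, 80, 1000)       # one spectrogram: (1, 1, freq bins, frames)

# Overlapping patches along the time axis: (n_patches, 1, 80, PATCH_W).
patches = spec.unfold(3, PATCH_W, HOP).permute(3, 0, 1, 2, 4).squeeze(1)

model = PatchClassifier()
probs = torch.sigmoid(model(patches))    # per-patch, per-class probabilities

# Training would use nn.BCEWithLogitsLoss against multi-hot patch labels.
```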

jiamengial t1_j6sj3l2 wrote

Using something like a CTC loss might be a good shout: you could basically say you're doing "speech recognition", but instead of recognising (sub)words you're recognising event classes.
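
A minimal sketch of that framing with PyTorch's nn.CTCLoss (sizes are illustrative; class index 0 is reserved for the CTC blank):

```python
import torch
import torch.nn as nn

T, N, C = 200, 8, 5                   # frames, batch size, 4 event classes + blank

# Stand-in for per-frame model outputs (e.g. from an RNN over the spectrogram).
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)

targets = torch.randint(1, C, (N, 10))             # unaligned class sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```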

uhules t1_j6sp5wk wrote

CTC is better suited for unaligned sequences; if OP has precise timings for the sound events, plain frame-wise classification should work better.
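
For comparison, a minimal frame-wise sketch under the same assumptions (PyTorch; one aligned label per frame, class 0 as "background"; all sizes are placeholders):

```python
import torch
import torch.nn as nn

N_CLASSES = 5                                     # 4 events + background
frames = torch.randn(8, 200, 64)                  # (batch, n_frames, features)
labels = torch.randint(0, N_CLASSES, (8, 200))    # aligned label per frame

rnn = nn.GRU(64, 32, batch_first=True, bidirectional=True)
head = nn.Linear(64, N_CLASSES)                   # 64 = 2 directions * 32

hidden, _ = rnn(frames)                           # (8, 200, 64)
logits = head(hidden)                             # (8, 200, N_CLASSES)

# Plain cross-entropy over every frame independently.
loss = nn.functional.cross_entropy(
    logits.reshape(-1, N_CLASSES), labels.reshape(-1))
loss.backward()
```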

jiamengial t1_j6t854s wrote

That's true. I was thinking that flat frame-wise predictions could lead to incorrect mid-segment predictions, which might be an annoying kind of model error to deal with.
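
One common heuristic for those isolated mid-segment flips is to median-filter the per-frame argmax labels; a toy sketch (the kernel size is a placeholder, and this assumes segments are long relative to the kernel):

```python
import numpy as np
from scipy.ndimage import median_filter

pred = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1])  # raw frame labels
smoothed = median_filter(pred, size=5)   # lone flips get voted out by neighbours
print(smoothed)                          # -> [0 0 0 0 0 0 0 1 1 1 1 1 1 1]
```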

uhules t1_j6spu3f wrote

What kind of model will work in this case depends heavily on data availability and the quality of your annotations. Check the datasets on Papers With Code, see whether any of them is similar enough to your setting, and pick models or code from their leaderboards.

1bir t1_j6s0ito wrote

Possible solution:

  • Train minirocket/hydra, which were designed for time series classification, on the labelled dataset (probably as four one-vs-rest problems, e.g. s1 vs the rest, s2 vs the rest, etc.)
  • You'll get sets of 1D convolutional kernels; these can be convolved with time series of any length
  • Only one of these kernel sets should 'fire' strongly for each heartbeat phase, so you should get a univariate signal per phase
  • Convolve these kernel sets with your unsegmented data
  • Segment the data based on the strongest signal corresponding to the relevant phase of the heartbeat.

You may need to apply some transformations to the signals to get this to work well, though (e.g. softmax and/or smoothing, or some kind of changepoint detection, which I don't know much about).
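
A rough sketch of this route using sktime's MiniRocket plus a ridge classifier (one readily available implementation; rather than pulling out raw kernels, it scores fixed-length windows slid over the unsegmented recording, and every size below is a placeholder):

```python
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import MiniRocket

WIN, HOP = 400, 50
X_train = np.random.randn(100, 1, WIN)        # 100 labelled snippets, univariate
y_train = np.random.randint(0, 4, 100)        # phases s1..s4 encoded as 0..3

rocket = MiniRocket().fit(X_train)            # learns the convolutional features
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))
clf.fit(rocket.transform(X_train), y_train)   # effectively four one-vs-rest scores

recording = np.random.randn(1, 1, 10_000)     # long unsegmented series
starts = range(0, recording.shape[2] - WIN + 1, HOP)
windows = np.concatenate([recording[:, :, s:s + WIN] for s in starts])

scores = clf.decision_function(rocket.transform(windows))  # (n_windows, 4)
phase_per_window = scores.argmax(axis=1)      # strongest-firing phase per window
```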
