Submitted by t3_10wffmg in MachineLearning

I have had a lot of fun and success using YOLO and other image object detection models on 2D or 3D image data for personal projects.

I am now working on some projects where I need to scan long periods of timeseries data and find specific waveforms of variable duration.

Are there techniques or models that function like YOLO, scanning large amounts of data and highlighting only specific segments of interest as specific classes?

If nothing like that exists, I wonder how well YOLO's underlying CNN architecture would translate to a 1D CNN.

Any info is appreciated, thanks!

0

Comments

t1_j7puon5 wrote

How you would approach this really depends on a few things. The most important question is: do you have the target data you want to get out of the network? It is possible, in some cases, to highlight regions of interest using only sample-level classification data, but this is usually very context-specific. If you have target data where these regions are already specified, a normal supervised learning method for waveforms should be perfectly workable, and will likely use 1D CNNs.

2

OP t1_j7rkrc3 wrote

I would ideally be able to go in and select a start time and end time for each event within the longer timeseries and assign it a class, like you do for YOLO training labels, but in 1D. I can assemble and label the dataset, but the labeled segments will be extremely variable in length.
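
For concreteness, a hypothetical label schema for this could be as simple as one (class, start, end) tuple per event; the class names and sample indices below are made up for illustration:

```python
# Hypothetical 1D analogue of YOLO box labels: one (class, start, end)
# entry per event, with start/end given as sample indices into the timeseries.
labels = [
    ("event_a", 12_400, 13_950),   # class name, start sample, end sample
    ("event_b", 58_010, 58_120),   # segments can differ wildly in length
]
```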

1

t1_j7w5otz wrote

So, the simple strategy here, which kind of ignores your variable-length objects, is to classify the CNN's receptive fields directly, and then max-pool the resulting classification frames.

So, let's say your sequence length is 1024. You build a CNN with a receptive field of 32 and a stride of 16. Applied to the sequence, this network produces 63 "frames", since (1024 - 32) / 16 + 1 = 63. Typically, the CNN would expand this representation to a large number of channels, apply GlobalMaxPooling to merge the frames' information, and then classify the sample.

Instead, you should classify the frames directly, meaning your output is 63 separate sigmoid classifications, each associated with a region of the signal. Then you simply take the maximum of the classification likelihoods across frames and use that as your sequence-level classification.

After training, you can remove the GlobalMaxPooling step and look at the frame-level segment classifications directly.
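
A minimal PyTorch sketch of this frame-wise scheme, using the 1024/32/16 numbers above (the layer sizes and the single strided conv are illustrative assumptions, not a reference implementation):

```python
import torch
import torch.nn as nn

class FrameClassifier(nn.Module):
    def __init__(self, n_classes: int = 1):
        super().__init__()
        # One strided conv gives a receptive field of 32 and a stride of 16,
        # so a length-1024 input yields (1024 - 32) / 16 + 1 = 63 frames.
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=32, stride=16),
            nn.ReLU(),
        )
        # A 1x1 conv acts as an independent sigmoid classifier per frame.
        self.frame_logits = nn.Conv1d(64, n_classes, kernel_size=1)

    def forward(self, x):                              # x: (batch, 1, 1024)
        frames = self.frame_logits(self.features(x))   # (batch, n_classes, 63)
        frame_probs = torch.sigmoid(frames)
        # Max over frames -> one sequence-level prediction for training.
        seq_probs = frame_probs.max(dim=-1).values     # (batch, n_classes)
        return seq_probs, frame_probs

model = FrameClassifier()
seq_probs, frame_probs = model(torch.randn(8, 1, 1024))
# Train with nn.BCELoss on seq_probs against sequence-level labels; at
# inference, read frame_probs to localize the segments of interest.
```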

1

t1_j7murs8 wrote

[deleted]

0

OP t1_j7mvn63 wrote

I actually tried this, but it didn't work very well for some reason. Maybe I need to change how the line plot looks; I kept my x- and y-axis scaling consistent. I also tried making 2D scalograms using wavelet transforms.
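
For the scalogram side, a minimal sketch using PyWavelets' continuous wavelet transform (the wavelet choice and scale range here are arbitrary assumptions):

```python
import numpy as np
import pywt

signal = np.random.randn(1024)    # stand-in for a real timeseries window
scales = np.arange(1, 65)         # arbitrary scale range
coeffs, freqs = pywt.cwt(signal, scales, 'morl')
scalogram = np.abs(coeffs)        # (64, 1024) array, usable as a 2D image
```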

1

t1_j7nmza4 wrote

You should look at audio networks. I'm not very familiar with recent work in this area, but one reasonable idea is to produce a spectrogram from the data and use an image detector on it.
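
A short sketch of that spectrogram idea with scipy.signal.spectrogram (the sampling rate and window parameters are illustrative assumptions):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 1000.0                           # assumed sampling rate in Hz
x = np.random.randn(10 * int(fs))     # stand-in for 10 s of timeseries
f, t, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
img = 10 * np.log10(Sxx + 1e-10)      # log-scaled "image" for a 2D detector
```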

1D CNNs are also a thing, although I think it's harder for them to achieve frequency/scale invariance.

0