Gazorpazzor

Gazorpazzor t1_ixq5oa5 wrote

Image recognition models are usually the ones used for video recognition too, applied frame by frame. YOLO models are a common choice for video thanks to their near real-time inference speed.

I invite you to check the YOLOv7 GitHub page; the main page also includes a script that runs their model on video.
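To make the idea concrete, here is a minimal sketch of running an image model over a video frame by frame. It is not the YOLOv7 script; `detect_on_video`, `fake_detector`, and the `stride` parameter are hypothetical names used only for illustration, and the "frames" are placeholder strings standing in for decoded images.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def detect_on_video(
    frames: Iterable[Any],
    detector: Callable[[Any], List[Any]],
    stride: int = 1,
) -> Iterator[Tuple[int, List[Any]]]:
    """Run an image-level detector on every `stride`-th frame of a video.

    `frames` is any iterable of decoded frames and `detector` is any callable
    that maps one frame to a list of detections (both are assumptions here).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    for index, frame in enumerate(frames):
        # Skipping frames is a simple way to trade accuracy for speed.
        if index % stride == 0:
            yield index, detector(frame)


def fake_detector(frame: Any) -> List[Any]:
    """Stand-in for a real model such as YOLO; returns one dummy detection."""
    return [("object", frame)]


# Placeholder frames; in practice these would be image arrays read from a file.
demo_frames = ["f0", "f1", "f2", "f3", "f4"]
demo_results = list(detect_on_video(demo_frames, fake_detector, stride=2))
```

In a real setup the frames would come from a video reader and the detector would wrap the trained model; the loop itself stays the same.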

1

Gazorpazzor t1_ixc37ng wrote

Hello,

  1. Extract features using TF-IDF (a good fit if the classification is likely driven by a few specific words).
  2. Train an SVM classifier. In your case, with few data samples, I would train several classifiers with different hyperparameters and keep the best model. NN architectures like GRUs and LSTMs can give decent results, but they will probably need more data to do so.
  3. Increase your iterations / epochs to compensate for the really small dataset size (keep an eye on the evaluation set loss to prevent overfitting).
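Steps 1 and 2 could be sketched roughly like this with scikit-learn. The library choice, the toy texts and labels, the parameter grid, and the `build_search` helper are all assumptions for illustration, not something from your setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_search(cv=3):
    """TF-IDF features feeding a linear SVM, tuned by a small grid search."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("svm", LinearSVC()),
    ])
    # Hypothetical grid: try a few settings and keep the best model.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "svm__C": [0.1, 1.0, 10.0],
    }
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1_macro")


# Toy data, made up for the example.
texts = [
    "refund my payment", "wrong charge on invoice", "billing error again",
    "cancel my subscription fee", "invoice shows double payment", "refund the charge",
    "app crashes on start", "cannot log in to account", "error when uploading file",
    "screen freezes after update", "login page crashes", "upload fails with error",
]
labels = ["billing"] * 6 + ["tech"] * 6

search = build_search(cv=3)
search.fit(texts, labels)
best_model = search.best_estimator_  # refit on all data with the best settings
```

Macro F1 is used for scoring here because plain accuracy can look good on imbalanced data while the small class is ignored.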

As for the data imbalance problem, I would first try undersampling the 4000-sample class down to 250 samples, then try to improve results later with data augmentation or cost-sensitive algorithms (cost-sensitive SVM, weighted cross-entropy, ...).
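A rough sketch of both options in plain Python follows. The `undersample` and `balanced_class_weights` helpers are hypothetical, and the demo assumes the smaller class has about 250 samples, which is my reading of the target size rather than a known figure. The weight formula is the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter, defaultdict


def undersample(samples, labels, max_per_class, seed=0):
    """Randomly keep at most `max_per_class` samples from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Classes already at or below the cap are kept whole.
        kept = items if len(items) <= max_per_class else rng.sample(items, max_per_class)
        out_samples.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_samples, out_labels


def balanced_class_weights(labels):
    """Per-class weights for cost-sensitive training: rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Assumed class sizes for the demo: 4000 vs 250.
demo_labels = ["big"] * 4000 + ["small"] * 250
demo_samples = list(range(len(demo_labels)))

small_x, small_y = undersample(demo_samples, demo_labels, max_per_class=250)
weights = balanced_class_weights(demo_labels)
```

Undersampling throws away data from the large class, so the weights route is worth trying afterwards: they can be passed as class weights to an SVM or used to scale a cross-entropy loss while training on the full set.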

2