
Gazorpazzor t1_ixc37ng wrote

Hello,

  1. Extract features using TF-IDF (a good fit if the classification is likely driven by a few specific words).
  2. Train an SVM classifier. (In your case, with so few data samples, I would train several classifiers with different hyperparameters and keep the best model. NN architectures like GRUs and LSTMs can give decent results, but unfortunately they will probably need more data to do so.)
  3. Increase your iterations / epochs to compensate for the really small dataset size (keep an eye on the evaluation set loss to prevent overfitting).
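
Steps 1 and 2 could be sketched roughly like this with scikit-learn. This is only a minimal sketch: the toy `texts` and `labels`, the choice of `LinearSVC`, and the values in `param_grid` are all assumptions made up for illustration, not something tuned for your data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus, invented for illustration only (replace with your own data).
texts = [
    "refund my order please", "i want a refund now",
    "refund was not received", "please refund the payment",
    "great product love it", "love the fast delivery",
    "really great service", "great quality love this",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Step 1 (TF-IDF features) and step 2 (SVM) chained in one pipeline,
# so the vectorizer is re-fitted on the training part of every CV fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC()),
])

# Hypothetical hyperparameter grid: train one model per combination
# and keep the one with the best cross-validated score.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(texts, labels)

best_model = search.best_estimator_
predictions = best_model.predict(["i need a refund", "love this great product"])
print(search.best_params_, predictions)
```

Macro-averaged F1 is used for scoring here because plain accuracy can look good on imbalanced data even when the small class is ignored; any other metric that fits the task would work too.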

As for the data imbalance problem, I would first try undersampling the 4000-sample class down to 250 samples, then try to improve results later on with data augmentation or cost-sensitive algorithms (cost-sensitive SVM, weighted cross-entropy, ...).
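
Both options can be sketched in plain Python. The helper names `undersample` and `balanced_class_weights` are hypothetical, the data is a placeholder, and the weight formula `n_samples / (n_classes * class_count)` is just one common heuristic for cost-sensitive training, so treat this as a starting point under those assumptions.

```python
import random
from collections import Counter


def undersample(samples, labels, target_per_class, seed=0):
    """Randomly keep at most `target_per_class` samples of each class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    kept = []
    for label, items in by_class.items():
        if len(items) > target_per_class:
            items = rng.sample(items, target_per_class)
        kept.extend((item, label) for item in items)

    rng.shuffle(kept)  # avoid having the classes grouped together
    return [s for s, _ in kept], [y for _, y in kept]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Placeholder data mirroring the sizes mentioned above: 4000 vs. 250 samples.
samples = [f"majority text {i}" for i in range(4000)] + [f"minority text {i}" for i in range(250)]
labels = [0] * 4000 + [1] * 250

small_samples, small_labels = undersample(samples, labels, target_per_class=250)
weights = balanced_class_weights(labels)
print(Counter(small_labels), weights)
```

The undersampled set is what you would train on first. The weights are the alternative for later: a dictionary like this can be passed to a cost-sensitive learner, for example through the `class_weight` parameter of scikit-learn's SVM classifiers, while keeping all 4000 samples.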


bankCC t1_ixc6lk0 wrote

Thank you very much for the answer! I highly appreciate it. You gave me a really good base to start from. Huge thanks!
