Submitted by rich_atl t3_ydpnal in MachineLearning
m98789 t1_ittly1y wrote
There are several strategies for combining multimodal data. Here are some simple approaches:
-
First train the CNN classifier. Then use it as a feature extractor by taking the feature vector from the penultimate layer. Concatenate those image features with the features from your tabular data, and then train a classifier like XGBoost on the combined vector.
-
If you want to train both your feature extractor and classifier end to end, you could try different strategies for encoding the tabular data into the input tensor. A simple and fun one to try is to encode the tabular values visually into the images themselves, for example by adding a few more pixel rows at the bottom of each image. One row can represent the country (colored uniquely by a country index), and so on.
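Here's a rough sketch of the data-handling side of both approaches, using only NumPy. It isn't a full pipeline: the CNN and the XGBoost training step are left out, and the function names (`fuse_features`, `index_to_color`, `append_tabular_rows`) and the index-to-RGB scheme are just made up for illustration.

```python
import numpy as np


def fuse_features(image_features, tabular_features):
    """Approach 1: concatenate per-sample CNN features with tabular features."""
    image_features = np.asarray(image_features, dtype=np.float32)
    tabular_features = np.asarray(tabular_features, dtype=np.float32)
    if image_features.shape[0] != tabular_features.shape[0]:
        raise ValueError("both inputs need one row per sample")
    # Result has shape (n_samples, n_image_features + n_tabular_features).
    return np.concatenate([image_features, tabular_features], axis=1)


def index_to_color(index):
    """Map an integer in [0, 2**24) to a unique RGB triple (assumed scheme)."""
    if not 0 <= index < 2**24:
        raise ValueError("index must fit in 24 bits")
    rgb = [(index >> 16) & 0xFF, (index >> 8) & 0xFF, index & 0xFF]
    return np.array(rgb, dtype=np.uint8)


def append_tabular_rows(image, category_indices, rows_per_field=1):
    """Approach 2: add solid-color pixel rows at the bottom of an H x W x 3 image.

    Each categorical field (e.g. a country index) gets its own band of rows.
    """
    image = np.asarray(image, dtype=np.uint8)
    width = image.shape[1]
    bands = [
        np.tile(index_to_color(i), (rows_per_field, width, 1))
        for i in category_indices
    ]
    return np.concatenate([image] + bands, axis=0)


# Stand-in data: 4 samples with 512 CNN features and 3 tabular columns each.
fused = fuse_features(np.ones((4, 512)), np.zeros((4, 3)))
print(fused.shape)

# A blank 32 x 32 image with two categorical fields appended as extra rows.
encoded = append_tabular_rows(np.zeros((32, 32, 3), dtype=np.uint8), [5, 300])
print(encoded.shape)
```

One caveat with this particular color mapping: neighboring indices end up with almost identical colors, so you'd probably want to spread them further apart (or use wider bands) before expecting a CNN to tell them apart.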