Submitted by Thijs-vW t3_y9xjnh in MachineLearning
One-hot encoding is a popular method for encoding categories due to its simplicity and interpretability. Its interpretability also makes it a good fit for simple machine learning algorithms (such as polynomial regression, which, yes, I classify as AI). Nonetheless, there are alternatives, such as base-n encoding, which I find appealing. However, I am not sure which encoding method neural networks prefer.
What is your advice? Should you prefer one-hot encoding, base-n encoding (and which n should you choose), or some other method?
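For readers unfamiliar with the two schemes mentioned above, here is a minimal sketch of the difference (the function names and the NumPy implementation are my own, just for illustration): one-hot gives each category its own indicator column, while base-n packs the category index into far fewer digit columns.

```python
import numpy as np

def one_hot(categories, n_classes):
    """One-hot: each category index becomes a length-n_classes indicator vector."""
    out = np.zeros((len(categories), n_classes), dtype=int)
    out[np.arange(len(categories)), categories] = 1
    return out

def base_n(categories, n_classes, base=2):
    """Base-n: write each index in base `base`, one column per digit."""
    width = max(1, int(np.ceil(np.log(n_classes) / np.log(base))))
    out = np.zeros((len(categories), width), dtype=int)
    for i, c in enumerate(categories):
        for d in range(width):
            out[i, width - 1 - d] = (c // base**d) % base
    return out

cats = [0, 1, 2, 3]
print(one_hot(cats, 4))  # 4 columns, identity-like rows
print(base_n(cats, 4))   # base-2 -> only 2 columns: 00, 01, 10, 11
```

So for k categories, one-hot needs k columns while base-n needs only ceil(log_n(k)), at the cost of losing the "one column per category" interpretability.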
Travolta1984 t1_it80hch wrote
If you are using a neural network model, I would use embedding layers instead, as they can learn a latent representation of your features (something that one-hot encoding can't do, for example).
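To make the suggestion above concrete: an embedding layer is essentially a trainable lookup table, one dense vector per category, learned jointly with the rest of the network. Here is a minimal NumPy sketch of the lookup itself (in practice you would use a framework layer such as `torch.nn.Embedding` or `tf.keras.layers.Embedding`; the sizes below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories, embed_dim = 10, 4

# The embedding table: one row (a dense vector) per category.
# In a real network these rows are parameters updated by backprop.
embedding_table = rng.normal(size=(n_categories, embed_dim))

category_ids = np.array([3, 7, 3])       # integer-encoded categorical inputs
vectors = embedding_table[category_ids]  # the "forward pass" is just row lookup

print(vectors.shape)  # (3, 4): each input id mapped to a 4-d dense vector
```

Note that id 3 appears twice and gets the same vector both times; during training, gradients flow only into the rows that were actually looked up, which is how related categories can end up with similar vectors.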