Submitted by AutoModerator t3_10oazg7 in MachineLearning
RogerKrowiak t1_j6diy8w wrote
I have a very basic question. If I have two columns of data:
"Students": ["John", "John", "Roger", "Eve", "John"]
"Sex": ["M", "M", "M", "F", "M"]
can I use a different encoding for each column? E.g. frequency encoding for students and binary for sex? Thank you for your answer. If you have a tip for basic readings on this, it would be appreciated.
Maleficent-Rate6479 t1_j6fx4hp wrote
If your response variable is sex, then you need to make it binary; otherwise I do not see a problem, I think.
qalis t1_j6ir4fh wrote
Yes, you can. In tabular learning, each variable can (in general) be preprocessed independently of the others. In fact, in most cases you will apply different preprocessing to different columns, e.g. one-hot + SVD for high-cardinality categorical variables, binary encoding for simple binary choices, integer encoding for ordinal variables.
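As a rough illustration of per-column encoding on the toy data from the question, here is a minimal sketch in plain Python. The helper names `frequency_encode` and `binary_encode` are made up for this example, and mapping "F" to 1 is an arbitrary choice, not something the thread specifies.

```python
from collections import Counter

# Toy data copied from the question above.
data = {
    "Students": ["John", "John", "Roger", "Eve", "John"],
    "Sex": ["M", "M", "M", "F", "M"],
}

def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]

def binary_encode(values, positive):
    """Map a two-valued column to 0/1, with `positive` mapped to 1."""
    return [1 if v == positive else 0 for v in values]

# Each column gets its own encoder; neither depends on the other.
encoded = {
    "Students": frequency_encode(data["Students"]),
    "Sex": binary_encode(data["Sex"], positive="F"),
}
print(encoded)
```

Note that frequency encoding maps "Roger" and "Eve" to the same value here, since each appears once; whether that collision is acceptable depends on the model and the task. In practice the frequencies would normally be computed on the training split only and then reused for validation and test data.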