Submitted by MichelMED10 t3_y5ylmc in MachineLearning
[removed]
In some applications, one way to represent dates with periodic cycles is to encode the position within the year, month, and week as sin/cos pairs. For example, if you think the yearly period (seasons) is meaningful, create two features: cos(2π · year_day / 365) and sin(2π · year_day / 365). (In financial applications day of month makes sense; in consumer traffic, day of week.)
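A minimal sketch of that sin/cos encoding, using only the Python standard library. The helper names (`cyclical_features`, `encode_date`) and the choice of 365.25 days as the yearly period are assumptions for illustration, not something from the comment above.

```python
import math
from datetime import date

def cyclical_features(value, period):
    """Map a position within a cycle to a (sin, cos) pair on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def encode_date(d):
    """Encode the yearly and weekly position of a date as sin/cos features."""
    day_of_year = d.timetuple().tm_yday - 1   # 0-based, Jan 1 -> 0
    day_of_week = d.weekday()                 # Monday -> 0, Sunday -> 6
    # Assumption: 365.25 approximates the year length across leap years.
    year_sin, year_cos = cyclical_features(day_of_year, 365.25)
    week_sin, week_cos = cyclical_features(day_of_week, 7)
    return {
        "year_sin": year_sin, "year_cos": year_cos,
        "week_sin": week_sin, "week_cos": week_cos,
    }

features = encode_date(date(2024, 1, 1))
```

Using both sin and cos keeps the end of a cycle close to its start (December 31 lands next to January 1), which a single raw day number would not do.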
Yes, by embedding I meant transforming each number of months into a vector, like nn.Embedding in PyTorch (knowing that the difference between dates can't be more than 5 years, so 60 months). Thanks for the answer!
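For that idea, the input to the embedding would be an integer month difference. A rough sketch of how such an index could be computed, assuming whole elapsed months and a clamp to the 0..60 range mentioned above (the function names and the clamping behaviour are illustrative assumptions). The resulting index could then be passed to a lookup table with 61 rows, for example `nn.Embedding(61, dim)` in PyTorch.

```python
from datetime import date

MAX_MONTHS = 60  # 5 years, the upper bound stated in the thread

def months_between(earlier, later):
    """Number of whole months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the last month is not complete yet
    return months

def month_index(earlier, later, max_months=MAX_MONTHS):
    """Clamp the month difference to [0, max_months] so it is a valid table index."""
    return max(0, min(months_between(earlier, later), max_months))

idx = month_index(date(2017, 10, 17), date(2022, 10, 17))
```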
I have pairs of dates and procedures/tests, and their results. So having the date is important (for example, a patient had cancer 5 years ago and was treated with conization).
In this case I think only the difference between dates matters
So maybe you could convert every date to a Unix timestamp, which is an integer, then take the difference between those integers, then use a standard or min-max scaler to map it into a fixed interval.
I do not think anyone has ever encoded dates as embeddings the way you're proposing, simply because you can already get this kind of representation from Unix timestamps.
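A small sketch of that pipeline with the standard library only. The function names are made up for illustration, and treating naive datetimes as UTC is an assumption; a library scaler such as scikit-learn's `MinMaxScaler` would do the last step equally well.

```python
from datetime import datetime, timezone

def to_unix_seconds(d):
    """Convert a datetime to a Unix timestamp (seconds since 1970-01-01 UTC)."""
    if d.tzinfo is None:
        # Assumption: naive datetimes are interpreted as UTC.
        d = d.replace(tzinfo=timezone.utc)
    return d.timestamp()

def elapsed_seconds(earlier, later):
    """Difference between two dates, expressed in seconds."""
    return to_unix_seconds(later) - to_unix_seconds(earlier)

def min_max_scale(values):
    """Rescale values linearly onto [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical example data: (earlier, later) date pairs.
pairs = [
    (datetime(2020, 1, 1), datetime(2020, 1, 2)),
    (datetime(2020, 1, 1), datetime(2020, 1, 11)),
    (datetime(2020, 1, 1), datetime(2020, 1, 21)),
]
diffs = [elapsed_seconds(a, b) for a, b in pairs]
scaled = min_max_scale(diffs)
```

In practice the minimum and maximum would be computed on the training set and reused for new data, so that the scaling stays consistent.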
What seems to matter here is not the dates but rather the amount of time between scans, right?
Gotcha. In that case I'd use a sinusoidal embedding like others have suggested. Another alternative is normalizing all of the dates onto some small range, e.g. [0, 1].
Marvsdd01 t1_ismmzln wrote
If I understood you correctly, you can handle dates and differences between dates as differences between the Unix timestamp representations of those dates. Any programming language should have a time-manipulation library that offers APIs for converting dates to their Unix timestamp values. It is one approach, but it has its limitations. Using months, days, and years as separate features is also possible. Using cyclical encoding of dates is also possible, but I usually see this kind of thing only when dealing with the hours, minutes, and seconds of a date. Embedding these dates, if we're talking about using an ML algorithm to generate these representations, seems like a really, really bad idea because, in my view, it adds work without adding any benefit to your solution. If you're not talking about that, then sorry, but I couldn't understand what you meant by these "embeddings of dates" :)