Submitted by snairgit t3_ywu1vf in MachineLearning
[removed]
This is an interesting thought. The two do have some inherent relationship, but defining it precisely would be difficult. I'll look into it though, thanks for your input.
Do you have one label for each set of position and speed measurements, or are you trying to predict a label from sequences of measurements? Tbh this description is very limited and unclear.
I understand, I'll update the main post with more info.
To answer your question, a label corresponds to all the readings of a subject. Here, a subject's readings are a combination of 8 sets of readings taken from different sides, with 5 values for each side. The easiest/closest example I can think of is taking multiple snapshots of an object from different sides to build a 3D model; it's similar to that. My question is about how to efficiently wrap this data up for each subject.
I’m not sure what you mean by wrap that up. What programming language are you using? Python? You can create a pandas dataframe that contains one row per subject and per timestamp, and as many columns as measurements. Then, depending on the problem, you would transform that and engineer features that work with the type of model you’re thinking of using.
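For example, a rough pandas sketch (the column names and the toy data are just placeholders for whatever your setup actually looks like):

```python
import numpy as np
import pandas as pd

# Toy data in the shape described here: for each subject, 8 sides x 5 readings
# x 2 measurements (velocity, distance). Real data would replace this.
rng = np.random.default_rng(0)
data = {f"subject_{i}": rng.random((8, 5, 2)) for i in range(3)}

# Long format: one row per subject, side and reading, with the two
# measurements kept as separate numeric columns.
records = []
for subject_id, readings in data.items():
    for side in range(8):
        for t in range(5):
            velocity, distance = readings[side, t]
            records.append({"subject": subject_id, "side": side, "reading": t,
                            "velocity": velocity, "distance": distance})

df = pd.DataFrame(records)
print(df.head())
```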
Ya, this is almost what I did. My doubt is this: within a row, how do I put the velocity and distance together as a single feature? For example, row 1, column 1 would be: 1st of the 8 sides -> 1st of the 5 readings -> (velocity, distance). How do I represent (velocity, distance) here? Do I keep them as a tuple, as separate adjacent features, or in some other format? This is exactly where I'm stuck.
This seems more like a programming question than a machine learning one. It depends on your model implementation. Some frameworks don't even require any further preprocessing. Just leave each individual reading consistently separated in its own column. No need to combine velocity and distance manually.
Thanks for your input, I will try that out. The features do have a relationship, which is why I was wondering whether packaging that into the input would make sense. I'll do some experiments and see how it turns out.
So it seems like you have 8 * 5 * 2 = 80 features per training sample if you want to use all 8 sides to predict your binary label. Is this representation causing problems?
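As a minimal sketch of that flat representation (the toy array just stands in for the real readings):

```python
import numpy as np

# Toy array: n_subjects x 8 sides x 5 readings x 2 measurements.
rng = np.random.default_rng(0)
readings = rng.random((3, 8, 5, 2))

# One training sample per subject: flatten everything into an 80-dim vector.
X = readings.reshape(len(readings), -1)
print(X.shape)  # (3, 80)
```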
Ya, true, but it's a bit more than that. I've added an explanation in another comment; adding that below.
My doubt is this: within a row, how do I put the velocity and distance together as a single feature? For example, row 1, column 1 would be: 1st of the 8 sides -> 1st of the 5 readings -> (velocity, distance). How do I represent (velocity, distance) here? Do I keep them as a tuple, as separate adjacent features, or in some other format? This is exactly where I'm stuck.
My confusion/question is about the right way to put these values together. Do I directly stack them as 80 features like you mentioned, or do I adopt another format in which the 2 values (velocity, time) are bundled together, like a tuple or a dict or something else? Thanks.
About representing your features - I would not feed float values directly to a neural net. I think you either need to discretise the values or embed them, like absolute positional embeddings in transformers. Or try using a SIREN on your float values directly.
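If you go the embedding route, one option is a sinusoidal encoding of each scalar, loosely in the spirit of transformer positional embeddings (just a sketch; the dimension and frequency range are arbitrary choices):

```python
import numpy as np

def sinusoidal_embed(x, dim=16, max_freq=10.0):
    """Map scalar value(s) to a dim-dimensional sin/cos embedding."""
    x = np.asarray(x, dtype=np.float64)[..., None]                   # (..., 1)
    freqs = np.geomspace(1.0, max_freq, dim // 2)                    # (dim/2,)
    angles = x * freqs                                               # (..., dim/2)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

print(sinusoidal_embed(0.37).shape)  # (16,)
```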
Okay, thanks. I'll keep that in mind and I'll look into it.
You can represent it as 5 timestamps of 2 features with 8 channels, i.e. (5, 2, 8). This can be used as input to a 2D CNN. Alternatively, you can represent it as (5, 2 x 8), i.e. 5 timestamps of 16 features. This is the sort of input expected by a 1D CNN.
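A rough PyTorch sketch of those two layouts, shapes only (the conv hyperparameters are placeholders, not a recommendation):

```python
import torch
import torch.nn as nn

n = 4  # batch size (placeholder)

# Layout 1: the 8 sides as channels over a 5 x 2 "image" -> 2D convolution.
x2d = torch.randn(n, 8, 5, 2)                     # (batch, channels=8, readings=5, features=2)
conv2d = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=(3, 2))
print(conv2d(x2d).shape)                          # torch.Size([4, 16, 3, 1])

# Layout 2: stack the sides' features -> 16 channels over 5 timestamps -> 1D convolution.
x1d = x2d.permute(0, 2, 1, 3).reshape(n, 5, 16)   # (batch, readings=5, features=16)
conv1d = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)
print(conv1d(x1d.permute(0, 2, 1)).shape)         # torch.Size([4, 32, 3])
```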
This is a good thought, I hadn't thought of it this way. I'll try it out and see if it's possible to represent the data this way. Thanks.
thiru_2718 t1_iwlth7x wrote
You can add "velocity" and "time" as separate features, but if there's a known physical relationship between the two, you can encode that relationship more efficiently by combining them into a single feature. For example, you can capture the kinematics of the data by adding a "position" feature, where position_{i+1} = position_i + velocity_{i+1} * (time_{i+1} - time_i).
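For instance, with one subject's readings in a dataframe, that derived feature could be computed along these lines (a sketch; the column names and values are made up):

```python
import pandas as pd

# Toy readings for one subject: timestamps and velocities (arbitrary units).
df = pd.DataFrame({"time": [0.0, 0.5, 1.0, 1.5, 2.0],
                   "velocity": [2.0, 2.5, 3.0, 2.0, 1.0]})

# position_{i+1} = position_i + velocity_{i+1} * (time_{i+1} - time_i), starting from 0.
dt = df["time"].diff().fillna(0.0)
df["position"] = (df["velocity"] * dt).cumsum()
print(df)
```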