snairgit OP t1_iwlfm3y wrote on November 16, 2022 at 2:46 PM

Reply to comment by jobeta in [P] Thoughts on representing a Real world data by snairgit

I understand, I'll update the main post with more info.

To answer your question, a label corresponds to all the readings of a subject. Here, "all the readings" means the combination of 8 consecutive readings taken from different sides, each of which has 5 values. The closest example I could think of is taking multiple snapshots of an object from different sides to create a 3D model. My question is how to efficiently wrap this data up for each subject.

jobeta t1_iwlin8p wrote on November 16, 2022 at 3:08 PM

I’m not sure what you mean by "wrap that up". What programming language are you using? Python? You can create a pandas DataFrame that contains one row per subject and per timestamp, with as many columns as measurements. Then, depending on the problem, you would transform that and engineer features that work with the type of model you’re thinking of using.
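A minimal sketch of that layout, assuming Python with pandas. The column names and every value below are made up for illustration, and the per-subject mean/max summary is just one possible way to engineer features, not a recommendation for this particular dataset:

```python
import pandas as pd

# Hypothetical data: one row per subject and timestamp,
# one column per measurement. All values are placeholders.
long_df = pd.DataFrame({
    "subject":   ["A", "A", "A", "B", "B", "B"],
    "timestamp": [0, 1, 2, 0, 1, 2],
    "velocity":  [1.2, 1.5, 1.1, 0.9, 1.0, 1.3],
    "distance":  [10.0, 12.5, 9.8, 7.4, 8.1, 11.0],
})

# One example of feature engineering: collapse each subject's
# rows into a single row of summary statistics.
features = long_df.groupby("subject")[["velocity", "distance"]].agg(["mean", "max"])

# Flatten the (measurement, statistic) column pairs into plain names
# such as "velocity_mean" and "distance_max".
features.columns = [f"{col}_{stat}" for col, stat in features.columns]

print(features)
```

The long format keeps the raw readings intact, so a different aggregation or a sequence model can be tried later without rebuilding the table.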

snairgit OP t1_iwlne50 wrote on November 16, 2022 at 3:41 PM

Ya, this is almost what I did. My doubt is this: for a row, how do I put the velocity and time together in a single feature? For example, row 1 column 1 would be: 1st of 8 sides -> 1st of 5 values -> velocity, distance. How do I represent (velocity, distance) here? Do I keep them as a tuple, as separate adjacent features, or in some other format? This is exactly where I'm stuck.

et809 t1_iwlp423 wrote on November 16, 2022 at 3:53 PM

This seems more like a programming question than a machine learning one. It depends on your model implementation. Some frameworks don't even require any further preprocessing. Just keep each individual reading in a column of its own, with the same column order for every subject. No need to combine velocity and distance manually.
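A rough sketch of what that could look like, assuming Python with NumPy and pandas and the 8 sides x 5 readings x (velocity, distance) structure described earlier in the thread. The random data, the subject IDs, and the `side{s}_r{r}_{quantity}` naming scheme are all placeholders invented for illustration:

```python
import numpy as np
import pandas as pd

n_subjects, n_sides, n_readings = 3, 8, 5
quantities = ["velocity", "distance"]

# Placeholder data with shape (subject, side, reading, quantity).
rng = np.random.default_rng(0)
raw = rng.random((n_subjects, n_sides, n_readings, len(quantities)))

# One column per individual value. The loop order matches the
# row-major order used by reshape: side, then reading, then quantity.
columns = [
    f"side{s}_r{r}_{q}"
    for s in range(1, n_sides + 1)
    for r in range(1, n_readings + 1)
    for q in quantities
]

# Flatten each subject's 8 x 5 x 2 block into a single row of 80 features.
wide = pd.DataFrame(raw.reshape(n_subjects, -1), columns=columns)
wide.insert(0, "subject", [f"subj{i}" for i in range(n_subjects)])

print(wide.shape)
```

Velocity and distance for the same reading end up as adjacent columns, so no tuple is needed. A model that expects structured input can still recover the original grouping by reshaping the 80 feature columns back to (8, 5, 2).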

snairgit OP t1_iwlw27z wrote on November 16, 2022 at 4:40 PM

Thanks for your input, ya I will try that out. The features are related to each other, which is why I was wondering whether packaging them together in the input would make sense. I'll do some experiments and see how it turns out.