PsychologicalEgg9377
PsychologicalEgg9377 OP t1_jbrqmco wrote
I assumed most people on this sub would be familiar with nonlinear dimensionality reduction, but it looks like some are not.
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
This family of algorithms takes a data point that is normally represented in a high-dimensional space and maps it to a lower-dimensional representation. You generally lose information in the process, but in high-dimensional spaces there's often a lot of empty space that you can get rid of without losing much. The closest analogy I can think of is compressing a CD to mp3. You are losing information in the process, but if done correctly, the human ear can't tell much of a difference.
Why do this? One obvious reason is so you can plot high-dimensional data in 2D and 3D and get a better sense of certain spatial relationships. Once reduced, the vector components are difficult to describe in plain language. So it's not like "x is time and y is trades."
It's a confusing concept if you've never seen it before, but it's very powerful and a common technique in data science.
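For anyone who wants to see the idea in code, here is a minimal sketch using only the Python standard library. It is not necessarily the algorithm behind the plot. It is a simple stress-minimizing layout in the style of multidimensional scaling: each point gets a 2D position, and gradient descent nudges the positions until the 2D distances roughly match the original high-dimensional distances. The function names, the step size, and the data are all made up for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance between every pair of points (any dimension)."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(target, coords):
    """Sum of squared gaps between 2D distances and the original distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def embed_2d(points, steps=300, lr=0.01, seed=0):
    """Place each point in 2D so pairwise distances roughly match the originals."""
    rng = random.Random(seed)
    target = pairwise_distances(points)
    n = len(points)
    # Start from random 2D positions.
    coords = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    history = [stress(target, coords)]
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                coeff = 2 * (d - target[i][j]) / d
                grads[i][0] += coeff * (coords[i][0] - coords[j][0])
                grads[i][1] += coeff * (coords[i][1] - coords[j][1])
        # Plain gradient descent step on the stress.
        for i in range(n):
            coords[i][0] -= lr * grads[i][0]
            coords[i][1] -= lr * grads[i][1]
        history.append(stress(target, coords))
    return coords, history


# Made-up data: two loose groups of points in 5 dimensions.
data_rng = random.Random(1)
group_a = [[data_rng.gauss(0, 0.3) for _ in range(5)] for _ in range(4)]
group_b = [[data_rng.gauss(3, 0.3) for _ in range(5)] for _ in range(4)]
coords, history = embed_2d(group_a + group_b)
print(f"stress before: {history[0]:.1f}, after: {history[-1]:.1f}")
```

The resulting x and y values have no plain-language meaning on their own. Only the relative positions of the points carry information, which is the same caveat as above.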
PsychologicalEgg9377 OP t1_jbreync wrote
Reply to comment by maladmin in [OC] Dimensionality reduction of stock trading patterns by PsychologicalEgg9377
It's built using only the daily percentage returns. The sector is shown for reference. In some cases (like real estate and utilities) they form tight clusters, but others (like technology) are all over the map. This makes sense because technology companies can vary widely in the technology they are offering.
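As a rough illustration of what "daily percentage returns" means as an input, here is a small standard-library sketch. The tickers and prices are made up, and using one minus the Pearson correlation as a distance between two stocks is an assumption for the example, not a description of how the plot was actually built.

```python
import math


def daily_returns(closes):
    """Percentage change from each closing price to the next."""
    return [(b - a) / a * 100 for a, b in zip(closes, closes[1:])]


def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up closing prices for three hypothetical tickers.
prices = {
    "AAA": [100.0, 102.0, 101.0, 104.0, 103.0],
    "BBB": [50.0, 51.0, 50.5, 52.0, 51.5],   # moves in step with AAA
    "CCC": [20.0, 19.5, 20.5, 19.0, 20.0],   # moves against AAA
}
returns = {ticker: daily_returns(closes) for ticker, closes in prices.items()}

# Assumed distance: stocks that trade alike end up close to 0.
for other in ("BBB", "CCC"):
    distance = 1 - pearson(returns["AAA"], returns[other])
    print(f"AAA vs {other}: distance {distance:.2f}")
```

Note that the sector label never enters the calculation, which matches the point above that it is shown for reference only.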
PsychologicalEgg9377 OP t1_jbrefbl wrote
Reply to comment by PsychologicalEgg9377 in [OC] Dimensionality reduction of stock trading patterns by PsychologicalEgg9377
Some interesting things I found on the interactive website version:
There's a group of Chinese companies (Baidu, Alibaba, etc). Las Vegas Sands and Wynn Resorts were in this region. After some googling, it turns out these two resorts get a huge amount of their revenue from China.
The "meme stock" region has all the favorites of a certain popular subreddit.
PsychologicalEgg9377 OP t1_jbr6keq wrote
Interactive version at https://www.mapmystocks.com
Non-linear dimensionality reduction of daily stock trading patterns.
The visualization is built using daily returns. The color coded sector is only used for reference to see which stocks actually trade like their sector.
Some interesting examples are that Ford and GM trade almost identically, but Tesla trades like a tech company.
There's the possibility that some movement is just noise, randomness in the algorithm, or dumb luck. Only high-volume stocks were included to help reduce noise. Written in Angular and Plotly.
Disclaimer: This is for entertainment only. Do not make any financial decisions using this.
Edit: Interactive version doesn't work well on mobile due to the number of data points.
Submitted by PsychologicalEgg9377 t3_11o7mmt in dataisbeautiful
PsychologicalEgg9377 t1_j95zs8b wrote
Reply to comment by Ancient-Bread-3236 in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
A network graph using something like graphviz might be a better method. The problem is, the result represents a snapshot in time. So you'd likely have to animate it or otherwise make it interactive with a timeline slider.
PsychologicalEgg9377 t1_j945pha wrote
Reply to comment by Good_Sage in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
The two most common stacks I see are Python+Pandas+Seaborn or R+ggplot2.
Python has the added advantage of big industry demand, so you'd be more likely to find a job with Python experience.
PsychologicalEgg9377 t1_j94589f wrote
Reply to comment by donuthorse in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
This is awful to hear. If it's the same person or group, you might find some pattern based around weekdays, holidays, time of day, etc. Try labeling them based on date/time and see if you can run some summary statistics. If it's kids, it might increase during holidays. If it's someone jobless, there may be no pattern. This is all speculation, but it's a starting point.
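The date/time labeling could be sketched like this in plain Python. The timestamps below are made up, and `summarize` is just a hypothetical helper name. Swap in your own incident log.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def summarize(timestamps):
    """Count incidents per weekday and per hour of day."""
    by_weekday = Counter(WEEKDAYS[t.weekday()] for t in timestamps)
    by_hour = Counter(t.hour for t in timestamps)
    return by_weekday, by_hour


# Made-up incident times (year, month, day, hour, minute).
incidents = [
    datetime(2023, 2, 4, 23, 15),
    datetime(2023, 2, 11, 22, 40),
    datetime(2023, 2, 15, 16, 5),
]
by_weekday, by_hour = summarize(incidents)
print(by_weekday.most_common())
print(by_hour.most_common())
```

With only a handful of incidents the counts won't prove anything, but a weekday or hour that keeps coming up is a reasonable place to start looking.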
Machine learning might be able to offer some insights as well, but that might be too steep a learning curve.
Good luck finding them.
PsychologicalEgg9377 t1_j943mp3 wrote
Reply to comment by tan_tan_tanuki in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
It's not a very satisfying answer, but Excel might be your best bet. You likely already have it on your work computer and there are tons of resources for creative ideas. You can download tons of data already in a spreadsheet-ready form (mostly CSV). Most other services will either have a steep learning curve or be too basic to be useful.
PsychologicalEgg9377 t1_j9432iq wrote
Reply to comment by yankee29 in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
This is common. There's a technical term for it (jitter). There's a balance between making your datapoints visible and changing their values too much.
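A minimal sketch of jitter, assuming the issue is points that sit exactly on top of each other: add a small random offset to each value before plotting. The `jitter` helper, the `amount` value, and the data are made up for illustration.

```python
import random


def jitter(values, amount, seed=None):
    """Shift each value by a random offset in [-amount, +amount]."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Made-up ratings where many points share the same value.
ratings = [3, 3, 3, 4, 4, 5]
jittered = jitter(ratings, amount=0.15, seed=42)
print(jittered)
```

A larger `amount` separates the points more but moves them further from their true values, which is the balance described above. Plot the jittered values, but keep the originals for any statistics.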
PsychologicalEgg9377 t1_j942tn7 wrote
Reply to comment by Trick_Read in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
Excel. I dislike Excel for many reasons, but it does allow you to quickly visualize data and get practice with the process. If you can program, R+ggplot2 or Python+Pandas+Seaborn.
PsychologicalEgg9377 t1_j942krp wrote
Reply to comment by Good_Sage in [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
I'm from an academic background and used to use a lot of R. There's a library called ggplot2 that is very formal and structured. Many other plotting libraries and methods are very disjointed, but ggplot2 gives you a good foundation because it's based on a lot of plotting theory. Whether it suits you depends on whether you program or not.
I found this PDF on DataCamp that is very high level. I'm not sure I agree with everything in it, but it's probably a good start.
https://s3.amazonaws.com/assets.datacamp.com/email/other/Data+Visualizations+-+DataCamp.pdf
PsychologicalEgg9377 t1_j941tlq wrote
Reply to [Topic][Open] Open Discussion Thread — Anybody can post a general visualization question or start a fresh discussion! by AutoModerator
I created an interactive plot written in JavaScript and Plotly that I'd like to post here. I have it hosted on a server. Is it possible to embed it as an iframe or similar? Or will I have to just post a hyperlink?
I generally don't click off-site links when scrolling reddit, so I suspect others have the same habit.
PsychologicalEgg9377 OP t1_jbrqr7z wrote
Reply to comment by Spillz-2011 in [OC] Dimensionality reduction of stock trading patterns by PsychologicalEgg9377
Tell me you are a data scientist without telling me you are a data scientist!