Submitted by yourmamaman t3_116bi6b in dataisbeautiful
Comments
BujuArena t1_j95xyvh wrote
This is incomprehensible in its current form. It doesn't seem to link anything to anything else in any way that makes sense. I am utterly baffled trying to decipher it.
yourmamaman OP t1_j9602jy wrote
Hmm, I may need to add more backstory, like "Euclidean distance represents ingredient similarities".
BujuArena t1_j960i8o wrote
Even with that, it would seem to have the level and quality of meaning I could attribute to any given audio track on Autechre's album, "Draft 7.30".
yourmamaman OP t1_j960xbk wrote
Ahh, you're not trying to help, you're just opinionated.
BujuArena t1_j9626ue wrote
I don't know how to even approach consideration of this curiosity. It is intriguing in its layout, and yet indescribably baffling. Did the author intend any structure? Is there meaning in the clusters and islands? The work intrigues, but leaves the viewer hopelessly lost in the depths of its mystery.
I'm just enjoying commenting at this point. I am actually interested in the full essay that would enlighten us about exactly how this chart can be interpreted.
Edit: Also, I disagree with the insinuation that a viewer has any obligation to help with the presentation of the data. We have awarded feedback, and with that, you may make a choice: Will you try again and improve the work, portraying the glory of excitement that you've felt upon pondering the mystery you've sought to dispel? Or will you leave the viewership to ponder the mystery of the graphic they see before them, and the origin of the authorship which led to its creation? It is up to you, and you have no real obligation to choose the former option. The only difference you will see is in your Reddit point count, which you may or may not see as consequential, but which absolutely is inconsequential in the grand scheme of life.
WarmAppleNight t1_j962exi wrote
What's up with so many of the different-colored groupings having identical labels (e.g. the five adjacent clusters that are each labeled "crust pie"?)
WGoNerd t1_j962gdb wrote
Looks like a map from some fantasy universe.
yourmamaman OP t1_j9636kc wrote
It seems the differences are like 'Cream cheese crust apple pie' vs. 'No crust zucchini pie'. But because I could only label each cluster with words common to its recipe titles, it couldn't find enough distinctive common words to tell those clusters apart.
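A minimal sketch of why several clusters can end up with the same label, assuming each label is simply the most frequent non-stop words across the cluster's recipe titles. The stop-word list and `cluster_label` are hypothetical, not OP's actual code.

```python
from collections import Counter

# Hypothetical stop-word list; a real one (e.g. from NLTK) would be longer.
STOP_WORDS = {"a", "an", "and", "the", "with", "no", "of"}

def cluster_label(titles, k=2):
    """Label a cluster with its k most frequent title words (ties: first seen wins)."""
    counts = Counter(
        word
        for title in titles
        for word in title.lower().split()
        if word not in STOP_WORDS
    )
    return " ".join(word for word, _ in counts.most_common(k))

print(cluster_label(["Cream Cheese Crust Apple Pie", "No Crust Zucchini Pie"]))
```

With OP's two example titles, only "crust" and "pie" appear twice, so the cluster is labeled "crust pie" even though the recipes differ a lot. Weighting words by how distinctive they are to a cluster (TF-IDF style) rather than by raw counts would be one way to get more telling labels.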
boganvegan t1_j963ezc wrote
What about steak, mushroom, pork, chicken, fish, venison etc.?
LinusMendeleev t1_j964yg4 wrote
No, my friend, he is trying to help. It doesn't make sense at all.
yourmamaman OP t1_j9658wn wrote
To be honest, I had trouble visualizing this concept, hence the post. The idea was to take 3,700 different pie recipes and understand how much variation there is in their ingredients.
So the algorithm clusters recipes that have very similar ingredients, gives each cluster one color, and places the clusters so that clusters with very different ingredients are far away from each other.
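OP used HDBSCAN for the clustering step. As a simpler stand-in, here is a plain DBSCAN sketch in pure Python (not HDBSCAN itself, which additionally copes with clusters of varying density). Points with enough neighbours within `eps` seed a cluster, the cluster grows through those dense points, and isolated points get -1 as noise. The resulting cluster id is what gets mapped to an arbitrary colour. The function name and parameters here are illustrative assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Return a cluster id per point, or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_nbrs)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

Here the two tight groups become clusters 0 and 1 and the far-off point is noise.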
yourmamaman OP t1_j965qul wrote
They seem to fall into the chicken cluster, since the others don't seem to use ingredients different enough to form their own cluster, and there are more chicken pie recipes, which is why it got that label. But thank you, questions like these are what I need.
MJLDat t1_j966gwj wrote
Did you use t-SNE for this?
AnonymousButIvekk t1_j968yra wrote
yeah sorry man, this post is dumb the way it's been made. there are a lot better ways to visualise what you meant
yourmamaman OP t1_j969p9p wrote
For example?
Clutcheon t1_j969w21 wrote
This is the ugliest data I've ever seen presented LOL
tilapios t1_j96a1te wrote
"...viewers need to be able to understand what is being visualized based on information provided in the visualization and its title."
This is in the sub's rules. Let's see if it gets enforced.
Eokokok t1_j96b0uc wrote
Whatever this is, it makes no sense, presents no data, and is just mind-bogglingly terrible...
AnonymousButIvekk t1_j96bfsu wrote
what does the distance between them mean? color means nothing other than to differentiate groups, at first glance. what is the difference, if there even is any, between horizontal and vertical distances? why are they clumped together on the left and the right? why so many crusts? why does it say pie every time? it obscures perception of size.
literally anything would make more sense, this is a piece of bad modern art before it is a representation of data.
yourmamaman OP t1_j96bict wrote
>what is being visualized
Maybe if I reordered the words to "Recipes for Pie clustered by ingredients"?
lgoldfein21 t1_j96bmdg wrote
I would have to actively work for hours to create a less clear representation of a really cool concept
Deepfriedwithcheese t1_j96bwgq wrote
I thought it was the primary pie type by country and the dark red splotch in the upper right was Italy, but it unraveled from there.
yourmamaman OP t1_j96bzat wrote
How could it be represented differently with data?
Honestly trying to figure this out for a client.
holdenontoyoubooks t1_j96c06u wrote
I promise that wasn’t the problem. What are the sections visualizing?
YournameisnotJohn t1_j96cc3v wrote
I don’t know what those tools are, but is this a type of principal component analysis? I’m guessing not, because there are no axes labeled with principal components. Anyway, I feel like you could do what you’re trying to do with PCA and put different ingredients as different components (I have a very limited understanding of PCA though). Anyway, I actually think it’s interesting to visualize the variation in things we lump into the same category, even when those categories are well founded.
yourmamaman OP t1_j96cum5 wrote
>presents no data
The distances between clusters represent the differences in recipe ingredients: the further apart, the more the ingredients differ.
For example, that is why Shepherd's Pie is so far away from the others. But point taken, there are no very apparent conclusions.
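To make "distance = difference in ingredients" concrete, a minimal sketch assuming each recipe is a 0/1 vector over an ingredient vocabulary (the vocabulary and recipes are made-up examples, not OP's data). For 0/1 vectors, the squared Euclidean distance is just the number of ingredients one recipe has and the other lacks.

```python
import math

def ingredient_vector(ingredients, vocabulary):
    """Encode a recipe as 1.0/0.0 per vocabulary ingredient (present/absent)."""
    present = set(ingredients)
    return [1.0 if name in present else 0.0 for name in vocabulary]

def recipe_distance(a, b, vocabulary):
    """Euclidean distance between two recipes' ingredient vectors."""
    return math.dist(ingredient_vector(a, vocabulary), ingredient_vector(b, vocabulary))

# Hypothetical toy data
VOCAB = ["flour", "butter", "sugar", "apple", "beef", "potato"]
apple_pie = ["flour", "butter", "sugar", "apple"]
plain_pie = ["flour", "butter", "sugar"]
shepherds_pie = ["butter", "beef", "potato"]

print(recipe_distance(apple_pie, plain_pie, VOCAB))      # 1 ingredient differs -> 1.0
print(recipe_distance(apple_pie, shepherds_pie, VOCAB))  # 5 differ -> sqrt(5), about 2.236
```

The plot itself shows a UMAP projection of distances like these, which is generally read as keeping nearby dots reliably similar, while the exact size of large gaps between clusters is a rougher guide.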
yourmamaman OP t1_j96dbaf wrote
Each dot is a recipe. The clusters are similar recipes.
The distance is the difference in ingredients.
yourmamaman OP t1_j96ey2w wrote
UMAP is like PCA in that both do dimensionality reduction. UMAP seems to scale a bit better with lots of dimensions.
yourmamaman OP t1_j96g009 wrote
This is true.
Visualizations of high-dimensional data are hard to make intuitive, since this one plots a dataset with 768 dimensions in a 2-dimensional image. So the axes don't have labels the way they do in a PCA plot.
Honestly looking for a better way to do this
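For comparison with the PCA suggestions in this thread, a minimal pure-Python PCA sketch using power iteration with deflation. `pca_2d` is a hypothetical helper, and OP's plot actually used UMAP, which is non-linear and has no equivalent of these components. It illustrates why reduced axes rarely get plain-language labels: each axis is a weighted mix of every input dimension, not a single ingredient.

```python
import random

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _mat_vec(m, v):
    return [_dot(row, v) for row in m]

def pca_2d(rows, iterations=200, seed=0):
    """Return the top two principal components and each row's 2-D coordinates."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.random() for _ in range(d)]
        for _ in range(iterations):
            w = _mat_vec(cov, v)
            norm = _dot(w, w) ** 0.5
            if norm == 0:  # degenerate case: no variance left in this direction
                break
            v = [x / norm for x in w]
        eigenvalue = _dot(v, _mat_vec(cov, v))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the 2nd component
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    coords = [(_dot(r, components[0]), _dot(r, components[1])) for r in centred]
    return components, coords

rows = [[-3, 1, 0], [3, -1, 0], [-3, -1, 0], [3, 1, 0]]
components, coords = pca_2d(rows)
print(components)
print(coords)
```

In this toy data most variance lies along the first input column, so the first component points (up to sign) along it; with hundreds of ingredient dimensions, each component would instead spread its weight over many of them.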
PianistAdditional t1_j96g889 wrote
I do prefer (chicken pot) pie over pot chicken pie any day of the week
holdenontoyoubooks t1_j96gw91 wrote
I think the downfall here is these look like countries and people thought you were saying this country has this recipe
EbMinor33 t1_j96h558 wrote
Number one is to add a title and explanation to the image. If your data visualization does its job, I shouldn't need to come to the comment section to have any clue what I'm looking at.
lgoldfein21 t1_j96h5iy wrote
It’s just very unclear! No idea what the circles mean or what the distance represents. I would maybe label a few circles as an example and have a clear title?
(Sorry my comment was way too mean for the effort that was put in! Just could be a really cool concept!)
PrimeNumbersby2 t1_j96hq4z wrote
Yeah, I was really hoping this was a map.
yourmamaman OP t1_j96j4rd wrote
Maybe I can put a circle at the top with the label "An internet pie recipe" to show that the circles are individual recipes.
And a scale bar at the bottom, like you find on maps, with ticks that say 'small differences in ingredients' and 'big differences', to indicate that the distance between circles is how much the recipe ingredients differ.
yourmamaman OP t1_j96k2r5 wrote
You might be right. But some of the comments are helpful, like the one recommending a legend that indicates a dot is an individual internet pie recipe, and a scale bar at the bottom, like you find on maps, to indicate that distance means differences in the recipe ingredients.
yourmamaman OP t1_j96k9o2 wrote
Any suggestions for a better title? This one may have been too data-science-y.
Mike_for_all t1_j96kdvx wrote
The distance between ingredients makes it a bit confusing to read. The idea of clustering though is neat.
yourmamaman OP t1_j96m3ls wrote
Yeah, that one irritated me also.
december-32 t1_j96n6np wrote
So there is a drastic difference between "Chocolate Pie" and "Chocolate Pie"?
yourmamaman OP t1_j96o2hl wrote
This is one of the observations. It seems there are two (very) different ways to make a chocolate pie, at least in terms of ingredients.
pavldan t1_j96pyky wrote
So why don’t you list the ingredients so we know what the dots are mapped against?
dragontracks t1_j96qxbo wrote
Ok, I love the idea of this guide, but I don't have any guidance on how to read it. Your comment helps, but should the map have a legend explaining colors, size and distance? And are the axes significant?
CosmicBogWarrior t1_j96rvh6 wrote
And here it is. The last straw and the ultimate final form of being the opposite of data being beautiful that this sub can offer and I am out!
sveinb t1_j96tg3o wrote
I guess this is a PCA plot. Most people commenting clearly aren't familiar with the concept and therefore don't understand the plot. It's not the plot's fault, it's a perfectly good plot, and these plots have no meaningful units etc. Maybe a link to the Wikipedia article about PCA would have helped those without a scientific background, which is unfortunately most of the world.
0ld6rumpy6uy t1_j96u0nu wrote
Well, it is kind of interesting to understand that a chocolate pie can be more closely related (by ingredients) to hush puppies than to another chocolate pie.
Andie787 t1_j96u29u wrote
How does the colour work? What does it mean that there are so many blue clusters far apart from one another?
DanDanDan0123 t1_j96vmq0 wrote
Is this supposed to be a map or something? The U.S. or the world?
yourmamaman OP t1_j96x56o wrote
No, this is just a visualization of how similar different recipes for pie are.
FrickingFurniture t1_j96xsub wrote
Yo why the fuck are you guys putting fruit, chocolate and cream in your pies?
yourmamaman OP t1_j96xsxq wrote
It's a lot of different ingredients. Think sugar, brown sugar, powdered sugar. And even if I grouped them, just the number of different main ingredients would be a long list.
But I see your point.
yourmamaman OP t1_j96xuhr wrote
Good point
Bejoty t1_j96xvxp wrote
You can plot the PCA loadings to give viewers an idea about what variables are affected moving along each axis.
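The loadings idea, sketched: given one principal component (one weight per input variable), list the variables with the largest absolute weights. The component values and ingredient names below are invented for illustration; real ones would come from a PCA fit. This only applies to linear methods like PCA, since UMAP axes have no loadings to report.

```python
def top_loadings(component, feature_names, k=3):
    """Return the k (name, weight) pairs with the largest absolute weight."""
    ranked = sorted(zip(feature_names, component), key=lambda pair: abs(pair[1]), reverse=True)
    return [(name, round(weight, 3)) for name, weight in ranked[:k]]

# Hypothetical loadings for the first component
names = ["sugar", "salt", "flour", "chicken", "egg"]
pc1 = [0.70, -0.05, 0.10, -0.69, 0.0]
print(top_loadings(pc1, names))
# [('sugar', 0.7), ('chicken', -0.69), ('flour', 0.1)]
```

Read with the signs: moving toward the positive end of this axis would mean more sugar and less chicken, which is the kind of caption an axis could carry.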
yourmamaman OP t1_j96y2ha wrote
This might be a good topic for a deep dive. To have a way of explaining what the differences are between clusters.
Thanks for the idea.
yourmamaman OP t1_j96ycog wrote
It's like a hundred different variables. Seems people try to make their recipes just a bit unique just to be different.
But good idea, maybe an example in the plot.
goodluckonyourexams t1_j971ctx wrote
euclidean distance of baking ingredients
goodluckonyourexams t1_j971pj5 wrote
the colour means nothing, only the distance matters
and the distance is shown after dimensionality reduction
goodluckonyourexams t1_j972w0j wrote
I guess people need to know clustering to understand this at all
RvrTam t1_j976rl9 wrote
I can’t even see that black text on the blue blobs
Penguinkeith t1_j9771kl wrote
How about chicken pot pie and..... pot chicken pie?
OryxTempel t1_j977ywp wrote
What’s a crust pie and why is it broken up into different clusters and colors?
And hush puppies aren’t even close to pie.
Gnemlock t1_j97aozv wrote
Not only is this indecipherable, but there's a whole bunch of the most common pies completely absent.
I thought OP meant 'dessert pies', but there's one that isn't... and why is it called 'pot chicken'? WTH?
KezAzzamean t1_j97as4m wrote
I’m saving this post as an example of the worst data chart I’ve ever seen.
I have absolutely no fucking clue what is going on.
I don’t know why the dots are where they are. What direction means, if anything, on x y axis. I mean I have no idea at all. Is this art? Is it countries?
What the actual fuck.
MrLagzy t1_j97begd wrote
Is this an unfinished jackson pollock painting?
Marrrkkkk t1_j97em6h wrote
Sure, it's not necessarily lacking a scale though. It's lacking a meaningful definition of the labels. The words currently used mean little especially with what are essentially duplicates. It would also help to label it as PCA...
0ld6rumpy6uy t1_j97j2nu wrote
How about calculating the colour of each recipe based on ingredients and their proportions?
yourmamaman OP t1_j97szky wrote
That seems hard to do, since you would need to model the chemical processes among hundreds of possible mixtures and the effects of the different preparation instructions.
blardycone t1_j992c7h wrote
Why are there 2 pumpkin pies?
sveinb t1_j9a2qx6 wrote
You mean the axes? They are simply the 1st and 2nd components. Not very useful, and quite ok to omit. Do you mean the cluster labels? They are useful enough, I think. It would have been useful to label it as a PCA plot, though.
sveinb t1_j9a91xi wrote
The fact that there are several distinct clusters with the same label means that there are several distinct recipes that go under the same name, which I think is interesting.
0ld6rumpy6uy t1_j9cmc18 wrote
Nah, now you’re overcomplicating things.
yourmamaman OP t1_j9eq27k wrote
Or do you mean the colour in the plot? That is just a random colour given to each cluster.
0ld6rumpy6uy t1_j9g5q19 wrote
Yes, I mean the colour of the plot.
yourmamaman OP t1_j95w50p wrote
This took way more NLP than I anticipated.
Recipes that have similar ingredients are closer together. The idea was to take 3,700 different pie recipes and understand how much variation there is in their ingredients. So the algorithm clusters recipes that have very similar ingredients, gives each cluster one color, and places the clusters so that clusters with very different ingredients are far away from each other.
Tools: NLTK, UMAP, HDBSCAN
E.g., the sugar dimension represents: ['brown sugar', 'light brown sugar', 'cinnamon sugar', 'white sugar', 'powdered sugar']
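A minimal sketch of folding raw ingredient lines into dimensions like the sugar one above. The sugar phrase list comes from OP's example; the butter list, the longest-phrase-first substring matching, and `to_dimensions` are assumptions, and this uses plain string matching rather than NLTK.

```python
from collections import Counter

# Canonical dimension -> raw phrases folded into it ("butter" entry is hypothetical)
DIMENSIONS = {
    "sugar": ["brown sugar", "light brown sugar", "cinnamon sugar", "white sugar", "powdered sugar"],
    "butter": ["butter", "unsalted butter", "salted butter"],
}

# Reverse lookup: raw phrase -> dimension
PHRASE_TO_DIMENSION = {
    phrase: dim for dim, phrases in DIMENSIONS.items() for phrase in phrases
}
# Try longer phrases first so "light brown sugar" wins over "brown sugar"
PHRASES_LONGEST_FIRST = sorted(PHRASE_TO_DIMENSION, key=len, reverse=True)

def dimension_of(line):
    """Map one ingredient line (quantities allowed) to a dimension, or None."""
    text = line.lower()
    for phrase in PHRASES_LONGEST_FIRST:
        if phrase in text:
            return PHRASE_TO_DIMENSION[phrase]
    return None

def to_dimensions(ingredient_lines):
    """Count how many of a recipe's ingredient lines fall into each dimension."""
    counts = Counter()
    for line in ingredient_lines:
        dim = dimension_of(line)
        if dim is not None:
            counts[dim] += 1
    return counts

print(to_dimensions(["1 cup Light Brown Sugar", "2 tbsp powdered sugar",
                     "1/2 cup unsalted butter", "3 apples"]))
```

Lines that match no phrase (here "3 apples") are simply skipped, so in practice the phrase table would need to cover every ingredient you want as a dimension.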