Comments

yourmamaman OP t1_j95w50p wrote

This took way more NLP than I anticipated.

Recipes that have similar ingredients are closer together. The idea was to take 3700 different recipes for pie and understand how much variation there is in terms of their ingredients. So the algorithm will cluster recipes that have very similar ingredients and give them one color, and place the clusters in such a manner that clusters with very different ingredients are far away from each other.

Tools: NLTK, UMAP, HDBSCAN

e.g. Sugar dimension represents-> ['brown sugar', 'light brown sugar', 'cinnamon sugar', 'white sugar', 'powdered sugar']
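
To make the "dimension" idea concrete, here is a minimal sketch of how ingredient variants could be folded into one shared dimension before comparing recipes. Only the sugar variants come from the comment above; the `butter` group, the `DIMENSIONS` table, and the function names are hypothetical, and this is plain lookup code rather than the NLTK-based processing actually used.

```python
# Hypothetical lookup table: only the "sugar" variants are taken from the post.
DIMENSIONS = {
    "sugar": [
        "brown sugar",
        "light brown sugar",
        "cinnamon sugar",
        "white sugar",
        "powdered sugar",
    ],
    "butter": ["butter", "unsalted butter"],  # assumed group, for illustration
}


def to_dimensions(ingredients):
    """Return the set of dimension names that a recipe's ingredients map to."""
    found = set()
    for item in ingredients:
        name = item.lower().strip()
        for dimension, variants in DIMENSIONS.items():
            if name in variants:
                found.add(dimension)
    return found


def to_vector(ingredients, order=("sugar", "butter")):
    """Encode a recipe as a 0/1 vector, one entry per dimension in `order`."""
    present = to_dimensions(ingredients)
    return [1 if dimension in present else 0 for dimension in order]


print(to_vector(["Light Brown Sugar", "apples"]))        # [1, 0]
print(to_vector(["powdered sugar", "unsalted butter"]))  # [1, 1]
```

Two recipes that use different kinds of sugar end up with the same value in the sugar dimension, so they count as similar on that ingredient.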

3

BujuArena t1_j95xyvh wrote

This is incomprehensible in its current form. It doesn't seem to link anything to anything else in any way that makes sense. I am utterly baffled trying to decipher it.

251

BujuArena t1_j9626ue wrote

I don't know how to even approach consideration of this curiosity. It is intriguing in its layout, and yet indescribably baffling. Did the author intend any structure? Is there meaning in the clusters and islands? The work intrigues, but leaves the viewer hopelessly lost in the depths of its mystery.

I'm just enjoying commenting at this point. I am actually interested in the full essay that would enlighten us about exactly how this chart can be interpreted.

Edit: Also, I disagree with the insinuation that a viewer has any obligation to help with the presentation of the data. We have awarded feedback, and with that, you may make a choice: Will you try again and improve the work, portraying the glory of excitement that you've felt upon pondering the mystery you've sought to dispel? Or will you leave the viewership to ponder the mystery of the graphic they see before them, and the origin of the authorship which led to its creation? It is up to you, and you have no real obligation to choose the former option. The only difference you will see is in your Reddit point count, which you may or may not see as consequential, but which absolutely is inconsequential in the grand scheme of life.

13

WarmAppleNight t1_j962exi wrote

What's up with so many of the different-colored groupings having identical labels (e.g. the five adjacent clusters that are each labeled "crust pie"?)

13

WGoNerd t1_j962gdb wrote

Looks like a map from some fantasy universe.

10

yourmamaman OP t1_j9636kc wrote

Seems it is differences like 'Cream cheese crust apple pie' vs 'No crust zucchini pie'. But because I could only label each cluster with common words in the recipe titles, it couldn't find common enough words.
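
A rough sketch of why labelling by common title words can give several clusters the same name. The titles below are made up for illustration (apart from the two quoted above), and `label_cluster` is a hypothetical helper, not the code behind the chart.

```python
from collections import Counter


def label_cluster(titles, top_n=2):
    """Label a cluster with the words that appear in the most recipe titles."""
    counts = Counter()
    for title in titles:
        # Count each word once per title so long titles do not dominate.
        counts.update(set(title.lower().split()))
    # Most frequent first; ties broken alphabetically to keep output stable.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return " ".join(word for word, _ in ranked[:top_n])


# Made-up clusters with clearly different ingredients.
cluster_a = [
    "Cream cheese crust apple pie",
    "Butter crust apple pie",
    "Graham crust peach pie",
]
cluster_b = [
    "No crust zucchini pie",
    "No crust spinach pie",
    "Easy crust quiche pie",
]

print(label_cluster(cluster_a))  # crust pie
print(label_cluster(cluster_b))  # crust pie
```

Both clusters get the label "crust pie" because "crust" and "pie" are the only words shared by every title, even though the recipes inside are quite different.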

−3

boganvegan t1_j963ezc wrote

What about steak, mushroom, pork, chicken, fish, venison etc.?

7

yourmamaman OP t1_j9658wn wrote

To be honest, I had trouble visualizing this concept, hence the post. The idea was to take 3700 different recipes for pie and understand how much variation there is in terms of their ingredients.

So the algorithm will cluster recipes that have very similar ingredients and give them one color, and place the clusters in such a manner that clusters with very different ingredients are far away from each other.

0

yourmamaman OP t1_j965qul wrote

They seem to fall into the chicken cluster, since the others don't seem to use ingredients different enough to form their own cluster, and there are more chicken pie recipes, so that's why it got that label. But thank you, it's questions like these I need.

0

MJLDat t1_j966gwj wrote

Did you use t-SNE for this?

3

Clutcheon t1_j969w21 wrote

This is the ugliest data I've ever seen presented LOL

59

Eokokok t1_j96b0uc wrote

Whatever this is, it makes no sense, presents no data and is just mindbogglingly terrible...

31

AnonymousButIvekk t1_j96bfsu wrote

what does the distance between them mean? color means nothing other than to differentiate groups, at first look. what is the difference, if there even is any, between horizontal and vertical distances? why are they clumped together on the left and the right? why so many crusts, and why does it say pie every time? it obscures perception of size.

literally anything would make more sense, this is a piece of bad modern art before it is a representation of data.

9

lgoldfein21 t1_j96bmdg wrote

I would have to actively work for hours to create a less clear representation of a really cool concept

23

YournameisnotJohn t1_j96cc3v wrote

I don’t know what those tools are, but is this a type of principal component analysis? I’m guessing not because there are no axes labeled with principal components. Anyway, I feel like you could do what you’re trying to do with PCA and put different ingredients as different components (I have a very limited understanding of PCA though). Anyway I actually think it’s interesting visualizing the variation in things we lump into the same category, even when those categories are well founded.

1

yourmamaman OP t1_j96cum5 wrote

>presents no data

The distances between clusters are representative of the differences in recipe ingredients, further away, the more differences.

For example, that is why Shepherd's Pie is so far away from the others. But point taken, there are no very apparent conclusions.
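
As a simplified illustration of "further away means more differences", here is a sketch using Jaccard distance on ingredient sets. This is a swapped-in measure chosen for clarity, not the method behind the chart, and the three ingredient lists are made up.

```python
def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B|: 0.0 for identical sets, 1.0 for disjoint ones."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty recipes are treated as identical
    return 1.0 - len(a & b) / len(a | b)


# Made-up ingredient lists, for illustration only.
apple = {"flour", "butter", "sugar", "apple", "cinnamon"}
peach = {"flour", "butter", "sugar", "peach", "cinnamon"}
shepherds = {"ground lamb", "potato", "onion", "carrot", "butter"}

print(round(jaccard_distance(apple, peach), 3))      # 0.333
print(round(jaccard_distance(apple, shepherds), 3))  # 0.889
```

The two fruit pies share four of six distinct ingredients and sit close together, while the savory pie shares only butter with them and lands far away.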

−1

yourmamaman OP t1_j96g009 wrote

This is true.

Visualizations of high-dimensional data are hard to make intuitive, since they attempt to plot a dataset with 768 dimensions in a 2-dimensional image. So the axes do not have labels like they would in a PCA.

Honestly looking for a better way to do this

0

PianistAdditional t1_j96g889 wrote

I do prefer (chicken pot) pie over pot chicken pie any day of the week

3

EbMinor33 t1_j96h558 wrote

Number one is to add a title and explanation to the image. If your data visualization does its job, I shouldn't need to come to the comment section to have any clue what I'm looking at.

2

lgoldfein21 t1_j96h5iy wrote

It’s just very unclear! No idea what the circles mean or what the distance represents. I would maybe label a few circles as an example and have a clear title?

(Sorry my comment was way too mean for the effort that was put in! Just could be a really cool concept!)

5

yourmamaman OP t1_j96j4rd wrote

Maybe I can put a circle at the top with the label "An internet pie recipe" to show that the circles are individual recipes.

And a scale diagram like you find on maps at the bottom, with ticks that say 'small differences in ingredients' and 'big differences', to indicate that the distance between circles is the amount of difference in recipe ingredients.

3

yourmamaman OP t1_j96k2r5 wrote

You might be right. But some of the comments are helpful. Like the one recommending a legend that indicates a dot is an individual internet pie recipe. And a scale diagram at the bottom, like you find on maps, to indicate that the distance is the difference in recipe ingredients.

0

Mike_for_all t1_j96kdvx wrote

The distance between ingredients makes it a bit confusing to read. The idea of clustering though is neat.

3

dragontracks t1_j96qxbo wrote

Ok, I love the idea of this guide, but I don't have any guides for how to read it. Your comment helps, but should the map have a legend explaining colors, size and distance? And are the axes significant?

3

CosmicBogWarrior t1_j96rvh6 wrote

And here it is. The last straw and the ultimate final form of being the opposite of data being beautiful that this sub can offer and I am out!

9

sveinb t1_j96tg3o wrote

I guess this is a PCA plot. Most people commenting clearly aren't familiar with the concept and therefore don't understand the plot. It's not the plot's fault, it's a perfectly good plot, and these plots have no meaningful units etc. Maybe a link to the Wikipedia article about PCA would have helped those without a scientific background, which is unfortunately most of the world.

3

0ld6rumpy6uy t1_j96u0nu wrote

Well, it is kind of interesting to understand that a chocolate pie can be closer related (by ingredients) to hush puppies than to another chocolate pie.

2

DanDanDan0123 t1_j96vmq0 wrote

Is this supposed to be a map or something? The U.S. or the world?

1

FrickingFurniture t1_j96xsub wrote

Yo why the fuck are you guys putting fruit, chocolate and cream in your pies?

0

yourmamaman OP t1_j96xsxq wrote

It's a lot of different ingredients. Think sugar, brown sugar, powdered sugar. And even if I grouped them, just the number of different main ingredients would be a long list.

But I see your point.

−1

yourmamaman OP t1_j96ycog wrote

It's like a hundred different variables. Seems people try to make their recipes just a bit unique just to be different.

But good idea, maybe an example in the plot.

1

RvrTam t1_j976rl9 wrote

I can’t even see that black text on the blue blobs

1

OryxTempel t1_j977ywp wrote

What’s a crust pie and why is it broken up into different clusters and colors?

And hush puppies aren’t even close to pie.

1

Gnemlock t1_j97aozv wrote

Not only is this indecipherable, but there's a whole bunch of the most common pies completely absent.

I thought OP meant 'dessert pies', but there's one that isn't.. and why is it called 'pet chicken'? WTH?

1

KezAzzamean t1_j97as4m wrote

I’m saving this post as an example of the worst data chart I’ve ever seen.

I have absolutely no fucking clue what is going on.

I don’t know why the dots are where they are. What direction means, if anything, on x y axis. I mean I have no idea at all. Is this art? Is it countries?

What the actual fuck.

1

MrLagzy t1_j97begd wrote

Is this an unfinished jackson pollock painting?

1

Marrrkkkk t1_j97em6h wrote

Sure, it's not necessarily lacking a scale though. It's lacking a meaningful definition of the labels. The words currently used mean little especially with what are essentially duplicates. It would also help to label it as PCA...

1

sveinb t1_j9a2qx6 wrote

You mean the axes? They are simply the 1st and 2nd components. Not very useful, and quite ok to omit. Do you mean the cluster labels? They are useful enough, I think. It would have been useful to label it as a PCA plot, though.

1

sveinb t1_j9a91xi wrote

The fact that there are several distinct clusters with the same label means that there are several distinct recipes that go under the same name, which I think is interesting.

1