Submitted by yourmamaman t3_116bi6b in dataisbeautiful
yourmamaman OP t1_j95w50p wrote
This took way more NLP than I anticipated.
Recipes that have similar ingredients are closer together. The idea was to take 3,700 different recipes for pie and understand how much variation there is in their ingredients. The algorithm clusters recipes with very similar ingredients and gives each cluster one color, then places the clusters so that clusters with very different ingredients end up far away from each other.
Tools: NLTK, UMAP, HDBSCAN
e.g. the sugar dimension represents → ['brown sugar', 'light brown sugar', 'cinnamon sugar', 'white sugar', 'powdered sugar']
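A minimal sketch of the core idea — recipes with overlapping ingredient sets end up close together. The recipes and ingredient names below are made up for illustration; the OP's actual pipeline uses NLTK to normalize ingredient text, UMAP for the 2-D layout, and HDBSCAN for the clustering, none of which are shown here:

```python
# Toy example: represent each recipe as a set of normalized ingredients
# and measure dissimilarity with Jaccard distance (hypothetical data).

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0 means identical ingredient sets."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

recipes = {
    "apple pie":    ["flour", "butter", "sugar", "apple", "cinnamon"],
    "peach pie":    ["flour", "butter", "sugar", "peach", "cinnamon"],
    "key lime pie": ["graham cracker", "condensed milk", "lime", "egg"],
}

d_fruit = jaccard_distance(recipes["apple pie"], recipes["peach pie"])
d_lime = jaccard_distance(recipes["apple pie"], recipes["key lime pie"])

# The two fruit pies share 4 of 6 distinct ingredients, so their distance
# is small; key lime pie shares nothing with apple pie, so it sits far away.
print(d_fruit, d_lime)  # → 0.333... 1.0
```

A pairwise distance matrix like this is exactly the kind of input UMAP can embed into 2-D so that "far apart in ingredients" becomes "far apart on the map."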
dragontracks t1_j96qxbo wrote
Ok, I love the idea of this guide, but I don't have any guide for how to read it. Your comment helps, but shouldn't the map have a legend explaining color, size, and distance? And are the axes significant?
yourmamaman OP t1_j96xuhr wrote
Good point
YournameisnotJohn t1_j96cc3v wrote
I don’t know what those tools are, but is this a type of principal components analysis? I’m guessing not, because there are no axes labeled with principal components. Anyway, I feel like you could do what you’re trying to do with PCA, with different ingredients as different components (I have a very limited understanding of PCA though). I actually think it’s interesting to visualize the variation in things we lump into the same category, even when those categories are well founded.
yourmamaman OP t1_j96ey2w wrote
UMAP is like PCA in that both do dimension reduction. UMAP seems to scale a bit better when there are lots of dimensions, and it can capture nonlinear structure that PCA's linear projection misses.
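For comparison, PCA (the linear analogue) fits in a few lines of NumPy. This is a sketch on synthetic data, not the OP's pipeline — UMAP's nonlinear embedding has no closed form like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 50 "recipes" described by 10 binary ingredient flags.
X = rng.integers(0, 2, size=(50, 10)).astype(float)

# PCA via SVD: center the data, then project onto the top-2
# right singular vectors (the first two principal components).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
embedding = Xc @ Vt[:2].T  # each recipe reduced to 2 coordinates

print(embedding.shape)  # → (50, 2)
```

The resulting 2-D coordinates could be plotted directly; UMAP plays the same role in the visualization but preserves local neighborhoods rather than global variance.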