resented_ape t1_j26ei9b wrote
The Quasi-Hyperbolic Momentum paper, which attempts to generalize classical and Nesterov momentum, discusses this in section 4.2. It references a blog post by Ben Recht.
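For reference, a minimal sketch of the QHM update rule (the hyperparameter names `beta` and `nu` follow the paper; the toy quadratic and all numeric settings are mine, just for illustration):

```python
import numpy as np

def qhm_step(theta, g_buf, grad, lr=0.05, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update.

    nu interpolates between plain SGD (nu=0) and normalized classical
    momentum (nu=1); per the paper, nu=beta approximates Nesterov.
    """
    g_buf = beta * g_buf + (1.0 - beta) * grad           # EMA of gradients
    theta = theta - lr * ((1.0 - nu) * grad + nu * g_buf)
    return theta, g_buf

# Toy problem: minimize f(x) = 0.5 * x^T A x (gradient is A @ x).
A = np.diag([1.0, 10.0])
theta = np.array([5.0, 5.0])
g_buf = np.zeros_like(theta)
for _ in range(200):
    theta, g_buf = qhm_step(theta, g_buf, A @ theta)
```

Setting `nu=0` in the sketch recovers vanilla gradient descent, which is the kind of limiting-case comparison the paper's section 4 is concerned with.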
resented_ape t1_iybprpj wrote
Reply to comment by trutheality in [D] What method is state of the art dimensionality reduction by olmec-akeru
FWIW, I don't think that is what the paper demonstrates. The Picasso method the authors introduce uses a totally different cost function based on distance reconstruction. For a specific set of metrics the authors are interested in, they say that Picasso produces results comparable with UMAP and t-SNE. But it's not the UMAP or t-SNE objective.
With the scRNAseq dataset in the Python notebook on the Picasso GitHub page, I found that for the metrics one might usually care about with UMAP and t-SNE, e.g. neighborhood preservation (the proportion of each point's k-nearest neighbors in the input space that are also among its k-nearest neighbors in the output space) or the Spearman rank correlation (or triplet ordering preservation) of input vs. output distances, Picasso did quite a bit worse than UMAP and t-SNE.
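To make those two metrics concrete, here is a rough plain-NumPy sketch of how they can be computed (function names and details are mine, not taken from the Picasso notebook; k=10 is just an illustrative choice):

```python
import numpy as np

def pairwise_dists(X):
    # Euclidean distance matrix for rows of X
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))

def knn_preservation(D_in, D_out, k=10):
    """Mean fraction of each point's k nearest input-space neighbors
    that are also among its k nearest output-space neighbors (self
    excluded)."""
    n = D_in.shape[0]
    frac = 0.0
    for i in range(n):
        nn_in = set(np.argsort(D_in[i])[1:k + 1])
        nn_out = set(np.argsort(D_out[i])[1:k + 1])
        frac += len(nn_in & nn_out) / k
    return frac / n

def spearman_dists(D_in, D_out):
    """Spearman rank correlation of the upper-triangular distances
    (Pearson correlation on ranks; assumes no exact ties)."""
    iu = np.triu_indices(D_in.shape[0], k=1)
    ra = np.argsort(np.argsort(D_in[iu])).astype(float)
    rb = np.argsort(np.argsort(D_out[iu])).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))
```

Comparing an embedding to the input itself (or to any rigid scaling of it) gives the maximum score of 1.0 for both metrics, which is a handy sanity check.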
This might not be relevant for downstream scRNAseq workflows -- I will take the authors' word on that. At any rate, Picasso runs very slowly on my machine, and with the default settings its output on other datasets I tried (e.g. MNIST) was visually unsatisfactory, so I have been unable to run a similar analysis across a wide range of datasets. So take that for what it's worth.
resented_ape t1_iy3xso2 wrote
The t-SNE paper came out in 2008, so things are not moving as quickly as OP suggests IMO.
For the t-SNE/UMAP family of algorithms, which can be boiled down to attempting to embed the k-nearest neighbors graph, Dmitry Kobak has published several papers of interest. I would point you to Attraction-Repulsion Spectrum in Neighbor Embeddings as a good place to start.
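To illustrate the "embed the k-nearest neighbors graph" framing, here is a deliberately crude attraction-repulsion sketch of my own -- nothing like production t-SNE or UMAP, which use specific kernels, normalizations, and sampling schemes:

```python
import numpy as np

def neighbor_embed(X, k=5, n_iter=100, lr=0.1, repulsion=0.01, seed=0):
    """Toy neighbor embedding: pull each point toward its k nearest
    input-space neighbors, push it away from one random other point.
    Only meant to show the attraction-repulsion structure."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # k-nearest-neighbor indices in the input space (self excluded)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]
    Y = rng.normal(scale=1e-2, size=(n, 2))   # small random 2-D init
    for _ in range(n_iter):
        for i in range(n):
            # attraction toward the embedded positions of the neighbors
            delta = (Y[knn[i]] - Y[i]).mean(axis=0)
            # repulsion from one randomly sampled point
            j = rng.integers(n)
            diff = Y[i] - Y[j]
            delta += repulsion * diff / (np.sum(diff**2) + 1e-4)
            Y[i] += lr * delta
    return Y
```

The relative strength of those two force terms is exactly the knob the attraction-repulsion spectrum paper varies to interpolate between t-SNE-like and UMAP-like layouts.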
resented_ape t1_irr5f0k wrote
Reply to [R] Clustering a set of graphs by No_Performer203
Your problem is really how to measure similarity between graphs. If you can do that, you can use pretty much any standard clustering technique.
This task is common in drug discovery and related cheminformatics fields where small molecules are represented as graphs. One common approach there is to generate a dictionary of subgraphs and then represent each molecule as a binary vector where a 1 means the subgraph is present and 0 otherwise. There is an obvious extension to counts of features (i.e. an integer vector whose entry is N if the subgraph appears N times). This strategy depends crucially on how nodes and edges are labeled.
The similarity measure is usually something like Jaccard (often referred to as Tanimoto in the chemistry literature).
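As a toy illustration of the fingerprint-plus-Tanimoto idea -- here the "subgraph dictionary" is just the set of labeled edges, which is far cruder than real cheminformatics fingerprints, and both function names are mine:

```python
from collections import Counter

def edge_fingerprint(nodes, edges):
    """Count-vector fingerprint over labeled edges.

    nodes: {node_id: label}; edges: iterable of (u, v) node-id pairs.
    Each sorted pair of endpoint labels is one dictionary entry.
    """
    fp = Counter()
    for u, v in edges:
        fp[tuple(sorted((nodes[u], nodes[v])))] += 1
    return fp

def tanimoto(fp_a, fp_b):
    """Jaccard/Tanimoto similarity on the binary presence of entries."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# A C-C-O chain vs. a single C-O bond: they share one of the two
# distinct labeled edge types, so the similarity is 0.5.
fp1 = edge_fingerprint({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
fp2 = edge_fingerprint({0: "C", 1: "O"}, [(0, 1)])
sim = tanimoto(fp1, fp2)
```

With the fingerprints in hand, 1 minus the Tanimoto similarity gives a distance you can feed to any off-the-shelf clustering method.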
resented_ape t1_j3kiocc wrote
Reply to [R] Learning Learning-Rates: SteDy Optimizer by LahmacunBear
Have you looked at other related papers? In particular, hypergradient descent and the papers that cite that work (which you can find via Google Scholar)?
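For comparison, the core hypergradient descent trick -- adapting the learning rate online via the dot product of successive gradients -- is only a few lines. This is my own toy sketch, not code from that paper:

```python
import numpy as np

def hypergradient_sgd(grad_fn, theta, alpha=0.01, beta=1e-4, n_iter=200):
    """SGD where the learning rate alpha is itself updated each step
    by a 'hypergradient': the dot product of the current and previous
    gradients (positive when steps keep agreeing, so alpha grows)."""
    g_prev = np.zeros_like(theta)
    for _ in range(n_iter):
        g = grad_fn(theta)
        alpha = alpha + beta * float(g @ g_prev)  # hypergradient step
        theta = theta - alpha * g
        g_prev = g
    return theta, alpha

# Toy problem: f(x) = 0.5 * ||x||^2, whose gradient is x itself.
theta, alpha = hypergradient_sgd(lambda x: x, np.array([3.0, -2.0]))
```

On this toy quadratic the successive gradients stay aligned, so the learning rate ratchets up from its (deliberately too small) initial value, which is the behavior the hypergradient papers are built around.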