vwings
vwings t1_j63c9z7 wrote
Reply to comment by guava-bandit in [P] Building a LSTM based model for binary classification by Thanos_nap
Yes, good point. I would recommend using Keras for this modeling task. Once you have the data in the right structure, you can solve this with maybe 25 lines of code ...
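A hedged sketch of what those ~25 lines could look like in Keras; the shapes (T weeks, K action types), layer sizes, and the fake data are my own assumptions, not from the thread:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, K = 52, 10   # e.g. 52 weeks of history, 10 possible action types (made up)

model = keras.Sequential([
    keras.Input(shape=(T, K)),              # one row per week, one column per action
    layers.Masking(mask_value=0.0),         # skip zero-padded weeks
    layers.LSTM(32),                        # summarize the weekly sequence
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Tiny fake batch of 8 customers, just to show the shapes
X = np.random.rand(8, T, K).astype("float32")
y = np.random.randint(0, 2, size=(8,))
model.fit(X, y, epochs=1, verbose=0)
probs = model.predict(X, verbose=0)
```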
vwings t1_j63c3ss wrote
Reply to comment by teenaxta in [P] Building a LSTM based model for binary classification by Thanos_nap
How do you know that the customer is male?
vwings t1_j63c23v wrote
Reply to comment by Thanos_nap in [P] Building a LSTM based model for binary classification by Thanos_nap
Lol, LSTM for the sake of it. If there is no temporal component, then it's just the wrong model. Can you tell them that Transformers are the "new" LSTMs? Transformers handle sets (instead of sequences), so they would make a lot of sense in your application..
vwings t1_j5gowwn wrote
Reply to Evaluation for similarity search [P] by silverstone1903
For such retrieval systems, you would usually use Top-1, Top-5 or Top-k accuracy. Concretely, you have a list of product type embeddings in your database (let's say 100 or 100,000, whatever). Then you get your product description, embed it with your ANN, and compare it with all product type embeddings. Then you check at which rank the correct product type ends up. From that you can calculate mean rank or top-k accuracy ..
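The procedure above in a few lines of numpy; the function name and the toy data are invented for illustration:

```python
import numpy as np

def retrieval_metrics(query_embs, db_embs, true_idx, k=5):
    """Mean rank and top-k accuracy for an embedding retrieval setup."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = q @ d.T                            # cosine similarity to every product type
    order = np.argsort(-sims, axis=1)         # best-matching product types first
    ranks = np.array([int(np.where(order[i] == true_idx[i])[0][0]) + 1
                      for i in range(len(true_idx))])  # 1-based rank of the true type
    return ranks.mean(), (ranks <= k).mean()

# Toy check: 4 product types, each query identical to its product-type embedding
db = np.eye(4)
mean_rank, top5 = retrieval_metrics(db.copy(), db, np.array([0, 1, 2, 3]))
```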
vwings t1_j1nvfaf wrote
Reply to comment by wadawalnut in [D] Are reviewer blacklists actually implemented at ML conferences? by XalosXandrez
Ok, do you have an idea why it failed??
vwings t1_j1nmx5z wrote
Reply to comment by ktpr in [D] Are reviewer blacklists actually implemented at ML conferences? by XalosXandrez
I assume they mean that each paper gets a list of reviewers it cannot be assigned to, because of conflicts of interests. But the original question could also be about a general blacklist of reviewers that showed misconduct before (e.g. not reacting to the author rebuttal, etc). I don't think a general blacklist exists... But it might be that the conference organizers indeed have this.
vwings t1_j1nmbol wrote
On OpenReview at least, you can enter people you are somehow connected with (this is a kind of blacklist). Furthermore, OpenReview automatically prevents your coauthors or people with the same email domain (e.g. *@google.com) from being assigned as reviewers. OpenReview is actually pretty smart, but I don't know if authors actively try to trick it...
vwings t1_j13pguc wrote
Reply to comment by rjromero in [R] Nonparametric Masked Language Modeling - MetaAi 2022 - NPM - 500x fewer parameters than GPT-3 while outperforming it on zero-shot tasks by Singularian2501
It was expected, right? A retrieval system should be much more efficient than storing phrases in neural net weights as GPT does...
vwings t1_j0p92zm wrote
Send the article to the authors of the works you are mainly building on. They might endorse you if they find the article reasonable. And they have a small incentive because their citation count increases when your article is on arxiv...
vwings t1_j0osaa7 wrote
Reply to comment by bremen79 in [D] Is softmax a good choice for confidence? by thanderrine
Now you are completely deviating from the original scope of the discussion. We discussed what is more general, but - since you changed scope - you agree with me on that.
About "guarantees": also for CP, it is easy to construct examples where it fails. If the distribution of the new data is different from the calibration set, it's not exchangeable anymore, and the guarantee is gone.
vwings t1_j0mewdd wrote
Reply to comment by madhatter09 in [D] Is softmax a good choice for confidence? by thanderrine
This is the link to the NeurIPS2022 invited talk: https://neurips.cc/virtual/2022/invited-talk/55872 (don't know if it's accessible without registration)
Here is almost the same talk: https://www.youtube.com/watch?v=dmZBxW7oY1o
vwings t1_j0ljgfr wrote
Reply to comment by Extra_Intro_Version in [D] Is softmax a good choice for confidence? by thanderrine
That's what the CP guys say. :)
I would even say that Platt generalizes CP. Whereas CP focuses on the empirical distribution of the prediction score only around a particular location in the tails, e.g. at a confidence level of 5%, Platt scaling tries to mold the whole empirical distribution into calibrated probabilities -- thus Platt considers the whole range of the distribution of scores.
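A minimal Platt-scaling sketch, to make the "mold the whole distribution" point concrete: fit sigmoid(a*s + b) on a calibration set by gradient descent on the log loss. The scores and labels are made up; Platt's original paper uses a more careful fitting procedure than plain gradient descent.

```python
import numpy as np

def platt_scale(scores, labels, lr=0.1, n_iter=5000):
    """Fit p = sigmoid(a*s + b) to (score, label) pairs -- a toy Platt fit."""
    a, b = 1.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels              # derivative of the log loss w.r.t. the logit
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

# Calibrate raw scores from a (made-up) classifier on a labeled calibration set
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 0, 1, 1, 1])
a, b = platt_scale(scores, labels)
p_high = 1.0 / (1.0 + np.exp(-(a * 2.0 + b)))   # calibrated prob for a high score
p_low = 1.0 / (1.0 + np.exp(-(a * -2.0 + b)))   # calibrated prob for a low score
```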
vwings t1_j0l5778 wrote
Reply to comment by visarga in [D] Is softmax a good choice for confidence? by thanderrine
A hard problem indeed. The methods in your list use different settings. Deep Ensembles and MC Dropout don't require a calibration set. The prior networks (I love this paper) assume that OOD samples are available during training. Conformal prediction assumes the availability of a calibration set that follows the distribution of future data... For the other methods, I would have to check ...
vwings t1_j0l4q82 wrote
Reply to comment by bremen79 in [D] Is softmax a good choice for confidence? by thanderrine
This is not true... There are many other works, e.g. Platt scaling, that also provide calibrated classifiers (I suppose this is what you call "valid"). But conformal prediction indeed tackles this problem...
vwings t1_j0l4gq9 wrote
Reply to comment by madhatter09 in [D] Is softmax a good choice for confidence? by thanderrine
Great question and comment! I think the first statement here is that CNNs are usually overconfident.
One thing that the original post is looking for is calibration of the classifier on a calibration set, where the softmax values can be re-adjusted to be close to probabilities. This is essentially what Conformal Prediction and Platt Scaling do.
I strongly recommend this year's NeurIPS talk on Conformal Prediction, which provides insights into these problems. Will try to find the link...
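To illustrate the conformal side of this: a split-conformal sketch that turns softmax outputs into prediction sets with ~(1 - alpha) coverage. Function names and the toy calibration data are invented; see the talk for the proper treatment.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Cutoff on the nonconformity score 1 - softmax(true class),
    chosen from a calibration set (split conformal prediction)."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, qhat):
    """All classes whose softmax prob is high enough to enter the set."""
    return np.where(1.0 - probs <= qhat)[0]

# Toy calibration set: 3 classes, true class is always class 0 with high prob
true_p = np.linspace(0.80, 0.99, 100)          # softmax prob of the true class
cal_probs = np.stack([true_p, (1 - true_p) / 2, (1 - true_p) / 2], axis=1)
cal_labels = np.zeros(100, dtype=int)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
pset = prediction_set(np.array([0.90, 0.06, 0.04]), qhat)   # confident test point
```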
vwings t1_j0l3xlc wrote
Reply to comment by ttt05 in [D] Is softmax a good choice for confidence? by thanderrine
Yes, indeed the Hendrycks paper is the first to read in this context: https://arxiv.org/abs/1610.02136
vwings t1_j0aqnet wrote
Try Musika: https://github.com/marcoppasini/musika
vwings t1_iza8lwm wrote
Reply to comment by robbsc in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
Completely agree.
vwings t1_iz8n5m2 wrote
Reply to comment by huberloss in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
i would add:
LSTMs (1997), Hochreiter & Schmidhuber
ImageNet (2012), Krizhevsky et al
Deep Learning (2015), LeCun, Bengio & Hinton
Attention is all you need (2017).
vwings t1_iw857q2 wrote
Reply to comment by machinelearner77 in Relative representations enable zero-shot latent space communication by 51616
Yes, sure you can backprop, but what I meant is that you can train a network reasonably with this -- although in the backward pass the gradient gets diluted across all anchor samples. I thought you would at least need softmax attention (in the forward pass) to route the gradients back reasonably.
vwings t1_iw768hl wrote
Reply to comment by huehue9812 in Relative representations enable zero-shot latent space communication by 51616
I think it's valuable, but not huge. There have been several recent works that use this concept that a sample is described by similar samples to enrich representations:
- the cross-attention mechanism in Transformers does this to some extent
- AlphaFold: a protein is enriched with similar (by multiple sequence alignment) proteins
- CLOOB: a sample is enriched with similar samples from the current batch
- MHNfs: a sample is enriched with similar samples from a large context.
This paper uses this concept but does it differently: it uses the vector of cosine similarities, which in other works is softmaxed and then used as weights for averaging, directly as the representation. That this works and that you can backprop over this is remarkable, but not huge... Just my two cents... [Edits: typos, grammar]
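The core idea, as I understand it, in a few lines (toy sketch, names invented):

```python
import numpy as np

def relative_representation(x, anchors):
    """Represent x by its cosine similarities to a set of anchor samples.
    The similarity vector itself is the representation -- no softmax,
    no weighted averaging over the anchors."""
    x = x / np.linalg.norm(x)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return a @ x                     # one cosine similarity per anchor

anchors = np.eye(3)                  # 3 toy anchor samples
rep = relative_representation(np.array([1.0, 0.0, 0.0]), anchors)
```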
vwings t1_iv4z8sm wrote
Mass-conserving LSTM
vwings t1_iud9fb8 wrote
The best way is probably to use a feature encoding and plug this into a Transformer. First sample: 200 features A and 5 features B. You encode this as the set {[A feats, encoding for A]W_A, [B feats (possibly repeated), encoding for B]W_B}. Second sample with B and C features: {[C feats, encoding for C]W_C, [B feats (possibly repeated), encoding for B]W_B}. The linear mappings W_A, W_B, and W_C must map to the same dimension. The order of the feature groups does not play a role (permutation invariance of the Transformer). Note that this also learns a feature or feature-group embedding.
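The encoding step above could look roughly like this (random W matrices stand in for learned mappings; dimensions and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16

# Per-group feature dimensions and (here random, in practice learned) mappings
group_dims = {"A": 200, "B": 5, "C": 40}
W = {g: rng.normal(size=(dim + len(group_dims), d_model)) * 0.1
     for g, dim in group_dims.items()}
group_id = {g: np.eye(len(group_dims))[i] for i, g in enumerate(group_dims)}

def encode_sample(feats):
    """Map each (group, feature-vector) pair to one d_model token; the
    result is a set of tokens a Transformer can consume in any order."""
    tokens = [np.concatenate([x, group_id[g]]) @ W[g] for g, x in feats.items()]
    return np.stack(tokens)          # (n_groups_present, d_model)

# First sample has feature groups A and B, second has C and B
s1 = encode_sample({"A": rng.normal(size=200), "B": rng.normal(size=5)})
s2 = encode_sample({"C": rng.normal(size=40), "B": rng.normal(size=5)})
```

Both samples end up as sets of tokens in the same d_model space, regardless of which feature groups are present.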
vwings t1_is99t4r wrote
Reply to [R] Clustering a set of graphs by No_Performer203
Check contrastive pre-training of graph neural networks. The clustering methods should work well for those pretrained representations. Furthermore: do the graphs come from a particular domain? You might find some useful pre-training objectives then..
vwings t1_j64itph wrote
Reply to comment by Thanos_nap in [P] Building a LSTM based model for binary classification by Thanos_nap
The batch dimension is the different customers. You have N customers, T weeks, and K possible actions. This should give you a sparse tensor of dimensions [N, T, K] that you can easily plug into any LSTM....
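Building that [N, T, K] tensor from an event log could look like this (toy data, invented for illustration):

```python
import numpy as np

# Toy event log: (customer_id, week, action_id) triples
events = [(0, 0, 2), (0, 3, 1), (1, 0, 0), (1, 1, 2), (1, 1, 0)]

N, T, K = 2, 4, 3                       # customers, weeks, action types
X = np.zeros((N, T, K), dtype=np.float32)
for cust, week, action in events:
    X[cust, week, action] += 1.0        # count of each action per week

# X[n] is a (T, K) sequence for customer n, ready to feed to any LSTM
```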