vwings

vwings t1_j5gowwn wrote

For such retrieval systems, you would usually use top-1, top-5, or top-k accuracy. Concretely, you have a list of product types (embeddings) in your database (say 100 or 100,000, whatever). Then you take your product description, embed it with your ANN, and compare it with all the product-type embeddings. Then you check at which rank the correct product type ends up. From that you can calculate mean rank or top-k accuracy.
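A minimal sketch of what that evaluation could look like, assuming the embeddings are plain numpy arrays and cosine similarity is the retrieval score (names and shapes are just illustrative):

```python
import numpy as np

def top_k_accuracy(query_embs, type_embs, true_idx, k=5):
    """Fraction of queries whose correct product type appears among the
    k most similar product-type embeddings (cosine similarity)."""
    # Normalize so the dot product equals cosine similarity
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    t = type_embs / np.linalg.norm(type_embs, axis=1, keepdims=True)
    sims = q @ t.T                        # (n_queries, n_types)
    ranked = np.argsort(-sims, axis=1)    # best match first
    hits = [true_idx[i] in ranked[i, :k] for i in range(len(q))]
    return float(np.mean(hits))

# e.g. 1,000 embedded product descriptions against 100 product types:
# acc5 = top_k_accuracy(desc_embs, type_embs, labels, k=5)
```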

8

vwings t1_j1nmx5z wrote

I assume they mean that each paper gets a list of reviewers it cannot be assigned to because of conflicts of interest. But the original question could also be about a general blacklist of reviewers who have shown misconduct before (e.g. not responding to the author rebuttal). I don't think a general blacklist exists... But it might be that the conference organizers indeed have one.

14

vwings t1_j1nmbol wrote

On OpenReview at least, you can enter persons you are somehow connected with (this is a kind of blacklist). Furthermore, OpenReview automatically prevents your coauthors, or people with the same email domain (e.g. *@google.com), from being assigned as reviewers. OpenReview is actually pretty smart, but I don't know whether authors actively try to trick it...

7

vwings t1_j0osaa7 wrote

Now you are completely deviating from the original scope of the discussion. We discussed which is more general, but, since you changed the scope, you agree with me on that point.

About "guarantees": also for CP, it is easy to construct examples where it fails. If the distribution of the new data is different from the calibration set, it's not exchangeable anymore, and the guarantee is gone.

1

vwings t1_j0ljgfr wrote

That's what the CP guys say. :)

I would even say that Platt scaling generalizes CP. Whereas CP focuses on the empirical distribution of the prediction scores only around a particular location in the tails, e.g. at a confidence level of 5%, Platt scaling tries to mold the whole empirical distribution into calibrated probabilities; it thus considers the entire range of the score distribution.
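To make that concrete, here is a minimal sketch of Platt scaling, assuming you have raw scores and binary labels on a held-out calibration set (variable names are made up); it simply fits a 1-D logistic regression from scores to probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt(cal_scores, cal_labels):
    # Platt scaling: fit a sigmoid (1-D logistic regression) that maps
    # raw classifier scores on the calibration set to probabilities
    lr = LogisticRegression()
    lr.fit(np.asarray(cal_scores).reshape(-1, 1), cal_labels)
    return lr

def platt_probs(lr, test_scores):
    # Calibrated probability of the positive class for new scores
    return lr.predict_proba(np.asarray(test_scores).reshape(-1, 1))[:, 1]
```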

4

vwings t1_j0l5778 wrote

A hard problem indeed. The methods in your list assume different settings. Deep Ensembles and MC Dropout don't require a calibration set. Prior Networks (I love this paper) assume that OOD samples are available during training. Conformal prediction assumes the availability of a calibration set that follows the distribution of future data... For the other methods, I would have to check ...

2

vwings t1_j0l4gq9 wrote

Great question and comment! I think the first thing to state here is that CNNs are usually overconfident.

One thing that the original post is looking for is calibration of the classifier on a calibration set: on that set, the softmax values can be re-adjusted so that they are close to actual probabilities. This is essentially what conformal prediction and Platt scaling do.
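For illustration, a rough sketch of split conformal prediction for a classifier, assuming you already have softmax outputs for a calibration set (the 1-minus-true-class-softmax score is just one common choice of nonconformity score):

```python
import numpy as np

def conformal_threshold(cal_softmax, cal_labels, alpha=0.05):
    """Score threshold computed on the calibration set (split conformal)."""
    n = len(cal_labels)
    # Nonconformity score: 1 - softmax probability of the true class
    scores = 1.0 - cal_softmax[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level (capped at 1 for small n)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

def prediction_sets(test_softmax, threshold):
    """Classes whose nonconformity stays below the threshold form the set."""
    return [np.where(1.0 - row <= threshold)[0] for row in test_softmax]
```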

I strongly recommend this year's talk on Conformal Prediction which provides insights into these problems. Will try to find the link...

1

vwings t1_iw768hl wrote

I think it's valuable, but not huge. There have been several recent works that use this concept of describing a sample by similar samples to enrich its representation:

  • the cross-attention mechanism in Transformers does this to some extent
  • AlphaFold: a protein is enriched with similar (by multiple sequence alignment) proteins
  • CLOOB: a sample is enriched with similar samples from the current batch
  • MHNfs: a sample is enriched with similar samples from a large context.

This paper uses this concept, but does it differently: it uses the vector of cosine similarities, which in other works is softmaxed and then used as weights for averaging, directly as the representation. That this works, and that you can backprop through it, is remarkable, but not huge... Just my two cents... [Edits: typos, grammar]
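A rough sketch of that contrast, with made-up names (x is a query embedding, context is the set of similar/retrieved samples): the usual route softmaxes the cosine similarities and uses them as averaging weights, whereas here the similarity vector itself becomes the representation:

```python
import torch
import torch.nn.functional as F

def enrich(x, context):
    """x: (d,) query embedding; context: (n, d) similar/support embeddings."""
    sims = F.cosine_similarity(x.unsqueeze(0), context, dim=1)   # (n,)

    # Common variant (attention-style): softmax the similarities and use
    # them as weights to average the context embeddings.
    weights = F.softmax(sims, dim=0)
    weighted_avg = weights @ context                             # (d,)

    # Variant described above: the vector of cosine similarities itself
    # is used directly as the representation (and is backpropped through).
    sim_repr = sims                                              # (n,)
    return weighted_avg, sim_repr
```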

3

vwings t1_iud9fb8 wrote

The best way is probably to use a feature encoding and plug this into a Transformer. First sample, with 200 features of group A and 5 features of group B: encode it as the set {[A feats, encoding for A] W_A, [B feats (possibly repeated), encoding for B] W_B}. Second sample, with B and C features: {[C feats, encoding for C] W_C, [B feats (possibly repeated), encoding for B] W_B}. The linear mappings W_A, W_B, and W_C must map to the same dimension. The order of the feature groups does not play a role (permutation invariance of the Transformer). Note that this also learns a feature or feature-group embedding.
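A rough PyTorch sketch of that idea, with illustrative names and sizes: each feature group gets its own linear map (W_A, W_B, ...) plus a learned group embedding, the resulting tokens form a set, and a Transformer encoder without positional encodings keeps the whole thing permutation invariant:

```python
import torch
import torch.nn as nn

class FeatureGroupEncoder(nn.Module):
    def __init__(self, group_dims, d_model=128):
        super().__init__()
        # One linear map per feature group (A, B, C, ...), all to d_model
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in group_dims])
        # Learned embedding telling the model which group a token came from
        self.group_emb = nn.Embedding(len(group_dims), d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, groups):
        # groups: list of (group_id, feature_vector) pairs; which groups
        # are present may differ from sample to sample
        tokens = [self.proj[g](x) + self.group_emb(torch.tensor(g))
                  for g, x in groups]
        tokens = torch.stack(tokens).unsqueeze(0)   # (1, n_present, d_model)
        # No positional encoding, so feature-group order does not matter
        return self.encoder(tokens).mean(dim=1)     # (1, d_model)

# enc = FeatureGroupEncoder(group_dims=[200, 5, 7])     # dims of A, B, C
# sample_1 = enc([(0, a_feats), (1, b_feats)])          # groups A and B
# sample_2 = enc([(2, c_feats), (1, b_feats)])          # groups C and B
```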

−2

vwings t1_is99t4r wrote

Check contrastive pre-training of graph neural networks. The clustering methods should work well on those pretrained representations. Furthermore: do the graphs come from a particular domain? You might then find some useful pre-training objectives.

1