Blutorangensaft

Blutorangensaft OP t1_jdnxm94 wrote

I see, I will improve my critic then (maybe give it more depth) and abstain from tricks like TTUR for now.

What do you mean by "easily separable distribution of output logits", by the way? Plotting the scores the critic assigns to real and fake samples separately? Or taking the mean and standard deviation of the logits for real and fake data and comparing those?
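Either reading can be made concrete. A minimal numpy sketch of the mean/std comparison, using made-up logits (the distributions and the d-prime-style score are illustrative assumptions, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical critic outputs: logits assigned to real and fake batches.
real_logits = rng.normal(loc=2.0, scale=1.0, size=1000)
fake_logits = rng.normal(loc=-2.0, scale=1.0, size=1000)

# One simple separability measure: the gap between the means in units of
# the pooled standard deviation (a d-prime-style score).
pooled_std = np.sqrt(0.5 * (real_logits.std() ** 2 + fake_logits.std() ** 2))
d_prime = (real_logits.mean() - fake_logits.mean()) / pooled_std

# Fraction of fake logits above the lowest real logit: a rough overlap check.
overlap = np.mean(fake_logits > real_logits.min())
```

Plotting both histograms on shared axes gives the visual version of the same check.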

1

Blutorangensaft OP t1_jdnjfmu wrote

Thank you for the thorough answer. 1) I see, I will just trust my normalisation scheme then. 2) That makes sense. 3) Is the training curve you describe the only possible one for the critic loss? Because, with normalisation, I see the critic loss approaching 0 from a positive value. Could this mean that the generator's job became easier due to normalisation? Does it make sense to think about improving the critic then (like you described, with 3 times the params)? Also, I read about and tried scheduling, but I am using TTUR instead for its better convergence properties.
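For reference, TTUR amounts to nothing more than giving the critic and generator different step sizes. A toy sketch on a bilinear min-max game (the learning rates and the 4:1 ratio are illustrative, not recommendations, and this shows only the mechanics, not TTUR's convergence theory):

```python
# Two time-scale update rule (TTUR): the two players simply use different
# learning rates. Toy min-max game f(g, c) = g * c, where the generator
# parameter g descends and the critic parameter c ascends.
lr_g, lr_c = 1e-3, 4e-3  # critic updated 4x faster; purely illustrative

g, c = 1.0, 1.0
for _ in range(5000):
    g = g - lr_g * c  # generator: gradient descent on f (df/dg = c)
    c = c + lr_c * g  # critic: gradient ascent on f (df/dc = g), alternating
```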

1

Blutorangensaft OP t1_jd7jaor wrote

Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that this architecture was chosen because of vanishing gradients. Wikipedia seems to have the same error.

Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while enabling me to scale the depth, completely changes the training dynamics. Training both G and C with the same learning rate and the same schedule (1:1), the critic seems to win, a behaviour I have never seen before in GANs. I suppose I will have to retune learning rates.
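The usual reason layer norm shows up in WGAN-GP critics: it normalizes each sample over its own features, so, unlike batch norm, it introduces no cross-sample dependence, and the gradient penalty is defined per sample. A minimal numpy sketch of the operation itself (shapes and values are arbitrary):

```python
import numpy as np

# Layer norm: normalize each sample over its feature dimension. No batch
# statistics are involved, which is why it is compatible with WGAN-GP's
# per-sample gradient penalty, whereas batch norm is not.
def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(4, 16))  # a batch of activations
out = layer_norm(h)  # each row now has ~zero mean and ~unit variance
```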

1

Blutorangensaft t1_j6uiygz wrote

Disclaimer: no help, more a request

Once you're done with this project, would you mind sharing your speed and accuracy numbers? I'm on the lookout for a good English NER model. The problem is that spaCy has some issues with casing.

1

Blutorangensaft OP t1_j654qyd wrote

Thank you for the reference, it looks very promising. I've heard of ways to smooth the latent space through Lipschitz regularisation, but was disappointed again when I read "ah well, it's just layer normalisation". So many things in ML appear under different names but turn out to mean the same thing once you implement them.

1

Blutorangensaft OP t1_j6356ho wrote

Compare different autoencoders in their ability to create valid language in a continuous space. Later, I want to generate sentences in the latent space with another neural network and have them decoded to real sentences by the autoencoder. I want the space to be smooth because the second network will be trained with gradient descent, which takes small steps. I believe it will perform better if small moves in latent space correspond to meaningful distances between real sentences.
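The smoothness requirement is usually probed by interpolating between the latent codes of two sentences and decoding each intermediate point. A sketch with hypothetical names (`decode` stands in for whichever autoencoder is being compared):

```python
import numpy as np

# Linear interpolation between two latent codes. In a smooth latent space,
# decoding each point along the path should morph gradually from sentence A
# to sentence B rather than jumping between unrelated outputs.
def interpolate(z_a, z_b, steps=5):
    alphas = np.linspace(0.0, 1.0, steps)
    return [(1 - alpha) * z_a + alpha * z_b for alpha in alphas]

z_a = np.zeros(16)  # stand-in for encode(sentence_a)
z_b = np.ones(16)   # stand-in for encode(sentence_b)
path = interpolate(z_a, z_b, steps=5)
# Each decode(path[i]) would then be inspected for validity and gradual change.
```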

1

Blutorangensaft OP t1_j5zu9ti wrote

Thank you for your answer. If a paper on diffusion models pops into your mind that uses this method, feel free to post it.

How would you derive a quantitative evaluation from t-SNE? I thought it was mostly used for visualisation. I'm looking to compute some kind of score from the interpolation.
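One possible score (an assumption on my part, not a standard metric): measure how evenly the decoded outputs move along the interpolation path. If some embedding of the decoder's outputs is available, equal latent steps should produce roughly equal output steps in a smooth space:

```python
import numpy as np

# Coefficient of variation of the step sizes along an interpolation path.
# 0 means perfectly even steps; larger values mean the outputs jump around.
def smoothness_score(outputs):
    steps = np.linalg.norm(np.diff(outputs, axis=0), axis=1)
    return steps.std() / (steps.mean() + 1e-8)

# Toy stand-in for embedded decoder outputs along a path: perfectly even.
even_path = np.linspace(0.0, 1.0, 6)[:, None] * np.ones((1, 8))
score = smoothness_score(even_path)
```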

1

Blutorangensaft t1_iz92v1x wrote

What is people's take on Minsky? The XOR issue is easily resolved with a hidden layer and a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limitation of neural nets. In fact, even humans have trouble with that, and I fail to see how it generalises. He may be an important figure, but not one who advanced progress in artificial intelligence.
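To make the XOR point concrete: a single linear threshold unit cannot compute it, but one hidden layer with a nonlinearity can, even hand-constructed with no training:

```python
# A 2-2-1 network with step activations that computes XOR. The hidden units
# compute OR and AND of the inputs; the output fires for "OR and not AND".
def step(x):
    return 1.0 if x > 0 else 0.0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)  # OR and not AND = XOR

outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```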

Of course, if the question is just about portraying important junctions in AI research, regardless of whether they were helpful or not, then Minsky belongs there.

5

Blutorangensaft t1_irvo011 wrote

Wikipedia: "Stable Diffusion is a deep learning, text-to-image model released by startup StabilityAI in 2022. It is primarily used to generate detailed images conditioned on text descriptions."

If we take the example prompt "a photograph of an astronaut riding a horse" (see Wiki), I don't see how that is much different from an image caption. I guess the only difference is that it specifies the visual medium, e.g. a photo, a painting, or the like.

I don't think there is a difference between prompt and caption, and you might be overthinking this. However, you could always make captions sound more like prompts (if the specified medium is the only difference) by looking for specific datasets with a certain wording or by manually adapting the data yourself.

5