Blutorangensaft
Blutorangensaft OP t1_jdnxm94 wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
I see, I will improve my critic then (maybe give it more depth) and abstain from tricks like TTUR for now.
What do you mean by an "easily separable distribution of output logits", btw? Plotting the scores the critic assigns to real and fake samples separately? Or do you mean taking the mean and standard deviation of the logits for real and fake data and comparing those?
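In case it helps pin down the question, here is a minimal numpy sketch of the second interpretation; the logit arrays are simulated stand-ins for what `critic(real_batch)` and `critic(fake_batch)` would return, and the "separation score" is just an illustrative heuristic, not a standard metric:

```python
import numpy as np

# Hypothetical critic logits for a batch of real and fake samples.
rng = np.random.default_rng(0)
real_logits = rng.normal(loc=2.0, scale=1.0, size=512)
fake_logits = rng.normal(loc=-2.0, scale=1.0, size=512)

# If the gap between the means is large relative to the spreads,
# the two logit distributions are "easily separable".
gap = real_logits.mean() - fake_logits.mean()
spread = real_logits.std() + fake_logits.std()
print(f"mean gap: {gap:.2f}, combined std: {spread:.2f}")

# Simple separation heuristic: values well above 1 mean the two
# histograms barely overlap.
separation = gap / spread
print(f"separation score: {separation:.2f}")
```

The first interpretation is just the visual version of the same thing: histogram both arrays (e.g. `plt.hist(real_logits)` and `plt.hist(fake_logits)` on shared axes) and eyeball the overlap.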
Blutorangensaft OP t1_jdnjfmu wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
Thank you for the thorough answer. 1) I see, I will just trust my normalisation scheme then. 2) That makes sense. 3) Is the training curve you describe the only possible one for the critic loss? With normalisation, I see the critic loss approaching 0 from a positive value. Could this mean that normalisation made the generator's job easier? Does it make sense to think about improving the critic then (as you described, with three times the parameters)? Also, I read about and tried learning-rate scheduling, but I am using TTUR instead for its better convergence properties.
Submitted by Blutorangensaft t3_121jylg in MachineLearning
Blutorangensaft OP t1_jd7jaor wrote
Reply to comment by YouAgainShmidhoobuh in [D]: Vanishing Gradients and Resnets by Blutorangensaft
Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that the architecture was chosen because of vanishing gradients. Wikipedia seems to carry the same error.
Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while it lets me scale the depth, completely changes the training dynamics. Training both G and C with the same learning rate and the same update schedule (1:1), the critic seems to win, a behaviour I have never seen before in GANs. I suppose I will have to retune the learning rates.
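For anyone following along, the normalisation difference in question, as a toy numpy sketch (not my actual critic code): layer norm normalises each sample over its own features, so no statistics leak across the batch the way they do with batch norm — which is why it is the one usually recommended for a WGAN-GP critic, where the gradient penalty is computed per sample.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalise each sample over its feature axis (axis=1)."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    """Normalise each feature over the batch axis (axis=0)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(1).normal(size=(4, 8))  # (batch, features)
ln = layer_norm(x)
# Each row is now zero-mean, unit-variance independently of the other
# rows; with batch_norm the same sample's output would depend on what
# else happens to be in the batch.
print(ln.mean(axis=1))  # ~0 for every sample
```

(The learnable scale/shift parameters of real layer norm are omitted here for brevity.)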
Blutorangensaft OP t1_jcywm46 wrote
Reply to comment by IntelArtiGen in [D]: Vanishing Gradients and Resnets by Blutorangensaft
I don't. I have heard of using layer norm for RNNs, but I am unfamiliar with instance norm. Will look into it, thank you.
Submitted by Blutorangensaft t3_11wmpoj in MachineLearning
Submitted by Blutorangensaft t3_11qejcz in MachineLearning
Blutorangensaft t1_j6uiygz wrote
Reply to [P] NER output label post processing by hasiemasie
Disclaimer: no help, more a request
Once you're done with this project, would you mind sharing your speed and accuracy? I'm on the lookout for a good English NER model. Problem is, spaCy has some issues with casing.
Blutorangensaft t1_j6mdw93 wrote
Reply to comment by andreichiffa in [Discussion] ChatGPT and language understanding benchmarks by mettle
Is the critic used for fine-tuning or as a part of the loss function during training?
Blutorangensaft OP t1_j654qyd wrote
Reply to comment by jackilion in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Thank you for the reference, it looks very promising. I had heard of ways to smooth the latent space through Lipschitz regularisation, but was disappointed again when I read "ah well, it's just layer normalisation". So many things in ML come under a different name and turn out to mean the same thing once you implement them.
Blutorangensaft OP t1_j6356ho wrote
Reply to comment by jackilion in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Compare different autoencoders in their ability to create valid language in a continuous space. Later, I want to generate sentences in the latent space with another neural network and have the autoencoder decode them to real sentences. I want the space to be smooth because the second network will naturally be trained by gradient descent, which makes small, continuous changes. I believe it will perform better if those changes correspond to meaningful distances between real sentences.
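The setup I have in mind, as a toy sketch: gradient descent on a latent code through a frozen decoder. The linear `decode` here is a made-up stand-in for the trained autoencoder's decoder, and the squared-error target is a stand-in for whatever objective the second network would optimise.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 4))        # frozen toy "decoder": z (4-d) -> output (16-d)
decode = lambda z: W @ z

target = decode(rng.normal(size=4))  # pretend this is the output we want to hit
z = np.zeros(4)                      # latent code we optimise

# Gradient descent on z through the frozen decoder. The gradient of
# 0.5 * ||W z - target||^2 with respect to z is W.T @ (W z - target).
for _ in range(1000):
    residual = decode(z) - target
    z -= 0.02 * (W.T @ residual)

final_err = np.linalg.norm(decode(z) - target)
print(final_err)  # small if the optimisation converged
```

The point about smoothness: these tiny steps in z only help if small latent moves map to correspondingly small, meaningful moves in sentence space; a linear toy decoder is trivially smooth, a real text decoder is not.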
Blutorangensaft OP t1_j6344jf wrote
Reply to comment by crt09 in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Ahh, I get you now, my apologies. I'm more interested in the performance on the decoding side indeed, because I want to later generate sentences in that latent space with another neural net and have them decoded to normal tokens.
Blutorangensaft OP t1_j632b2s wrote
Reply to comment by crt09 in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Decoding slightly perturbed sentences to the same sentence is exactly the idea behind denoising autoencoders, yes. I plan to use that down the road, but for now I am interested in how to measure performance.
Blutorangensaft OP t1_j5zu9ti wrote
Reply to comment by jackilion in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Thank you for your answer. If a paper on diffusion models pops into your mind that uses this method, feel free to post it.
How would you derive a quantitative evaluation from t-SNE? I thought it was mostly used for visualisation. I'm looking to compute some kind of score from the interpolation.
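The kind of score I had in mind, sketched with a stand-in decoder (the real one would be the trained autoencoder, and the output distance would be something like edit distance or embedding distance between decoded sentences rather than a vector norm):

```python
import numpy as np

rng = np.random.default_rng(0)
decode = lambda z: np.tanh(z * 3.0)  # nonlinear stand-in for a trained decoder

z_a, z_b = rng.normal(size=4), rng.normal(size=4)

# Walk the straight line between two latent codes in equal-sized steps.
alphas = np.linspace(0.0, 1.0, 21)
outputs = np.stack([decode((1 - a) * z_a + a * z_b) for a in alphas])

# Distance covered in output space by each equal latent step.
step_sizes = np.linalg.norm(np.diff(outputs, axis=0), axis=1)

# Unevenness score: 0 means the outputs move uniformly along the path;
# large values mean some latent steps cause big jumps in output space.
score = step_sizes.std() / step_sizes.mean()
print(f"unevenness score: {score:.3f}")
```

Averaging this over many random latent pairs would give one number per autoencoder, which is the kind of comparison I am after.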
Submitted by Blutorangensaft t3_10ltyki in MachineLearning
Blutorangensaft t1_iza9zrt wrote
Reply to comment by ThisIsMyStonerAcount in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
I see. I meant the combination of a nonlinear activation function and another hidden layer. I was curious what people thought; thanks for your comment.
Blutorangensaft t1_iz9pl46 wrote
Reply to comment by ThisIsMyStonerAcount in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
I get that, but I find it a little hard to believe that nobody immediately pointed out a nonlinearity would solve the issue. Or is that just hindsight bias, thinking it was easier because of what we know today?
Blutorangensaft t1_iz92v1x wrote
Reply to comment by Phoneaccount25732 in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
What is people's take on Minsky? The XOR issue is easily resolved with a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limitation of neural nets. In fact, even humans have trouble with that, and I fail to see how it generalises. He may be an important figure, but not one who furthered progress in artificial intelligence.
Of course, if the question is just about portraying important junctions in AI research, regardless of whether they were helpful or not, then Minsky belongs there.
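For completeness, the fix in question as a minimal sketch: one hidden layer plus a nonlinearity (ReLU here, with weights set by hand rather than learned) computes XOR exactly, which a single linear layer provably cannot, since XOR is not linearly separable.

```python
import numpy as np

# Hand-picked weights for a 2-2-1 network: h = relu(x @ W1 + b1), y = h @ W2.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])

relu = lambda v: np.maximum(v, 0.0)
xor = lambda x: relu(x @ W1 + b1) @ W2

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(xor(inputs))  # -> [0. 1. 1. 0.]
```

The second hidden unit only fires when both inputs are on, and its negative output weight cancels the "or" computed by the first unit — exactly the interaction a single-layer perceptron cannot express.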
Blutorangensaft t1_ixgxhya wrote
Reply to [D] Schmidhuber: LeCun's "5 best ideas 2012-22” are mostly from my lab, and older by RobbinDeBank
My favourite line from the blog: "ResNets (not intellectually deep, but useful)"
Blutorangensaft t1_irvo011 wrote
Reply to comment by MohamedRashad in [D] Reversing Image-to-text models to get the prompt by MohamedRashad
Wikipedia: "Stable Diffusion is a deep learning, text-to-image model released by startup Stability AI in 2022. It is primarily used to generate detailed images conditioned on text descriptions."
If we take the example prompt "a photograph of an astronaut riding a horse" (see the Wikipedia article), I don't see how that is much different from an image caption. I guess the only difference is that it specifies the visual medium, e.g. a photo, a painting, or the like.
I don't think there is a real difference between prompt and caption, and you might be overthinking this. However, you could always make captions sound more like prompts (if the specified medium is the only difference) by looking for datasets with a particular wording, or by adapting the data manually yourself.
Blutorangensaft t1_irvmhb1 wrote
Reply to comment by MohamedRashad in [D] Reversing Image-to-text models to get the prompt by MohamedRashad
What do you mean by "prompt"? How is it different from a caption?
Blutorangensaft OP t1_jdnzzx1 wrote
Reply to comment by KarlKani44 in [D]: Different data normalisation schemes for GANs by Blutorangensaft
Love the visualisation, I will definitely do that. Thanks so much for answering all my questions.