Blutorangensaft

Blutorangensaft OP t1_jdnxm94 wrote

I see, I will improve my critic then (maybe give it more depth) and abstain from tricks like TTUR for now.

What do you mean by "easily separable distribution of output logits", by the way? Plotting the scores the critic assigns to real and fake samples separately? Or taking the mean and standard deviation of the logits for real and fake data and comparing those?
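Either reading can be made concrete. A minimal numpy sketch of the mean/std comparison, using made-up logits (the distributions and the d-prime-style score are illustrative assumptions, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical critic outputs: logits assigned to real and fake batches.
real_logits = rng.normal(loc=2.0, scale=1.0, size=1000)
fake_logits = rng.normal(loc=-2.0, scale=1.0, size=1000)

# One simple separability measure: the gap between the means in units of
# the pooled standard deviation (a d-prime-style score).
pooled_std = np.sqrt(0.5 * (real_logits.std() ** 2 + fake_logits.std() ** 2))
d_prime = (real_logits.mean() - fake_logits.mean()) / pooled_std

# Fraction of fake logits above the lowest real logit: a rough overlap check.
overlap = np.mean(fake_logits > real_logits.min())
```

Plotting both histograms on shared axes gives the visual version of the same check.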

1

Blutorangensaft OP t1_jdnjfmu wrote

Thank you for the thorough answer. 1) I see, I will just trust my normalisation scheme then. 2) That makes sense. 3) Is the training curve you describe the only possible one for the critic loss? Because, with normalisation, I see the critic loss approaching 0 from a positive value. Could this mean that the generator's job became easier due to normalisation? Does it make sense to think about improving the critic then (like you described, with 3 times the params)? Also, I read about and tried scheduling, but I am using TTUR instead for its better convergence properties.
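For reference, TTUR amounts to nothing more than giving the critic and generator different step sizes. A toy sketch on a bilinear min-max game (the learning rates and the 4:1 ratio are illustrative, not recommendations, and this shows only the mechanics, not TTUR's convergence theory):

```python
# Two time-scale update rule (TTUR): the two players simply use different
# learning rates. Toy min-max game f(g, c) = g * c, where the generator
# parameter g descends and the critic parameter c ascends.
lr_g, lr_c = 1e-3, 4e-3  # critic updated 4x faster; purely illustrative

g, c = 1.0, 1.0
for _ in range(5000):
    g = g - lr_g * c  # generator: gradient descent on f (df/dg = c)
    c = c + lr_c * g  # critic: gradient ascent on f (df/dc = g), alternating
```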

1

Blutorangensaft OP t1_jd7jaor wrote

Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that this architecture was chosen because of vanishing gradients. Wikipedia seems to have the same error.

Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while enabling me to scale the depth, completely changes the training dynamics. Training both G and C with the same learning rate and the same schedule (1:1), the critic seems to win, a behaviour I have never seen before in GANs. I suppose I will have to retune learning rates.
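The usual reason layer norm shows up in WGAN-GP critics: it normalizes each sample over its own features, so, unlike batch norm, it introduces no cross-sample dependence, and the gradient penalty is defined per sample. A minimal numpy sketch of the operation itself (shapes and values are arbitrary):

```python
import numpy as np

# Layer norm: normalize each sample over its feature dimension. No batch
# statistics are involved, which is why it is compatible with WGAN-GP's
# per-sample gradient penalty, whereas batch norm is not.
def layer_norm(x, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
h = rng.normal(loc=3.0, scale=2.0, size=(4, 16))  # a batch of activations
out = layer_norm(h)  # each row now has ~zero mean and ~unit variance
```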

1

Blutorangensaft t1_j6uiygz wrote

Disclaimer: no help, more a request

Once you're done with this project, would you mind sharing your speed and accuracy numbers? I'm on the lookout for a good English NER model. The problem is that spaCy has some issues with casing.

1

Blutorangensaft OP t1_j654qyd wrote

Thank you for the reference, it looks very promising. I've heard of ways to smooth the latent space through Lipschitz regularisation, but was disappointed again when I read "ah well, it's just layer normalisation". So many things in ML appear under different names but turn out to mean the same thing once you implement them.

1

Blutorangensaft OP t1_j6356ho wrote

Compare different autoencoders in their ability to create valid language in a continuous space. Later, I want to generate sentences in the latent space with another neural network and have them decoded to real sentences by the autoencoder. I want the space to be smooth because the second network will be trained with gradient descent, which takes small steps. I believe it will perform better if small moves in latent space correspond to meaningful distances between real sentences.
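The smoothness requirement is usually probed by interpolating between the latent codes of two sentences and decoding each intermediate point. A sketch with hypothetical names (`decode` stands in for whichever autoencoder is being compared):

```python
import numpy as np

# Linear interpolation between two latent codes. In a smooth latent space,
# decoding each point along the path should morph gradually from sentence A
# to sentence B rather than jumping between unrelated outputs.
def interpolate(z_a, z_b, steps=5):
    alphas = np.linspace(0.0, 1.0, steps)
    return [(1 - alpha) * z_a + alpha * z_b for alpha in alphas]

z_a = np.zeros(16)  # stand-in for encode(sentence_a)
z_b = np.ones(16)   # stand-in for encode(sentence_b)
path = interpolate(z_a, z_b, steps=5)
# Each decode(path[i]) would then be inspected for validity and gradual change.
```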

1

Blutorangensaft OP t1_j5zu9ti wrote

Thank you for your answer. If a paper on diffusion models pops into your mind that uses this method, feel free to post it.

How would you derive a quantitative evaluation from t-SNE? I thought it was mostly used for visualisation. I'm looking to compute some kind of score from the interpolation.
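One possible score (an assumption on my part, not a standard metric): measure how evenly the decoded outputs move along the interpolation path. If some embedding of the decoder's outputs is available, equal latent steps should produce roughly equal output steps in a smooth space:

```python
import numpy as np

# Coefficient of variation of the step sizes along an interpolation path.
# 0 means perfectly even steps; larger values mean the outputs jump around.
def smoothness_score(outputs):
    steps = np.linalg.norm(np.diff(outputs, axis=0), axis=1)
    return steps.std() / (steps.mean() + 1e-8)

# Toy stand-in for embedded decoder outputs along a path: perfectly even.
even_path = np.linspace(0.0, 1.0, 6)[:, None] * np.ones((1, 8))
score = smoothness_score(even_path)
```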

1

Blutorangensaft t1_iz92v1x wrote

What is people's take on Minsky? The XOR issue is easily resolved with a hidden layer and a nonlinearity, and I never understood why not knowing how to learn connectedness is a major limitation of neural nets. In fact, even humans have trouble with that, and I fail to see how it generalises. He may be an important figure, but not one who advanced progress in artificial intelligence.
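To make the XOR point concrete: a single linear threshold unit cannot compute it, but one hidden layer with a nonlinearity can, even hand-constructed with no training:

```python
# A 2-2-1 network with step activations that computes XOR. The hidden units
# compute OR and AND of the inputs; the output fires for "OR and not AND".
def step(x):
    return 1.0 if x > 0 else 0.0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)    # fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)  # OR and not AND = XOR

outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```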

Of course, if the question is just about portraying important junctions in AI research, regardless of whether they were helpful or not, then Minsky belongs there.

5

Blutorangensaft t1_irvo011 wrote

Wikipedia: "Stable Diffusion is a deep learning, text-to-image model released by startup StabilityAI in 2022. It is primarily used to generate detailed images conditioned on text descriptions."

If we take the example prompt "a photograph of an astronaut riding a horse" (see Wiki), I don't see how that is much different from an image caption. I guess the only difference is that it specifies the visual medium, e.g. a photo, a painting, or the like.

I don't think there is a difference between prompt and caption, and you might be overthinking this. However, you could always make captions sound more like prompts (if the specified medium is the only difference) by looking for specific datasets with a certain wording or by manually adapting the data yourself.

5