like_a_tensor t1_j7r4mdx wrote
There's been some work on getting models to operate directly at the byte level. An example (ByT5): https://arxiv.org/abs/2105.13626
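If you want to poke at one of these, here's a minimal sketch assuming the Hugging Face `transformers` library and the `google/byt5-small` checkpoint from that paper:

```python
# Minimal sketch, assuming the `transformers` package and the
# "google/byt5-small" checkpoint released with the paper above.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

# ByT5 tokenizes at the byte level, so arbitrary strings map to
# input IDs (roughly one per UTF-8 byte) with no wordpiece vocab.
inputs = tokenizer("Byte-level models need no subword vocabulary.", return_tensors="pt")
print(inputs["input_ids"].shape)
```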
like_a_tensor t1_j6q5cs1 wrote
Reply to Loss function fluctuating by International_Deer27
Are you implementing the CNN from scratch? If so, the problem might be in your implementation.
Play with the batch size and batch norm. Try different optimizers. Your learning rate might also be too large; experiment with smaller learning rates or something like torch's ReduceLROnPlateau (see the sketch below).
5,500 samples is also pretty small, so maybe try a shallower network.
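Something like this is what I mean by the scheduler, a rough sketch with toy stand-ins for your model and data:

```python
import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs; swap in your own CNN and loader.
model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # start smaller than the default
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

for epoch in range(20):
    x = torch.randn(32, 1, 28, 28)  # fake batch for illustration
    loss = criterion(model(x), torch.randint(0, 10, (32,)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # pass your *validation* loss here in practice
```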
like_a_tensor t1_j6mcv1v wrote
Reply to Best practice for capping a softmax by neuralbeans
I'm not sure how to fix a minimum probability, but you could try softmax with a high temperature; raising the temperature flattens the distribution, which lifts the smallest probabilities.
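For what it's worth, a quick sketch of the temperature idea in plain PyTorch (nothing specific to your setup):

```python
import torch

logits = torch.tensor([4.0, 1.0, -2.0])

# Standard softmax: the smallest class gets almost no mass.
print(torch.softmax(logits, dim=-1))      # ~[0.95, 0.05, 0.002]

# Dividing by a temperature T > 1 flattens the distribution,
# raising the floor on the smallest probability.
T = 5.0
print(torch.softmax(logits / T, dim=-1))  # ~[0.54, 0.30, 0.16]
```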
like_a_tensor t1_j6k1nde wrote
I feel like there's a lot of signal processing math in speech and voice that I have zero background in. Even though everything is deep learning now, speech and voice architectures seem more complex than in other fields.
like_a_tensor t1_j5st0c1 wrote
Reply to comment by agentfuzzy999 in Best cloud to train models with 100-200 GB of data? by Zealousideal-Copy463
Even bigger gains with fp8 support in CUDA 12.
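If anyone wants to try it, a minimal sketch assuming NVIDIA's Transformer Engine package (where the fp8 kernels actually live) and fp8-capable hardware like an H100:

```python
# Minimal sketch, assuming NVIDIA's transformer-engine package
# on an fp8-capable GPU (e.g., Hopper). Shapes are arbitrary.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

layer = te.Linear(768, 768).cuda()
x = torch.randn(16, 768, device="cuda")

# Matmuls inside this context run in fp8 with delayed scaling.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```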
like_a_tensor t1_j5o1j14 wrote
Reply to comment by SimonJDPrince in [P] New textbook: Understanding Deep Learning by SimonJDPrince
Sounds great, thanks!
like_a_tensor t1_j5n7akg wrote
Very nice work! Do you plan to release any solutions to the problems?
like_a_tensor t1_j7rbdno wrote
Reply to comment by MrOfficialCandy in [D] Is English the optimal language to train NLP models on? by MrOfficialCandy
Sounds like you want something like a logical representation of sentences. Reducing sentences to first-order logic might be what you're looking for. There are also AMRs (Abstract Meaning Representations). The problem with AMRs is that they have to be constructed first, which is non-trivial for machines and time-consuming for humans.
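To make the AMR idea concrete, here's the standard example for "The boy wants to go" in PENMAN notation, read here with the `penman` Python package (my own choice of tooling; any PENMAN parser works):

```python
# The canonical AMR for "The boy wants to go", in PENMAN notation.
import penman

amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
"""

graph = penman.decode(amr)
for source, role, target in graph.triples:
    print(source, role, target)
# w :instance want-01
# w :ARG0 b
# b :instance boy
# ...
```

Note the reentrancy: the variable `b` is both the wanter and the goer, which is exactly the kind of structure flat token sequences don't give you.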