yannbouteiller t1_ja8cd7n wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
That is pretty strange indeed. Perhaps this is a magical effect of dropout?
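For illustration, a minimal PyTorch sketch (toy model and data, not from the original thread) of why dropout can make the training loss look worse than the validation loss: dropout is active in `model.train()` mode but disabled in `model.eval()` mode, so the network being evaluated is effectively stronger than the one being trained.

```python
import torch
import torch.nn as nn

# Toy model with aggressive dropout (hypothetical, for illustration only).
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # active only in training mode
    nn.Linear(64, 1),
)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

model.train()  # dropout ON: noisy, handicapped forward pass
train_loss = loss_fn(model(x), y)

model.eval()   # dropout OFF: full network, typically lower loss
with torch.no_grad():
    val_loss = loss_fn(model(x), y)

print(train_loss.item(), val_loss.item())
```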
yannbouteiller t1_j70o6y3 wrote
FPGAs are theoretically better than GPUs for deploying Deep Learning models simply because they are theoretically better than anything at doing anything. In practice, though, you never have enough circuitry on an FPGA to efficiently deploy a large model, and FPGAs are not targeted by the main Deep Learning libraries, so you have to do the whole thing by hand: quantizing your model, extracting its weights, coding each layer in embedded C/VHDL/etc., and doing most of the hardware optimization yourself. It is tedious enough that plug-and-play solutions like GPUs/TPUs are preferable in most cases, including embedded systems.
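To give an idea of the "by hand" part, here is a minimal NumPy sketch (function name and dimensions are made up) of the kind of post-training weight quantization you would typically do before exporting weights to an FPGA design:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8.
    Returns the integer weights plus the scale needed to dequantize."""
    scale = np.abs(w).max() / 127.0  # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical trained layer weights extracted from a model.
w = np.random.randn(64, 128).astype(np.float32)
q, scale = quantize_int8(w)

# On the FPGA you would feed q into integer MACs; dequantizing
# (q * scale) is only needed here to check the approximation error.
print("max abs error:", np.abs(w - q.astype(np.float32) * scale).max())
```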
yannbouteiller t1_iz0v6y2 wrote
Reply to [D] Are ML platforms honestly useful or just money-making on software that's really free? by [deleted]
I suppose these are useful for companies who don't have the skill or will to develop their own pipelines from scratch. In research we barely ever use them, though.
yannbouteiller t1_iydt54w wrote
Reply to comment by entropyvsenergy in [D] Does Transformer need huge pretraining process? by minhrongcon2000
Considering fully connected networks as "less flexible" than transformers sounds misleading. Although transformers are very generic, as far as I can tell they have much more inductive bias than, e.g., an MLP that takes the whole sequence of word embeddings as input.
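To make the comparison concrete, a small PyTorch sketch (dimensions made up) of the two designs: the MLP treats the flattened sequence as one unstructured vector, while the transformer layer reuses the same weights at every position, which is exactly the kind of structural assumption (inductive bias) being discussed.

```python
import torch.nn as nn

seq_len, d_model = 128, 64

# MLP over the flattened sequence: every (position, feature) pair gets
# its own weights, so nothing is assumed about structure across positions.
mlp = nn.Sequential(
    nn.Linear(seq_len * d_model, 512),
    nn.ReLU(),
    nn.Linear(512, d_model),
)

# Transformer encoder layer: the same attention/FFN weights are shared
# across all positions, a strong structural assumption about sequences.
transformer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4)

count = lambda m: sum(p.numel() for p in m.parameters())
print("MLP params:", count(mlp))                   # grows with seq_len
print("Transformer params:", count(transformer))   # independent of seq_len
```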
yannbouteiller t1_ix4aesl wrote
Reply to [R] Tips on training Transformers by parabellum630
We are also currently struggling to train a Transformer on 1D sequential data, in the hope that it may eventually outperform our state-of-the-art model based on a mix of CNN, GRU and time-dilation. First, be careful about what you use as positional encoding: with low-dimensional embeddings it can easily destroy your data. Then, according to the papers, dataset size will likely be a huge factor, in the sense that you will need a huge dataset: Transformers might lack inductive bias compared to, e.g., GRUs, and you need an enormous amount of data to compensate for that.
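As an illustration of the positional-encoding pitfall (a sketch with made-up dimensions, not our actual setup): with the standard additive sinusoidal encoding, the positional signal has roughly unit magnitude per dimension, so when the embedding dimension is small and the embeddings are not scaled up, the position terms can dominate the content.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

d_model = 4                                 # deliberately tiny embedding
emb = 0.1 * np.random.randn(128, d_model)   # weak content signal
pe = sinusoidal_pe(128, d_model)

# The additive position term dwarfs the content; the usual fix is to
# scale the embeddings by sqrt(d_model) or use a larger embedding size.
print("embedding RMS:", np.sqrt((emb ** 2).mean()))
print("pos-encoding RMS:", np.sqrt((pe ** 2).mean()))
```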
yannbouteiller t1_iqozhsf wrote
Alienware. Take the minimum RAM/SSD and replace those yourself (you can do this on the 17'' version; double-check that this is also true for the 15'' version, as I think I remember some issue with the x15 like the RAM being soldered). You get the Dell on-site warranty, and the machine probably runs cooler, in both senses.
The real issue with both machines is that your GPU is soldered to the motherboard and thus will likely kill your laptop eventually.
Also, good to know before you buy: I got myself an AW x17 R2 for prototyping and gaming, and I realized that the built-in speakers make the chassis vibrate, creating a terrible crackling noise at mid-to-high volume. This defect seems to be present on the whole series. Also, the webcam is poor and the battery doesn't last long. Not sure whether the Lambda laptop is any better in these regards, though.
A better bet might be the MSI Raider GE76 (if they have a 15-inch equivalent), but it looks a bit flashier and less professional, you don't get on-site repairs, and the power supply is less portable, I think.
yannbouteiller t1_jb17aaw wrote
Reply to To RL or Not to RL? [D] by vidul7498
People will say anything in the hope of drawing attention. Reframing an unexplored MDP as a supervised learning problem makes no sense.