elbiot t1_je8ngu2 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Huh? Have you never included text in a prompt and asked the model to answer questions about that text? That seems to count as "new knowledge" by your definition.
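Something like this, to make the point concrete (a minimal sketch using the OpenAI chat client; the document text, question, and model name are placeholders I made up):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    document = "Acme Corp's Q3 revenue was $12M, up 8% from Q2."  # made-up company text
    question = "How much did revenue grow from Q2 to Q3?"

    # The "new knowledge" goes straight into the prompt, and the model answers from it.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Answer using only this text:\n{document}\n\nQuestion: {question}",
        }],
    )
    print(response.choices[0].message.content)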
elbiot t1_je8iym9 wrote
Looks like this was trained on just 150 x-rays and does very well: https://paperswithcode.com/paper/xnet-a-convolutional-neural-network-cnn
Edit: did you look for pre-existing solutions? This was like the second Google result. If I were you I'd be looking for public datasets I could use for pretraining, and then fine-tune on my data.
elbiot t1_je8i0i2 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
The second link says fine-tuning is a substitute for lengthy prompts, including putting more into the model than can fit in the longest prompt. Prompts are a way to give the model new information. What definition of knowledge are you using that excludes anything you could put into a prompt?
elbiot t1_jdpgqoz wrote
Reply to comment by OraOraP in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
I'm just talking about diffusion models in general and the concept of denoising. An LLM is what you'd use here, trained the way you'd train an LLM, not the way you'd train a diffusion model.
elbiot t1_jdo7ndu wrote
Compilation isn't a noising process, so diffusion doesn't have any relevance here. An LLM is what you would use.
elbiot t1_jdlgxnz wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
In my understanding, if you have text, it's not a challenge to train on next-word prediction. Just keep the learning rate low. The reason there's a focus on instruction-based fine-tuning is that such data is harder to come by.
My only experience is with a sentence embedding model (using sbert): I trained on a 50/50 mix of my new text and the original training data, and the model both got better at embedding my text and didn't forget how to do what it was originally trained on.
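Roughly what that looked like (a sketch with sentence-transformers; the checkpoint name, the pair construction, and the generic pairs standing in for the original training data are my own stand-ins, not the exact setup):

    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained sbert checkpoint

    # Hypothetical pairs from my new text (e.g. adjacent sentences or title/body pairs).
    domain_pairs = [("our domain text, sentence A", "our domain text, sentence B")]
    # A sample of general-purpose pairs similar to what the model was originally trained on.
    general_pairs = [("A man is eating food.", "A man is eating something.")]

    # 50/50 mix so it learns the new domain without forgetting what it already does well.
    examples = [InputExample(texts=list(p)) for p in domain_pairs + general_pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=16)
    loss = losses.MultipleNegativesRankingLoss(model)

    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)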
elbiot t1_j26egmm wrote
Reply to Laptop for Machine Learning by sifarsafar
Keep the laptop, buy a used gaming desktop with a 3060 (12 GB VRAM), and SSH into it from your laptop.
elbiot t1_j1tjpg7 wrote
The source I found this post through also referenced Retrieval-Augmented Generation (https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), and it seems like they've integrated document selection into the backpropagation of the model training. You couldn't do this with ChatGPT, but maybe a smaller pretrained LLM that could be fine-tuned on consumer hardware would be enough for just that part.
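For contrast, the non-end-to-end version is easy to wire up yourself: embed the documents, pick the closest one to the query, and stuff it into the prompt. A sketch below (sentence-transformers for retrieval; the documents and query are made up, and the actual LLM call is left out). What the RAG paper adds is making that selection step differentiable so it trains jointly with the generator, which is why you'd need a model you can backpropagate through rather than ChatGPT:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedding model

    docs = [
        "The Q3 sales report shows revenue of $12M.",       # placeholder documents
        "Employee onboarding takes two weeks on average.",
    ]
    doc_emb = encoder.encode(docs, normalize_embeddings=True)

    query = "What was revenue in Q3?"
    q_emb = encoder.encode(query, normalize_embeddings=True)

    # Cosine similarity (embeddings are normalized); keep the best-matching document.
    best = int(np.argmax(doc_emb @ q_emb))
    prompt = f"Context:\n{docs[best]}\n\nQuestion: {query}\nAnswer:"
    print(prompt)  # pass this to whatever LLM you're using; only the retrieval half is shown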
elbiot t1_j0nnc82 wrote
Reply to comment by lazazael in laptop for Data Science and Scientific Computing: proart vs legion 7i vs thinkpad p16/p1-gen5 by macORnvidia
Yeah, I bought a used gaming desktop from Facebook and kept my 6-year-old laptop. Crazy specs for the price, and it came with a 3060 with 12 GB of VRAM. I'd recommend a GPU with more VRAM over one that's "faster", because it won't be fast if it can't load the model at all.
elbiot t1_j0m3a67 wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
Huh?
idx = np.unravel_index(indices, shape)
values = arr[idx]
No loop required. If you're referring to the same loop you were using to get the argmax, you can just adjust your indices first so they apply to the unstrided array
elbiot t1_j0k91nm wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
I think np.unravel_index gives you a tuple, so you can use it directly as the index (arr[idx]) without having to do anything else with it.
elbiot t1_j0k3evv wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
I'm away from a computer for a while, but you could cast the tuple to an array, I assume. And since creating an array is expensive and you'll keep needing one of the same shape every step, you could just hold onto it and assign values into it instead of re-creating it every time.
elbiot t1_j0ivmop wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
Yeah, I was just thinking in 1D. I'm not at a computer so I can't try anything, but roughly what I'm thinking is: you have an (H, W, D) array and use stride tricks to get an (H, W, D, wx, wy) view of the windows. If you could get that to be (H, W, D, wx*wy), then argmax could give you an (H, W, D) array of indices. I dunno if you can reshape a strided array or use strides to get that shape directly.
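Something along these lines (a sketch of that idea, assuming non-overlapping wx by wy windows that tile the input exactly, so the leading dims come out as H//wx and W//wy; the function name is mine):

    import numpy as np

    def pool2d_argmax(arr, wx, wy):
        # arr: (H, W, D); windows are non-overlapping and assumed to tile the input exactly.
        H, W, D = arr.shape
        s0, s1, s2 = arr.strides
        # Stride-tricks view of shape (H//wx, W//wy, D, wx, wy): one window per output pixel.
        windows = np.lib.stride_tricks.as_strided(
            arr,
            shape=(H // wx, W // wy, D, wx, wy),
            strides=(s0 * wx, s1 * wy, s2, s0, s1),
        )
        # Flatten each window so a single argmax gives one index per window.
        flat = windows.reshape(H // wx, W // wy, D, wx * wy)  # copies: the view isn't contiguous
        idx = flat.argmax(axis=-1)                            # (H//wx, W//wy, D), values in [0, wx*wy)
        values = np.take_along_axis(flat, idx[..., None], axis=-1)[..., 0]
        return values, idx

    x = np.random.rand(4, 6, 3)
    vals, idx = pool2d_argmax(x, 2, 2)  # vals.shape == idx.shape == (2, 3, 3)

The reshape is the uncertain part: numpy will do it, but it has to copy because the strided view isn't contiguous, so it isn't free the way a plain reshape can be.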
elbiot t1_j0hbr8o wrote
Reply to comment by breezedeus in [D] Is "natural" text always maximally likely according to language models ? by Emergency_Apricot_77
And we'd all be finishing each other's... sandwiches?
elbiot t1_j0fitwk wrote
Reply to Efficient Max Pooling Implementation by Logon1028
Can't you just reshape the array and use argmax (so no as_strided)? Reshaping is often free. You'd have to do some arithmetic to get the indices back into the original shape, but that's just one operation.
I.e. you can take a shape (99,) array, reshape it to (33, 3), and take the max along the last axis to get 33 window maxes.
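Concretely, something like this (a sketch for 1D pooling with window size 3; variable names are mine):

    import numpy as np

    x = np.random.rand(99)
    windows = x.reshape(33, 3)            # free for a contiguous array: just a new view
    vals = windows.max(axis=1)            # 33 pooled maxes
    local = windows.argmax(axis=1)        # index of the max within each window
    orig_idx = np.arange(33) * 3 + local  # the "one operation" back to indices into the original (99,) array
    assert np.all(x[orig_idx] == vals)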
elbiot t1_izqirv3 wrote
Reply to comment by maxToTheJ in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
It was a joke
elbiot t1_izohltd wrote
Reply to comment by sabouleux in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
I assume the program is installed into the virtual environment and so is operating within it. That would be done with the console_scripts entry point
elbiot t1_izoh372 wrote
Reply to comment by RaptorDotCpp in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
It was trained on Python 2.
elbiot t1_ivexlrx wrote
Reply to comment by Emotional-Fox-4285 in In my deep NN with 3 layer, . In the second iteration of GD, The activation of Layer 1 and Layer 2 output all 0 due to ReLU as all the input are smaller than 0. And L3 output some value with high floating point which is opposite to first forward_ propagation . Is this how it should work ? by Emotional-Fox-4285
No I don't have time for that. Good luck
elbiot t1_ivdnwpb wrote
Reply to [D] Has anyone tried coding latent diffusion from scratch? or tried other conditioning information aside from image classes and text? by yamakeeen
Latent diffusion works with text because CLIP was trained on millions of text-image pairs already. You've got a huge project of training on millions of brain-activity/text pairs ahead of you.
elbiot t1_ivdko3y wrote
Reply to In my deep NN with 3 layer, . In the second iteration of GD, The activation of Layer 1 and Layer 2 output all 0 due to ReLU as all the input are smaller than 0. And L3 output some value with high floating point which is opposite to first forward_ propagation . Is this how it should work ? by Emotional-Fox-4285
Maybe your learning rate is way too high. Is this TensorFlow or something? Or are you writing this from scratch?
elbiot t1_irwyleo wrote
Reply to comment by _Arsenie_Boca_ in [D] Looking for some critiques on recent development of machine learning by fromnighttilldawn
The fact that you can throw a bunch of compute at transformers is part of their superiority. Even if it's the only factor, it's really important.
elbiot t1_iql92sm wrote
Neither. Just plan to SSH into an AWS instance if you're going to have a laptop. I bought a used gaming desktop with a 3060 (12 GB VRAM) and otherwise great specs for about 900 bucks, and I SSH into it from my 6-year-old laptop.
elbiot t1_je9s53t wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Your claim that prompting can achieve what fine-tuning can't contradicts the OpenAI documentation you posted, which says fine-tuning can do whatever prompting can, without the length limit.