elbiot t1_je8ngu2 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Huh? Have you never included text in a prompt and asked the model to answer questions about that text? That seems to count as "new knowledge" by your definition.
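Something like this, to make the point concrete (a minimal sketch using the OpenAI chat client; the document text, question, and model name are placeholders I made up):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    document = "Acme Corp's Q3 revenue was $12M, up 8% from Q2."  # made-up company text
    question = "How much did revenue grow from Q2 to Q3?"

    # The "new knowledge" goes straight into the prompt, and the model answers from it.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Answer using only this text:\n{document}\n\nQuestion: {question}",
        }],
    )
    print(response.choices[0].message.content)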
elbiot t1_je8iym9 wrote
Looks like this was trained on just 150 x-rays and does very well: https://paperswithcode.com/paper/xnet-a-convolutional-neural-network-cnn
Edit: did you look for pre-existing solutions? This was like the second Google result. If I were you I'd be looking for public datasets I could use for pretraining, and then fine-tune on my data.
elbiot t1_je8i0i2 wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
The second link says fine-tuning is a substitute for lengthy prompts, including putting more into the model than can fit in the longest prompt. Prompts are a way to give the model new information. What definition of knowledge are you using that excludes anything you could put into a prompt?
elbiot t1_jdpgqoz wrote
Reply to comment by OraOraP in Using Stable Diffusion's training method for Reverse engineering? by OraOraP
I'm just talking about diffusion models in general and the concept of denoising. An LLM is what you'd use here, trained the way you'd train an LLM, not the way you'd train a diffusion model.
elbiot t1_jdo7ndu wrote
Compilation isn't a noising process, so diffusion doesn't have any relevance here. An LLM is what you would use.
elbiot t1_jdlgxnz wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
In my understanding, if you have text, it's not a challenge to train on next-word prediction. Just keep the learning rate low. The reason there's a focus on instruction-based fine-tuning is that such data is harder to come by.
My only experience is with a sentence embedding model (using sbert): I trained on a 50/50 mix of my new text and the original training data, and the model both got better at embedding my text and didn't forget how to do what it was originally trained on.
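Roughly what that looked like (a sketch with sentence-transformers; the checkpoint name, the pair construction, and the generic pairs standing in for the original training data are my own stand-ins, not the exact setup):

    from sentence_transformers import SentenceTransformer, InputExample, losses
    from torch.utils.data import DataLoader

    model = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained sbert checkpoint

    # Hypothetical pairs from my new text (e.g. adjacent sentences or title/body pairs).
    domain_pairs = [("our domain text, sentence A", "our domain text, sentence B")]
    # A sample of general-purpose pairs similar to what the model was originally trained on.
    general_pairs = [("A man is eating food.", "A man is eating something.")]

    # 50/50 mix so it learns the new domain without forgetting what it already does well.
    examples = [InputExample(texts=list(p)) for p in domain_pairs + general_pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=16)
    loss = losses.MultipleNegativesRankingLoss(model)

    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)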
elbiot t1_j26egmm wrote
Reply to Laptop for Machine Learning by sifarsafar
Keep the laptop, buy a used gaming desktop with a 3060 (12 GB VRAM), and SSH into it from your laptop.
elbiot t1_j1tjpg7 wrote
The source I found this post through also referenced Retrieval-Augmented Generation (https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/), and it seems like they've integrated document selection into the backpropagation of the model training. You couldn't do this with ChatGPT, but maybe a smaller pretrained LLM that could be fine-tuned on consumer hardware would be enough for just that part.
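For contrast, the non-end-to-end version is easy to wire up yourself: embed the documents, pick the closest one to the query, and stuff it into the prompt. A sketch below (sentence-transformers for retrieval; the documents and query are made up, and the actual LLM call is left out). What the RAG paper adds is making that selection step differentiable so it trains jointly with the generator, which is why you'd need a model you can backpropagate through rather than ChatGPT:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any small embedding model

    docs = [
        "The Q3 sales report shows revenue of $12M.",       # placeholder documents
        "Employee onboarding takes two weeks on average.",
    ]
    doc_emb = encoder.encode(docs, normalize_embeddings=True)

    query = "What was revenue in Q3?"
    q_emb = encoder.encode(query, normalize_embeddings=True)

    # Cosine similarity (embeddings are normalized); keep the best-matching document.
    best = int(np.argmax(doc_emb @ q_emb))
    prompt = f"Context:\n{docs[best]}\n\nQuestion: {query}\nAnswer:"
    print(prompt)  # pass this to whatever LLM you're using; only the retrieval half is shown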
elbiot t1_j0nnc82 wrote
Reply to comment by lazazael in laptop for Data Science and Scientific Computing: proart vs legion 7i vs thinkpad p16/p1-gen5 by macORnvidia
Yeah, I bought a used gaming desktop from Facebook and kept my 6-year-old laptop. Crazy specs for the price, and it came with a 3060 with 12 GB of VRAM. I'd recommend a GPU with more VRAM over one that's "faster", because it won't be fast if it can't load the model at all.
elbiot t1_j0m3a67 wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
Huh?
idx = np.unravel_index(indices, shape)
values = arr[idx]
No loop required. If you're referring to the same loop you were using to get the argmax, you can just adjust your indices first so they apply to the unstrided array
elbiot t1_j0k91nm wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
I think np.unravel_index gives you a tuple, so you can use it directly as the index (arr[idx]) without having to do anything else with it.
elbiot t1_j0k3evv wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
I'm away from a computer for a while, but you could cast the tuple to an array, I assume. And since creating an array is expensive and you'll keep needing one of the same shape every step, you could just hold onto it and assign values into it instead of re-creating it every time.
elbiot t1_j0ivmop wrote
Reply to comment by Logon1028 in Efficient Max Pooling Implementation by Logon1028
Yeah, I was just thinking in 1D. I'm not at a computer so I can't try anything, but roughly what I'm thinking is: you have an (H, W, D) array and use stride tricks to get an (H, W, D, wx, wy) view of the windows. If you could get that to be (H, W, D, wx*wy), then argmax could give you an (H, W, D) array of indices. I dunno if you can reshape a strided array or use strides to get that shape directly.
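Something along these lines (a sketch of that idea, assuming non-overlapping wx by wy windows that tile the input exactly, so the leading dims come out as H//wx and W//wy; the function name is mine):

    import numpy as np

    def pool2d_argmax(arr, wx, wy):
        # arr: (H, W, D); windows are non-overlapping and assumed to tile the input exactly.
        H, W, D = arr.shape
        s0, s1, s2 = arr.strides
        # Stride-tricks view of shape (H//wx, W//wy, D, wx, wy): one window per output pixel.
        windows = np.lib.stride_tricks.as_strided(
            arr,
            shape=(H // wx, W // wy, D, wx, wy),
            strides=(s0 * wx, s1 * wy, s2, s0, s1),
        )
        # Flatten each window so a single argmax gives one index per window.
        flat = windows.reshape(H // wx, W // wy, D, wx * wy)  # copies: the view isn't contiguous
        idx = flat.argmax(axis=-1)                            # (H//wx, W//wy, D), values in [0, wx*wy)
        values = np.take_along_axis(flat, idx[..., None], axis=-1)[..., 0]
        return values, idx

    x = np.random.rand(4, 6, 3)
    vals, idx = pool2d_argmax(x, 2, 2)  # vals.shape == idx.shape == (2, 3, 3)

The reshape is the uncertain part: numpy will do it, but it has to copy because the strided view isn't contiguous, so it isn't free the way a plain reshape can be.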
elbiot t1_j0hbr8o wrote
Reply to comment by breezedeus in [D] Is "natural" text always maximally likely according to language models ? by Emergency_Apricot_77
And we'd all be finishing each other's... sandwiches?
elbiot t1_j0fitwk wrote
Reply to Efficient Max Pooling Implementation by Logon1028
Can't you just reshape the array and use argmax (so no as_strided)? Reshaping is often free. You'd have to do some arithmetic to get the indices back into the original shape, but that's just one operation.
I.e. you can take a shape (99,) array, reshape it to (33, 3), and take the max along the last axis to get 33 window maxes.
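Concretely, something like this (a sketch for 1D pooling with window size 3; variable names are mine):

    import numpy as np

    x = np.random.rand(99)
    windows = x.reshape(33, 3)            # free for a contiguous array: just a new view
    vals = windows.max(axis=1)            # 33 pooled maxes
    local = windows.argmax(axis=1)        # index of the max within each window
    orig_idx = np.arange(33) * 3 + local  # the "one operation" back to indices into the original (99,) array
    assert np.all(x[orig_idx] == vals)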
elbiot t1_izqirv3 wrote
Reply to comment by maxToTheJ in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
It was a joke
elbiot t1_izohltd wrote
Reply to comment by sabouleux in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
I assume the program is installed into the virtual environment and so is operating within it. That would be done with the console_scripts entry point
elbiot t1_izoh372 wrote
Reply to comment by RaptorDotCpp in [P] I made a command-line tool that explains your errors using ChatGPT (link in comments) by jsonathan
It was trained on Python 2.
elbiot t1_ivexlrx wrote
Reply to comment by Emotional-Fox-4285 in In my deep NN with 3 layer, . In the second iteration of GD, The activation of Layer 1 and Layer 2 output all 0 due to ReLU as all the input are smaller than 0. And L3 output some value with high floating point which is opposite to first forward_ propagation . Is this how it should work ? by Emotional-Fox-4285
No I don't have time for that. Good luck
elbiot t1_ivdnwpb wrote
Reply to [D] Has anyone tried coding latent diffusion from scratch? or tried other conditioning information aside from image classes and text? by yamakeeen
Latent diffusion works with text because CLIP was trained on millions of text-image pairs already. You've got a huge project of training on millions of brain-activity/text pairs ahead of you.
elbiot t1_ivdko3y wrote
Reply to In my deep NN with 3 layer, . In the second iteration of GD, The activation of Layer 1 and Layer 2 output all 0 due to ReLU as all the input are smaller than 0. And L3 output some value with high floating point which is opposite to first forward_ propagation . Is this how it should work ? by Emotional-Fox-4285
Maybe your learning rate is way too high. Is this TensorFlow or something? Or are you writing this from scratch?
elbiot t1_irwyleo wrote
Reply to comment by _Arsenie_Boca_ in [D] Looking for some critiques on recent development of machine learning by fromnighttilldawn
The fact that you can throw a bunch of compute at transformers is part of their superiority. Even if it's the only factor, it's really important.
elbiot t1_iql92sm wrote
Neither. Just plan to SSH into an AWS instance if you're going to have a laptop. I bought a used gaming desktop with a 3060 (12 GB VRAM) and otherwise great specs for about 900 bucks, and I SSH into it from my 6-year-old laptop.
elbiot t1_je9s53t wrote
Reply to comment by LetGoAndBeReal in [D] The best way to train an LLM on company data by jaxolingo
Your claim that prompting can achieve what fine-tuning can't contradicts the OpenAI documentation you posted, which says fine-tuning can do whatever prompting can, without the length limit.