throwaway2676 t1_jd8qe6f wrote

When training LLMs to write code, is it standard to just make indentation and newlines their own tokens? Like '<\n>' and '<\ind>' or something?
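For what it's worth, here's a minimal sketch (assuming the HuggingFace `transformers` package) of how GPT-2-style BPE actually handles this: the newline gets its own token, and runs of spaces become ordinary space tokens rather than special indentation markers. The printed output is illustrative, not exact:

```python
# A minimal sketch using HuggingFace transformers. In GPT-2's BPE,
# "Ġ" marks a leading space and "Ċ" is a newline -- whitespace is
# handled with ordinary vocabulary tokens, not special markers.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

code = "def f(x):\n    return x + 1"
print(tokenizer.tokenize(code))
# e.g. ['def', 'Ġf', '(', 'x', '):', 'Ċ', 'ĠĠĠ', 'Ġreturn', 'Ġx', 'Ġ+', 'Ġ1']
```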

Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?

2

throwaway2676 t1_j9kilst wrote

Are there any developments in the ASIC/analog computing space that people are really excited about? I think most people know about Google's TPUs by now, but is there anything else with the potential to threaten the dominance of GPUs in the next few years?

1

throwaway2676 t1_j8digqj wrote

Here are the top 10 posts on my front page right now:

>[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research

>[D] Quality of posts in this sub going down

>[D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published?

>[R] [N] pix2pix-zero - Zero-shot Image-to-Image Translation

>[P] Extracting Causal Chains from Text Using Language Models

>[R] [P] Adding Conditional Control to Text-to-Image Diffusion Models. "This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." Example uses the Scribble ControlNet model.

>[R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

>[D] What ML dev tools do you wish you'd discovered earlier?

>[R] CIFAR10 in <8 seconds on an A100 (new architecture!)

>[D] Engineering interviews at Anthropic AI?

From this list the only non-academic/"low quality" posts are the last one and this one. This is consistent with my normal experience, so I'm not really sure what you are talking about.

8

throwaway2676 t1_j74iilz wrote

Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).
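For concreteness, a rough sketch of the two styles (the `query_llm` helper is a hypothetical stand-in for whatever completion API you use): chain-of-thought asks the model to reason step by step in text, while program-of-thought has it write code so an interpreter does the actual computation.

```python
# A hedged sketch contrasting the two prompting styles; query_llm is
# a hypothetical stand-in for a real completion API.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model of choice")

question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

# Chain-of-thought: the model reasons step by step in natural language.
cot_answer = query_llm(f"{question}\nLet's think step by step.")

# Program-of-thought: the model writes code and the interpreter does
# the arithmetic, so the final number isn't confidently hallucinated.
program = query_llm(f"Write Python code that computes the answer to: {question}")
scope: dict = {}
exec(program, scope)  # in practice, run generated code in a sandbox
```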

53

throwaway2676 t1_j6xerk7 wrote

I think a lot of people would pay for the initial model they first released. Since then they've been censoring the shit out of it to avoid controversy, and a fair amount of the hype died down among the average joes.

At this point I think their main target demo will be white collar workers who use it to make work easier. However, the hype will pick back up once they connect it to the internet.

3

throwaway2676 t1_j6d99fw wrote

So shouldn't this mean we can train transformers using forward passes alone? It seems that it wouldn't be too difficult to derive an algorithm that updates the attention weights based on these results, but I don't believe the authors mention the possibility.
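For anyone curious, here's a rough sketch of the kind of layer-local update involved, assuming (my assumption) the paper in question is Hinton's Forward-Forward or something similar: each layer is trained on its own "goodness" objective, so no end-to-end backward pass through the network is needed.

```python
# A rough sketch of a Forward-Forward-style layer update (assuming the
# discussion is about Hinton's Forward-Forward paper). Each layer has a
# local loss on its own "goodness", so gradients never cross layers.
import torch
import torch.nn.functional as F

layer = torch.nn.Linear(784, 512)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
theta = 2.0  # goodness threshold

def goodness(x: torch.Tensor) -> torch.Tensor:
    # sum of squared activations of this layer only
    return layer(x).relu().pow(2).sum(dim=1)

def local_step(x_pos: torch.Tensor, x_neg: torch.Tensor) -> None:
    # push goodness above theta for positive data, below for negative
    loss = (F.softplus(theta - goodness(x_pos))
            + F.softplus(goodness(x_neg) - theta)).mean()
    opt.zero_grad()
    loss.backward()  # the loss touches only this layer's parameters
    opt.step()
```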

1

throwaway2676 t1_j47m2r9 wrote

> If we could magically combine the reasoning ability of symbolic systems with the pattern recognition and generalization of neural networks, we would be getting very close to AGI imo.

I must be misunderstanding your meaning, because I don't see why this is particularly difficult. Train an AI to recognize deductive/mathematical reasoning and translate it into symbolic or mathematical logic. Run an automated proof assistant or computer algebra system on the result. Use the AI to translate back into natural language. Shouldn't be much more difficult than creating code, which ChatGPT can already do, and it would instantly eliminate 95% of the goofy problems LLMs get wrong.
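As a toy sketch of that pipeline: the `nl_to_sympy` and `sympy_to_nl` translators below are hypothetical stand-ins for the LLM steps, while sympy plays the real computer algebra system in the middle.

```python
# A toy sketch of the translate-solve-translate pipeline described above.
# nl_to_sympy and sympy_to_nl are hypothetical LLM-backed translators;
# sympy is a real computer algebra system.
import sympy

def nl_to_sympy(question: str) -> sympy.Eq:
    # in the proposed pipeline, an LLM would emit this formal expression
    x = sympy.Symbol("x")
    return sympy.Eq(x**2 - 4, 0)

def sympy_to_nl(solutions) -> str:
    # likewise, an LLM would phrase the formal result in natural language
    return f"The solutions are {solutions}."

question = "What are the roots of x squared minus four?"
equation = nl_to_sympy(question)
solutions = sympy.solve(equation)  # exact symbolic solving, no hallucination
print(sympy_to_nl(solutions))      # The solutions are [-2, 2].
```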

4

throwaway2676 t1_j3sxeda wrote

> I work in Astronomy, not in ML, but review first and arxiv later is how most people work in Europe. I typically don't find european arxiv papers that are not accepted for publication already.

I did work in an Astronomy-adjacent field, and European researchers in our area all submitted to arxiv first, just like US groups.

5

throwaway2676 t1_j3h780s wrote

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

> In a fully-connected layer, the input to the matrix multiply is the output of everything in the previous layer, not just the output of a single unit.

But if the previous layer is 0 everywhere except for one unit, the result is the same, no?

My mental picture is that input layer 0 has V = <token vocabulary size> neurons, and layer 1 has E_d = <embedding dimension> neurons. Layer 0 is 1 at one neuron and 0 everywhere else, as one-hot encoding normally goes. The embedding layer 1 is then given by x@W, where x is layer 0 as a row vector and W is the weight matrix with dimensions V x E_d. The matrix multiplication then "picks out" the desired row. That would be a fully connected linear layer with no bias.
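A quick PyTorch check of this picture (a sketch with made-up dimensions): the one-hot matmul and the embedding lookup give identical results.

```python
# A quick sketch verifying the picture above with made-up dimensions:
# a one-hot row times the weight matrix equals an embedding lookup.
import torch

V, E_d = 10, 4                      # vocab size, embedding dimension
emb = torch.nn.Embedding(V, E_d)    # weight matrix W has shape (V, E_d)

token_id = torch.tensor([3])
one_hot = torch.nn.functional.one_hot(token_id, num_classes=V).float()

via_matmul = one_hot @ emb.weight   # fully connected layer, no bias
via_lookup = emb(token_id)          # "picks out" row 3 of W

assert torch.allclose(via_matmul, via_lookup)
```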

1

throwaway2676 t1_j39vamk wrote

Is an embedding layer (or at least a simple/standard one) the same thing as a fully connected layer from one-hot encoded tokens to a hidden layer of length <embedding dimension>? The token embeddings would be the weight matrix, but with the biases set to 0.

3

throwaway2676 t1_j0x89o6 wrote

Are there any ML subs or forums with a biology/medicine focus? It seems to be a rapidly expanding field, but it gets almost none of the attention relative to the flashier stuff (apart from AlphaFold for a bit).

7