throwaway2676
throwaway2676 t1_jd8qe6f wrote
Reply to [D] Simple Questions Thread by AutoModerator
When training LLMs to write code, is it standard to make indentation and newlines their own tokens, like '<\n>' and '<\ind>' or something?
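Whether a given LLM tokenizer does this varies (many BPE vocabularies just merge runs of spaces into tokens), but Python's own standard-library tokenizer is a concrete example of the indent-as-token idea — it emits dedicated NEWLINE, INDENT, and DEDENT tokens rather than treating whitespace as ordinary characters:

```python
# Python's stdlib tokenizer emits structural whitespace tokens,
# analogous to the '<\n>'/'<\ind>' scheme asked about above.
import io
import tokenize

src = "def f(x):\n    return x + 1\n"
toks = [tokenize.tok_name[t.type]
        for t in tokenize.generate_tokens(io.StringIO(src).readline)]
print(toks)  # includes NEWLINE, INDENT, and DEDENT entries
```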
Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?
throwaway2676 t1_ja4w3nj wrote
Reply to comment by CaptainCrypto1969 in Start of the soft landing? by rickert1337
Have you been to the supermarket lately?
throwaway2676 t1_ja1bftr wrote
Reply to [D] Simple Questions Thread by AutoModerator
How much theoretical speedup do you think DL could get if we coded everything directly in C++ instead of Python?
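One rough intuition on this question: mainstream frameworks already spend most of their time inside compiled kernels, so rewriting the glue in C++ would mostly save per-op dispatch overhead, not kernel time. A toy illustration of the gap between interpreted and compiled number-crunching (numpy assumed; the two paths compute the same dot product):

```python
# The interpreted loop pays Python overhead per element; the `@` call
# dispatches once into a compiled BLAS kernel. Results agree.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(512)
b = rng.standard_normal(512)

slow = sum(x * y for x, y in zip(a, b))  # pure-Python loop
fast = float(a @ b)                      # one call into compiled code

assert abs(slow - fast) < 1e-9
```

So the attainable speedup depends on how much of a workload is Python glue versus kernel time, which is small for big models and larger for many-tiny-op workloads.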
throwaway2676 t1_j9kilst wrote
Reply to [D] Simple Questions Thread by AutoModerator
Are there any developments in the ASIC/analog computing space that people are really excited about? I think most people know about Google's TPUs by now, but is there anything else with the potential to threaten the dominance of GPUs in the next few years?
throwaway2676 t1_j8digqj wrote
Reply to [D] Quality of posts in this sub going down by MurlocXYZ
Here are the top 10 posts on my front page right now:
>[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research
>[D] Quality of posts in this sub going down
>[D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published?
>[R] [N] pix2pix-zero - Zero-shot Image-to-Image Translation
>[P] Extracting Causal Chains from Text Using Language Models
>[R] [P] Adding Conditional Control to Text-to-Image Diffusion Models. "This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." Example uses the Scribble ControlNet model.
>[R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
>[D] What ML dev tools do you wish you'd discovered earlier?
>[R] CIFAR10 in <8 seconds on an A100 (new architecture!)
>[D] Engineering interviews at Anthropic AI?
From this list the only non-academic/"low quality" posts are the last one and this one. This is consistent with my normal experience, so I'm not really sure what you are talking about.
throwaway2676 t1_j74iilz wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).
throwaway2676 t1_j6xerk7 wrote
Reply to comment by bojohnsonyadig in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
I think a lot of people would pay for the initial model they first released. Since then they've been censoring the shit out of it to avoid controversy, and a fair amount of the hype died down among the average joes.
At this point I think their main target demo will be white collar workers who use it to make work easier. However, the hype will pick back up once they connect it to the internet.
throwaway2676 t1_j6syciq wrote
Reply to [R] Faithful Chain-of-Thought Reasoning by starstruckmon
Woah, hey, this is basically what I proposed last month
throwaway2676 t1_j6d99fw wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
So shouldn't this mean we can train transformers using forward passes alone? It seems that it wouldn't be too difficult to derive an algorithm that updates the attention weights based on these results, but I don't believe the authors mention the possibility.
throwaway2676 t1_j68vbfq wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
> Not in 40 years when computers are 1000x better.
It won't take anywhere near that long. We've barely scratched the surface of ASICs and analog matrix multiplication, which is where the real fun is going to begin.
throwaway2676 t1_j4q8zuh wrote
Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
Well, can you just run it from an SSD, but more slowly?
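The underlying mechanism here is memory-mapping: weights stay on disk and pages are pulled in on demand, trading speed for RAM. A minimal sketch with a hypothetical single-layer weight file (real frameworks, e.g. HuggingFace Accelerate's disk offload, manage this per-layer for you):

```python
# Save a pretend weight matrix to disk, then map it lazily instead of
# loading it into memory. Pages are read from the SSD only when touched.
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "layer0.npy")
np.save(path, np.ones((1024, 1024), dtype=np.float32))

W = np.load(path, mmap_mode="r")       # memory-mapped, not loaded
x = np.zeros(1024, dtype=np.float32)
y = W @ x                              # disk pages are faulted in here

assert y.shape == (1024,)
```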
throwaway2676 t1_j4eqxk2 wrote
Reply to comment by MegavirusOfDoom in [D] Simple Questions Thread by AutoModerator
GPT-4 should be coming out, right?
throwaway2676 t1_j47m2r9 wrote
Reply to comment by navillusr in [D] What's your opinion on "neurocompositional computing"? (Microsoft paper from April 2022) by currentscurrents
> If we could magically combine the reasoning ability of symbolic systems with the pattern recognition and generalization of neural networks, we would be getting very close to AGI imo.
I must be misunderstanding your meaning, because I don't see why this is particularly difficult. Train an AI to recognize deductive/mathematical reasoning and translate it into symbolic or mathematical logic. Run an automated proof assistant or computer algebra system on the result. Use the AI to translate back into natural language. Shouldn't be much more difficult than creating code, which ChatGPT can already do, and it would instantly eliminate 95% of the goofy problems LLMs get wrong.
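A toy stand-in for that pipeline, with the hard part faked: the natural-language-to-logic "translation" is hand-coded here (that is the step the AI would do), and a tiny forward-chaining engine plays the role of the automated reasoner:

```python
# Facts and rules stand in for translated natural-language statements.
# Forward chaining (repeated modus ponens) derives new facts until fixpoint.
facts = {"socrates_is_human"}
rules = [({"socrates_is_human"}, "socrates_is_mortal")]  # premises -> conclusion

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

assert "socrates_is_mortal" in facts
```

The translated conclusion would then go back through the LLM into natural language; the reasoning itself never touches the neural net.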
throwaway2676 t1_j3ygpsd wrote
Reply to comment by Desperate-Step4469 in [News] "Once $92 billion in profit plus $13 billion in initial investment are repaid (to Microsoft) and once the other venture investors earn $150 billion, all of the equity reverts back to OpenAI." by Gmroo
Haha, or hyperinflation takes off and they pay it back with spare change in like 7 years
throwaway2676 t1_j3sxeda wrote
Reply to comment by ASuarezMascareno in [D] Found very similar paper to my submitted paper on Arxiv by [deleted]
> I work in Astronomy, not in ML, but review first and arxiv later is how most people work in Europe. I typically don't find european arxiv papers that are not accepted for publication already.
I did work in an astronomy-adjacent field, and European researchers in our area all submitted to arxiv first, just like US groups.
throwaway2676 t1_j3h780s wrote
Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator
> In a fully-connected layer, the input to the matrix multiply is the output of everything in the previous layer, not just the output of a single unit.
But if the previous layer is 0 everywhere except for one unit, the result is the same, no?
My mental picture is that input layer 0 has V = <token vocabulary size> neurons, and layer 1 has E_d = <embedding dimension> neurons. Layer 0 is 1 at a single neuron and 0 everywhere else, as one-hot encoding normally goes. The embedding layer 1 is then given by x@W, where x is layer 0 as a row vector, and W is the weight matrix with dimensions V x E_d. The matrix multiplication then "picks out" the desired row. That would be a fully connected linear layer with no bias.
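That "picks out the desired row" claim checks out numerically (numpy assumed, toy sizes):

```python
# One-hot row vector times a V x E_d weight matrix == row lookup,
# i.e. a bias-free fully connected layer acting as an embedding table.
import numpy as np

V, E_d = 10, 4                          # vocab size, embedding dim
rng = np.random.default_rng(0)
W = rng.standard_normal((V, E_d))       # weight / embedding matrix

token_id = 3
x = np.zeros(V)
x[token_id] = 1.0                       # one-hot encoding of the token

assert np.allclose(x @ W, W[token_id])  # matmul == selecting row 3
```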
throwaway2676 t1_j39vamk wrote
Reply to [D] Simple Questions Thread by AutoModerator
Is an embedding layer (or at least a simple/standard one) the same thing as a fully connected layer from one-hot encoded tokens to a hidden layer of length <embedding dimension>? The token embeddings would be the weight matrix, but with the biases set to 0.
throwaway2676 t1_j0x89o6 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Are there any ML subs or forums with a biology/medicine focus? It seems to be a rapidly expanding field, but it gets almost no attention relative to the flashier stuff (apart from AlphaFold for a bit).
throwaway2676 t1_j0hvos1 wrote
Reply to [D] Simple Questions Thread by AutoModerator
What are the chances ChatGPT offers a subscription tier that is totally uncensored?
throwaway2676 t1_iyn1qmn wrote
Reply to [D] PyTorch 2.0 Announcement by joshadel
Wow, this sounds pretty exciting. I wonder how the speed will compare to JAX or Julia.
throwaway2676 t1_jdl0y80 wrote
Reply to comment by mxby7e in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
Alpaca was only trained on 50k instructions, right? A large group of grad students or a forum like reddit could construct that many manually in a couple weeks. I'm surprised they even had to resort to using ClosedAI