throwaway2676 t1_jd8qe6f wrote

When training LLMs to write code, is it standard to just make indentation and newlines their own tokens? Like '<\n>' and '<\ind>' or something?
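For what it's worth, here's a minimal sketch (assuming the HuggingFace `transformers` package) of how GPT-2-style BPE actually handles this: the newline gets its own token, and runs of spaces become ordinary space tokens rather than special indentation markers. The printed output is illustrative, not exact:

```python
# A minimal sketch using HuggingFace transformers. In GPT-2's BPE,
# "Ġ" marks a leading space and "Ċ" is a newline -- whitespace is
# handled with ordinary vocabulary tokens, not special markers.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

code = "def f(x):\n    return x + 1"
print(tokenizer.tokenize(code))
# e.g. ['def', 'Ġf', '(', 'x', '):', 'Ċ', 'ĠĠĠ', 'Ġreturn', 'Ġx', 'Ġ+', 'Ġ1']
```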

Follow up: Are there any good models on HuggingFace that specialize in writing and explaining code?

2

throwaway2676 t1_j9kilst wrote

Are there any developments in the ASIC/analog computing space that people are really excited about? I think most people know about Google's TPUs by now, but is there anything else with the potential to threaten the dominance of GPUs in the next few years?

1

throwaway2676 t1_j8digqj wrote

Here are the top 10 posts on my front page right now:

>[R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research

>[D] Quality of posts in this sub going down

>[D] Is a non-SOTA paper still good to publish if it has an interesting method that does have strong improvements over baselines (read text for more context)? Are there good examples of this kind of work being published?

>[R] [N] pix2pix-zero - Zero-shot Image-to-Image Translation

>[P] Extracting Causal Chains from Text Using Language Models

>[R] [P] Adding Conditional Control to Text-to-Image Diffusion Models. "This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions." Example uses the Scribble ControlNet model.

>[R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

>[D] What ML dev tools do you wish you'd discovered earlier?

>[R] CIFAR10 in <8 seconds on an A100 (new architecture!)

>[D] Engineering interviews at Anthropic AI?

From this list the only non-academic/"low quality" posts are the last one and this one. This is consistent with my normal experience, so I'm not really sure what you are talking about.

8

throwaway2676 t1_j74iilz wrote

Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).
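For concreteness, a rough sketch of the two styles (the `query_llm` helper is a hypothetical stand-in for whatever completion API you use): chain-of-thought asks the model to reason step by step in text, while program-of-thought has it write code so an interpreter does the actual computation.

```python
# A hedged sketch contrasting the two prompting styles; query_llm is
# a hypothetical stand-in for a real completion API.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model of choice")

question = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

# Chain-of-thought: the model reasons step by step in natural language.
cot_answer = query_llm(f"{question}\nLet's think step by step.")

# Program-of-thought: the model writes code and the interpreter does
# the arithmetic, so the final number isn't confidently hallucinated.
program = query_llm(f"Write Python code that computes the answer to: {question}")
scope: dict = {}
exec(program, scope)  # in practice, run generated code in a sandbox
```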

53

throwaway2676 t1_j6xerk7 wrote

I think a lot of people would pay for the initial model they first released. Since then they've been censoring the shit out of it to avoid controversy, and a fair amount of the hype died down among the average joes.

At this point I think their main target demo will be white collar workers who use it to make work easier. However, the hype will pick back up once they connect it to the internet.

3

throwaway2676 t1_j6d99fw wrote

So shouldn't this mean we can train transformers using forward passes alone? It seems that it wouldn't be too difficult to derive an algorithm that updates the attention weights based on these results, but I don't believe the authors mention the possibility.
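For anyone curious, here's a rough sketch of the kind of layer-local update involved, assuming (my assumption) the paper in question is Hinton's Forward-Forward or something similar: each layer is trained on its own "goodness" objective, so no end-to-end backward pass through the network is needed.

```python
# A rough sketch of a Forward-Forward-style layer update (assuming the
# discussion is about Hinton's Forward-Forward paper). Each layer has a
# local loss on its own "goodness", so gradients never cross layers.
import torch
import torch.nn.functional as F

layer = torch.nn.Linear(784, 512)
opt = torch.optim.SGD(layer.parameters(), lr=0.01)
theta = 2.0  # goodness threshold

def goodness(x: torch.Tensor) -> torch.Tensor:
    # sum of squared activations of this layer only
    return layer(x).relu().pow(2).sum(dim=1)

def local_step(x_pos: torch.Tensor, x_neg: torch.Tensor) -> None:
    # push goodness above theta for positive data, below for negative
    loss = (F.softplus(theta - goodness(x_pos))
            + F.softplus(goodness(x_neg) - theta)).mean()
    opt.zero_grad()
    loss.backward()  # the loss touches only this layer's parameters
    opt.step()
```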

1

throwaway2676 t1_j47m2r9 wrote

> If we could magically combine the reasoning ability of symbolic systems with the pattern recognition and generalization of neural networks, we would be getting very close to AGI imo.

I must be misunderstanding your meaning, because I don't see why this is particularly difficult. Train an AI to recognize deductive/mathematical reasoning and translate it into symbolic or mathematical logic. Run an automated proof assistant or computer algebra system on the result. Use the AI to translate back into natural language. Shouldn't be much more difficult than creating code, which ChatGPT can already do, and it would instantly eliminate 95% of the goofy problems LLMs get wrong.
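As a toy sketch of that pipeline: the `nl_to_sympy` and `sympy_to_nl` translators below are hypothetical stand-ins for the LLM steps, while sympy plays the real computer algebra system in the middle.

```python
# A toy sketch of the translate-solve-translate pipeline described above.
# nl_to_sympy and sympy_to_nl are hypothetical LLM-backed translators;
# sympy is a real computer algebra system.
import sympy

def nl_to_sympy(question: str) -> sympy.Eq:
    # in the proposed pipeline, an LLM would emit this formal expression
    x = sympy.Symbol("x")
    return sympy.Eq(x**2 - 4, 0)

def sympy_to_nl(solutions) -> str:
    # likewise, an LLM would phrase the formal result in natural language
    return f"The solutions are {solutions}."

question = "What are the roots of x squared minus four?"
equation = nl_to_sympy(question)
solutions = sympy.solve(equation)  # exact symbolic solving, no hallucination
print(sympy_to_nl(solutions))      # The solutions are [-2, 2].
```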

4

throwaway2676 t1_j3sxeda wrote

> I work in Astronomy, not in ML, but review first and arxiv later is how most people work in Europe. I typically don't find european arxiv papers that are not accepted for publication already.

I did work in an Astronomy-adjacent field, and European researchers in our area all submitted to arxiv first, just like US groups.

5

throwaway2676 t1_j3h780s wrote

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

> In a fully-connected layer, the input to the matrix multiply is the output of everything in the previous layer, not just the output of a single unit.

But if the previous layer is 0 everywhere except for one unit, the result is the same, no?

My mental picture is that input layer 0 has V = <token vocabulary size> neurons, and layer 1 has E_d = <embedding dimension> neurons. Layer 0 is 1 at one neuron and 0 everywhere else, as one-hot encoding normally goes. The embedding layer 1 is then given by x@W, where x is layer 0 as a row vector and W is the weight matrix with dimensions V x E_d. The matrix multiplication then "picks out" the desired row. That would be a fully connected linear layer with no bias.
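A quick PyTorch check of this picture (a sketch with made-up dimensions): the one-hot matmul and the embedding lookup give identical results.

```python
# A quick sketch verifying the picture above with made-up dimensions:
# a one-hot row times the weight matrix equals an embedding lookup.
import torch

V, E_d = 10, 4                      # vocab size, embedding dimension
emb = torch.nn.Embedding(V, E_d)    # weight matrix W has shape (V, E_d)

token_id = torch.tensor([3])
one_hot = torch.nn.functional.one_hot(token_id, num_classes=V).float()

via_matmul = one_hot @ emb.weight   # fully connected layer, no bias
via_lookup = emb(token_id)          # "picks out" row 3 of W

assert torch.allclose(via_matmul, via_lookup)
```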

1

throwaway2676 t1_j39vamk wrote

Is an embedding layer (or at least a simple/standard one) the same thing as a fully connected layer from one-hot encoded tokens to a hidden layer of length <embedding dimension>? The token embeddings would be the weight matrix, but with the biases set to 0.

3

throwaway2676 t1_j0x89o6 wrote

Are there any ML subs or forums with a biology/medicine focus? It seems to be a rapidly expanding field, but it gets almost none of the attention relative to the flashier stuff (apart from AlphaFold for a bit).

7