enryu42 OP t1_jdsokwz wrote

Interesting! Here are the scraped and auto-converted statements (the formatting is sometimes off, especially in the sample tests, but they are understandable). The prefixes are: "abc" for Beginner, "arc" for Regular, "agc" for Grand.

I do believe the results on the "Beginner" problems can be improved, but it'll be interesting to see what happens on "Grand" (or even "Regular"), as those require coming up with some ideas before writing the code.

7

enryu42 OP t1_jds49mm wrote

Reply to comment by maskedpaki in [D] GPT4 and coding problems by enryu42

Well, they do, and quite successfully; this is what these sites are about...

Of course, if you ask a frontend engineer to solve a math-y problem, they'll be confused. But that is simply because they lack the knowledge, and GPT4 evidently doesn't have this issue. Moreover, I doubt any human programmer would have trouble with the "Beginner" problems, regardless of their specialization.

15

enryu42 OP t1_jds3452 wrote

Reply to comment by lambertb in [D] GPT4 and coding problems by enryu42

I absolutely agree that it is useful. Even Copilot is amazing at autocompleting "dumb" boilerplate code, which is a nontrivial share of all code. However, these problems are designed to be challenging (these are competitions, after all), and they require ideas/intelligence to solve. Apparently GPT4 cannot do that at all, so IMO it would be a stretch to call whatever it is doing "intelligence".

18

enryu42 OP t1_jds18j4 wrote

I absolutely agree; however, these models have repeatedly exceeded expectations (e.g. 5 years ago I thought that "explaining jokes" would be a hard problem for them, based on similar reasoning...)

I tried this because I'd heard that there are people inside the competitive programming community claiming that GPT4 can solve these problems. But from what I gather, it is still not there.

12

enryu42 OP t1_jdrzcc9 wrote

Do you mean re-prompting it and asking it to correct its mistakes? That is hard to try with the current tight limits on the number of GPT4 prompts; I'll try once the API is properly available. But I strongly doubt it'll help much: it's not that the solutions have minor bugs, they're usually just completely wrong, i.e. the model doesn't "get" the idea behind the correct solution.

(It might help with some of the problems from the "Beginner" category, but those aren't that interesting.)
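
For concreteness, a minimal sketch of what such a repair loop could look like; `query_model`, `run_tests`, and the prompt wording are hypothetical placeholders, not a real API:

```python
# Hypothetical self-repair loop: re-prompt the model with the failing test feedback.
# query_model() and run_tests() are assumed helpers, not a real API.

def solve_with_repair(statement, sample_tests, max_rounds=3):
    prompt = f"Solve this competitive programming problem in Python:\n{statement}"
    code = query_model(prompt)
    for _ in range(max_rounds):
        ok, feedback = run_tests(code, sample_tests)  # run against the sample tests
        if ok:
            return code
        # Feed the failure back and ask for a corrected solution.
        prompt += f"\n\nYour previous solution:\n{code}\nfailed with:\n{feedback}\nPlease fix it."
        code = query_model(prompt)
    return code  # may still be wrong: repair only helps if the core idea was right
```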

6

enryu42 OP t1_jdrezba wrote

I don't know about the IIT-JEE/Gaokao, but many of the problems from the International Math Olympiad are freaking hard. If the model aims for human-level intelligence, such a high bar would be unfair; it is more in the realm of "best human"-level intelligence.

To be fair, the hardest problems from "AtCoder Grand" contests have the same issue. But "AtCoder Regular" problems should definitely be solvable by an average human with the right knowledge and skill set, and yet GPT4 cannot solve anything (and it doesn't look like it is lacking knowledge).

16

enryu42 OP t1_jdrbyh5 wrote

Arithmetic can be solved in a Toolformer-like way, by just giving the model access to a calculator. But this wouldn't help with coding.
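
As a toy illustration of the calculator idea (not Toolformer itself): one could post-process the model's output and evaluate arithmetic wrapped in a made-up `[calc: ...]` tag:

```python
import re

# Toy "calculator tool": whenever the model emits [calc: <expression>],
# replace the tag with the evaluated result. The tag format is made up for this sketch.
def expand_calc_calls(text: str) -> str:
    def evaluate(match):
        expr = match.group(1)
        # Only allow digits, whitespace, and basic operators before eval'ing.
        if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
            return match.group(0)  # leave unknown expressions untouched
        return str(eval(expr))
    return re.sub(r"\[calc:([^\]]+)\]", evaluate, text)

print(expand_calc_calls("37 * 1043 = [calc: 37 * 1043]"))  # -> 37 * 1043 = 38591
```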

Regarding the point about boilerplate, this is exactly what is surprising: GPT4 performs very well on exams/tests, which supposedly require some amount of creative reasoning. So either the tests are poorly designed, or it can do some creative tasks but not others. If the latter is the case, it would be interesting to learn in which areas it performs well, and why.

21

enryu42 t1_jaa1lru wrote

> The only AI ethics that has any substance is data bias

While the take in the tweet is ridiculous (but alas common among the "AI Ethics" people), I'd disagree with your statement.

There are many other concerns besides bias in the static data, e.g. feedback loops induced by ML models when they're deployed in real-life systems. One can argue that causality for decision-making models also falls into this category. But ironically, the field itself is too biased to do productive research in these directions...

1

enryu42 t1_j6wme0p wrote

Nice! It is pretty clear that big models memorize some of their training examples, but the ease of extraction is impressive.

I wonder what the best mitigation strategies would be (besides the obvious one of de-duplicating training images). Theoretically sound approaches (like differential privacy) would perhaps cripple the training too much. I wonder if some simple hacks would work: e.g. train the model as-is first, then generate an entirely new training set using the model and synthetic prompts, and train a new model from scratch only on the generated data.
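
In rough pseudocode, that hack would look something like this (every function here is a placeholder, and whether it actually reduces memorization is an open question):

```python
# Rough pseudocode for the "distill into a fresh model" hack.
# train_diffusion_model() and sample_prompts() are placeholders, not a real API.

def train_without_direct_memorization(real_dataset, num_synthetic=1_000_000):
    # Step 1: train a first model on the (de-duplicated) real data as usual.
    teacher = train_diffusion_model(real_dataset)

    # Step 2: generate an entirely new training set from synthetic prompts.
    synthetic_dataset = [(p, teacher.generate(p)) for p in sample_prompts(num_synthetic)]

    # Step 3: train a second model from scratch that only ever sees generated images,
    # so no real training image is shown to it directly.
    return train_diffusion_model(synthetic_dataset)
```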

Another aspect of this is the user experience side. People can reproduce copyrighted images with just pen and paper, but in that case they'll be fully aware of what they're doing. With diffusion models, the danger is that the user can reproduce an existing image without realizing it. Maybe augmenting the various UIs with reverse image search/nearest-neighbor lookup would be a good idea? Or computing training-set attributions for generated images with something along the lines of TracIn.
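
For illustration, a minimal sketch of such a nearest-neighbor check, assuming the training images are represented by L2-normalized CLIP-style embeddings indexed with FAISS (the embeddings here are random stand-ins, and the threshold is arbitrary):

```python
import faiss
import numpy as np

# Toy stand-in for CLIP image embeddings of the training set (N x d, L2-normalized).
rng = np.random.default_rng(0)
train_embeddings = rng.standard_normal((10_000, 512)).astype(np.float32)
train_embeddings /= np.linalg.norm(train_embeddings, axis=1, keepdims=True)

index = faiss.IndexFlatIP(train_embeddings.shape[1])  # inner product == cosine on unit vectors
index.add(train_embeddings)

def warn_if_near_duplicate(query_embedding, threshold=0.95):
    """Return training images whose embedding is suspiciously close to the generated one."""
    q = query_embedding.astype(np.float32).reshape(1, -1)
    scores, ids = index.search(q, 5)
    hits = [(int(i), float(s)) for i, s in zip(ids[0], scores[0]) if s >= threshold]
    if hits:
        print("Generated image is very close to training images:", hits)
    return hits

# In a real UI the query would be the embedding of the freshly generated image.
warn_if_near_duplicate(train_embeddings[42])  # exact duplicate -> flagged with score ~1.0
```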

1

enryu42 t1_iw2vlwf wrote

Oh, it is totally feasible: I'm getting something around 2.5 training examples/second with vanilla SD without any optimizations (which translates to more than 200k per day: 2.5 × 86,400 ≈ 216k), which is more than enough for fine-tuning.

I'd still not recommend it for teaching the model new concepts, though; it is more appropriate for transferring the model to new domains (e.g. here people adapted it to anime images).

1

enryu42 t1_iw2m1nt wrote

Even without any optimizations, it is possible to fine-tune Stable Diffusion on an RTX 3090 in fp32 with some effort, even with batch size 2 (by precomputing the latent embeddings and saving some VRAM by not keeping the autoencoder params around during training).
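
A very rough sketch of this kind of setup using the diffusers library; the checkpoint name, the dataloaders, and the hyperparameters are placeholders, and this is only an approximation of the approach described above, not the exact setup:

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

device = "cuda"
model_id = "runwayml/stable-diffusion-v1-5"  # placeholder checkpoint

# Step 1: encode all images to latents once, then drop the VAE so it uses no VRAM during training.
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").to(device).eval()
all_latents = []
with torch.no_grad():
    for images in image_dataloader:  # placeholder dataloader yielding normalized image tensors
        all_latents.append(vae.encode(images.to(device)).latent_dist.sample() * 0.18215)
del vae
torch.cuda.empty_cache()

# Step 2: train only the UNet on the cached latents (text embeddings assumed precomputed as well).
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").to(device)
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

for latents, text_emb in latent_dataloader:  # placeholder: batches of 2 cached latents + embeddings
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],), device=device
    )
    noisy = noise_scheduler.add_noise(latents, noise, timesteps)
    pred = unet(noisy, timesteps, encoder_hidden_states=text_emb).sample
    loss = torch.nn.functional.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```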

But this is definitely not a "one-button" solution, and it requires more effort than using existing tools like textual inversion/DreamBooth (which are more appropriate for the "teach the model a new concept" use case).

3