enryu42 OP t1_jds3452 wrote on March 26, 2023 at 7:22 PM

Reply to comment by lambertb in [D] GPT4 and coding problems by enryu42

I absolutely agree that it is useful. Even CoPilot is amazing at autocompleting "dumb" boilerplate code, which is a nontrivial amount of the code overall. However, these problems are designed to be challenging (these are competitions after all), and require ideas/intelligence to be solved. Apparently GPT4 cannot do it at all, so IMO it would be a stretch to call whatever it is doing "intelligence".

dimsumham t1_jdsfpqb wrote on March 26, 2023 at 8:50 PM

it's not. it's giving you answers to appear intelligent, many times in almost magical ways, but it doesn't "think" - especially in steps.

The MSFT paper notes that this is one of its clearest shortcomings - it can't do long range planning. At least not yet. But i think this is partially people expecting way too much of a single model.

[deleted] t1_jdt8jcf wrote on March 27, 2023 at 12:27 AM

[deleted]

Ciber_Ninja t1_jdu81zy wrote on March 27, 2023 at 6:02 AM

It can in fact think in steps. All you have to do is ask it to. In fact, multiple papers have shown that asking it to think in steps provides a significant increase in the accuracy of it's answers.

audioen t1_jdujtbl wrote on March 27, 2023 at 8:51 AM

Yes. Directly predicting the answer in one step from a question is a difficult ask. Decomposing the problem to discrete steps, and writing out these steps and then using these sub-answers to compose the final result is evidently simpler and likely requires less outright memorization and depth in network. I think it is also how humans work out answers, we can't just go from question to answer unless the question is simple or we have already memorized the answer.

Right now, we are asking the model to basically memorize everything, and hoping it generalizes something like cognition or reasoning in the deep layers of the network, and to degree this happens. But I think it will be easier to engineer good practical Q&A system by being more intelligent about the way LLM is used, perhaps just by recursively querying itself or using the results of this kind of recursive querying to generate vast synthetic datasets that can be used to train new networks that are designed to perform some kind of LLM + scratchpad for temporary results = answer type behavior.

One way to do it today with something like GPT4 might be to just ask it to write its own prompt. When the model gets the human question, the first prompt actually executed by AI could be "decompose the user's prompt to a simpler, easier to evaluate subtasks if necessary, then perform these subtasks, then respond".

Trotskyist t1_jdt8tx6 wrote on March 27, 2023 at 12:29 AM

It's still an extremely useful tool if you accept its limitations, and I think it's being reductive to say it can only solve "dumb" problems or suggest boilerplate code.

I used GPT-4 the other day to refactor/optimize an extremely bespoke and fairly complicated geoprocessing script that we use at work that was written by a former employee who's no longer with the organization. Yes, it got some things wrong that had to be corrected (sometimes all it took was feeding it a stacktrace, other times this wasn't enough and I'd have to figure out the issue myself)

But at the end of the day (literally, this was over the course of an afternoon,) I'd managed to cut the runtime by more than half, using libraries I'd never before touched and wasn't previously familiar with. It probably would have taken a week to implement otherwise.

SWESWESWEh t1_jdt1ly6 wrote on March 26, 2023 at 11:32 PM

So my guess is whoever ran the test did a bad job with the prompting. One of the examples they share, https://github.com/enryu43/llm_coding/blob/main/atcoder_eval/gpt4_solutions/abc294_c.py, the author complains it's not an efficient algorithm. So I just gave it a shot with ChatGPT:

Me: What does this code do?

def main(): N, M = map(int, input().split()) A = list(map(int, input().split())) B = list(map(int, input().split()))

C = sorted(A + B)
pos_A = [str(C.index(a) + 1) for a in A]
pos_B = [str(C.index(b) + 1) for b in B]

print(" ".join(pos_A))
print(" ".join(pos_B))

if name == "main": main()

ChatGPT: This code takes two lists of integers as input, merges them, sorts the merged list, and then prints the positions of the elements from the original lists in the sorted merged list.

Here's a step-by-step explanation of the code:

The main function is defined. Two integers, N and M, are read from the input and mapped to int using map(int, input().split()). Two lists, A and B, are read from the input, with each element being mapped to int using map(int, input().split()). A new list C is created by merging lists A and B and then sorting the resulting list using sorted(A + B). Two new lists, pos_A and pos_B, are created. These lists store the positions of the elements from lists A and B in the sorted list C. The positions are calculated using the index() method and are 1-indexed. The elements in pos_A and pos_B are joined into strings and printed, separated by a space. The if name == "main": statement checks if the script is being run directly (not imported as a module), and if so, calls the main function. Here's an example to illustrate how the code works:

Input: 4 3 2 1 4 3 5 6 7

Output: 1 2 4 3 5 6 7 The sorted merged list C is [1, 2, 3, 4, 5, 6, 7]. The positions of the elements from the original lists in the sorted list are 1, 2, 4, 3 for list A and 5, 6, 7 for list B.

Me: Could this code be re-factored to be my algorithmically efficient?

Yes, the code can be refactored to be more algorithmically efficient. The current implementation uses the index() method to find the positions of elements from lists A and B in the sorted list C. This approach has a time complexity of O(n^2) because index() is called for every element in both A and B.

To improve the algorithmic efficiency, you can use a dictionary to store the positions of the elements in the sorted list C. This will reduce the time complexity of finding the positions to O(n). Here's a refactored version of the code:

def main(): N, M = map(int, input().split()) A = list(map(int, input().split())) B = list(map(int, input().split()))

C = sorted(A + B)
positions = {value: index + 1 for index, value in enumerate(C)}

pos_A = [str(positions[a]) for a in A]
pos_B = [str(positions[b]) for b in B]

print(" ".join(pos_A))
print(" ".join(pos_B))

if name == "main": main()

The main changes in the refactored code are:

A positions dictionary is created that maps the values in the sorted list C to their 1-indexed positions. The pos_A and pos_B lists are created by directly looking up the positions of the elements from the positions dictionary. This refactored version of the code improves the algorithmic efficiency by reducing the time complexity of finding the positions of the elements from O(n^2) to O(n).

So it accurately analyzes that this code solves the problem without knowing the problem definition, and will spot the inefficiency if asked to actually write algorithmically efficient code. Seems fine to me