Smallpaul t1_jdu38fb wrote on March 27, 2023 at 5:04 AM

Decompilers already exist though.

currentscurrents t1_jdu3vwq wrote on March 27, 2023 at 5:11 AM

Yeah, but they're hand-crafted algorithms and produce code that's hard to read.

ultraminxx t1_jdu7uz8 wrote on March 27, 2023 at 5:59 AM

That said, it might also be a good approach to preprocess the input with a classical algorithm and then train a model to refactor the decompiled code so it becomes more readable.

currentscurrents t1_jdvxga6 wrote on March 27, 2023 at 4:18 PM

Possibly! But it also seems like a good sequence-to-sequence translation problem: just line up the two streams of tokens and let the model figure it out.
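A minimal sketch of what "lining up the two streams of tokens" could look like when building one training pair. The regex tokenizer and the `make_pair` helper are hypothetical, and the assembly string is only roughly what an optimizing x86-64 compiler might emit for the function, not output captured from a real build:

```python
import re

# Hypothetical tokenizer: identifiers/numbers as one token, each punctuation mark as its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Split source or assembly text into a flat token stream."""
    return TOKEN_RE.findall(text)

def make_pair(assembly: str, source: str) -> dict[str, list[str]]:
    """Line up one (assembly, source) example as input/target token streams."""
    return {"input": tokenize(assembly), "target": tokenize(source)}

# Illustrative only: approximate optimized x86-64 output for the C function below.
asm = "lea eax, [rdi+rsi]\nret"
src = "int add(int a, int b) { return a + b; }"

pair = make_pair(asm, src)
print(pair["input"])   # ['lea', 'eax', ',', '[', 'rdi', '+', 'rsi', ']', 'ret']
print(pair["target"])
```

A real dataset would pair many compiled functions with their sources; a seq2seq model is then trained to map `input` to `target`.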

s0n0fagun t1_jdu975r wrote on March 27, 2023 at 6:17 AM

That depends on the language/compiler used. Java and C# have decompilers that turn out great code.

currentscurrents t1_jdvxu6g wrote on March 27, 2023 at 4:21 PM

Those languages don't compile to machine code; they compile to a special bytecode that runs in a VM.
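As an analogy (Python rather than Java or C#, which is an assumption of this sketch), the snippet below shows that a VM's compiled code object keeps the function name and local variable names next to the bytecode. Java class files and .NET assemblies similarly carry class, method and field names as metadata, which is part of why their decompilers can produce readable code; stripped machine code usually keeps none of this.

```python
import dis

def add(a, b):
    total = a + b
    return total

# Names survive compilation to bytecode: arguments first, then other locals.
print(add.__code__.co_name)      # add
print(add.__code__.co_varnames)  # ('a', 'b', 'total')

# Instruction listing; the exact opcodes vary between Python versions.
dis.dis(add)
```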

bubudumbdumb t1_jdu90gu wrote on March 27, 2023 at 6:14 AM

Friends working at rev.ng told me that it's very difficult to decompile back to the high-level structures actually used in the source code. C may have only a few ways to write a loop, but C++ has many, and recovering the source code from assembly is very hard to achieve with rule-based systems.
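A toy illustration of the many-to-one problem: two different source loop forms lower to the same labels-and-jumps sequence, so the low-level code alone cannot say which one was written. The string-based IR and the `lower_for`/`lower_while` helpers are hypothetical and ignore details such as `continue`; real compilers and IRs are far more complex.

```python
# Hypothetical toy IR: each loop form becomes labels plus conditional jumps.

def lower_for(init, cond, step, body):
    """Lower `for (init; cond; step) { body }` (ignoring `continue`)."""
    return [init, "L0:", f"if !({cond}) goto L1", *body, step, "goto L0", "L1:"]

def lower_while(init, cond, body_with_step):
    """Lower `init; while (cond) { body; step }`."""
    return [init, "L0:", f"if !({cond}) goto L1", *body_with_step, "goto L0", "L1:"]

for_ir = lower_for("i = 0", "i < n", "i++", ["sum += a[i]"])
while_ir = lower_while("i = 0", "i < n", ["sum += a[i]", "i++"])

# Identical output: a rule-based decompiler has to guess which form to emit.
print(for_ir == while_ir)  # True
```

C++ adds range-based `for`, iterator loops and algorithm calls like `std::for_each`, each of which can collapse into similar low-level code after optimization.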

Maykey t1_jdv3ft8 wrote on March 27, 2023 at 12:43 PM

And there's already an OpenAI-based plugin for them.