Smallpaul t1_jdu38fb wrote

Decompilers already exist though.

3

currentscurrents t1_jdu3vwq wrote

Yeah, but they're hand-crafted algorithms and produce code that's hard to read.

10

ultraminxx t1_jdu7uz8 wrote

That said, it might also be a good approach to preprocess the input with a classical decompiler and then train a model to refactor the decompiled code so it becomes more readable.

10

currentscurrents t1_jdvxga6 wrote

Possibly! But it also seems like a good sequence-to-sequence translation problem: just line up the two streams of tokens and let the model figure it out.
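To make "line up the two streams of tokens" concrete, here is a rough sketch of how one training example might be built. Everything in it is an assumption for illustration: the regex tokenizer, the `make_pair` helper, the `<eos>` marker, and the hand-written assembly snippet (not real compiler output). A real setup would likely use a learned subword tokenizer and a large corpus of compiled functions.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, or any single
# non-whitespace character (punctuation, operators).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")

def tokenize(text):
    """Split text into a flat list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(text)

def make_pair(assembly, source):
    """Build one seq2seq example: assembly tokens in, source tokens out."""
    return {
        "input": tokenize(assembly),
        "target": tokenize(source) + ["<eos>"],  # assumed end-of-sequence marker
    }

# Hand-written illustrative pair, not the output of an actual compiler.
asm = "mov eax, edi\nadd eax, esi\nret"
src = "int add(int a, int b) { return a + b; }"

pair = make_pair(asm, src)
print(pair["input"])
print(pair["target"])
```

The model would then be trained to predict `target` given `input`, the same way a translation model maps one language onto another.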

2

s0n0fagun t1_jdu975r wrote

That depends on the language/compiler used. Java and C# have decompilers that turn out great code.

2

currentscurrents t1_jdvxu6g wrote

Those languages don't compile to machine code; they compile to a special bytecode that runs in a VM.

2

bubudumbdumb t1_jdu90gu wrote

Friends working at rev.ng told me that it's very difficult to decompile to the original high-level structures actually used in the source code. C may have only a few ways to write a loop, but C++ has many, and recovering the source code from assembly is very hard to achieve with rule-based systems.
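A toy illustration of why the original structure is hard to recover: different loop forms can lower to the same branch sequence, so the low-level code no longer says which form the author wrote. This is a made-up lowering with hypothetical helper names (`lower_while`, `lower_for`) and an invented pseudo-instruction format, not how any real compiler or rev.ng works.

```python
def lower_while(cond, body):
    """Lower `while (cond) { body }` to a label/branch sequence (toy format)."""
    return ["loop:", f"if not {cond} goto end", *body, "goto loop", "end:"]

def lower_for(init, cond, step, body):
    """Lower `for (init; cond; step) { body }` by rewriting it as
    `init; while (cond) { body; step; }`."""
    return [init, *lower_while(cond, [*body, step])]

# The same computation written as a for loop and as a while loop.
from_for = lower_for("i = 0", "i < n", "i = i + 1", ["sum = sum + i"])
from_while = ["i = 0", *lower_while("i < n", ["sum = sum + i", "i = i + 1"])]

# Both lower to an identical sequence, so the mapping back is ambiguous.
print(from_for == from_while)
for line in from_for:
    print(line)
```

A rule-based decompiler has to pick one source form for that sequence, and the more forms a language offers, the more often the pick will differ from what was actually written.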

5