Submitted by OraOraP t3_121kxlt in deeplearning

Below is what I know about Stable Diffusion.


  • The base model of Stable Diffusion is originally trained to remove noise from images.
  • From a given training image, a series of images is created by repeatedly adding noise to the previous image.
  • The model is trained to revert this process, removing noise step by step to recover the original image.
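The forward process in the bullets above can be sketched in a few lines. This is a toy illustration, not Stable Diffusion's actual implementation: the flattened 8x8 "image", the noise level `beta`, and the step count are all arbitrary choices for the sketch.

```python
import math
import random

def add_noise(x, beta, rng):
    """One forward step: blend the signal with fresh Gaussian noise."""
    return [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1)
            for v in x]

rng = random.Random(0)
image = [rng.gauss(0, 1) for _ in range(64)]  # stand-in for a flattened 8x8 image
trajectory = [image]
for _ in range(10):                            # repeatedly add noise
    trajectory.append(add_noise(trajectory[-1], beta=0.1, rng=rng))

# A denoiser is then trained on pairs (trajectory[t+1] -> trajectory[t]),
# i.e. to revert one noising step at a time.
```

After enough steps the trajectory ends in near-pure noise, which is the starting point for generation at inference time.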


Can't this training method be used for training a reverse engineering model?

A model that can recreate C, C++, or other source code from binary code?


  1. Make the compiler output not only the binary code but also every intermediate form it produces; this yields a series of code that begins with the original source code and ends with the binary code (or just the assembly code).
  2. Train a model to revert each form in the series to the previous one.
  3. The result is a model that can retrieve source code from binary code.
  4. Maybe it can be trained and updated further to accept text instructions, like Stable Diffusion, modifying the source as instructed in the text.
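Steps 1 and 2 above amount to a data-preparation job. A minimal sketch, assuming (hypothetically) that the modified compiler from step 1 can dump each intermediate form; the stage strings below are placeholders, not real compiler output:

```python
# Hypothetical series of forms a modified compiler might emit,
# from original C source down to assembly (contents are placeholders).
stages = [
    "int add(int a, int b) { return a + b; }",  # original source
    "<preprocessed / AST-lowered form>",         # middle step (placeholder)
    "<SSA-style intermediate representation>",   # middle step (placeholder)
    "add:\n    lea eax, [rdi+rsi]\n    ret",     # final assembly
]

def reverse_pairs(stages):
    """Training examples for step 2: map each form back to its previous form."""
    return [(stages[i + 1], stages[i]) for i in range(len(stages) - 1)]

pairs = reverse_pairs(stages)
# pairs[-1] maps assembly back to the IR; chaining such models
# would walk the series back to the original source (step 3).
```

In practice, real compilers already expose something close to step 1: GCC's `-save-temps` keeps the preprocessed and assembly files, and `-fdump-tree-all` dumps internal passes, so a crawler-plus-compiler pipeline could generate such series at scale.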


Is this not plausible?

Or is there already some research on this idea?

10

Comments


elbiot t1_jdo7ndu wrote

Compilation isn't a process of noising, and diffusion doesn't have any relevance here. An LLM is what you would use.

5

OraOraP OP t1_jdpahia wrote

I didn't mean to use the model from the Stable Diffusion process for reverse engineering.

I was just thinking this step-by-step reverting training process could be used in some model for reverse engineering.

1

elbiot t1_jdpgqoz wrote

I'm just talking about diffusion models in general and the concept of denoising. LLMs are what you would use: not the way you'd train a diffusion model, but the way you'd train an LLM.

1

Dylanica t1_jdo5jnf wrote

A sequence-based model like a transformer (what GPT is based on) would probably work better for this particular task. In fact, GPT-3/4 would probably be pretty darn good at this task right out of the box.

3

howtorewriteaname t1_jdmy2ik wrote

This could definitely work, provided you have the right data in great amounts. I believe that is the biggest challenge for this kind of model, more than the learning method.

1

OraOraP OP t1_jdpc3du wrote

Just crawling open-source code and compiling it with the special compiler would produce a massive amount of training data, if the special compiler I mentioned in the post is easy to make.

1

mikonvergence t1_jduf732 wrote

You are definitely stepping outside of the domain of what is understood as denoising diffusion because it seems that your data dimensionality (shape) needs to change during the forward process.

The current definition of diffusion models is that they compute the likelihood gradient of your data (equivalent to predicting the standard noise in the sample), and then take a step in that constant data space. So all networks have the same output shape as their input.
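The constant-shape constraint described above can be made concrete with a toy sketch (the zero-predicting "network" is a placeholder; a trained network would predict the actual added noise):

```python
import math

def predict_noise(x):
    """Toy stand-in for the denoising network: output shape equals input shape."""
    return [0.0 for _ in x]  # a trained network would predict the added noise

def denoise_step(x, beta):
    """Invert one noising step x_t = sqrt(1-beta)*x_{t-1} + sqrt(beta)*eps."""
    eps_hat = predict_noise(x)
    return [(v - math.sqrt(beta) * e) / math.sqrt(1.0 - beta)
            for v, e in zip(x, eps_hat)]

noisy = [0.5, -1.2, 0.3]
restored = denoise_step(noisy, beta=0.1)
# len(restored) == len(noisy): dimensionality never changes across steps,
# which is exactly what breaks down for source -> IR -> assembly series.
```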

Perhaps you can use transformers to handle evolving data lengths, but as far as I can tell, you're entering uncharted research territory.

I can recommend this open-source course I made for understanding the details of denoising diffusion for images https://github.com/mikonvergence/DiffusionFastForward

1

OraOraP OP t1_jdufnll wrote

I didn't mean to apply the denoising process directly to reverse engineering. I was just thinking the idea of `step-by-step reverting` could be used in some ML model for reverse engineering.

Though you have a point. Unlike the denoising process, reverse engineering would require a change of dimensions in the middle steps, making it more difficult than denoising.

1

mikonvergence t1_jdufvlv wrote

Right, I am using denoising diffusion as a term for a wide range of methods based on reversing some forward process. Some interesting work (such as cold diffusion) has been done on using other types of degradation apart from Gaussian additive noise.

And yeah, the change of both content and dimensionality requires you to put together some very novel and non-obvious techniques.

1