Submitted by MohamedRashad t3_y14lvd in MachineLearning
MohamedRashad OP t1_irvmz2q wrote
Reply to comment by KingsmanVince in [D] Reversing Image-to-text models to get the prompt by MohamedRashad
But in this case, I will need to train an image-captioning model on text-to-image data and hope it produces the exact prompt needed to recreate the image with the text-to-image model.
I think a better solution is to use backpropagation through the text-to-image model itself to recover the prompt that produced the image (a kind of model inversion).
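The backpropagation idea above can be sketched in miniature: freeze a differentiable "generator," then gradient-descend on an input embedding until it reproduces a target image. This is only a toy illustration, assuming a fixed random linear map in place of a real text-to-image model; none of the names below come from an actual diffusion-model API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen text-to-image model: image = G @ prompt_embedding.
G = rng.normal(size=(64, 16))       # maps a 16-dim "prompt" to a 64-dim "image"
true_prompt = rng.normal(size=16)   # the unknown prompt we want to recover
target_image = G @ true_prompt

# Invert by backpropagation: start from a random prompt embedding and
# follow the gradient of the reconstruction loss ||G z - target||^2.
z = rng.normal(size=16)
lr = 0.003
for _ in range(1000):
    residual = G @ z - target_image
    grad = 2.0 * G.T @ residual     # analytic gradient of the squared error
    z -= lr * grad

loss = float(np.sum((G @ z - target_image) ** 2))
```

With a real text-to-image model the same loop would optimize a continuous prompt embedding through the network with autograd; the recovered embedding may not correspond to any exact token sequence, which is one practical limitation of this approach.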
KlutzyLeadership3652 t1_irwt908 wrote
I don't know how feasible this would be for you, but you could build a surrogate model that learns image-to-text. Use your original text-to-image model to generate images from text (open caption-generation datasets can supply good example captions), and train the surrogate to generate the text/caption back from each image. This is model-centric, so you don't need to worry about the many-to-many issue mentioned above.
This can be made more robust than a backpropagation approach.