WobblySilicon OP t1_j2a0oop wrote

I do have access to an A6000 for a few days. Other resources (with less memory) are available through the university as well. By compute-expensive I mean whole clusters of GPUs...

I have difficulty wrapping my head around the text-to-video problem (particularly the newer models with many smaller components). Are there any suggestions/resources for getting acquainted with this new task? I have read recent research papers, but it seems hard to find an area where improvement could be made by technical customization of base models. Do you have any tips on this?

Finally, if I can't work on text-to-video, then my other option would be deepfake detection. Can you comment on the merits or demerits of choosing this topic for my study? Both topics are very new for me. I have exposure to intermediate vision-based problems and feel confident enough to try these out. Right now it just feels like I am out of ideas for tinkering with the base models.
