Mr_Hu-Man t1_irhzs41 wrote

I need help understanding this. How much audio information does the model receive before generating those sections? Is it just the couple of seconds that we hear in the examples? Or is there a lot of input in the background, with the examples being the end result of what the AI model learned, so that it can now continue from a prompt of just a couple of seconds?
