Submitted by AutoModerator t3_10cn8pw in MachineLearning
zoontechnicon t1_j5oiraa wrote
I'm trying to use this model to summarize text: https://huggingface.co/bigscience/mt0-large However, text generation seems to stop as soon as the special end token </s> is produced. I wonder how I could coax it into generating longer texts. Any ideas?
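A minimal sketch of the setup being described, using the transformers library (the "Summarize:" prompt wording and the placeholder input are assumptions, not from the post):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

text = "..."  # the document to summarize
inputs = tokenizer("Summarize: " + text, return_tensors="pt")

# Generation stops as soon as the model emits the </s> end-of-sequence token,
# which is why the summaries come out short.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```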
zoontechnicon t1_j69b6g5 wrote
The solution, as the code in huggingface/transformers shows, is to force the logit of the end-of-sequence token to -Inf so it can never be generated. What a hack...
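A sketch of how that can be applied with transformers' MinLengthLogitsProcessor, which sets the end token's score to -inf until the output reaches a minimum length (this is also what generate() does internally when you pass min_length; the length values below are illustrative):

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    LogitsProcessorList,
    MinLengthLogitsProcessor,
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")

text = "..."  # the document to summarize
inputs = tokenizer("Summarize: " + text, return_tensors="pt")

# MinLengthLogitsProcessor forces the logit of </s> to -inf while the
# generated sequence is shorter than min_length, so the model cannot
# end the summary early.
processors = LogitsProcessorList(
    [MinLengthLogitsProcessor(min_length=64, eos_token_id=tokenizer.eos_token_id)]
)

outputs = model.generate(**inputs, logits_processor=processors, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Setting the logit to -Inf means the token's probability after softmax is exactly zero, so it can never be sampled or chosen by beam search until the minimum length is reached.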