visarga t1_j67q45m wrote on January 28, 2023 at 8:57 AM

The solution is to add more text in the other languages to the training corpus and re-train the tokeniser. It will adapt to the larger corpus by assigning more tokens to those languages.
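
To make that concrete, here is a rough sketch with a toy byte-pair-encoding (BPE) trainer. The trainer, the tiny English/German word lists, and the merge budget of 8 are all made up for illustration and are not any real tokeniser's code. The idea is just that with a fixed merge budget, a language that is rare in the corpus gets split into many pieces, and the same word gets merged into fewer pieces once that language has more text.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy BPE, character level)."""
    vocab = Counter(tuple(w) for w in words)  # word as symbol tuple -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenise a word by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical corpora: the same English text, with little vs. a lot of German text.
english = ["the", "cat", "sat", "on", "the", "mat"] * 50
german = ["katze", "sitzt"]

merges_small = train_bpe(english + german * 1, num_merges=8)
merges_large = train_bpe(english + german * 200, num_merges=8)

tokens_small = encode("katze", merges_small)
tokens_large = encode("katze", merges_large)
print("little German text:", tokens_small)
print("more German text:  ", tokens_large)
```

Note the trade-off in this toy setup: the merge budget is fixed, so merges that go to German are taken away from English. A real tokeniser has a much larger vocabulary, but the same competition for vocabulary slots applies.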