Submitted by madmax_br5 t3_10mbct5 in MachineLearning
visarga t1_j67q45m wrote
Reply to comment by madmax_br5 in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
The solution is to add more text in the other languages to the training corpus and re-train the tokeniser. It will adapt to the larger corpus by assigning more tokens to those languages.
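
A rough sketch of the effect, assuming a byte-level BPE tokeniser (the comment doesn't name a specific one). The trainer below is a toy written for illustration, not any library's implementation, and the corpora, the sample sentences and the merge budget of 7 are all made up. It trains twice with the same budget, once with a single Hindi sentence in the corpus and once with a hundred, and then counts how many tokens the Hindi sentence needs in each case.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a list of strings (toy byte-level BPE)."""
    # Each whitespace-separated word starts as a tuple of single UTF-8 bytes.
    words = Counter(
        tuple(bytes([b]) for b in word.encode("utf-8"))
        for line in corpus
        for word in line.split()
    )
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[merge_pair(symbols, best)] += freq
        words = merged
    return merges


def tokenize(text, merges):
    """Apply the learned merges, in training order, to each word of `text`."""
    tokens = []
    for word in text.split():
        symbols = tuple(bytes([b]) for b in word.encode("utf-8"))
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


english = "the cat sat on the mat"
hindi = "बिल्ली चटाई पर बैठी"  # made-up sample sentence, used only as test input

# Same merge budget, different amounts of Hindi text in the training corpus.
small = train_bpe([english] * 100 + [hindi] * 1, num_merges=7)
large = train_bpe([english] * 100 + [hindi] * 100, num_merges=7)

print("Hindi tokens, little Hindi in corpus:", len(tokenize(hindi, small)))
print("Hindi tokens, more Hindi in corpus:  ", len(tokenize(hindi, large)))
```

With only one Hindi sentence in the corpus, every merge goes to English byte pairs and the Hindi sentence stays at one token per UTF-8 byte. With more Hindi text, frequent Hindi byte pairs win merges and the token count drops. In this toy setup the budget is fixed, so those merges are taken from English; a real tokeniser could also be given a larger vocabulary.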