Submitted by AutoModerator t3_110j0cp in MachineLearning
trnka t1_j8hcpwt wrote
I've been learning more about multilingual neural machine translation models lately such as the one in Google's recent paper:
Bapna, A., Caswell, I., Kreutzer, J., Firat, O., van Esch, D., Siddhant, A., Niu, M., Baljekar, P., Garcia, X., Macherey, W., Breiner, T., Axelrod, V., Riesa, J., Cao, Y., Chen, M. X., Macherey, K., Krikun, M., Wang, P., Gutkin, A., … Hughes, M. (2022). Building Machine Translation Systems for the Next Thousand Languages.
I'm not sure I understand why it works for languages that have no parallel data with any other language, though. For instance, Latinized Hindi doesn't have any parallel data. Why would the encoder or decoder representations of Latinized Hindi be compatible with any other language's?
Is it because byte-pair encoding is done across languages, so Latinized Hindi will have some subword overlap with languages that DO have parallel data? That would encourage the learning algorithm to represent those languages in the same latent space.
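To make my intuition concrete, here's a toy sketch (my own, not the paper's actual SentencePiece setup; the corpus and merge count are made up for illustration) of why a BPE vocabulary learned jointly over languages produces shared subwords. Training merges on a mix of Latinized Hindi "naam" and English "name" yields the shared subword "na", so both words get encoded partly through the same vocabulary entries:

```python
from collections import Counter

def merge_word(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with the merged token."""
    out, i = [], 0
    while i < len(word):
        if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def train_bpe(corpus, num_merges):
    """Learn BPE merges over a mixed-language word list (languages are not separated)."""
    vocab = Counter(tuple(w) for w in corpus)  # each word starts as a tuple of chars
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for word, freq in vocab.items():
            for i in range(len(word) - 1):
                pair_counts[(word[i], word[i + 1])] += freq
        if not pair_counts:
            break
        best = pair_counts.most_common(1)[0][0]  # most frequent adjacent pair
        merges.append(best)
        vocab = Counter({merge_word(w, best): f for w, f in vocab.items()})
    return merges

def tokenize(word, merges):
    """Apply the learned merges in order to segment a word into subwords."""
    w = tuple(word)
    for pair in merges:
        w = merge_word(w, pair)
    return list(w)

# Toy joint corpus: Latinized Hindi "naam" alongside English "name"
merges = train_bpe(["naam", "naam", "name", "name"], 1)
hi_tokens = tokenize("naam", merges)   # → ['na', 'a', 'm']
en_tokens = tokenize("name", merges)   # → ['na', 'm', 'e']
print(set(hi_tokens) & set(en_tokens))  # shared subwords include 'na'
```

Since both words pass through the same embedding rows for the shared subwords, the model gets a gradient signal pulling their representations toward a common space, even with zero parallel Latinized Hindi data.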