
LeN3rd t1_jcgq97y wrote

You will need more than a week. If you just want to predict the next word in a sentence, take a look at large language models, ChatGPT being one of them. BERT is a research alternative, afaik. If you aim to learn the probabilities yourself, you will need at least a few months.

In general, what you want is a generative model that can sample from the conditional probability distribution. For sequences, transformers like BERT and ChatGPT are usually state of the art. You can also take a look at normalizing flows and diffusion models to learn probability distributions. But this needs some maths, and I unfortunately do not know which smaller models can be used for computational linguistics applications like this.
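To make "sample from the conditional probability distribution" concrete, here is a minimal sketch of about the smallest model of that kind: a bigram model that estimates P(next word | previous word) by counting. This is not a transformer and not something proposed in the thread. The toy corpus and the function names (`train_bigram`, `next_word_probs`, `sample_next`) are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next | prev) by normalizing the counts for `prev`."""
    followers = counts.get(prev)
    if not followers:
        return {}  # `prev` was never seen with a following word
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()}


def sample_next(counts, prev, rng=random):
    """Draw one next word from the estimated conditional distribution."""
    probs = next_word_probs(counts, prev)
    if not probs:
        return None
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(next_word_probs(model, "the"))
print(sample_next(model, "the"))
```

Larger models replace the count table with a learned function of a longer context, but the interface stays the same: context in, distribution over next words out.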

1

No_Complaint_1304 t1_jcgshk7 wrote

Well, I did expect this, but still, months! I’ll look into everything you mentioned. And I’ll drop the project for now. If I can’t finish it by studying heavily, I might as well learn slowly but surely, absorb all the information, and then go back to make a project that involves predictions and analyzing data. ty4ur help

1

LeN3rd t1_jcguoiy wrote

I didn't mean to discourage you. It's a fascinating field, but it is its own field of research for a reason. Start with BERT and see where that gets you.

These are also nice, short watches:

https://www.youtube.com/watch?v=gQddtTdmG_8

https://www.youtube.com/watch?v=rURRYI66E54

1

No_Complaint_1304 t1_jchh2u9 wrote

Damn, I hope no one got me wrong. I wanted to learn the basics in a week (and finish my side project asap). I didn’t claim I could study such a large and complex field in a week.

1

LeN3rd t1_jchkhuv wrote

What you can try is to start with linear or logistic regression and try to learn on Wikipedia. That might be fun and give you decent results.
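A rough sketch of what the regression route could look like for next-word prediction, assuming "log Regression" means logistic regression and using its multi-class (softmax) form with the previous word as a one-hot input. The toy corpus, the hyperparameters, and the names `train_softmax` and `predict` are illustrative assumptions. Real text such as a Wikipedia dump would need tokenization and a much larger vocabulary.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_softmax(tokens, epochs=200, lr=0.1):
    """Fit multinomial logistic regression: previous word -> next word.

    With a one-hot input, the weight matrix row W[i] holds the scores of
    every candidate next word given previous word i.
    """
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    size = len(vocab)
    W = [[0.0] * size for _ in range(size)]
    pairs = [(idx[a], idx[b]) for a, b in zip(tokens, tokens[1:])]
    for _ in range(epochs):
        for i, j in pairs:
            probs = softmax(W[i])
            for k in range(size):
                # Gradient of the cross-entropy loss w.r.t. score k.
                grad = probs[k] - (1.0 if k == j else 0.0)
                W[i][k] -= lr * grad
    return vocab, idx, W


def predict(vocab, idx, W, prev):
    """Return the most likely next word and its estimated probability."""
    probs = softmax(W[idx[prev]])
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs[best]


# Toy corpus, purely illustrative.
corpus = "the cat sat on the mat and the cat slept".split()
vocab, idx, W = train_softmax(corpus)
print(predict(vocab, idx, W, "the"))
```

With only the previous word as a feature this learns roughly the same thing as counting word pairs. The regression form pays off once you add more features, for example the two previous words or word suffixes.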

2