2blazen

2blazen t1_je6axjp wrote

>- Creative brainstorming for professional work

I struggle with this. I was trying to get it to help me come up with interesting thesis research questions in a very specific audio ML field, but it failed to come up with anything original, and I don't know if there's a certain way I should have phrased my questions or if it's just creative limitations.

1

2blazen t1_j8r3le4 wrote

You'd want to find a more in-depth topic for a master's thesis; Reddit scraping and sentiment analysis sounds more like an assignment. Ask your supervisor if they have a topic they're researching, and if you can join. Check whether your university has example projects or, even better, open projects. Look through past years' theses to see if you can continue working on any of them (hint: future works section). Once you find a topic that you're interested in and that is niche enough, it's still too broad, so you have to filter it down to research questions, for which you have to start in-depth research into the challenges of the topic and such.

Don't panic, there are many topics that need research. I'm starting my thesis in audio processing (health AI / speaker embeddings / impaired speech / diagnosis assistance) and it's the wild west over here, partially because the data is not publicly accessible though.

0

2blazen t1_j7kc4t6 wrote

Definitely Python; that's what all major companies support too. However, it's not the bytecode cache that makes a difference but the fact that machine learning libraries are written in C++, so you're not sacrificing performance by scripting in it.
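
To illustrate the point about compiled code doing the heavy lifting, here is a minimal sketch using only the standard library. It assumes CPython, where the built-in `sum` is implemented in C, and uses it as a stand-in for a compiled ML library; `loop_sum` is a hypothetical helper written for this comparison. Exact timings depend on the machine, so none are claimed here.

```python
import timeit

def loop_sum(values):
    """Add the numbers up in a pure-Python loop (runs in the interpreter)."""
    total = 0
    for v in values:
        total += v
    return total

data = list(range(1_000_000))

# Both approaches compute the same result.
assert loop_sum(data) == sum(data)

# Time five runs of each; the built-in does its looping in compiled C code.
loop_time = timeit.timeit(lambda: loop_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)

print(f"pure-Python loop: {loop_time:.3f}s")
print(f"built-in sum:     {builtin_time:.3f}s")
```

The Python script only orchestrates the call; the work itself happens in compiled code, which is the same pattern ML libraries rely on.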

These kinds of questions are more suitable for r/learndatascience though.

2

2blazen t1_j70ux9o wrote

I thought so too, but haven't actually noticed any difference, other than how the davinci models don't have the extensive content filters.

>if you use it for work, $20 is negligible

If my company pays for it, sure, otherwise I'll always prefer the request-based pricing with a nice API that I can just call from my terminal

1

2blazen OP t1_j4u5jf7 wrote

With my RTX 3060 it takes 3m50s to diarize 1 hour and 20m to do 3 hours (although this can be reduced to 16m by presetting the number of speakers; I didn't check a 1h segment this way, and keep in mind it also takes time to load the models into VRAM). However, 5-hour episodes keep getting my process killed after around 40m. It's probably a memory issue, and could even happen during the segmentation, but reusing clusters is a common issue on GitHub, so it wouldn't just be for my use case.
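
For anyone comparing against their own hardware, the timings above can be turned into a real-time factor (processing time divided by audio duration). This is just arithmetic on the numbers I quoted; the labels and the `real_time_factor` helper are made up for the example.

```python
# Timings quoted above (RTX 3060), as (audio seconds, processing seconds).
runs = {
    "1 h, speaker count auto": (1 * 3600, 3 * 60 + 50),
    "3 h, speaker count auto": (3 * 3600, 20 * 60),
    "3 h, speaker count preset": (3 * 3600, 16 * 60),
}

def real_time_factor(audio_s, processing_s):
    """Processing time per second of audio; lower is faster."""
    return processing_s / audio_s

for label, (audio_s, processing_s) in runs.items():
    rtf = real_time_factor(audio_s, processing_s)
    print(f"{label}: RTF {rtf:.3f} ({1 / rtf:.1f}x faster than real time)")
```

Per hour of audio, the cost rises from 3m50s on the 1-hour run to 6m40s on the 3-hour run, so the processing time does not grow linearly with episode length.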

2