Seankala
Seankala t1_is8sbpn wrote
Reply to comment by [deleted] in [D] Manually creating the target data is considered as data leakage. by [deleted]
I normally wouldn't entertain that, but upon checking your profile it seems like you're interested in financial ML. As someone who's briefly done research in financial ML, I can tell you that 99% of the papers you read and the ML code you run are pretty much BS. You're better off using ML very sparingly and relying on if-else statements instead.
Seankala t1_is8rc1t wrote
Reply to comment by [deleted] in [D] Manually creating the target data is considered as data leakage. by [deleted]
The necessity of ML for your scenario has nothing to do with your question...
Seankala t1_is8qtf8 wrote
Reply to comment by [deleted] in [D] Manually creating the target data is considered as data leakage. by [deleted]
Let me ask you then, why would you think this is cheating?
Seankala t1_is8nrdc wrote
As long as you're not using the target data for training, you're fine.
Seankala t1_irzxiq8 wrote
Depends on what I'll be working on and who I'll be working with. I know plenty of brilliant people at big-name companies who are complete asshats, and I would never work with them no matter how much money I was offered (unless it was 7 figures). If the startup is working on something I believe in and the people are likable, I'd choose the startup.
Seankala t1_iruw37k wrote
Can't speak on behalf of Keras, but in PyTorch's implementation of the cross entropy loss, the softmax is computed inside the loss function. Therefore, you'd feed raw, unnormalized logits into the loss function rather than softmax outputs.
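To make the point concrete, here is a minimal plain-Python sketch of what a cross entropy loss that takes logits does internally: a log-softmax followed by the negative log-likelihood of the target class. This is not PyTorch's actual code; the helper names `log_softmax` and `cross_entropy` and the example values are made up for illustration. It also shows what goes wrong if you apply softmax yourself before the loss.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(logits, target):
    # Hypothetical helper: softmax is applied *inside* the loss,
    # so the input is expected to be raw logits.
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]  # illustrative raw model outputs
target = 0                 # index of the correct class

# Correct usage: pass the raw logits.
correct_loss = cross_entropy(logits, target)

# Common mistake: applying softmax first, so it ends up applied twice.
probs = [math.exp(v) for v in log_softmax(logits)]
double_softmax_loss = cross_entropy(probs, target)

print(correct_loss, double_softmax_loss)
```

Passing probabilities instead of logits does not raise an error, it just squashes the inputs into a narrow range and produces a different loss value, which is why the mistake is easy to miss.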
Seankala t1_iruvc69 wrote
I work at a startup that provides solutions to exactly this problem. It's just multi-label classification; you might want to look more into that.
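For anyone unfamiliar with the term, a rough sketch of how multi-label classification differs from ordinary multi-class: each label gets its own independent sigmoid and threshold, so an example can receive zero, one, or several labels. The label names, logits, and the 0.5 threshold below are assumptions for illustration only, not anything from the original question.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, label_names, threshold=0.5):
    # Each label is decided independently; there is no softmax across labels.
    return [name for name, z in zip(label_names, logits) if sigmoid(z) >= threshold]

def binary_cross_entropy(logits, targets):
    # Mean of per-label binary cross entropy, the usual multi-label training loss.
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(logits)

label_names = ["sports", "finance", "politics"]  # hypothetical labels
logits = [1.2, -0.7, 0.3]                        # illustrative model outputs
targets = [1, 0, 1]                              # multi-hot ground truth

predicted = predict_labels(logits, label_names)
loss = binary_cross_entropy(logits, targets)
print(predicted, loss)
```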
Seankala t1_iru9kdm wrote
Reply to comment by Empty-Painter-3868 in [D] What are your thoughts about weak supervision? by ratatouille_artist
I second forgetting about Snorkel and the like. I found it better for me to just label the datapoints myself and continuously refine pseudo labels generated by models.
Seankala t1_isrzhkb wrote
Reply to [D] Machine Learning conferences/journals with a mathematical slant? by vajraadhvan
You're not going to find theory-dense papers at major ML conferences. Most of the reviewers don't bother going through them, and people usually find theory boring compared to "super duper cool" architectures that lead to a 0.1% increase in performance. Like the top comment said, COLT is a good place to start.