trnka t1_ixew7z9 wrote
Reply to comment by jon-chin in [D] Simple Questions Thread by AutoModerator
It's hacky, but you could transform the timestamps into words. I've used that trick a few times successfully.
Something like TweetTimestampRangeA, TweetTimestampRangeB, ... One downside is that you'd need to commit to a strategy for the time ranges (either chop the data into N time ranges, or else use tokens for month, year, etc.)
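A minimal sketch of the bucketing approach, assuming pandas timestamps and an arbitrary choice of N equal-width ranges (the helper and token names here are hypothetical):

```python
import pandas as pd

def timestamp_to_token(ts, boundaries):
    """Map a timestamp to a coarse token like TweetTimestampRange2."""
    # boundaries: sorted interior cut points splitting the data into N ranges
    for i, upper in enumerate(boundaries):
        if ts < upper:
            return f"TweetTimestampRange{i}"
    return f"TweetTimestampRange{len(boundaries)}"

# Chop the observed time span into 4 equal-width ranges
times = pd.to_datetime(["2020-01-05", "2020-06-20", "2021-02-14", "2021-11-30"])
boundaries = list(pd.date_range(times.min(), times.max(), periods=5)[1:-1])
print([timestamp_to_token(t, boundaries) for t in times])
```

The alternative in the parenthetical would be emitting tokens like Month11 or Year2021 instead, which avoids committing to fixed range boundaries up front.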
trnka t1_ixevsqx wrote
Reply to comment by pretty19 in [D] Simple Questions Thread by AutoModerator
Linear regression is a good place to start -- it trains quickly and works well with small amounts of data. Categorical inputs aren't a problem; with one-hot encoding, the regression learns a weight for each value.
That said, linear regression isn't always best; it depends on your data.
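A quick sketch of that setup with scikit-learn (the columns and data here are invented for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data: one categorical input, one numeric input, one numeric target
df = pd.DataFrame({
    "region": ["north", "south", "south", "east", "north"],
    "visits": [10, 3, 7, 2, 8],
    "revenue": [120.0, 35.0, 80.0, 25.0, 95.0],
})

# One-hot encode the categorical column; pass the numeric column through unchanged
preprocess = ColumnTransformer(
    [("region", OneHotEncoder(handle_unknown="ignore"), ["region"])],
    remainder="passthrough",
)
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(df[["region", "visits"]], df["revenue"])

# One learned weight per one-hot column, plus one for the numeric column
print(model.named_steps["reg"].coef_)
```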
trnka t1_ixeuv51 wrote
Reply to comment by SwabianStargazer in [D] Simple Questions Thread by AutoModerator
You might be able to try outlier detection to identify unusual test cycles. That said, I've heard it often works better if you can label even a small amount of data as anomalous or not: an outlier detection method doesn't know which features matter, while labeled data can teach the model which features are important.
Feature representation might be tricky, but a simple way to start is the min, max, average, and standard deviation of each sensor.
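A minimal sketch of that starting point, assuming each test cycle arrives as a DataFrame with one column per sensor (the sensor names are made up):

```python
import pandas as pd

def cycle_features(cycle: pd.DataFrame) -> pd.Series:
    """Summarize one test cycle: min, max, mean, and stddev for each sensor."""
    stats = cycle.agg(["min", "max", "mean", "std"])
    return pd.Series({f"{col}_{stat}": stats.loc[stat, col]
                      for col in stats.columns
                      for stat in stats.index})

# Example cycle with two sensors sampled over time
cycle = pd.DataFrame({"temp": [20.1, 22.5, 25.0, 24.2],
                      "pressure": [1.00, 1.05, 1.20, 1.10]})
print(cycle_features(cycle))
```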
To segment test cases, you could make it into a machine learning problem by predicting whether time T is the start of a cycle, trained from some labeled data. I imagine that getting good results will depend on how you represent the features of "before time T" and "after time T".
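A rough sketch of what that could look like, assuming a regularly sampled signal, a handful of labeled start points, and simple before/after window statistics as features (the window width and classifier are arbitrary stand-ins):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(signal, t, width=5):
    """Describe the windows just before and just after time t."""
    before = signal[t - width:t]
    after = signal[t:t + width]
    return [before.mean(), before.std(), after.mean(), after.std(),
            after.mean() - before.mean()]

# Toy signal with one obvious cycle boundary at t=50
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
labels = np.zeros(len(signal), dtype=int)
labels[50] = 1  # labeled as the start of a new cycle

ts = range(5, len(signal) - 5)
X = np.array([window_features(signal, t) for t in ts])
y = labels[5:len(signal) - 5]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba(X)[45:55, 1])  # P(cycle start) should spike near t=50
```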
Not my area of expertise but I hope this helps!
trnka t1_iuy68yo wrote
There isn't much short-term reward, aside from exposing me to interesting papers now and then. Really it's about helping to create the kind of community I want to be a part of and helping to create the kinds of publications I'd like to read.
It's kinda like doing community cleanup. It's not really an immediate or direct benefit for myself.
trnka t1_itd5vto wrote
The first things that came to mind are things that just weren't taught much in school - data quality/relevance/sourcing, designing proxies for business metrics, how good is good enough, privacy of training data, etc.
The deviations from theory that have come to mind:
- In theory, deep learning learns its own feature representation, so that sounds like the best path. In practice, if the whole system is a black box, it's very hard to debug and may run afoul of regulation, so the dream of "it just learns everything from raw inputs" isn't always the answer.
- Overparameterize and regularize sounds like a great strategy, but then it can take way longer to train and may limit the number of things you can try before any deadlines.
- I haven't had as much success with deep networks as with wide ones.
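To make the wide vs. deep distinction concrete, here's a rough illustration with scikit-learn's MLP (the layer sizes are arbitrary, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# "Wide": a single large hidden layer
wide = MLPClassifier(hidden_layer_sizes=(1024,), max_iter=500, random_state=0)

# "Deep": several smaller hidden layers stacked
deep = MLPClassifier(hidden_layer_sizes=(64, 64, 64, 64), max_iter=500, random_state=0)

print(wide.fit(X, y).score(X, y), deep.fit(X, y).score(X, y))
```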
trnka t1_ispbueg wrote
Although we can produce good models, there's a huge gap between a model that can imitate a doctor reasonably well and a software feature that's clinically helpful. That's been my experience doing ML in primary care for years.
If you build software features that influence medical decision making, there are many challenges in making sure that the doctor knows when to rely on the software and when not to. There are also many issues with legal liability for medical errors.
If you're interested in the regulation aspect, the FDA updated its criteria for AI clinical decision support devices last month; there's a summary version and a full version with more detail.
It's not hard to build a highly accurate diagnosis model, but it is hard to build a fully compliant one that actually saves time and does no harm.
trnka t1_ismujhw wrote
I'm with you - if the publication venue gets good peer review, visibility, and citations, then it wouldn't matter to me.
I could maybe understand your advisor's perspective if they're still seeking tenure and the venue doesn't have great publication stats.
trnka t1_isfpk5a wrote
Reply to comment by ritheshgirish9 in Do companies/teams accept ppl coming from a completely different field into AI or ML? [D] by ritheshgirish9
There's Andrew Ng's Coursera class and the related classes if you haven't seen that yet. I think there's a full specialization now. He's also got a decent starter PDF called Machine Learning Yearning.
I've heard that the Fast.ai lectures are good, though I haven't watched them myself.
Google has some great online reading. I like the People + AI guidebook cause it focuses on how to apply machine learning, and that's an area that's often overlooked.
Kaggle and other online competitions are a great place to learn and grow. I'd suggest starting with some of the easy ones that have tutorials, and then looking for competitions that you're passionate about. For instance, years ago I ran into a competition run by the European Space Agency -- that motivated me to push harder and learn more.
If you can find projects to team up with others, that will help you a lot as well. DataKind is an example of that, but I don't think they have much ML work. I'm not sure if hackathons still exist but those can be another great way to learn quickly.
To get inspiration about projects that may be relevant for your current role, I'd suggest doing some searches on Google Scholar and reading those papers, then finding the papers they cite that are interesting. And then finding the most popular papers that cite them. There's almost certainly some interesting work in your area and the trick is figuring out what things are called so you can search.
trnka t1_ir7g3c0 wrote
On the API topic, my read is that the post cautions against writing too many wrappers around standard ML libraries. My experience is that folks tend to write wrappers too soon, and then they can make coding harder in the future. My rule of thumb is to not write a wrapper until you have 3 distinct production implementations of something, so that you have real information on the appropriate level of abstraction needed.
On the other topics, it also depends on your stage of startup. If you're pre-product-market-fit, you face a dilemma between spending time on the long term (if your company survives) and iterating faster to make sure the company survives. So it's a balancing act depending on your level of confidence in profitability, the next round of investment, etc.
Early on, I'd expect 80% or more of research experiments to fail and be thrown away. In those cases it's mainly important to share the findings of your research with the rest of the company. Writing is ideal but tech talks work too.
For the projects that make a difference to the company, it's important to identify when they've met the bar and then dedicate some time to making the project easier to maintain and extend (whether improving the code or docs).
trnka t1_iqx68a8 wrote
Most of the ones I've talked to solve fairly small problems, and it just wasn't worth the hassle of going through multiple months of procurement process for them and/or making them follow our DevSecOps team's requirements. There are some bigger scoped companies like Databricks, but the monetary cost wasn't worth it for us even if it would've been worth the procurement hassle.
trnka t1_iqrxsmi wrote
Reply to Do companies/teams accept ppl coming from a completely different field into AI or ML? [D] by ritheshgirish9
Side projects are a great step. If possible, find some ways to apply those skills in your current role as well. For example, trying to predict the number of incidents in the next week based on the changelog from the previous week, or trying to predict whether a release will affect latency. These are just a couple examples to get you thinking - you'd know better than I what would make sense in your role.
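As a hypothetical sketch of the first idea (the changelog text and incident counts here are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# One changelog per week, paired with the incident count from the following week
changelogs = [
    "bumped db driver, refactored auth middleware",
    "minor copy changes to landing page",
    "migrated payments service to new queue, schema change",
    "dependency updates only",
]
incidents_next_week = [3, 0, 5, 1]

# Bag-of-words over the changelog text, then a simple regression on the counts
model = make_pipeline(TfidfVectorizer(), Ridge())
model.fit(changelogs, incidents_next_week)

print(model.predict(["schema change and queue migration"]))
```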
The combination of strong side projects and on-job experimentation with machine learning should be enough to get you through an initial recruiter screen for an entry-level ML role, so long as you're using technologies that the role is looking for. After that it's really up to the technical and behavioral assessments.
And just to set expectations, it's doable but not easy. I'd guess it'd take around 20h/week of practice and learning for 6-12 months, then about 20h/week of practice/learning for interviews for 3-6 months. It'll be easier for some people and harder for others; I just don't want to give you false hope that it's typical to switch roles in just a couple of months.
Good luck!
trnka t1_iyfc1dj wrote
Reply to [D] Can area chair ask all reviewers to be in a meeting? by Least_Pollution7078
If it were me as the reviewer, I wouldn't mind so long as the time is ok, like a half hour when I don't have anything else scheduled. It might be faster and easier than long threads online. But also I'm accustomed to dealing with pressure.