SkinnyJoshPeck t1_ist26dr wrote

lol i just got rejected by glassdoor because i didn’t have a “mastery” of pandas.

who fuckin’ cares? i have experience over years with verifiable projects that made multiple companies real cash-fuckin-money. i made the model, i tuned it up, got it working well within time limit, etc etc.

just because i’m not a fucking pandas wizard, doesn’t mean i’m not a competent ML eng/scientist/whatever. i can’t remember meeting a good PM who cared what model i used, let alone if i used x and y in pandas over z to accomplish my goal.

if the stats are good, the model generalizes well, and training time isn’t abysmal - who cares???

108

Ularsing t1_isvms80 wrote

I feel compelled to add here that Pandas has an absolutely dogshit API plagued by breaking changes and bastardizations of R code. It's the best package for what it does, but it leaves a lot to be desired. Trying to prioritize Pandas knowledge reads like someone trying to hire based on their omniscient understanding of the field that they gained from their coding bootcamp.

19

marr75 t1_isxougd wrote

I read a good blog post from a guy talking about how modern IDEs encourage you to learn really weird "motions" (using pycharm's refactor, codegen, and code completion mid-stream, for example). He wasn't saying it was bad per se, just that we should all remember the point isn't to be "good" at the IDE, it's to solve problems with the code.

I feel the same about pandas. If anything, the skill to focus on is vectorizing your operations. That's the biggest readability and performance improvement and it's portable to dplyr, polars, etc.
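To make "vectorizing" concrete, here's a minimal sketch. The frame and its column names (price, qty) are made up for illustration; the point is only the contrast between a row loop and a whole-column expression.

```python
import pandas as pd

# Hypothetical data: the column names are assumptions for this example.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-by-row loop: works, but gets slow and noisy on large frames.
totals_loop = [row.price * row.qty for row in df.itertuples()]

# Vectorized: one expression over whole columns, no explicit loop.
totals_vec = df["price"] * df["qty"]

print(totals_vec.tolist())
```

The same "operate on columns, not rows" habit carries over to dplyr or polars more or less unchanged, even though the syntax differs.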

3

chief167 t1_istrptp wrote

As someone who sometimes has to hire people, perhaps this is the issue:

Imagine how difficult it is for big companies to get a MLOps framework going, with all the red tape and scattered IT systems. It was very painful where I work. In the end we got something working using a python platform that really needs you to use pandas and sklearn type interfaces.

Let's hypothetically say you are a great data scientist using R, or SAS or MATLAB or ... If I don't have a lot of options I'd hire you and put you on a training program for our framework. But if I have multiple decent candidates, and some don't require retraining, yeah imma gonna pick one of them. I am not spending 2 months trying to get compliance and cybersec to approve your docker container with R code in it, if I can have a similar model in our pre-approved workflow.

14

SkinnyJoshPeck t1_isu0ui7 wrote

I hear ya; I think the point is less about proficiency and more about mastery -- in my case, I was marked down heavily since I didn't use iloc. Something like

df[df.col < 10]
vs
df[df.iloc[:, 0] < 10]

because I guess it makes it more clear to the reader, and it protects the code from explicit column names; the fact that I didn't use it made me seem like I didn't know pandas well.

to your point, though, I see the importance in the infrastructure. In this case, it was for an ml scientist role where I wouldn't actually be doing any of the MLOps, just designing and tuning the models.

16

phb07jm t1_isudd5v wrote

Can someone please explain why the second is preferable? I would always do the first because it's more likely that the position of a column will change than the name.
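For what it's worth, a small sketch of the difference, assuming a made-up frame where col happens to be the first column (the other column is also invented). The two filters agree until the column order changes:

```python
import pandas as pd

# Hypothetical frame: "col" is first, "other" is an invented second column.
df = pd.DataFrame({"col": [5, 15, 8], "other": [100, 200, 300]})

by_name = df[df.col < 10]          # filter by column name
by_pos = df[df.iloc[:, 0] < 10]    # filter by column position
# Identical results as long as "col" stays in position 0.

# Same data, columns swapped (say, after an upstream change).
reordered = df[["other", "col"]]

by_name_after = reordered[reordered.col < 10]        # still filters on "col"
by_pos_after = reordered[reordered.iloc[:, 0] < 10]  # now silently filters on "other"

print(len(by_name_after), len(by_pos_after))
```

The positional version doesn't raise anything after the swap; it just quietly answers a different question.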

21

silvershadow t1_isurezr wrote

Change the iloc to a loc and then I would maybe see the argument.

Assigning through .iloc or .loc is guaranteed to modify the original data frame, while chained [] indexing can in some cases operate on a copy. Pandas makes no promises about which you get.

So, depending on what the full expression was, the criticism of using [] indexing could make sense. You’d need to see the full context of what OP was writing, though.

From the sounds of what they wrote though, this is not the thinking the interviewer was following.
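A rough sketch of the copy problem, on a made-up two-column frame. This only shows the boolean-mask case, where the first [] builds a new frame; other chained patterns can behave differently depending on the pandas version.

```python
import warnings

import pandas as pd

# Hypothetical frame; column names "a" and "b" are assumptions.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
mask = df["a"] > 1

# Chained indexing: df[mask] builds a new frame, so the assignment
# lands on a temporary object. pandas warns about this; the warning
# is silenced here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[mask]["b"] = 0
after_chained = df["b"].tolist()  # original frame is untouched

# A single .loc call selects rows and column in one step and
# assigns into the original frame.
df.loc[mask, "b"] = 0
after_loc = df["b"].tolist()

print(after_chained, after_loc)
```

Note this is about assignment. For a read-only filter like the one OP described, [] and .loc give the same rows.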

11

chief167 t1_isuculx wrote

Ok yeah well that's stupid. Because I am actually in favour of column names instead of indexes. Indexes are a pain in the ass when your incoming dataframe changes; they create an implicit dependency.

But your last line is my point. You shouldn't be concerned about MLOps stuff, but if your model is already in the right framework, it saves soooo much time

11

monkeyunited t1_isufjc7 wrote

That’s dumb and violates the “explicit is better than implicit” rule.

8

AutumnStar t1_isuy0el wrote

I agree with the gist of your comment, but FYI, model selection matters a lot for many different reasons. You have no idea how many people I’ve interviewed who just want to use Neural Nets or XGBoost every time. Or people who couldn’t tell me any advantages/disadvantages for any algorithm.

I tend to look for people who can critically think well. That’s the hardest skill to find in any DS. They should have some experience and competence in coding, obviously, but realistically almost everything else can be taught more easily.

6

jzini t1_isvofgm wrote

By Glassdoor? Yo do they realize they are Glassdoor?

1