Submitted by sgt102 t3_yv5ckv in MachineLearning

As per the title, I wrote a book called "Managing Machine Learning Projects"; it's available as an e-book (https://www.manning.com/books/managing-machine-learning-projects). Here's a blog post about the book: https://medium.com/@sgt101/does-the-world-need-yet-another-book-on-machine-learning-ml-ff22f8954d33

I'd be happy to discuss if anyone has any questions or thoughts about it.

The process documented in Managing Machine Learning Projects

Comments

Peantoo t1_iwd3y3l wrote

Nice, I was just researching this topic. Does this touch on CI/CD and other late-stage deployment and testing issues?

sgt102 OP t1_iwdedyf wrote

Great question; very thought-provoking!

I don't go through to in-life CI/CD scenarios in the book, but I do look at running MABs (multi-armed bandits) and A/B testing to understand the relative performance of models in live use, and I also write about the need for model monitoring and governance to support the production deployment.

Basically, the book mostly ends with getting the model into production, but with the emphasis on getting it there with the right framework around it, so that it can be kept alive in production.
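
The MAB approach to comparing live models can be sketched roughly as follows. This is a minimal illustration, not code from the book: it assumes each request produces a binary success signal (hypothetical, e.g. a click), and it uses Thompson sampling, which is one common bandit strategy among several. The function names and the success rates are made up for the example.

```python
import random


def thompson_route(successes, failures, rng):
    """Choose which model serves the next request.

    Each model's success rate gets a Beta(successes + 1, failures + 1)
    posterior (i.e. a uniform prior); we draw one sample per model and
    route the request to the model with the highest draw.
    """
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])


def simulate(true_rates, n_requests=5000, seed=0):
    """Simulate live traffic split between models by Thompson sampling.

    true_rates are hypothetical per-model success probabilities; in
    production they are unknown and only observed outcomes are available.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_requests):
        arm = thompson_route(successes, failures, rng)
        # Simulated user outcome for the model that served this request.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = simulate([0.05, 0.15])
    for i, (si, fi) in enumerate(zip(s, f)):
        n = si + fi
        rate = si / n if n else 0.0
        print(f"model {i}: {n} requests, observed success rate {rate:.3f}")
```

Unlike a fixed 50/50 A/B split, the bandit shifts traffic toward the better-performing model as evidence accumulates. That limits the cost of serving the weaker model, at the price of a less clean statistical comparison than a fixed-split A/B test.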

maybe_yeah t1_iwdu5wi wrote

> The book is laid out as a series of fictionalized sprints that take you from pre-project requirements and proposal development all the way to deployment. You’ll discover battle-tested techniques for ensuring you have the appropriate data infrastructure, coordinating ML experiments, and measuring model performance. With this book as your guide, you’ll know how to bring a project to a successful conclusion, and how to use your lessons learned for future projects.

1 INTRODUCTION: DELIVERING MACHINE LEARNING PROJECTS IS HARD, LET’S DO IT BETTER

2 PRE-PROJECT: FROM OPPORTUNITY TO REQUIREMENTS

3 PRE-PROJECT: FROM REQUIREMENTS TO A PROPOSAL

4 SPRINT ZERO: GETTING STARTED

5 SPRINT 1: DIVING INTO THE PROBLEM

6 SPRINT 1: EDA, ETHICS, BASELINE EVALUATION

7 SPRINT 2: MAKING USEFUL MODELS WITH ML

8 SPRINT 2: TESTING AND SELECTION

9 SPRINT 3: SYSTEM BUILDING AND PRODUCTION

10 POST PROJECT (SPRINT Ω)

Who is the target audience for this book? The description doesn't mention patterns, and the online chapter view doesn't seem to have code samples.

sgt102 OP t1_iwdwc7x wrote

Because it made me think about whether I should have extended the scope into the operational phases of a machine learning system.

So I found it thought-provoking...

sgt102 OP t1_iwdwyr5 wrote

The target audience is people who are being asked to lead an ML project for the first time, or who aspire to do so. The book doesn't try to teach the implementation details of modelling, mostly because many texts already do that very well, far better than I could. So there are no code examples.

globalminima t1_iwe6aeh wrote

The problem with most guides and even ML frameworks (e.g. MLflow) is that they do everything pretty well up to deployment, and then offer only very basic options that are not really fit for purpose for intermediate or advanced systems. It's definitely the biggest differentiator between the best resources and everything else.

globalminima t1_iwe6j08 wrote

There is no mention of monitoring, maintenance or retraining; does chapter 9 go into this? This is a big blind spot if it's not there (and it's where most of the problems happen for inexperienced ML engineers).

91o291o t1_iwfzg03 wrote

I need a book titled "Deploy PyTorch in the Wild and in the Rain: NVIDIA Jetson with Webcam Edition (or Equivalent)". Any suggestions?

SignificantHall4684 t1_iwgp0ah wrote

I am in the middle of an interview process that would hopefully take me from a PM role in automotive development to a new one as a PM on ML projects. I therefore seem to be your target audience. As I currently know only a little and still have a few days before the next interview round, I am going to check out the book. Let me know if you would like some feedback afterwards.

sgt102 OP t1_iwhfc9e wrote

Chapter 9 addresses (to some extent) logging, monitoring and governance, which have a lot to do with how the model should be managed in life...

I've worked on projects where the model was ungoverned, went wrong, and no one noticed for a long time... and that caused damage. I also got called in to sort out a project where the team retrained the model every week... and every week they overfitted it to the new data. I think knowing what the models should do, being able to show that they are doing it, and then having a clear way of deciding what to do if they aren't (i.e. someone in charge) is the basis of maintaining them... what's your POV, though?
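
The three elements above (a statement of what the model should do, a check that it is doing it, and an owner who decides what happens when it isn't), plus a guard against the weekly-retraining failure, can be sketched as a small check. This is a minimal illustration under stated assumptions, not the book's method: the accuracy threshold, the 0/1 outcome log, the `owner` field and the fixed held-out set used to gate retrained models are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """What the model should do, plus who acts if it doesn't."""
    metric_name: str
    minimum: float
    owner: str  # hypothetical: the person accountable for the response


def check_live(expectation, recent_outcomes):
    """Compare live performance with the expectation.

    recent_outcomes is a list of 1 (correct) / 0 (incorrect) for recently
    labelled predictions. Returns an alert message for the owner, or None
    if the model is behaving as expected.
    """
    if not recent_outcomes:
        return f"{expectation.metric_name}: no labelled data; escalate to {expectation.owner}"
    value = sum(recent_outcomes) / len(recent_outcomes)
    if value < expectation.minimum:
        return (f"{expectation.metric_name}={value:.2f} is below {expectation.minimum:.2f}; "
                f"escalate to {expectation.owner}")
    return None


def should_promote(candidate_score, incumbent_score, margin=0.0):
    """Gate a retrained model: promote it only if it beats the current
    model on the same fixed held-out set by more than `margin`."""
    return candidate_score > incumbent_score + margin


if __name__ == "__main__":
    accuracy = Expectation("accuracy", minimum=0.80, owner="model owner")
    print(check_live(accuracy, [1, 1, 0, 1, 0, 1, 0, 1]))  # below 0.80 -> alert
    print(should_promote(0.83, 0.84))  # retrained model is worse, so keep the incumbent
```

Scoring every retrained candidate against the same held-out set, rather than only the newest week of data, is one simple way to stop a model that has overfitted to recent data from being promoted automatically.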

sgt102 OP t1_iwhfqco wrote

Party like 1989... I think things have changed. My teams need data pipelines and reproducible test results, and they're doing things like evaluating performance using MABs... CRISP doesn't help much with that... Also, they're actually building a system, not only extracting a model from a table...

Do you see CRISP as sufficient now?