RuairiSpain t1_j39q68d wrote

I like the idea. I work for a large enterprise on their ML platform team, providing similar services internally to all dev, ML and analytics teams. I think there is a business in it; it is a competitive space, but the acquisition potential is great (to be bought out and merged into a larger org).

I suggest you check out https://www.gitpod.io, which does more general provisioning of GitOps clusters/Pods in their managed Kubernetes clusters. It's not specifically ML, but we've looked at it for POC ML projects that want basic hosting.

Also check out: https://github.com/ml-tooling/ml-workspace, it's a nice open-source project with lots of packages ready to use.

And look at JupyterLab's offerings; they'll be your main competition on pricing.

You are going to have a headache with Python version compatibility between your base dependencies, the ones used in the GitHub repos, and the ones needed by Jupyter notebooks. Same with CUDA drivers; I suggest you lock down the AWS node instance types so it's less confusing for end users. A rough sketch of what I mean is below.
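Something like a hypothetical allowlist that maps permitted GPU instance types to the CUDA version your base image is built against (the instance names and versions here are just illustrative, not a real compatibility matrix):

```python
# Hypothetical allowlist: permitted AWS GPU instance types -> CUDA version
# the base image is built against. Versions are illustrative only.
ALLOWED_INSTANCES = {
    "g4dn.xlarge": {"gpu": "T4", "cuda": "11.8"},
    "p3.2xlarge": {"gpu": "V100", "cuda": "11.8"},
}

def validate_instance(instance_type: str) -> dict:
    """Reject anything outside the supported matrix before provisioning."""
    if instance_type not in ALLOWED_INSTANCES:
        raise ValueError(
            f"{instance_type} not supported; pick one of {list(ALLOWED_INSTANCES)}"
        )
    return ALLOWED_INSTANCES[instance_type]

print(validate_instance("g4dn.xlarge"))  # {'gpu': 'T4', 'cuda': '11.8'}
```

Keeping that matrix tiny is what saves your users from driver/toolkit mismatch tickets.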

If you are turning it into a business, I'd recommend a tiered approach based on the size of the ML project. A simple POC ML project with a tiny dataset is a good starting point for most people. But then people want data ingest, cleaning, and ETL from big data and enterprise sources; this gets complex fast (and is where most teams waste time and money). Either keep your focus on POCs and grow it as an ML hosting company for SMEs, or embrace the ETL side and simplify data ingest for larger enterprise companies. The second option is more of a consulting business, but you can charge high fees.

ML ETL space: https://www.ycombinator.com/companies/dagworks-inc

https://www.gathr.one/plans-pricing/

https://www.snowflake.com/en/

Of these 3 ETL companies, I've played with Snowflake and like what they do and their direction. I especially like that they acquired https://streamlit.io/, which is a fun way to deploy Python apps without dealing with infrastructure and DevOps tasks.
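To give a flavour of why Streamlit is fun: an app is just a script. This is a toy sketch of my own, not something from their docs:

```python
# demo_app.py - toy Streamlit app; run with: streamlit run demo_app.py
import numpy as np
import pandas as pd
import streamlit as st

st.title("Toy model dashboard")

# Fake predictions standing in for real model output
df = pd.DataFrame(
    np.random.randn(50, 2).cumsum(axis=0), columns=["actual", "predicted"]
)

threshold = st.slider("Alert threshold", 0.0, 10.0, 5.0)
st.line_chart(df)
st.write(f"Rows where |predicted| exceeds {threshold}: "
         f"{(df['predicted'].abs() > threshold).sum()}")
```

No servers, no DevOps; the DS person just pushes the script and it's live.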

My final comment: include data ingest and ETL in your story to customers. ML training and deploying training pipelines is not where data scientists spend their time; 80% is spent on data collection, reshaping and validation.

FYI, I think you'll burn through $75 very quickly on an Nvidia GPU. I presume you are running these on-demand and not at spot prices. That monthly price seems generous for an average ML training pipeline.
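Back-of-envelope, using hourly rates I'm assuming from memory (check current AWS pricing yourself):

```python
# Rough estimate of how long $75 of GPU credit lasts.
# Hourly rates are assumptions for illustration, not quoted AWS prices.
budget = 75.0
assumed_rates = {
    "g4dn.xlarge (T4, on-demand)": 0.53,
    "p3.2xlarge (V100, on-demand)": 3.06,
    "p3.2xlarge (V100, spot)": 1.00,  # spot fluctuates; purely illustrative
}
for instance, rate in assumed_rates.items():
    print(f"{instance}: ~{budget / rate:.0f} hours of training")
```

On those assumptions, on-demand V100 time runs out in roughly a day of training, which is why spot (or cheaper T4 instances) matters for your margins.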

14

jrmylee OP t1_j39rh76 wrote

Got it, that makes a lot of sense. We'll definitely be focusing on POC projects. For me, I mainly wanted a better, faster version of Google Colab. It's difficult to compete with their offering due to their free tier, but we think solving the problems of Colab is still worthwhile.

I'm wondering, would you or anyone you know be willing to give this a spin? It would really help us to know if

a. the product works on a variety of repos

b. the UI is fully functional and easy to use

3

RuairiSpain t1_j39zy50 wrote

Sure, DM me and I'll send my email address. I don't have much time to spend on it, but I'll give it a spin.

Have you looked at Hugging Face? They have an elegant way to package sample datasets with notebooks, and their docs are easy to digest.
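For example, pulling a packaged dataset is a one-liner with their `datasets` library (the dataset name here is just a public example):

```python
# Minimal sketch using Hugging Face's `datasets` library
from datasets import load_dataset

ds = load_dataset("imdb", split="train")  # "imdb" is an example public dataset
print(ds[0])        # one record as a plain dict
print(ds.features)  # the schema ships with the dataset
```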

3