JackBlemming t1_jaisvp4 wrote
Reply to comment by Educational-Net303 in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
Definitely. This is so they can become entrenched and collect massive amounts of data. It also discourages competition, since smaller players can't compete against these artificially low prices. This is not good for the community. It would be equivalent to opening up a restaurant and giving away food for free, then jacking up prices once the adjacent restaurants go bankrupt. OpenAI are not good guys.
I will rescind my comment and personally apologize if they release ChatGPT code, but we all know that will never happen, unless they have a better product lined up.
JackBlemming t1_j9n4bp2 wrote
Reply to comment by DigThatData in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
This is true. Netflix famously didn't use a complex neural net for choosing shows you'd like, precisely because it didn't scale. Neural nets are expensive to run, and if you can sacrifice a few percentage points of accuracy to save hundreds of millions in server fees, that's probably a good trade.
JackBlemming t1_j8f6v8c wrote
Reply to comment by EducationalCicada in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Schmidhuber actually already did this in the 90s
JackBlemming t1_j8c2llw wrote
Reply to [R] DIGIFACE-1M — synthetic dataset with one million images for face recognition by t0ns0fph0t0ns
Everyone involved in this project is being paid too much.
JackBlemming t1_j7ql45e wrote
Reply to comment by Iunaml in [N] New Book on Synthetic Data: Version 3.0 Just Released by MLRecipes
There's nothing wrong with relevant self-promotion, especially if it's high-quality material. Obviously bad or irrelevant stuff should be removed, but that's up to the mods' discretion.
I personally bookmarked this for later as it's very interesting to me.
JackBlemming t1_j39baqq wrote
Reply to comment by jrmylee in [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
Per 2: yes, exactly right. Some of my datasets are millions of images with metadata. As you can imagine, uploading and consuming data at that scale is slow and tedious, and so is integrating it with the remote machine actually running the training script.
JackBlemming t1_j3997nh wrote
Reply to [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
Couple thoughts:
- Setting up an environment is typically harder than cloning the repo and running pip install on the requirements.txt file. Many Python packages require Linux system packages to have been installed first; obvious examples are opencv, CUDA/GPU drivers, and mysqlclient (which won't even build without the MySQL dev headers and a C compiler). Your service should ideally take care of this for me.
- Dataset management is the most annoying part of machine learning for me, not setting up environments, which is typically a Dockerfile or docker-compose file plus maybe one shell script to bootstrap everything. By dataset management I mean letting my models access the dataset quickly, keeping it updated, etc. Ideally your service should make it easy to upload data and then make it accessible to the training code (rough sketch of what I mean below). This is assuming you want to allow people to train models on the service.
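To make that concrete, here's a minimal sketch of the access pattern I'd want the service to handle for me. It assumes boto3, Pillow, and PyTorch, and the bucket name and key layout are made up; a real product would presumably hide these details behind its own API:

```python
# Purely illustrative sketch: stream images from object storage on demand,
# so the training machine never has to hold a full local copy of the dataset.
# Bucket/key names are hypothetical; assumes boto3, Pillow, and PyTorch.
import io

import boto3
from PIL import Image
from torch.utils.data import Dataset


class RemoteImageDataset(Dataset):
    def __init__(self, bucket, keys, transform=None):
        self.bucket = bucket          # e.g. "my-training-data" (made up)
        self.keys = keys              # list of object keys, one per image
        self.transform = transform
        self.s3 = boto3.client("s3")

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, idx):
        # Fetch a single image on demand instead of syncing millions of files up front.
        obj = self.s3.get_object(Bucket=self.bucket, Key=self.keys[idx])
        img = Image.open(io.BytesIO(obj["Body"].read())).convert("RGB")
        return self.transform(img) if self.transform else img
```

The point is that the training script shouldn't care where the data physically lives; wrap something like this in a DataLoader with a few workers and the GPU stays fed without ever pulling the whole dataset onto local disk first.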
JackBlemming t1_jajg4dz wrote
Reply to comment by Purplekeyboard in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
It's not about the price, it's about the strategy. The Google Maps API was dirt cheap so nobody competed, then they cranked up prices 1400% once they had years of advantage and market lock-in. That's not OK.
If OpenAI keeps prices stable, nobody will complain, but this is likely a market capturing play. They even said they were losing money on every request, but maybe that's not true anymore.