Submitted by GPUaccelerated t3_yf5jm3 in deeplearning

I serve the AI industry, primarily building, configuring and selling GPU-accelerated workstations/servers and cloud instances.

Most people and companies buy and rent these things based on necessity. *You can't really dig holes effectively if you don't have a shovel kind of thing.*

I'm obviously not the only provider in the market, and I'm not one of the largest. Some choose me because I save them a lot of money, and some choose me because I'm really, really good at what I do (configuring and optimizing). (Yes, I'm confident enough to put that out there.)

When I'm taking care of an upgrade situation, it's usually because of one of two things.

  1. The hardware is outdated and needs a refresh to be able to support modern processing tools.
  2. The client's project is scaling and they need more compute power or VRAM (usually).

My question is: is there anyone (or any company) out there who actually cares to upgrade based on speed?

Like, is anyone going through the upgrade process simply because they want to train their models faster (save time)? Or to bring more value to their clients by having their models run inference faster?

I'd like anyone's opinion on this, but if you fit the description of this type of client, I'd especially like to know your thought process around upgrading, whether you've been through it in the past or are going through it now.

18

Comments


suflaj t1_iu1yx11 wrote

I don't think upgrading is ever worth it. It's easier to just scale horizontally, i.e. buy more hardware.

The hardware you do inference on for production isn't bought anyway; it's mostly rented, so that doesn't matter. And if you're running models on an edge device, you don't have much choice.

1

sckuzzle t1_iu2aa7o wrote

We use models to control things in real-time. We need to be able to predict what is going to happen in 5 or 15 minutes and proactively take actions NOW. If it takes 5 minutes to predict what is going to happen 5 minutes in the future, the model is useless.

So yes. We care about speed. The faster it runs, the more we can include in the model (making it more accurate).
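As a rough illustration of that constraint, here's a minimal sketch with assumed numbers and a placeholder model call (not our actual system):

```python
import time

HORIZON_S = 5 * 60              # we predict 5 minutes into the future

def predict_next_horizon():
    """Stand-in for real model inference; sleeps to simulate latency."""
    time.sleep(0.5)
    return [0.0] * HORIZON_S    # dummy forecast, one value per second

start = time.perf_counter()
forecast = predict_next_horizon()
latency = time.perf_counter() - start

# If inference takes as long as the horizon itself, the forecast is stale on arrival,
# so the latency budget has to be a small fraction of the prediction horizon.
print(f"latency {latency:.2f}s against a {HORIZON_S}s horizon, usable: {latency < HORIZON_S}")
```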

13

THE_REAL_ODB t1_iu37080 wrote

I can't imagine it not being very important in any setting.

2

konze t1_iu37t3g wrote

I’m coming from academia with a lot of industry connections. Yes, there are a lot of companies that need fast DNN inference, to the point where they build custom ASICs just to meet their latency demands.

3

VonPosen t1_iu395me wrote

Yes, I spend a lot of time making sure our models train and infer as fast as possible. Faster training/inference means cheaper training/inference. That also means we can afford more training.

3

mayiSLYTHERINyourbed t1_iu3dafc wrote

On a regular basis. We care down to the millisecond how fast inference or training is. In my last organisation we had to process around 200k images at inference time. At that scale, even a 2 ms delay per image adds up to about 6.7 minutes just to get the feature vectors, which really matters.
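A quick back-of-the-envelope check of those numbers (the values here are just the ones from above, not measurements):

```python
images = 200_000          # images to run through the feature extractor
extra_latency_s = 0.002   # 2 ms of added latency per image

extra_minutes = images * extra_latency_s / 60
print(f"{extra_minutes:.1f} extra minutes per full pass")   # ~6.7 minutes
```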

3

ShadowStormDrift t1_iu3fkqs wrote

I coded up a semantic search engine and was able to get it down to 3 seconds for one search.

That's blazingly fast by my standards (it used to take 45 minutes, which still haunts my dreams). But if 10 people use the site simultaneously, that's 30 seconds before the tenth person gets their results back, which is unacceptable.

So yes. I do care if I can get that done quicker.
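The rough queueing math behind that, assuming searches are handled strictly one at a time (a simplification; real serving would batch or parallelize):

```python
per_search_s = 3          # current latency for a single search
concurrent_users = 10

# With purely sequential handling, the last user waits for everyone ahead of them.
worst_case_wait_s = per_search_s * concurrent_users
print(f"user #{concurrent_users} waits ~{worst_case_wait_s}s")   # ~30 s
```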

3

hp2304 t1_iu3ixav wrote

Inference: if real-time operation is a requirement, then it's necessary to buy high-end GPUs to reduce latency; other than that, it's not worth it.

Training: this loosely depends on how often a model is retrained for production. Suppose that period is one year (which seems reasonable to me), meaning a new model is trained on the data gathered over that year plus the old data. Doing this faster won't make a difference; I'd rather use a slower GPU even if it takes days or a few weeks. It's not worth it.

A problem with DL models in general is that they keep growing in parameter count, requiring more VRAM to fit them on a single GPU. Huge thanks to model parallelism techniques and ZeRO, which handle this issue; otherwise one would have to buy new hardware to train large models. I don't like where AI research is headed. Increasing parameters is not an efficient solution; we need a new direction to effectively and practically approach general intelligence. On top of that, models failing to detect or misdetecting objects in self-driving cars despite huge training datasets is a serious red flag showing we are still far from solving AGI.
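For reference, ZeRO is typically enabled through DeepSpeed with a small config. A minimal sketch along those lines (the toy model, batch size, and optimizer settings are illustrative assumptions, and you'd normally launch this with the `deepspeed` launcher across several GPUs):

```python
import torch
import deepspeed

# Stand-in for a model too large to train comfortably on a single GPU.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "fp16": {"enabled": True},            # halve parameter/activation memory
    "zero_optimization": {"stage": 2},    # shard optimizer state and gradients
}

# The returned engine partitions optimizer state (and, at stage 2, gradients)
# across data-parallel workers instead of replicating them on every GPU.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```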

3

waa007 t1_iu3x6bn wrote

Of course, it depends on the application.

2

GPUaccelerated OP t1_iu4tflp wrote

This makes sense. Scaling horizontally is usually the case. Thank you for commenting!

But I would argue that hardware for inference is actually bought more often than one would assume. I have many clients who purchase mini-workstations for settings where data processing and inference jobs are done on the same premises, to limit latency and data travel.

1

GPUaccelerated OP t1_iu4u69c wrote

Wow, your perspective is really something to take note of. I appreciate your comment!

What I'm understanding is that speed matters more in inference than it does for training.

1

suflaj t1_iu4ue5y wrote

Well, that's your clients' choice. It's not cost-effective to buy Quadros when you could just rent as you go, especially given their low resale value. There aren't many places where you can't rent a nearby server with sub-10 ms, or at least sub-100 ms, latency.

2

GPUaccelerated OP t1_iu4umuw wrote

Yeah, see in your use case, speed makes so much sense. Thank you for sharing.

Mind sharing that site with us here?

I'm always interested in taking a look at cool projects.

Also what kind of hardware is currently tasked with your project's inference?

1

GPUaccelerated OP t1_iu4uxld wrote

That makes a lot of sense. And also really cool. Also, people resorting to ASICs for inference are definitely playing in the big boy leagues.

Thanks for sharing!

1

GPUaccelerated OP t1_iu4ve0v wrote

OK right. That's also a project with immense scale.

I guess the bigger the project, the more inference speed matters. But I've never heard of anyone caring deeply about milliseconds in training. Mind sharing why that was important in your use case?

1

GPUaccelerated OP t1_iu4w6oh wrote

The perspective of your use case makes so much sense. I appreciate you sharing that info!

Mind sharing which use case that would be? I'm also trying to pinpoint which industries care about model speed.

3

suflaj t1_iu4zaqo wrote

Well, then it's a matter of trust: every serious cloud provider has a privacy policy that claims nothing is logged. Of course, you don't have to trust this, but it is a liability for the cloud provider, so you get to shift the blame if something goes wrong. And I'd argue that for most companies the word of a cloud provider means more than your word, since they have much more to lose.

It's also standard practice to use end-to-end encryption, with some using end-to-end encrypted models. I don't really see how our company could handle personal data and retain samples in a GDPR-compliant way without proprietary models in the cloud.

2

ShadowStormDrift t1_iu53ih6 wrote

Of course!

www.sasdghub.up.ac.za

The semantic search and a few other key features haven't made it up yet. We're aiming to have them up by the end of November or mid-December.

We've got a two-server setup, the second being our "work-horse" intended for GPU-related jobs. It has an RTX 3090 with 32GB VRAM, 64GB of DDR4 RAM, and an 8-core CPU (I forget its exact setup).

2

DrXaos t1_iu5j1mj wrote

They care about cost for certain. Speed and hardware may relate to that.

2

sckuzzle t1_iu5mxmx wrote

This kind of thing is likely applicable to digital twins in many fields. The idea is to create a digital representation of whatever you are trying to model and run it alongside the real thing. It has applications in control engineering and predictive/prescriptive analytics. Depending on the application, this could be done in many ways (not necessarily using neural nets at all) and be fast or slow to run.

2

mayiSLYTHERINyourbed t1_iu7im0x wrote

Our use case was in biometrics, where the test set would usually run to millions of images that needed to be matched simultaneously. There, even accumulating 2-3 ms per batch would lead to a huge delay.

2

GPUaccelerated OP t1_iuitwy2 wrote

Not exactly sure, I'm not a lawyer. But it's something that gets taken very seriously by a lot of my medical-field clients. It's definitely something on their side, not mine; I just help those specific clients go on-prem.

1

suflaj t1_iujhefz wrote

I asked for the specific law so I could show you that it can't apply to end-to-end encrypted systems, which either work with partly destroyed information, or where the information that leaves the premises isn't comprehensible to anything but the model, with formal proof that it's infeasible to crack.

These are all long-solved problems; the only hard part is doing the hashing without losing too much information, or making the encryption compact enough to both fit into the model and be comprehensible to it.

2