ggf31416 t1_jcmql4c wrote
Reply to comment by f-d-t777 in [D] Simple Questions Thread by AutoModerator
At the speeds these things move, by the time you see them coming it's already too late for any corrective maneuver. It's the same reason you don't use your eyeballs to detect aircraft 100 km away. See https://en.wikipedia.org/wiki/Space_debris#Tracking_and_measurement and "Algorithms to Antenna: Tracking Space Debris with a Radar Network": radar and lidar are used.
ggf31416 t1_jca7zwz wrote
Reply to [D] Choosing Cloud vs local hardware for training LLMs. What's best for a small research group? by PK_thundr
https://fullstackdeeplearning.com/cloud-gpus/
Your best bet to reach 256GB of GPU memory in the cloud would be Azure with 4x 80GB A100 instances; however, your 40k budget will only buy you about 3,000 hours of on-demand compute at best, with spot instances stretching that a bit further.
If that's not enough, you will have to figure out how to build a server with RTX 6000 Ada GPUs (48GB each). RTX 4090s would be cheaper, but there may be legal issues due to the gaming driver license terms, you would need multiple servers (or a sharply reduced power limit) because of power draw, and Nvidia dropped P2P support, which may or may not matter depending on how much communication you need between the GPUs (https://discuss.pytorch.org/t/ddp-training-on-rtx-4090-ada-cu118/168366).
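If you're unsure whether P2P matters for your workload, you can check what the driver actually exposes (a minimal sketch, assuming PyTorch and at least two visible GPUs):

```python
import torch

# Ask the driver whether GPU 0 can directly access GPU 1's memory (peer-to-peer).
# On consumer cards such as the RTX 4090 this is expected to report False.
if torch.cuda.device_count() >= 2:
    print(torch.cuda.can_device_access_peer(0, 1))
else:
    print("Fewer than two GPUs visible")
```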
ggf31416 t1_jb4j0uk wrote
Reply to [D] Best way to run LLMs in the cloud? by QTQRQD
Good luck getting an EC2 instance with a single A100; last time I checked, AWS only offered instances with 8 of them, at a high price.
ggf31416 t1_jacq8pl wrote
Reply to comment by ahiddenmessi2 in [D] Training transformer on RTX2060 by ahiddenmessi2
For reference, an RTX 3090 can be rented for as low as ~$0.25/hour at vast.ai with just a credit card if you are in a hurry (AWS and GCP require a quota increase to use GPUs), or you may be able to get free research credits from the major cloud providers.
ggf31416 t1_jac61sd wrote
Reply to [D] Training transformer on RTX2060 by ahiddenmessi2
2060 has 6GB of VRAM, right?
It should be possible to train with that amount https://huggingface.co/docs/transformers/perf_train_gpu_one#optimizer
If you need to train from scratch (most people will just finetune) it will take a while: the original training took 90 hours on 8x V100s, each of which should be faster than your GPU. https://www.arxiv-vanity.com/papers/1910.01108/
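As a rough sketch of the memory-saving options from that guide, assuming the Hugging Face Trainer (the exact values are illustrative and depend on your model and sequence length):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # tiny micro-batch so activations fit in 6GB
    gradient_accumulation_steps=16,  # recover a useful effective batch size
    gradient_checkpointing=True,     # trade extra compute for less activation memory
    fp16=True,                       # mixed precision halves most tensor sizes
    optim="adafactor",               # lighter optimizer state than Adam
)
```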
ggf31416 t1_j9clwen wrote
Reply to comment by DevarshTare in [D] What matters while running models? by DevarshTare
I actually have a 3060 too. In theory a 3060 Ti should be up to 30% faster, but most of the time the 3060 is fast enough and still faster than any T4.
For generating a few images with Stable Diffusion the difference might be 15 vs 20 seconds; for running Whisper on several hours of audio it could be 45 minutes vs 1 hour. The difference only matters if the model is optimized to fully use the GPU in the first place.
ggf31416 t1_j9a8p88 wrote
Reply to comment by DevarshTare in [D] What matters while running models? by DevarshTare
The 3070 and 3060 Ti both have 8GB, and while the 3070 is a bit faster, most people will agree that the difference is not worth the price on a tight budget.
For training, the extra 4GB on the plain 3060 is quite useful, but for inference only you can run most small and medium models (such as Stable Diffusion) in 8GB, and the 3060 Ti will be faster.
ggf31416 t1_j99y9e1 wrote
Reply to [D] What matters while running models? by DevarshTare
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
https://lambdalabs.com/gpu-benchmarks
How much VRAM you need depends mostly on the number of parameters of the model, plus some extra for the data. At FP32 precision each parameter takes 4 bytes, at FP16 or BF16 2 bytes, and at FP8 or INT8 only one byte. Almost all models can run at FP16 without noticeable accuracy loss; FP8 sometimes works and sometimes doesn't, depending on the model.
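As a back-of-the-envelope example (the 7B-parameter model here is just an illustration, and activations and framework overhead add more on top of the weights):

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, ignoring activations and overhead."""
    return num_params * bytes_per_param / 1024**3

params = 7e9  # hypothetical 7B-parameter model
print(f"FP32: {weights_gib(params, 4):.1f} GiB")  # ~26.1 GiB
print(f"FP16: {weights_gib(params, 2):.1f} GiB")  # ~13.0 GiB
print(f"INT8: {weights_gib(params, 1):.1f} GiB")  # ~6.5 GiB
```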
ggf31416 t1_j7waxlu wrote
It will depend on how much preprocessing and augmentation is needed. Text doesn't need much preprocessing or augmentation, but image classification or detection training, for example, creates a different augmented image on each iteration and will benefit from a more powerful processor.
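For example, a typical image pipeline recomputes random augmentations for every sample on every epoch, and that work usually happens on the CPU (a minimal torchvision sketch):

```python
from torchvision import transforms

# Each call produces a different random crop/flip/jitter of the same image,
# so the CPU repeats this work for every sample on every iteration.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
```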
Note that you can also use cloud services. If you aren't dealing with confidential data, vast.ai is often one of the cheapest; otherwise you can use Lambda Labs, Google Compute Engine, AWS or other services. At least with Google Compute Engine and AWS you have to request access to GPU instances, which may take some time.
ggf31416 t1_j741sxn wrote
Reply to [D] PC takes a long time to execute code, possibility to use a cloud/external device? by Emergency-Dig-5262
One possibility is GPU acceleration using the cuML framework, but if you must use a specific framework like sklearn it won't be feasible. https://medium.com/rapids-ai/accelerating-random-forests-up-to-45x-using-cuml-dfb782a31bea
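A minimal sketch of what that looks like, assuming RAPIDS cuML is installed and the data fits in GPU memory (cuML mirrors sklearn's estimator interface):

```python
from cuml.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Toy data for illustration; cuML prefers float32 features and int32 labels.
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X.astype("float32"), y.astype("int32"))
print(clf.predict(X[:5].astype("float32")))
```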
There are some alternatives such as Google, AWS and Gradient, and you may be able to get student credits. Also, even if you don't need a GPU, you can cheaply rent instances with many CPU cores at Vast.ai (even with the GPU it's cheaper than a CPU-only AWS instance with the same number of cores); for example, the cheapest instance with 16 vCPUs is under $0.20/hour and only needs a credit card. The main issue with vast.ai is that you should save your results before shutting down the instance, because they are tied to the machine, which may become unavailable.
ggf31416 t1_j21s4dy wrote
Reply to comment by Tom_Neverwinter in [R] PyTorch | Budget GPU Benchmarking by zveroboy152
Not a great choice unless you really need the cheapest way to get to 24GB:
https://www.reddit.com/r/MLQuestions/comments/rttzxg/tesla_m40_24gb_gpu_very_poor_machinelearning/
ggf31416 t1_j0zw53a wrote
Reply to comment by RealDaddy2020 in [D] Question: best 'starting' server to train deep ML models by KlausMich
Not very good: https://www.reddit.com/r/MLQuestions/comments/rttzxg/tesla_m40_24gb_gpu_very_poor_machinelearning
Tesla cards are typically more expensive than the consumer equivalent even at the same performance and amount of memory.
ggf31416 t1_j0ypnpp wrote
Training a large model on CPU only is madness: it will take forever and waste a lot of electricity. You need a GPU with CUDA or an equivalent solution fully supported by your framework. See e.g. this benchmark.
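In PyTorch that just means putting the model and the batches on the GPU (a minimal sketch with a toy model for illustration):

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(128, 10).to(device)   # toy model, stands in for yours
batch = torch.randn(32, 128, device=device)   # data must live on the same device
out = model(batch)
```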
A t2.micro instance may be free during the free trial but is useless for anything resource intensive. You are much better off just using Google Colab or Kaggle notebooks.
If you have to train models very often (like every day) and 24GB from an RTX 3090, or better an RTX 4090, is enough, a dedicated computer is the most cost-effective option in the long run. If you can't afford an RTX 3090 and 12GB is enough, a 3060 with 12GB will do (for ML we usually want as much VRAM as possible; raw computing power is often not the bottleneck).
Vast.ai is a cost-effective way of renting computing power for non-constant use, much cheaper than AWS or GCP, but beware that because of how it works the instance is not fully secure against attacks from the host, so you can't use it with sensitive data.
Any good CUDA GPU will be able to train on a small dataset in less than a day, so take that into account when deciding between buying a GPU and cloud computing.
ggf31416 t1_iyfacbd wrote
Reply to [D] CPU - which one to choose? by krzaki_
Do you want something that will run on battery, or something powerful that won't last long on battery?
The ones you mentioned belong to the first category; if you want something powerful you will need a GPU, and one from Nvidia will make things much easier than one from AMD.
ggf31416 t1_jdesxc0 wrote
Reply to [D] Which AI model for RTX 3080 10GB? by SomeGuyInDeutschland
With memory offloading and 8-bit quantization you may be able to run the 13B model, but slowly. The 7B will be faster.
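A sketch of what that setup might look like with Hugging Face transformers plus bitsandbytes and accelerate (the checkpoint name below is a placeholder, not a specific recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "some-org/llama-13b"  # hypothetical 13B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
    device_map="auto",   # offload whatever doesn't fit in the 10GB card to CPU RAM
)
```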