>NDP200 is designed to natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1mW.

>Up to 896k neural parameters in 8bit mode, 1.6M parameters in 4bit mode, and 7M+ in 1bit mode

An Arduino idles at about 10mW, for comparison.

The idea is that if you're not shuffling the entire network weights across the memory bus every inference cycle, you save ludicrous amounts of time and energy. Someday, we'll use this kind of tech to run LLMs on our phones.
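As a quick sanity check on the quoted parameter counts, the sketch below converts each mode into the bytes needed for the weights alone. It assumes "896k" means 896,000 and that nothing but the weights is stored; `weight_bytes` is a name made up for this example.

```python
def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to hold the weights alone at the given precision."""
    return num_params * bits_per_param / 8


# Parameter counts as quoted from the NDP200 description above.
modes = {
    "8-bit": (896_000, 8),
    "4-bit": (1_600_000, 4),
    "1-bit": (7_000_000, 1),
}

for name, (params, bits) in modes.items():
    print(f"{name}: {weight_bytes(params, bits) / 1000:.0f} kB")
```

All three modes land in the same range, a little under 1 MB, which fits the idea of a fixed block of on-chip weight memory that never has to be streamed over an external bus.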

2muchnet42day t1_jczooi6 wrote on March 20, 2023 at 7:51 PM

#2,283,028

Replying to UnusualClimberBear (#2,282,800)

Yeah, I wouldn't buy AMD either. It's a shame that NVIDIA is basically a monopoly in AI, but it is what it is.

gybemeister t1_jczucbf wrote on March 20, 2023 at 8:27 PM

#2,283,434

Replying to currentscurrents (#2,282,617)

Any reason, besides price, to buy 3090s instead of 4090s?

currentscurrents t1_jczuqo8 wrote on March 20, 2023 at 8:29 PM

#2,283,463

Replying to gybemeister (#2,283,434)

Just price. They have the same amount of VRAM. The 4090 is faster of course.

I_will_delete_myself t1_jczvx4j wrote on March 20, 2023 at 8:37 PM

#2,283,550

Replying to currentscurrents (#2,282,617)

That or just use the cloud until Nvidia releases a 48gb gpu (which will happen sooner than one would think. Games are getting limited by VRAM)

satireplusplus t1_jczz8e6 wrote on March 20, 2023 at 8:58 PM

#2,283,810

Replying to currentscurrents (#2,283,463)

VRAM is the limiting factor to run these things though, not tensor cores

currentscurrents t1_jd007aa wrote on March 20, 2023 at 9:05 PM

#2,283,886

Replying to satireplusplus (#2,283,810)

Right. And even once you have enough VRAM, memory bandwidth limits the speed more than tensor core bandwidth.

They could pack more tensor cores in there if they wanted to, they just wouldn't be able to fill them with data fast enough.
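A rough way to see why bandwidth dominates: if every generated token has to read each weight from memory once (an assumption that roughly holds for single-stream generation and ignores activations and caches), the token rate cannot exceed bandwidth divided by model size. The function name and the input figures below are illustrative, not measurements.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          params_billion: float,
                          bits_per_param: int) -> float:
    """Upper bound on tokens/s when each token reads all weights once."""
    # Billions of parameters times bytes per parameter gives gigabytes.
    model_gb = params_billion * bits_per_param / 8
    return bandwidth_gb_s / model_gb


# Illustrative inputs: roughly 1000 GB/s of memory bandwidth, 4-bit weights.
for size in (7, 13, 30):
    bound = max_tokens_per_second(1000, size, 4)
    print(f"{size}B at 4-bit: at most {bound:.0f} tokens/s")
```

Adding more compute does not move this bound; only more bandwidth or fewer bytes per weight does.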

ertgbnm t1_jd028k5 wrote on March 20, 2023 at 9:18 PM

#2,284,054

I heard 30B isn't very good. Anyone with experience disagree?

LetMeGuessYourAlts t1_jd02jkq wrote on March 20, 2023 at 9:20 PM

#2,284,074

Replying to gybemeister (#2,283,434)

Used availability is better on the 3090 as well. I got one for $740 on eBay. Little dust on the heatsinks but at half price it was a steal.

Educational-Net303 t1_jd03se1 wrote on March 20, 2023 at 9:28 PM

#2,284,155

Replying to I_will_delete_myself (#2,283,550)

What game is limited by vram? I haven't heard of any game running over 24gb unless it's Skyrim with a bunch of 8k mods

I_will_delete_myself t1_jd04mia wrote on March 20, 2023 at 9:34 PM

#2,284,199

Replying to Educational-Net303 (#2,284,155)

people are demanding more and more interactivity in their video games (look at the trend of open worlds). It’s only gonna get bigger.

Educational-Net303 t1_jd051kh wrote on March 20, 2023 at 9:37 PM

#2,284,234

Replying to I_will_delete_myself (#2,284,199)

Cyberpunk on max with psycho takes ~16gb max. It's gonna be a few years before we actually see games demanding more than 24.

I_will_delete_myself t1_jd05atn wrote on March 20, 2023 at 9:38 PM

#2,284,252

Replying to Educational-Net303 (#2,284,234)

Now try that on 2-4 monitors. You would be surprised how premium gamers like their hardware. It’s like checking out sports cars but for nerds like me.

Educational-Net303 t1_jd05hmc wrote on March 20, 2023 at 9:40 PM

#2,284,265

Replying to I_will_delete_myself (#2,284,252)

Are we still talking consumer grade hardware or specialized GPUs made for a niche crowd?

Straight-Comb-6956 t1_jd08cq1 wrote on March 20, 2023 at 9:59 PM

#2,284,445

Replying to currentscurrents (#2,282,617)

LLaMa/Alpaca work just fine on CPU with llama.cpp/alpaca.cpp. Not very snappy (1-15 tokens/s depending on model size), but fast enough for me.

pointer_to_null t1_jd0bv74 wrote on March 20, 2023 at 10:23 PM

#2,284,680

Replying to currentscurrents (#2,283,886)

This is definitely true. Theoretically you can page stuff in/out of VRAM to run larger models, but you won't be getting much benefit over CPU compute with all that thrashing.

currentscurrents t1_jd0f76v wrote on March 20, 2023 at 10:47 PM

#2,284,899

Replying to Educational-Net303 (#2,284,155)

I mean of course not, nobody would make such a game right now because there are no >24GB cards to run it on.

rolexpo t1_jd0fvle wrote on March 20, 2023 at 10:51 PM

#2,284,944

Replying to currentscurrents (#2,282,735)

You'll have better luck waiting for Intel

whyvitamins t1_jd0j5zq wrote on March 20, 2023 at 11:15 PM

#2,285,177

Replying to currentscurrents (#2,282,735)

> hope that AMD gets their act together on AI support

walking around picking up coins from the ground to buy a 3090 should be faster honestly

42gether t1_jd0juau wrote on March 20, 2023 at 11:19 PM

#2,285,212

Replying to Educational-Net303 (#2,284,265)

Niche supercar gamers start up the industry which then will lead into realistic VR which will then lead into consumer high quality stuff?

Educational-Net303 t1_jd0k6p6 wrote on March 20, 2023 at 11:22 PM

#2,285,240

Replying to 42gether (#2,285,212)

Which takes years

ItsGrandPi t1_jd0l3mp wrote on March 20, 2023 at 11:28 PM

#2,285,305

Time to see if I can get this running on Dalai

RoyalCities t1_jd0m4vt wrote on March 20, 2023 at 11:36 PM

#2,285,375

Thanks. So a bit confused here. It mentions needing an A100 to train. Am I able to run this off a 3090?

wojtek15 t1_jd0p206 wrote on March 20, 2023 at 11:57 PM

#2,285,546

Replying to currentscurrents (#2,283,463)

Hey, recently I was wondering whether Apple Silicon Macs may be the best thing for AI in the future. The most powerful Mac Studio has 128GB of unified memory which can be used by CPU, GPU or Neural Engine. If only memory size is considered, even an A100, let alone any consumer-oriented model, can't match it. With this amount of memory you could run a GPT-3 Davinci-size model in 4-bit mode.
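The memory claim can be checked with simple arithmetic. The sketch below computes the largest parameter count whose weights fit in a given memory budget, counting weights only (no activations, KV cache, or OS overhead, so real limits are lower). `max_params_billion` is a name invented for this example, and the 175B figure is GPT-3's published parameter count.

```python
def max_params_billion(memory_gb: float, bits_per_param: int) -> float:
    """Largest model (in billions of parameters) whose weights fit in memory_gb."""
    return memory_gb * 8 / bits_per_param


GPT3_PARAMS_BILLION = 175  # published GPT-3 (davinci) size

for label, memory_gb in (("128 GB unified memory", 128), ("24 GB VRAM", 24)):
    for bits in (16, 8, 4):
        limit = max_params_billion(memory_gb, bits)
        fits = "fits" if GPT3_PARAMS_BILLION <= limit else "does not fit"
        print(f"{label}, {bits}-bit: up to {limit:.0f}B params; 175B {fits}")
```

Under these assumptions 128 GB holds up to 256B parameters at 4-bit, so a 175B model fits, while a 24 GB card tops out at 48B.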

Civil_Collection7267 t1_jd0pcqf wrote on March 20, 2023 at 11:59 PM

#2,285,572

Replying to ertgbnm (#2,284,054)

Untuned 30B LLaMA, you're saying? It's excellent and adept at storywriting, chatting, and so on, and it can output faster than ChatGPT at 4-bit precision. While I'm not into this myself, I understand that there is a very large RP community at subs like CharacterAI and Pygmalion, and the 30B model is genuinely great for feeling like talking to a real person. I'm using it with text-generation-webui and custom parameters and not the llama.cpp implementation.

For assistant tasks, I've been using either the ChatLLaMA 13B LoRA or the Alpaca 7B LoRA, both of which are very good as well. ChatLLaMA, for instance, was able to answer a reasoning question correctly that GPT-3.5 failed, but it has drawbacks in other areas.

The limitations so far are that none of these models can answer programming questions competently yet, and a finetune for that will be needed. They also have the tendency to hallucinate frequently unless parameters are made more restrictive.

pier4r t1_jd0pf1x wrote on March 20, 2023 at 11:59 PM

#2,285,581

Replying to wojtek15 (#2,285,546)

> 128GB of unified memory which can be used by CPU, GPU or Neural Engine.

But it doesn't have the same bandwidth as the VRAM on the GPU card iirc.

Otherwise every integrated GPGPU would be better due to available ram.

The neural engine on M1 and M2 is usable IIRC only with Apple libraries, which may not be used by notable models yet.

Bloaf t1_jd0qy7z wrote on March 21, 2023 at 12:10 AM

#2,285,667

Replying to RoyalCities (#2,285,375)

You can run it on your CPU. My old i7 6700k spits out 13B words a little slower than I could read them out loud. I'll test the 30B tonight on my 5600X

The_frozen_one t1_jd0sqd7 wrote on March 21, 2023 at 12:23 AM

#2,285,768

Replying to RoyalCities (#2,285,375)

You can run llama-30B on a CPU using llama.cpp, it's just slow. The alpaca models I've seen are the same size as the llama model they are trained on, so I would expect running the alpaca-30B models will be possible on any system capable of running llama-30B.

ertgbnm t1_jd0xwfh wrote on March 21, 2023 at 1:00 AM

#2,286,108

Replying to Civil_Collection7267 (#2,285,572)

Good to hear. Thanks!

mycall t1_jd0yi8i wrote on March 21, 2023 at 1:05 AM

#2,286,146

Replying to currentscurrents (#2,283,014)

> if you're not shuffling the entire network weights across the memory bus every inference cycle

Isn't this common though?

mycall t1_jd0ytah wrote on March 21, 2023 at 1:07 AM

#2,286,167

Replying to The_frozen_one (#2,285,768)

alpaca-30B > llama-30B ?

lurkinginboston t1_jd0zr7c wrote on March 21, 2023 at 1:14 AM

#2,286,238

Replying to Straight-Comb-6956 (#2,284,445)

I will assume you are much more knowledgeable than I am in this space. I have a few basic questions that have been bothering me since all the craze started around GPT and LLMs recently.

I managed to get Alpaca working on my end using the above link and got very good results. LLaMA's biggest takeaway was that it is able to reproduce quality comparable to GPT at a much lower compute size. If this is the case, why is the output much shorter on LLaMA than what I get on OpenGPT? I would imagine the OpenGPT response is much longer because ... it is just bigger? What is the limiting factor that keeps us from getting longer generated responses comparable to GPT?

ggml-alpaca-7b-q4.bin is only 4 gigabytes. I guess this is what is meant by 4-bit and 7 billion parameters. Not sure if rumor or fact, GPT3 model is 128B, does it mean if we get trained model of GPT, and manage to run 128B locally, will it give us the same results? Will it be possible to retrofit a GPT model within Alpaca.cpp with minor enhancements to get output JUST like OpenGPT? I have read that fitting 128B requires multiple Nvidia A100s.

Last question, inference means that it gets output from a trained model. Meta/OpenAI/Stability.ai have the resources to train a model. If my understanding is correct, Alpaca.cpp or https://github.com/ggerganov/llama.cpp are a sort of 'front-end' for these models. They allow us to provide an input to get an output by inference with the model. The question I am trying to ask is, what is so great about llama.cpp? Is it because it's in C? I know there is Rust version of it out, but it uses llama.cpp behind the scene. Is there any advantage of an inference to be written in Go or Python?

currentscurrents t1_jd10ab5 wrote on March 21, 2023 at 1:18 AM

#2,286,266

Replying to pier4r (#2,285,581)

llama.cpp uses the neural engine, so does StableDiffusion. And the speed is not that far off from VRAM, actually.

>Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.

By comparison, the Nvidia 4090 is clocking in at ~1000GB/s

Apple is clearly positioning their devices for AI.

VodkaHaze t1_jd11vhm wrote on March 21, 2023 at 1:29 AM

#2,286,406

Replying to currentscurrents (#2,283,014)

There's also the Tenstorrent chips coming out to the public, which are vastly more efficient than Nvidia stuff

The_frozen_one t1_jd125zf wrote on March 21, 2023 at 1:31 AM

#2,286,420

Replying to mycall (#2,286,167)

Not sure I understand. Is it better? Depends on what you're trying to do. I can say that alpaca-7B and alpaca-13B operate as better and more consistent chatbots than llama-7B and llama-13B. That's what standard alpaca has been fine-tuned to do.

Is it bigger? No, alpaca-7B and 13B are the same size as llama-7B and 13B.

hosjiu t1_jd1a6az wrote on March 21, 2023 at 2:31 AM

#2,286,983

Replying to Civil_Collection7267 (#2,285,572)

"They also have the tendency to hallucinate frequently unless parameters are made more restrictive."

I don't really understand this point in technical terms.

currentscurrents t1_jd1c52o wrote on March 21, 2023 at 2:47 AM

#2,287,090

Replying to VodkaHaze (#2,286,406)

Doesn't look like they sell in individual quantities right now but I welcome any competition in the space!

uspmm2 t1_jd1jh1b wrote on March 21, 2023 at 3:50 AM

#2,287,518

Replying to Straight-Comb-6956 (#2,284,445)

are you talking about the 30b one?

remghoost7 t1_jd1k0l6 wrote on March 21, 2023 at 3:55 AM

#2,287,557

Replying to wojtek15 (#2,285,546)

>...unified memory which can be used by CPU, GPU or Neural Engine.

Interesting....

That's why I've seen so many M1 implementations of machine learning models. It really does seem like the M1 chips were made with AI in mind....

AnOnlineHandle t1_jd1k2un wrote on March 21, 2023 at 3:56 AM

#2,287,563

Replying to 2muchnet42day (#2,282,636)

They haven't been sold in Australia for months, only second hand.

KerfuffleV2 t1_jd1kfyp wrote on March 21, 2023 at 3:59 AM

#2,287,582

Replying to lurkinginboston (#2,286,238)

Note: Not the same person.

> I would imagine the OpenGPT response is much longer because ... it is just bigger?

llama.cpp recently added a commandline flag to disable the end of message marker from getting generated, so that's one way you can try to force responses to be longer. (It doesn't always work, because the LLM can start generating irrelevant content.)

The length of the response isn't directly related to the size of the model, but just having less information available/relevant could mean it has less to talk about in a response.

> GPT3 model is 128B, does it mean if we get trained model of GPT, and manage to run 128B locally, will it give us the same results?

If you have the same model and you give it the same prompt, you should get the same result. Keep in mind if you're using some other service like ChatGPT you aren't directly controlling the full prompt. I don't know about OpenGPT, but from what I know ChatGPT has a lot of special sauce not just in the training but other stuff like having another LLM write summaries for it so it keeps track of context better, etc.

> Last question, inference means that it gets output from a trained model.

Inference is running a model that's already been trained, as far as I know.

> If my understanding is correct, Alpaca.cpp or https://github.com/ggerganov/llama.cpp are a sort of 'front-end' for these models.

The model is a bunch of data that was generated by training. Something like llama.cpp is what actually uses that data: keeping track of the state, parsing user input into tokens that can be fed to the model, performing the math calculations that are necessary to evaluate its state, etc.
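To make the split between "model data" and "program that runs it" concrete, here is a toy sketch. It is not how llama.cpp is implemented: the vocabulary, the `WEIGHTS` table, and all function names are invented for illustration, and a lookup table stands in for the neural network. The shape of the loop (tokenize, evaluate, pick a token, repeat) is the part that carries over.

```python
# Toy "model": a table of next-token scores, standing in for trained weights.
VOCAB = ["<eos>", "the", "cat", "sat"]
WEIGHTS = {
    1: [0.0, 0.1, 0.8, 0.1],     # scores for what follows "the"
    2: [0.1, 0.0, 0.1, 0.8],     # scores for what follows "cat"
    3: [0.9, 0.05, 0.03, 0.02],  # scores for what follows "sat"
}


def tokenize(text: str) -> list[int]:
    """Turn user text into token ids the model can consume."""
    return [VOCAB.index(word) for word in text.split()]


def detokenize(ids: list[int]) -> str:
    """Turn token ids back into text."""
    return " ".join(VOCAB[i] for i in ids)


def generate(prompt: str, max_new_tokens: int = 8) -> str:
    """Run the inference loop: evaluate, pick the best token, repeat."""
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):
        scores = WEIGHTS[tokens[-1]]          # "evaluate the model"
        next_id = max(range(len(scores)), key=scores.__getitem__)
        if next_id == 0:                      # end-of-message marker
            break
        tokens.append(next_id)
    return detokenize(tokens)


print(generate("the"))  # the cat sat
```

The data in `WEIGHTS` does nothing on its own; everything that looks like behavior lives in `generate`. Note also that generation stops when the end marker wins, which is why suppressing that marker can force longer output.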

"Gets its output from", "front end" sound like kind of weird ways to describe what's going on. Just as an example, modern video formats and compression for video/audio is pretty complicated. Would you say that a video player "gets its output" from the video file or is a front-end for a video file?

> The question I am trying to ask is, what is so great about llama.cpp?

I mean, it's free software that works pretty well and puts evaluating these models in reach of basically everyone. That's great. It's also quite fast for something running purely on CPU. What's not great about that?

> I know there is Rust version of it out, but it uses llama.cpp behind the scene.

I don't think this is correct. It is true that the Rust version is (or started out as) a port of the C++ version but it's not using it behind the scenes. However, there's a math library called GGML that both programs use; it does the heavy lifting of doing the calculations for the data in the models.

> Is there any advantage of an inference to be written in Go or Python?

Same advantage as writing anything in Go, which is... Just about nothing in my opinion. See: https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-ride

Seriously though, this is a very, very general question and can be asked about basically any project and any set of programming languages. There are strengths and weaknesses. Rust's strength is high performance, ability to do low level stuff like C, and it has a lot of features aimed at writing very reliable software that handles stuff like edge cases. This comes at the expense of having to deal with all those details. On the other hand, a language like Python is very high level. You can just throw something together and ignore a lot of details and it still can work (unless it runs into an unhandled case). It's generally a lot slower than languages like Rust, C, C++ and even Go.

However, for running LLMs, most of the processing is math calculations and that will mean calling into external libraries/modules that will be written in high performance languages like C, Rust, etc. Assuming a Python program is taking advantage of that kind of resource, I wouldn't expect it to be noticeably slow.

So, like a lot of the time, it comes down to personal preference of what the developer wants to use. The person who wrote the Rust version probably likes Rust. The person who wrote the C++ version likes C++, etc.

SpiritualCyberpunk t1_jd1m06i wrote on March 21, 2023 at 4:15 AM

#2,287,676

Replying to Straight-Comb-6956 (#2,284,445)

Idk why, but after the first answer to a question addressed to it, mine spewed out random nonsense. Literally unrelated things.

whyvitamins t1_jd1mddg wrote on March 21, 2023 at 4:18 AM

#2,287,691

Replying to currentscurrents (#2,282,617)

realistically, what's the cheapest one can get a used functioning 3090 rn? like 700 usd minimum?

cbsudux t1_jd1qzp7 wrote on March 21, 2023 at 5:09 AM

#2,287,970

How long did the training take on an A100?

Straight-Comb-6956 t1_jd1srkd wrote on March 21, 2023 at 5:31 AM

#2,288,063

Replying to uspmm2 (#2,287,518)

Haven't tried the 30B model. 65B takes 900ms/token on my machine.

royalemate357 t1_jd1stda wrote on March 21, 2023 at 5:31 AM

#2,288,065

Replying to hosjiu (#2,286,983)

Not op, but I imagine they're referring to the sampling hyperparameters that control the text generation process. For example, there is a temperature setting: a lower temperature makes it sample more from the most likely choices. So it would potentially be more precise/accurate but also less diverse and creative in its outputs.
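A minimal sketch of what the temperature setting does, using only the standard library. The logits are made-up numbers and the function names are chosen for this example; real implementations usually combine temperature with other filters such as top-k or top-p.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits: list[float], temperature: float, rng=random) -> int:
    """Draw one token index according to the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature nearly all probability mass sits on the top-scoring token, so output is more predictable; at high temperature the distribution flattens and unlikely tokens get picked more often.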

Enturbulated t1_jd1x9uu wrote on March 21, 2023 at 6:30 AM

#2,288,287

Replying to pointer_to_null (#2,284,680)

You are absolutely correct. text-gen-webui offers "streaming" via paging models in and out of VRAM. Using this your CPU no longer gets bogged down with running the model, but you don't see much improvement in generation speed as the GPU is churning with loading and unloading model data from main RAM all the time. It can still be an improvement worth some effort, but it's far less drastic of an improvement than when the entire model fits in VRAM.

shafall t1_jd2380o wrote on March 21, 2023 at 7:56 AM

#2,288,578

Replying to Enturbulated (#2,288,287)

To give some more specifics, most of the time it's not the CPU that copies the data on modern systems, it is the PCI DMA chip (that may be on the same die though). CPU just sends address ranges to DMA Info

gliptic t1_jd2bsc7 wrote on March 21, 2023 at 10:03 AM

#2,288,992

Replying to lurkinginboston (#2,286,238)

In fact, GPT3 is 175B. But GPT3 is old now and doesn't make effective use of those parameters.

Straight-Comb-6956 t1_jd2iwp6 wrote on March 21, 2023 at 11:30 AM

#2,289,448

Replying to currentscurrents (#2,286,266)

> llama.cpp uses the neural engine,

Does it?

benfavre t1_jd2n1cg wrote on March 21, 2023 at 12:12 PM

#2,289,745

Replying to cbsudux (#2,287,970)

1 epoch of finetuning the 30B model with llama-lora implementation, mini-batch-size=2, maxlen=384, is about 11 hours.

42gether t1_jd2rfb6 wrote on March 21, 2023 at 12:51 PM

#2,290,059

Replying to Educational-Net303 (#2,285,240)

Okay, thank you for your input.

And?

Newsflash everything we did started because some cunt felt like growing lungs and wanting oxygen from the air.

It all takes time, what are you trying to argue?

Educational-Net303 t1_jd2rsax wrote on March 21, 2023 at 12:54 PM

#2,290,085

Replying to 42gether (#2,290,059)

My whole point is that it will take years before we get to 48GB vram consumer GPUs. You just proved my point again without even reading it.

SWESWESWEh t1_jd2s9ml wrote on March 21, 2023 at 12:58 PM

#2,290,119

Replying to wojtek15 (#2,285,546)

Unfortunately, most code out there is using calls to CUDA explicitly rather than checking the GPU type you have and using that. You can fix this yourself (I use an M1 MacBook Pro for ML and it is quite powerful), but you need to know what you're doing and it's just more work. You might also run into situations where things are not fully implemented in Metal Performance Shaders (the Mac equivalent to CUDA), but Apple does put a lot of resources into making this better.
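A sketch of the kind of check being described, assuming PyTorch as the framework (the comment does not name one). `pick_device_name` and `detect_device` are names invented here; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the PyTorch calls being relied on, and the code falls back to CPU if PyTorch is not installed.

```python
def pick_device_name(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then plain CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch what is available instead of hard-coding CUDA."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: without PyTorch there is nothing to select
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device_name(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

In a script, the result would replace hard-coded calls such as `model.cuda()` with `model.to(detect_device())`, though individual operations may still be missing on the MPS backend.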

[deleted] t1_jd306w6 wrote on March 21, 2023 at 1:59 PM

#2,290,745

Replying to Educational-Net303 (#2,290,085)

[removed]

pier4r t1_jd39md4 wrote on March 21, 2023 at 3:05 PM

#2,291,483

Replying to currentscurrents (#2,286,266)

> llama.cpp uses the neural engine

I tried to find confirmation for this but couldn't. I saw some ports, but they weren't from the LLaMA team. Do you have any source?

2muchnet42day t1_jd3pu0m wrote on March 21, 2023 at 4:50 PM

#2,292,703

Replying to benfavre (#2,289,745)

Can you train with 24 gigs of vram ?

msgs t1_jd46yf9 wrote on March 21, 2023 at 6:38 PM

#2,293,886

Replying to Straight-Comb-6956 (#2,288,063)

do you have a link to a torrent/download for the 30B or 65B weights that works with Alpaca.cpp? reddit DMs are fine if you don't want to post it publicly.

keeplosingmypws t1_jd5xygm wrote on March 22, 2023 at 1:40 AM

#2,298,597

Replying to KerfuffleV2 (#2,287,582)

I have the 16B parameter version of Alpaca.cpp (and a copy of the training data as well as the weights) installed locally on a machine with an Nvidia 3070 GPU. I know I can launch my terminal using the Discrete Graphics Card option, but I also believe this version was built for CPU use and I’m guessing that I’m not getting the most out of my graphics card

What’s the move here?

frownyface t1_jd6q1qi wrote on March 22, 2023 at 6:08 AM

#2,300,748

Replying to currentscurrents (#2,284,899)

There was an insane age of PC gaming where hardware was moving so fast that game developers were releasing games with max-settings that didn't run on any current hardware to try to future proof themselves from having a game suddenly feeling obsolete shortly after launch.

KerfuffleV2 t1_jd7sb4u wrote on March 22, 2023 at 1:33 PM

#2,303,234

Replying to keeplosingmypws (#2,298,597)

llama.cpp and alpaca.cpp (and also related projects like llama-rs) only use the CPU. So not only are you not getting the most out of your GPU, it's not getting used at all.

I have an old GPU with only 6GB so running larger models on GPU isn't practical for me. I haven't really looked at that aspect of it much. You could start here: https://rentry.org/llama-tard-v2

Keep in mind you will need to be pretty decent with technical stuff to be able to get it working based on those instructions even though they are detailed.

C0demunkee t1_jd8l0tg wrote on March 22, 2023 at 4:42 PM

#2,305,532

Replying to currentscurrents (#2,282,617)

maybe consider Tesla P40s

24gb, lots of CUDA cores, $150 each

C0demunkee t1_jd8svm2 wrote on March 22, 2023 at 5:31 PM

#2,306,157

Replying to whyvitamins (#2,287,691)

Tesla P40: 24GB VRAM, $150, only 1 or 2 generations behind the 3090

[deleted] t1_jd8utna wrote on March 22, 2023 at 5:43 PM

#2,306,340

Replying to C0demunkee (#2,306,157)

[removed]

Genesis_Fractiliza t1_jd8w0b9 wrote on March 22, 2023 at 5:50 PM

#2,306,431

Replying to msgs (#2,293,886)

May I also have those please?

msgs t1_jd9fayg wrote on March 22, 2023 at 7:51 PM

#2,307,993

Replying to Genesis_Fractiliza (#2,306,431)

so far I haven't found a download. I'll let you know if I do.

msgs t1_jd9jpvl wrote on March 22, 2023 at 8:19 PM

#2,308,326

Replying to Genesis_Fractiliza (#2,306,431)

https://huggingface.co/Pi3141/alpaca-30B-ggml/tree/main

though I haven't tried to test it yet.

keeplosingmypws t1_jd9wpwm wrote on March 22, 2023 at 9:44 PM

#2,309,363

Replying to KerfuffleV2 (#2,303,234)

Thanks for leading me in the right direction! I’ll letcha know if I get it working

Unlucky_Excitement_2 t1_jdavhcr wrote on March 23, 2023 at 1:50 AM

#2,311,955

Replying to KerfuffleV2 (#2,287,582)

Bro what are you talking about LOL. It's context length he's discussing. There are multiple ways [all of which I'm experimenting with] ->

flash attention
strided context window
finetuning on a dataset with longer sequences
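"Strided context window" can mean several things; one common reading is splitting a long token sequence into overlapping fixed-size windows so each piece fits the model's context length. The sketch below shows only that reading, with a hypothetical helper name, and is not taken from any particular library.

```python
def strided_windows(tokens: list, window: int, stride: int) -> list[list]:
    """Split tokens into overlapping windows of length `window`, `stride` apart."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    starts = list(range(0, len(tokens) - window + 1, stride))
    # Add a final window so the tail of the sequence is always covered.
    if starts[-1] + window < len(tokens):
        starts.append(len(tokens) - window)
    return [list(tokens[s:s + window]) for s in starts]


for chunk in strided_windows(list(range(11)), window=4, stride=3):
    print(chunk)
```

A stride smaller than the window gives each chunk some shared context with its neighbor, at the cost of processing the overlapping tokens more than once.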

[deleted] t1_jdbds13 wrote on March 23, 2023 at 4:22 AM

#2,313,532

[deleted]

msgs t1_jdbi9r2 wrote on March 23, 2023 at 5:09 AM

#2,313,868

magnet:?xt=urn:btih:6K5O4J7DCKAMMMAJHWXQU72OYFXPZQJG&dn=ggml-alpaca-30b-q4.bin&xl=20333638921&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce

I hope this magnet link works properly. I've never created one before. This is the alpaca.cpp 30B 4-bit weight file. Same file downloaded from huggingface. Apologies if it doesn't work. Ping me if it doesn't.
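Magnet links copied out of web pages sometimes arrive with their ampersands HTML-escaped as `&amp;`, which some clients reject. A small standard-library sketch for cleaning one up and reading its fields follows; `parse_magnet` is a name made up for this example, and `xl` is taken to be the file's exact length in bytes, per the usual magnet URI convention.

```python
import html
from urllib.parse import parse_qs


def parse_magnet(uri: str) -> dict:
    """Unescape HTML entities, then split a magnet URI into its parameters."""
    clean = html.unescape(uri)
    scheme, _, query = clean.partition("?")
    if scheme != "magnet:":
        raise ValueError("not a magnet URI")
    return parse_qs(query)  # values are lists, percent-escapes are decoded


link = (
    "magnet:?xt=urn:btih:6K5O4J7DCKAMMMAJHWXQU72OYFXPZQJG"
    "&amp;dn=ggml-alpaca-30b-q4.bin&amp;xl=20333638921"
    "&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce"
)
fields = parse_magnet(link)
print("name:   ", fields["dn"][0])
print("size:   ", int(fields["xl"][0]) / 1e9, "GB")
print("tracker:", fields["tr"][0])
```

The size field works out to roughly 20.3 GB, a useful check before committing disk space and RAM to the download.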