Comments
Meddhouib10 t1_jcptalr wrote
What are the techniques to make such large models run on low resources?
simpleuserhere OP t1_jcpttav wrote
This model is 4-bit quantized, so it takes less RAM (model size is around 4 GB).
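For anyone wondering what that means concretely, here is a minimal sketch of block-wise 4-bit quantization (a simplified symmetric scheme for illustration; the actual ggml Q4 format differs in layout and rounding details):

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int BLOCK = 32; // weights per block, sharing one scale

struct Q4Block {
    float   scale;             // per-block scale factor
    uint8_t packed[BLOCK / 2]; // two 4-bit codes per byte
};

Q4Block quantize_block(const float * w) {
    float amax = 0.0f;
    for (int i = 0; i < BLOCK; ++i) amax = std::max(amax, std::fabs(w[i]));
    Q4Block b;
    b.scale = amax / 7.0f; // map weights into the signed range [-7, 7]
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;
    for (int i = 0; i < BLOCK; i += 2) {
        // round to the nearest 4-bit level, bias by 8 to store unsigned
        int q0 = (int) std::lround(w[i]     * inv) + 8;
        int q1 = (int) std::lround(w[i + 1] * inv) + 8;
        b.packed[i / 2] = (uint8_t) ((q0 & 0x0F) | ((q1 & 0x0F) << 4));
    }
    return b;
}

float dequantize(const Q4Block & b, int i) {
    int q = (i % 2 == 0) ? (b.packed[i / 2] & 0x0F) : (b.packed[i / 2] >> 4);
    return (float) (q - 8) * b.scale;
}

int main() {
    std::vector<float> w(BLOCK, 0.5f);
    w[3] = -1.0f;
    Q4Block b = quantize_block(w.data());
    printf("w[3]: %f -> %f\n", w[3], dequantize(b, 3)); // small rounding error
    return 0;
}

Each block of 32 fp32 weights (128 bytes) shrinks to 16 packed bytes plus a 4-byte scale, i.e. about 5 bits per weight, which is roughly how a 7B model drops from ~13 GB at fp16 to around 4 GB.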
legendofbrando t1_jcpybhl wrote
Anyone gotten it to run on iOS?
votegoat t1_jcq47v6 wrote
commenting to save for later
Prymu t1_jcqclnb wrote
You know that (new) reddit has a save feature
light24bulbs t1_jcqco8i wrote
Old reddit does too
timedacorn369 t1_jcqg4v6 wrote
What is the performance hit at various levels of quantization?
schorhr t1_jcqwzek wrote
That's amazing!
Thank you for that link. With my old laptop and slow internet connection I'm struggling to download Visual Studio and get everything to work. I do have the weights but am still figuring out why the build fails. Is there any way to download a prebuilt version?
Pale-Dentist330 t1_jcr3e9m wrote
Can you add the steps here?
starstruckmon t1_jcrbf0m wrote
You can see some benchmarks here
simpleuserhere OP t1_jcreufr wrote
Hi, please check this branch https://github.com/rupeshs/alpaca.cpp/tree/linux-android-build-support
simpleuserhere OP t1_jcrfjsh wrote
Thanks! What error are you getting? With the VS compiler and CMake we can easily build it.
baffo32 t1_jcronvh wrote
- offloading and accelerating (moving some parts to memory-mapped disk or GPU RAM; this can also make for quicker loading - see the sketch after this list)
- pruning (removing parts of the model that didn’t end up impacting outputs after training)
- further quantization below 4 bits
- distilling to a mixture of experts?
- factoring and distilling parts out into heuristic algorithms?
- finetuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains) this would likely make it very small
EDIT:
- numerous techniques published in papers over the past few years
- distilling into an architecture not limited by, e.g., the constraint of being feed-forward
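To make the first item concrete, here is a minimal POSIX sketch of memory-mapping a weight file (the filename model.bin and the raw-fp32 layout are hypothetical; real ggml files carry a header and per-tensor metadata):

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("model.bin", O_RDONLY); // hypothetical raw weight file
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }

    // Map read-only; nothing is actually read from disk until first access.
    void * data = mmap(nullptr, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    const float * weights = (const float *) data; // assuming raw fp32, no header
    printf("first weight: %f\n", weights[0]);     // faults in just one page

    munmap(data, (size_t) st.st_size);
    close(fd);
    return 0;
}

Because mmap only faults pages in from disk as they are touched, load time is near-instant and the OS can transparently evict cold weight pages when RAM runs low.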
Taenk t1_jcs53iw wrote
The results for LLaMA-33B quantised to 3-bit are rather interesting. That would be an extremely potent LLM capable of running on consumer hardware. A pity that there are no test results for the 2-bit version.
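As rough arithmetic (counting only the packed weights and ignoring per-block scales, the KV cache and runtime overhead): 33 × 10⁹ parameters × 3 bits ≈ 12.4 GB, versus roughly 66 GB at fp16, so a 3-bit 33B model would squeeze into a 16 GB consumer machine.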
Taenk t1_jcs5eon wrote
A proper port to the neural engine would be especially interesting. There was one by Apple for Stable Diffusion.
360macky t1_jcsanpu wrote
Thanks!
starstruckmon t1_jcswg1g wrote
I've heard from some experienced testers that the 33B model is shockingly bad compared to even the 13B one, despite what the benchmarks say, and that we should either use the 65B one (very good, apparently) or stick to 13B/7B. Not for any technical reason, just the random luck/chance involved in training these models and the resulting quality.
I wonder if there's any truth to it. If you've tested it yourself, I'd love to hear what you thought.
schorhr t1_jct3v62 wrote
Thanks for your reply!
I have not used VS and CMake before, so I am probably making all the newbie mistakes. I've sorted out that some paths were not set, and that C:\mingw-32\bin\make.exe doesn't exist but is now mingw32-make.exe.
Now I get the error that
'C:/MinGW-32/bin/make.exe' '-?'
failed with:
C:/MinGW-32/bin/make.exe: invalid option -- ?
And from the few things I've found online I gathered it's because the MinGW version doesn't support the option and I should use VS instead. I am a bit lost. Every time I manage to fix one issue, there's another one. :-)
simpleuserhere OP t1_jct4k2z wrote
I have updated the readme with Windows build instructions, please check https://github.com/rupeshs/alpaca.cpp#windows
schorhr t1_jct58nc wrote
Thanks!
Both of the instructions (for Android, which I'm attempting, but also the Windows instructions) result in the "C:/MinGW-32/bin/make.exe: invalid option -- ?" error. I can't seem to figure out what make version I should use instead, or how to change that.
simpleuserhere OP t1_jct9btk wrote
For the Android build, please use Linux (tested with Ubuntu 20.04).
schorhr t1_jctb6tz wrote
Okay. I don't have the capacity right now (old laptop, disk too small to really use a second OS). I appreciate the help! I will once I get a new computer.
Taenk t1_jctdmvi wrote
I haven't tried the larger models, unfortunately. However, I wonder how the model could be "shockingly bad" despite having almost three times the parameter count.
starstruckmon t1_jcte34d wrote
🤷
Sometimes models just come out crap, like BLOOM, which has almost the same number of parameters as GPT-3 but is absolute garbage in any practical use case. Like a kid from two smart parents who turns out dumb. Just blind chance.
Or they could be wrong. 🤷
ninjasaid13 t1_jcu1odb wrote
I have a problem with
C:\Users\****\source\repos\alpaca.cpp\build>make chat
make: *** No rule to make target 'chat'. Stop.
and
C:\Users\****\source\repos\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0

cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -mfma -mf16c -mavx -mavx2 -c ggml.c -o ggml.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -c utils.cpp -o utils.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1
Using Windows.
simpleuserhere OP t1_jcu2ikl wrote
For Windows you need the Visual C++ compiler, so install the Visual Studio 2019 C++ build tools and follow the instructions here: https://github.com/rupeshs/alpaca.cpp#windows
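For context on the errors above: chat.cpp installs a Ctrl+C handler through the POSIX sigaction API, which MinGW's headers don't provide. A minimal sketch of the usual portability guard (illustrative only, not necessarily how this repo handles it):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static void sigint_handler(int signo) {
    (void) signo;
    exit(130); // clean exit on Ctrl+C
}

static void install_sigint_handler(void) {
#if defined(_WIN32)
    // Windows toolchains have no sigaction, but plain C signal() handles SIGINT
    signal(SIGINT, sigint_handler);
#else
    struct sigaction sigint_action;
    sigint_action.sa_handler = sigint_handler;
    sigemptyset(&sigint_action.sa_mask);
    sigint_action.sa_flags = 0;
    sigaction(SIGINT, &sigint_action, NULL);
#endif
}

int main(void) {
    install_sigint_handler();
    printf("press Ctrl+C to exit\n");
    getchar(); // block until input or interrupt
    return 0;
}

signal() is the lowest-common-denominator API that both MSVC and MinGW ship, which is why the Windows path falls back to it.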
ninjasaid13 t1_jcu9nfv wrote
I believe I already have the build tools. I still get this error:
C:\Users\****\Downloads\alpaca\alpaca.cpp>make chat
I llama.cpp build info:
I UNAME_S:  CYGWIN_NT-10.0
I UNAME_P:  unknown
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:       cc (GCC) 10.2.0
I CXX:      g++ (GCC) 10.2.0

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
chat.cpp: In function 'int main(int, char**)':
chat.cpp:883:26: error: aggregate 'main(int, char**)::sigaction sigint_action' has incomplete type and cannot be defined
  883 |         struct sigaction sigint_action;
chat.cpp:885:9: error: 'sigemptyset' was not declared in this scope
  885 |         sigemptyset (&sigint_action.sa_mask);
chat.cpp:887:47: error: invalid use of incomplete type 'struct main(int, char**)::sigaction'
  887 |         sigaction(SIGINT, &sigint_action, NULL);
chat.cpp:883:16: note: forward declaration of 'struct main(int, char**)::sigaction'
  883 |         struct sigaction sigint_action;
make: *** [Makefile:195: chat] Error 1
simpleuserhere OP t1_jcu9x05 wrote
Are you using Cygwin?
ninjasaid13 t1_jcuajwh wrote
Yes, I have Cygwin.
simpleuserhere OP t1_jcubta9 wrote
I haven't tried Cygwin with alpaca.cpp.
ninjasaid13 t1_jcubyue wrote
So it won't work? Do I need to install MinGW?
simpleuserhere OP t1_jcuc25e wrote
ninjasaid13 t1_jcufsqf wrote
I'm getting a new error
C:\Users\ninja\source\repos\alpaca.cpp>make chat
process_begin: CreateProcess(NULL, uname -s, ...) failed.
process_begin: CreateProcess(NULL, uname -p, ...) failed.
process_begin: CreateProcess(NULL, uname -m, ...) failed.
'cc' is not recognized as an internal or external command,
operable program or batch file.
'g++' is not recognized as an internal or external command,
operable program or batch file.
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -mfma -mf16c -mavx -mavx2
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
I LDFLAGS:
I CC:
I CXX:

g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat
process_begin: CreateProcess(NULL, g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC chat.cpp ggml.o utils.o -o chat, ...) failed.
make (e=2): The system cannot find the file specified.
Makefile:195: recipe for target 'chat' failed
make: *** [chat] Error 2
1stuserhere t1_jcuyofc wrote
How fast is the model on android, u/simpleuserhere?
pkuba208 t1_jcvmhhm wrote
Depends on the hardware
Art10001 t1_jcwfyw8 wrote
I heard MoE is bad. I have no sources sadly.
Art10001 t1_jcwg2pl wrote
CoreML.
Please review this link and the associated research paper: https://github.com/apple/ml-ane-transformers
Art10001 t1_jcwg5zg wrote
You can really see how phones beat 10-year-old computers, as revealed by their Geekbench 5 scores.
Art10001 t1_jcwg7bv wrote
Try installing MSYS2.
ninjasaid13 t1_jcwwgt5 wrote
now what?
pkuba208 t1_jcx3d9i wrote
Well... I run this model on a Raspberry Pi 4B, but you will need AT LEAST 8 GB of RAM.
baffo32 t1_jcxqr2i wrote
I visited CVPR last year and people were saying that MoE was mostly what was being used; I haven't tried these things myself though.
1stuserhere t1_jcxyj1o wrote
Pixel 6 or 7 (or other modern phones from the last 2-3 years)
Art10001 t1_jcy2jck wrote
I was asleep, my apologies for not replying earlier.
Run pacman -Syu
then pacman -S base-devel (the MSYS2 equivalent of build-essential)
then cd to the build directory and follow the instructions
Art10001 t1_jcy2sb5 wrote
Raspberry Pi 4 is far slower than modern phones.
Also, there was somebody else saying it probably actually uses 4-6 GB.
pkuba208 t1_jcy717u wrote
I know, but Android uses 3-4 GB of RAM itself. I run it myself, so I know that it currently uses 6-7 GB of RAM for the smallest model with 4-bit quantization.
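That matches a rough estimate: 7 × 10⁹ parameters × 4 bits ≈ 3.5 GB for the weights alone, and adding activations/KV cache plus the 3-4 GB the system keeps for itself lands right in that 6-7 GB range.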
pkuba208 t1_jcy7nxg wrote
It should be faster than 1 word per second. Judging by the fact that modern PCs run it at 5 words per second and a Raspberry Pi 4B runs it at 1 word per second, it should land somewhere near the 2.5 words per second mark.
Art10001 t1_jcy7rqs wrote
Yes, that's why it was tried on a Pixel 7, which has 8 GB of RAM and maybe even swap.
pkuba208 t1_jcy83gf wrote
I use swap too. For now it can only run on flagships, though. You have to have at least 8 GB of RAM, because running it on, say, 3 GB of RAM (with 3 GB used by the system) and 3-5 GB of swap may not even be possible, and if it is, it will be very slow and prone to crashing.
Board_Stock t1_jczly8z wrote
Hello, I've recently run alpaca.cpp on my laptop, but I want to give it a context window so that it can remember conversations, and to make it voice-activated using Python. Can someone guide me on this?
ommerike t1_jddjvvn wrote
Is there an APK out there to sideload? It would be fun to try on my Pixel 6 Pro without becoming an expert on how to go through the motions of the make stuff...
simpleuserhere OP t1_jcpo1px wrote
I have tested Alpaca 7B model on Android (Google Pixel 7).
https://github.com/rupeshs/alpaca.cpp