baffo32
baffo32 t1_jdo24su wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
It is the same thing. The Alpaca data is just further training data consisting of instructions and responses; continuing to train on it is called finetuning.
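To make that concrete, here is a minimal sketch of flattening Alpaca-style records into ordinary training text. The field names follow the published Alpaca JSON format, but treat them as assumptions if your copy differs:

```python
# a minimal sketch: alpaca-style records are just text for a causal LM.
# the field names ("instruction", "input", "output") follow the published
# alpaca json format; adjust them if your copy of the data differs.

def record_to_text(record: dict) -> str:
    """Flatten one instruction/response record into ordinary training text."""
    if record.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input.\n\n"
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            f"### Response:\n{record['output']}"
        )
    return (
        "Below is an instruction that describes a task.\n\n"
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['output']}"
    )

example = {
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet. 2. Exercise regularly. 3. Sleep well.",
}
print(record_to_text(example))  # ordinary text, trained on like any other corpus
```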
baffo32 t1_jdnppmp wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
this is the same task as instruction tuning. instruction tuning just uses specific datasets where instructions are followed. it's called "finetuning", but nowadays people are using adapters and peft to do this on low-end systems.
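a minimal sketch of that workflow, assuming the huggingface `peft` and `transformers` packages; the base model name and hyperparameters here are placeholders:

```python
# a minimal sketch of adapter-based instruction tuning with peft (LoRA).
# the base model name and hyperparameters are placeholders, not a recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "gpt2"  # stand-in; swap in llama etc. if you have the weights
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# wrap the frozen base model with small trainable adapter matrices
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the weights

# from here, train exactly as you would for finetuning, e.g. with
# transformers.Trainer; only the adapters receive gradients, which is
# what makes this feasible on low-end hardware.
```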
baffo32 t1_jd2carr wrote
Reply to comment by Ayacyte in [P] OpenAssistant is now live on reddit (Open Source ChatGPT alternative) by pixiegirl417
sign in to the website to contribute data and corrections. join the discord to contribute in other ways.
baffo32 t1_jcxqr2i wrote
Reply to comment by Art10001 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
i visited CVPR last year and people were saying that MoE (mixture of experts) was mostly what was being used; i haven't tried these things myself though
baffo32 t1_jcsy5mb wrote
Reply to [P] The next generation of Stanford Alpaca by [deleted]
Maybe set up the training code so different foundation models can be plugged in for finetuning; then it's just compute if somebody wants a different starting model.
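For example, a minimal sketch where the base model is just a string parameter (names here are placeholders):

```python
# a sketch of keeping the foundation model pluggable: everything is driven
# by a single model-name string, so swapping the base is just compute.
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_base(model_name: str):
    """Load any causal LM from the hub by name; the rest of the training
    code never needs to know which foundation model it got."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return model, tokenizer

# e.g. load_base("gpt2") today, load_base("some/other-base") tomorrow
```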
Note there are free interfaces to these models such as https://spellbook.scale.com/ . Also note there is a lot of data collected out there already.
baffo32 t1_jcronvh wrote
Reply to comment by Meddhouib10 in [Research] Alpaca 7B language model running on my Pixel 7 by simpleuserhere
- offloading and accelerating (moving some parts to memory-mapped disk or gpu ram; this can also make for quicker loading — see the sketch after this list)
- pruning (removing parts of the model that didn’t end up impacting outputs after training)
- further quantization below 4 bits
- distilling to a mixture of experts?
- factoring and distilling parts out into heuristic algorithms?
- finetuning to specific tasks (e.g. distilling/pruning out all information related to non-relevant languages or domains); this would likely make it very small
EDIT:
- numerous techniques published in papers over the past few years
- distilling into an architecture not limited by e.g. a constraint of being feed-forward
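for the offloading item above, a minimal sketch assuming the `transformers`, `accelerate`, and `bitsandbytes` packages; the model name is a placeholder, and quantized loading is shown at 8 bits just to illustrate the mechanism:

```python
# a minimal sketch of offloading plus quantization at load time, assuming
# transformers + accelerate + bitsandbytes are installed and a gpu exists.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                    # placeholder; use whichever base you mean
    device_map="auto",         # spread layers across gpu and cpu ram automatically
    offload_folder="offload",  # memory-map overflow layers to disk
    load_in_8bit=True,         # quantize linear layers on load
)
# pruning, distillation, and task-specific finetuning from the list above
# are separate training-time procedures and need their own pipelines.
```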
baffo32 t1_jc8jvoo wrote
Reply to [D] ChatGPT without text limits. by spiritus_dei
start some code! invite contributors! :)
baffo32 t1_jc8jgd4 wrote
Reply to comment by xKraazY in [D] ChatGPT without text limits. by spiritus_dei
i’m thinking, with practice and research, these abstractions could be done in dynamic ways that can pivot and diversify to new norms
baffo32 t1_jc8j4w0 wrote
Reply to comment by big_ol_tender in [D] ChatGPT without text limits. by spiritus_dei
thoughts: each approach has generally something unique that can make it useful, and approaches usually have ways in which they can merge
baffo32 t1_j9j4qbh wrote
Reducing the information a system can represent forces it to learn generalized patterns rather than memorize events. In machine learning this tends to improve transfer somewhat.
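The classic small-scale illustration: a model with less capacity cannot memorize the noise, so it is forced to capture the pattern. A self-contained numpy sketch:

```python
# a toy illustration: restricting capacity forces generalization.
# fit noisy samples of a quadratic with a small and a large polynomial.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 12)
y_train = x_train**2 + rng.normal(0, 0.05, x_train.shape)  # true pattern + noise
x_test = np.linspace(-1, 1, 100)
y_test = x_test**2

small = np.polyfit(x_train, y_train, deg=2)    # low capacity: must generalize
large = np.polyfit(x_train, y_train, deg=11)   # enough capacity to memorize

for name, coeffs in [("deg 2", small), ("deg 11", large)]:
    err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(name, "held-out mse:", err)  # the small model transfers better
```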
baffo32 t1_j90uucx wrote
Reply to comment by hpstring in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
it's important if you're publishing large software packages, of course. lots of hobbyists also learn in the field.
baffo32 t1_j8zd3ge wrote
Reply to comment by drinkingsomuchcoffee in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
You're not the bad guy; I'm guessing it's maybe a community of data workers who've never had a reason to value DRY.
baffo32 t1_j8zbuup wrote
Reply to comment by baffo32 in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
i think by "centralized" they mean what they imagine dry looking like: putting code in one place rather than spreading it out. the word isn't usually used that way. it's a reasonable expression though; people usually centralize components so there is one organized place to go to in order to access them.
baffo32 t1_j8zbmua wrote
Reply to comment by hpstring in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
dry is a very basic software engineering principle that means to include only one copy of every sequence of code. it looks like machine learning people did not learn this, as they weren't trained as software engineers. DRY stands for "don't repeat yourself"; if it isn't respected, software gets harder and slower to maintain, improve, or bugfix the larger and older it gets.
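a tiny illustration of what respecting it looks like (the repeated sequence here is made up for the example):

```python
# a tiny illustration of DRY: the same preprocessing sequence was pasted
# into two functions; factoring it out leaves one copy to maintain.

def _prepare(text: str, max_len: int = 512) -> list[str]:
    """The single shared copy of the repeated sequence."""
    return text.lower().split()[:max_len]

def encode_for_training(text: str) -> list[str]:
    return _prepare(text)  # was: text.lower().split()[:512]

def encode_for_eval(text: str) -> list[str]:
    return _prepare(text)  # was: a second, slowly drifting copy of the same line

# a bugfix in _prepare now fixes training and eval at once.
```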
baffo32 t1_j8vt8ob wrote
Reply to comment by [deleted] in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
if we start one we’ll either make something good or bump into the project we accidentally duplicated as we get popular
baffo32 t1_j8vsq9s wrote
Reply to comment by drinkingsomuchcoffee in [D] HuggingFace considered harmful to the community. /rant by drinkingsomuchcoffee
looks like there is emotional or funded influence here: counterintuitive votes, strange statements stated as facts
Duplicated code makes a very, very _unhackable project_, because one has to learn the code-duplicating systems and add functionality to them for every factorization. It does make _hackable examples_, but the codebase doesn't seem to draw the line anywhere at all.
The library looks like it was made entirely without an experienced lead software engineer. As a corporation they should have one.
HuggingFace, please understand that software developers find DRY to be hackable; the two terms usually go together. Stating it the other way around reads like a contradiction, like fake news trying to manipulate people by ignoring facts.
baffo32 t1_j8vrc4d wrote
HuggingFace recently implemented a PEFT library that reimplements the core functionality of AdapterHub. AdapterHub had reached out to them to contribute and integrate work, but this failed in February of last year ( https://github.com/adapter-hub/adapter-transformers/issues/65#issuecomment-1031983053 ). Hugging Face was asked how the new work related to the old, and it was so sad to see they had done it completely independently, completely ignoring the past outreach ( https://github.com/huggingface/peft/issues/92#issuecomment-1431227939 ). The reply reads to me as if they are implementing the same featureset, unaware that it is the same one.
I would like to know why this didn't go better. The person who spearheaded AdapterHub for years appears to be one of the most prominent PEFT researchers, with published papers. It looks as if they have been tossed out in the snow. I can only imagine management never learned of the outreach, or, equally likely, they have no idea how to work with other projects to refactor concepts from multiple codebases together, or don't consider that a good thing to do. It would have been nice to at least see lip service paid.
The library and hub are not complex. Is there a community alternative conducive to code organization or do we need to start yet another?
Sometimes I think it would make sense to train language models to transform the code, organize it, merge things, using techniques like langchain and chatgpt, to integrate future work into a more organized system.
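A rough sketch of what I mean, assuming the pre-1.0 `openai` python package and an API key in the environment; the prompt and inputs are placeholders, not a working refactoring pipeline:

```python
# a rough sketch of LLM-assisted refactoring: ask a chat model to merge two
# duplicated implementations into one. assumes the pre-1.0 openai package
# and OPENAI_API_KEY set in the environment; inputs are placeholders.
import openai

def propose_merge(source_a: str, source_b: str) -> str:
    prompt = (
        "These two modules duplicate the same functionality. "
        "Propose one merged, deduplicated module:\n\n"
        f"--- module A ---\n{source_a}\n\n--- module B ---\n{source_b}\n"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

# a human (or a test suite) still has to review the proposal before merging.
```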
Projects where everyone can work together are best.
baffo32 t1_jdrhj77 wrote
Reply to comment by light24bulbs in [R] Hello Dolly: Democratizing the magic of ChatGPT with open models by austintackaberry
I was still confused by your response, and I'm thinking that if you wanted a model to behave as if you had given it different pretraining data, you would probably first finetune on the different bulk data, and then after this finetune on the target task such as instruction following.
Instruction following is indeed just predicting the next word: on data where the next word is obedient to the instructions preceding it.
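Concretely: format the instruction and its obedient response as one string, and the causal-LM loss over it is ordinary next-token prediction. A sketch assuming `transformers` and `torch`, with gpt2 as a stand-in base:

```python
# a sketch showing instruction tuning is plain next-word prediction:
# the instruction and its obedient response become one token sequence,
# and the labels are simply the inputs (shifted internally by the model).
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "gpt2"  # stand-in; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

text = "### Instruction:\nSay hello.\n\n### Response:\nHello!"
batch = tokenizer(text, return_tensors="pt")

# causal LMs in transformers shift labels internally, so labels == input_ids
out = model(**batch, labels=batch["input_ids"])
print(out.loss)  # ordinary next-token cross-entropy; backprop this to finetune
```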