rshah4
rshah4 t1_je0crbz wrote
Reply to comment by matus_pikuliak in [P] ChatGPT Survey: Performance on NLP datasets by matus_pikuliak
I agree, these baselines are useful. What I think we should push for is more human baselines on these benchmarks. That would help us figure out how far we have left to go.
rshah4 t1_je00t24 wrote
Reply to comment by crazyvaclav3 in [D] FOMO on the rapid pace of LLMs by 00001746
Here is my video: https://youtu.be/YKCtbIJC3kQ
Here is the blog post it's based on: https://www.philschmid.de/fine-tune-flan-t5-peft
Efficient Large Language Model training with LoRA and Hugging Face
I should also post in ML - I will do that later today
rshah4 t1_jdzyo8u wrote
Nice work! -- How were the results when comparing ChatGPT zero-shot versus few-shot? I have noticed that with LLMs you can often get an improvement from few-shot learning (giving the model a few examples in the prompt).
I am not surprised that we don't see much of an improvement over GPT-3 on traditional NLP tasks. It seems much of OpenAI's focus is not on these benchmarks but on making the results more useful to people (all the instruction tuning / RLHF work).
https://arxiv.org/pdf/2209.12356.pdf
https://arxiv.org/pdf/2301.13848.pdf
Also, for real-world use, ChatGPT doesn't necessarily need to beat a fine-tuned SOTA model. ChatGPT is much easier to use than having to fine-tune a more traditional model.
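To make the few-shot idea concrete, here is a minimal sketch of building a few-shot prompt. The sentiment task, labels, and helper name are illustrative placeholders, not from any specific API:

```python
# Hypothetical sketch of few-shot prompting: prepend a few labeled
# examples to the query so the LLM can infer the task from them.

def build_few_shot_prompt(examples, query):
    """Format (text, label) pairs as in-context examples, then append the query."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A forgettable film.")
print(prompt)
```

In the zero-shot case you would send only the final query (plus an instruction); few-shot just adds the labeled examples above it.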
rshah4 t1_jdy7o2h wrote
Reply to comment by dimem16 in [D] FOMO on the rapid pace of LLMs by 00001746
Here is my video: https://youtu.be/YKCtbIJC3kQ
Here is the blog post it's based on: https://www.philschmid.de/fine-tune-flan-t5-peft
Efficient Large Language Model training with LoRA and Hugging Face
rshah4 t1_jdy111n wrote
How about using embeddings from open-source models, like those on Hugging Face? That would save you the embedding costs.
rshah4 t1_jdy0mjg wrote
Reply to [D] FOMO on the rapid pace of LLMs by 00001746
I wouldn't get worried about training these models from scratch. Very few people are going to need those skills. My suggestion is to focus on learning how to use these models (prompting, chained prompting à la LangChain) and then maybe fine-tuning. Fine-tuning these models is going to be key, and people are just now starting to make those techniques widely usable. I just finished a video on using PEFT for fine-tuning an LLM using LoRA. So don't stress, it's very early and the tools are just starting to become easier to use.
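For intuition on why LoRA makes fine-tuning so cheap, here is a minimal NumPy sketch of the core idea (this is not the PEFT library API, just the math it implements): instead of updating a full d x d weight matrix, you learn a low-rank update B @ A with rank r much smaller than d.

```python
import numpy as np

# Sketch of the LoRA idea: freeze the pretrained weight W and train
# only two small factors A (r x d) and B (d x r), with rank r << d.
d, r = 768, 8
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # zero-initialized so the update starts at 0

alpha = 16                           # scaling hyperparameter
W_adapted = W + (alpha / r) * (B @ A)  # effective weight used at inference

full_params = W.size                 # 768 * 768 = 589,824
lora_params = A.size + B.size        # 2 * 768 * 8 = 12,288 (~2%)
print(f"full fine-tune params: {full_params}, LoRA params: {lora_params}")
```

That roughly 98% reduction in trainable parameters per adapted matrix is what lets PEFT fine-tune large models on modest GPUs.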
rshah4 t1_jdxhz3d wrote
Reply to comment by big_ol_tender in [D] Instruct Datasets for Commercial Use by JohnyWalkerRed
I agree with the sentiments here and don't think it's OK to use some of these datasets that appear to violate OpenAI's terms. I dealt with it by making a funny video: https://youtu.be/31u88EDmIwc
rshah4 t1_jdxhesh wrote
It’s possible to pay one of the labeling companies for an instruction dataset. Right now most companies aren’t donating 50k+ datasets to the public, but I expect this will change soon.
rshah4 t1_jbzvzsl wrote
Is it open source?
rshah4 t1_jbtsl7o wrote
Reply to comment by rshah4 in [Discussion] Compare OpenAI and SentenceTransformer Sentence Embeddings by Simusid
Also, not sure about a recent comparison, but Nils Reimers also tried to empirically analyze OpenAI's embeddings here: https://twitter.com/Nils_Reimers/status/1487014195568775173
He found across 14 datasets that the OpenAI 175B model is actually worse than a tiny MiniLM 22M parameter model that can run in your browser.
rshah4 t1_jbtfzig wrote
Two quick tips for finding the best embedding models:
Sentence Transformers documentation compares models: https://www.sbert.net/docs/pretrained_models.html
Massive Text Embedding Benchmark (MTEB) Leaderboard has 47 different models: https://huggingface.co/spaces/mteb/leaderboard
These will help you compare different models across a lot of benchmark datasets so you can figure out the best one for your use case.
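Once you've shortlisted models from those leaderboards, a common sanity check is to embed a few of your own sentence pairs with each candidate and compare cosine similarities. Here is a minimal sketch with placeholder vectors standing in for real model outputs (in practice you'd get the embeddings from a library like Sentence Transformers):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for model outputs; with a real model,
# each would be the embedding of one of your own sentences.
emb_query = [0.2, 0.7, 0.1]
emb_relevant = [0.25, 0.65, 0.05]
emb_unrelated = [0.9, -0.3, 0.4]

print(cosine_similarity(emb_query, emb_relevant))   # high: similar direction
print(cosine_similarity(emb_query, emb_unrelated))  # low: different direction
```

A good embedding model for your use case should score your known-similar pairs well above your known-dissimilar ones.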
rshah4 t1_j9cndve wrote
Reply to [D] Compare open source LLMs by President_Xi_
Check out this great post, which includes fine-tuning Flan-T5: Language Models vs. The SAT Reading Test:
https://jeffq.com/blog/language-models-vs-the-sat-reading-test/
rshah4 t1_j974uys wrote
I believe if you go through Azure to access the OpenAI models, Microsoft has said it will delete data after 30 days. You could also consider using open-source models like Flan-T5, but then you need to host and set up the model yourself.
rshah4 t1_j1nbfkn wrote
I am with you. While I generally favor trees for tabular data, there are some advantages of deep learning as you mentioned. I haven't heard many success stories out of industry for moving away from trees to deep learning, outside of Sean Taylor talking about using deep learning at Lyft. My guess is the extra complexity of using deep learning is probably only useful in a small set of use cases.
Deep learning is probably also useful in multimodal use cases. If people are using deep learning for tabular because of these advantages, I would love to hear about it.
rshah4 t1_izbymy6 wrote
Reply to comment by kayhai in [Discussion] No-code ML for engineers by kayhai
If you have repeatable use cases, you can build a simple app (e.g., with Streamlit) that applies the ML. That way you can set some boundaries on how they are using ML. Glad you get it.
rshah4 t1_izb32qr wrote
Reply to [Discussion] No-code ML for engineers by kayhai
This is tough. I used to work for a large AutoML company that worked with oil and gas companies. It's difficult and often frustrating for non-ML people to use AutoML tools. To use ML, you need to know how to set up your problem: what the target is, how to partition the data, and so on. It takes an understanding of ML to do this. Otherwise you will end up with people with 20 rows of data wanting to make a prediction, or trying to use ML for something a simple rule would do, or building a multilabel model where a binary model would have been better.
My suggestion is to keep them in the descriptive world, and if they want to move to ML, someone needs to introduce ML concepts to them before they start using the tools.
rshah4 t1_je4ved0 wrote
Reply to comment by royalemate357 in [D] Do model weights have the same license as the model architecture? by murphwalker
I agree with this as well. Just because Meta isn't enforcing their license, this does not mean the license has gone away. At some point in the future, Meta could enforce it.