cthorrez t1_jcvhg41 wrote
Reply to comment by MysteryInc152 in [P] The next generation of Stanford Alpaca by [deleted]
RWKV is recurrent right? Why is it token limited?
cthorrez t1_jcmhg8m wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Because the algorithm seems to only work on inference. Probably due to memory management of the cached activations or something. (Idk the actual technical reasons)
cthorrez t1_jclwi3d wrote
Reply to comment by Competitive-Rub-1958 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
Fast inference is also important to anyone who wants to deploy these things.
cthorrez t1_jc5s8ag wrote
Basically I would just make sure the metrics being compared are computed the same way: same numerator and denominator, e.g. summing vs. averaging, over the batch vs. over the epoch. If the datasets are the same and the type of metric you are computing is the same, it's comparable.
The implementation details just become part of the comparison.
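For example, here's a toy sketch (made-up numbers, nothing from your setup) of how the same per-example losses give three different "average loss" values depending on how you normalize:

```python
import numpy as np

# Toy example: the same per-example losses, grouped into unequal batches.
batches = [np.array([0.2, 0.4, 0.6]), np.array([1.0])]
all_losses = np.concatenate(batches)

epoch_mean = all_losses.mean()                              # sum over epoch / num examples -> 0.55
mean_of_batch_means = np.mean([b.mean() for b in batches])  # average of per-batch averages -> 0.70
epoch_sum = all_losses.sum()                                # no normalization at all       -> 2.20

print(epoch_mean, mean_of_batch_means, epoch_sum)
```

All three are legitimate numbers, they're just only comparable to other numbers computed the same way.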
cthorrez t1_ja8d6oc wrote
Reply to comment by gniorg in [D] Is RL dead/worth researching these days? by [deleted]
Not exactly. In batch RL the data they train on are real (state, action, next state, reward) tuples from real agents interacting with real environments.
They improve the policy offline. In RLHF there actually is no env. And the policy is just standard LLM decoding.
cthorrez t1_ja70abd wrote
Reply to comment by PassingTumbleweed in [D] Is RL dead/worth researching these days? by [deleted]
I find it a little weird that RLHF is considered to be reinforcement learning.
The human feedback is collected offline and forms a static dataset. They use the objective from PPO, but it's really more of a form of supervised learning. There isn't an agent interacting with an env; the "env" is just sampling text from a static dataset, and the reward is the score from a neural net trained on a static dataset.
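To make that concrete, here's a rough toy sketch (all the names and stand-ins are mine, not any actual RLHF library) of what a "rollout" amounts to:

```python
import random

# Toy stand-ins: a fixed prompt dataset, a "policy" that is just LLM decoding,
# and a frozen reward model that was itself trained on a static preference dataset.
prompt_dataset = ["summarize: ...", "translate: ...", "answer: ..."]
policy_lm = lambda prompt: prompt + " <generated response>"
reward_model = lambda prompt, response: float(len(response))

def rlhf_rollout():
    prompt = random.choice(prompt_dataset)   # the "env reset" is sampling a static dataset
    response = policy_lm(prompt)             # the "action" is ordinary decoding
    reward = reward_model(prompt, response)  # the "reward" is a frozen, offline-trained net
    return prompt, response, reward          # these tuples feed the PPO-style objective

print(rlhf_rollout())
```

There's no state the policy can influence beyond its own generated tokens, which is why it feels closer to supervised learning to me.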
cthorrez t1_j9xstlw wrote
Reply to comment by gsvclass in [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
People are rushing to deploy LLMs in search, summarization, virtual assistants, question answering and countless other applications where correct answers are expected.
The reason they want to get to a latent space close to the answer is that they want the LLM to output the correct answer.
cthorrez t1_j9xrv1v wrote
Reply to comment by gsvclass in [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
The source I linked in the comment you linked and then deleted.
cthorrez t1_j9xrl8a wrote
Reply to comment by gsvclass in [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
I think it's not suitable because it isn't really related to the process of a machine learning anything. It seems to me to belong to the field of human computer interaction.
cthorrez t1_j9xrguh wrote
Reply to comment by gsvclass in [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
That comment is very over the top sarcasm. You would have realized that if you had checked the source I linked.
cthorrez t1_j9xgmx1 wrote
Reply to comment by gsvclass in [P] Minds - A JS library to build LLM powered backends and workflows (OpenAI & Cohere) by gsvclass
May be an unpopular opinion these days, but I don't think prompt engineering is a suitable topic for /r/MachineLearning.
cthorrez t1_j9xahu6 wrote
Reply to comment by Maximum-Ruin-9590 in [D] Is validation set necessary for non-neural network models, too? by osedao
I just have one dataset too. I train, pick hyperparameters, and test on the same data. Nobody can get better metrics than me. :D
cthorrez t1_j9x8x1b wrote
The methods and models are identical, yep. It's basically just to denote whether the labels were assigned by a human or determined automatically.
cthorrez t1_j9x772h wrote
Reply to [D] Best Way to Measure LLM Uncertainty? by _atswi_
Along with each prompt, just put: "And at the end of your response, state on a scale from one to ten how confident you are in your answer"
This works amazingly and is very accurate. source
It has the added bonus that you can get confidence intervals on your confidence intervals just by asking how confident it is in its estimation of its confidence.
cthorrez t1_j9ir0lx wrote
Reply to comment by thomasahle in Unit Normalization instead of Cross-Entropy Loss [Discussion] by thomasahle
Have you tried it with, say, an MLP or small convnet on CIFAR-10? I think that would be the next logical step.
cthorrez t1_j9iq35y wrote
> test loss decreased
What function are you using to evaluate test loss? cross entropy or this norm function?
cthorrez t1_j7yjxdr wrote
Reply to comment by pseudonerv in [D] Using LLMs as decision engines by These-Assignment-936
That's not reasoning. It's spitting out semi-random moves. If you keep giving it more and more chances it increases the probability of getting a set which has some legal moves.
cthorrez t1_j6xg1ke wrote
Reply to comment by Nhabls in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Microsoft paid $1B to use GPT-3.
cthorrez t1_j67csjx wrote
Reply to comment by Complex_Candidate_28 in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
That's an interesting topic that I think deserves further investigation. On the surface it sounds like the size of the LM impacts the mechanism by which the LM is able to "secretly perform gradient descent".
Is finetuning similarly unstable for small sized LMs?
cthorrez t1_j67aa39 wrote
Reply to comment by Complex_Candidate_28 in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
If the goal is the mechanism rather than the performance, why tune the seed for performance in the first place? The examples used don't change the mechanism.
cthorrez t1_j63uc5a wrote
Reply to [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
I have an issue with the experiments.
> For ICL, we fix the number of demonstration examples to 32 and tune the random seed for each task to find a set of demonstration examples that achieves the best validation performance. For finetuning, we use the same demonstration examples for ICL as the training examples and use SGD as the optimizer
They go through a set of random seeds to pick the "best" possible samples for in context learning, and then use the same set of examples for fine tuning. I think this biases the results in favor of in context learning.
A fairer way to do this would be to use a truly random set of examples, or to use the same approach and tune the seed to find the "best" set of examples for finetuning as well.
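To illustrate the bias with a toy simulation (the numbers are made up, this isn't the paper's setup): taking the best of many seeds inflates the score purely through selection over noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_seeds = 20, 30

# Pretend each seed's demonstration set gives a noisy validation accuracy around 0.70.
val_acc = rng.normal(loc=0.70, scale=0.03, size=(n_tasks, n_seeds))

one_random_seed = val_acc[:, 0].mean()        # a single untuned seed per task
best_tuned_seed = val_acc.max(axis=1).mean()  # tune the seed per task and keep the best

print(f"random seed: {one_random_seed:.3f}  tuned seed: {best_tuned_seed:.3f}")
# The tuned number comes out higher even though every seed is equally good in
# expectation, so tuning the demos only for ICL stacks the deck against finetuning.
```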
cthorrez t1_j1ossr4 wrote
Reply to comment by CriticalTemperature1 in [D] The case for deep learning for tabular data by dhruvnigam93
for datasets with ~1000 rows and 10 columns iirc
cthorrez t1_j1dmafb wrote
Reply to [D] Simple Questions Thread by AutoModerator
Quick poll to see what the opinions are here:
Is it ok to tune the random seed and pick the seed which is best on the validation set?
cthorrez OP t1_iqpko2f wrote
Reply to comment by gdahl in [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
Thanks for the reply and the resource! You're right about the relatively recent influx of people who enter the ML field via deep learning first. Seems like most of the intro material focuses on logistic sigmoid based methods.
That said, do you think there is a fundamental reason why other log-likelihood-based methods, such as the probit and Poisson models you mentioned, haven't caught on in the deep learning field? Is it just that probit doesn't give an edge in classification, and such a large portion of use cases don't require anything besides a classification-based loss?
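For instance (a minimal PyTorch sketch of what I mean, not something I've benchmarked), swapping the logistic link for a probit link is mechanically trivial; the Bernoulli log likelihood machinery stays the same:

```python
import torch
import torch.nn.functional as F

z = torch.randn(8, requires_grad=True)   # stand-in for a model's raw outputs (logits)
y = torch.randint(0, 2, (8,)).float()    # binary labels

p_logistic = torch.sigmoid(z)                           # logistic link
p_probit = torch.distributions.Normal(0.0, 1.0).cdf(z)  # probit link: standard normal CDF

# Both losses are negative Bernoulli log likelihoods; only the link function differs.
loss_logistic = F.binary_cross_entropy(p_logistic, y)
loss_probit = F.binary_cross_entropy(p_probit, y)
loss_probit.backward()   # autograd handles the probit link just fine
```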
cthorrez t1_jecv5ox wrote
Reply to [D] What are your top 3 pain points as an ML developer in 2023? by General-Wing-785
1. Code quality of my own and my team's code
2. The reliability of the engineering platforms we use (Spark, GPU clusters, CI/CD build pipelines)
3. The correctness and completeness of the data we ingest