Co0k1eGal3xy
Co0k1eGal3xy t1_jdqfxcr wrote
Reply to comment by tdgros in Is it possible to merge transformers? [D] by seraphaplaca2
- Most stable diffusion UIs DO merge weights by averaging them
- Averaging weights between checkpoints works really well with CLIP fine-tuning, improving performance over both checkpoints for their respective validation sets. https://github.com/mlfoundations/wise-ft
- Git Re-Basin found that their method of merging weights works even for checkpoints with completely different pretraining data and init weights, improving accuracy on a mixed validation set over using either model alone. https://arxiv.org/abs/2209.04836
You're right that merging the model outputs has higher quality than merging the weights, but OP was asking if it was possible and it is very much possible if the weight tensors have the same shape.
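For what it's worth, the naive weight averaging that those UIs do is just a few lines. A minimal sketch in PyTorch, assuming the two checkpoints share an architecture (the `nn.Linear` models here are toy stand-ins for real fine-tuned checkpoints):

```python
import torch
import torch.nn as nn

def average_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys and shapes."""
    assert sd_a.keys() == sd_b.keys()
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Toy stand-ins for two fine-tuned checkpoints of the same architecture.
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)

merged = nn.Linear(4, 2)
merged.load_state_dict(average_state_dicts(model_a.state_dict(), model_b.state_dict()))
```

Sweeping `alpha` between 0 and 1 is exactly the interpolation that wise-ft evaluates; Git Re-Basin adds a permutation step before averaging so the two sets of neurons line up first.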
Co0k1eGal3xy t1_jbzi8wc wrote
Reply to comment by TemperatureAmazing67 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
- Double Descent: models with more parameters are MORE data-efficient.
- Most of these LLMs barely complete 1 epoch, so there is currently no concern about overfitting.
Co0k1eGal3xy t1_iscd0no wrote
Reply to comment by master3243 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>In the absence of more detail such as the dynamics of the shapes
Baseballs have a standard diameter and shape.
It's theoretically possible that the heavier baseball has a "furry" surface or something like that, but that's such an unlikely case I didn't consider it when reading the paper.
>it's a pretty big fail that it can't answer this properly.
I emailed the authors and they said "there could be some pre conditions we have not presented in the screenshot" and that they would address it when they released a dataset.
Sounds like it's all sorted out now. No harm done.
Co0k1eGal3xy t1_isc5m6k wrote
Reply to comment by master3243 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>But what about the basketball and the bowling ball? Shouldn't they have different accelerations? Technically, yes.
>
>[...]
>
>it turns out that there are many situations where a heavier object does indeed hit the ground before a lighter object (because of air resistance).
Your link says the heavy baseball and the light baseball would fall at different rates.
Co0k1eGal3xy t1_isa8g7i wrote
Reply to comment by eigenlaplace in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>current language models (LMs) miss the grounded experience of humans in the real-world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning.
That is my whole point. This paper is trying to avoid "planet Question" and make language models work in the real world instead.
I'm not interested in arguing over this. The paper is good, it just needs a minor correction in a future revision.
Co0k1eGal3xy t1_isa7e7b wrote
Reply to comment by eigenlaplace in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
I live on planet Earth, where most places have air. It is assumed that there is air unless mentioned otherwise.
Co0k1eGal3xy t1_isa44ws wrote
Reply to comment by Lajamerr_Mittesdine in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>because they don't provide the assumptions necessary to arrive at a complete solution.
I agree, but when the atmosphere is not mentioned, the default should be updated to STP (0°C temperature and 101.325 kPa pressure) in the future.
Co0k1eGal3xy t1_isa1gbn wrote
Reply to comment by Even_Tangerine_800 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
Oh I agree 100%. This paper is fantastic! (and it's an easy fix)
I definitely want to see further research in this, but the comparison they show here is probably not the comparison they wanted to show haha.
Co0k1eGal3xy t1_isa0dvo wrote
Reply to comment by Even_Tangerine_800 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
The heavier baseball falling to the ground faster is the correct answer. Maybe you misread my post?
It is a shame none of them mention air resistance.
Co0k1eGal3xy t1_is9yf8p wrote
Reply to [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>Two baseballs X and Y are released from rest at the same height.
>
>X is heavier than Y.
>
>Which baseball will fall to the ground faster?
Isn't Mind's Eye the ONLY wrong answer?
Acceleration due to gravity is constant, but the opposing force from air resistance is roughly proportional to the air displaced and does not change with mass.
I mean, this is the whole point of Apollo 15's test on the moon.
All of them give wrong explanations, but Mind's Eye is the only one that incorrectly claims they will fall at the same rate under normal real-world conditions.
Proof: Brian Cox visits the world's biggest vacuum | Human Universe - BBC
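You can sanity-check this numerically too. A rough sketch that Euler-integrates a sphere falling from rest with quadratic air drag (the masses, drop height, and drag constants below are illustrative, not from the paper):

```python
import math

def time_to_fall(mass, height=100.0, radius=0.0366, rho=1.225, c_d=0.47, g=9.81, dt=1e-3):
    """Euler-integrate a sphere falling from rest with quadratic air drag."""
    area = math.pi * radius ** 2
    v, y, t = 0.0, height, 0.0
    while y > 0:
        drag = 0.5 * rho * c_d * area * v * v  # drag depends on speed and size, not mass
        a = g - drag / mass                    # same drag force decelerates a heavier ball less
        v += a * dt
        y -= v * dt
        t += dt
    return t

heavy = time_to_fall(mass=0.500)  # hypothetical heavier baseball, 500 g
light = time_to_fall(mass=0.145)  # regulation baseball, ~145 g
```

With air resistance the heavier ball lands first (`heavy < light`); delete the drag term and both times collapse to the vacuum value sqrt(2h/g), which is the Apollo 15 / Brian Cox result.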
Co0k1eGal3xy t1_is6xpmg wrote
How does the loss of converged models compare?
Removing parameters is similar to decreasing the learning rate, as far as I remember, so you can't compare them during early training stages.
Co0k1eGal3xy t1_jdqgwlh wrote
Reply to Is it possible to merge transformers? [D] by seraphaplaca2
BioBERT base and LegalBERT use the same architecture, so a technique like Git Re-Basin would improve performance over using just one model or the other. However, if you want to merge the models and get the best of both, you should retrain on a merged dataset or use a model ensemble instead (i.e. load and run both models and intelligently pick which model to listen to for each type of data).
You cannot (easily) merge BioBERT large, since that checkpoint uses a custom vocabulary, but BioBERT base looks perfectly fine.
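A toy sketch of the ensemble option, where the "intelligent pick" is a hypothetical keyword router and the experts are stand-in functions for BioBERT/LegalBERT inference (everything here is illustrative, not a real API):

```python
def keyword_router(text, bio_keywords=("gene", "protein", "patient", "disease")):
    """Toy domain router; a real system might use a small classifier instead."""
    return "bio" if any(k in text.lower() for k in bio_keywords) else "legal"

def ensemble_predict(text, experts):
    """Run only the expert whose domain matches the input."""
    return experts[keyword_router(text)](text)

# Stand-ins for BioBERT / LegalBERT inference functions.
experts = {
    "bio": lambda t: ("bio", t),
    "legal": lambda t: ("legal", t),
}
```

The upside over weight merging is that each model stays exactly as good as it was on its own domain; the downside is you pay the memory and compute cost of both models.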