Co0k1eGal3xy
Co0k1eGal3xy t1_jdqfxcr wrote
Reply to comment by tdgros in Is it possible to merge transformers? [D] by seraphaplaca2
- Most stable diffusion UIs DO merge weights by averaging them
- Averaging weights between checkpoints works really well with CLIP fine-tuning, improving performance over both checkpoints for their respective validation sets. https://github.com/mlfoundations/wise-ft
- Git Re-Basin found that their method of merging weights works even for checkpoints with completely different pretraining data and init weights, improving accuracy on a mixed validation set over using either model alone. https://arxiv.org/abs/2209.04836
You're right that merging the model outputs has higher quality than merging the weights, but OP was asking if it was possible and it is very much possible if the weight tensors have the same shape.
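For what it's worth, the naive weight averaging that those UIs do is just a few lines. A minimal sketch in PyTorch, assuming the two checkpoints share an architecture (the `nn.Linear` models here are toy stand-ins for real fine-tuned checkpoints):

```python
import torch
import torch.nn as nn

def average_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys and shapes."""
    assert sd_a.keys() == sd_b.keys()
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# Toy stand-ins for two fine-tuned checkpoints of the same architecture.
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)

merged = nn.Linear(4, 2)
merged.load_state_dict(average_state_dicts(model_a.state_dict(), model_b.state_dict()))
```

Sweeping `alpha` between 0 and 1 is exactly the interpolation that wise-ft evaluates; Git Re-Basin adds a permutation step before averaging so the two sets of neurons line up first.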
Co0k1eGal3xy t1_jbzi8wc wrote
Reply to comment by TemperatureAmazing67 in [D] Is anyone trying to just brute force intelligence with enormous model sizes and existing SOTA architectures? Are there technical limitations stopping us? by hebekec256
- Double Descent: models with more parameters are MORE data-efficient.
- Most of these LLMs barely complete 1 epoch, so there is currently no concern about overfitting.
Co0k1eGal3xy t1_iscd0no wrote
Reply to comment by master3243 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>In the absence of more detail such as the dynamics of the shapes
Baseballs have a standard diameter and shape.
It's theoretically possible that the heavier baseball has a "furry" surface or something like that, but that's such an unlikely case I didn't consider it when reading the paper.
>it's a pretty big fail that it can't answer this properly.
I emailed the authors and they said "there could be some pre conditions we have not presented in the screenshot" and that they would address it when they released a dataset.
Sounds like it's all sorted out now. No harm done.
Co0k1eGal3xy t1_isc5m6k wrote
Reply to comment by master3243 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>But what about the basketball and the bowling ball? Shouldn't they have different accelerations? Technically, yes.
>
>[...]
>
>it turns out that there are many situations where a heavier object does indeed hit the ground before a lighter object (because of air resistance).
Your link says the heavy baseball and the light baseball would fall at different rates.
Co0k1eGal3xy t1_isa8g7i wrote
Reply to comment by eigenlaplace in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>current language models (LMs) miss the grounded experience of humans in the real-world -- their failure to relate language to the physical world causes knowledge to be misrepresented and obvious mistakes in their reasoning.
That is my whole point. This paper is trying to avoid "planet Question" and make language models work in the real world instead.
I'm not interested in arguing over this. The paper is good, it just needs a minor correction in a future revision.
Co0k1eGal3xy t1_isa7e7b wrote
Reply to comment by eigenlaplace in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
I live on planet Earth, where most places have air. It is assumed that there is air unless mentioned otherwise.
Co0k1eGal3xy t1_isa44ws wrote
Reply to comment by Lajamerr_Mittesdine in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>because they don't provide the assumptions necessary to arrive at a complete solution.
I agree, but when the atmosphere is not mentioned, the default should be updated to STP (0°C temperature and 101.325 kPa pressure) in the future.
Co0k1eGal3xy t1_isa1gbn wrote
Reply to comment by Even_Tangerine_800 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
Oh I agree 100%. This paper is fantastic! (and it's an easy fix)
I definitely want to see further research in this, but the comparison they show here is probably not the comparison they wanted to show haha.
Co0k1eGal3xy t1_isa0dvo wrote
Reply to comment by Even_Tangerine_800 in [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
The heavier baseball falling to the ground faster is the correct answer. Maybe you misread my post?
It is a shame none of them mention air resistance.
Co0k1eGal3xy t1_is9yf8p wrote
Reply to [R] Mind's Eye: Grounded Language Model Reasoning through Simulation - Google Research 2022 by Singularian2501
>Two baseballs X and Y are released from rest at the same height.
>
>X is heavier than Y.
>
>Which baseball will fall to the ground faster?
Isn't Mind's Eye the ONLY wrong answer?
Acceleration due to gravity is constant, but the opposing force from air resistance is roughly proportional to the air displaced and does not change with mass.
I mean, this is the whole point of Apollo 15's test on the moon.
All of them give wrong explanations, but Mind's Eye is the only one that incorrectly claims they will fall at the same rate under normal real-world conditions.
Proof: Brian Cox visits the world's biggest vacuum | Human Universe - BBC
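You can sanity-check this numerically too. A rough sketch that Euler-integrates a sphere falling from rest with quadratic air drag (the masses, drop height, and drag constants below are illustrative, not from the paper):

```python
import math

def time_to_fall(mass, height=100.0, radius=0.0366, rho=1.225, c_d=0.47, g=9.81, dt=1e-3):
    """Euler-integrate a sphere falling from rest with quadratic air drag."""
    area = math.pi * radius ** 2
    v, y, t = 0.0, height, 0.0
    while y > 0:
        drag = 0.5 * rho * c_d * area * v * v  # drag depends on speed and size, not mass
        a = g - drag / mass                    # same drag force decelerates a heavier ball less
        v += a * dt
        y -= v * dt
        t += dt
    return t

heavy = time_to_fall(mass=0.500)  # hypothetical heavier baseball, 500 g
light = time_to_fall(mass=0.145)  # regulation baseball, ~145 g
```

With air resistance the heavier ball lands first (`heavy < light`); delete the drag term and both times collapse to the vacuum value sqrt(2h/g), which is the Apollo 15 / Brian Cox result.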
Co0k1eGal3xy t1_is6xpmg wrote
How does the loss of converged models compare?
Removing parameters is similar to decreasing the learning rate, as far as I remember, so you can't compare them during early training stages.
Co0k1eGal3xy t1_jdqgwlh wrote
Reply to Is it possible to merge transformers? [D] by seraphaplaca2
BioBERT base and LegalBERT use the same architecture, so a technique like Git Re-Basin would improve performance over using just one model or the other. However, if you want to merge the models and get the best of both, you should retrain on a merged dataset or use a model ensemble instead (i.e. load and run both models and intelligently pick which model to listen to for each type of data).
You cannot (easily) merge BioBERT large, since that checkpoint uses a custom vocabulary, but BioBERT base looks perfectly fine.
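A toy sketch of the ensemble option, where the "intelligent pick" is a hypothetical keyword router and the experts are stand-in functions for BioBERT/LegalBERT inference (everything here is illustrative, not a real API):

```python
def keyword_router(text, bio_keywords=("gene", "protein", "patient", "disease")):
    """Toy domain router; a real system might use a small classifier instead."""
    return "bio" if any(k in text.lower() for k in bio_keywords) else "legal"

def ensemble_predict(text, experts):
    """Run only the expert whose domain matches the input."""
    return experts[keyword_router(text)](text)

# Stand-ins for BioBERT / LegalBERT inference functions.
experts = {
    "bio": lambda t: ("bio", t),
    "legal": lambda t: ("legal", t),
}
```

The upside over weight merging is that each model stays exactly as good as it was on its own domain; the downside is you pay the memory and compute cost of both models.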