tdgros

tdgros t1_jdx0f2y wrote

>Human vision is about 576 megapixels

it's really not; we don't even have that many photoreceptor cells per retina. You get that figure by extrapolating the cone density of the fovea to the entire field of view, but in reality the density of color-sensitive cells drops off sharply outside the fovea, which only covers a few degrees of FOV.
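For reference, the figure is usually derived with a back-of-envelope like the one below (the numbers here, ~0.3 arcmin acuity over a 120°×120° field, are illustrative assumptions, and applying them uniformly is exactly the extrapolation that doesn't hold outside the fovea):

```python
fov_deg = 120            # assumed field of view per axis, in degrees
acuity_arcmin = 0.3      # assumed foveal resolution, in arcminutes
px_per_axis = fov_deg * 60 / acuity_arcmin   # 24,000 "pixels" per axis
print(px_per_axis ** 2 / 1e6)                # ~576 megapixels
```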

Try it: focus your eyes on one word of text and, without moving them, see how far out you can still read the surrounding words. Our vision is really blurry outside the center; we just don't realize it.

3

tdgros t1_jdqjc8q wrote

there's also the weight averaging in ESRGAN, which I knew about but which always irked me. The permutation argument from your third point is the usual reason I invoke on this subject, and the paper does show why it's not as simple as just blending weights! The same reasoning also shows why blending successive checkpoints isn't like blending random networks.
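To make the permutation argument concrete, here is a toy sketch (not from the paper): two networks that are identical up to a permutation of their hidden units compute exactly the same function, yet naively averaging their weights gives something else entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer MLP: f(x) = W2 @ relu(W1 @ x)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((1, 8))
relu = lambda z: np.maximum(z, 0)
f = lambda x, A, B: B @ relu(A @ x)

# Network B is network A with its hidden units permuted: same function.
perm = rng.permutation(8)
W1p, W2p = W1[perm], W2[:, perm]

x = rng.standard_normal(4)
print(np.allclose(f(x, W1, W2), f(x, W1p, W2p)))           # True: identical function
print(np.allclose(f(x, W1, W2),
                  f(x, (W1 + W1p) / 2, (W2 + W2p) / 2)))    # False: averaging breaks it
```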

2

tdgros t1_jdqbgqy wrote

the model merging offered by some Stable Diffusion UIs does not merge the weights of a network! It merges the denoising results for a single diffusion step from two different denoisers, which is very different!

Merging the weights of two different models does not produce something functional in general, and it can only even be attempted for two models with exactly the same structure. It certainly does not "mix their functionality".
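A rough sketch of the distinction, using hypothetical toy denoisers rather than any particular UI's code:

```python
import torch

def make_net(seed):
    # Stand-in "denoiser": a tiny MLP (purely illustrative)
    torch.manual_seed(seed)
    return torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                               torch.nn.Linear(32, 16))

model_a, model_b = make_net(0), make_net(1)
x_t, alpha = torch.randn(1, 16), 0.5

# Blending the *results* of one denoising step from the two denoisers:
blended_step = alpha * model_a(x_t) + (1 - alpha) * model_b(x_t)

# Blending the *weights* into a single model (only even possible because
# both models have exactly the same structure):
merged = make_net(2)
merged.load_state_dict({k: alpha * model_a.state_dict()[k]
                           + (1 - alpha) * model_b.state_dict()[k]
                        for k in model_a.state_dict()})

# The two are not the same thing at all:
print(torch.allclose(blended_step, merged(x_t)))  # False
```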

9

tdgros t1_jdqarbq wrote

what's the connection between LoRA and the question about merging weights here?

edit: weird, I saw a notification for an answer from you, but can't see the message...

LoRA is an adaptation method that learns low-rank updates to the weight matrices for a single task. It does not merge models or weights.
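A minimal sketch of the idea (the width and rank below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                            # hypothetical layer width and LoRA rank
W = rng.standard_normal((d, d))          # frozen pretrained weight matrix
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # second factor, initialized to zero

# During fine-tuning only A and B are learned; the effective weight is
W_eff = W + B @ A                        # a low-rank *update* to one model, not a merge of two
```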

2

tdgros t1_jdlxy8a wrote

There are versions for NLP (and a dedicated one for vision transformers); here is the BERT one from some of the same authors (Frankle & Carbin): https://proceedings.neurips.cc/paper/2020/file/b6af2c9703f203a2794be03d443af2e3-Paper.pdf

It is still costly, as it involves rewinding and finding masks; we'd probably need to switch to dedicated sparse computation to fully benefit from it.
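Roughly, one round looks like the sketch below (iterative magnitude pruning with rewinding; the model, sparsity and the commented-out training loop are placeholders, not the paper's code):

```python
import copy
import torch

def magnitude_masks(model, sparsity):
    """Keep the largest-magnitude weights in each weight matrix, zero the rest."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                                    # only prune weight matrices
            k = max(1, int(p.numel() * sparsity))
            thresh = p.detach().abs().flatten().kthvalue(k).values
            masks[name] = (p.detach().abs() > thresh).float()
    return masks

model = torch.nn.Sequential(torch.nn.Linear(128, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))
theta_rewind = copy.deepcopy(model.state_dict())           # 1. early checkpoint
# train(model, many_steps)                                 # 2. train to convergence
masks = magnitude_masks(model, sparsity=0.8)               # 3. find the masks
model.load_state_dict(theta_rewind)                        # 4. rewind the weights
for name, p in model.named_parameters():                   # 5. apply the masks...
    if name in masks:
        p.data *= masks[name]
# ...and retrain. The masked weights are still stored as dense zeros, which is
# why dedicated sparse computation is needed to actually save time.
```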

6

tdgros t1_j9bfds3 wrote

Just read the post!

>However, the paper itself builds a network that gets as an input 5D vectors (3 location coordinates+2 camera angles) and outputs color and volume density for each such coordinate. I don't understand where do I get those 5D coordinates from? My training data surely doesn't have those - I only have a collection of images.

7

tdgros t1_j7vdocr wrote

With constrained optimization, you usually have a feasible set for the variables you optimize, but when training a NN you optimize millions of weights that aren't directly meaningful, so in general it's not clear how you would define a feasible set for them.
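For contrast, here is what a constrained step can look like when a meaningful feasible set does exist (a sketch of projected gradient descent onto a norm ball; the constraint is made up for illustration, which is exactly what's hard to justify for raw NN weights):

```python
import numpy as np

def project_onto_ball(x, radius):
    """Project x onto the L2 ball of the given radius (the feasible set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projected_gradient_step(x, grad, lr, radius):
    # Ordinary gradient step, then snap back into the feasible set
    return project_onto_ball(x - lr * grad, radius)

# Toy example: minimize ||x - target||^2 subject to ||x|| <= 1
target = np.array([3.0, 0.0])
x = np.zeros(2)
for _ in range(100):
    grad = 2 * (x - target)
    x = projected_gradient_step(x, grad, lr=0.1, radius=1.0)
print(x)   # converges to [1, 0], the closest feasible point to the target
```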

3

tdgros t1_j4vipol wrote

if the two cameras are rigidly fixed, you can calibrate them the way one calibrates a stereo pair, and at least align their orientations and intrinsics. Points very far from the cameras will then be well aligned; points very close will remain misaligned because of parallax.

The calibration process will involve clicking corresponding positions by hand, but the maths for the correction is very simple after that.
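For the "very simple maths" part, one possible sketch with OpenCV (the point coordinates, file name and image size below are made-up assumptions; a homography only models the orientation + intrinsics difference, so distant points align and nearby points keep their parallax):

```python
import cv2
import numpy as np

# Corresponding points clicked by hand in each camera's image (hypothetical values):
pts_cam1 = np.array([[102, 230], [540, 212], [385, 400], [60, 415]], dtype=np.float32)
pts_cam2 = np.array([[ 98, 244], [530, 220], [378, 410], [55, 428]], dtype=np.float32)

# A homography captures the rotation + intrinsics difference between the two views:
H, _ = cv2.findHomography(pts_cam2, pts_cam1)

img2 = cv2.imread("camera2.png")            # hypothetical file name
w, h = 640, 480                             # assumed size of camera 1's image
aligned = cv2.warpPerspective(img2, H, (w, h))
# Far-away points now line up with camera 1's image; nearby points still show
# parallax, because no warp can remove the baseline between the two cameras.
```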

5

tdgros t1_j4qbwij wrote

Are the shapes of the curves (clean, a few spikes, lots of spikes, super clean from ~0.002 MeV onward) related to physical processes we know, or is it just due to the scale of the plot?

2

tdgros t1_j41fn3f wrote

At train time, you plug decoders in at many levels with the same objective, so you can find out whether some inputs can be decoded earlier, using an additional network that outputs a sort of confidence. At inference time, you run the layers one by one and stop when the confidence is high, which allows you to skip some computation. (It's probably a simplistic description, feel free to correct me.)
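A minimal sketch of the inference-time part under that description (the names `backbone_blocks`, `exit_heads`, `confidence_heads` and the threshold are assumptions for illustration, not any specific paper's API):

```python
import torch

def early_exit_forward(x, backbone_blocks, exit_heads, confidence_heads, threshold=0.9):
    """Run blocks one by one; stop as soon as an exit head is confident enough."""
    h = x
    for block, exit_head, conf_head in zip(backbone_blocks, exit_heads, confidence_heads):
        h = block(h)
        confidence = torch.sigmoid(conf_head(h)).mean()   # "a sort of confidence"
        if confidence > threshold:
            return exit_head(h)       # decode early, skip the remaining blocks
    return exit_heads[-1](h)          # fell through: use the final decoder
```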

4