ElectronicCress3132

ElectronicCress3132 t1_j629tix wrote

> implement a gradient descent optimization process at inference time

Could you expand on what this means? At inference time, I thought all the weights were frozen, so how could the attention layers somehow be performing gradient descent?

Edit: I read the paper in detail and understood it (walk through the math in Section 3). Basically, the input sentence X itself produces activations that flow through the attention layer (recall how attention works: it embeds the sentence, then multiplies it by the query, key, and value matrices). If you also feed it some in-context examples X' to learn from, then of course the attention layer produces terms for both X and X'. It turns out the terms coming from X' are mathematically equivalent to taking a gradient descent step, even though the actual model weights stay frozen.
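To make that concrete, here's a minimal numpy sketch of the flavor of equivalence (my own toy construction, not the paper's exact one, and all names here are made up): with softmax-free linear attention and zero initial weights, reading out the in-context examples with the test query gives the same prediction as one explicit gradient descent step of linear regression on those examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical in-context dataset: n examples (x_i, y_i) plus one test query x_test.
n, d = 8, 4
X = rng.normal(size=(n, d))          # in-context inputs  (the X' above)
y = X @ rng.normal(size=d)           # in-context targets (a linear task)
x_test = rng.normal(size=d)          # the query the model must predict for

eta = 0.1                            # learning rate of the implicit GD step

# Explicit gradient descent: one step on L(w) = 1/2 * sum_i (w . x_i - y_i)^2,
# starting from w0 = 0, then predict on x_test.
w0 = np.zeros(d)
grad = X.T @ (X @ w0 - y)            # gradient of the squared loss at w0
w1 = w0 - eta * grad
pred_gd = w1 @ x_test

# Linear (softmax-free) attention over the same examples:
# queries/keys are the inputs, values are eta * targets.
# The readout for the test token is sum_i (x_test . x_i) * eta * y_i.
pred_attn = (x_test @ X.T) @ (eta * y)

print(pred_gd, pred_attn)            # identical up to floating point error
assert np.allclose(pred_gd, pred_attn)
```

So nothing in the network is updated; the attention readout over X' just happens to compute the same quantity a gradient step would.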

24

ElectronicCress3132 t1_j29c108 wrote

Btw, one should take care not to implement the worst-case O(n) algorithm (Quickselect with Median-of-Medians pivot selection), because its high constant factors slow it down in the average case. Quickselect with random pivoting, or Introselect (the algorithm typically behind the C++ standard library function mentioned), have good average-case time complexity and rarely hit the worst case. A sketch of the random-pivot variant is below.
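For reference, here's a minimal Python sketch of Quickselect with a random pivot (illustrative only; not how std::nth_element is actually implemented):

```python
import random

def quickselect(a, k):
    """Return the k-th smallest element (0-indexed) of the sequence a.

    Random pivoting gives expected O(n) time; the O(n^2) worst case is
    only hit with vanishingly small probability.
    """
    a = list(a)                      # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        # Pick a random pivot, then Lomuto-partition the range around it.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        # Recurse (iteratively) only into the side containing index k.
        if k == i:
            return a[i]
        elif k < i:
            hi = i - 1
        else:
            lo = i + 1

print(quickselect([7, 2, 9, 4, 1, 8], 2))  # -> 4 (third smallest)
```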

1