Submitted by eugene129 t3_11lgo2d in deeplearning
rkstgr t1_jbia52h wrote
First of all, beta_t is just some predefined variance schedule (in literature often linear interpolated between 1e-2 and 1e-4) and it defines the variance of the noise that is added at step t. What you have in (1) is the variance of sample x_t which does not have to be beta_t.
What does hold for large t is var(x_t)=1 as our sample converges to ~ Normal Gaussian with mean 0 and var 1.
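Here is a minimal numerical sketch of that point, assuming the standard DDPM forward process x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) eps, T = 1000 steps, a linear beta schedule from 1e-4 to 1e-2, and data with variance 4.0 (all of these values are illustrative assumptions, not from the original post):

```python
import numpy as np

# Assumed setup: T = 1000 steps, linear beta schedule between 1e-4 and 1e-2.
T = 1000
betas = np.linspace(1e-4, 1e-2, T)   # beta_t: variance of the noise added at step t
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)       # alpha_bar_t = prod_{s<=t} (1 - beta_s)

# Under the forward process, Var(x_t) = alpha_bar_t * Var(x_0) + (1 - alpha_bar_t).
var_x0 = 4.0                         # hypothetical un-normalized data variance
var_xt = alpha_bar * var_x0 + (1.0 - alpha_bar)

# Var(x_t) starts near Var(x_0) and approaches 1 for large t,
# regardless of the individual beta_t values.
print(var_xt[0], var_xt[T // 2], var_xt[-1])
```

The printed values show Var(x_t) drifting from the data variance toward 1 as t grows, which is the convergence to a standard normal described above; the per-step beta_t itself never has to equal Var(x_t).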