I-am_Sleepy t1_iy7xfo0 wrote

The basic idea of log likelihood is

  1. Assume the data is generated from a parameterized distribution x ~ p(x|z)
  2. Let X be a set of samples {x1, x2, …, xn} drawn independently from p(x|z). Because each item is generated independently, the probability of generating this dataset factorizes as p(X|z) = p(x1|z) * p(x2|z) * … * p(xn|z)
  3. The best-fit z maximizes the formula above, but because a product of many small probabilities is numerically unstable, we apply a monotonic function (the log), which doesn't change the optimum. The product becomes a sum: log p(X|z) = sum_i log p(xi|z) (see the sketch right after this list)
  4. The traditional optimization paradigm minimizes a cost function, so we multiply the formula by -1. That gives the Negative Log Likelihood, i.e. optimize -log p(X|z)
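
For example, here is a quick NumPy sketch (1-D Gaussian, made-up numbers) of why the raw product underflows while the sum of logs stays usable:

```python
import numpy as np
from scipy.stats import norm

# 1-D Gaussian with made-up parameters: the per-point likelihoods are all
# < 1, so their product underflows, while the sum of logs stays finite.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # dataset X ~ p(x|z)

mu, sigma = 2.0, 1.5                               # candidate parameters z
per_point = norm.pdf(x, loc=mu, scale=sigma)       # p(xi|z) for each sample

print(np.prod(per_point))                          # p(X|z): underflows to 0.0
print(np.log(per_point).sum())                     # log p(X|z): finite
print(-norm.logpdf(x, loc=mu, scale=sigma).sum())  # NLL, the thing to minimize
```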

Your formula estimates the distribution p as a Gaussian, which is parameterized by mu and sigma, usually initialized as a zero vector and an identity matrix respectively

Using standard autograd, you can then optimize those parameters iteratively (a minimal sketch follows). Other optimization methods are also possible depending on your preference, such as genetic algorithms or Bayesian optimization
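
A minimal sketch of that autograd route, assuming PyTorch and a diagonal covariance for simplicity (so the identity-matrix init becomes sigma = 1 per dimension; the dataset is made up):

```python
import torch
from torch.distributions import Normal

# Minimal sketch: fit mu and sigma of a Gaussian by minimizing the NLL
# with autograd. The dataset and hyperparameters are made up.
torch.manual_seed(0)
X = torch.randn(1000, 2) * 1.5 + torch.tensor([2.0, -1.0])  # toy data

d = X.shape[1]
mu = torch.zeros(d, requires_grad=True)         # initialized as a zero vector
log_sigma = torch.zeros(d, requires_grad=True)  # sigma = 1, i.e. identity covariance (diagonal here)

opt = torch.optim.Adam([mu, log_sigma], lr=0.05)
for step in range(500):
    opt.zero_grad()
    nll = -Normal(mu, log_sigma.exp()).log_prob(X).sum()  # -sum_i log p(xi|z)
    nll.backward()   # autograd computes d(nll)/d(mu), d(nll)/d(log_sigma)
    opt.step()

print(mu.detach(), log_sigma.exp().detach())  # ~ the data mean and std
```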

For the Bayesian route, if your likelihood is normal with known variance, its conjugate prior (for the mean) is also normal, so the posterior stays normal. The multivariate case is a bit trickier and depends on your setting (which likelihood distribution and which parameters are unknown); you can look it up here. You need to read the "Interpretation of hyperparameters" column to understand it better, and/or maybe here too
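
A small sketch of the univariate conjugate update (normal likelihood with known variance, normal prior on the mean; the numbers are made up):

```python
import numpy as np

# Univariate conjugate update: normal likelihood with known variance sigma2,
# normal prior N(mu0, tau0_sq) on the mean. All numbers are made up.
rng = np.random.default_rng(0)
sigma2 = 1.5 ** 2                                  # known likelihood variance
x = rng.normal(2.0, np.sqrt(sigma2), size=50)      # observed data

mu0, tau0_sq = 0.0, 10.0                           # prior mean and variance

n = len(x)
post_var = 1.0 / (1.0 / tau0_sq + n / sigma2)              # posterior variance
post_mean = post_var * (mu0 / tau0_sq + x.sum() / sigma2)  # posterior mean

print(post_mean, post_var)  # the posterior is again normal (that's conjugacy)
```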
