4. General Pattern =================================== Here we will list some frequently faced cases, and introduce some usually used methods. As the calculation of the X update and the Z update are equivalent (as we could switch the parameter notations). We will discuss the update of X (as an example), and the same principle could be applied to Z updates. Consider the general X-update as follows: .. math:: X^{+} = \arg\min_{x}(f(x) + (\rho/2)\|Ax- v\|_{2}^{2}) Where :math:`v = - Bz + c - u` is a known constant vector for X-update. 4.1 Proximity Operator ---------------------------- More detail about this could be seen in the proximal algorithm pages (corresponding to the proximal algorithm paper by S.Boyd, recommond to read). If we consider A = I , we can simplify the x-update to a proximity operator: .. math:: X^{+} = \arg\min_{x}(f(x) + (\rho/2)\|x- v\|_{2}^{2}) = \mathbf{prox}_{q/\rho,f(x)}(v) This expression is closely related to the **Moreau envelope** or the **Moreau Yosida regulaization** with the expression: .. math:: M_{1/\rho} = f(x) + (\rho/2)\|x- v\|_{2}^{2} We have som useful results: * Moreau envelop of l1 norm is huber function, and the proximity operator of l1 norm is the shrinkage function. * Proximity operator of a indicator function is the projection onto that set. 4.2 Quadratic Objective Function ------------------------------ Consider the quadratic primal objective function (with P symmetric positive semidefinite): .. math:: f(x) = (1/2)x^{T}Px + q^{T}x + r Assume :math:`P+\rho A^{T}A` is invertible, then we can derivate the expression of the optimal x with the first order optimal condition: .. math:: \frac{\partial f}{\partial x} = Px^{+} + q + \rho A^{T}(Ax^{+}-v) = 0 .. math:: x^{+} = (P + \rho A^{T}A)^{-1}(\rho A^{T} v - q) We have several options to solve this linear equation: * Solve directly, form the matrix :math:`P+\rho A^{T}A, ` (:math:`\mathcal{O}(pn^{2})`), then calculate and cache the factorization (:math:`\mathcal{O}(n^{3})`), the back-solve cost :math:`\mathcal{O}(n^{2})`. * Further exploit the spairty to save the factorization cost. * If we update :math:`\rho`, we will have to update the factorization of :math:`P+\rho A^{T}A`, which is a large cost. We should weight the update of :math:`\rho` and the update of factorization. * We can use the matrix inverse lamma to simplify the inverse, while all the inverse in the expression exist. With the expression we can reduce the problem to a faster calcualtion, if the inverse of P could be calculated fast, or the dimension p is small. .. math:: (P+\rho A^{T}A)^{-1} = P^{-1} + \rho P^{-1}A^{T}(I + \rho AP^{-1}A^{T})^{-1}AP^{-1} Add an equation constraint Fx = g, with an addition constraint variable, the x update will be : .. math:: x^{+} = \arg\min_{x} ((1/2)x^{T}Px + q^{T}x + r + \lambda^{T}(Fx-g) +(\rho/2)\|x- v\|_{2}^{2}) The first order optimal conditions are : .. math:: \begin{align*} \frac{\partial (\cdot)}{\partial x} &= (P + \rho A^{T}A)x + q - \rho A^{T}v + F^{T}\lambda = 0 \\ \frac{\partial (\cdot)}{\partial \lambda} &= Fx - g = 0 \end{align*} These equations corresponding to the solution of the following linear equation: .. math:: \begin{bmatrix} P + \rho A^{T}A & F^{T}\\ F & 0 \end{bmatrix} \begin{bmatrix} x^{+} \\ \lambda \end{bmatrix} + \begin{bmatrix}q-\rho A^{T}v \\ -g \end{bmatrix} = 0 4.3 Smooth Objective Terms ----------------------------------------- Now consider some more complicate cases. When f is smooth, general iterative methods can be used to carry out the x-minimization. We can use the standard Newton's method, or using some method for large problems, for examples the conjugate gradient method of the limited memory Broyden-Fletcher-Goldfarb-Shanno(L-BFGS) algorithm. The convergence depends on the condition of f function, the quadratic penalty term tends to improve the convergence. * Early termination * Warn start 4.4 Decomposition --------------------------- By apply decomposition on the update, we could further use parallex calculation.