4. General Pattern¶
Here we will list some frequently faced cases, and introduce some usually used methods. As the calculation of the X update and the Z update are equivalent (as we could switch the parameter notations). We will discuss the update of X (as an example), and the same principle could be applied to Z updates. Consider the general X-update as follows:
Where \(v = - Bz + c - u\) is a known constant vector for X-update.
4.1 Proximity Operator¶
More detail about this could be seen in the proximal algorithm pages (corresponding to the proximal algorithm paper by S.Boyd, recommond to read).
If we consider A = I , we can simplify the x-update to a proximity operator:
This expression is closely related to the Moreau envelope or the Moreau Yosida regulaization with the expression:
We have som useful results:
- Moreau envelop of l1 norm is huber function, and the proximity operator of l1 norm is the shrinkage function.
- Proximity operator of a indicator function is the projection onto that set.
4.2 Quadratic Objective Function¶
Consider the quadratic primal objective function (with P symmetric positive semidefinite):
Assume \(P+\rho A^{T}A\) is invertible, then we can derivate the expression of the optimal x with the first order optimal condition:
We have several options to solve this linear equation:
- Solve directly, form the matrix \(P+\rho A^{T}A, \)mathcal{O}(pn^{2})`), then calculate and cache the factorization (\(\mathcal{O}(n^{3})\)), the back-solve cost \(\mathcal{O}(n^{2})\).
- Further exploit the spairty to save the factorization cost.
- If we update \(\rho\), we will have to update the factorization of \(P+\rho A^{T}A\), which is a large cost. We should weight the update of \(\rho\) and the update of factorization.
- We can use the matrix inverse lamma to simplify the inverse, while all the inverse in the expression exist. With the expression we can reduce the problem to a faster calcualtion, if the inverse of P could be calculated fast, or the dimension p is small.
Add an equation constraint Fx = g, with an addition constraint variable, the x update will be :
The first order optimal conditions are :
These equations corresponding to the solution of the following linear equation:
4.3 Smooth Objective Terms¶
Now consider some more complicate cases. When f is smooth, general iterative methods can be used to carry out the x-minimization. We can use the standard Newton’s method, or using some method for large problems, for examples the conjugate gradient method of the limited memory Broyden-Fletcher-Goldfarb-Shanno(L-BFGS) algorithm. The convergence depends on the condition of f function, the quadratic penalty term tends to improve the convergence.
- Early termination
- Warn start
4.4 Decomposition¶
By apply decomposition on the update, we could further use parallex calculation.