4. General Pattern¶

Here we will list some frequently faced cases, and introduce some usually used methods. As the calculation of the X update and the Z update are equivalent (as we could switch the parameter notations). We will discuss the update of X (as an example), and the same principle could be applied to Z updates. Consider the general X-update as follows:

\[X^{+} = \arg\min_{x}(f(x) + (\rho/2)\|Ax- v\|_{2}^{2})\]

Where \(v = - Bz + c - u\) is a known constant vector for X-update.

4.1 Proximity Operator¶

More detail about this could be seen in the proximal algorithm pages (corresponding to the proximal algorithm paper by S.Boyd, recommond to read).

If we consider A = I , we can simplify the x-update to a proximity operator:

\[X^{+} = \arg\min_{x}(f(x) + (\rho/2)\|x- v\|_{2}^{2}) = \mathbf{prox}_{q/\rho,f(x)}(v)\]

This expression is closely related to the Moreau envelope or the Moreau Yosida regulaization with the expression:

\[M_{1/\rho} = f(x) + (\rho/2)\|x- v\|_{2}^{2}\]

We have som useful results:

Moreau envelop of l1 norm is huber function, and the proximity operator of l1 norm is the shrinkage function.
Proximity operator of a indicator function is the projection onto that set.

4.2 Quadratic Objective Function¶

Consider the quadratic primal objective function (with P symmetric positive semidefinite):

\[f(x) = (1/2)x^{T}Px + q^{T}x + r\]

Assume \(P+\rho A^{T}A\) is invertible, then we can derivate the expression of the optimal x with the first order optimal condition:

\[\frac{\partial f}{\partial x} = Px^{+} + q + \rho A^{T}(Ax^{+}-v) = 0\]

\[x^{+} = (P + \rho A^{T}A)^{-1}(\rho A^{T} v - q)\]

We have several options to solve this linear equation:

Solve directly, form the matrix \(P+\rho A^{T}A, \)mathcal{O}(pn^{2})`), then calculate and cache the factorization (\(\mathcal{O}(n^{3})\)), the back-solve cost \(\mathcal{O}(n^{2})\).
Further exploit the spairty to save the factorization cost.
If we update \(\rho\), we will have to update the factorization of \(P+\rho A^{T}A\), which is a large cost. We should weight the update of \(\rho\) and the update of factorization.
We can use the matrix inverse lamma to simplify the inverse, while all the inverse in the expression exist. With the expression we can reduce the problem to a faster calcualtion, if the inverse of P could be calculated fast, or the dimension p is small.

\[(P+\rho A^{T}A)^{-1} = P^{-1} + \rho P^{-1}A^{T}(I + \rho AP^{-1}A^{T})^{-1}AP^{-1}\]

Add an equation constraint Fx = g, with an addition constraint variable, the x update will be :

\[x^{+} = \arg\min_{x} ((1/2)x^{T}Px + q^{T}x + r + \lambda^{T}(Fx-g) +(\rho/2)\|x- v\|_{2}^{2})\]

The first order optimal conditions are :

\[\begin{split}\begin{align*} \frac{\partial (\cdot)}{\partial x} &= (P + \rho A^{T}A)x + q - \rho A^{T}v + F^{T}\lambda = 0 \\ \frac{\partial (\cdot)}{\partial \lambda} &= Fx - g = 0 \end{align*}\end{split}\]

These equations corresponding to the solution of the following linear equation:

\[\begin{split}\begin{bmatrix} P + \rho A^{T}A & F^{T}\\ F & 0 \end{bmatrix} \begin{bmatrix} x^{+} \\ \lambda \end{bmatrix} + \begin{bmatrix}q-\rho A^{T}v \\ -g \end{bmatrix} = 0\end{split}\]

4.3 Smooth Objective Terms¶

Now consider some more complicate cases. When f is smooth, general iterative methods can be used to carry out the x-minimization. We can use the standard Newton’s method, or using some method for large problems, for examples the conjugate gradient method of the limited memory Broyden-Fletcher-Goldfarb-Shanno(L-BFGS) algorithm. The convergence depends on the condition of f function, the quadratic penalty term tends to improve the convergence.

Early termination
Warn start

4.4 Decomposition¶

By apply decomposition on the update, we could further use parallex calculation.

4. General Pattern¶

4.1 Proximity Operator¶

4.2 Quadratic Objective Function¶

4.3 Smooth Objective Terms¶

4.4 Decomposition¶

Cvx-Learning

Navigation

Related Topics