2. Precursors
=============================

2.1 Dual Ascent
---------------------------

Normal we solve a problem by iteratively descent the primal objective problem. **Dual Ascent methods** instead try to solve the
dual problem by iteratively asent the dual function.

Consider the problem:

.. math::
  \begin{align*}
  & minimize \quad f(x) \\
  & subject\ to \quad Ax = b
  \end{align*}

Take the lagrangian function:

.. math::
  \mathcal{L}(x ,\lambda) = f(x) + \lambda^{T}(Ax - b)

The dual function is :

.. math::
  \begin{align*}
  g(\lambda) &= \inf_{x} \mathcal{L}(x, \lambda) \\
  & = \inf_{x} (f(x)  + \lambda^{T}(Ax)) - b^{T}\lambda \\
  & = - sup_{x}((-A^{T}\lambda)^{T}x - f(x)) - b^{T}\lambda \\
  & = - f^{*}(-A^{T}\lambda) - b^{T}\lambda
  \end{align*}

The dual problem is :

.. math::
   maximize\quad g(\lambda)

while we have :

.. math::
  x^{k+1} := arg\min_{x}\mathcal{L}(x, \lambda^{k})

.. math::
  \begin{align*}
  \Delta g(\lambda) &= - (-A)\frac{\partial f^{*}(\mu)}{\partial \mu} - b \\
  & = A x^{*} -b
  \end{align*}

Here , we have a scent direction of the dual variable. If g is not differentible it is 'a' subgradient
(which will be a dual subgradient method). And we could
have a dynamic choice of the step size :math:`\alpha^{k}`.
So, we could update the dual variable with :

.. math::
  \lambda^{k+1} := \lambda^{k} + \alpha^{k}(Ax^{k+1}-b)

In summary the dual ascent update is :

.. math::
  \begin{align*}
  & x^{k+1} := arg\min_{x}\mathcal{L}(x, y^{k})\\
  & y^{k+1} := y^{k} + \alpha^{k}(Ax^{k+1}-b)
  \end{align*}

**While the convergence may not hold in many applications**, so dual ascent often cannot be used.


2.2 Dual Decomposition
-------------------------------

If the objective function could be decomposed, so will the dual function, as a result we could
updata the primal variables separately.

For an example :

.. math::
  f(x) = \sum_{i = 1}^{N}f_{i}(x_{i})

Here we did not clarify the separable sets of the variable, each :math:`x_{i}` could be a sub-block of variables.
We could also partioning the matrix A.

.. math::
  \sum_{i = 1}^{N}A_{i}x_{i} = b

The lagrangian will be :

.. math::
  \begin{align*}
  \mathcal{L}(x, y) &= \sum_{i = 1}^{N}f_{i}(x_{i}) + y^{T}(\sum_{i = 1}^{N}A_{i}x_{i} - b) \\
  &=\sum_{i=1}^{N} (f_{i}(x_{i}) + y^{T}A_{i}x_{i} - (1/N)y^{T}b ) \\
  & = \sum_{i=1}^{N} \mathcal{L}_{i}(x_{i}, y)
  \end{align*}

Similarly as before, the update will be :

.. math::
  \begin{align*}
  & x^{k+1}_{i} := arg\min_{x}\mathcal{L}_{i}(x_{i}, y^{k})\\
  & y^{k+1} := y^{k} + \alpha^{k}(Ax^{k+1}-b)
  \end{align*}


2.3 Augmented Lagrangians and the Method of Multipliers
----------------------------------

The augmented Lagrangians for the problem in 2.1 is :

.. math::
  \mathcal{L}_{\rho}(x,y) = f(x) + y^{T}(Ax-b) + (\rho/2)\|Ax-b\|_{2}^{2}

It is the normal Lagrangian of the following problem (which is equivalent to original one):

.. math::
  \begin{align*}
  &minimize \quad f(x) + (\rho/2)\|Ax-b\|_{2}^{2} \\
  &subject \ to \quad Ax = b
  \end{align*}

The result update will be:

.. math::
  \begin{align*}
  & x^{k+1} := arg\min_{x}\mathcal{L}_{\rho}(x, y^{k})\\
  & y^{k+1} := y^{k} + \rho(Ax^{k+1}-b)
  \end{align*}

For this special step size we chosen, we have the following relations (start from the update equation of x):

.. math::
  \begin{align*}
  0 &= \Delta_{x} \mathcal{L}_{\rho}(x^{k+1}, y^{k})\\
  &= \Delta_{x} f(x^{k+1})  + \rho A^{T}(y^{k} + Ax^{k+1} -b)\\
  &= \Delta f(x^{k+1}) + A^{T}y^{k+1}
  \end{align*}

which is exactly the dual feasibility of the problem. which justify the choice of the step size.

* The convergence of this mehtod is much better.

* However in this expression, the update of x depends on all the primal variables. As a result, even if f is separable, the update of x will not be separable.