1. Proximal Algorithms Introduction

1.1 Definitions

Proximal algorithms can be viewed as tool for non-smooth, constrained, large-sacle, or distributed problems. The proximal operator \(\mathbf{prox}_{\lambda f} : \mathbf{R}^{n} \to \mathbf{R}^{n}\) of f is defined by (where \(\lambda > 0\)):

\[\mathbf{prox}_{\lambda f}(v) = \mathop{\arg\min}_{x} (f(x) + \frac{1}{2 \lambda}\| x - v \|_{2}^{2})\]

1.2 Generalized projections

When f is the indicator function:

\[\begin{split}\mathbf{I}_{\mathbf{C}}(x) = \begin{cases} 0 \quad x \in \mathbf{C}\\ + \infty \quad x \not\in \mathbf{C} \end{cases}\end{split}\]

where \(\mathbf{C}\) is a closed nonempty convex set. The proximal operator of f reduces to Euclidean projection onto \(\mathbf{C}\) :

\[\mathbf{prox}_{\lambda \mathbf{I}_{\mathbf{C}}}(v) = \mathop{\arg\min}_{x} (\mathbf{I}_{\mathbf{C}}(x) + \frac{1}{2 \lambda}\| x - v \|_{2}^{2})\]
\[\mathbf{prox}_{\lambda \mathbf{I}_{\mathbf{C}}}(v) = \mathop{\arg\min}_{x \in \mathbf{C}} (\frac{1}{2 \lambda}\| x - v \|_{2}^{2}) = \Pi_{\mathbf{C}}(v)\]

Proximal operators thus can be viewed as generalized projections.

1.3 Gradient step

The proximal operator of f is an optimal point, so It satisfies the optimal condition:

\[0 = \frac{\partial}{\partial x}(f(x) + \frac{1}{2 \lambda}\| x - v \|_{2}^{2})\]
\[0 = \Delta f(x^{*}) + \frac{1}{\lambda} (x^{*}-v)\]
\[\mathbf{prox}_{\lambda f}(v) = x^{*} = v - \lambda \Delta f(x^{*}) \approx v - \lambda \Delta f(x)\]

We will see more later.

1.4 Fixed point

The following equation holds, if and only if \(x^{*}\) minimizes f.

\[\mathbf{prox}_{\lambda f}(x^{*}) = x^{*}\]

1.5 Advantages

  • Work under extremely general conditions.
  • Can be fast.
  • Amenanle to distributed optimization.
  • Oftern conceptually and mathematically simple.

1.6 Review convex cones

  • The ordering of cone (the overload of the in-equality) :
\[a \succeq_{K}b \Leftrightarrow a-b \in K\]
  • Dual cone:
\[\mathcal{K}^\cdot=\{a\in\mathbb{H}\,\mid\,\langle a,b\rangle\ge0,\,\forall b\in\mathcal{K}\}\]