5. Primal-dual potential reduction methods

Build the potential energy function, use 1st/2nd order approximation (based on the convexity properties), then apply Newton’s method for descent direction, plane search for step size.

5.1 General Description

The potential function is defined below, where is a summary of log of the duality gap, and the derivation from the analytic center.

\[\begin{split}\begin{align} \phi(x, Z) &= \mu\sqrt{n} \log(Tr(F(x)Z)) + \psi(x, Z) \\ & = (n+ \mu\sqrt{n}) \log(Tr(F(x)Z)) - \log\det F(x)Z - n\log n \end{align}\end{split}\]

Where \(\mu\) is a parameter that sets the relative weight of the term involving duality gap and the derivation from center term. As \(\psi > 0\), we have \(\eta \le \exp(\psi/(\mu\sqrt{n}))\), therefore if the potential funcion value is small, the duality gap must be small.

The potential reduction method will be a method to iteratively update x and Z, such that reduce the potential by at least a fixed amount \(\delta\) in every step. In this condition, we will have convergence in \(O(\sqrt{n})\) steps, provided the initial pair is sufficiently centered.

Potential reduction algorithm:

given strictly feasible x and Z.

repeat :
  1. (descent direction) Find a suitable direction \(\delta x\) and a suitable dual feasible direction \(\delta Z\).
  2. (step size - plane search) Find \(p,q\in \mathbb{R}\) that minimize \(\phi(x+p\delta x, Z+q\delta Z)\).
  3. Update \(x:=x+p\delta x\), \(Z:=Z+ q\delta Z\).

until duality gap \(\eta \le \epsilon\).

The plane search could be seen later in this page, and the descent direction could be obtained by solving the following linear equations (A).

\[S\delta Z S + \sum_{i=1}^{m}\delta x_{i}F_{i} = -D\]
\[Tr(F_{j}\delta Z) = 0 , j = 1,..., m.\]

The \(S=S^{T}\) and \(D=D^{T}\) in the first function depends on the particular algorithm and could change in every iteration. The second function is to keep the dual feasiblility of Z. It has m+n(n+1)/2 equations and m+n(n+1)/2 variables. If F(x) has sparity property, it could be simplier.

It can be solved efficiently via a least-squares problem [1] :

\[\delta x = \arg\min_{\mu\in \mathbb{R}^{m}}\| S^{-1/2} (D+\sum_{i=1}^{m}\mu_{i}F_{i}) S^{-1/2} \|_{F}\]

It raises as the optimal condition of the following two quadratic minimization problems:

\[\begin{split}\begin{align*} \delta x = & \arg\min_{\mu \in \mathbb{R}^{m}} && (Tr(DS^{-1}(\sum_{i=1}^{m}\mu_{i}F_{i})S^{-1}) \\ & && \ + \frac{1}{2}Tr((\sum_{i=1}^{m}\mu_{i}F_{i})S^{-1}(\sum_{j=1}^{m}\mu_{j}F_{j})S^{-1}) ),\\ \delta Z = & \arg\min_{V} && Tr(DV)+\frac{1}{2}Tr(VSVS) \\ & subject\ to && V= V^{T}, \\ & && Tr(F_{i}V) = 0, i=1,...,m. \end{align*}\end{split}\]
[1]Proof could be found here , and an example of LP problem could be found in page 2.

5.2 Potential reduction method 1

Referring to the potential function, it is not a convex function. While the first term \((n+ \mu\sqrt{n}) \log(Tr(F(x)Z))\) is concave in x and Z. Which will offer a negative semidefinite term to Hessian of \(\psi\). This method is to ignore the second derivative of this concave term.

Apply first order approximation of the first term and second order approxmiation for the other terms, we will have \(\psi(x+\mu,Z+V)\). Then respectively minimize the approximation of \(\psi\), we will found they are equivalent to the linear equations (A), with respectively [2] :

  • For \(\delta x\) update : \(D=\rho FZF-F\) and \(S=F\), gets \(\delta x^{p}, \delta Z^{p}\).
  • For \(\delta Z\) update : \(D=\rho F-Z^{-1}\) and \(S=Z^{-1}\) gets \(\delta x^{d}, \delta Z^{d}\)..

And we have the convergence Theorem : Let \(x^{k}, Z^{k}\) denotes the values of x and Z after kth iteration of the potential reduction algorithm with search directions \(\delta x^{p}, \delta Z^{d}\). We have :

\[\psi(x^{(k+1)}, Z^{(k+1)}) \le \psi(x^{(k)}, Z^{(k)}) - 0.78\]
[2]Proof could be found here .

5.3 Potential reduction method 2

The upper problem needs to solve two problem seperately to obtain \(\delta x\) and \(\delta Z\). While we could choose to solve only the primal system, use \(\delta x^{p}, \delta Z^{p}\) to update. Which will have :

\[\psi(x^{(k+1)}, Z^{(k+1)}) \le \psi(x^{(k)}, Z^{(k)}) - 0.05\]

We could also use the dual solutions only \(\delta x^{d}, \delta Z^{d}\) to update, which has :

\[\psi(x^{(k+1)}, Z^{(k+1)}) \le \psi(x^{(k)}, Z^{(k)}) - 0.05\]

It has no significant convergence guaranteed, while it seems to perform better in practice than the first method.

5.4 Potential reduction method 3

Nesterov and Todd proposed another variation which perserves the primal-dual symmetry yet avoids solving two systems per iteration. In their method, primal and dual search directions are computed from:

\[RR^{T}\delta Z^{sym} RR^{T} + \sum_{i=1}^{m}\delta x^{sym}_{i}F_{i} = -\rho F + Z^{-1},\]
\[Tr(F_{j}\delta Z^{sym}) = 0,j =1,...,m.\]

See more details in the original paper. This method will have convergence:

\[\psi(x^{(k+1)}, Z^{(k+1)}) \le \psi(x^{(k)}, Z^{(k)}) - 0.24\]

5.6 Numerical examples

  • From the results shown in the paper, we coulde see the reduction in the duality gap, and the reduction of the derivation from analytic center.
  • The number of iterations required grows much more slowly than \(n^{1/2}\), and can also be assumed to be almost constant (as is shown in the result plots). The typcial number of iteration used is 6-10 .
  • The effort of each iteration depends on the matrix structure, and the matrix size.