19 minute read

Convex Optimization is a field within mathematical optimization. Most people encounter optimization problems where knowledge of Convex Optimization tools can come extremely handy. However, I believe not everyone has to go into the mathematical depths in order to take advantage of these tools.

In this post, we go over these tools, and make clear for the ordinary reader some terminology used within the field. We will also discuss Disciplined Convex Programming, and solve an example problem, connecting all the concepts together.

Convexity

If you come from any STEM field, you are almost certainly familiar with the concept of local minima. The problem of local minimums can only arise if the function we are minimizing has any local minimums which are not global minimums.

Any local minimum of a convex function is also a global minimum. When you converge, you can be sure that the solution you found is indeed globally optimal.

When solving Convex Optimization problems, we either minimize a convex objective, or maximize a concave objective function. Just like for convex functions any local minimum is also a global minimum, for concave functions any local maximum is also a global maximum. In fact, multiplying a convex function by a negative constant makes it a concave function! The point where the original convex function is minimized, is also the point where the corresponding concave function is maximized. That point is a solution to the optimization problem.

Needless to say, a function indeed can be both convex and concave at the same time, in which case it’s called affine. In that case, it’s possible to either minimize or maximize it.

Note: when talking about the convexity or non-convexity of a problem, we often use the term convex to mean either convex or concave! After all, all it needs to turn a concave problem into a convex problem is the flip of a sign, and changing the objective from maximize to minimize. As a matter of fact, turning convex minimization problems into concave maximization problems and vice versa is a commonly used strategy while doing proofs related to Convex Optimization.

Convex Optimization problems

Any optimization problem (not necessarily a convex problem) looks something like this:

\[\begin{aligned} \min_{x} \quad & f_0(x)\\ \textrm{s.t.} \quad & f_i(x) \leq 0, \quad i = 1, ..., m\\ & h_i(x) = 0, \quad i = 1, ..., p \end{aligned}\]

We have an objective function \(f(x)\), that we minimize (or maximize) over some optimization variable \(x\) (which may be a vector), and we are subject to (abbreviated as s.t.) some equality and inequality constraints (\(f_i(x) \leq 0\) and \(h_i(x) = 0\) in this case).

Both equality and inequality constraints are optional. In fact, if there are no constraints, we are said to be doing unconstrained optimization.

The problem is feasible if there is some \(x\) that satisfies the constraints (i.e. the feasibility set is not empty). If there is no \(x\) that satisfies the constraints, we can say that the problem is infeasible, or equivalently, the feasibility set is a null set.

For a convex optimization problem, the objective and all the constraints are convex. The equality constraints must be affine (and not just convex). We will not go into the details of how to identify convexity manually in this post.

Types

From this point onward, things start to get a little more technical. We will stick with our more English and less mathematics approach, however, it’s necessary to be familiar with these terms before we move onto the section on solvers.

There are a few categories of convex optimization problems:

  • Linear Programs (LPs) are those where the objective function (to be minimized or maximized) as well as the constraints are affine functions (of the optimization variables). LPs are special in sense that the objectives and constraints are all not just convex, but also concave, and hence, also affine.
  • Quadratic Programs (QPs) are those where the objective function is quadratic but the constraints are linear functions
  • Quadratically Constrained Quadratic Programs (QCQPs) are those where the objectives and constraints are both quadratic functions

Since in mathematics, linear functions are a special case of quadratic functions, you should naturally see the relation between the above three, i.e. LPs are a special case of QPs which in turn are a special case of QCQPs.

There are two more major categories to be familiar with: Second Order Cone Programs (SOCPs) and Semidefinite Programs (SDPs).

While QPs and QCQPs may not always be convex, we restrict our attention to convex QPs and convex QCQPs in this post. As a matter of fact, convex QCQPs are a special case of SOCPs and SOCPs are a special case of SDPs. These generalizations are a bit harder to show and we skip them for this post.

In summary, an LP is also a convex QP, which in turn is also a convex QCQP. A convex QCQP is also an SOCP, and an SOCP is also an SDP.

Note:

  • In order to use DCP based tools (to be discussed shortly), you do not need to know how to classify a problem as LP, QP, QCQP, SOCP or SDP. The DCP based modeling tool will do it for you!
  • The terms program and optimization problem mean the same thing. We usually say linear program but optimization problem.

Relation to cones

We will explain the concept of cones in another post, however, one takeaway from this post should be the fact that the types of convex optimization problems mentioned above are each related to certain certain cones.

  • A Linear Program (LP) can be represented as a conic program with respect to non-negative orthant cone, denoted as \(\mathbb{R}^n_+\) in mathematics.
  • SOCPs can be represented as conic programs with respect to the second order cone (also called a 2-norm cone or an “ice-cream” cone).
  • SDPs can be represented as conic programs with respect to the positive semi-definite (PSD) cone, denoted as \(\mathbb{S}^n_+\) in mathematics. Yes, this is a cone where each point represents a positive-semidefinite matrix. A visualization of this cone is provided in Section 2.2 of [2].

Most solver software are actually conic solvers. We need to convert convex optimization problems into conic problems in order to use these conic solvers. This is usually handled by DCP based modeling tools (to be discussed shortly).

Solving Convex Optimization problems

Just like non-convex problems (e.g. minimizing the loss of a deep neural network), convex optimization problems can also be solved using Gradient Descent and Newton’s Method. In fact, both of these algorithms are actually used by some solvers!

This does not necessarily mean that training neural networks is always a non-convex problem. In fact, some neural networks can be convex. A simple example can be a single layer perceptron with appropriate activations and loss function.

Therefore, some optimization algorithms can be used by both convex and non-convex optimization problems.

Convexity does however buy us the ability to use optimization algorithms beyond Gradient Descent. A lot of solvers specific to convex optimization problems employ Interior Point Methods (IPMs). A special instance of IPMs are barrier methods.

The ability to use Primal-Dual Interior Point Methods offers many benefits, which are beyond the scope of this post. For that, I would like to defer the reader to [1].

Convex Optimization software tools

They can be broken down into two categories:

  1. The modeling systems that allow the user to specify a problem in it’s natural form, and translate it into a valid problem for the solver software.
  2. The solver software themselves.

Disciplined Convex Programming (DCP)

This was introduced by Michael Grant as his PhD Thesis and is in fact what makes category 1 of software even possible, for convex problems.

We earlier mentioned that there are rules to check and identify whether the objective we are minimizing (or maximizing) is convex (or concave). We also mentioned that despite there being rules to check convexity, it’s not always easy. Fortunately, thanks to DCP, software has been made possible which can do this for us (see footnote [3]), as well as translate the problem into a valid input for the solver software.

A DCP based modeling software will take a problem in its natural form, and identify whether it qualifies as a valid LP, QP, QCQP, SOCP or SDP (amongst others not mentioned in this post) and will then call the appropriate solver.

DCP based tools have a set of rules for the problems that are a superset of the rules needed to check convexity (refer again to footnote [3]). When these additional rules are respected by the objective function and the constraints, DCP software tools are able to classify the problem as convex. Then they will transform the problem to match the API of a solver which is an external software that will actually solve the problem.

Are there any downsides to using DCP based tools? Well yes. Since DCP rules are a superset of the rules needed to check convexity, there would be some convex problems which would meet the convexity rules of mathematics but not the DCP rules. In that case, despite the problem actually being convex, DCP modeling software will not be able to identify them as convex, and will not be able to translate the problem for us! In those cases, we will need to use some techniques to modify the problem into an equivalent problem that can be identified by the software as convex. Once we solve this equivalent problem (which may have a different objective), we can simply recover the optimal solution and objective of the original problem from this equivalent problem.

Modeling systems

Bear in mind that modeling systems extend beyond the ones listed here, because modeling systems are also not restricted to just Convex Optimization problems. We restrict our attention to convex problems for now, and hence restrict our attention to DCP based modeling systems.

There currently exist four major DCP based tools. At time of writing this post, all of the below are constantly expanding their solver support and you may need to refer to their respective websites for an up to date listing of the supported solvers.

CVX (MATLAB)

CVX is by Michael Grant and Stephen Boyd, who introduced DCP.

Since two great SQLP solvers, SeDuMi and SDPT3 are written in MATLAB, it has the edge that it can natively support them. It also supports the commercial solvers Gurobi and MOSEK.

Remember that an SDP solver can solve all the special cases, i.e. LP, QP, QCQP, and SOCP.

YALMIP (MATLAB)

YALMIP is written by Johan Löfberg. It stands for Yet Another LMI Parser and was named as such because initially it was meant to parse only Linear Matrix Inequalities (LMIs) and hence primarily meant for only SDPs (the constraints of an SDP are LMIs).

However, as of version 3, it supports all the special cases, i.e. LP, QP, QCQP, and SOCP. A question may arise that if it originally supported “only” SDPs, then it should naturally support all the special cases as well. While that’s true, it means that it was originally not checking for the special cases and would use an SDP solver regardless of the nature of the problem. Keep reading to find out why this wasn’t a good thing to do.

By solver support, YALMIP is a superset of CVX. It currently has the largest number of supported solvers amongst all DCP based modeling software. Since YALMIP is also written in MATLAB, it also seamlessly supports SeDuMi and SDPT3.

CVXPY (Python)

CVXPY is based in Python and would be your most likely choice if you want DCP in your open source project. While originally created by a team from Stanford, it’s now supported by the open source community.

It comes with the open source solvers ECOS, OSQP, and SCS pre-included but many others can be invoked through CVXPY if installed separately.

CVXOPT, GLPK, SCIP and SCIPY are some other great open source solvers that CVXPY supports, if installed separately, as well as the commercial solvers Gurobi and MOSEK.

Python is the favorite language of the open source community and CVXPY personally remains my favorite DCP toolbox. However, some users in academia may prefer Julia and MATLAB over Python as 1-indexed matrices (used by Julia and MATLAB) are more intuitive to them than 0-indexed matrices (used by Python and most other programming languages).

Convex.jl (Julia)

Julia is a relatively new programming language, designed for scientific computing. Convex.jl is a Julia package for Disciplined Convex Programming. Like CVXPY, it was also originally created by a team from Stanford but is now supported by the open source community.

CVXPY and Convex.jl have the edge over CVX and YALMIP in sense that MATLAB requires a license while Python and Julia are free suites (Octave as a replacement for MATLAB is not officially supported, but can be used). However, like we discussed, CVX (which is based in MATLAB), at least at the time of writing this guide is the most stable software around.

Solver Software

Choosing a solver requires you to decide between whether you want a free and open source tool or are you willing to get a commercial solver? Secondly, you have to choose a solver that indeed supports the type of Convex Optimization problem you are trying to solve.

This section on choosing a solver in the CVXPY documentation provides a nice tabulated view of the types of problems different solvers can solve. A more exhaustive list is given in the YALMIP documentation. One solver not listed in either list is COSMO.jl, which is a pure Julia solver and hence available only with Convex.jl.

A lot of solvers come from academia, e.g. the SDP solvers Csdp, DSDP, SDPA, SeDuMi and SDPT3. SeDuMi and SDPT3 are mostly implemented in MATLAB, the reason they are natively supported only on CVX and YALMIP.

There are commercial solvers as well, e.g. Gurobi, MOSEK, IBM CPLEX and FICO XPRESS.

We discussed that LP is a special case of QP. Does that mean that if we have two solvers available, we should always choose the one that solves the larger set of problems? The short answer is no. The more specific the solver, the more efficient it can be. This is because the solver exploits the special structure of the problem to skip unnecessary computations that would be needed only if the problem was a QP but not an LP. That being said, most solvers internally do check for these special cases and to get a definite answer on which is better for your case, you might be better off trying all available solvers and doing some heuristic testing.

DCP Example

Using one of the DCP based modeling systems mentioned above, it is really easy to solve Convex Optimization problems using Disciplined Convex Programming. If (1) your optimization problem is a convex problem, and (2) you can express it using the DCP ruleset, the modeling system will translate it into a valid input for your chosen solver, and if the problem is feasible, your solver will solve it, returning the solution to CVXPY.

I prefer Python for code on this blog, since it’s easily accessible to most readers. Hence, I will run an example in CVXPY.

This problem is from the Additional Exercises of Convex Optimization by Boyd and Vandenberghe [2]. I removed the last four currencies for easily printable results. The raw problem data can be found at this link.

Problem Description

A currency exchange firm is re-balancing their ledger such that they get at least \(c_i^\textrm{req}\) USD worths of each currency \(i\). We aim to minimize the cost of this exchange (or equivalently, maximize our profit during this exchange).

A currency is valued in USD by the geometric mean of it’s bid and ask prices.

Defining problem data

\(F_{ij}\) in the upper right trianglular part of the matrix is the number of units of currency \(j\) it costs to buy one unit of currency \(i\). We can call it the bid price of currency \(i\) in terms of currency \(j\).

\(\frac{1}{F_{ij}}\) is the number of units of currency \(j\) it costs to buy one unit of currency \(i\). We can call it the bid price of currency \(j\) in terms of currency \(i\).

\(F_{ji}\) in the lower left trianglular part of the matrix is the ask price of currency \(j\) in terms of currency \(i\). Ask prices are 1.035x of the bid prices.

# A few necessary imports
import numpy as np
import cvxpy as cvx
import matplotlib.pyplot as plt

# Exchange rate data.
tickers = ["USD", "EUR", "GBP", "CAD", "JPY", "CNY"]
n = len(tickers)

F = np.zeros((n, n))
# USD
data = ([1.0, 0.87, 0.76, 1.31, 108.90, 6.72],
# EUR
[1.0, 0.88, 1.51, 125.15, 7.72],
# GBP
[1.0, 1.72, 142.94, 8.82],
# CAD
[1.0, 82.93, 5.11],
# JPY
[1.0, 0.062],
# CNY
[1.0])

# copy the data into upper triangle of matrix `F`
# these are the ask prices
for i in range(n):
    F[i,i:] = data[i]

# invert them to find the bid prices and
# store in lower triangle of matrix `F`
for j in range(n):
    for i in range(j+1,n):
        F[i,j] = 1.035/F[j,i]

# Define initial and final portfolio configurations
# i.e. `c_init` and `c_req`
c_req = np.arange(1,n+1)
c_req = 1e4*c_req/c_req.sum()
c_init = c_req[::-1]

Approach

We start off with how the amounts paid during the exchange and amounts received during the exchange are computed (with \(X\) representing the amounts to be exchanged and \(F\) representing the exchange rates).

\(X \frac{1}{F}\) is element wise multiplication of \(X\) and \(\frac{1}{F}\) and \(X \frac{1}{F} = \frac{X}{F}\)represents the converted amounts.

\(\left( \frac{X}{F} \right) \vec{1}\) is the row wise sum of \(\frac{X}{F}\) and represents the vector of the amounts in USD of each currency received during exchange.

\(\vec{1}^T X\) is the column wise sum of \(X\) and represents amounts in USD of each currency paid during the exchange.

the expression for \(c_i^\textrm{final}\) can be formulated as

\[\tag{1} c_i^\textrm{final} = \left( \frac{X}{F} \right) \vec{1} - \vec{1}^T X + c_i^\textrm{init}\]

The objective

Minimizing currency exchange cost can be seen as maximizing the currency exchange profit. We can solve it either way. Recall that for a convex optimization problem, multiplying the objective by a negative constant and maximizing the resulting concave objective is an equivalent problem. Eventually, the profit will be negative, which means this exchange will only incur a loss, however, I formulate the problem as maximizing profit (instead of minimizing cost) as an example to show that you could do it either way.

Since the valuation of a currency holding is in USD, represented by \(\sqrt{\frac{F_{j,1}}{F_{1,j}}}\), the profit can be formulated as below:

\[\sum_{i=1}^n (c_i^\textrm{final} - c_i^\textrm{init}) \sqrt{\frac{F_{j,1}}{F_{1,j}}}\]

We will maximize it.

The constraints

The constraints are pretty much as follows:

Exchanges must be positive (to make sure we don’t invert bid and ask, which would almost certainly incur loss).

\[X_{i,j} \geq 0\]

Exchanges may not exceed the initial reserve, this is also the limit of exchanges we can do for each currency.

\[\vec{1}^T X \preceq c_i^\textrm{init}\]

No use exchanging a currency with itself. We should restrict this because these extra exchanges eat up the allowed limit of exchanges (i.e. \(c_i^\textrm{init}\)).

\[\textbf{diag}(X) = 0\]

After the exchange, we must have at least \(c_i^\textrm{req}\) of each currency \(i\). This is the main constraint and the reason we have to rebalance our ledger.

\[c_i^\textrm{final} \succeq c_i^\textrm{req}\]

Identifying Convexity and the Optimization Variables

We will solve for the matrix of exchanges \(X\), which is the only optimization variable here.

We need not check for convexity as CVXPY will check for DCP rules which include convexity rules. However, just for the sake of discussion, we see that the objective and the constraints are pretty much all linear functions in \(X\), making this a Linear Program.

\[\begin{aligned} \max_X \quad & \sum_{i=1}^n (c_i^\textrm{final} - c_i^\textrm{init}) \sqrt{\frac{F_{j,1}}{F_{1,j}}}\\ \textrm{s.t.} \quad & X_{i,j} \geq 0\\ & \vec{1}^T X \preceq c_i^\textrm{init}\\ & \textbf{diag}(X) = 0\\ & c_i^\textrm{final} \succeq c_i^\textrm{req} \end{aligned}\]

The objective is a linear function of \(X\) because of \(c_i^\textrm{final}\) (see equation 1 above). The rest of the terms are constants.

CVXPY Implementation and Results

Define the optimization variables, constraints, objective, and call the solver.

# define the optimization variables
X = cvx.Variable((n,n))

# some intermediate variables (defined for clarity)
ask_prices = F[:, 0].T
bid_prices = F[0, :]
valuations = np.sqrt(np.divide(ask_prices, bid_prices))

# the @ symbol is used for matrix-matrix multiplication
c_final = cvx.vec(cvx.multiply(X, 1/F) @ np.ones((n, 1)) - X.T @ np.ones((n, 1))) + c_init

# define the objective
objective = (valuations @ (c_final - c_init))

# define the constraints
constraints = [
    X >= 0,
    cvx.vec(X.T @ np.ones((n, 1))) <= c_init,
    cvx.diag(X) == 0,
    c_final >= c_req
]

# define the problem
prob = cvx.Problem(cvx.Maximize(objective), constraints)

# list available solvers
print(cvx.installed_solvers())

# solve using any two solvers
for solver in [("ECOS", cvx.ECOS), ("SCIPY", cvx.SCIPY)]:
    prob.solve(solver=solver[1], verbose=False)
    print("[Using solver: %s, status: %s] Max profit is %.2f" % (solver[0], prob.status, prob.objective.value))

I deliberately solved it using two solvers. ECOS is an SOCP solver, while SCIPY solver is purely an LP solver. We converge to a value of -12.6 using both solvers, which is the global optimum value of the profit we can achieve while doing this exchange (this was expected because the original problem wanted us to minimize the “cost” of the exchange, while we maximized the profit). The cost of exchange is hence $12.6. Bear in mind that this is the optimal value of the problem. The solution of the problem is the value of the optimization variables, which in this case happens to be the matrix X:

Let’s write some code to pretty print it:

print("\t".join([""] + tickers))

for ask, row in zip(tickers, X.value):
    print("\t".join([ask]+['%.2f' % val for val in row.tolist()]))

We see the below table.

	USD	EUR	GBP	CAD	JPY	CNY
USD	-0.00	-0.00	-0.00	-0.00	-0.00	-0.00
EUR	-0.00	-0.00	-0.00	-0.00	-0.00	-0.00
GBP	-0.00	-0.00	-0.00	-0.00	-0.00	-0.00
CAD	293.53	-0.00	-0.00	-0.00	952.38	476.19
JPY	22.63	-0.00	-0.00	-0.00	-0.00	-0.00
CNY	440.05	-0.00	-0.00	-0.00	-0.00	-0.00

Let’s analyze for example the row corresponding to \(i\)=CAD.

We sold USD, JPY and CNY to buy CAD. There could be below reasons for this:

  1. Our holdings had much lesser CAD than we required.
  2. The exchange of USD, JPY and CNY for CAD was profitable by the bid-ask valuation of the currencies at the time.
  3. We had excess USD, JPY and CNY and the sale of USD 293.53, USD 952.38 and USD 476.19 of their respective quantities did not take their final amounts below what was required by the ledger.

Summary

Using DCP based modeling tools, we can specify problems in their natural form and the modeling tool will translate the problem into a valid input for our chosen solver. We used CVXPY as the modeling tool and ECOS and SCIPY as the solvers.

I hope this post helped you, and you were able to get a grasp of the tools used in Convex Optimization! While optimization is a complicated field, the tools need not be.

References and footnotes

[1] Nocedal, Jorge, and Stephen J. Wright, eds. Numerical optimization. New York, NY: Springer New York, 1999.

[2] Boyd, Stephen P., and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[3] DCP rules are almost a superset, but not exactly. For example, when the underlying problem is a QP or a QCQP, the convexity depends on whether or not the Hessian matrix of the QP/QCQP is positive-semidefinite. DCP rules do not check the Hessian matrix. For this reason, using DCP rules to check convexity has been advised against. In order to properly use DCP tools, it is recommended to ensure convexity of the problem manually before using a DCP based tool (and the proper use of these tools should be limited to translating the problem from its natural form to a conic problem that matches the API of the solver you want to use).

Comments