
Interoperability of Python with C++ has made it a great language in the scientific community and beyond. Currently, there exist multiple options to interface Python code with C++. In this post, we briefly go over these options, and provide minimal working examples for each option.

Overview

Colleagues have asked me several times how they should translate their TensorFlow (or PyTorch) code to C++ so that they could see a “speed boost”.

While frameworks like TensorFlow and PyTorch do allow their users to write C++-only code and compile it [1] [2], users should be aware that this is meant for deployment, not for a speed boost. Translating TensorFlow and PyTorch code from Python to C++ may reduce certain overheads (e.g. startup time), but it will not meaningfully reduce the training time of your model. Why is this so? The answer lies in the way Python libraries work.

Most scientific computing libraries (e.g. NumPy, SciPy, TensorFlow, etc.) have their backends written in C or C++ (and/or Fortran) and compiled into dynamic libraries (.dll files on Windows, .so files on Linux, and .so or .dylib files on macOS). Just like standalone executables (e.g. the .exe files on Windows), dynamic libraries are compiled and linked object code; the main difference is that they lack an entry point, i.e. a main() routine.

The Python front-ends of these libraries simply load those libraries, and pass your inputs to them. The main() routine hence lies in the Python interpreter, which dynamically links against and calls the desired library (e.g. libtensorflow.dll or libtensorflow.so file). An alternate approach to using these libraries is to write a C++ program and link it against libtensorflow.dll or libtensorflow.so.
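The loading mechanism described above can be tried directly from Python with the standard ctypes module. This is only a sketch: it assumes a Unix-like system where the C math library can be located (the lookup is platform dependent), and it is not how NumPy or TensorFlow load their backends, since those ship compiled extension modules instead.

```python
import ctypes
import ctypes.util

# Ask the platform for the C math library (e.g. "libm.so.6" on Linux).
# Assumption: a Unix-like system; on Windows the math routines live elsewhere.
libm_path = ctypes.util.find_library("m")

# If the lookup returns None, CDLL(None) falls back to the symbols already
# loaded into the running interpreter process (POSIX only).
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

# This call executes compiled code that lives in the dynamic library
cosine_of_zero = libm.cos(0.0)
print(cosine_of_zero)
```

The interpreter owns main(); the library only contributes the compiled cos routine, which is the same division of labour as between Python and a library backend.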

Whether your TensorFlow model is in C++ or Python, the code that runs to perform the mathematical operations is always compiled code in the form of dynamic libraries.

CPython

We start off by talking about the Python interpreter CPython.

A language, in itself, is a specification, or a set of standards. C++, for example, is defined by the ISO/IEC 14882 standard, whose most recent version at the time of writing is C++20. The implementation of a language standard is provided by a compiler or an interpreter. For example, the compilers G++ and Microsoft Visual C++ are two very commonly used implementations of the C++ language.

Technically, a compiler is free to accept whatever syntax its authors choose, and many compilers do offer non-standard extensions. However, for a compiler to be able to call itself a C++ compiler, it must adhere to the specifications and standards of the C++ language.

In some cases, a language standard may be accompanied by a reference implementation. For example, the Python language is always accompanied by a reference implementation, which is the Python interpreter made available at the official Python website. This (reference) interpreter is called CPython. There do however exist alternative implementations, which are also Python interpreters, but optimized for certain use cases.

CPython is the most popular implementation, and if you use Anaconda to manage your Python environments, the main channel of the Anaconda repositories offers the CPython implementation. The second most popular implementation, PyPy, is also available under the conda-forge channel. Many Python libraries provide wheels for both CPython and PyPy on the Python Package Index (PyPI), as well as on the Anaconda repositories.

CPython should not be confused with Cython, which is one of the ways to interoperate with C++ and which we will talk about later.
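To find out which implementation is running your code, the standard library can be queried. A small sketch using only the platform and sys modules (the variable names are my own):

```python
import platform
import sys

# Name of the implementation running this script, e.g. "CPython" or "PyPy"
implementation_name = platform.python_implementation()

# Version of the Python language that this implementation provides
language_version = platform.python_version()

# The same information as exposed by the sys module, e.g. "cpython" or "pypy"
internal_name = sys.implementation.name

print(implementation_name, language_version, internal_name)
```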

Extending Python by directly writing C/C++

Extending Python essentially means adding extra code and functionality to its interpreter, CPython. This extra functionality may be added either through dynamic libraries (e.g. libtensorflow.so) which are loaded by CPython at run time, or by rebuilding CPython itself with the extra code compiled directly into it. The former is the most common and also the more practical technique. Whenever we install (or update) a Python package via pip or conda, if it has C++ backends, the corresponding .so or .dll file is downloaded to your computer. Alternatively, sometimes the source package is downloaded, and the .so file is compiled locally on your computer.

CPython’s native C++ Extensions

Python’s official documentation has a section dedicated to extending Python with C/C++. We go over the process briefly and summarize it with a minimal working example.

In the example given in the documentation, the user is building a package called spam which will be importable in Python by doing import spam. The user wants to write the spam_system function of this package in C/C++. This function will be callable in Python as simply spam.system(...), after importing the package.

To this end, the user will write regular C++ code, to implement this spam_system function. In the documentation, this function has a signature as follows:

static PyObject* spam_system(PyObject* self, PyObject* args)

The user will include the Python.h header file in the code in order to be able to use Python symbols in C/C++ code. All of these symbols have the Py prefix, for example PyObject in the signature above. This code will be compiled by a C/C++ compiler (typically gcc or clang on Linux/macOS and most likely MSVC on Windows, though MinGW can also be used).

The build process is made convenient by setuptools, the successor to the distutils package of Python (distutils was deprecated in Python 3.10 and removed from the standard library in Python 3.12). Python packages have a setup.py script which uses setuptools (or, in older projects, distutils). The user will write a setup.py script for the spam module. Since the official documentation is a long read, I have compiled the relevant parts into a minimal working example (MWE) which you can view on GitHub:
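The Py-prefixed symbols are ordinary C functions exported by the interpreter, so one of them can even be called from Python itself through ctypes. This is a sketch that assumes CPython (ctypes.pythonapi is specific to it), and it is meant only to show that these symbols exist, not as a recommended way of using the C API.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter
get_version = ctypes.pythonapi.Py_GetVersion

# C signature: const char* Py_GetVersion(void)
get_version.argtypes = []
get_version.restype = ctypes.c_char_p

# Call the C function and decode the returned C string
version_from_c_api = get_version().decode("utf-8")

print(version_from_c_api)
print(sys.version)  # the same information, as seen from the Python side
```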

MWE on GitHub

The Extension object is used to wrap the metadata of the C++ extension, which includes the code files that comprise it. For example, continuing with the spam example, the code file that contains the spam_system function (let’s suppose it’s called spam_module.c) will be used to create this Extension object. The Extension class basically provides the functionality given by a buildsystem in regular C/C++ projects. If this .c file needs to include certain headers, or link against a compiled library, the Extension object will also take the location of those headers or libraries.

The setup() function (which is the bare minimum of the setup.py script) takes the list of Extension objects that we need to compile along with the package. When a user installs this package from source, for example by issuing python setup.py install on the command line, each Extension object is used to generate the compile command appropriate to the operating system and the chosen compiler. The output of this compile command is a dynamic library that resides alongside the Python package: a .so file on Linux/macOS, or a .pyd file (a DLL under a different extension) on Windows, usually with a platform-specific tag in its name. The exact compile command is also logged to the console output during installation.

What happens when we import the spam package? The .so or .dll file will be dynamically loaded by CPython, and when the user runs spam.system(...) in Python, the compiled C++ function will execute at the backend.
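The import machinery can be inspected to see which file names CPython accepts as compiled extensions. The sketch below uses only importlib; the choice of _ctypes as a stand-in for spam is an assumption, as it is a compiled module in typical CPython builds but may be built into the interpreter (or be pure Python) elsewhere.

```python
import importlib.machinery
import importlib.util

# File suffixes this interpreter accepts for compiled extension modules,
# e.g. a platform-tagged ".so" suffix on Linux or ".pyd" on Windows
extension_suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(extension_suffixes)

# Find out where a module would be loaded from, without importing it
spec = importlib.util.find_spec("_ctypes")
if spec is not None:
    # For a compiled extension this is the path of the dynamic library
    print(spec.origin)
```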

PyBind, a more powerful way to make extensions

In the previous example concerning Python’s native C++ Extensions, we saw a simple example where a C++ function was exposed to, and called by, Python. This “default” capability of Python for writing C++ extensions conveniently handles numeric data types and some other simple types (e.g. strings), but C++ is not restricted to such types. In fact, you are free to define your own class, which can be the input or return type of a function. In that case, this “default” capability becomes very laborious to use.

Thanks to PyBind (the pybind11 library), we can have “seamless interoperability” between Python and C++, allowing custom data types to be returned, besides a multitude of other capabilities. PyBind is widely used today. TensorFlow and PyTorch are both major users of PyBind, since they have to link against a multitude of third party libraries and need to expose their functions to Python code.

PyBind acts as a proper superset of the functionality provided by Python’s default ability to write C++ Extensions. However, if you do not need this sort of interoperability in your project, you probably do not need to add PyBind as a dependency, and you might want to stick with Python’s default ability to write extensions, as summarized in the previous section.

The process of building PyBind extensions is very similar to that of building native extensions. Instead of including Python.h in the code, the user will have to include the PyBind headers, i.e. <pybind11/pybind11.h>. In addition, PyBind provides helpers that build on setuptools in the form of pybind11.setup_helpers, which provides its own extension class, called Pybind11Extension, that inherits from the Extension class of setuptools.

We discussed that the Extension class (when using Python’s native way to make extensions) provides the functionality usually given by a buildsystem in regular C/C++ projects (the buildsystem takes care of calling the compiler and the linker with the necessary arguments). Instead of using the Pybind11Extension class, we can also use the CMake buildsystem, though it needs a custom setup.py script to be written. PyBind is well documented, and minimal working examples are provided for building PyBind extensions using Pybind11Extension as well as using the CMake buildsystem.

MWE using PyBind11 Extension

MWE using CMake

PyBind is very powerful, and many more features are listed in the README of its GitHub repository.

Extending Python by translating it to C++

So far we discussed creating C++ extensions for Python where the user directly writes C++ code, which is compiled directly by a C++ compiler into a library, which is dynamically loaded by CPython.

There exists another great way to combine the convenience of writing Python code and the performance of compiled code: Cython. Cython is considered a superset language of Python, with slightly different semantics. It interoperates with CPython very differently from the previous approaches. Cython code is Python, or Python-like, code which is translated to C (or, optionally, C++) by a utility called cythonize, and then compiled by a C/C++ compiler, e.g. GCC. In this case, the user does not have to write C or C++ code directly. The generated code uses the same Python headers that we use when writing extensions the native Python way (i.e. Python.h).

In order to use Cython, the cython package should be installed via pip or conda. The cythonize utility is also installed when cython is installed.

Using Cython is similar to the previous approaches in the sense that it also involves building extensions, and the setup.py script is where we provide the list of extensions. This time, they are Cython extensions, and they are produced by the cythonize function. See the quickstart page of the documentation for a quick visualization. However, the cythonize utility may as well be invoked directly from the command line to convert Cython code to C or C++.

Cython without type information

Naturally, if we are to translate Cython (Python-like) code into C++, the programmer writing Cython should provide all the information in the code which a C++ compiler would need. Cython, just like Python, is a dynamically typed language, where data types need not be provided when declaring variables; in fact, declaration itself is unnecessary. C++, however, is statically typed, and variables must be declared with types. For example, a = 0 is sufficient in Python while C++ code must say something like int a = 0.

Bare minimal Python code, therefore, is insufficient to encapsulate all the information required to compile it as efficient C++. That being said, Cython can still use some “defaults” to make up for this information; usually, however, this will not provide the speedup that Cython is capable of providing.

The documentation starts off with the following Python code as an example. It mentions that merely compiling this code with Cython yields only a 35% speedup. Let’s see why.
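The dynamic typing mentioned above is easy to observe in plain Python. In the short sketch below (the names are my own), the same variable is rebound to values of different types, and a type annotation is accepted but not enforced at run time:

```python
# The same name can be bound to values of different types
a = 0
type_before = type(a).__name__   # the name refers to an int here
a = 0.5
type_after = type(a).__name__    # and to a float here


def square(x: int):
    # The annotation is only a hint; CPython does not check it at run time
    return x * x


# Passing a float where the annotation says int is accepted
result = square(1.5)
print(type_before, type_after, result)
```

A C++ compiler has no such freedom, which is why Cython needs the types spelled out before it can emit efficient code.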

def f(x):
    return x ** 2 - x

def integrate_f(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

Since we are only concerned with the generated code, let’s skip the extra work required to build extensions for now (which would involve writing a setup.py). We can directly save the f and integrate_f code above as integrate.pyx and convert it using the cythonize utility. The command below will yield a translated .c file as well as compile it using your native compiler to yield a .so or .pyd library file.

cythonize -i integrate.pyx

We do not need the compiled library, as we do not need to run it. We will come back to the integrate.c file shortly.
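For reference, integrate_f computes a left Riemann sum, which can be checked in plain Python before any compilation is involved. The sketch below repeats the untyped functions and compares the result against the exact integral of x**2 - x over [0, 1]; the interval and the number of slices are my own choices.

```python
def f(x):
    return x ** 2 - x


def integrate_f(a, b, N):
    # Left Riemann sum of f over [a, b] using N slices of equal width
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx


approximation = integrate_f(0.0, 1.0, 1000)

# Exact value: the antiderivative x**3 / 3 - x**2 / 2 evaluated from 0 to 1
exact_value = 1.0 / 3.0 - 1.0 / 2.0

print(approximation, exact_value)
```

Every arithmetic operation in the loop goes through generic Python objects here, which is the overhead that the type information in the next section removes.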

Cython with type information

The Python language does have optional features, for example decorators and type annotations, that let us state the intended types when writing functions. In order to use Cython properly, we can use these features to provide the extra information.

Now, let’s also compile the following code (which provides the extra information).

import cython

def f(x: cython.double):
    return x ** 2 - x

def integrate_f(a: cython.double, b: cython.double, N: cython.int):
    i: cython.int
    s: cython.double
    dx: cython.double
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

We could save the above as integrate_typed.pyx and similarly convert it to C++ as we did for integrate.pyx.

cythonize -i integrate_typed.pyx

At this stage, we should have both integrate.c and integrate_typed.c. We can use any comparison tool to compare these two code files. At a first glance, it should be visible that while integrate.c (which did not have type information) used PyObject* to represent a and b, integrate_typed.c used double to represent them. This additional information is what makes integrate_typed.c perform better. According to the documentation, typing the variables this way yields about a 4 times speedup over pure Python, compared to 35% for the first approach; the 150 times speedup quoted there is reached once f itself is also declared as a C function.

Different ways to write Cython

So far we have used the words Cython and Python-like interchangeably. It should be noted that there are two ways to write Cython. One is the pure Python mode (which is what we saw in the above examples), where the code remains valid Python. The other uses Cython’s own syntax, which is slightly different from Python. To show this syntax as an example, we rewrite the second (typed) example using it.

def f(double x):
    return x ** 2 - x

def integrate_f(double a, double b, int N):
    cdef int i
    cdef double s
    cdef double dx
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

Notice how we did not write import cython in the beginning. This is because this code file is exclusively Cython. It will compile fine using the same cythonize command.

Another point worth mentioning is that Cython has two major version lines in use, 0.x and 3.x. By default, the Python-like code we write for cythonize is treated as Python 2 under 0.x and as Python 3 under 3.x; in both cases, the language_level directive can be used to choose explicitly.

Summary

We saw two ways to extend Python with compiled C++. We could either directly write C++ code or generate C++ from Python-like code. Choosing between the two is largely a matter of preference, although directly writing C++ code offers finer grained control than using Cython to translate Python into C++.

Another benefit of the first approach is that it is not restricted to just C++; the user may choose to write code in another language, for example Fortran, and invoke a Fortran compiler instead (e.g. gfortran). This, however, would require overriding the Extension class such that the right compiler is invoked. Fortunately, using the CMake buildsystem makes this much simpler, and it also offers the user much more control over the build process.

It is also worth mentioning that both PyBind and Cython build upon the native Python functionality of building C++ extensions, though both do so in different ways (PyBind code is directly written in C++ while Cython translates Python-like code into C++).

References

[1] Using TensorFlow in C

[2] Using PyTorch in C++
