Example of time series analysis implemented in C++

This page illustrates a few concepts:

  1. fwdpy11 is extensible using both C++ and Python.

  2. Often, the amount of C++ code to write is about as long as the equivalent Python

  3. How to build plugins/extensions using cmake.

Using C++, we will write a callable that will record the mean trait value of a population over time. In essence, we will build a small Python extension module using pybind11 that will be called gvalue_recorder. The module will define a single Python function, gvalue_recorder.record_gvalue. That function will take a single Python list as an argument, and returns a Python object known as a capsule, which is a Python object holding something defined in another language (C++ in thise case). That capsule holds our callable, which follows the requirements described in Time series analysis.

Let’s just look at the code:

#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpy11/evolvets/SampleRecorder.hpp>

PYBIND11_MODULE(gvalue_recorder, m)
    m.def("record_gvalue", [](pybind11::list l) {
        return pybind11::cpp_function(
            [l](const fwdpy11::DiploidPopulation &pop,
                fwdpy11::SampleRecorder &) {
                double mean_trait_value = 0.0;
                for (auto &md : pop.diploid_metadata)
                        mean_trait_value += md.g;
                l.append(mean_trait_value / static_cast<double>(pop.N));

The code is very simple:

  • The Python function is defined using a C++ lambda

  • The Python function returns a pybind11::cpp_function (which is a capsule) that contains another C++ lambda that “captures” the argument passed to the Python function.

We have a function returning a function, in other words.

The following Python script tests that our plugin behaves as expected:

import fwdpy11
import gvalue_recorder

N = 1000

pdict = {
    "demography": fwdpy11.DiscreteDemography(),
    "simlen": 100,
    "nregions": [],
    "sregions": [fwdpy11.GaussianS(0, 1, 1, 0.25)],
    "recregions": [fwdpy11.Region(0, 1, 1)],
    "rates": (0.0, 5e-3, 5e-3),
    "gvalue": fwdpy11.Additive(2.0, fwdpy11.GSS(VS=1.0, optimum=1.0)),
    "prune_selected": False,

params = fwdpy11.ModelParams(**pdict)

pop = fwdpy11.DiploidPopulation(N, 1.0)

rng = fwdpy11.GSLrng(42)

gvalues = []
r = gvalue_recorder.record_gvalue(gvalues)
fwdpy11.evolvets(rng, pop, params, 100, r)

assert len(gvalues) == pop.generation, "Callback failure"

For the record, the Python equivalent to what we’ve generated in C++ is:

gvalues = []

def r(pop, sampler):
    md = np.array(pop.diploid_metadata, copy=False)

I expect the performance of both methods to be nearly equivalent, largely because traversing the metadata once per generation is extremely fast compared to everything else going on in a simulation.

The plugin is built using cmake. Note the execute_process steps, which find the fwdpy11 headers for you.

cmake_minimum_required(VERSION 2.8.12)

# As of 0.8.0, fwdpy11
# is compiled with the C++14 language
# standard (-std=c++14)
message(STATUS "Found pybind11: ${pybind11_VERSION}")
if(${pybind11_VERSION} VERSION_LESS '2.4.3')
    message(FATAL_ERROR "pybind11 version must be >= '2.4.3'")

execute_process(COMMAND python3 -m fwdpy11 --fwdpy11_headers OUTPUT_VARIABLE FP11HEADERS)
execute_process(COMMAND python3 -m fwdpy11 --fwdpp_headers OUTPUT_VARIABLE FWDPPHEADERS)

find_package(GSL REQUIRED)
include_directories(BEFORE ${FP11HEADERS} ${FWDPPHEADERS})
message(STATUS "GSL headers in ${GSL_INCLUDE_DIRS}")
include_directories(BEFORE ${GSL_INCLUDE_DIRS})


pybind11_add_module(gvalue_recorder MODULE gvalue_recorder.cc)
target_link_libraries(gvalue_recorder PRIVATE GSL::gsl GSL::gslcblas)