Tutorial

Function Generation

To generate all functions at a given complexity (here complexity 5), one simply needs to run the following.

import esr.generation.duplicate_checker

runname = 'core_maths'
comp = 5
esr.generation.duplicate_checker.main(runname, comp)

In esr.generation.duplicate_checker we have predefined a few sets of functions which we believe would be useful. However, one simply needs to add another option to the start of that script to define a new run:

if runname == 'keep_duplicates':
        basis_functions = [["x", "a"],  # type0
                ["square", "exp", "inv", "sqrt_abs", "log_abs"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2
elif runname == 'core_maths':
        basis_functions = [["x", "a"],  # type0
                ["inv"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2
elif runname == 'ext_maths':
        basis_functions = [["x", "a"],  # type0
                ["inv", "sqrt_abs", "square", "exp"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2
elif runname == 'osc_maths':
        basis_functions = [["x", "a"],  # type0
                ["inv", "sin"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2

where type 0, 1 and 2 functions are nullary, unary, and binary, respectively.

Fitting to a dataset

Suppose we have already generated the equations required for the CCLikelihood class. In the following we show the steps that are required to fit the complexity 5 functions to these data. The various ``fitting’’ functions run rely on the output of the previous script, so the order cannot change.

import esr.fitting.test_all
import esr.fitting.test_all_Fisher
import esr.fitting.match
import esr.fitting.combine_DL
import esr.fitting.plot
from esr.fitting.likelihood import CCLikelihood

comp = 5
likelihood = CCLikelihood()

esr.fitting.test_all.main(comp, likelihood)
esr.fitting.test_all_Fisher.main(comp, likelihood)
esr.fitting.match.main(comp, likelihood)
esr.fitting.combine_DL.main(comp, likelihood)
esr.fitting.plot.main(comp, likelihood)

Once you have run this more many complexities, you can plot the pareto front and save it to file using the following function.

import esr.plotting.plot

esr.plotting.plot.pareto_plot(likelihood.out_dir, 'pareto.png')

Fitting a single function

One may wish to fit a single function to your data, instead of the full library produced during the function generation step. For example, suppose we wish to fit a \(\Lambda\) CDM expansion history to the cosmic chronometer data. This function is \(y(x) = \theta_0 + \theta_1 x^3\) which can be represented as the tree \([+, \theta_0, \times, \theta_1, {\rm pow}, x, 3]\) although we will rewrite \(\theta_0\) as a0, \(\theta_1\) as a1 and \(\times\) as *. The following script initially loads the cosmic chronometer data then fits the function to this data, returning the negative log-likelihood and the description length.

from esr.fitting.fit_single import single_function
from esr.fitting.likelihood import CCLikelihood

cc_like = CCLikelihood()

labels = ["+", "a0", "*", "a1", "pow", "x", "3"]
basis_functions = [["x", "a"],  # type0
                ["inv"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2

logl_lcdm_cc, dl_lcdm_cc = single_function(labels,
                                                basis_functions,
                                                cc_like,
                                                verbose=True)

One can also fit the function directly from writing it as a string. This will convert the string to a list of labels, which are also returned. Note that this conversion if not guaranteed to produce the tree representation with the shortest description length, but does provide an upper limit on the DL of a function.

from esr.fitting.fit_single import fit_from_string
from esr.fitting.likelihood import CCLikelihood

cc_like = CCLikelihood()

basis_functions = [["x", "a"],  # type0
                ["inv"],  # type1
                ["+", "*", "-", "/", "pow"]]  # type2

logl_lcdm_cc, dl_lcdm_cc, labels = fit_from_string("a0 + a1 * x ** 3",
                                                basis_functions,
                                                cc_like,
                                                verbose=True)

Custom Likelihoods

To fit a function to your own data, one must create an alternative likelihood using the parent class esr.fitting.likelihood.Likelihood. In the __init__() for this likelihood, you must define xvar, yvar and yerr (the x, y and error on y variables) and a function negloglike(self, a, eq_numpy, **kwargs) which returns the negative log-likelihood.

For example, a Gaussian likelihood can be defined as

from esr.fitting.likelihood import Likelihood
import numpy as np
import os

class GaussLikelihood(Likelihood):
    """Likelihood class used to fit a function directly using a Gaussian likelihood

    Args:
        :data_file (str): Name of the file containing the data to use
        :run_name (str): The name to be associated with this likelihood, e.g. 'my_esr_run'
        :data_dir (str, default=None): The path containing the data and cov files
        :fn_set (str, default='core_maths'): The name of the function set to use with the likelihood. Must match one of those defined in ``generation.duplicate_checker``

    """

    def __init__(self, data_file, run_name, data_dir=None, fn_set='core_maths'):

        super().__init__(data_file, data_file, run_name, data_dir=data_dir, fn_set=fn_set)
        self.ylabel = r'$y$'    # for plotting
        self.xvar, self.yvar, self.yerr = np.loadtxt(self.data_file, unpack=True)


    def negloglike(self, a, eq_numpy, **kwargs):
        """Negative log-likelihood for a given function.

        Args:
            :a (list): parameters to subsitute into equation considered
            :eq_numpy (numpy function): function to use which gives y

        Returns:
            :nll (float): - log(likelihood) for this function and parameters


        """

        ypred = self.get_pred(self.xvar, np.atleast_1d(a), eq_numpy)
        if not np.all(np.isreal(ypred)):
            return np.inf
        nll = np.sum(0.5 * (ypred - self.yvar) ** 2 / self.yerr ** 2 + 0.5 * np.log(2 * np.pi) + np.log(self.yerr))
        if np.isnan(nll):
            return np.inf
        return nll

although note that this is already included as esr.fitting.likelihood.GaussLikelihood. If you want to use a different set of functions than core_maths then this can be passed using the fn_set argument.

We can then combine the above code with the below to fit a mock dataset

import esr.fitting.test_all
import esr.fitting.test_all_Fisher
import esr.fitting.match
import esr.fitting.combine_DL
import esr.fitting.plot

np.random.seed(123)
x = np.random.uniform(0.1, 5, 100)
y = 0.5 * x ** 2
yerr = np.full(x.shape, 1.0)
y = y + yerr * np.random.normal(size=len(x))
np.savetxt('data.txt', np.array([x, y, yerr]).T)
likelihood = GaussLikelihood('data.txt', 'gauss_example', data_dir=os.getcwd())

comp = 5

esr.fitting.test_all.main(comp, likelihood)
esr.fitting.test_all_Fisher.main(comp, likelihood)
esr.fitting.match.main(comp, likelihood)
esr.fitting.combine_DL.main(comp, likelihood)
esr.fitting.plot.main(comp, likelihood)

We also have a Poisson likelihood already implemented, which can be run as

from esr.fitting.likelihood import PoissonLikelihood
import numpy as np
import os

import esr.fitting.test_all
import esr.fitting.test_all_Fisher
import esr.fitting.match
import esr.fitting.combine_DL
import esr.fitting.plot

np.random.seed(123)
x = np.random.uniform(0.1, 5, 100)
y = 0.5 * x ** 2
y = np.random.poisson(y)
np.savetxt('data.txt', np.array([x, y]).T)
likelihood = PoissonLikelihood('data.txt', 'poisson_example', data_dir=os.getcwd())

comp = 5

esr.fitting.test_all.main(comp, likelihood)
esr.fitting.test_all_Fisher.main(comp, likelihood)
esr.fitting.match.main(comp, likelihood)
esr.fitting.combine_DL.main(comp, likelihood)
esr.fitting.plot.main(comp, likelihood)