API

generation

duplicate_checker

generation.duplicate_checker.main(runname, compl, track_memory=False, search_tmax=60, expand_tmax=1, seed=1234)[source]

Run the generation of functions for a given complexity and set of basis functions

Args:
runname (str):

name of run, which defines the basis functions used

compl (int):

complexity of functions to consider

track_memory (bool, default=False):

whether to compute and print memory statistics (True) or not (False)

search_tmax (float, default=60.):

maximum time in seconds to run any one part of simplification procedure for a given function

expand_tmax (float, default=1.):

maximum time in seconds to run any one part of expand/simplify procedure for a given function

seed (int, default=1234):

seed to set random number generator for shuffling functions (used to prevent one rank having similar, hard to simplify functions)

Returns:

None

custom_printer

class generation.custom_printer.ESRPrinter(settings=None)[source]

Bases: Printer

A Printer for generating readable representations of most SymPy classes, as required for ExhaustiveSR.

The only difference from sympy’s StrPrinter is that it uses pow(.,.) instead of ** unless integer powers are used

emptyPrinter(expr)[source]
parenthesize(item, level, strict=False)[source]
printmethod: str = '_sympystr'
stringify(args, sep, level=0)[source]
class generation.custom_printer.ESRReprPrinter(settings=None)[source]

Bases: ESRPrinter

(internal) – see sstrrepr

generator

class generation.generator.DecoratedNode(fun, basis_functions, parent_op=None, parent=None)[source]

Bases: object

count_nodes(basis_functions)[source]
basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

from_node_list(idx, nodes, basis_functions, parent_op=None, parent=None)[source]
get_lineage()[source]
get_sibling_lineage()[source]
get_siblings()[source]
is_unity()[source]
to_list(basis_functions)[source]
class generation.generator.Node(t)[source]

Bases: object

assign_op(op)[source]
copy()[source]
is_used()[source]
generation.generator.aifeyn_complexity(tree, param_list)[source]

Compute contribution to description length from describing tree

Args:
tree (list):

list of strings giving node labels of tree

param_list (list):

list of strings of all possible parameter names

Returns:
aifeyn (float):

the contribution to description length from describing tree
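As a rough illustration of the idea (not the package’s exact formula), the tree-description term in ESR-style description lengths takes the form k*log(n), where k is the number of nodes and n the number of distinct labels available, with all parameter names counted as a single generic symbol. A minimal sketch under those assumptions:

```python
import math

def aifeyn_sketch(tree, param_list):
    """Hypothetical k*log(n) tree-description term (an assumption, not ESR's code)."""
    k = len(tree)  # number of nodes needed to describe the tree
    # collapse all parameter names (a0, a1, ...) into one generic symbol 'a'
    labels = {'a' if node in param_list else node for node in tree}
    n = len(labels)  # number of distinct labels to choose from
    return k * math.log(n)
```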

generation.generator.check_operators(nodes, basis_functions)[source]

Check whether all operators in the tree are in the basis

Args:
nodes (DecoratedNode):

Node representation of the function tree

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
all_in_basis (bool):

Whether all functions in tree are in basis

generation.generator.check_tree(s)[source]

Given a candidate string of 0s, 1s and 2s, determine whether one can make a valid function tree out of it

Args:
s (str):

string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes

Returns:
success (bool):

whether candidate string can form a valid tree (True) or not (False)

part_considered (str):

string of length <= len(s), where s[:len(part_considered)] == part_considered

tree (list):

list of Node objects corresponding to string s
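The validity check can be sketched with a simple open-slot counter (a minimal reimplementation, not the package’s code): each node fills one open child slot and opens as many new slots as its arity, and the string forms a valid tree exactly when the slot count first hits zero at the final character.

```python
def check_tree_sketch(s):
    """Return (success, part_considered) for an arity string of 0s, 1s and 2s."""
    slots = 1  # open child slots still to be filled (start with the root)
    for i, ch in enumerate(s):
        slots += int(ch) - 1  # this node fills one slot and opens int(ch) new ones
        if slots == 0:        # the tree is complete at this character
            return (i + 1 == len(s)), s[:i + 1]
    return False, s  # ran out of characters with slots still open
```

For example, "200" (a binary node with two nullary children) is valid, while "01" completes the tree after the first character and so fails.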

generation.generator.find_additional_trees(tree, labels, basis_functions)[source]

For a given tree, try to find all simpler representations of the function by combining sums, exponentials and powers

Args:
tree (list):

list of Node objects corresponding to tree of function

labels (list):

list of strings giving node labels of tree

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
new_tree (list):

list of equivalent trees, given as lists of Node objects

new_labels (list):

list of lists of strings giving node labels of new_tree

generation.generator.generate_equations(compl, basis_functions, dirname)[source]

Generate all equations at a given complexity for a set of basis functions and save results to file

Args:
compl (int):

complexity of functions to consider

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

dirname (str):

directory path to save results in

Returns:
all_fun (list):

list of strings containing all functions generated

extra_orig (list):

list of strings giving the original versions, as they appear in all_fun, of the functions generated by combining sums, exponentials and powers of the functions in all_fun

generation.generator.get_allowed_shapes(compl)[source]

Find the shapes of all allowed trees containing compl nodes

Args:
compl (int):

complexity of tree = number of nodes

Returns:
cand (list):

list of strings comprised of 0, 1 and 2 representing valid trees of nullary, unary and binary nodes
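A brute-force sketch of the same enumeration (the package may use a smarter construction): generate every string of 0s, 1s and 2s of length compl and keep those that form a valid tree.

```python
from itertools import product

def allowed_shapes_sketch(compl):
    """All arity strings of length `compl` that form valid trees (brute force)."""
    def valid(s):
        slots = 1  # open child slots; the tree must close exactly at the last char
        for i, ch in enumerate(s):
            slots += int(ch) - 1
            if slots == 0:
                return i + 1 == len(s)
        return False
    return [''.join(p) for p in product('012', repeat=compl) if valid(''.join(p))]
```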

generation.generator.is_float(string)[source]

Determine whether a string is a float or not

Args:
string (str):

The string to check

Returns:

bool: Whether the string is a float (True) or not (False).
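A minimal version of such a check (a sketch; the package’s version may differ in edge cases such as 'inf' or 'nan'):

```python
def is_float_sketch(string):
    """Whether `string` parses as a float."""
    try:
        float(string)
        return True
    except ValueError:
        return False
```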

generation.generator.labels_to_shape(labels, basis_functions)[source]

Find the representation of the shape of a tree given its labels

Args:
labels (list):

list of strings giving node labels of tree

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
s (str):

string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
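Conceptually this is an arity lookup: each label’s position in the nested basis_functions list gives its digit. A sketch, assuming every label appears verbatim in one of the basis lists (the real code may handle variants such as case differences):

```python
def labels_to_shape_sketch(labels, basis_functions):
    # map each operator to its arity: 0 (nullary), 1 (unary) or 2 (binary)
    arity = {op: i for i, ops in enumerate(basis_functions) for op in ops}
    return ''.join(str(arity[lab]) for lab in labels)
```

For example, with a hypothetical basis [['x', 'a0'], ['sin'], ['+', '*']], the prefix-ordered labels ['+', 'x', 'sin', 'a0'] map to the shape '2010'.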

generation.generator.node_to_string(idx, tree, labels)[source]

Convert a tree with labels into a string giving function

Args:
idx (int):

index of tree to consider

tree (list):

list of Node objects corresponding to the tree

labels (list):

list of strings giving node labels of tree

Returns:

Function as a string

generation.generator.shape_to_functions(s, basis_functions)[source]

Find all possible functions formed from the given list of 0s, 1s and 2s defining a tree and basis functions

Args:
s (str):

string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
all_fun (list):

list of strings containing all functions generated directly from tree

all_tree (list):

list of lists of Node objects corresponding to the trees of functions in all_fun

extra_fun (list):

list of strings containing functions generated by combining sums, exponentials and powers of the functions in all_fun

extra_tree (list):

list of lists of Node objects corresponding to the trees of functions in extra_fun

extra_orig (list):

list of strings corresponding to original versions of extra_fun, as found in all_fun

generation.generator.string_to_expr(s, kern=False, evaluate=False, locs=None)[source]

Convert a string giving function into a sympy object

Args:
s (str):

string representation of the function considered

kern (bool):

whether to use sympy’s kernS function or sympify

evaluate (bool):

whether to use powsimp, factor and subs

locs (dict):

dictionary of string:sympy object pairs. If None, it will be created here

Returns:
expr (sympy object):

expression corresponding to s

generation.generator.string_to_node(s, basis_functions, locs=None, evalf=False, allow_eval=True, check_ops=False)[source]

Convert a string giving function into a tree with labels

Args:
s (str):

string representation of the function considered

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

locs (dict):

dictionary of string:sympy object pairs. If None, it will be created here

evalf (bool):

whether to run evalf() on function (default=False)

allow_eval (bool, default=True):

whether to run the (kernS=False and evaluate=True) option

check_ops (bool, default=False):

whether to check all operators appear in basis functions

Returns:
tree (list):

list of Node objects corresponding to the tree

labels (list):

list of strings giving node labels of tree

generation.generator.update_sums(tree, labels, try_idx, basis_functions)[source]

Try to combine sums to make simpler representations of functions

Args:
tree (list):

list of Node objects corresponding to tree of function

labels (list):

list of strings giving node labels of tree

try_idx (int):

when there are multiple substitutions we can attempt, this indicates which one to try

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
new_labels (list):

list of strings giving node labels of new tree

new_shape (list):

list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary

nadded (int):

number of new functions added

generation.generator.update_tree(tree, labels, try_idx, basis_functions)[source]

Try to combine exponentials and powers to make simpler representations of functions

Args:
tree (list):

list of Node objects corresponding to tree of function

labels (list):

list of strings giving node labels of tree

try_idx (int):

when there are multiple substitutions we can attempt, this indicates which one to try

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

Returns:
new_labels (list):

list of strings giving node labels of new tree

new_shape (list):

list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary

nadded (int):

number of new functions added

simplifier

exception generation.simplifier.TimeoutException[source]

Bases: Exception

generation.simplifier.check_results(dirname, compl, tmax=10)[source]

Check that all functions can be recovered by applying the substitutions to the unique functions. If not, define a new unique function and save results to file.

Args:
dirname (str):

name of directory containing all the functions to consider

compl (int):

complexity of functions to consider

tmax (float, default=10.):

maximum time in seconds to run the substitutions

Returns:

None

generation.simplifier.convert_params(p_meas, fish_meas, inv_subs, n=4)[source]

Convert parameters from those in unique function to those in actual function

Args:
p_meas (list):

list of measured parameters in unique function

fish_meas (list):

flattened version of the Hessian of -log(likelihood) at the maximum likelihood point

inv_subs (list):

list of substitutions required to convert between all and unique functions

n (int, default=4):

the number of dimensions of the array from which fish_meas was computed

Returns:
p_new (list):

list of parameters for the actual function

diag_fish (np.array):

the diagonal entries of the Fisher matrix of the actual function at the maximum likelihood point

generation.simplifier.count_params(all_fun, max_param)[source]

Count the number of free parameters in each member of a list of functions

Args:
all_fun (list):

list of strings containing functions

max_param (int):

maximum number of free parameters in any equation in all_fun

Returns:
nparam (np.array):

array of ints containing number of free parameters in corresponding member of all_fun
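The counting can be sketched with a regular expression over parameter names a0, a1, ... (a simplified stand-in that returns a plain list rather than an np.array):

```python
import re

def count_params_sketch(all_fun, max_param):
    """Number of distinct free parameters a0..a{max_param-1} in each function string."""
    out = []
    for fun in all_fun:
        found = {int(m) for m in re.findall(r'a(\d+)', fun)}
        out.append(len({i for i in found if i < max_param}))
    return out
```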

generation.simplifier.do_sympy(all_fun, all_sym, compl, search_tmax, expand_tmax, dirname, track_memory=False)[source]

Run the duplicate checking procedure

Args:
all_fun (list):

list of strings containing all functions

all_sym (OrderedDict):

dictionary of sympy objects which can be accessed by their string representations.

compl (int):

complexity of functions to consider

search_tmax (float):

maximum time in seconds to run any one part of simplification procedure for a given function

expand_tmax (float, default=1.):

maximum time in seconds to run any one part of expand/simplify procedure for a given function

dirname (str):

directory path to save results in

track_memory (bool, default=False):

whether to compute and print memory statistics (True) or not (False)

Returns:
all_fun (list):

list of strings containing all (updated) functions

all_sym (OrderedDict):

dictionary of (updated) sympy objects which can be accessed by their string representations.

count (int):

number of rounds of optimisation which were performed

generation.simplifier.expand_or_factor(all_sym, tmax=1, method='expand')[source]

Run the sympy expand or factor functions

Args:
all_sym (OrderedDict):

dictionary of sympy objects which can be accessed by their string representations.

tmax (float, default=1.):

maximum time in seconds to run any one part of expand/simplify procedure for a given function

method (str, default=’expand’):

whether to run expand (‘expand’) or simplify (‘simplify’). All other options are ignored

Returns:
all_sym (OrderedDict):

dictionary of (updated) sympy objects which can be accessed by their string representations.

generation.simplifier.get_all_dup(max_param)[source]

Finds self-inverse transformations of parameters, to be used in simplify_inv_subs(inv_subs, all_dup)

Args:
max_param (int):

maximum number of parameters to consider

Returns:
all_dup (list):

list of dictionaries giving substitutions which are self-inverse

generation.simplifier.get_max_param(all_fun, verbose=True)[source]

Find maximum number of free parameters in list of functions

Args:
all_fun (list):

list of strings containing functions

verbose (bool, default=True):

Whether to print result (True) or not (False)

Returns:
max_param (int):

maximum number of free parameters in any equation in all_fun

generation.simplifier.initial_sympify(all_fun, max_param, verbose=True, parallel=True, track_memory=False, save_sympy=True)[source]

Convert list of strings of functions into list of sympy objects

Args:
all_fun (list):

list of strings containing functions

max_param (int):

maximum number of free parameters in any equation in all_fun

verbose (bool, default=True):

whether to print progress (True) or not (False)

parallel (bool, default=True):

whether to split equations amongst ranks (True) or each equation considered by all ranks (False)

track_memory (bool, default=False):

whether to compute and print memory statistics (True) or not (False)

save_sympy (bool, default=True):

whether to return sympy objects (True) or not (False)

Returns:
str_fun (list):

list of strings containing functions

sym_fun (OrderedDict):

dictionary of sympy objects which can be accessed by their string representations. If save_sympy is False, then sym_fun is None.

generation.simplifier.load_subs(fname, max_param, use_sympy=True, bcast_res=True)[source]

Load the substitutions required to convert between all and unique functions

Args:
fname (str):

file name containing the substitutions

max_param (int):

maximum number of parameters to consider

use_sympy (bool, default=True):

whether to convert substitutions to sympy objects (True) or leave as strings (False)

bcast_res (bool, default=True):

whether to allow all ranks to have the substitutions (True) or just the 0th rank (False)

Returns:
all_subs (list):

list of substitutions required to convert between all and unique functions. Each item is either a dictionary with sympy objects as keys and values (use_sympy=True) or a string version of this dictionary (use_sympy=False). If bcast_res=True, then all ranks have this list, otherwise only rank 0 has this list and all other ranks return None.

generation.simplifier.make_changes(all_fun, all_sym, all_inv_subs, str_fun, sym_fun, inv_subs_fun)[source]

Update global variables of functions and symbolic expressions by combining rank calculations

Args:
all_fun (list):

list of strings containing all functions

all_sym (list):

list of sympy objects containing all functions

all_inv_subs (list):

list of dictionaries giving substitutions to be applied to all functions

str_fun (list):

list of strings containing functions considered by rank

sym_fun (list):

list of sympy objects containing functions considered by rank

inv_subs_fun (list):

list of dictionaries giving substitutions to be applied to functions considered by rank

Returns:
all_fun (list):

list of strings containing all (updated) functions

all_sym (list):

list of sympy objects containing all (updated) functions

all_inv_subs (list):

list of dictionaries giving substitutions to be applied to all (updated) functions

generation.simplifier.simplify_inv_subs(inv_subs, all_dup)[source]

Find consecutive pairs of self-inverse substitutions, such as {a0: -a0}, {a0: a1, a1: a0} or {a0: 1/a0}, and remove both members of each pair

Args:
inv_subs (list):

list of dictionaries giving substitutions to check

all_dup (list):

list of dictionaries giving substitutions which are self-inverse

Returns:
all_subs (list):

list of dictionaries giving substitutions without consecutive self-inverses

generation.simplifier.sympy_simplify(all_fun, all_sym, all_inv_subs, max_param, expand_fun=True, tmax=1, check_perm=False)[source]

Simplify equations and find duplicates.

Args:
all_fun (list):

list of strings containing all functions

all_sym (list):

list of sympy objects containing all functions

all_inv_subs (list):

list of dictionaries giving substitutions to be applied to all functions

max_param (int):

maximum number of free parameters in any equation in all_fun

expand_fun (bool, default=True):

whether to run the sympy expand options (True) or not (False)

tmax (float, default=1.):

maximum time in seconds to run any one part of simplification procedure for a given function

check_perm (bool, default=False):

whether to check all possible permutations and inverses of constants (True) or not (False)

Returns:
all_fun (list):

list of strings containing all (updated) functions

all_sym (list):

list of sympy objects containing all (updated) functions

all_inv_subs (list):

list of dictionaries giving substitutions to be applied to all (updated) functions

generation.simplifier.time_limit(seconds)[source]

Check function call does not exceed allotted time

Args:
seconds (float):

maximum time function can run in seconds

Raises:

TimeoutException if time exceeds seconds

utils

generation.utils.get_match_indexes(a, b)[source]

Return the indices in b of the items in a

Args:
a (list):

list of values whose index in b we wish to determine

b (list):

list of values within which to search

Returns:
result (list):

indices where corresponding value of a appears in b
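A dictionary-based sketch of the lookup (assuming every item of a appears in b):

```python
def get_match_indexes_sketch(a, b):
    """For each item of a, the index at which it appears in b."""
    lookup = {v: i for i, v in enumerate(b)}  # value -> index in b
    return [lookup[v] for v in a]
```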

generation.utils.get_unique_indexes(l)[source]

Find the indices of the unique items in a list

Args:
l (list):

list from which we want to find unique indices

Returns:
result (OrderedDict):

dictionary which returns index of unique item in l, accessed by unique item

match (dict):

dictionary which returns index of unique item in result, accessed by unique item
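The two return values can be sketched in a few lines (a minimal reimplementation consistent with the description above):

```python
from collections import OrderedDict

def get_unique_indexes_sketch(l):
    result = OrderedDict()            # unique item -> first index in l
    for i, item in enumerate(l):
        result.setdefault(item, i)
    match = {item: j for j, item in enumerate(result)}  # unique item -> index in result
    return result, match
```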

generation.utils.locals_size(loc)[source]

Find and print the total memory used by locals()

Args:
loc (dict):

dictionary of locals (obtained calling locals() in another script)

Returns:

None

generation.utils.merge_keys(all_fun, all_sym)[source]

Convert all_fun so that different values which give the same item in all_sym now have the same value

Args:
all_fun (list):

list of strings containing all functions

all_sym (OrderedDict):

dictionary of sympy objects which can be accessed by their string representations.

Returns:

None

generation.utils.pprint_ntuple(nt)[source]

Printing function for memory diagnostics

Args:
nt (tuple):

tuple of memory statistics returned by psutil.virtual_memory()

Returns:

None

generation.utils.split_idx(Ntotal, r, indices_or_sections)[source]

Return the rth set of indices for numpy.array_split(a, indices_or_sections), where len(a) = Ntotal

Args:
Ntotal (int):

length of array to split

r (int):

rank whose indices are required

indices_or_sections (int):

how many parts to split array into

Returns:
i (list):

[min, max] index used by rank
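numpy.array_split gives the first Ntotal % sections chunks one extra element, so the rank’s range can be computed without numpy. A sketch returning a half-open [start, stop) pair, which is one reading of the [min, max] above:

```python
def split_idx_sketch(Ntotal, r, sections):
    """The rth index range matching numpy.array_split on an array of length Ntotal."""
    base, rem = divmod(Ntotal, sections)
    # the first `rem` chunks get one extra element
    sizes = [base + 1 if i < rem else base for i in range(sections)]
    start = sum(sizes[:r])
    return [start, start + sizes[r]]
```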

generation.utils.using_mem(point='')[source]

Find and print current virtual memory usage

Args:
point (str):

string to print to identify where memory diagnostics calculated

Returns:

None

fitting

combine_DL

fitting.combine_DL.main(comp, likelihood, print_frequency=1000)[source]

Combine the description lengths of all functions of a given complexity, sort by this and save to file.

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

print_frequency (int, default=1000):

the status of the fits will be printed every print_frequency iterations

Returns:

None

fit_single

fitting.fit_single.fit_from_string(fun, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, maxvar=20, log_opt=False, replace_floats=False, return_params=False)[source]

Run end-to-end fitting of a single function, given as a string. Note that this is not guaranteed to find the optimum representation as a tree, so there could be a lower description-length representation of the function

Args:
fun (str):

String representation of the function to be fitted

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

pmin (float, default=0.):

minimum value for each parameter to consider when generating initial guess

pmax (float, default=5.):

maximum value for each parameter to consider when generating initial guess

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

verbose (bool, default=False):

Whether to print results (True) or not (False)

Niter (int, default=30):

Maximum number of parameter optimisation iterations to attempt.

Nconv (int, default=5):

If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters

maxvar (int, default=20):

The maximum number of variables which could appear in the function

log_opt (bool, default=False):

whether to optimise 1 and 2 parameter cases in log space

replace_floats (bool, default=False):

whether to replace any numbers found in the function with variables to optimise

return_params (bool, default=False):

whether to return the parameters of the maximum likelihood point

Returns:
negloglike (float):

the minimum value of -log(likelihood) (corresponding to the maximum likelihood)

DL (float):

the description length of this function

labels (list):

list of strings giving node labels of tree

params (optional, list):

the maximum likelihood parameters. Only returned if return_params is True

fitting.fit_single.single_function(labels, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, log_opt=False, return_params=False)[source]

Run end-to-end fitting of function for a single function

Args:
labels (list):

list of strings giving node labels of tree

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

pmin (float, default=0.):

minimum value for each parameter to consider when generating initial guess

pmax (float, default=5.):

maximum value for each parameter to consider when generating initial guess

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

verbose (bool, default=False):

Whether to print results (True) or not (False)

Niter (int, default=30):

Maximum number of parameter optimisation iterations to attempt.

Nconv (int, default=5):

If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters

log_opt (bool, default=False):

whether to optimise 1 and 2 parameter cases in log space

return_params (bool, default=False):

whether to return the parameters of the maximum likelihood point

Returns:
negloglike (float):

the minimum value of -log(likelihood) (corresponding to the maximum likelihood)

DL (float):

the description length of this function

params (optional, list):

the maximum likelihood parameters. Only returned if return_params is True

fitting.fit_single.string_to_aifeyn(fun, basis_functions, maxvar=20, verbose=True, replace_floats=False)[source]

Take a string defining a function and return the AIFeyn description-length term together with the complexity of the function

Args:
fun (str):

String representation of the function to be fitted

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

maxvar (int, default=20):

The maximum number of variables which could appear in the function

verbose (bool, default=True):

Whether to print results (True) or not (False)

replace_floats (bool, default=False):

whether to replace any numbers found in the function with variables to optimise

Returns:
aifeyn (float):

the contribution to description length from describing tree

complexity (int):

the number of nodes in the function

fitting.fit_single.tree_to_aifeyn(labels, basis_functions, verbose=True)[source]

Take a list of labels defining a function and return the AIFeyn description-length term together with the complexity of the function

Args:
labels (list):

list of strings giving node labels of tree

basis_functions (list):

list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators

verbose (bool, default=True):

Whether to print results (True) or not (False)

Returns:
aifeyn (float):

the contribution to description length from describing tree

complexity (int):

the number of nodes in the function

likelihood

class fitting.likelihood.CCLikelihood[source]

Bases: Likelihood

Likelihood class used to fit cosmic chronometer data. Should be used as a template for other likelihoods as all functions in this class are required in fitting functions.

get_pred(zp1, a, eq_numpy, **kwargs)[source]

Return the predicted H(z), which is the square root of the functions we are using.

Args:
zp1 (float or np.array):

1 + z for redshift z

a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
H (float or np.array):

the predicted Hubble parameter at redshifts supplied

negloglike(a, eq_numpy, **kwargs)[source]

Negative log-likelihood for a given function

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
nll (float):

-log(likelihood) for this function and parameters

class fitting.likelihood.GaussLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]

Bases: Likelihood

Likelihood class used to fit a function directly using a Gaussian likelihood

Args:
data_file (str):

Name of the file containing the data to use

run_name (str):

The name to be associated with this likelihood, e.g. ‘my_esr_run’

data_dir (str, default=None):

The path containing the data and cov files

fn_set (str, default=’core_maths’):

The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker

negloglike(a, eq_numpy, **kwargs)[source]

Negative log-likelihood for a given function.

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives y

Returns:
nll (float):

-log(likelihood) for this function and parameters
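For reference, a Gaussian negative log-likelihood with known errors has the form below; whether the package keeps the constant term is an assumption of this sketch:

```python
import math

def gauss_negloglike_sketch(y, yerr, ypred):
    """0.5 * sum((y - ypred)^2 / sigma^2 + log(2*pi*sigma^2)) over the data points."""
    return sum(
        0.5 * ((yi - pi) / si) ** 2 + 0.5 * math.log(2 * math.pi * si ** 2)
        for yi, si, pi in zip(y, yerr, ypred)
    )
```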

class fitting.likelihood.Likelihood(data_file, cov_file, run_name, data_dir=None, fn_set='core_maths')[source]

Bases: object

Likelihood class used to fit a function directly

Args:
data_file (str):

Name of the file containing the data to use

cov_file (str):

Name of the file containing the errors/covariance on the data

run_name (str):

The name to be associated with this likelihood, e.g. ‘my_esr_run’

data_dir (str, default=None):

The path containing the data and cov files

fn_set (str, default=’core_maths’):

The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker

clear_data()[source]

Clear data used for numerical integration (not required in most cases)

get_pred(x, a, eq_numpy, **kwargs)[source]

Return the predicted y(x)

Args:
x (float or np.array):

x value being used

a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
y (float or np.array):

the predicted y value at x supplied

run_sympify(fcn_i, **kwargs)[source]

Sympify a function

Args:
fcn_i (str):

string representing function we wish to fit to data

Returns:
fcn_i (str):

string representing function we wish to fit to data (with superfluous characters removed)

eq (sympy object):

sympy object representing function we wish to fit to data

integrated (bool, always False):

whether we analytically integrated the function (True) or not (False)

class fitting.likelihood.MSE(data_file, run_name, data_dir=None, fn_set='core_maths')[source]

Bases: Likelihood

Likelihood class used to fit a function directly using a MSE

IMPORTANT - MSE is NOT a likelihood in the probabilistic sense. It should not be used for MDL calculations, as the answer will be nonsense: an uncertainty is required for MDL to have meaning.

Args:
data_file (str):

Name of the file containing the data to use

run_name (str):

The name to be associated with this likelihood, e.g. ‘my_esr_run’

data_dir (str, default=None):

The path containing the data and cov files

fn_set (str, default=’core_maths’):

The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker

negloglike(a, eq_numpy, **kwargs)[source]

Negative log-likelihood for a given function. Here it is (y-ypred)^2. Note that this is technically not a log-likelihood, but the name is required so it can be accessed by other functions.

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives y

Returns:
nll (float):

-log(likelihood) for this function and parameters

class fitting.likelihood.MockLikelihood(nz, yfracerr, data_dir=None)[source]

Bases: Likelihood

Likelihood class used to fit mock cosmic chronometer data

Args:
nz (int):

number of mock redshifts to use

yfracerr (float):

the fractional uncertainty on the cosmic chronometer mock we are using

data_dir (str, default=None):

The path containing the data and cov files

get_pred(zp1, a, eq_numpy, **kwargs)[source]

Return the predicted H(z), which is the square root of the functions we are using.

Args:
zp1 (float or np.array):

1 + z for redshift z

a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
H (float or np.array):

the predicted Hubble parameter at redshifts supplied

negloglike(a, eq_numpy, **kwargs)[source]

Negative log-likelihood for a given function

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
nll (float):

-log(likelihood) for this function and parameters

class fitting.likelihood.PanthLikelihood[source]

Bases: Likelihood

Likelihood class used to fit Pantheon data

clear_data()[source]

Clear data used for numerical integration

get_pred(zp1, a, eq_numpy, integrated=False)[source]

Return the predicted distance modulus from the H^2 function supplied.

Args:
zp1 (float or np.array):

1 + z for redshift z

a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

integrated (bool, default=False):

whether we previously analytically integrated the function (True) or not (False)

Returns:
mu (float or np.array):

the predicted distance modulus at redshifts supplied
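The standard relation behind this prediction can be sketched with a numerical integral: d_C = c ∫ dz'/H(z'), d_L = (1+z) d_C, and mu = 5 log10(d_L/Mpc) + 25. This is an illustrative standalone sketch for a scalar redshift only; the actual PanthLikelihood implementation (units, integration scheme, array handling) may differ:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def mu_sketch(zp1, a, eq_numpy, nstep=1001):
    """Distance modulus from an H^2(z) function, for scalar z (sketch)."""
    z = zp1 - 1.0
    zs = np.linspace(0.0, z, nstep)
    H = np.sqrt(eq_numpy(zs + 1.0, *np.atleast_1d(a)))  # H(z) in km/s/Mpc
    integrand = 1.0 / H
    # trapezoidal rule for int_0^z dz' / H(z')
    dC = C_KMS * float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zs)))
    dL = zp1 * dC  # luminosity distance in Mpc
    return 5.0 * np.log10(dL) + 25.0

# Constant H(z) = 70 km/s/Mpc evaluated at z = 0.1
mu_sketch(1.1, [70.0 ** 2], lambda zp1, a0: a0 + 0.0 * zp1)
```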

negloglike(a, eq_numpy, integrated=False)[source]

Negative log-likelihood for a given function

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives H^2

Returns:
nll (float):

-log(likelihood) for this function and parameters

run_sympify(fcn_i, tmax=5, try_integration=True)[source]

Sympify a function

Args:
fcn_i (str):

string representing function we wish to fit to data

tmax (float):

maximum time in seconds to attempt analytic integration

try_integration (bool, default=True):

as the likelihood requires an integral, whether to try to analytically integrate (True) or not (False)

Returns:
fcn_i (str):

string representing function we wish to fit to data (with superfluous characters removed)

eq (sympy object):

sympy object representing function we wish to fit to data

integrated (bool):

whether we were able to analytically integrate the function (True) or not (False)

class fitting.likelihood.PoissonLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]

Bases: Likelihood

Likelihood class used to fit a function directly using a Poisson likelihood

Args:
data_file (str):

Name of the file containing the data to use

run_name (str):

The name to be associated with this likelihood, e.g. ‘my_esr_run’

data_dir (str, default=None):

The path containing the data and cov files

fn_set (str, default=’core_maths’):

The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker

negloglike(a, eq_numpy, **kwargs)[source]

Negative log-likelihood for a given function. Here it is a Poisson likelihood.

Args:
a (list):

parameters to substitute into the equation considered

eq_numpy (numpy function):

function to use which gives y

Returns:
nll (float):

-log(likelihood) for this function and parameters
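The Poisson -log(likelihood) has the standard form sum_i [mu_i - k_i log(mu_i) + log(k_i!)], where mu_i = eq_numpy(x_i, *a) is the predicted rate. A hypothetical standalone sketch (the real method uses the data stored on the likelihood object):

```python
import numpy as np
from math import lgamma

def negloglike_poisson(a, eq_numpy, x, k):
    """Standard Poisson -log(likelihood) for counts k with rates mu = eq_numpy(x, *a)."""
    mu = np.asarray(eq_numpy(x, *np.atleast_1d(a)), dtype=float)
    if np.any(mu <= 0):
        return np.inf  # Poisson rates must be positive
    logfact = np.array([lgamma(ki + 1.0) for ki in k])  # log(k!)
    return float(np.sum(mu - k * np.log(mu) + logfact))

x = np.array([1.0, 2.0, 3.0])
k = np.array([2, 4, 6])
f = lambda x, a0: a0 * x  # illustrative rate model mu = a0 * x
negloglike_poisson([2.0], f, x, k)  # minimised when mu matches the counts
```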

match

fitting.match.main(comp, likelihood, tmax=5, print_frequency=1000, try_integration=False)[source]

Apply results of fitting the unique functions to all functions and save to file

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

print_frequency (int, default=1000):

the status of the fits will be printed every print_frequency iterations

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

Returns:

None

plot

fitting.plot.main(comp, likelihood, tmax=5, try_integration=False, xscale='linear', yscale='linear')[source]

Plot the best 50 functions at a given complexity against the data and save the plot to file

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing the data, the functions to convert SR expressions to the variable of the data, and the output path

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

xscale (str, default=’linear’):

Scaling for x-axis

yscale (str, default=’linear’):

Scaling for y-axis

Returns:

None

sympy_symbols

Script defining the functions and symbols needed to interpret the expressions used in the fitting routines

test_all

fitting.test_all.chi2_fcn(x, likelihood, eq_numpy, integrated, signs)[source]

Compute chi2 for a function

Args:
x (list):

parameters to use for function

likelihood (fitting.likelihood object):

object containing data and likelihood function

eq_numpy (numpy function):

function to pass to likelihood object to make prediction of y(x)

integrated (bool):

whether eq_numpy has already been integrated

signs (list):

each entry specifies whether that parameter should be optimised logarithmically. If None, the parameter is used directly; if ‘+’, 10**x[i] is optimised; and if ‘-’, -10**x[i] is optimised

Returns:
negloglike (float):

-log(likelihood) for this function and parameters
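The signs convention above can be sketched as a small helper (hypothetical, for illustration only):

```python
def transform_params(x, signs):
    """Map optimiser variables x[i] to physical parameters:
    None -> x[i] unchanged, '+' -> 10**x[i], '-' -> -10**x[i]."""
    out = []
    for xi, s in zip(x, signs):
        if s is None:
            out.append(xi)
        elif s == '+':
            out.append(10.0 ** xi)
        else:  # '-'
            out.append(-10.0 ** xi)
    return out

transform_params([2.0, 1.0, 0.5], [None, '+', '-'])
# -> [2.0, 10.0, -3.1622...], i.e. [x, 10**x, -10**x]
```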

fitting.test_all.get_functions(comp, likelihood, unique=True)[source]

Load all functions for a given complexity and distribute them among the ranks

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing the data, the functions to convert SR expressions to the variable of the data, and the file path

unique (bool, default=True):

whether to load just the unique functions (True) or all functions (False)

Returns:
fcn_list (list):

list of strings representing functions to be used by given rank

data_start (int):

first index of function used by rank

data_end (int):

last index of function used by rank

fitting.test_all.main(comp, likelihood, tmax=5, pmin=0, pmax=3, print_frequency=50, try_integration=False, log_opt=False, Niter_params=[40, 60], Nconv_params=[-5, 20], ignore_previous_eqns=True)[source]

Optimise all functions for a given complexity and save results to file.

This can optimise in log-space, with separate positive and negative branches (except when there are >= 3 parameters, in which case optimisation is performed in linear space).

The lists of parameters, P, passed as Niter_params and Nconv_params determine the corresponding values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is set by the length of P, so P can be of arbitrary length.
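The polynomial rule above can be sketched as (an illustrative helper, not part of the package API):

```python
def poly_count(P, nparam):
    # N = P[0] + P[1]*nparam + P[2]*nparam**2 + ...; order set by len(P)
    return sum(p * nparam ** i for i, p in enumerate(P))

poly_count([40, 60], 2)   # default Niter_params, 2-parameter function -> 160
poly_count([-5, 20], 2)   # default Nconv_params, 2-parameter function -> 35
```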

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

pmin (float, default=0.):

minimum value for each parameter to be considered when generating the initial guess

pmax (float, default=3.):

maximum value for each parameter to be considered when generating the initial guess

print_frequency (int, default=50):

the status of the fits will be printed every print_frequency iterations

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

log_opt (bool, default=False):

whether to optimise 1 and 2 parameter cases in log space

Niter_params (list, default=[40, 60]):

Parameters determining maximum number of parameter optimisation iterations to attempt.

Nconv_params (list, default=[-5, 20]):

If we find Nconv solutions for the parameters which are within 0.5 in logL of the best, we say we have converged and stop optimising the parameters. These parameters determine Nconv.

ignore_previous_eqns (bool, default=True):

If we have seen an equation at lower complexity, whether to ignore the equation in this routine.

Returns:

None

fitting.test_all.optimise_fun(fcn_i, likelihood, tmax, pmin, pmax, comp=0, try_integration=False, log_opt=False, max_param=4, Niter_params=[40, 60], Nconv_params=[-5, 20], test_success=False, ignore_previous_eqns=True)[source]

Optimise the parameters of a function to fit data

The lists of parameters, P, passed as Niter_params and Nconv_params determine the corresponding values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is set by the length of P, so P can be of arbitrary length.

Args:
fcn_i (str):

string representing function we wish to fit to data

likelihood (fitting.likelihood object):

object containing data and likelihood function

tmax (float):

maximum time in seconds to run any one part of simplification procedure for a given function

pmin (float):

minimum value for each parameter to consider when generating initial guess

pmax (float):

maximum value for each parameter to consider when generating initial guess

comp (float, default=0):

Complexity. Default is 0 because it is not provided when fitting a single function.

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

log_opt (bool, default=False):

whether to optimise 1 and 2 parameter cases in log space

max_param (int, default=4):

The maximum number of parameters considered. This sets the shapes of arrays used.

Niter_params (list, default=[40, 60]):

Parameters determining maximum number of parameter optimisation iterations to attempt.

Nconv_params (list, default=[-5, 20]):

If we find Nconv solutions for the parameters which are within 0.5 in logL of the best, we say we have converged and stop optimising the parameters. These parameters determine Nconv.

test_success (bool, default=False):

Whether to test whether the optimisation was successful using scipy’s criteria

ignore_previous_eqns (bool, default=True):

If we have seen an equation at lower complexity, whether to ignore the equation in this routine.

Returns:
chi2_i (float):

the minimum value of -log(likelihood) (corresponding to the maximum likelihood)

params (list):

the maximum likelihood values of the parameters

test_all_Fisher

fitting.test_all_Fisher.convert_params(fcn_i, eq, integrated, theta_ML, likelihood, negloglike, max_param=4)[source]

Compute the Fisher matrix, correct the MLP and find the parametric contribution to the description length for a single function

Args:
fcn_i (str):

string representing function we wish to fit to data

eq (sympy object):

sympy object for the function we wish to fit to data

integrated (bool):

whether eq_numpy has already been integrated

theta_ML (list):

the maximum likelihood values of the parameters

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

negloglike (float):

the minimum log-likelihood for this function

max_param (int, default=4):

The maximum number of parameters considered. This sets the shapes of arrays used.

Returns:
params (list):

the corrected maximum likelihood values of the parameters

negloglike (float):

the corrected minimum log-likelihood for this function

deriv (list):

flattened version of the Hessian of -log(likelihood) at the maximum likelihood point

codelen (float):

the parameteric contribution to the description length of this function

fitting.test_all_Fisher.load_loglike(comp, likelihood, data_start, data_end, split=True)[source]

Load results of optimisation completed by test_all.py

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

data_start (int):

minimum index of results we want to load (only if split=True)

data_end (int):

maximum index of results we want to load (only if split=True)

split (bool, default=True):

whether to return subset of results given by data_start and data_end (True) or all data (False)

Returns:
negloglike (list):

list of minimum log-likelihoods

params (np.ndarray):

list of parameters at maximum likelihood points. Shape = (nfun, nparam).

fitting.test_all_Fisher.main(comp, likelihood, tmax=5, print_frequency=50, try_integration=False)[source]

Compute the Fisher matrix, correct the MLP and find the parametric contribution to the description length for all functions and save to file

Args:
comp (int):

complexity of functions to consider

likelihood (fitting.likelihood object):

object containing data, likelihood functions and file paths

tmax (float, default=5.):

maximum time in seconds to run any one part of simplification procedure for a given function

print_frequency (int, default=50):

the status of the fits will be printed every print_frequency iterations

try_integration (bool, default=False):

when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)

Returns:

None

plotting

plot

plotting.plot.pareto_plot(dirname, savename, do_DL=True, do_logL=True)[source]

Plot the Pareto front using the files in a given directory

Args:
dirname (str):

The directory name to consider.

savename (str):

File name under which to save the plot (within dirname)

do_DL (bool, default=True):

Whether to plot the description length in the Pareto front

do_logL (bool, default=True):

Whether to plot the log-likelihood in the pareto front