API
generation
duplicate_checker
- generation.duplicate_checker.main(runname, compl, track_memory=False, search_tmax=60, expand_tmax=1, seed=1234)[source]
Run the generation of functions for a given complexity and set of basis functions
- Args:
- runname (str):
name of run, which defines the basis functions used
- compl (int):
complexity of functions to consider
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- search_tmax (float, default=60.):
maximum time in seconds to run any one part of simplification procedure for a given function
- expand_tmax (float, default=1.):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- seed (int, default=1234):
seed to set random number generator for shuffling functions (used to prevent one rank having similar, hard to simplify functions)
- Returns:
None
custom_printer
- class generation.custom_printer.ESRPrinter(settings=None)[source]
Bases:
Printer
A Printer for generating readable representation of most SymPy classes as required for ExhaustiveSR.
The only difference from sympy’s StrPrinter is that pow(., .) is used instead of ** unless the power is an integer
- printmethod: str = '_sympystr'
- class generation.custom_printer.ESRReprPrinter(settings=None)[source]
Bases:
ESRPrinter
(internal) – see sstrrepr
generator
- class generation.generator.DecoratedNode(fun, basis_functions, parent_op=None, parent=None)[source]
Bases:
object
- generation.generator.aifeyn_complexity(tree, param_list)[source]
Compute contribution to description length from describing tree
- Args:
- tree (list):
list of strings giving node labels of tree
- param_list (list):
list of strings of all possible parameter names
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
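For illustration, the tree term of the description length can be sketched as the k·ln(n) form used in the ESR paper: a tree of k nodes drawn from n distinct symbols, with all parameter names collapsed into a single symbol. This is a conceptual reimplementation under that assumption, not the actual ESR source, and the function name is hypothetical:

```python
import math

def aifeyn_complexity_sketch(tree, param_list):
    # Collapse all parameter labels (e.g. 'a0', 'a1') into one symbol,
    # then charge k*ln(n) nats for a tree of k nodes over n distinct symbols.
    labels = ['param' if t in param_list else t for t in tree]
    k = len(labels)           # number of nodes in the tree
    n = len(set(labels))      # number of distinct symbols used
    return k * math.log(n) if n > 1 else 0.0
```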
- generation.generator.check_operators(nodes, basis_functions)[source]
Check whether all operators in the tree are in the basis
- Args:
- nodes (DecoratedNode):
Node representation of the function tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- all_in_basis (bool):
Whether all functions in tree are in basis
- generation.generator.check_tree(s)[source]
Given a candidate string of 0s, 1s and 2s, determine whether one can make a function out of it
- Args:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- Returns:
- success (bool):
whether candidate string can form a valid tree (True) or not (False)
- part_considered (str):
string of length <= len(s), where s[:len(part_considered)] = part_considered
- tree (list):
list of Node objects corresponding to string s
- generation.generator.find_additional_trees(tree, labels, basis_functions)[source]
For a given tree, try to find all simpler representations of the function by combining sums, exponentials and powers
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_tree (list):
list of equivalent trees, given as lists of Node objects
- new_labels (list):
list of lists of strings giving node labels of new_tree
- generation.generator.generate_equations(compl, basis_functions, dirname)[source]
Generate all equations at a given complexity for a set of basis functions and save results to file
- Args:
- compl (int):
complexity of functions to consider
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- dirname (str):
directory path to save results in
- Returns:
- all_fun (list):
list of strings containing all functions generated
- extra_orig (list):
list of strings containing functions generated by combining sums, exponentials and powers of the functions in all_fun as they appear in all_fun
- generation.generator.get_allowed_shapes(compl)[source]
Find the shapes of all allowed trees containing compl nodes
- Args:
- compl (int):
complexity of tree = number of nodes
- Returns:
- cand (list):
list of strings comprised of 0, 1 and 2 representing valid trees of nullary, unary and binary nodes
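The validity logic behind check_tree and get_allowed_shapes can be sketched as a slot-counting walk over the arity string: the root opens one slot, each node consumes a slot and opens as many new slots as its arity, and a string is valid iff the slot count reaches zero exactly at the end. This is a conceptual reimplementation with hypothetical names, not the ESR source:

```python
from itertools import product

def is_valid_shape(s):
    # Walk the arity string ('0' nullary, '1' unary, '2' binary):
    # one open slot for the root; each node fills a slot and opens
    # int(ch) children. Valid iff slots hit zero exactly at the end.
    pending = 1
    for ch in s:
        if pending == 0:      # tree closed early: trailing symbols
            return False
        pending += int(ch) - 1
    return pending == 0

def allowed_shapes_sketch(compl):
    # Brute-force enumeration of valid arity strings with `compl` nodes.
    return [''.join(p) for p in product('012', repeat=compl)
            if is_valid_shape(''.join(p))]
```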
- generation.generator.is_float(string)[source]
Determine whether a string is a float or not
- Args:
- string (str):
The string to check
- Returns:
bool: Whether the string is a float (True) or not (False).
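A minimal version of this check (a sketch, not necessarily the ESR implementation) simply asks whether float() parses the string:

```python
def is_float_sketch(string):
    # A string counts as a float iff float() can parse it.
    try:
        float(string)
        return True
    except ValueError:
        return False
```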
- generation.generator.labels_to_shape(labels, basis_functions)[source]
Find the representation of the shape of a tree given its labels
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- generation.generator.node_to_string(idx, tree, labels)[source]
Convert a tree with labels into a string giving function
- Args:
- idx (int):
index of tree to consider
- tree (list):
list of Node objects corresponding to the tree
- labels (list):
list of strings giving node labels of tree
- Returns:
Function as a string
- generation.generator.shape_to_functions(s, basis_functions)[source]
Find all possible functions formed from the given list of 0s, 1s and 2s defining a tree and basis functions
- Args:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- all_fun (list):
list of strings containing all functions generated directly from tree
- all_tree (list):
list of lists of Node objects corresponding to the trees of functions in all_fun
- extra_fun (list):
list of strings containing functions generated by combining sums, exponentials and powers of the functions in all_fun
- extra_tree (list):
list of lists of Node objects corresponding to the trees of functions in extra_fun
- extra_orig (list):
list of strings corresponding to original versions of extra_fun, as found in all_fun
- generation.generator.string_to_expr(s, kern=False, evaluate=False, locs=None)[source]
Convert a string giving function into a sympy object
- Args:
- s (str):
string representation of the function considered
- kern (bool):
whether to use sympy’s kernS function or sympify
- evaluate (bool):
whether to use powsimp, factor and subs
- locs (dict):
dictionary of string:sympy objects. If None, one will be created here
- Returns:
- expr (sympy object):
expression corresponding to s
- generation.generator.string_to_node(s, basis_functions, locs=None, evalf=False, allow_eval=True, check_ops=False)[source]
Convert a string giving function into a tree with labels
- Args:
- s (str):
string representation of the function considered
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- locs (dict):
dictionary of string:sympy objects. If None, one will be created here
- evalf (bool):
whether to run evalf() on function (default=False)
- allow_eval (bool, default=True):
whether to run the (kernS=False and evaluate=True) option
- check_ops (bool, default=False):
whether to check all operators appear in basis functions
- Returns:
- tree (list):
list of Node objects corresponding to the tree
- labels (list):
list of strings giving node labels of tree
- generation.generator.update_sums(tree, labels, try_idx, basis_functions)[source]
Try to combine sums to make simpler representations of functions
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- try_idx (int):
when we have multiple substitutions we can attempt, this indicates which one to try
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_labels (list):
list of strings giving node labels of new tree
- new_shape (list):
list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary
- nadded (int):
number of new functions added
- generation.generator.update_tree(tree, labels, try_idx, basis_functions)[source]
Try to combine exponentials and powers to make simpler representations of functions
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- try_idx (int):
when we have multiple substitutions we can attempt, this indicates which one to try
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_labels (list):
list of strings giving node labels of new tree
- new_shape (list):
list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary
- nadded (int):
number of new functions added
simplifier
- generation.simplifier.check_results(dirname, compl, tmax=10)[source]
Check that all functions can be recovered by applying the substitutions to the unique functions. If not, define a new unique function and save results to file.
- Args:
- dirname (str):
name of directory containing all the functions to consider
- compl (int):
complexity of functions to consider
- tmax (float, default=10.):
maximum time in seconds to run the substitutions
- Returns:
None
- generation.simplifier.convert_params(p_meas, fish_meas, inv_subs, n=4)[source]
Convert parameters from those in unique function to those in actual function
- Args:
- p_meas (list):
list of measured parameters in unique function
- fish_meas (list):
flattened version of the Hessian of -log(likelihood) at the maximum likelihood point
- inv_subs (list):
list of substitutions required to convert between all and unique functions
- n (int, default=4):
the number of dimensions of the array from which fish_meas was computed
- Returns:
- p_new (list):
list of parameters for the actual function
- diag_fish (np.array):
the diagonal entries of the Fisher matrix of the actual function at the maximum likelihood point
- generation.simplifier.count_params(all_fun, max_param)[source]
Count the number of free parameters in each member of a list of functions
- Args:
- all_fun (list):
list of strings containing functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- Returns:
- nparam (np.array):
array of ints containing number of free parameters in corresponding member of all_fun
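The counting can be sketched as a pattern match over each function string, assuming the ESR convention that free parameters are named a0, a1, ... (that naming, and the function name here, are assumptions for illustration):

```python
import re

def count_params_sketch(all_fun, max_param):
    # Count distinct free parameters in each function string, assuming
    # parameters follow the a0, a1, ... naming convention.
    names = [f'a{i}' for i in range(max_param)]
    counts = []
    for fun in all_fun:
        # Word-boundary match so 'a1' does not match inside 'a10'.
        found = {n for n in names if re.search(rf'\b{n}\b', fun)}
        counts.append(len(found))
    return counts
```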
- generation.simplifier.do_sympy(all_fun, all_sym, compl, search_tmax, expand_tmax, dirname, track_memory=False)[source]
Run the duplicate checking procedure
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- compl (int):
complexity of functions to consider
- search_tmax (float):
maximum time in seconds to run any one part of simplification procedure for a given function
- expand_tmax (float):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- dirname (str):
directory path to save results in
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (OrderedDict):
dictionary of (updated) sympy objects which can be accessed by their string representations.
- count (int):
number of rounds of optimisation which were performed
- generation.simplifier.expand_or_factor(all_sym, tmax=1, method='expand')[source]
Run the sympy expand or factor functions
- Args:
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- tmax (float, default=1.):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- method (str, default=’expand’):
whether to run expand (‘expand’) or factor (‘factor’). All other options are ignored
- Returns:
- all_sym (OrderedDict):
dictionary of (updated) sympy objects which can be accessed by their string representations.
- generation.simplifier.get_all_dup(max_param)[source]
Finds self-inverse transformations of parameters, to be used in simplify_inv_subs(inv_subs, all_dup)
- Args:
- max_param (int):
maximum number of parameters to consider
- Returns:
- all_dup (list):
list of dictionaries giving substitutions which are self-inverse
- generation.simplifier.get_max_param(all_fun, verbose=True)[source]
Find maximum number of free parameters in list of functions
- Args:
- all_fun (list):
list of strings containing functions
- verbose (bool, default=True):
Whether to print result (True) or not (False)
- Returns:
- max_param (int):
maximum number of free parameters in any equation in all_fun
- generation.simplifier.initial_sympify(all_fun, max_param, verbose=True, parallel=True, track_memory=False, save_sympy=True)[source]
Convert list of strings of functions into list of sympy objects
- Args:
- all_fun (list):
list of strings containing functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- verbose (bool, default=True):
whether to print progress (True) or not (False)
- parallel (bool, default=True):
whether to split equations amongst ranks (True) or each equation considered by all ranks (False)
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- save_sympy (bool, default=True):
whether to return sympy objects (True) or not (False)
- Returns:
- str_fun (list):
list of strings containing functions
- sym_fun (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations. If save_sympy is False, then sym_fun is None.
- generation.simplifier.load_subs(fname, max_param, use_sympy=True, bcast_res=True)[source]
Load the substitutions required to convert between all and unique functions
- Args:
- fname (str):
file name containing the substitutions
- max_param (int):
maximum number of parameters to consider
- use_sympy (bool, default=True):
whether to convert substitutions to sympy objects (True) or leave as strings (False)
- bcast_res (bool, default=True):
whether to allow all ranks to have the substitutions (True) or just the 0th rank (False)
- Returns:
- all_subs (list):
list of substitutions required to convert between all and unique functions. Each item is either a dictionary with sympy objects as keys and values (use_sympy=True) or a string version of this dictionary (use_sympy=False). If bcast_res=True, then all ranks have this list, otherwise only rank 0 has this list and all other ranks return None.
- generation.simplifier.make_changes(all_fun, all_sym, all_inv_subs, str_fun, sym_fun, inv_subs_fun)[source]
Update global variables of functions and symbolic expressions by combining rank calculations
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (list):
list of sympy objects containing all functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all functions
- str_fun (list):
list of strings containing functions considered by rank
- sym_fun (list):
list of sympy objects containing functions considered by rank
- inv_subs_fun (list):
list of dictionaries giving substitutions to be applied to functions considered by rank
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (list):
list of sympy objects containing all (updated) functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all (updated) functions
- generation.simplifier.simplify_inv_subs(inv_subs, all_dup)[source]
Find consecutive pairs of identical self-inverse substitutions (e.g. {a0: -a0}, {a0: a1, a1: a0} or {a0: 1/a0}) and remove both members of each pair
- Args:
- inv_subs (list):
list of dictionaries giving substitutions to check
- all_dup (list):
list of dictionaries giving substitutions which are self-inverse
- Returns:
- all_subs (list):
list of dictionaries giving substitutions without consecutive self-inverses
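The idea can be sketched as a single pass over the substitution list: applying the same self-inverse substitution twice in a row is the identity, so such a pair can be dropped. This is a conceptual sketch with a hypothetical name, not the ESR implementation:

```python
def simplify_inv_subs_sketch(inv_subs, all_dup):
    # Two identical self-inverse substitutions in a row cancel out,
    # so both members of such a pair can be removed.
    out = []
    for sub in inv_subs:
        if out and out[-1] == sub and sub in all_dup:
            out.pop()        # cancel the pair
        else:
            out.append(sub)
    return out
```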
- generation.simplifier.sympy_simplify(all_fun, all_sym, all_inv_subs, max_param, expand_fun=True, tmax=1, check_perm=False)[source]
Simplify equations and find duplicates.
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (list):
list of sympy objects containing all functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- expand_fun (bool, default=True):
whether to run the sympy expand options (True) or not (False)
- tmax (float, default=1.):
maximum time in seconds to run any one part of simplification procedure for a given function
- check_perm (bool, default=False):
whether to check all possible permutations and inverses of constants (True) or not (False)
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (list):
list of sympy objects containing all (updated) functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all (updated) functions
utils
- generation.utils.get_match_indexes(a, b)[source]
Returns indices in a of items in b
- Args:
- a (list):
list of values to search within
- b (list):
list of values whose indices in a we wish to find
- Returns:
- result (list):
indices in a where the corresponding value of b appears
- generation.utils.get_unique_indexes(l)[source]
Find the indices of the unique items in a list
- Args:
- l (list):
list from which we want to find unique indices
- Returns:
- result (OrderedDict):
dictionary which returns index of unique item in l, accessed by unique item
- match (dict):
dictionary which returns index of unique item in result, accessed by unique item
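These two utilities can be sketched in a few lines (conceptual reimplementations with hypothetical names, matching the documented behaviour rather than the ESR source):

```python
from collections import OrderedDict

def get_match_indexes_sketch(a, b):
    # For each item of b, the index at which it appears in a.
    return [a.index(item) for item in b]

def get_unique_indexes_sketch(l):
    # result: first-occurrence index in l of each unique item, keyed by item.
    # match:  position of each unique item within `result` itself.
    result = OrderedDict()
    for i, item in enumerate(l):
        result.setdefault(item, i)
    match = {item: j for j, item in enumerate(result)}
    return result, match
```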
- generation.utils.locals_size(loc)[source]
Find and print the total memory used by locals()
- Args:
- loc (dict):
dictionary of locals (obtained calling locals() in another script)
- Returns:
None
- generation.utils.merge_keys(all_fun, all_sym)[source]
Update all_fun so that different strings which map to the same sympy object in all_sym are given the same string value
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- Returns:
None
- generation.utils.pprint_ntuple(nt)[source]
Printing function for memory diagnostics
- Args:
- nt (tuple):
tuple of memory statistics returned by psutil.virtual_memory()
- Returns:
None
- generation.utils.split_idx(Ntotal, r, indices_or_sections)[source]
Returns the rth set of indices for numpy.array_split(a, indices_or_sections), where len(a) = Ntotal
- Args:
- Ntotal (int):
length of array to split
- r (int):
rank whose indices are required
- indices_or_sections (int):
how many parts to split array into
- Returns:
- i (list):
[min, max] index used by rank
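Since the behaviour is defined in terms of numpy.array_split, the function can be sketched directly on top of it (assuming numpy is available; the name here is hypothetical):

```python
import numpy as np

def split_idx_sketch(Ntotal, r, indices_or_sections):
    # [min, max] index range handled by rank r under numpy.array_split,
    # which gives the first Ntotal % sections chunks one extra element.
    chunks = np.array_split(np.arange(Ntotal), indices_or_sections)
    c = chunks[r]
    return [int(c[0]), int(c[-1])]
```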
fitting
combine_DL
- fitting.combine_DL.main(comp, likelihood, print_frequency=1000)[source]
Combine the description lengths of all functions of a given complexity, sort by this and save to file.
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- print_frequency (int, default=1000):
the status of the fits will be printed every print_frequency iterations
- Returns:
None
fit_single
- fitting.fit_single.fit_from_string(fun, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, maxvar=20, log_opt=False, replace_floats=False, return_params=False)[source]
Run end-to-end fitting for a single function, given as a string. Note that this is not guaranteed to find the optimum representation as a tree, so there could be a lower description-length representation of the function
- Args:
- fun (str):
String representation of the function to be fitted
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=5.):
maximum value for each parameter to consider when generating initial guess
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- verbose (bool, default=False):
Whether to print results (True) or not (False)
- Niter (int, default=30):
Maximum number of parameter optimisation iterations to attempt.
- Nconv (int, default=5):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters
- maxvar (int):
The maximum number of variables which could appear in the function
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- replace_floats (bool, default=False):
whether to replace any numbers found in the function with variables to optimise
- return_params (bool, default=False):
whether to return the parameters of the maximum likelihood point
- Returns:
- negloglike (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- DL (float):
the description length of this function
- labels (list):
list of strings giving node labels of tree
- params (optional, list):
the maximum likelihood parameters. Only returned if return_params is True
- fitting.fit_single.single_function(labels, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, log_opt=False, return_params=False)[source]
Run end-to-end fitting for a single function, specified by its tree labels
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=5.):
maximum value for each parameter to consider when generating initial guess
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- verbose (bool, default=False):
Whether to print results (True) or not (False)
- Niter (int, default=30):
Maximum number of parameter optimisation iterations to attempt.
- Nconv (int, default=5):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- return_params (bool, default=False):
whether to return the parameters of the maximum likelihood point
- Returns:
- negloglike (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- DL (float):
the description length of this function
- params (optional, list):
the maximum likelihood parameters. Only returned if return_params is True
- fitting.fit_single.string_to_aifeyn(fun, basis_functions, maxvar=20, verbose=True, replace_floats=False)[source]
Takes a string defining a function and returns the AIFeyn complexity term and the complexity of the function
- Args:
- fun (str):
String representation of the function to be fitted
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- maxvar (int, default=20):
The maximum number of variables which could appear in the function
- verbose (bool, default=True):
Whether to print results (True) or not (False)
- replace_floats (bool, default=False):
whether to replace any numbers found in the function with variables to optimise
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
- complexity (int):
the number of nodes in the function
- fitting.fit_single.tree_to_aifeyn(labels, basis_functions, verbose=True)[source]
Takes a list of labels defining a function and returns the AIFeyn complexity term and the complexity of the function
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- verbose (bool, default=True):
Whether to print results (True) or not (False)
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
- complexity (int):
the number of nodes in the function
likelihood
- class fitting.likelihood.CCLikelihood[source]
Bases:
Likelihood
Likelihood class used to fit cosmic chronometer data. Should be used as a template for other likelihoods as all functions in this class are required in fitting functions.
- get_pred(zp1, a, eq_numpy, **kwargs)[source]
Return the predicted H(z), which is the square root of the functions we are using.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- H (float or np.array):
the predicted Hubble parameter at redshifts supplied
- class fitting.likelihood.GaussLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a Gaussian likelihood
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
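For independent data points with known errors, the quantity such a Gaussian likelihood class minimises has the standard form below. This is a sketch of the form only, under the assumption of independent Gaussian errors; the exact ESR implementation may differ, and the function name is hypothetical:

```python
import numpy as np

def gauss_negloglike(y, ypred, yerr):
    # Standard Gaussian negative log-likelihood with independent errors:
    # 0.5 * sum[ ((y - ypred)/yerr)^2 + ln(2*pi*yerr^2) ]
    return float(np.sum(0.5 * ((y - ypred) / yerr) ** 2
                        + 0.5 * np.log(2 * np.pi * yerr ** 2)))
```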
- class fitting.likelihood.Likelihood(data_file, cov_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
object
Likelihood class used to fit a function directly
- Args:
- data_file (str):
Name of the file containing the data to use
- cov_file (str):
Name of the file containing the errors/covariance on the data
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
- get_pred(x, a, eq_numpy, **kwargs)[source]
Return the predicted y(x)
- Args:
- x (float or np.array):
x value being used
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- y (float or np.array):
the predicted y value at x supplied
- run_sympify(fcn_i, **kwargs)[source]
Sympify a function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- Returns:
- fcn_i (str):
string representing function we wish to fit to data (with superfluous characters removed)
- eq (sympy object):
sympy object representing function we wish to fit to data
- integrated (bool, always False):
whether we analytically integrated the function (True) or not (False)
- class fitting.likelihood.MSE(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a MSE
IMPORTANT - MSE is NOT a likelihood in the probabilistic sense. It should not be used for MDL calculations, as the answer will be nonsense since an uncertainty is required for MDL to have meaning.
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
- negloglike(a, eq_numpy, **kwargs)[source]
Negative log-likelihood for a given function. Here it is (y - ypred)^2. Note that this is technically not a log-likelihood, but the method must have this name so it can be accessed by other functions.
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives y
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters (here, the MSE)
- class fitting.likelihood.MockLikelihood(nz, yfracerr, data_dir=None)[source]
Bases:
Likelihood
Likelihood class used to fit mock cosmic chronometer data
- Args:
- nz (int):
number of mock redshifts to use
- yfracerr (float):
the fractional uncertainty on the cosmic chronometer mock we are using
- data_dir (str, default=None):
The path containing the data and cov files
- get_pred(zp1, a, eq_numpy, **kwargs)[source]
Return the predicted H(z), which is the square root of the functions we are using.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- H (float or np.array):
the predicted Hubble parameter at redshifts supplied
- class fitting.likelihood.PanthLikelihood[source]
Bases:
Likelihood
Likelihood class used to fit Pantheon data
- get_pred(zp1, a, eq_numpy, integrated=False)[source]
Return the predicted distance modulus from the H^2 function supplied.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- integrated (bool, default=False):
whether we previously analytically integrated the function (True) or not (False)
- Returns:
- mu (float or np.array):
the predicted distance modulus at redshifts supplied
- negloglike(a, eq_numpy, integrated=False)[source]
Negative log-likelihood for a given function
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- integrated (bool, default=False):
whether we previously analytically integrated the function (True) or not (False)
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters
- run_sympify(fcn_i, tmax=5, try_integration=True)[source]
Sympify a function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- tmax (float):
maximum time in seconds to attempt analytic integration
- try_integration (bool, default=True):
as the likelihood requires an integral, whether to try to analytically integrate (True) or not (False)
- Returns:
- fcn_i (str):
string representing function we wish to fit to data (with superfluous characters removed)
- eq (sympy object):
sympy object representing function we wish to fit to data
- integrated (bool):
whether we were able to analytically integrate the function (True) or not (False)
- class fitting.likelihood.PoissonLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a Poisson likelihood
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in
generation.duplicate_checker
- negloglike(a, eq_numpy, **kwargs)[source]
Negative log-likelihood for a given function. Here it is a Poisson likelihood
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives y
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters
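A self-contained sketch of a Poisson negative log-likelihood of the kind described (stdlib only; the constant log(k!) term is included via lgamma; names are illustrative):

```python
from math import lgamma, log

def poisson_negloglike_sketch(mu, counts):
    # -log L = sum_i [ mu_i - k_i*log(mu_i) + log(k_i!) ]
    # for predicted rates mu_i and observed counts k_i.
    return sum(m - k * log(m) + lgamma(k + 1.0) for m, k in zip(mu, counts))
```

Minimising this over the parameters entering mu gives the maximum-likelihood fit.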
match
- fitting.match.main(comp, likelihood, tmax=5, print_frequency=1000, try_integration=False)[source]
Apply results of fitting the unique functions to all functions and save to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- print_frequency (int, default=1000):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- Returns:
None
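The idea of applying results for the unique functions to all functions can be sketched as a simple lookup (hypothetical names; the real routine also handles file I/O and parameter bookkeeping):

```python
def broadcast_results_sketch(all_fcns, unique_of, unique_results):
    # Each function inherits the fit results of the unique (simplified)
    # function it was matched to during duplicate checking.
    return [unique_results[unique_of[f]] for f in all_fcns]
```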
plot
- fitting.plot.main(comp, likelihood, tmax=5, try_integration=False, xscale='linear', yscale='linear')[source]
Plot the best 50 functions at a given complexity against the data and save the plot to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, functions to convert SR expressions to variable of data and output path
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- xscale (str, default=’linear’):
Scaling for x-axis
- yscale (str, default=’linear’):
Scaling for y-axis
- Returns:
None
sympy_symbols
Script defining the functions and symbols which can be used to interpret the expressions encountered when fitting functions
test_all
- fitting.test_all.chi2_fcn(x, likelihood, eq_numpy, integrated, signs)[source]
Compute chi2 for a function
- Args:
- x (list):
parameters to use for function
- likelihood (fitting.likelihood object):
object containing data and likelihood function
- eq_numpy (numpy function):
function to pass to likelihood object to make prediction of y(x)
- integrated (bool):
whether eq_numpy has already been integrated
- signs (list):
each entry specifies whether that parameter should be optimised logarithmically. If None, do nothing; if ‘+’, optimise 10**x[i]; and if ‘-’, optimise -10**x[i]
- Returns:
- negloglike (float):
the negative log-likelihood for this function and parameters
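The signs convention described above can be written out explicitly (hypothetical helper, not in the library):

```python
def apply_signs(x, signs):
    # None -> optimise x[i] directly; '+' -> optimise 10**x[i];
    # '-' -> optimise -10**x[i]
    out = []
    for xi, s in zip(x, signs):
        if s is None:
            out.append(xi)
        elif s == '+':
            out.append(10.0 ** xi)
        elif s == '-':
            out.append(-10.0 ** xi)
        else:
            raise ValueError("unknown sign: %r" % s)
    return out
```

Optimising in log-space this way lets the optimiser search over orders of magnitude while keeping a fixed sign for each parameter.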
- fitting.test_all.get_functions(comp, likelihood, unique=True)[source]
Load all functions for a given complexity to use and distribute among ranks
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, functions to convert SR expressions to variable of data and file path
- unique (bool, default=True):
whether to load just the unique functions (True) or all functions (False)
- Returns:
- fcn_list (list):
list of strings representing functions to be used by given rank
- data_start (int):
first index of function used by rank
- data_end (int):
last index of function used by rank
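One way the distribution among ranks could work (an illustrative contiguous split; not necessarily the exact scheme used by the library):

```python
def split_indices(n_fun, rank, n_ranks):
    # Contiguous chunks, with any remainder spread over the lowest ranks,
    # returning (data_start, data_end) for the given rank.
    base, rem = divmod(n_fun, n_ranks)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end
```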
- fitting.test_all.main(comp, likelihood, tmax=5, pmin=0, pmax=3, print_frequency=50, try_integration=False, log_opt=False, Niter_params=[40, 60], Nconv_params=[-5, 20], ignore_previous_eqns=True)[source]
Optimise all functions for a given complexity and save results to file.
This can optimise in log-space, with separate positive and negative branches (except when there are >= 3 parameters, in which case optimisation is done in linear space).
The lists of parameters, P, passed as Niter_params and Nconv_params determine these values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is determined by the length of P, so P can be of arbitrary length.
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=3.):
maximum value for each parameter to consider when generating initial guess
- print_frequency (int, default=50):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- Niter_params (list, default=[40, 60]):
Parameters determining maximum number of parameter optimisation iterations to attempt.
- Nconv_params (list, default=[-5, 20]):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters. These parameters determine Nconv.
- ignore_previous_eqns (bool, default=True):
If we have seen an equation at lower complexity, whether to ignore the equation in this routine.
- Returns:
None
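The polynomial rule for Niter_params and Nconv_params can be stated in one line of code (hypothetical helper name):

```python
def poly_count(P, nparam):
    # N = P[0] + P[1]*nparam + P[2]*nparam**2 + ...
    # The order of the polynomial is set by the length of P.
    return sum(p * nparam ** i for i, p in enumerate(P))

# With the default Niter_params=[40, 60], a 2-parameter function gets
# 40 + 60*2 = 160 optimisation iterations.
```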
- fitting.test_all.optimise_fun(fcn_i, likelihood, tmax, pmin, pmax, comp=0, try_integration=False, log_opt=False, max_param=4, Niter_params=[40, 60], Nconv_params=[-5, 20], test_success=False, ignore_previous_eqns=True)[source]
Optimise the parameters of a function to fit data
The lists of parameters, P, passed as Niter_params and Nconv_params determine these values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is determined by the length of P, so P can be of arbitrary length.
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- likelihood (fitting.likelihood object):
object containing data and likelihood function
- tmax (float):
maximum time in seconds to run any one part of simplification procedure for a given function
- pmin (float):
minimum value for each parameter to consider when generating initial guess
- pmax (float):
maximum value for each parameter to consider when generating initial guess
- comp (float, default=0):
Complexity. Default of 0 because it is not provided when fitting a single function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- max_param (int, default=4):
The maximum number of parameters considered. This sets the shapes of arrays used.
- Niter_params (list, default=[40, 60]):
Parameters determining maximum number of parameter optimisation iterations to attempt.
- Nconv_params (list, default=[-5, 20]):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters. These parameters determine Nconv.
- test_success (bool, default=False):
Whether to test whether the optimisation was successful using scipy’s criteria
- ignore_previous_eqns (bool, default=True):
If we have seen an equation at lower complexity, whether to ignore the equation in this routine.
- Returns:
- chi2_i (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- params (list):
the maximum likelihood values of the parameters
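The overall multi-start strategy can be sketched as follows; in the real routine each initial guess drawn from [pmin, pmax] would seed a local optimiser, which is replaced here by direct evaluation for brevity (all names hypothetical):

```python
import numpy as np

def optimise_fun_sketch(negloglike, nparam, pmin, pmax, Niter, seed=0):
    # Draw Niter initial guesses uniformly in [pmin, pmax] and keep the
    # best; a real implementation would locally minimise from each draw.
    rng = np.random.default_rng(seed)
    best_nll, best_p = np.inf, None
    for _ in range(Niter):
        p = rng.uniform(pmin, pmax, nparam)
        nll = float(negloglike(p))
        if nll < best_nll:
            best_nll, best_p = nll, p
    return best_nll, best_p
```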
test_all_Fisher
- fitting.test_all_Fisher.convert_params(fcn_i, eq, integrated, theta_ML, likelihood, negloglike, max_param=4)[source]
Compute the Fisher matrix, correct the maximum-likelihood parameters and find the parametric contribution to the description length for a single function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- eq (sympy object):
sympy object for the function we wish to fit to data
- integrated (bool):
whether eq_numpy has already been integrated
- theta_ML (list):
the maximum likelihood values of the parameters
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- negloglike (float):
the minimum negative log-likelihood for this function
- max_param (int, default=4):
The maximum number of parameters considered. This sets the shapes of arrays used.
- Returns:
- params (list):
the corrected maximum likelihood values of the parameters
- negloglike (float):
the corrected minimum negative log-likelihood for this function
- deriv (list):
flattened version of the Hessian of -log(likelihood) at the maximum likelihood point
- codelen (float):
the parametric contribution to the description length of this function
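The Fisher matrix here is the Hessian of -log(likelihood) at the maximum-likelihood point; a generic central-difference sketch (illustrative, not the library's implementation):

```python
import numpy as np

def numerical_hessian(f, theta, eps=1e-4):
    # Central-difference Hessian of a scalar function f at theta.
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            tpp = theta.copy(); tpp[i] += eps; tpp[j] += eps
            tpm = theta.copy(); tpm[i] += eps; tpm[j] -= eps
            tmp = theta.copy(); tmp[i] -= eps; tmp[j] += eps
            tmm = theta.copy(); tmm[i] -= eps; tmm[j] -= eps
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4.0 * eps ** 2)
    return H
```

The flattened version of such a Hessian is what the deriv return value above contains.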
- fitting.test_all_Fisher.load_loglike(comp, likelihood, data_start, data_end, split=True)[source]
Load results of optimisation completed by test_all.py
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- data_start (int):
minimum index of results we want to load (only if split=True)
- data_end (int):
maximum index of results we want to load (only if split=True)
- split (bool, default=True):
whether to return subset of results given by data_start and data_end (True) or all data (False)
- Returns:
- negloglike (list):
list of minimum negative log-likelihoods
- params (np.ndarray):
list of parameters at maximum likelihood points. Shape = (nfun, nparam).
- fitting.test_all_Fisher.main(comp, likelihood, tmax=5, print_frequency=50, try_integration=False)[source]
Compute the Fisher matrix, correct the maximum-likelihood parameters and find the parametric contribution to the description length for all functions and save to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- print_frequency (int, default=50):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- Returns:
None
plotting
plot
- plotting.plot.pareto_plot(dirname, savename, do_DL=True, do_logL=True)[source]
Plot the pareto front using the files in a given directory
- Args:
- dirname (str):
The directory name to consider.
- savename (str):
File name to save the plot to (within dirname)
- do_DL (bool, default=True):
Whether to plot the description length in the pareto front
- do_logL (bool, default=True):
Whether to plot the log-likelihood in the pareto front
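For reference, the pareto front itself (best description length at each complexity, made non-increasing with complexity) can be computed in a few lines (illustrative, not the plotting code):

```python
def pareto_front_sketch(points):
    # points: iterable of (complexity, description_length) pairs.
    best = {}
    for comp, dl in points:
        if comp not in best or dl < best[comp]:
            best[comp] = dl
    front, running = [], float('inf')
    for comp in sorted(best):
        running = min(running, best[comp])
        front.append((comp, running))
    return front
```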