API
generation
duplicate_checker
- generation.duplicate_checker.main(runname, compl, track_memory=False, search_tmax=60, expand_tmax=1, seed=1234)[source]
Run the generation of functions for a given complexity and set of basis functions
- Args:
- runname (str):
name of run, which defines the basis functions used
- compl (int):
complexity of functions to consider
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- search_tmax (float, default=60.):
maximum time in seconds to run any one part of simplification procedure for a given function
- expand_tmax (float, default=1.):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- seed (int, default=1234):
seed to set random number generator for shuffling functions (used to prevent one rank having similar, hard to simplify functions)
- Returns:
None
custom_printer
- class generation.custom_printer.ESRPrinter(settings=None)[source]
Bases:
Printer
A Printer for generating readable representation of most SymPy classes as required for ExhaustiveSR.
The only difference from sympy’s StrPrinter is that pow(., .) is used instead of ** unless the power is an integer
- printmethod: str = '_sympystr'
- class generation.custom_printer.ESRReprPrinter(settings=None)[source]
Bases:
ESRPrinter
(internal) – see sstrrepr
generator
- class generation.generator.DecoratedNode(fun, basis_functions, parent_op=None, parent=None)[source]
Bases:
object
- generation.generator.aifeyn_complexity(tree, param_list)[source]
Compute contribution to description length from describing tree
- Args:
- tree (list):
list of strings giving node labels of tree
- param_list (list):
list of strings of all possible parameter names
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
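For illustration, the tree term of the description length can be sketched as the k·ln(n) form used in the ESR paper: a tree of k nodes drawn from n distinct symbols, with all parameter names collapsed into a single symbol. This is a conceptual reimplementation under that assumption, not the actual ESR source, and the function name is hypothetical:

```python
import math

def aifeyn_complexity_sketch(tree, param_list):
    # Collapse all parameter labels (e.g. 'a0', 'a1') into one symbol,
    # then charge k*ln(n) nats for a tree of k nodes over n distinct symbols.
    labels = ['param' if t in param_list else t for t in tree]
    k = len(labels)           # number of nodes in the tree
    n = len(set(labels))      # number of distinct symbols used
    return k * math.log(n) if n > 1 else 0.0
```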
- generation.generator.check_operators(nodes, basis_functions)[source]
Check whether all operators in the tree are in the basis
- Args:
- nodes (DecoratedNode):
Node representation of the function tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- all_in_basis (bool):
Whether all functions in tree are in basis
- generation.generator.check_tree(s)[source]
Given a candidate string of 0s, 1s and 2s, determine whether one can make a function out of it
- Args:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- Returns:
- success (bool):
whether candidate string can form a valid tree (True) or not (False)
- part_considered (str):
string of length <= len(s), where s[:len(part_considered)] = part_considered
- tree (list):
list of Node objects corresponding to string s
- generation.generator.find_additional_trees(tree, labels, basis_functions)[source]
For a given tree, try to find all simpler representations of the function by combining sums, exponentials and powers
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_tree (list):
list of equivalent trees, given as lists of Node objects
- new_labels (list):
list of lists of strings giving node labels of new_tree
- generation.generator.generate_equations(compl, basis_functions, dirname)[source]
Generate all equations at a given complexity for a set of basis functions and save results to file
- Args:
- compl (int):
complexity of functions to consider
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- dirname (str):
directory path to save results in
- Returns:
- all_fun (list):
list of strings containing all functions generated
- extra_orig (list):
list of strings containing functions generated by combining sums, exponentials and powers of the functions in all_fun as they appear in all_fun
- generation.generator.get_allowed_shapes(compl)[source]
Find the shapes of all allowed trees containing compl nodes
- Args:
- compl (int):
complexity of tree = number of nodes
- Returns:
- cand (list):
list of strings comprised of 0, 1 and 2 representing valid trees of nullary, unary and binary nodes
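The validity logic behind check_tree and get_allowed_shapes can be sketched as a slot-counting walk over the arity string: the root opens one slot, each node consumes a slot and opens as many new slots as its arity, and a string is valid iff the slot count reaches zero exactly at the end. This is a conceptual reimplementation with hypothetical names, not the ESR source:

```python
from itertools import product

def is_valid_shape(s):
    # Walk the arity string ('0' nullary, '1' unary, '2' binary):
    # one open slot for the root; each node fills a slot and opens
    # int(ch) children. Valid iff slots hit zero exactly at the end.
    pending = 1
    for ch in s:
        if pending == 0:      # tree closed early: trailing symbols
            return False
        pending += int(ch) - 1
    return pending == 0

def allowed_shapes_sketch(compl):
    # Brute-force enumeration of valid arity strings with `compl` nodes.
    return [''.join(p) for p in product('012', repeat=compl)
            if is_valid_shape(''.join(p))]
```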
- generation.generator.is_float(string)[source]
Determine whether a string is a float or not
- Args:
- string (str):
The string to check
- Returns:
bool: Whether the string is a float (True) or not (False).
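A minimal version of this check (a sketch, not necessarily the ESR implementation) simply asks whether float() parses the string:

```python
def is_float_sketch(string):
    # A string counts as a float iff float() can parse it.
    try:
        float(string)
        return True
    except ValueError:
        return False
```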
- generation.generator.labels_to_shape(labels, basis_functions)[source]
Find the representation of the shape of a tree given its labels
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- generation.generator.node_to_string(idx, tree, labels)[source]
Convert a tree with labels into a string giving function
- Args:
- idx (int):
index of tree to consider
- tree (list):
list of Node objects corresponding to the tree
- labels (list):
list of strings giving node labels of tree
- Returns:
Function as a string
- generation.generator.shape_to_functions(s, basis_functions)[source]
Find all possible functions formed from the given list of 0s, 1s and 2s defining a tree and basis functions
- Args:
- s (str):
string comprised of 0, 1 and 2 representing tree of nullary, unary and binary nodes
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- all_fun (list):
list of strings containing all functions generated directly from tree
- all_tree (list):
list of lists of Node objects corresponding to the trees of functions in all_fun
- extra_fun (list):
list of strings containing functions generated by combining sums, exponentials and powers of the functions in all_fun
- extra_tree (list):
list of lists of Node objects corresponding to the trees of functions in extra_fun
- extra_orig (list):
list of strings corresponding to original versions of extra_fun, as found in all_fun
- generation.generator.string_to_expr(s, kern=False, evaluate=False, locs=None)[source]
Convert a string giving function into a sympy object
- Args:
- s (str):
string representation of the function considered
- kern (bool):
whether to use sympy’s kernS function or sympify
- evaluate (bool):
whether to use powsimp, factor and subs
- locs (dict):
dictionary of string:sympy objects. If None, one will be created here
- Returns:
- expr (sympy object):
expression corresponding to s
- generation.generator.string_to_node(s, basis_functions, locs=None, evalf=False, allow_eval=True, check_ops=False)[source]
Convert a string giving function into a tree with labels
- Args:
- s (str):
string representation of the function considered
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- locs (dict):
dictionary of string:sympy objects. If None, one will be created here
- evalf (bool):
whether to run evalf() on function (default=False)
- allow_eval (bool, default=True):
whether to run the (kernS=False and evaluate=True) option
- check_ops (bool, default=False):
whether to check all operators appear in basis functions
- Returns:
- tree (list):
list of Node objects corresponding to the tree
- labels (list):
list of strings giving node labels of tree
- generation.generator.update_sums(tree, labels, try_idx, basis_functions)[source]
Try to combine sums to make simpler representations of functions
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- try_idx (int):
when we have multiple substitutions we can attempt, this indicates which one to try
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_labels (list):
list of strings giving node labels of new tree
- new_shape (list):
list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary
- nadded (int):
number of new functions added
- generation.generator.update_tree(tree, labels, try_idx, basis_functions)[source]
Try to combine exponentials and powers to make simpler representations of functions
- Args:
- tree (list):
list of Node objects corresponding to tree of function
- labels (list):
list of strings giving node labels of tree
- try_idx (int):
when we have multiple substitutions we can attempt, this indicates which one to try
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- Returns:
- new_labels (list):
list of strings giving node labels of new tree
- new_shape (list):
list of 0, 1 and 2 representing whether nodes in new tree are nullary, unary or binary
- nadded (int):
number of new functions added
simplifier
- generation.simplifier.check_results(dirname, compl, tmax=10)[source]
Check that all functions can be recovered by applying the substitutions to the unique functions. If not, define a new unique function and save results to file.
- Args:
- dirname (str):
name of directory containing all the functions to consider
- compl (int):
complexity of functions to consider
- tmax (float, default=10.):
maximum time in seconds to run the substitutions
- Returns:
None
- generation.simplifier.convert_params(p_meas, fish_meas, inv_subs, n=4)[source]
Convert parameters from those in unique function to those in actual function
- Args:
- p_meas (list):
list of measured parameters in unique function
- fish_meas (list):
flattened version of the Hessian of -log(likelihood) at the maximum likelihood point
- inv_subs (list):
list of substitutions required to convert between all and unique functions
- n (int, default=4):
the number of dimensions of the array from which fish_meas was computed
- Returns:
- p_new (list):
list of parameters for the actual function
- diag_fish (np.array):
the diagonal entries of the Fisher matrix of the actual function at the maximum likelihood point
- generation.simplifier.count_params(all_fun, max_param)[source]
Count the number of free parameters in each member of a list of functions
- Args:
- all_fun (list):
list of strings containing functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- Returns:
- nparam (np.array):
array of ints containing number of free parameters in corresponding member of all_fun
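The counting can be sketched as a pattern match over each function string, assuming the ESR convention that free parameters are named a0, a1, ... (that naming, and the function name here, are assumptions for illustration):

```python
import re

def count_params_sketch(all_fun, max_param):
    # Count distinct free parameters in each function string, assuming
    # parameters follow the a0, a1, ... naming convention.
    names = [f'a{i}' for i in range(max_param)]
    counts = []
    for fun in all_fun:
        # Word-boundary match so 'a1' does not match inside 'a10'.
        found = {n for n in names if re.search(rf'\b{n}\b', fun)}
        counts.append(len(found))
    return counts
```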
- generation.simplifier.do_sympy(all_fun, all_sym, compl, search_tmax, expand_tmax, dirname, track_memory=False)[source]
Run the duplicate checking procedure
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- compl (int):
complexity of functions to consider
- search_tmax (float):
maximum time in seconds to run any one part of simplification procedure for a given function
- expand_tmax (float):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- dirname (str):
directory path to save results in
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (OrderedDict):
dictionary of (updated) sympy objects which can be accessed by their string representations.
- count (int):
number of rounds of optimisation which were performed
- generation.simplifier.expand_or_factor(all_sym, tmax=1, method='expand')[source]
Run the sympy expand or factor functions
- Args:
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- tmax (float, default=1.):
maximum time in seconds to run any one part of expand/simplify procedure for a given function
- method (str, default=’expand’):
whether to run expand (‘expand’) or factor (‘factor’). All other options are ignored
- Returns:
- all_sym (OrderedDict):
dictionary of (updated) sympy objects which can be accessed by their string representations.
- generation.simplifier.get_all_dup(max_param)[source]
Finds self-inverse transformations of parameters, to be used in simplify_inv_subs(inv_subs, all_dup)
- Args:
- max_param (int):
maximum number of parameters to consider
- Returns:
- all_dup (list):
list of dictionaries giving substitutions which are self-inverse
- generation.simplifier.get_max_param(all_fun, verbose=True)[source]
Find maximum number of free parameters in list of functions
- Args:
- all_fun (list):
list of strings containing functions
- verbose (bool, default=True):
Whether to print result (True) or not (False)
- Returns:
- max_param (int):
maximum number of free parameters in any equation in all_fun
- generation.simplifier.initial_sympify(all_fun, max_param, verbose=True, parallel=True, track_memory=False, save_sympy=True)[source]
Convert list of strings of functions into list of sympy objects
- Args:
- all_fun (list):
list of strings containing functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- verbose (bool, default=True):
whether to print progress (True) or not (False)
- parallel (bool, default=True):
whether to split equations amongst ranks (True) or each equation considered by all ranks (False)
- track_memory (bool, default=False):
whether to compute and print memory statistics (True) or not (False)
- save_sympy (bool, default=True):
whether to return sympy objects (True) or not (False)
- Returns:
- str_fun (list):
list of strings containing functions
- sym_fun (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations. If save_sympy is False, then sym_fun is None.
- generation.simplifier.load_subs(fname, max_param, use_sympy=True, bcast_res=True)[source]
Load the substitutions required to convert between all and unique functions
- Args:
- fname (str):
file name containing the substitutions
- max_param (int):
maximum number of parameters to consider
- use_sympy (bool, default=True):
whether to convert substitutions to sympy objects (True) or leave as strings (False)
- bcast_res (bool, default=True):
whether to allow all ranks to have the substitutions (True) or just the 0th rank (False)
- Returns:
- all_subs (list):
list of substitutions required to convert between all and unique functions. Each item is either a dictionary with sympy objects as keys and values (use_sympy=True) or a string version of this dictionary (use_sympy=False). If bcast_res=True, then all ranks have this list, otherwise only rank 0 has this list and all other ranks return None.
- generation.simplifier.make_changes(all_fun, all_sym, all_inv_subs, str_fun, sym_fun, inv_subs_fun)[source]
Update global variables of functions and symbolic expressions by combining rank calculations
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (list):
list of sympy objects containing all functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all functions
- str_fun (list):
list of strings containing functions considered by rank
- sym_fun (list):
list of sympy objects containing functions considered by rank
- inv_subs_fun (list):
list of dictionaries giving substitutions to be applied to functions considered by rank
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (list):
list of sympy objects containing all (updated) functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all (updated) functions
- generation.simplifier.simplify_inv_subs(inv_subs, all_dup)[source]
Find consecutive pairs of identical self-inverse substitutions (e.g. {a0: -a0}, {a0: a1, a1: a0} or {a0: 1/a0}) and remove both members of each pair
- Args:
- inv_subs (list):
list of dictionaries giving substitutions to check
- all_dup (list):
list of dictionaries giving substitutions which are self-inverse
- Returns:
- all_subs (list):
list of dictionaries giving substitutions without consecutive self-inverses
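The idea can be sketched as a single pass over the substitution list: applying the same self-inverse substitution twice in a row is the identity, so such a pair can be dropped. This is a conceptual sketch with a hypothetical name, not the ESR implementation:

```python
def simplify_inv_subs_sketch(inv_subs, all_dup):
    # Two identical self-inverse substitutions in a row cancel out,
    # so both members of such a pair can be removed.
    out = []
    for sub in inv_subs:
        if out and out[-1] == sub and sub in all_dup:
            out.pop()        # cancel the pair
        else:
            out.append(sub)
    return out
```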
- generation.simplifier.sympy_simplify(all_fun, all_sym, all_inv_subs, max_param, expand_fun=True, tmax=1, check_perm=False)[source]
Simplify equations and find duplicates.
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (list):
list of sympy objects containing all functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all functions
- max_param (int):
maximum number of free parameters in any equation in all_fun
- expand_fun (bool, default=True):
whether to run the sympy expand options (True) or not (False)
- tmax (float, default=1.):
maximum time in seconds to run any one part of simplification procedure for a given function
- check_perm (bool, default=False):
whether to check all possible permutations and inverses of constants (True) or not (False)
- Returns:
- all_fun (list):
list of strings containing all (updated) functions
- all_sym (list):
list of sympy objects containing all (updated) functions
- all_inv_subs (list):
list of dictionaries giving substitutions to be applied to all (updated) functions
utils
- generation.utils.get_match_indexes(a, b)[source]
Returns indices in a of items in b
- Args:
- a (list):
list of values to search within
- b (list):
list of values whose indices in a we wish to find
- Returns:
- result (list):
indices in a where the corresponding value of b appears
- generation.utils.get_unique_indexes(l)[source]
Find the indices of the unique items in a list
- Args:
- l (list):
list from which we want to find unique indices
- Returns:
- result (OrderedDict):
dictionary which returns index of unique item in l, accessed by unique item
- match (dict):
dictionary which returns index of unique item in result, accessed by unique item
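These two utilities can be sketched in a few lines (conceptual reimplementations with hypothetical names, matching the documented behaviour rather than the ESR source):

```python
from collections import OrderedDict

def get_match_indexes_sketch(a, b):
    # For each item of b, the index at which it appears in a.
    return [a.index(item) for item in b]

def get_unique_indexes_sketch(l):
    # result: first-occurrence index in l of each unique item, keyed by item.
    # match:  position of each unique item within `result` itself.
    result = OrderedDict()
    for i, item in enumerate(l):
        result.setdefault(item, i)
    match = {item: j for j, item in enumerate(result)}
    return result, match
```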
- generation.utils.locals_size(loc)[source]
Find and print the total memory used by locals()
- Args:
- loc (dict):
dictionary of locals (obtained calling locals() in another script)
- Returns:
None
- generation.utils.merge_keys(all_fun, all_sym)[source]
Update all_fun so that different strings which map to the same sympy object in all_sym are given the same string value
- Args:
- all_fun (list):
list of strings containing all functions
- all_sym (OrderedDict):
dictionary of sympy objects which can be accessed by their string representations.
- Returns:
None
- generation.utils.pprint_ntuple(nt)[source]
Printing function for memory diagnostics
- Args:
- nt (tuple):
tuple of memory statistics returned by psutil.virtual_memory()
- Returns:
None
- generation.utils.split_idx(Ntotal, r, indices_or_sections)[source]
Returns the rth set of indices for numpy.array_split(a, indices_or_sections), where len(a) = Ntotal
- Args:
- Ntotal (int):
length of array to split
- r (int):
rank whose indices are required
- indices_or_sections (int):
how many parts to split array into
- Returns:
- i (list):
[min, max] index used by rank
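Since the behaviour is defined in terms of numpy.array_split, the function can be sketched directly on top of it (assuming numpy is available; the name here is hypothetical):

```python
import numpy as np

def split_idx_sketch(Ntotal, r, indices_or_sections):
    # [min, max] index range handled by rank r under numpy.array_split,
    # which gives the first Ntotal % sections chunks one extra element.
    chunks = np.array_split(np.arange(Ntotal), indices_or_sections)
    c = chunks[r]
    return [int(c[0]), int(c[-1])]
```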
fitting
combine_DL
- fitting.combine_DL.main(comp, likelihood, print_frequency=1000)[source]
Combine the description lengths of all functions of a given complexity, sort by this and save to file.
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- print_frequency (int, default=1000):
the status of the fits will be printed every print_frequency iterations
- Returns:
None
fit_single
- fitting.fit_single.fit_from_string(fun, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, maxvar=20, log_opt=False, replace_floats=False, return_params=False)[source]
Run end-to-end fitting for a single function, given as a string. Note that this is not guaranteed to find the optimum representation as a tree, so there could be a lower description-length representation of the function
- Args:
- fun (str):
String representation of the function to be fitted
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=5.):
maximum value for each parameter to consider when generating initial guess
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- verbose (bool, default=False):
Whether to print results (True) or not (False)
- Niter (int, default=30):
Maximum number of parameter optimisation iterations to attempt.
- Nconv (int, default=5):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters
- maxvar (int):
The maximum number of variables which could appear in the function
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- replace_floats (bool, default=False):
whether to replace any numbers found in the function with variables to optimise
- return_params (bool, default=False):
whether to return the parameters of the maximum likelihood point
- Returns:
- negloglike (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- DL (float):
the description length of this function
- labels (list):
list of strings giving node labels of tree
- params (optional, list):
the maximum likelihood parameters. Only returned if return_params is True
- fitting.fit_single.single_function(labels, basis_functions, likelihood, pmin=0, pmax=5, tmax=5, try_integration=False, verbose=False, Niter=30, Nconv=5, log_opt=False, return_params=False)[source]
Run end-to-end fitting for a single function, specified by its tree labels
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=5.):
maximum value for each parameter to consider when generating initial guess
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- verbose (bool, default=False):
Whether to print results (True) or not (False)
- Niter (int, default=30):
Maximum number of parameter optimisation iterations to attempt.
- Nconv (int, default=5):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- return_params (bool, default=False):
whether to return the parameters of the maximum likelihood point
- Returns:
- negloglike (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- DL (float):
the description length of this function
- params (optional, list):
the maximum likelihood parameters. Only returned if return_params is True
- fitting.fit_single.string_to_aifeyn(fun, basis_functions, maxvar=20, verbose=True, replace_floats=False)[source]
Takes a string defining a function and returns the AIFeyn complexity term and the complexity of the function
- Args:
- fun (str):
String representation of the function to be fitted
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- maxvar (int, default=20):
The maximum number of variables which could appear in the function
- verbose (bool, default=True):
Whether to print results (True) or not (False)
- replace_floats (bool, default=False):
whether to replace any numbers found in the function with variables to optimise
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
- complexity (int):
the number of nodes in the function
- fitting.fit_single.tree_to_aifeyn(labels, basis_functions, verbose=True)[source]
Takes a list of labels defining a function and returns the AIFeyn complexity term and the complexity of the function
- Args:
- labels (list):
list of strings giving node labels of tree
- basis_functions (list):
list of lists of basis functions. basis_functions[0] are nullary, basis_functions[1] are unary and basis_functions[2] are binary operators
- verbose (bool, default=True):
Whether to print results (True) or not (False)
- Returns:
- aifeyn (float):
the contribution to description length from describing tree
- complexity (int):
the number of nodes in the function
likelihood
- class fitting.likelihood.CCLikelihood[source]
Bases:
Likelihood
Likelihood class used to fit cosmic chronometer data. Should be used as a template for other likelihoods as all functions in this class are required in fitting functions.
- get_pred(zp1, a, eq_numpy, **kwargs)[source]
Return the predicted H(z), which is the square root of the functions we are using.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- H (float or np.array):
the predicted Hubble parameter at redshifts supplied
- class fitting.likelihood.GaussLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a Gaussian likelihood
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
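For independent data points with known errors, the quantity such a Gaussian likelihood class minimises has the standard form below. This is a sketch of the form only, under the assumption of independent Gaussian errors; the exact ESR implementation may differ, and the function name is hypothetical:

```python
import numpy as np

def gauss_negloglike(y, ypred, yerr):
    # Standard Gaussian negative log-likelihood with independent errors:
    # 0.5 * sum[ ((y - ypred)/yerr)^2 + ln(2*pi*yerr^2) ]
    return float(np.sum(0.5 * ((y - ypred) / yerr) ** 2
                        + 0.5 * np.log(2 * np.pi * yerr ** 2)))
```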
- class fitting.likelihood.Likelihood(data_file, cov_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
object
Likelihood class used to fit a function directly
- Args:
- data_file (str):
Name of the file containing the data to use
- cov_file (str):
Name of the file containing the errors/covariance on the data
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
- get_pred(x, a, eq_numpy, **kwargs)[source]
Return the predicted y(x)
- Args:
- x (float or np.array):
x value being used
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- y (float or np.array):
the predicted y value at x supplied
- run_sympify(fcn_i, **kwargs)[source]
Sympify a function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- Returns:
- fcn_i (str):
string representing function we wish to fit to data (with superfluous characters removed)
- eq (sympy object):
sympy object representing function we wish to fit to data
- integrated (bool, always False):
whether we analytically integrated the function (True) or not (False)
- class fitting.likelihood.MSE(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a MSE
IMPORTANT - MSE is NOT a likelihood in the probabilistic sense. It should not be used for MDL calculations, as the answer will be nonsense since an uncertainty is required for MDL to have meaning.
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in generation.duplicate_checker
- negloglike(a, eq_numpy, **kwargs)[source]
Negative log-likelihood for a given function. Here it is (y - ypred)^2. Note that this is technically not a log-likelihood, but the method must have this name so it can be accessed by other functions.
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives y
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters (here, the MSE)
- class fitting.likelihood.MockLikelihood(nz, yfracerr, data_dir=None)[source]
Bases:
Likelihood
Likelihood class used to fit mock cosmic chronometer data
- Args:
- nz (int):
number of mock redshifts to use
- yfracerr (float):
the fractional uncertainty on the cosmic chronometer mock we are using
- data_dir (str, default=None):
The path containing the data and cov files
- get_pred(zp1, a, eq_numpy, **kwargs)[source]
Return the predicted H(z), which is the square root of the functions we are using.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- Returns:
- H (float or np.array):
the predicted Hubble parameter at redshifts supplied
- class fitting.likelihood.PanthLikelihood[source]
Bases:
Likelihood
Likelihood class used to fit Pantheon data
- get_pred(zp1, a, eq_numpy, integrated=False)[source]
Return the predicted distance modulus from the H^2 function supplied.
- Args:
- zp1 (float or np.array):
1 + z for redshift z
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- integrated (bool, default=False):
whether we previously analytically integrated the function (True) or not (False)
- Returns:
- mu (float or np.array):
the predicted distance modulus at redshifts supplied
- negloglike(a, eq_numpy, integrated=False)[source]
Negative log-likelihood for a given function
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives H^2
- integrated (bool, default=False):
whether we previously analytically integrated the function (True) or not (False)
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters
- run_sympify(fcn_i, tmax=5, try_integration=True)[source]
Sympify a function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- tmax (float):
maximum time in seconds to attempt analytic integration
- try_integration (bool, default=True):
as the likelihood requires an integral, whether to try to analytically integrate (True) or not (False)
- Returns:
- fcn_i (str):
string representing function we wish to fit to data (with superfluous characters removed)
- eq (sympy object):
sympy object representing function we wish to fit to data
- integrated (bool):
whether we were able to analytically integrate the function (True) or not (False)
- class fitting.likelihood.PoissonLikelihood(data_file, run_name, data_dir=None, fn_set='core_maths')[source]
Bases:
Likelihood
Likelihood class used to fit a function directly using a Poisson likelihood
- Args:
- data_file (str):
Name of the file containing the data to use
- run_name (str):
The name to be associated with this likelihood, e.g. ‘my_esr_run’
- data_dir (str, default=None):
The path containing the data and cov files
- fn_set (str, default=’core_maths’):
The name of the function set to use with the likelihood. Must match one of those defined in
generation.duplicate_checker
- negloglike(a, eq_numpy, **kwargs)[source]
Negative log-likelihood for a given function. Here it is a Poisson likelihood
- Args:
- a (list):
parameters to substitute into equation considered
- eq_numpy (numpy function):
function to use which gives y
- Returns:
- nll (float):
the negative log-likelihood for this function and parameters
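A self-contained sketch of a Poisson negative log-likelihood of the kind described (stdlib only; the constant log(k!) term is included via lgamma; names are illustrative):

```python
from math import lgamma, log

def poisson_negloglike_sketch(mu, counts):
    # -log L = sum_i [ mu_i - k_i*log(mu_i) + log(k_i!) ]
    # for predicted rates mu_i and observed counts k_i.
    return sum(m - k * log(m) + lgamma(k + 1.0) for m, k in zip(mu, counts))
```

Minimising this over the parameters entering mu gives the maximum-likelihood fit.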
match
- fitting.match.main(comp, likelihood, tmax=5, print_frequency=1000, try_integration=False)[source]
Apply results of fitting the unique functions to all functions and save to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- print_frequency (int, default=1000):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- Returns:
None
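The idea of applying results for the unique functions to all functions can be sketched as a simple lookup (hypothetical names; the real routine also handles file I/O and parameter bookkeeping):

```python
def broadcast_results_sketch(all_fcns, unique_of, unique_results):
    # Each function inherits the fit results of the unique (simplified)
    # function it was matched to during duplicate checking.
    return [unique_results[unique_of[f]] for f in all_fcns]
```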
plot
- fitting.plot.main(comp, likelihood, tmax=5, try_integration=False, xscale='linear', yscale='linear')[source]
Plot the best 50 functions at a given complexity against the data and save the plot to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, functions to convert SR expressions to variable of data and output path
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- xscale (str, default=’linear’):
Scaling for x-axis
- yscale (str, default=’linear’):
Scaling for y-axis
- Returns:
None
sympy_symbols
Script defining the functions and symbols which can be used to interpret the expressions encountered when fitting functions
test_all
- fitting.test_all.chi2_fcn(x, likelihood, eq_numpy, integrated, signs)[source]
Compute chi2 for a function
- Args:
- x (list):
parameters to use for function
- likelihood (fitting.likelihood object):
object containing data and likelihood function
- eq_numpy (numpy function):
function to pass to likelihood object to make prediction of y(x)
- integrated (bool):
whether eq_numpy has already been integrated
- signs (list):
each entry specifies whether that parameter should be optimised logarithmically. If None, do nothing; if ‘+’, optimise 10**x[i]; and if ‘-’, optimise -10**x[i]
- Returns:
- negloglike (float):
the negative log-likelihood for this function and parameters
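The signs convention described above can be written out explicitly (hypothetical helper, not in the library):

```python
def apply_signs(x, signs):
    # None -> optimise x[i] directly; '+' -> optimise 10**x[i];
    # '-' -> optimise -10**x[i]
    out = []
    for xi, s in zip(x, signs):
        if s is None:
            out.append(xi)
        elif s == '+':
            out.append(10.0 ** xi)
        elif s == '-':
            out.append(-10.0 ** xi)
        else:
            raise ValueError("unknown sign: %r" % s)
    return out
```

Optimising in log-space this way lets the optimiser search over orders of magnitude while keeping a fixed sign for each parameter.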
- fitting.test_all.get_functions(comp, likelihood, unique=True)[source]
Load all functions for a given complexity to use and distribute among ranks
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, functions to convert SR expressions to variable of data and file path
- unique (bool, default=True):
whether to load just the unique functions (True) or all functions (False)
- Returns:
- fcn_list (list):
list of strings representing functions to be used by given rank
- data_start (int):
first index of function used by rank
- data_end (int):
last index of function used by rank
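One way the distribution among ranks could work (an illustrative contiguous split; not necessarily the exact scheme used by the library):

```python
def split_indices(n_fun, rank, n_ranks):
    # Contiguous chunks, with any remainder spread over the lowest ranks,
    # returning (data_start, data_end) for the given rank.
    base, rem = divmod(n_fun, n_ranks)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end
```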
- fitting.test_all.main(comp, likelihood, tmax=5, pmin=0, pmax=3, print_frequency=50, try_integration=False, log_opt=False, Niter_params=[40, 60], Nconv_params=[-5, 20], ignore_previous_eqns=True)[source]
Optimise all functions for a given complexity and save results to file.
This can optimise in log-space, with separate positive and negative branches (except when there are >= 3 parameters, in which case optimisation is done in linear space).
The lists of parameters, P, passed as Niter_params and Nconv_params determine these values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is determined by the length of P, so P can be of arbitrary length.
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- pmin (float, default=0.):
minimum value for each parameter to consider when generating initial guess
- pmax (float, default=3.):
maximum value for each parameter to consider when generating initial guess
- print_frequency (int, default=50):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- Niter_params (list, default=[40, 60]):
Parameters determining maximum number of parameter optimisation iterations to attempt.
- Nconv_params (list, default=[-5, 20]):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters. These parameters determine Nconv.
- ignore_previous_eqns (bool, default=True):
If we have seen an equation at lower complexity, whether to ignore the equation in this routine.
- Returns:
None
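The polynomial rule for Niter_params and Nconv_params can be stated in one line of code (hypothetical helper name):

```python
def poly_count(P, nparam):
    # N = P[0] + P[1]*nparam + P[2]*nparam**2 + ...
    # The order of the polynomial is set by the length of P.
    return sum(p * nparam ** i for i, p in enumerate(P))

# With the default Niter_params=[40, 60], a 2-parameter function gets
# 40 + 60*2 = 160 optimisation iterations.
```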
- fitting.test_all.optimise_fun(fcn_i, likelihood, tmax, pmin, pmax, comp=0, try_integration=False, log_opt=False, max_param=4, Niter_params=[40, 60], Nconv_params=[-5, 20], test_success=False, ignore_previous_eqns=True)[source]
Optimise the parameters of a function to fit data
The lists of parameters, P, passed as Niter_params and Nconv_params determine these values, N, via N = P[0] + P[1] * nparam + P[2] * nparam ** 2 + …, where nparam is the number of parameters of the function. The order of the polynomial is determined by the length of P, so P can be of arbitrary length.
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- likelihood (fitting.likelihood object):
object containing data and likelihood function
- tmax (float):
maximum time in seconds to run any one part of simplification procedure for a given function
- pmin (float):
minimum value for each parameter to consider when generating initial guess
- pmax (float):
maximum value for each parameter to consider when generating initial guess
- comp (float, default=0):
Complexity. Default of 0 because it is not provided when fitting a single function
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- log_opt (bool, default=False):
whether to optimise 1 and 2 parameter cases in log space
- max_param (int, default=4):
The maximum number of parameters considered. This sets the shapes of arrays used.
- Niter_params (list, default=[40, 60]):
Parameters determining maximum number of parameter optimisation iterations to attempt.
- Nconv_params (list, default=[-5, 20]):
If we find Nconv solutions for the parameters which are within a logL of 0.5 of the best, we say we have converged and stop optimising parameters. These parameters determine Nconv.
- test_success (bool, default=False):
Whether to test whether the optimisation was successful using scipy’s criteria
- ignore_previous_eqns (bool, default=True):
If we have seen an equation at lower complexity, whether to ignore the equation in this routine.
- Returns:
- chi2_i (float):
the minimum value of -log(likelihood) (corresponding to the maximum likelihood)
- params (list):
the maximum likelihood values of the parameters
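The overall multi-start strategy can be sketched as follows; in the real routine each initial guess drawn from [pmin, pmax] would seed a local optimiser, which is replaced here by direct evaluation for brevity (all names hypothetical):

```python
import numpy as np

def optimise_fun_sketch(negloglike, nparam, pmin, pmax, Niter, seed=0):
    # Draw Niter initial guesses uniformly in [pmin, pmax] and keep the
    # best; a real implementation would locally minimise from each draw.
    rng = np.random.default_rng(seed)
    best_nll, best_p = np.inf, None
    for _ in range(Niter):
        p = rng.uniform(pmin, pmax, nparam)
        nll = float(negloglike(p))
        if nll < best_nll:
            best_nll, best_p = nll, p
    return best_nll, best_p
```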
test_all_Fisher
- fitting.test_all_Fisher.convert_params(fcn_i, eq, integrated, theta_ML, likelihood, negloglike, max_param=4)[source]
Compute the Fisher matrix, correct the maximum-likelihood parameters and find the parametric contribution to the description length for a single function
- Args:
- fcn_i (str):
string representing function we wish to fit to data
- eq (sympy object):
sympy object for the function we wish to fit to data
- integrated (bool):
whether eq_numpy has already been integrated
- theta_ML (list):
the maximum likelihood values of the parameters
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- negloglike (float):
the minimum negative log-likelihood for this function
- max_param (int, default=4):
The maximum number of parameters considered. This sets the shapes of arrays used.
- Returns:
- params (list):
the corrected maximum likelihood values of the parameters
- negloglike (float):
the corrected minimum negative log-likelihood for this function
- deriv (list):
flattened version of the Hessian of -log(likelihood) at the maximum likelihood point
- codelen (float):
the parametric contribution to the description length of this function
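The Fisher matrix here is the Hessian of -log(likelihood) at the maximum-likelihood point; a generic central-difference sketch (illustrative, not the library's implementation):

```python
import numpy as np

def numerical_hessian(f, theta, eps=1e-4):
    # Central-difference Hessian of a scalar function f at theta.
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            tpp = theta.copy(); tpp[i] += eps; tpp[j] += eps
            tpm = theta.copy(); tpm[i] += eps; tpm[j] -= eps
            tmp = theta.copy(); tmp[i] -= eps; tmp[j] += eps
            tmm = theta.copy(); tmm[i] -= eps; tmm[j] -= eps
            H[i, j] = (f(tpp) - f(tpm) - f(tmp) + f(tmm)) / (4.0 * eps ** 2)
    return H
```

The flattened version of such a Hessian is what the deriv return value above contains.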
- fitting.test_all_Fisher.load_loglike(comp, likelihood, data_start, data_end, split=True)[source]
Load results of optimisation completed by test_all.py
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- data_start (int):
minimum index of results we want to load (only if split=True)
- data_end (int):
maximum index of results we want to load (only if split=True)
- split (bool, default=True):
whether to return subset of results given by data_start and data_end (True) or all data (False)
- Returns:
- negloglike (list):
list of minimum negative log-likelihoods
- params (np.ndarray):
list of parameters at maximum likelihood points. Shape = (nfun, nparam).
- fitting.test_all_Fisher.main(comp, likelihood, tmax=5, print_frequency=50, try_integration=False)[source]
Compute the Fisher matrix, correct the maximum-likelihood parameters and find the parametric contribution to the description length for all functions and save to file
- Args:
- comp (int):
complexity of functions to consider
- likelihood (fitting.likelihood object):
object containing data, likelihood functions and file paths
- tmax (float, default=5.):
maximum time in seconds to run any one part of simplification procedure for a given function
- print_frequency (int, default=50):
the status of the fits will be printed every print_frequency iterations
- try_integration (bool, default=False):
when likelihood requires integral, whether to try to analytically integrate (True) or just numerically integrate (False)
- Returns:
None
plotting
plot
- plotting.plot.pareto_plot(dirname, savename, do_DL=True, do_logL=True)[source]
Plot the pareto front using the files in a given directory
- Args:
- dirname (str):
The directory name to consider.
- savename (str):
File name to save the plot to (within dirname)
- do_DL (bool, default=True):
Whether to plot the description length in the pareto front
- do_logL (bool, default=True):
Whether to plot the log-likelihood in the pareto front
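For reference, the pareto front itself (best description length at each complexity, made non-increasing with complexity) can be computed in a few lines (illustrative, not the plotting code):

```python
def pareto_front_sketch(points):
    # points: iterable of (complexity, description_length) pairs.
    best = {}
    for comp, dl in points:
        if comp not in best or dl < best[comp]:
            best[comp] = dl
    front, running = [], float('inf')
    for comp in sorted(best):
        running = min(running, best[comp])
        front.append((comp, running))
    return front
```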