diff --git a/README.md b/README.md index 432dd6b..b7137b8 100644 --- a/README.md +++ b/README.md @@ -1,391 +1,394 @@                 ![docs_title_logo](./resources/docs_title_logo.png) # A Hyper-Parameter Optimization Toolbox
## What is Hyppopy?

Hyppopy is a Python toolbox for blackbox optimization. Its purpose is to offer a unified and easy-to-use interface to a collection of solver libraries. Currently provided solvers are:

* [Hyperopt](http://hyperopt.github.io/hyperopt/)
* [Optunity](https://optunity.readthedocs.io/en/latest/user/index.html)
* [Optuna](https://optuna.org/)
* [BayesianOptimization](https://github.com/fmfn/BayesianOptimization)
+* Quasi-Randomsearch Solver
* Randomsearch Solver
* Gridsearch Solver

## Installation

1. clone the [Hyppopy](http://github.com) project from GitHub
2. (create a virtual environment), open a console (with your activated virtual env) and go to the hyppopy root folder
3. ```$ pip install -r requirements.txt```
4. ```$ python setup.py install``` (for normal usage) or ```$ python setup.py develop``` (if you want to join the hyppopy development *hooray*)

## How to use Hyppopy?

#### The Hyperparameterspace

Hyppopy defines a common hyperparameter space description, whatever solver is used. A hyperparameter description includes the following fields:

* domain: the domain defines how the solver samples the parameter space, options are:
    * uniform: samples the data range [a,b] evenly, where b>a
    * normal: samples the data range [a,b] using a normal distribution with mu=a+(b-a)/2 and sigma=(b-a)/6, where b>a
    * loguniform: samples the data range [a,b] logarithmically via e^x by sampling the exponent range x=[log(a), log(b)] uniformly, where a>0 and b>a
    * categorical: is used to define a list of data elements
* data: in case of the categorical domain, data is a list; all other domains expect a range [a, b]
* type: the parameter data type as string 'int', 'float' or 'str'

An exception must be kept in mind when using the GridsearchSolver: the gridsearch additionally needs a number of samples per domain, which must be set using the field frequency (see the config sketch below).
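To illustrate, a hyperparameter section covering all four domains could look as follows. The parameter names are placeholders, and we assume here that frequency can be set as a dict field alongside the other entries (it is evaluated by the GridsearchSolver only):

```python
config = {
"hyperparameter": {
    "learning_rate": {
        "domain": "loguniform",
        "data": [0.0001, 1.0],       # a > 0 is required for loguniform
        "type": "float",
        "frequency": 10              # evaluated by the GridsearchSolver only
    },
    "momentum": {
        "domain": "normal",
        "data": [0.0, 1.0],          # sampled around mu=0.5 with sigma=1/6
        "type": "float",
        "frequency": 10
    },
    "batch_size": {
        "domain": "uniform",
        "data": [16, 256],
        "type": "int",
        "frequency": 10
    },
    "activation": {
        "domain": "categorical",
        "data": ["relu", "tanh"],    # a list of elements, not a range
        "type": "str"
    }
}}
```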
#### The HyppopyProject class

The HyppopyProject class takes care of all settings necessary for the solver and your workflow. To set up a HyppopyProject instance we can use a nested dictionary or the class's member functions, respectively.

```python
# Import the HyppopyProject class
from hyppopy.HyppopyProject import HyppopyProject

# Create a nested dict with a section hyperparameter. We define a 2-dimensional
# hyperparameter space with a numerical dimension named myNumber of type float and
# a uniform sampling. The second dimension is a categorical parameter of type string.
config = {
"hyperparameter": {
    "myNumber": {
        "domain": "uniform",
        "data": [0, 100],
        "type": "float"
    },
    "myOption": {
        "domain": "categorical",
        "data": ["a", "b", "c"],
        "type": "str"
    }
}}

# Create a HyppopyProject instance and pass the config dict to
# the constructor. Alternatively one can use the set_config method.
project = HyppopyProject(config=config)

# To demonstrate the second option we clear the project
project.clear()

# and add the parameters again using the member function add_hyperparameter
project.add_hyperparameter(name="myNumber", domain="uniform", data=[0, 100], dtype="float")
project.add_hyperparameter(name="myOption", domain="categorical", data=["a", "b", "c"], dtype="str")
```

```python
from hyppopy.HyppopyProject import HyppopyProject

# We might have seen a warning: 'UserWarning: config dict had no
# section settings/solver/max_iterations, set default value: 500'
# when executing the example above. This is because most solvers
# need a value for the maximum number of iterations.
# To take care of solver settings (there might be more in the future)
# one can set a second section called settings. The settings section
# is again split into a subsection 'solver' and a subsection 'custom'.
# By adding max_iterations to the section settings/solver we can change
# the number of iterations the solver performs. All solvers except the
# GridsearchSolver make use of the value max_iterations.
# The usage of the custom section is demonstrated later.
config = {
"hyperparameter": {
    "myNumber": {
        "domain": "uniform",
        "data": [0, 100],
        "type": "float"
    },
    "myOption": {
        "domain": "categorical",
        "data": ["a", "b", "c"],
        "type": "str"
    }
},
"settings": {
    "solver": {
        "max_iterations": 500
    },
    "custom": {}
}}

project = HyppopyProject(config=config)
```

The settings added are automatically converted to class members named prefix_name, where prefix is the name of the subsection. One can make use of this feature to build custom workflows by adding parameters to the custom section. This feature becomes even more interesting when developing your own solver.

```python
from hyppopy.HyppopyProject import HyppopyProject

# Creating a HyppopyProject instance
project = HyppopyProject()
project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float")
project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float")
project.add_settings(section="solver", name="max_iterations", value=300)
project.add_settings(section="custom", name="my_param1", value=True)
project.add_settings(section="custom", name="my_param2", value=42)

print("What is max_iterations value? {}".format(project.solver_max_iterations))
if project.custom_my_param1:
    print("What is the answer? {}".format(project.custom_my_param2))
else:
    print("What is the answer? x")
```

#### The HyppopySolver classes

Each solver is a child of the HyppopySolver class. This is only relevant if you're planning to write a new solver; we will discuss this in the section Solver Development. All solvers we can use to optimize our blackbox function are part of the module 'hyppopy.solvers'. Below is a list of all available solvers along with their access key in square brackets.

* HyperoptSolver [hyperopt]
* OptunitySolver [optunity]
* OptunaSolver [optuna]
* BayesOptSolver [bayesopt]
* RandomsearchSolver [randomsearch]
+* QuasiRandomsearchSolver [quasirandomsearch]
* GridsearchSolver [gridsearch]
+
There are two options to get a solver: we can import it directly from the hyppopy.solvers package, or we can use the SolverPool class. We look into both options by optimizing a simple function, starting with the direct import case.
```python
# Import the HyppopyProject class
from hyppopy.HyppopyProject import HyppopyProject
# Import the HyperoptSolver class, in this case we use Hyperopt
from hyppopy.solvers.HyperoptSolver import HyperoptSolver

# Our function to optimize
def my_loss_func(x, y):
    return x**2+y**2

# Creating a HyppopyProject instance
project = HyppopyProject()
project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float")
project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float")
project.add_settings(section="solver", name="max_iterations", value=300)

# create a solver instance
solver = HyperoptSolver(project)
# pass the loss function to the solver
solver.blackbox = my_loss_func
# run the solver
solver.run()

df, best = solver.get_results()

print("\n")
print("*"*100)
print("Best Parameter Set:\n{}".format(best))
print("*"*100)
```

The SolverPool is a class keeping track of all solver classes. There are several ways to ask the SolverPool for the desired solver: we can add an option called use_solver to our settings/custom section or to the project instance, respectively, or we can use the solver access key (see the solver listing above) to ask for the solver directly.

```python
# import the SolverPool class
from hyppopy.SolverPool import SolverPool
# Import the HyppopyProject class
from hyppopy.HyppopyProject import HyppopyProject

# Our function to optimize
def my_loss_func(x, y):
    return x**2+y**2

# Creating a HyppopyProject instance
project = HyppopyProject()
project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float")
project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float")
project.add_settings(section="solver", name="max_iterations", value=300)
project.add_settings(section="custom", name="use_solver", value="hyperopt")

# create a solver instance. The SolverPool class is a singleton
# and can be used without instantiating. It looks in the project
# instance for the use_solver option and returns the correct solver.
solver = SolverPool.get(project=project)
# Another option without the usage of the use_solver field would be:
# solver = SolverPool.get(solver_name='hyperopt', project=project)

# pass the loss function to the solver
solver.blackbox = my_loss_func
# run the solver
solver.run()

df, best = solver.get_results()

print("\n")
print("*"*100)
print("Best Parameter Set:\n{}".format(best))
print("*"*100)
```

#### The BlackboxFunction class

To extend the possibilities beyond using parameter-only loss functions as in the examples above, the BlackboxFunction class can be used. This class is a wrapper around the actual loss function, providing more advanced data handling and a callback_function for hooking into the solver's iteration loop. A minimal sketch follows; the full example after it demonstrates all hooks.
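As a minimal sketch (the quadratic loss, the literal data list and the hyperparameter name x are made up for illustration; only the blackbox_func and data arguments are used, all other hooks stay at their defaults):

```python
from hyppopy.BlackboxFunction import BlackboxFunction

def my_small_loss(data, params):
    # data is the object passed via the data argument (or returned by the
    # dataloader/preprocessing hooks), params is the current parameter sample;
    # assumes the project defines a hyperparameter named x
    return sum((params["x"] - d)**2 for d in data)

blackbox = BlackboxFunction(blackbox_func=my_small_loss,
                            data=[1.0, 2.0, 3.0])
```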
```python
# import the HyppopyProject class keeping track of inputs
from hyppopy.HyppopyProject import HyppopyProject
# import the SolverPool singleton class
from hyppopy.SolverPool import SolverPool
# import the BlackboxFunction class wrapping your problem for Hyppopy
from hyppopy.BlackboxFunction import BlackboxFunction

# Create the HyppopyProject class instance
project = HyppopyProject()
project.add_hyperparameter(name="C", domain="uniform", data=[0.0001, 20], dtype="float")
project.add_hyperparameter(name="gamma", domain="uniform", data=[0.0001, 20], dtype="float")
project.add_hyperparameter(name="kernel", domain="categorical", data=["linear", "sigmoid", "poly", "rbf"], dtype="str")
project.add_settings(section="solver", name="max_iterations", value=500)
project.add_settings(section="custom", name="use_solver", value="optunity")

# The BlackboxFunction signature is as follows:
# BlackboxFunction(blackbox_func=None,
#                  dataloader_func=None,
#                  preprocess_func=None,
#                  callback_func=None,
#                  data=None,
#                  **kwargs)
#
# - blackbox_func: a function pointer to the user's loss function
# - dataloader_func: a function pointer for handling data loading. The function is called once before
#                    optimizing. What it returns is passed to your loss function's data argument.
# - preprocess_func: a function pointer for data preprocessing. The function is called once before
#                    optimizing and receives via kwargs['data'] the raw data object set directly or returned
#                    by dataloader_func. What this function returns is then what is passed as the data
#                    argument to your loss function.
# - callback_func: a function pointer called after each iteration. The input kwargs is a dictionary
#                  keeping the parameters used in this iteration, the 'iteration' index, the 'loss'
#                  and the 'status'. The function in this example is used for realtime printing of its
#                  input but can also be used for realtime visualization.
# - data: if not done via dataloader_func one can set a raw data object directly
# - kwargs: dict whose content is passed to all functions above.

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score


def my_dataloader_function(**kwargs):
    print("Dataloading...")
    # kwargs['params'] allows accessing additional parameters passed,
    # see below my_preproc_param, my_dataloader_input.
    print("my loading argument: {}".format(kwargs['params']['my_dataloader_input']))
    iris_data = load_iris()
    return [iris_data.data, iris_data.target]


def my_preprocess_function(**kwargs):
    print("Preprocessing...")
    # kwargs['data'] allows accessing the input data
    print("data:", kwargs['data'][0].shape, kwargs['data'][1].shape)
    # kwargs['params'] allows accessing additional parameters passed,
    # see below my_preproc_param, my_dataloader_input.
    print("kwargs['params']['my_preproc_param']={}".format(kwargs['params']['my_preproc_param']), "\n")
    # if the preprocessing function returns something,
    # the input data will be replaced with the data returned by this function.
    x = kwargs['data'][0]
    y = kwargs['data'][1]
    for i in range(x.shape[0]):
        x[i, :] += kwargs['params']['my_preproc_param']
    return [x, y]


def my_callback_function(**kwargs):
    print("\r{}".format(kwargs), end="")


def my_loss_function(data, params):
    clf = SVC(**params)
    return -cross_val_score(estimator=clf, X=data[0], y=data[1], cv=3).mean()


# We now create the BlackboxFunction object and pass all function pointers defined above,
# as well as two dummy parameters (my_preproc_param, my_dataloader_input) for demonstration purposes.
blackbox = BlackboxFunction(blackbox_func=my_loss_function,
                            dataloader_func=my_dataloader_function,
                            preprocess_func=my_preprocess_function,
                            callback_func=my_callback_function,
                            my_preproc_param=1,
                            my_dataloader_input='could/be/a/path')

# Get the solver
solver = SolverPool.get(project=project)
# Give the solver your blackbox
solver.blackbox = blackbox
# Run the solver
solver.run()
# Get your results
df, best = solver.get_results()

print("\n")
print("*"*100)
print("Best Parameter Set:\n{}".format(best))
print("*"*100)
```

#### The Parameter Space Domains

Each hyperparameter needs a range and a domain specifier. The range, specified via 'data', is the left and right bound of an interval (note the exception: for the domain 'categorical', 'data' is the actual list of data elements!), and the domain specifier defines how this interval is sampled. Currently supported domains are:

* uniform (samples the interval [a,b] evenly)
* normal (a Gaussian sampling of the interval [a,b] such that mu=a+(b-a)/2 and sigma=(b-a)/6)
* loguniform (a logarithmic sampling of the interval [a,b], such that the exponent range x=[log(a),log(b)] is sampled evenly and mapped back via e^x; see the sketch after this list)
* categorical (in this case data is not interpreted as an interval but as an actual list of objects)
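As a quick illustration of the loguniform mapping (plain NumPy, not Hyppopy API):

```python
import numpy as np

# draw loguniform samples from [a, b] by sampling the exponent
# uniformly in [log(a), log(b)] and mapping it back via e^x
a, b = 1.0, 1000.0
x = np.random.uniform(np.log(a), np.log(b), size=5)
samples = np.exp(x)      # all samples lie in [1, 1000]
print(np.sort(samples))  # small magnitudes appear as often as large ones
```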
One exception is the GridsearchSolver: here we need to specify an interval and a number of samples using the frequency specifier. The max_iterations parameter is obsolete in this case, because each axis specifies its individual number of samples via frequency.

```python
# import the GridsearchSolver class
from hyppopy.solvers.GridsearchSolver import GridsearchSolver
# Import the HyppopyProject class
from hyppopy.HyppopyProject import HyppopyProject

# Our function to optimize
def my_loss_func(x, y):
    return x**2+y**2

# Creating a HyppopyProject instance
project = HyppopyProject()
project.add_hyperparameter(name="x", domain="uniform", data=[-1.1, 1], frequency=10, dtype="float")
project.add_hyperparameter(name="y", domain="uniform", data=[-1.1, 1], frequency=12, dtype="float")

solver = GridsearchSolver(project=project)

# pass the loss function to the solver
solver.blackbox = my_loss_func
# run the solver
solver.run()

df, best = solver.get_results()

print("\n")
print("*"*100)
print("Best Parameter Set:\n{}".format(best))
print("*"*100)
```

The grid is the Cartesian product of the axes, so this example evaluates 10*12=120 parameter sets.

#### Using a Visdom Server to Visualize the Optimization Process

We can easily create a realtime visualization using a Visdom server. If Visdom is installed, start your visdom server via the console command:
```
>visdom
```

Go to your browser and open the site: http://localhost:8097

To enable the visualization, call the function 'start_viewer' before running the solver:
```
# enable visualization
solver.start_viewer()
# Run the solver
solver.run()
```

You can also change the port and the server name via start_viewer(port=8097, server="http://localhost").

## Acknowledgements:

_This work is supported by the [Helmholtz Association Initiative and Networking](https://www.helmholtz.de/en/about_us/the_association/initiating_and_networking/) Fund under project number ZT-I-0003._
diff --git a/examples/tutorial_multisolver.py b/examples/tutorial_multisolver.py
index 909a686..9c6c161 100644
--- a/examples/tutorial_multisolver.py
+++ b/examples/tutorial_multisolver.py
@@ -1,188 +1,188 @@
# DKFZ
#
#
# Copyright (c) German Cancer Research Center,
# Division of Medical Image Computing.
# All rights reserved.
#
# This software is distributed WITHOUT ANY WARRANTY; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.
#
# See LICENSE

# In this tutorial we solve an optimization problem using the Hyperopt Solver (http://hyperopt.github.io/hyperopt/).
# Hyperopt uses a Bayesian optimization approach (Tree Parzen Estimator), which means that each iteration computes a
# new function value of the blackbox, interpolates a guess for the whole energy function and predicts a point to
# compute the next function value at. This next point is not necessarily a "better" value, it's only the point with
# the highest uncertainty for the function interpolation.
#
# See a visual explanation e.g. here (http://philipperemy.github.io/visualization/)

# import the HyppopyProject class keeping track of inputs
from hyppopy.HyppopyProject import HyppopyProject

# import the SolverPool singleton class
from hyppopy.SolverPool import SolverPool

# import the BlackboxFunction class wrapping your problem for Hyppopy
from hyppopy.BlackboxFunction import BlackboxFunction

# The next step is defining the problem space and all settings Hyppopy needs to optimize your problem.
# The config is a simple nested dictionary with two obligatory main sections, hyperparameter and settings.
# The hyperparameter section defines your search space. Each hyperparameter is again a dictionary with:
#
# - a domain ['categorical', 'uniform', 'normal', 'loguniform']
# - the domain data [left bound, right bound] and
# - a type of your domain ['str', 'int', 'float']
#
# The settings section has two subcategories, solver and custom. The first contains settings for the solver,
# here 'max_iterations', the maximum number of iterations.
#
# The custom section allows defining custom parameters. An entry here is transformed to a member variable of the
# HyppopyProject class. These can be useful when implementing new solver classes or for controlling your hyppopy
# script. Here we use it as a solver switch to control the usage of our solver via the config. This means with the
# script below you can try out every solver by changing use_solver to 'optunity', 'randomsearch', 'gridsearch',...
# It can be used like so: project.custom_use_solver (see below). If using the gridsearch solver, max_iterations is
# ignored; instead each hyperparameter must additionally specify its number of samples via the field 'frequency'
# (see the README).

config = {
"hyperparameter": {
    "C": {
        "domain": "uniform",
        "data": [0.0001, 20],
        "type": "float"
    },
    "gamma": {
        "domain": "uniform",
        "data": [0.0001, 20.0],
        "type": "float"
    },
    "kernel": {
        "domain": "categorical",
        "data": ["linear", "sigmoid", "poly", "rbf"],
        "type": "str"
    },
    "decision_function_shape": {
        "domain": "categorical",
        "data": ["ovo", "ovr"],
        "type": "str"
    }
},
"settings": {
    "solver": {
-        "max_iterations": 500
+        "max_iterations": 100
    },
    "custom": {
-        "use_solver": "randomsearch"
+        "use_solver": "quasirandomsearch"
    }
}}

# When creating a HyppopyProject instance we
# pass the config dictionary to the constructor.
project = HyppopyProject(config=config)

# demonstration of the custom parameter access
print("-"*30)
print("max_iterations:\t{}".format(project.solver_max_iterations))
print("solver chosen -> {}".format(project.custom_use_solver))
print("-"*30)

# The BlackboxFunction signature is as follows:
# BlackboxFunction(blackbox_func=None,
#                  dataloader_func=None,
#                  preprocess_func=None,
#                  callback_func=None,
#                  data=None,
#                  **kwargs)
#
# - blackbox_func: a function pointer to the user's loss function
# - dataloader_func: a function pointer for handling data loading. The function is called once before
#                    optimizing. What it returns is passed to your loss function's data argument.
# - preprocess_func: a function pointer for data preprocessing. The function is called once before
#                    optimizing and receives via kwargs['data'] the raw data object set directly or returned
#                    by dataloader_func. What this function returns is then what is passed as the data
#                    argument to your loss function.
# - callback_func: a function pointer called after each iteration. The input kwargs is a dictionary
#                  keeping the parameters used in this iteration, the 'iteration' index, the 'loss'
#                  and the 'status'. The function in this example is used for realtime printing of its
#                  input but can also be used for realtime visualization.
# - data: if not done via dataloader_func one can set a raw data object directly
# - kwargs: dict whose content is passed to all functions above.

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score


def my_dataloader_function(**kwargs):
    print("Dataloading...")
    # kwargs['params'] allows accessing additional parameters passed, see below my_preproc_param, my_dataloader_input.
    print("my loading argument: {}".format(kwargs['params']['my_dataloader_input']))
    iris_data = load_iris()
    return [iris_data.data, iris_data.target]


def my_preprocess_function(**kwargs):
    print("Preprocessing...")
    # kwargs['data'] allows accessing the input data
    print("data:", kwargs['data'][0].shape, kwargs['data'][1].shape)
    # kwargs['params'] allows accessing additional parameters passed, see below my_preproc_param, my_dataloader_input.
    print("kwargs['params']['my_preproc_param']={}".format(kwargs['params']['my_preproc_param']), "\n")
    # if the preprocessing function returns something,
    # the input data will be replaced with the data returned by this function.
    x = kwargs['data'][0]
    y = kwargs['data'][1]
    for i in range(x.shape[0]):
        x[i, :] += kwargs['params']['my_preproc_param']
    return [x, y]


def my_callback_function(**kwargs):
    print("\r{}".format(kwargs), end="")


def my_loss_function(data, params):
    clf = SVC(**params)
    return -cross_val_score(estimator=clf, X=data[0], y=data[1], cv=3).mean()


# We now create the BlackboxFunction object and pass all function pointers defined above,
# as well as two dummy parameters (my_preproc_param, my_dataloader_input) for demonstration purposes.
blackbox = BlackboxFunction(blackbox_func=my_loss_function,
                            dataloader_func=my_dataloader_function,
                            preprocess_func=my_preprocess_function,
                            callback_func=my_callback_function,
                            my_preproc_param=1,
                            my_dataloader_input='could/be/a/path')

# The last step: we use our SolverPool, which automatically returns the correct solver.
# There are multiple ways to get the desired solver from the solver pool.
# 1. solver = SolverPool.get('hyperopt')
#    solver.project = project
# 2. solver = SolverPool.get('hyperopt', project)
# 3.
#    The SolverPool will look for the field 'use_solver' in the project instance; if
#    it is present, it will be used to select the solver, so in this case it is enough
#    to pass the project instance.
solver = SolverPool.get(project=project)

# Give the solver your blackbox and run it. After execution we can get the results
# via get_results(), which returns a pandas dataframe containing the complete history.
# The dict best contains the best parameter set.
solver.blackbox = blackbox
#solver.start_viewer()
solver.run()
df, best = solver.get_results()

print("\n")
print("*"*100)
print("Best Parameter Set:\n{}".format(best))
print("*"*100)
diff --git a/hyppopy/SolverPool.py b/hyppopy/SolverPool.py
index bfa5f86..36a56bc 100644
--- a/hyppopy/SolverPool.py
+++ b/hyppopy/SolverPool.py
@@ -1,79 +1,85 @@
# DKFZ
#
#
# Copyright (c) German Cancer Research Center,
# Division of Medical Image Computing.
# All rights reserved.
#
# This software is distributed WITHOUT ANY WARRANTY; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.
#
# See LICENSE

from .Singleton import *

import os
import logging
from hyppopy.HyppopyProject import HyppopyProject
from hyppopy.solvers.OptunaSolver import OptunaSolver
from hyppopy.solvers.BayesOptSolver import BayesOptSolver
from hyppopy.solvers.HyperoptSolver import HyperoptSolver
from hyppopy.solvers.OptunitySolver import OptunitySolver
from hyppopy.solvers.GridsearchSolver import GridsearchSolver
from hyppopy.solvers.RandomsearchSolver import RandomsearchSolver
+from hyppopy.solvers.QuasiRandomsearchSolver import QuasiRandomsearchSolver
from hyppopy.globals import DEBUGLEVEL

LOG = logging.getLogger(os.path.basename(__file__))
LOG.setLevel(DEBUGLEVEL)


@singleton_object
class SolverPool(metaclass=Singleton):

    def __init__(self):
        self._solver_list = ["hyperopt",
                             "optunity",
                             "bayesopt",
                             "optuna",
                             "randomsearch",
+                             "quasirandomsearch",
                             "gridsearch"]

    def get_solver_names(self):
        return self._solver_list

    def get(self, solver_name=None, project=None):
        if solver_name is not None:
            assert isinstance(solver_name, str), "precondition violation, solver_name type str expected, got {} instead!".format(type(solver_name))
        if project is not None:
            assert isinstance(project, HyppopyProject), "precondition violation, project type HyppopyProject expected, got {} instead!".format(type(project))
            if "custom_use_solver" in project.__dict__:
                solver_name = project.custom_use_solver
        if solver_name not in self._solver_list:
            raise AssertionError("Solver named [{}] not implemented!".format(solver_name))

        if solver_name == "hyperopt":
            if project is not None:
                return HyperoptSolver(project)
            return HyperoptSolver()
        elif solver_name == "optunity":
            if project is not None:
                return OptunitySolver(project)
            return OptunitySolver()
        elif solver_name == "bayesopt":
            if project is not None:
                return BayesOptSolver(project)
            return BayesOptSolver()
        elif solver_name == "optuna":
            if project is not None:
                return OptunaSolver(project)
            return OptunaSolver()
        elif solver_name == "gridsearch":
            if project is not None:
                return GridsearchSolver(project)
            return GridsearchSolver()
        elif solver_name == "randomsearch":
            if project is not None:
                return RandomsearchSolver(project)
            return RandomsearchSolver()
+        elif solver_name == "quasirandomsearch":
+            if project is not None:
+                return QuasiRandomsearchSolver(project)
+            return QuasiRandomsearchSolver()
diff --git a/hyppopy/solvers/QuasiRandomsearchSolver.py b/hyppopy/solvers/QuasiRandomsearchSolver.py
new file mode 100644
index 0000000..1405abc
--- /dev/null
+++ b/hyppopy/solvers/QuasiRandomsearchSolver.py
@@ -0,0 +1,213 @@
+# DKFZ
+#
+#
+# Copyright (c) German Cancer Research Center,
+# Division of Medical Image Computing.
+# All rights reserved.
+#
+# This software is distributed WITHOUT ANY WARRANTY; without
+# even the implied warranty of MERCHANTABILITY or FITNESS FOR
+# A PARTICULAR PURPOSE.
+#
+# See LICENSE
+
+import os
+import copy
+import random
+import logging
+import warnings
+import itertools
+import numpy as np
+from random import choice
+from pprint import pformat
+from hyppopy.globals import DEBUGLEVEL
+from hyppopy.solvers.HyppopySolver import HyppopySolver
+
+LOG = logging.getLogger(os.path.basename(__file__))
+LOG.setLevel(DEBUGLEVEL)
+
+
+def get_gaussian_ranges(a, b, N):
+    """Splits the interval [a, b] into consecutive ranges whose widths are proportional
+    to the inverse Gaussian density at the range position (narrow near the center, wide
+    towards the borders), so that each range carries roughly the same probability mass.
+    Note: for odd N, N-1 ranges are returned."""
+    r = abs(b-a)/2
+    if N % 2 == 0:
+        _N = int(N/2)
+    else:
+        _N = int((N-1)/2)
+    dr = r/_N
+    sigma = r/2.5
+    mu = a + r
+    cuts = []
+    csum = 0
+    for n in range(_N):
+        x = a+r+n*dr
+        # inverse of the Gaussian density at x, used as unnormalized range width
+        c = sigma*np.sqrt(2.0*np.pi)/(np.exp(-0.5*((x-mu)/sigma)**2))
+        cuts.append(c)
+        cuts.insert(0, c)
+        csum += 2*c
+    # normalize the widths so they sum up to the interval length
+    for n in range(len(cuts)):
+        cuts[n] /= csum
+        cuts[n] *= abs(b-a)
+    ranges = []
+    end = a
+    for n, c in enumerate(cuts):
+        start = end
+        end = start + c
+        ranges.append([start, end])
+    return ranges
+
+
+def get_loguniform_ranges(a, b, N):
+    """Splits the interval [a, b] into N consecutive ranges of equal size in log space."""
+    aL = np.log(a)
+    bL = np.log(b)
+    exp_range = np.linspace(aL, bL, N+1)
+    ranges = []
+    for i in range(N):
+        ranges.append([np.exp(exp_range[i]), np.exp(exp_range[i+1])])
+    return ranges
+
+
+class QuasiRandomSampleGenerator(object):
+    """Generates quasi-random samples by decomposing each numerical axis into ranges
+    (evenly for uniform, with roughly equal probability mass for normal, evenly in
+    log space for loguniform), building the Cartesian product grid of these ranges
+    and drawing one uniform random sample per grid box. Categorical axes are not
+    part of the grid; their values are drawn randomly per sample."""
+
+    def __init__(self, N_samples=None, border_frac=0.1):
+        self._grid = None
+        self._axis = None
+        self._numerical = []
+        self._categorical = []
+        self._N_samples = N_samples
+        self._border_frac = border_frac
+
+    def set_axis(self, name, data, domain, dtype):
+        if domain == "categorical":
+            if dtype == "int":
+                data = [int(i) for i in data]
+            elif dtype == "str":
+                data = [str(i) for i in data]
+            elif dtype == "float" or dtype == "double":
+                data = [float(i) for i in data]
+            self._categorical.append({"name": name, "data": data, "type": dtype})
+        else:
+            self._numerical.append({"name": name, "data": data, "type": dtype, "domain": domain})
+
+    def build_grid(self, N_samples=None):
+        self._axis = []
+        if N_samples is None:
+            assert isinstance(self._N_samples, int), "Precondition violation, no number of samples specified!"
+        else:
+            self._N_samples = N_samples
+
+        if len(self._numerical) > 0:
+            # distribute the requested sample count over the numerical axes; the
+            # realized sample count is the resulting grid size axis_steps**dims
+            axis_steps = int(round(self._N_samples**(1.0/len(self._numerical))))
+            self._N_samples = int(axis_steps**(len(self._numerical)))
+
+            for axis in self._numerical:
+                self._axis.append(None)
+                n = len(self._axis)-1
+                boxes = None
+                if axis["domain"] == "uniform":
+                    boxes = self.add_uniform_axis(n, axis_steps)
+                elif axis["domain"] == "normal":
+                    boxes = self.add_normal_axis(n, axis_steps)
+                elif axis["domain"] == "loguniform":
+                    boxes = self.add_loguniform_axis(n, axis_steps)
+
+                assert isinstance(boxes, list), "failed to compute axis ranges!"
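+                # shrink each box by border_frac of its width on both sides,
+                # so the random draws stay off the box borders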
+                for k in range(len(boxes)):
+                    dx = abs(boxes[k][1] - boxes[k][0])
+                    boxes[k][0] += self._border_frac * dx
+                    boxes[k][1] -= self._border_frac * dx
+                self._axis[n] = boxes
+            self._grid = list(itertools.product(*self._axis))
+        else:
+            warnings.warn("No numerical axis defined, this warning can be ignored if the searchspace is categorical only, otherwise check if the axis was set!")
+
+    def add_uniform_axis(self, n, axis_steps):
+        drange = self._numerical[n]["data"]
+        width = abs(drange[1]-drange[0])
+        dx = width / axis_steps
+        boxes = []
+        for k in range(1, axis_steps+1):
+            bl = drange[0] + (k-1)*dx
+            br = drange[0] + k*dx
+            boxes.append([bl, br])
+        return boxes
+
+    def add_normal_axis(self, n, axis_steps):
+        drange = self._numerical[n]["data"]
+        boxes = get_gaussian_ranges(drange[0], drange[1], axis_steps)
+        # note: build_grid applies the border shrink to all axes centrally,
+        # so it is not repeated here (that would shrink these boxes twice)
+        return boxes
+
+    def add_loguniform_axis(self, n, axis_steps):
+        drange = self._numerical[n]["data"]
+        boxes = get_loguniform_ranges(drange[0], drange[1], axis_steps)
+        return boxes
+
+    def next(self):
+        if self._grid is None:
+            self.build_grid()
+        if len(self._grid) == 0:
+            return None
+        # pop a random remaining grid box and draw one uniform sample inside it
+        next_index = np.random.randint(0, len(self._grid), 1)[0]
+        next_range = self._grid.pop(next_index)
+        pset = {}
+        for n, rng in enumerate(next_range):
+            name = self._numerical[n]["name"]
+            rnd = np.random.random()
+            param = rng[0] + rnd*abs(rng[1]-rng[0])
+            if self._numerical[n]["type"] == "int":
+                param = int(np.floor(param))
+            pset[name] = param
+        # categorical axes are not part of the grid; they are drawn randomly
+        for cat in self._categorical:
+            pset[cat["name"]] = choice(cat["data"])
+        return pset
+
+
+class QuasiRandomsearchSolver(HyppopySolver):
+    """
+    The QuasiRandomsearchSolver class implements a quasi-random search optimization. It supports
+    categorical, uniform, normal and loguniform sampling. The solver builds a grid whose size and
+    shape depend on the max_iterations parameter and the domains; a random value is then drawn
+    within each grid box. This yields random parameter samples under the constraint that the
+    space is sampled evenly, preventing cluster formation."""
+    def __init__(self, project=None):
+        HyppopySolver.__init__(self, project)
+        self._sampler = None
+
+    def loss_function_call(self, params):
+        loss = self.blackbox(**params)
+        if loss is None:
+            return np.nan
+        return loss
+
+    def execute_solver(self, searchspace):
+        N = self.max_iterations
+        self._sampler = QuasiRandomSampleGenerator(N)
+        for name, axis in searchspace.items():
+            self._sampler.set_axis(name, axis["data"], axis["domain"], axis["type"])
+        try:
+            for n in range(N):
+                params = self._sampler.next()
+                if params is None:
+                    break
+                self.loss_function(**params)
+        except Exception as e:
+            msg = "internal error in quasirandomsearch execute_solver occurred. {}".format(e)
+            LOG.error(msg)
+            raise BrokenPipeError(msg)
+        self.best = self._trials.argmin
+
+    def convert_searchspace(self, hyperparameter):
+        """
+        This function simply pipes the input parameters through; the sample
+        drawing functions are responsible for interpreting the parameters.
+ :param hyperparameter: [dict] hyperparameter space + :return: [dict] hyperparameter space + """ + LOG.debug("convert input parameter\n\n\t{}\n".format(pformat(hyperparameter))) + return hyperparameter diff --git a/hyppopy/tests/test_quasirandomsearchsolver.py b/hyppopy/tests/test_quasirandomsearchsolver.py new file mode 100644 index 0000000..54b2c26 --- /dev/null +++ b/hyppopy/tests/test_quasirandomsearchsolver.py @@ -0,0 +1,160 @@ +# DKFZ +# +# +# Copyright (c) German Cancer Research Center, +# Division of Medical Image Computing. +# All rights reserved. +# +# This software is distributed WITHOUT ANY WARRANTY; without +# even the implied warranty of MERCHANTABILITY or FITNESS FOR +# A PARTICULAR PURPOSE. +# +# See LICENSE + +import unittest +import matplotlib.pylab as plt + +from hyppopy.solvers.QuasiRandomsearchSolver import * +from hyppopy.VirtualFunction import VirtualFunction +from hyppopy.HyppopyProject import HyppopyProject + + +class QuasiRandomsearchTestSuite(unittest.TestCase): + + def setUp(self): + pass + + def test_get_gaussian_ranges(self): + interval = [0, 10] + N = 10 + ranges = get_gaussian_ranges(interval[0], interval[1], N) + gt = [[0, 1.97368411013644], + [1.97368411013644, 3.1010703630207566], + [3.1010703630207566, 3.856779967954119], + [3.856779967954119, 4.4512421980703], + [4.4512421980703, 5.000000000000001], + [5.000000000000001, 5.5487578019297015], + [5.5487578019297015, 6.143220032045882], + [6.143220032045882, 6.898929636979244], + [6.898929636979244, 8.026315889863561], + [8.026315889863561, 10.0]] + for a, b in zip(ranges, gt): + self.assertAlmostEqual(a[0], b[0]) + self.assertAlmostEqual(a[1], b[1]) + + interval = [-100, 100] + N = 10 + ranges = get_gaussian_ranges(interval[0], interval[1], N) + gt = [[-100, -60.526317797271204], + [-60.526317797271204, -37.97859273958487], + [-37.97859273958487, -22.864400640917623], + [-22.864400640917623, -10.975156038594006], + [-10.975156038594006, 0.0], + [0.0, 10.975156038594006], + [10.975156038594006, 22.864400640917623], + [22.864400640917623, 37.97859273958487], + [37.97859273958487, 60.526317797271204], + [60.526317797271204, 100.0]] + for a, b in zip(ranges, gt): + self.assertAlmostEqual(a[0], b[0]) + self.assertAlmostEqual(a[1], b[1]) + + def test_get_loguniform_ranges(self): + interval = [1, 1000] + N = 10 + ranges = get_loguniform_ranges(interval[0], interval[1], N) + gt = [[1.0, 1.9952623149688797], + [1.9952623149688797, 3.9810717055349727], + [3.9810717055349727, 7.943282347242818], + [7.943282347242818, 15.848931924611136], + [15.848931924611136, 31.62277660168379], + [31.62277660168379, 63.095734448019364], + [63.095734448019364, 125.89254117941677], + [125.89254117941677, 251.18864315095806], + [251.18864315095806, 501.1872336272723], + [501.1872336272723, 999.9999999999998]] + for a, b in zip(ranges, gt): + self.assertAlmostEqual(a[0], b[0]) + self.assertAlmostEqual(a[1], b[1]) + + interval = [1, 10000] + N = 50 + ranges = get_loguniform_ranges(interval[0], interval[1], N) + gt = [[1.0, 1.202264434617413], + [1.202264434617413, 1.4454397707459274], + [1.4454397707459274, 1.7378008287493756], + [1.7378008287493756, 2.0892961308540396], + [2.0892961308540396, 2.51188643150958], + [2.51188643150958, 3.0199517204020165], + [3.0199517204020165, 3.6307805477010135], + [3.6307805477010135, 4.36515832240166], + [4.36515832240166, 5.248074602497727], + [5.248074602497727, 6.309573444801933], + [6.309573444801933, 7.5857757502918375], + [7.5857757502918375, 9.120108393559098], + [9.120108393559098, 
10.964781961431854], + [10.964781961431854, 13.18256738556407], + [13.18256738556407, 15.848931924611136], + [15.848931924611136, 19.054607179632477], + [19.054607179632477, 22.908676527677738], + [22.908676527677738, 27.542287033381676], + [27.542287033381676, 33.11311214825911], + [33.11311214825911, 39.810717055349734], + [39.810717055349734, 47.863009232263856], + [47.863009232263856, 57.543993733715695], + [57.543993733715695, 69.18309709189366], + [69.18309709189366, 83.17637711026713], + [83.17637711026713, 100.00000000000004], + [100.00000000000004, 120.22644346174135], + [120.22644346174135, 144.54397707459285], + [144.54397707459285, 173.78008287493753], + [173.78008287493753, 208.92961308540396], + [208.92961308540396, 251.18864315095806], + [251.18864315095806, 301.9951720402017], + [301.9951720402017, 363.0780547701015], + [363.0780547701015, 436.5158322401662], + [436.5158322401662, 524.8074602497729], + [524.8074602497729, 630.9573444801938], + [630.9573444801938, 758.5775750291845], + [758.5775750291845, 912.0108393559099], + [912.0108393559099, 1096.4781961431854], + [1096.4781961431854, 1318.2567385564075], + [1318.2567385564075, 1584.8931924611143], + [1584.8931924611143, 1905.4607179632485], + [1905.4607179632485, 2290.867652767775], + [2290.867652767775, 2754.228703338169], + [2754.228703338169, 3311.3112148259115], + [3311.3112148259115, 3981.071705534977], + [3981.071705534977, 4786.300923226385], + [4786.300923226385, 5754.399373371577], + [5754.399373371577, 6918.309709189369], + [6918.309709189369, 8317.63771102671], + [8317.63771102671, 10000.00000000001]] + for a, b in zip(ranges, gt): + self.assertAlmostEqual(a[0], b[0]) + self.assertAlmostEqual(a[1], b[1]) + + def test_QuasiRandomSampleGenerator(self): + N_samples = 10*10*10 + axis_data = {"p1": {"domain": "loguniform", "data": [1, 10000], "type": "float"}, + "p2": {"domain": "normal", "data": [-5, 5], "type": "float"}, + "p3": {"domain": "uniform", "data": [0, 10], "type": "float"}, + "p4": {"domain": "categorical", "data": [False, True], "type": "bool"}} + sampler = QuasiRandomSampleGenerator(N_samples, 0.1) + for name, axis in axis_data.items(): + sampler.set_axis(name, axis["data"], axis["domain"], axis["type"]) + + for i in range(N_samples): + sample = sampler.next() + self.assertTrue(len(sample.keys()) == 4) + for k in range(4): + self.assertTrue("p{}".format(k+1) in sample.keys()) + self.assertTrue(1 <= sample["p1"] <= 10000) + self.assertTrue(-5 <= sample["p2"] <= 5) + self.assertTrue(0 <= sample["p3"] <= 10) + self.assertTrue(isinstance(sample["p4"], bool)) + self.assertTrue(sampler.next() is None) + + +if __name__ == '__main__': + unittest.main()