diff --git a/CHANGELOG.md b/CHANGELOG.md index 6678101..d245baa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,7 +1,16 @@ +Release 0.5.0.0 + +- settings structure changed, additional settings can now be added as additional entries in the config dict or using the methods add_setting or set_settings +- sections solver and custom in the config dict are removed completely +- use_solver setting in the config dict is renamed to solver +- hyperparameter type is now a native type, not a string anymore +- automatic consistency check between config and solver conditions, each solver now defines its interface, which is checked when executing the solver; an exception is thrown if the project instance and the solver's interface are incompatible +- BayesOpt solver removed; it was extremely slow and performed poorly + Release 0.4.2.0 New feature QuasiRandomSolver added. The QuasiRandomSolver provides a randomized grid sampling. This means that depending on max_iterations a grid over all numerical parameters is spanned and each cell is populated with a random value within the cell bounds for numerical parameters and a random draw for each categorical parameter. This ensures a random sampling of the parameter space and a good space coverage without random cluster building. The solver also supports normal and loguniform sampling. \ No newline at end of file diff --git a/README.md b/README.md index 81c83f5..a5f1b8d 100644 --- a/README.md +++ b/README.md @@ -1,408 +1,379 @@ ![docs_title_logo](./resources/docs_title_logo.png) # A Hyper-Parameter Optimization Toolbox
## What is Hyppopy? Hyppopy is a python toolbox for blackbox optimization. Its purpose is to offer a unified and easy to use interface to a collection of solver libraries. Currently provided solvers are: * [Hyperopt](http://hyperopt.github.io/hyperopt/) * [Optunity](https://optunity.readthedocs.io/en/latest/user/index.html) * [Optuna](https://optuna.org/) -* [BayesianOptimization](https://github.com/fmfn/BayesianOptimization) -* Quasi-Randomsearch Solver +* Quasi-Randomsearch Solver * Randomsearch Solver * Gridsearch Solver ## Installation 1. clone the [Hyppopy](http://github.com) project from Github 2. (create a virtual environment), open a console (with your activated virtual env) and go to the hyppopy root folder 3. ```$ pip install -r requirements.txt``` 4. ```$ python setup.py install``` (for normal usage) or ```$ python setup.py develop``` (if you want to join the hyppopy development *hooray*) ## How to use Hyppopy? #### The Hyperparameter Space Hyppopy defines a common hyperparameter space description, whatever solver is used. A hyperparameter description includes the following fields: * domain: the domain defines how the solver samples the parameter space, options are: * uniform: samples the data range [a,b] evenly, where b>a * normal: samples the data range [a,b] using a normal distribution with mu=a+(b-a)/2, sigma=(b-a)/6, where b>a * loguniform: samples the data range [a,b] logarithmically using e^x by sampling the exponent range x=[log(a), log(b)] uniformly, where a>0 and b>a * categorical: is used to define a data list * data: in case of the categorical domain data is a list, all other domains expect a range [a, b] * type: the parameter data type as native type int, float or str An exception must be kept in mind when using the GridsearchSolver. The gridsearch additionally needs a number of samples per domain, which must be set using the field: frequency. #### The HyppopyProject class The HyppopyProject class takes care of all settings necessary for the solver and your workflow. To set up a HyppopyProject instance we can use a nested dictionary or the class's member functions, respectively. ```python # Import the HyppopyProject class from hyppopy.HyppopyProject import HyppopyProject # Create a nested dict with a section hyperparameter. We define a 2 dimensional # hyperparameter space with a numerical dimension named myNumber of type float and # a uniform sampling. The second dimension is a categorical parameter of type string. config = { "hyperparameter": { "myNumber": { "domain": "uniform", "data": [0, 100], - "type": "float" + "type": float }, "myOption": { "domain": "categorical", "data": ["a", "b", "c"], - "type": "str" + "type": str } }} # Create a HyppopyProject instance and pass the config dict to # the constructor. Alternatively one can use the set_config method. 
project = HyppopyProject(config=config) -# To demonstrate the second option we clear the project -project.clear() -# and add the parameter again using the member function add_hyperparameter -project.add_hyperparameter(name="myNumber", domain="uniform", data=[0, 100], dtype="float") -project.add_hyperparameter(name="myOption", domain="categorical", data=["a", "b", "c"], dtype="str") +# We can also add hyperparameters using the add_hyperparameter method +project = HyppopyProject() +project.add_hyperparameter(name="myNumber", domain="uniform", data=[0, 100], type=float) +project.add_hyperparameter(name="myOption", domain="categorical", data=["a", "b", "c"], type=str) ``` +Additional settings for the solver or custom parameters can be set either as additional entries in the config dict, or via the methods set_settings or add_setting: ```python from hyppopy.HyppopyProject import HyppopyProject -# We might have seen a warning: 'UserWarning: config dict had no -# section settings/solver/max_iterations, set default value: 500' -# when executing the example above. This is due to the fact that -# most solvers need a value for a maximum number of iterations. -# To take care of solver settings (there might be more in the future) -# one can set a second section called settings. The settings section -# again is splitted into a subsection 'solver' and a subsection 'custom'. -# When adding max_iterations to the section settings/solver we can change -# the number of iterations the solver is doing. All solver except of the -# GridsearchSolver make use of the value max_iterations. -# The usage of the custom section is demonstrated later. config = { "hyperparameter": { "myNumber": { "domain": "uniform", "data": [0, 100], - "type": "float" + "type": float }, "myOption": { "domain": "categorical", "data": ["a", "b", "c"], - "type": "str" + "type": str } }, -"settings": { - "solver": { - "max_iterations": 500 - }, - "custom": {} -}} +"max_iterations": 500, +"anything_you_want": 42 +} project = HyppopyProject(config=config) -``` - -The settings added are automatically converted to a class member with a prefix_ where prefix is the name of the subsection. One can make use of this feature to build custom workflows by adding params to the custom section. More interesting is this feature when developing your own solver. +print("max_iterations:", project.max_iterations) +print("anything_you_want:", project.anything_you_want) -```python -from hyppopy.HyppopyProject import HyppopyProject +# alternatively +project = HyppopyProject() +project.set_settings(max_iterations=500, anything_you_want=42) +print("anything_you_want:", project.anything_you_want) -# Creating a HyppopyProject instance +# alternatively project = HyppopyProject() -project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float") -project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float") -project.add_settings(section="solver", name="max_iterations", value=300) -project.add_settings(section="custom", name="my_param1", value=True) -project.add_settings(section="custom", name="my_param2", value=42) - -print("What is max_iterations value? {}".format(project.solver_max_iterations)) -if project.custom_my_param1: - print("What is the answer? {}".format(project.custom_my_param2)) -else: - print("What is the answer? 
x") +project.add_setting(name="max_iterations", value=500) +project.add_setting(name="anything_you_want", value=42) +print("anything_you_want:", project.anything_you_want) ``` + #### The HyppopySolver classes Each solver is a child of the HyppopySolver class. This is only interesting if you're planning to write a new solver, we will discuss this in the section Solver Development. All solvers we can use to optimize our blackbox function are part of the module 'hyppopy.solver'. Below is a list of all solvers available along with their access key in squared brackets. * HyperoptSolver [hyperopt] - _Bayes Optimization use Tree-Parzen Estimator, supports uniform, normal, loguniform and categorical parameter_ -* OptunitySolver [optunity] - +* OptunitySolver [optunity] _Particle Swarm Optimizer, supports uniform and categorical parameter_ -* OptunaSolver [optuna] - +* OptunaSolver [optuna] _Bayes Optimization, supports uniform, and categorical parameter_ -* BayesOptSolver [bayesopt] - - _Bayes Optimization, supports uniform, and categorical parameter_ -* RandomsearchSolver [randomsearch] - +* RandomsearchSolver [randomsearch] _Naive randomized parameter search, supports uniform, normal, loguniform and categorical parameter_ -* QuasiRandomsearchSolver [quasirandomsearch] - +* QuasiRandomsearchSolver [quasirandomsearch] _Randomized grid ensuring random sample drawing and a good space coverage, supports uniform, normal, loguniform and categorical parameter_ -* GridsearchSolver [gridsearch] - +* GridsearchSolver [gridsearch] _Standard gridsearch, supports uniform, normal, loguniform and categorical parameter_ -There are two options to get a solver, we can import directly from the hyppopy.solver package or we use the SolverPool class. We look into both options by optimizing a simple function, starting with the direct import case. +There are two options to get a solver, we can import directly from the hyppopy.solvers package or we use the SolverPool class. We look into both options by optimizing a simple function, starting with the direct import case. ```python # Import the HyppopyProject class from hyppopy.HyppopyProject import HyppopyProject # Import the HyperoptSolver class, in this case wh use Hyperopt from hyppopy.solvers.HyperoptSolver import HyperoptSolver # Our function to optimize def my_loss_func(x, y): return x**2+y**2 # Creating a HyppopyProject instance project = HyppopyProject() -project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float") -project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float") -project.add_settings(section="solver", name="max_iterations", value=300) +project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], type=float) +project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], type=float) +project.add_setting(name="max_iterations", value=300) # create a solver instance solver = HyperoptSolver(project) # pass the loss function to the solver solver.blackbox = my_loss_func # run the solver solver.run() df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) ``` -The SolverPool is a class keeping track of all solver classes. We have several options to ask the SolverPool for the desired solver. We can add an option called use_solver to our settings/custom section or to the project instance respectively, or we can use the solver access key (see solver listing above) to ask for the solver directly. 
+The SolverPool is a class keeping track of all solver classes. We have several options to ask the SolverPool for the desired solver. We can add a setting called solver to our config dict or to the project instance, respectively, or we can use the solver access key (see solver listing above) to ask for the solver directly. ```python # import the SolverPool class from hyppopy.SolverPool import SolverPool # Import the HyppopyProject class from hyppopy.HyppopyProject import HyppopyProject # Our function to optimize def my_loss_func(x, y): return x**2+y**2 # Creating a HyppopyProject instance project = HyppopyProject() -project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], dtype="float") -project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], dtype="float") -project.add_settings(section="solver", name="max_iterations", value=300) -project.add_settings(section="custom", name="use_solver", value="hyperopt") +project.add_hyperparameter(name="x", domain="uniform", data=[-10, 10], type=float) +project.add_hyperparameter(name="y", domain="uniform", data=[-10, 10], type=float) +project.set_settings(max_iterations=300, solver="hyperopt") # create a solver instance. The SolverPool class is a singleton # and can be used without instantiating. It looks in the project # instance for the solver setting and returns the correct solver. solver = SolverPool.get(project=project) -# Another option without the usage of the use_solver field would be: +# Another option without the usage of the solver field would be: # solver = SolverPool.get(solver_name='hyperopt', project=project) # pass the loss function to the solver solver.blackbox = my_loss_func # run the solver solver.run() df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) ``` #### The BlackboxFunction class -To extend the possibilities beyond using parameter only loss function as in the examples above, the BlackboxFunction class can be used. This class is a wrapper class around the actual loss_function providing a more advanced access to data handling and a callback_function for accessing the solvers iteration loop. +To extend the possibilities beyond parameter-only loss functions as in the examples above, we can use the BlackboxFunction class. This class is a wrapper around the actual loss_function, providing a more advanced access interface to data handling and a callback_function for accessing the solver's iteration loop. 
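Before the full example below, here is a minimal sketch of the wrapper: only blackbox_func is required, and data can be set directly instead of via a dataloader (the loss function and its purely numeric hyperparameters are placeholder assumptions):

```python
from hyppopy.BlackboxFunction import BlackboxFunction

# placeholder loss: data is whatever was set via the data argument,
# params is one sample of the hyperparameter space (assumed numeric here)
def my_loss_function(data, params):
    return sum(v**2 for v in params.values())

# wrap the loss function; data is passed as first argument on each call
blackbox = BlackboxFunction(data=[], blackbox_func=my_loss_function)
```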
```python # import the HyppopyProject class keeping track of inputs from hyppopy.HyppopyProject import HyppopyProject # import the SolverPool singleton class from hyppopy.SolverPool import SolverPool # import the Blackboxfunction class wrapping your problem for Hyppopy from hyppopy.BlackboxFunction import BlackboxFunction # Create the HyppopyProject class instance project = HyppopyProject() -project.add_hyperparameter(name="C", domain="uniform", data=[0.0001, 20], dtype="float") -project.add_hyperparameter(name="gamma", domain="uniform", data=[0.0001, 20], dtype="float") -project.add_hyperparameter(name="kernel", domain="categorical", data=["linear", "sigmoid", "poly", "rbf"], dtype="str") -project.add_settings(section="solver", name="max_iterations", value=500) -project.add_settings(section="custom", name="use_solver", value="optunity") +project.add_hyperparameter(name="C", domain="uniform", data=[0.0001, 20], type=float) +project.add_hyperparameter(name="gamma", domain="uniform", data=[0.0001, 20], type=float) +project.add_hyperparameter(name="kernel", domain="categorical", data=["linear", "sigmoid", "poly", "rbf"], type=str) +project.add_setting(name="max_iterations", value=500) +project.add_setting(name="solver", value="optunity") # The BlackboxFunction signature is as follows: # BlackboxFunction(blackbox_func=None, # dataloader_func=None, # preprocess_func=None, # callback_func=None, # data=None, # **kwargs) # # - blackbox_func: a function pointer to the user's loss function # - dataloader_func: a function pointer for handling dataloading. The function is called once before # optimizing. What it returns is passed as first argument to your loss function's # data argument. # - preprocess_func: a function pointer for data preprocessing. The function is called once before # optimizing and gets via kwargs['data'] the raw data object set directly or returned # from dataloader_func. What this function returns is then what is passed as first # argument to your loss function. # - callback_func: a function pointer called after each iteration. The input kwargs is a dictionary # keeping the parameters used in this iteration, the 'iteration' index, the 'loss' # and the 'status'. The function in this example is used for realtime printing of its # input but can also be used for realtime visualization. # - data: if not done via dataloader_func one can set a raw_data object directly # - kwargs: dict whose content is passed to all functions above. from sklearn.svm import SVC from sklearn.datasets import load_iris from sklearn.model_selection import cross_val_score def my_dataloader_function(**kwargs): print("Dataloading...") - # kwargs['params'] allows accessing additional parameter passed, + # kwargs['params'] allows accessing additional parameters passed, # see below my_preproc_param, my_dataloader_input. print("my loading argument: {}".format(kwargs['params']['my_dataloader_input'])) iris_data = load_iris() return [iris_data.data, iris_data.target] def my_preprocess_function(**kwargs): print("Preprocessing...") # kwargs['data'] allows accessing the input data print("data:", kwargs['data'][0].shape, kwargs['data'][1].shape) # kwargs['params'] allows accessing additional parameters passed, # see below my_preproc_param, my_dataloader_input. print("kwargs['params']['my_preproc_param']={}".format(kwargs['params']['my_preproc_param']), "\n") # if the preprocessing function returns something, # the input data will be replaced with the data returned by this function. 
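# as a dummy preprocessing step, every feature vector is shifted by my_preproc_param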
x = kwargs['data'][0] y = kwargs['data'][1] for i in range(x.shape[0]): x[i, :] += kwargs['params']['my_preproc_param'] return [x, y] def my_callback_function(**kwargs): print("\r{}".format(kwargs), end="") def my_loss_function(data, params): clf = SVC(**params) return -cross_val_score(estimator=clf, X=data[0], y=data[1], cv=3).mean() # We now create the BlackboxFunction object and pass all function pointers defined above, # as well as 2 dummy parameters (my_preproc_param, my_dataloader_input) for demonstration purposes. blackbox = BlackboxFunction(blackbox_func=my_loss_function, dataloader_func=my_dataloader_function, preprocess_func=my_preprocess_function, callback_func=my_callback_function, my_preproc_param=1, my_dataloader_input='could/be/a/path') # Get the solver solver = SolverPool.get(project=project) # Give the solver your blackbox solver.blackbox = blackbox # Run the solver solver.run() # Get your results df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) ``` #### The Parameter Space Domains -Each hyperparameter needs a range and a domain specifier. The range, specified via 'data', is the left and right bound of an interval (!!!exception is the domain 'categorical', here 'data' is the actual list of data elements!!!) and the domain specifier the way this interval is sampled. Currently supported domains are: +Each hyperparameter needs a range and a domain specifier. The range, specified via 'data', is the left and right bound of an interval (exception is the domain 'categorical', here 'data' is the actual list of data elements) and the domain specifier defines the way this interval is sampled. Currently supported domains are: * uniform (samples the interval [a,b] evenly) -* normal (a gaussian sampling of the interval [a,b] such that mu=a+(b-a)/2 and sigma=(b-a)/6) -* loguniform (a logaritmic sampling of the iterval [a,b], such that the exponent e^x is sampled evenly x=[log(a),log(b)]) +* normal* (a Gaussian sampling of the interval [a,b] such that mu=a+(b-a)/2 and sigma=(b-a)/6) +* loguniform* (a logarithmic sampling of the interval [a,b], such that the exponent x in e^x is sampled evenly in [log(a),log(b)]) * categorical (in this case data is not interpreted as interval but as actual list of objects) -One exception is the GridsearchSolver, here we need to specifiy an interval and a number of samples using a frequency specifier. The max_iterations parameter is obsolet in this case, because each axis specifies an individual number of samples via frequency. +*Not all domains are supported by all solvers; this might be fixed in the future, but until then, the solver throws an error telling you that the domain is unknown. + +When using the GridsearchSolver we need to specify an interval and a number of samples using a frequency specifier. The max_iterations parameter is obsolete in this case, because each axis specifies an individual number of samples via frequency. 
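As a quick sanity check for the example below: assuming the grid is the full cartesian product of all axes, the number of blackbox evaluations is simply the product of the frequencies (a sketch; the numbers match the example that follows):

```python
# frequency per axis as used in the example below: 10 samples on x, 12 on y
frequencies = [10, 12]
total_evaluations = 1
for f in frequencies:
    total_evaluations *= f
print(total_evaluations)  # 120 grid points
```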
```python # import the GridsearchSolver class from hyppopy.solvers.GridsearchSolver import GridsearchSolver # Import the HyppopyProject class from hyppopy.HyppopyProject import HyppopyProject # Our function to optimize def my_loss_func(x, y): return x**2+y**2 # Creating a HyppopyProject instance project = HyppopyProject() -project.add_hyperparameter(name="x", domain="uniform", data=[-1.1, 1], frequency=10, dtype="float") -project.add_hyperparameter(name="y", domain="uniform", data=[-1.1, 1], frequency=12, dtype="float") +project.add_hyperparameter(name="x", domain="uniform", data=[-1.1, 1], frequency=10, type=float) +project.add_hyperparameter(name="y", domain="uniform", data=[-1.1, 1], frequency=12, type=float) solver = GridsearchSolver(project=project) # pass the loss function to the solver solver.blackbox = my_loss_func # run the solver solver.run() df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) ``` #### Using a Visdom Server to Visualize the Optimization Process We can simply create a realtime visualization using a visdom server. If installed, start your visdom server via console command: ``` >visdom ``` Go to your browser and open the site: http://localhost:8097 To enable the visualization call the function 'start_viewer' before running the solver: ``` # enable visualization solver.start_viewer() # Run the solver solver.run() ``` You can also change the port and the server name via start_viewer(port=8097, server="http://localhost"). ## Acknowledgements: _This work is supported by the [Helmholtz Association Initiative and Networking](https://www.helmholtz.de/en/about_us/the_association/initiating_and_networking/) Fund under project number ZT-I-0003._
diff --git a/examples/solver_comparison.py b/examples/solver_comparison.py index 66b2785..e9eaf5b 100644 --- a/examples/solver_comparison.py +++ b/examples/solver_comparison.py @@ -1,192 +1,311 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. # # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE import os +import sys +import time import pickle import numpy as np from math import pi -from pprint import pprint import matplotlib.pyplot as plt from hyppopy.SolverPool import SolverPool from hyppopy.HyppopyProject import HyppopyProject from hyppopy.VirtualFunction import VirtualFunction from hyppopy.BlackboxFunction import BlackboxFunction OUTPUTDIR = "C:\\Users\\s635r\\Desktop\\solver_comparison" -SOLVER = ["hyperopt", "optunity", "randomsearch", "optuna"]#, "bayesopt"] -ITERATIONS = [25, 100, 250, 500] -STATREPEATS = 10 -VFUNC = "5D3" +SOLVER = ["hyperopt", "optunity", "randomsearch", "optuna", "quasirandomsearch"] +ITERATIONS = [50, 100, 250, 500] +STATREPEATS = 50 OVERWRITE = False -OUTPUTDIR = os.path.join(OUTPUTDIR, VFUNC) -if not os.path.isdir(OUTPUTDIR): - os.makedirs(OUTPUTDIR) def compute_deviation(solver_name, vfunc_id, iterations, N, fname): project = HyppopyProject() - project.add_hyperparameter(name="axis_00", domain="uniform", data=[0, 1], dtype="float") - project.add_hyperparameter(name="axis_01", domain="uniform", data=[0, 1], dtype="float") - project.add_hyperparameter(name="axis_02", domain="uniform", data=[0, 1], dtype="float") - project.add_hyperparameter(name="axis_03", domain="uniform", data=[0, 1], dtype="float") - project.add_hyperparameter(name="axis_04", domain="uniform", data=[0, 1], dtype="float") + project.add_hyperparameter(name="axis_00", domain="uniform", data=[0, 1], type=float) + project.add_hyperparameter(name="axis_01", domain="uniform", data=[0, 1], type=float) + project.add_hyperparameter(name="axis_02", domain="uniform", data=[0, 1], type=float) + project.add_hyperparameter(name="axis_03", domain="uniform", data=[0, 1], type=float) + project.add_hyperparameter(name="axis_04", domain="uniform", data=[0, 1], type=float) vfunc = VirtualFunction() vfunc.load_default(vfunc_id) minima = vfunc.minima() def my_loss_function(data, params): return vfunc(**params) blackbox = BlackboxFunction(data=[], blackbox_func=my_loss_function) + results = {} results["gt"] = [] for mini in minima: results["gt"].append(np.median(mini[0])) for iter in iterations: - results[iter] = {"minima": {}, "loss": None} + results[iter] = {"minima": {}, + "distance": {}, + "duration": None, + "set_difference": None, + "loss": None, + "loss_history": {}} for i in range(vfunc.dims()): results[iter]["minima"]["axis_0{}".format(i)] = [] + results[iter]["distance"]["axis_0{}".format(i)] = [] - project.add_settings(section="solver", name="max_iterations", value=iter) - project.add_settings(section="custom", name="use_solver", value=solver_name) + project.add_setting("max_iterations", iter) + project.add_setting("solver", solver_name) solver = SolverPool.get(project=project) solver.blackbox = blackbox axis_minima = [] best_losses = [] + best_sets_diff = [] for i in range(vfunc.dims()): axis_minima.append([]) + + loss_history = [] + durations = [] for n in range(N): print("\rSolver={} iteration={} round={}".format(solver, iter, n), end="") + start = time.time() solver.run(print_stats=False) + end = time.time() + 
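# record the wall-clock duration of this solver run for the statistics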
durations.append(end-start) + df, best = solver.get_results() + + loss_history.append(np.flip(np.sort(df['losses'].values))) best_row = df['losses'].idxmin() best_losses.append(df['losses'][best_row]) + best_sets_diff.append(abs(df['axis_00'][best_row] - best['axis_00'])+ + abs(df['axis_01'][best_row] - best['axis_01'])+ + abs(df['axis_02'][best_row] - best['axis_02'])+ + abs(df['axis_03'][best_row] - best['axis_03'])+ + abs(df['axis_04'][best_row] - best['axis_04'])) for i in range(vfunc.dims()): tmp = df['axis_0{}'.format(i)][best_row] axis_minima[i].append(tmp) + + results[iter]["loss_history"] = loss_history for i in range(vfunc.dims()): results[iter]["minima"]["axis_0{}".format(i)] = [np.mean(axis_minima[i]), np.std(axis_minima[i])] + dist = np.sqrt((axis_minima[i]-results["gt"][i])**2) + results[iter]["distance"]["axis_0{}".format(i)] = [np.mean(dist), np.std(dist)] results[iter]["loss"] = [np.mean(best_losses), np.std(best_losses)] + results[iter]["set_difference"] = sum(best_sets_diff) + results[iter]["duration"] = np.mean(durations) file = open(fname, 'wb') pickle.dump(results, file) file.close() def make_radarplot(results, title, fname=None): gt = results.pop("gt") categories = list(results[list(results.keys())[0]]["minima"].keys()) N = len(categories) angles = [n / float(N) * 2 * pi for n in range(N)] angles += angles[:1] ax = plt.subplot(1, 1, 1, polar=True, ) ax.set_theta_offset(pi / 2) ax.set_theta_direction(-1) plt.xticks(angles[:-1], categories, color='grey', size=8) ax.set_rlabel_position(0) plt.yticks([0.2, 0.4, 0.6, 0.8, 1.0], ["0.2", "0.4", "0.6", "0.8", "1.0"], color="grey", size=7) plt.ylim(0, 1) gt += gt[:1] ax.fill(angles, gt, color=(0.2, 0.8, 0.2), alpha=0.2) colors = [] cm = plt.get_cmap('Set1') if len(results) > 2: indices = list(range(0, len(results) + 1)) indices.pop(2) else: indices = list(range(0, len(results))) for i in range(len(results)): colors.append(cm(indices[i])) for iter, data in results.items(): values = [] for i in range(len(categories)): values.append(data["minima"]["axis_0{}".format(i)][0]) values += values[:1] color = colors.pop(0) ax.plot(angles, values, color=color, linewidth=2, linestyle='solid', label="iterations {}".format(iter)) plt.title(title, size=11, color=(0.1, 0.1, 0.1), y=1.1) plt.legend(bbox_to_anchor=(0.08, 1.12)) if fname is None: plt.show() else: plt.savefig(fname + ".png") - plt.savefig(fname + ".svg") + #plt.savefig(fname + ".svg") plt.clf() -def make_deviationerrorplot(fnames): - results = {} - for fname in fnames: +def make_errrorbars_plot(results, fname=None): + n_groups = len(results) + + for iter in ITERATIONS: + means = [] + stds = [] + names = [] + colors = [] + axis = [] + fig = plt.figure(figsize=(10, 8)) + for solver_name, numbers in results.items(): + names.append(solver_name) + means.append([]) + stds.append([]) + + for axis_name, data in numbers[iter]["distance"].items(): + means[-1].append(data[0]) + stds[-1].append(data[1]) + if len(axis) < 5: + axis.append(axis_name) + + for c in range(len(names)): + colors.append(plt.cm.Set2(c/len(names))) + + index = np.arange(len(axis)) + bar_width = 0.14 + opacity = 0.8 + error_config = {'ecolor': '0.3'} + + for k, name in enumerate(names): + plt.bar(index + k*bar_width, means[k], bar_width, + alpha=opacity, + color=colors[k], + yerr=stds[k], + error_kw=error_config, + label=name) + plt.xlabel('Axis') + plt.ylabel('Mean [+/- std]') + plt.title('Deviation per Axis and Solver for {} Iterations'.format(iter)) + plt.xticks(index + 2*bar_width, axis) + plt.legend() + + if fname 
is None: + plt.show() + else: + plt.savefig(fname + "_{}.png".format(iter)) + #plt.savefig(fname + "_{}.svg".format(iter)) + plt.clf() + + +def plot_loss_histories(results, fname=None): + colors = [] + for c in range(len(SOLVER)): + colors.append(plt.cm.Set2(c / len(SOLVER))) + + for iter in ITERATIONS: + fig = plt.figure(figsize=(10, 8)) + added_solver = [] + for n, solver_name in enumerate(results.keys()): + for history in results[solver_name][iter]["loss_history"]: + if solver_name not in added_solver: + plt.plot(history, color=colors[n], label=solver_name, alpha=0.5) + added_solver.append(solver_name) + else: + plt.plot(history, color=colors[n], alpha=0.5) + plt.legend() + plt.ylabel('Loss') + plt.xlabel('Iteration') + + if fname is None: + plt.show() + else: + plt.savefig(fname + "_{}.png".format(iter)) + plt.clf() + + +def print_durations(results, fname=None): + colors = [] + for c in range(len(SOLVER)): + colors.append(plt.cm.Set2(c / len(SOLVER))) + + f = open(fname, "w") + lines = ["\t".join(SOLVER)+"\n"] + + for iter in ITERATIONS: + txt = str(iter) + "\t" + for solver_name in SOLVER: + duration = results[solver_name][iter]["duration"] + txt += str(duration) + "\t" + txt += "\n" + lines.append(txt) + + f.writelines(lines) + f.close() + + +if __name__ == "__main__": + vfunc_ID = "5D" + if len(sys.argv) == 2: + vfunc_ID = sys.argv[1] + print("Start Evaluation on {}".format(vfunc_ID)) + + OUTPUTDIR = os.path.join(OUTPUTDIR, vfunc_ID) + if not os.path.isdir(OUTPUTDIR): + os.makedirs(OUTPUTDIR) + + ################################################## + ############### create datasets ################## + fnames = [] + for solver_name in SOLVER: + fname = os.path.join(OUTPUTDIR, solver_name) + fnames.append(fname) + if OVERWRITE or not os.path.isfile(fname): + compute_deviation(solver_name, vfunc_ID, ITERATIONS, N=STATREPEATS, fname=fname) + ################################################## + ################################################## + + ################################################## + ############## create radarplots ################# + all_results = {} + for solver_name, fname in zip(SOLVER, fnames): file = open(fname, 'rb') - result = pickle.load(file) + results = pickle.load(file) file.close() - results[os.path.basename(fname)] = result - pprint(results) - - plt.figure() - for iter in results["hyperopt"].keys(): - y = [] - if iter == "gt": - x = list(range(len(results["hyperopt"][iter]))) - for i in range(len(results["hyperopt"][iter])): - y.append(results["hyperopt"][iter][i]) - plt.plot(x, y, "--g", label="groundtruth: {}".format(iter)) - continue - - x = list(range(len(results["hyperopt"][iter]["minima"]))) - for i in range(len(results["hyperopt"][iter]["minima"])): - y.append(results["hyperopt"][iter]["minima"]["axis_0{}".format(i)][0]) - plt.plot(x, y, label="iterations: {}".format(iter)) - plt.title("") - plt.legend() - plt.show() - - - -################################################## -############### create datasets ################## -fnames = [] -for solver_name in SOLVER: - fname = os.path.join(OUTPUTDIR, solver_name) - fnames.append(fname) - if OVERWRITE or not os.path.isfile(fname): - compute_deviation(solver_name, VFUNC, ITERATIONS, N=STATREPEATS, fname=fname) -################################################## -################################################## - -################################################## -############## create radarplots ################# -for solver_name, fname in zip(SOLVER, fnames): - file = open(fname, 'rb') - results = 
pickle.load(file) - file.close() - make_radarplot(results, solver_name, fname + "_deviation") -################################################## -################################################## + make_radarplot(results, solver_name, fname + "_deviation") + all_results[solver_name] = results + + fname = os.path.join(OUTPUTDIR, "errorbars") + make_errrorbars_plot(all_results, fname) + + fname = os.path.join(OUTPUTDIR, "losshistory") + plot_loss_histories(all_results, fname) + + fname = os.path.join(OUTPUTDIR, "durations.txt") + print_durations(all_results, fname) + + for solver_name, iterations in all_results.items(): + for iter, numbers in iterations.items(): + if numbers["set_difference"] != 0: + print("solver {} has a parameter set mismatch in iteration {}".format(solver_name, iter)) + + ################################################## + ################################################## diff --git a/examples/tutorial_custom_visualization.py b/examples/tutorial_custom_visualization.py index 9633cf9..1b624ab 100644 --- a/examples/tutorial_custom_visualization.py +++ b/examples/tutorial_custom_visualization.py @@ -1,105 +1,105 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. # # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE import matplotlib.pylab as plt from hyppopy.SolverPool import SolverPool from hyppopy.HyppopyProject import HyppopyProject from hyppopy.VirtualFunction import VirtualFunction from hyppopy.BlackboxFunction import BlackboxFunction project = HyppopyProject() -project.add_hyperparameter(name="axis_00", domain="uniform", data=[0, 1], dtype="float") -project.add_hyperparameter(name="axis_01", domain="uniform", data=[0, 1], dtype="float") -project.add_hyperparameter(name="axis_02", domain="uniform", data=[0, 1], dtype="float") -project.add_hyperparameter(name="axis_03", domain="uniform", data=[0, 1], dtype="float") -project.add_hyperparameter(name="axis_04", domain="uniform", data=[0, 1], dtype="float") -project.add_settings(section="solver", name="max_iterations", value=500) -project.add_settings(section="custom", name="use_solver", value="randomsearch") +project.add_hyperparameter(name="axis_00", domain="uniform", data=[0, 1], type=float) +project.add_hyperparameter(name="axis_01", domain="uniform", data=[0, 1], type=float) +project.add_hyperparameter(name="axis_02", domain="uniform", data=[0, 1], type=float) +project.add_hyperparameter(name="axis_03", domain="uniform", data=[0, 1], type=float) +project.add_hyperparameter(name="axis_04", domain="uniform", data=[0, 1], type=float) +project.add_setting("max_iterations", 500) +project.add_setting("solver", "randomsearch") plt.ion() fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(12, 8), sharey=True) plot_data = {"iterations": [], "loss": [], "axis_00": [], "axis_01": [], "axis_02": [], "axis_03": [], "axis_04": []} def my_visualization_function(**kwargs): print("\r{}".format(kwargs), end="") plot_data["iterations"].append(kwargs['iterations']) plot_data["loss"].append(kwargs['loss']) plot_data["axis_00"].append(kwargs['axis_00']) plot_data["axis_01"].append(kwargs['axis_01']) plot_data["axis_02"].append(kwargs['axis_02']) plot_data["axis_03"].append(kwargs['axis_03']) plot_data["axis_04"].append(kwargs['axis_04']) axes[0, 0].clear() axes[0, 0].scatter(plot_data["axis_00"], plot_data["loss"], c=plot_data["loss"], cmap="jet",
marker='.') axes[0, 0].set_ylabel("loss") axes[0, 0].set_xlabel("axis_00") axes[0, 1].clear() axes[0, 1].scatter(plot_data["axis_01"], plot_data["loss"], c=plot_data["loss"], cmap="jet", marker='.') axes[0, 1].set_xlabel("axis_01") axes[0, 2].clear() axes[0, 2].scatter(plot_data["axis_02"], plot_data["loss"], c=plot_data["loss"], cmap="jet", marker='.') axes[0, 2].set_xlabel("axis_02") axes[1, 0].clear() axes[1, 0].scatter(plot_data["axis_03"], plot_data["loss"], c=plot_data["loss"], cmap="jet", marker='.') axes[1, 0].set_ylabel("loss") axes[1, 0].set_xlabel("axis_03") axes[1, 1].clear() axes[1, 1].scatter(plot_data["axis_04"], plot_data["loss"], c=plot_data["loss"], cmap="jet", marker='.') axes[1, 1].set_xlabel("axis_04") axes[1, 2].clear() axes[1, 2].plot(plot_data["iterations"], plot_data["loss"], "--", c=(0.8, 0.8, 0.8, 0.5)) axes[1, 2].scatter(plot_data["iterations"], plot_data["loss"], marker='.', c=(0.2, 0.2, 0.2)) axes[1, 2].set_xlabel("iterations") plt.draw() plt.tight_layout() plt.pause(0.001) def my_loss_function(data, params): vfunc = VirtualFunction() vfunc.load_default("5D") return vfunc(**params) blackbox = BlackboxFunction(data=[], blackbox_func=my_loss_function, callback_func=my_visualization_function) solver = SolverPool.get(project=project) solver.blackbox = blackbox solver.run() df, best = solver.get_results() print("\n") print("*" * 100) print("Best Parameter Set:\n{}".format(best)) print("*" * 100) print("") save_plot = input("Save Plot? [y/n] ") if save_plot == "y": plt.savefig('plot_{}.png'.format(project.solver)) diff --git a/examples/tutorial_gridsearch.py b/examples/tutorial_gridsearch.py index 56093cb..6f45995 100644 --- a/examples/tutorial_gridsearch.py +++ b/examples/tutorial_gridsearch.py @@ -1,129 +1,128 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. # # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE # In this tutorial we solve an optimization problem using the GridsearchSolver. # Gridsearch is very inefficient; a randomsearch might most of the time be the # better choice. # import the HyppopyProject class keeping track of inputs from hyppopy.HyppopyProject import HyppopyProject # import the GridsearchSolver classes from hyppopy.solvers.GridsearchSolver import GridsearchSolver # import the Blackboxfunction class wrapping your problem for Hyppopy from hyppopy.BlackboxFunction import BlackboxFunction # To configure the GridsearchSolver we only need the hyperparameter section. Another # difference to the other solvers is that we need to define a grid sampling in addition # to the range via the 'frequency' field, e.g. 'frequency': 20, which means sampling # the range in 20 steps. Gridsearch also supports categorical, uniform, normal and # loguniform sampling. config = { "hyperparameter": { "C": { "domain": "uniform", - "data": [0.0001, 20, 20], - "type": "float" + "data": [0.0001, 20], + "type": float, + "frequency": 20 }, "gamma": { "domain": "uniform", - "data": [0.0001, 20.0, 20], - "type": "float" + "data": [0.0001, 20.0], + "type": float, + "frequency": 20 }, "kernel": { "domain": "categorical", "data": ["linear", "sigmoid", "poly", "rbf"], - "type": "str" + "type": str, + "frequency": 1 } -}, -"settings": { - "solver": {}, - "custom": {} }} # When creating a HyppopyProject instance we # pass the config dictionary to the constructor. 
project = HyppopyProject(config=config) # Hyppopy offers a class called BlackboxFunction to wrap your problem for Hyppopy. # The function signature is as follows: # BlackboxFunction(blackbox_func=None, # dataloader_func=None, # preprocess_func=None, # callback_func=None, # data=None, # **kwargs) # # This means we can set a couple of function pointers, a data object and an arbitrary number of custom parameters via kwargs. # # - blackbox_func: a function pointer to the actual, user defined, blackbox function that is computing our loss # - dataloader_func: a function pointer to a function handling the dataloading # - preprocess_func: a function pointer to a function automatically executed before starting the optimization process # - callback_func: a function pointer to a function that is called after each iteration with the trial object as input # - data: setting data can be done via dataloader_func or directly # - kwargs are passed to all functions above and thus can be used for parameter sharing between the functions # # (for more details see the documentation) # # Below we demonstrate the usage of the above by defining a my_dataloader_function which in fact only grabs the # iris dataset from sklearn and returns it. The my_callback_function gets as input the dictionary containing the # state of the iteration and thus can be used to access the current state of each solver iteration. Finally we # define the actual loss function my_loss_function, which gets as input a data object and params. Both parameters # are fixed, the first is defined by what the dataloader returns or the data object set in the constructor, the # second is a dictionary with a sample of your hyperparameter space whose content is chosen by the solver. from sklearn.svm import SVC from sklearn.datasets import load_iris from sklearn.model_selection import cross_val_score def my_dataloader_function(**kwargs): print("Dataloading...") iris_data = load_iris() return [iris_data.data, iris_data.target] def my_callback_function(**kwargs): print("\r{}".format(kwargs), end="") def my_loss_function(data, params): clf = SVC(**params) return -cross_val_score(estimator=clf, X=data[0], y=data[1], cv=3).mean() # We now create the BlackboxFunction object and pass the function pointers defined above. blackbox = BlackboxFunction(blackbox_func=my_loss_function, dataloader_func=my_dataloader_function, callback_func=my_callback_function) # create a solver instance solver = GridsearchSolver(project) # pass the loss function to the solver solver.blackbox = blackbox # run the solver solver.run() # get the result via get_results() which returns a pandas dataframe # containing the complete history and a dict best containing the # best parameter set. df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) diff --git a/examples/tutorial_hyppopyprojectclass.py b/examples/tutorial_hyppopyprojectclass.py index 704674f..ec98ad7 100644 --- a/examples/tutorial_hyppopyprojectclass.py +++ b/examples/tutorial_hyppopyprojectclass.py @@ -1,69 +1,64 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. 
# # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE # In this tutorial we demonstrate the HyppopyProject class usage # import the HyppopyProject class from hyppopy.HyppopyProject import HyppopyProject # To configure a solver we need to instantiate a HyppopyProject class. # This class can be configured using a nested dict. This dict has one # obligatory section, hyperparameter. A hyperparameter is described # using a dict containing a domain, data and type field and thus the # hyperparameter section is a collection of hyperparameter dicts. # Solver settings are set as additional top-level entries. These might # depend on the solver used and need to be checked for each. E.g. # Randomsearch, Hyperopt and Optunity need a solver setting # max_iterations, the GridsearchSolver doesn't. config = { "hyperparameter": { "C": { "domain": "uniform", "data": [0.0001, 20], - "type": "float" + "type": float }, "gamma": { "domain": "uniform", "data": [0.0001, 20.0], - "type": "float" + "type": float }, "kernel": { "domain": "categorical", "data": ["linear", "sigmoid", "poly", "rbf"], - "type": "str" - } -}, -"settings": { - "solver": { - "max_iterations": 500 - }, - "custom": {} -}} + "type": str + }}, + "max_iterations": 500 +} # When creating a HyppopyProject instance we # pass the config dictionary to the constructor. project = HyppopyProject(config=config) # When building the project programmatically we can also use the methods # add_hyperparameter, add_setting and set_settings -project.clear() +project = HyppopyProject() project.add_hyperparameter(name="C", domain="uniform", data=[0.0001, 20], type=float) project.add_hyperparameter(name="kernel", domain="categorical", data=["linear", "sigmoid"], type=str) -project.add_settings(section="solver", name="max_iterations", value=500) +project.set_settings(max_iterations=500) # Custom settings can be added freely -project.add_settings(section="custom", name="my_var", value=10) +project.add_setting("my_var", 10) # Settings are automatically transformed to member variables of the project class -if project.solver_max_iterations < 1000 and project.custom_my_var == 10: +if project.max_iterations < 1000 and project.my_var == 10: print("Project configured!") diff --git a/examples/tutorial_multisolver.py b/examples/tutorial_multisolver.py index 9c6c161..5f9acc4 100644 --- a/examples/tutorial_multisolver.py +++ b/examples/tutorial_multisolver.py @@ -1,188 +1,183 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. # # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE # In this tutorial we solve an optimization problem using the Hyperopt Solver (http://hyperopt.github.io/hyperopt/). # Hyperopt uses a Bayesian - Tree-Parzen Estimator - optimization approach, which means that each iteration computes a # new function value of the blackbox, interpolates a guess for the whole energy function and predicts a point to # compute the next function value at. This next point is not necessarily a "better" value, it's only the value with # the highest uncertainty for the function interpolation. # # See a visual explanation e.g. 
here (http://philipperemy.github.io/visualization/) # import the HyppopyProject class keeping track of inputs from hyppopy.HyppopyProject import HyppopyProject # import the SolverPool singleton class from hyppopy.SolverPool import SolverPool # import the Blackboxfunction class wrapping your problem for Hyppopy from hyppopy.BlackboxFunction import BlackboxFunction # Next step is defining the problem space and all settings Hyppopy needs to optimize your problem. # The config is a simple nested dictionary with one obligatory main section, hyperparameter. # The hyperparameter section defines your searchspace. Each hyperparameter is again a dictionary with: # # - a domain ['categorical', 'uniform', 'normal', 'loguniform'] # - the domain data [left bound, right bound] and # - a type of your domain [str, int, float] # # Solver settings like 'max_iterations' - the maximum number of iterations - are set as additional top-level # entries in the config. Any further entry is transformed to a member variable of the HyppopyProject class. # These can be useful when implementing new solver classes or for controlling your hyppopy script. # Here the 'solver' entry acts as a solver switch to control the usage of our solver via the config. This means # with the script below you can try out every solver by changing 'solver' to 'optunity', 'randomsearch', # 'gridsearch',... It can be accessed like so: project.solver (see below). If using the gridsearch solver, # max_iterations is ignored; instead each hyperparameter must specify a number of samples via the 'frequency' field. config = { "hyperparameter": { "C": { "domain": "uniform", "data": [0.0001, 20], - "type": "float" + "type": float }, "gamma": { "domain": "uniform", "data": [0.0001, 20.0], - "type": "float" + "type": float }, "kernel": { "domain": "categorical", "data": ["linear", "sigmoid", "poly", "rbf"], - "type": "str" + "type": str }, "decision_function_shape": { "domain": "categorical", "data": ["ovo", "ovr"], - "type": "str" + "type": str } }, -"settings": { - "solver": { - "max_iterations": 100 - }, - "custom": { - "use_solver": "quasirandomsearch" - } -}} +"max_iterations": 100, +"solver": "quasirandomsearch" +} # When creating a HyppopyProject instance we # pass the config dictionary to the constructor. project = HyppopyProject(config=config) # demonstration of the settings access print("-"*30) -print("max_iterations:\t{}".format(project.solver_max_iterations)) -print("solver chosen -> {}".format(project.custom_use_solver)) +print("max_iterations:\t{}".format(project.max_iterations)) +print("solver chosen -> {}".format(project.solver)) print("-"*30) # The BlackboxFunction signature is as follows: # BlackboxFunction(blackbox_func=None, # dataloader_func=None, # preprocess_func=None, # callback_func=None, # data=None, # **kwargs) # # - blackbox_func: a function pointer to the user's loss function # - dataloader_func: a function pointer for handling dataloading. The function is called once before # optimizing. What it returns is passed as first argument to your loss function's # data argument. # - preprocess_func: a function pointer for data preprocessing. The function is called once before # optimizing and gets via kwargs['data'] the raw data object set directly or returned # from dataloader_func. 
What this function returns is then what is passed as first # argument to your loss function. # - callback_func: a function pointer called after each iteration. The input kwargs is a dictionary # keeping the parameters used in this iteration, the 'iteration' index, the 'loss' # and the 'status'. The function in this example is used for realtime printing of its # input but can also be used for realtime visualization. # - data: if not done via dataloader_func one can set a raw_data object directly # - kwargs: dict whose content is passed to all functions above. from sklearn.svm import SVC from sklearn.datasets import load_iris from sklearn.model_selection import cross_val_score def my_dataloader_function(**kwargs): print("Dataloading...") # kwargs['params'] allows accessing additional parameters passed, see below my_preproc_param, my_dataloader_input. print("my loading argument: {}".format(kwargs['params']['my_dataloader_input'])) iris_data = load_iris() return [iris_data.data, iris_data.target] def my_preprocess_function(**kwargs): print("Preprocessing...") # kwargs['data'] allows accessing the input data print("data:", kwargs['data'][0].shape, kwargs['data'][1].shape) # kwargs['params'] allows accessing additional parameters passed, see below my_preproc_param, my_dataloader_input. print("kwargs['params']['my_preproc_param']={}".format(kwargs['params']['my_preproc_param']), "\n") # if the preprocessing function returns something, # the input data will be replaced with the data returned by this function. x = kwargs['data'][0] y = kwargs['data'][1] for i in range(x.shape[0]): x[i, :] += kwargs['params']['my_preproc_param'] return [x, y] def my_callback_function(**kwargs): print("\r{}".format(kwargs), end="") def my_loss_function(data, params): clf = SVC(**params) return -cross_val_score(estimator=clf, X=data[0], y=data[1], cv=3).mean() # We now create the BlackboxFunction object and pass all function pointers defined above, # as well as 2 dummy parameters (my_preproc_param, my_dataloader_input) for demonstration purposes. blackbox = BlackboxFunction(blackbox_func=my_loss_function, dataloader_func=my_dataloader_function, preprocess_func=my_preprocess_function, callback_func=my_callback_function, my_preproc_param=1, my_dataloader_input='could/be/a/path') # Last step: we use our SolverPool which automatically returns the correct solver. # There are multiple ways to get the desired solver from the solver pool. # 1. solver = SolverPool.get('hyperopt') # solver.project = project # 2. solver = SolverPool.get('hyperopt', project) # 3. The SolverPool will look for the field 'solver' in the project instance, if # it is present it will be used to specify the solver so that in this case it is enough # to pass the project instance. solver = SolverPool.get(project=project) # Give the solver your blackbox and run it. After execution we can get the result # via get_results() which returns a pandas dataframe containing the complete history. # The dict best contains the best parameter set. solver.blackbox = blackbox #solver.start_viewer() solver.run() df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) diff --git a/examples/tutorial_simple.py b/examples/tutorial_simple.py index b4198d2..199bc9e 100644 --- a/examples/tutorial_simple.py +++ b/examples/tutorial_simple.py @@ -1,84 +1,80 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. 
# # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. # # See LICENSE # A hyppopy minimal example optimizing a simple demo function f(x,y) = x**2+y**2 # import the HyppopyProject class keeping track of inputs from hyppopy.HyppopyProject import HyppopyProject # import the HyperoptSolver class from hyppopy.solvers.HyperoptSolver import HyperoptSolver # To configure the Hyppopy solver we use a simple nested dictionary with one obligatory main section, # hyperparameter, which defines your searchspace. Each hyperparameter # is again a dictionary with: # # - a domain ['categorical', 'uniform', 'normal', 'loguniform'] # - the domain data [left bound, right bound] and # - a type of your domain [str, int, float] # # Solver settings such as 'max_iterations' - the maximum number of iterations - are set as additional top-level # entries in the config. Any further entry is transformed to a member variable of the HyppopyProject class and can # be used to control your hyppopy script. If using the gridsearch solver, max_iterations is ignored; instead each # hyperparameter must specify a number of samples via the 'frequency' field. config = { "hyperparameter": { "x": { "domain": "normal", "data": [-10.0, 10.0], - "type": "float" + "type": float }, "y": { "domain": "uniform", "data": [-10.0, 10.0], - "type": "float" + "type": float } }, -"settings": { - "solver": { - "max_iterations": 500 - }, - "custom": {} -}} +"max_iterations": 500 +} # When creating a HyppopyProject instance we # pass the config dictionary to the constructor. project = HyppopyProject(config=config) # The user defined loss function def my_loss_function(x, y): return x**2+y**2 # create a solver instance solver = HyperoptSolver(project) # pass the loss function to the solver solver.blackbox = my_loss_function # run the solver solver.run() df, best = solver.get_results() print("\n") print("*"*100) print("Best Parameter Set:\n{}".format(best)) print("*"*100) diff --git a/hyppopy/SolverPool.py b/hyppopy/SolverPool.py index 36a56bc..1e4fd6b 100644 --- a/hyppopy/SolverPool.py +++ b/hyppopy/SolverPool.py @@ -1,85 +1,79 @@ # DKFZ # # # Copyright (c) German Cancer Research Center, # Division of Medical Image Computing. # All rights reserved. # # This software is distributed WITHOUT ANY WARRANTY; without # even the implied warranty of MERCHANTABILITY or FITNESS FOR # A PARTICULAR PURPOSE. 
# # See LICENSE from .Singleton import * import os import logging from hyppopy.HyppopyProject import HyppopyProject from hyppopy.solvers.OptunaSolver import OptunaSolver -from hyppopy.solvers.BayesOptSolver import BayesOptSolver from hyppopy.solvers.HyperoptSolver import HyperoptSolver from hyppopy.solvers.OptunitySolver import OptunitySolver from hyppopy.solvers.GridsearchSolver import GridsearchSolver from hyppopy.solvers.RandomsearchSolver import RandomsearchSolver from hyppopy.solvers.QuasiRandomsearchSolver import QuasiRandomsearchSolver from hyppopy.globals import DEBUGLEVEL LOG = logging.getLogger(os.path.basename(__file__)) LOG.setLevel(DEBUGLEVEL) @singleton_object class SolverPool(metaclass=Singleton): def __init__(self): self._solver_list = ["hyperopt", "optunity", - "bayesopt", "optuna", "randomsearch", "quasirandomsearch", "gridsearch"] def get_solver_names(self): return self._solver_list def get(self, solver_name=None, project=None): if solver_name is not None: assert isinstance(solver_name, str), "precondition violation, solver_name type str expected, got {} instead!".format(type(solver_name)) if project is not None: assert isinstance(project, HyppopyProject), "precondition violation, project type HyppopyProject expected, got {} instead!".format(type(project)) - if "custom_use_solver" in project.__dict__: - solver_name = project.custom_use_solver + if "solver" in project.__dict__: + solver_name = project.solver if solver_name not in self._solver_list: raise AssertionError("Solver named [{}] not implemented!".format(solver_name)) if solver_name == "hyperopt": if project is not None: return HyperoptSolver(project) return HyperoptSolver() elif solver_name == "optunity": if project is not None: return OptunitySolver(project) return OptunitySolver() - elif solver_name == "bayesopt": - if project is not None: - return BayesOptSolver(project) - return BayesOptSolver() elif solver_name == "optuna": if project is not None: return OptunaSolver(project) return OptunaSolver() elif solver_name == "gridsearch": if project is not None: return GridsearchSolver(project) return GridsearchSolver() elif solver_name == "randomsearch": if project is not None: return RandomsearchSolver(project) return RandomsearchSolver() elif solver_name == "quasirandomsearch": if project is not None: return QuasiRandomsearchSolver(project) return QuasiRandomsearchSolver() diff --git a/hyppopy/solvers/BayesOptSolver.py b/hyppopy/solvers/BayesOptSolver.py deleted file mode 100644 index 4a0421f..0000000 --- a/hyppopy/solvers/BayesOptSolver.py +++ /dev/null @@ -1,89 +0,0 @@ -# DKFZ -# -# -# Copyright (c) German Cancer Research Center, -# Division of Medical Image Computing. -# All rights reserved. -# -# This software is distributed WITHOUT ANY WARRANTY; without -# even the implied warranty of MERCHANTABILITY or FITNESS FOR -# A PARTICULAR PURPOSE. 
diff --git a/hyppopy/solvers/BayesOptSolver.py b/hyppopy/solvers/BayesOptSolver.py
deleted file mode 100644
index 4a0421f..0000000
--- a/hyppopy/solvers/BayesOptSolver.py
+++ /dev/null
@@ -1,89 +0,0 @@
-# DKFZ
-#
-#
-# Copyright (c) German Cancer Research Center,
-# Division of Medical Image Computing.
-# All rights reserved.
-#
-# This software is distributed WITHOUT ANY WARRANTY; without
-# even the implied warranty of MERCHANTABILITY or FITNESS FOR
-# A PARTICULAR PURPOSE.
-#
-# See LICENSE
-
-import os
-import logging
-import warnings
-import numpy as np
-from pprint import pformat
-from hyperopt import Trials
-from bayes_opt import BayesianOptimization
-
-from hyppopy.globals import DEBUGLEVEL
-from hyppopy.solvers.HyppopySolver import HyppopySolver
-
-LOG = logging.getLogger(os.path.basename(__file__))
-LOG.setLevel(DEBUGLEVEL)
-
-
-class BayesOptSolver(HyppopySolver):
-
-    def __init__(self, project=None):
-        HyppopySolver.__init__(self, project)
-        self._searchspace = None
-        self._idx = None
-
-    def define_interface(self):
-        self.add_member("max_iterations", int)
-        self.add_hyperparameter_signature(name="domain", dtype=str,
-                                          options=["uniform", "categorical"])
-        self.add_hyperparameter_signature(name="data", dtype=list)
-        self.add_hyperparameter_signature(name="type", dtype=type)
-
-    def reformat_parameter(self, params):
-        out_params = {}
-        for name, value in params.items():
-            if self._searchspace[name]["domain"] == "categorical":
-                out_params[name] = self._searchspace[name]["data"][int(np.round(value))]
-            else:
-                if self._searchspace[name]["type"] is int:
-                    out_params[name] = int(np.round(value))
-                else:
-                    out_params[name] = value
-        return out_params
-
-    def loss_function_call(self, params):
-        params = self.reformat_parameter(params)
-        for key in params.keys():
-            if self.project.get_typeof(key) is int:
-                params[key] = int(round(params[key]))
-        return self.blackbox(**params)
-
-    def execute_solver(self, searchspace):
-        LOG.debug("execute_solver using solution space:\n\n\t{}\n".format(pformat(searchspace)))
-        self.trials = Trials()
-        self._idx = 0
-
-        try:
-            optimizer = BayesianOptimization(f=self.loss_function, pbounds=searchspace, verbose=0)
-            optimizer.maximize(init_points=2, n_iter=self.max_iterations)
-            self.best = self.reformat_parameter(optimizer.max["params"])
-        except Exception as e:
-            LOG.error("internal error in bayes_opt maximize occured. {}".format(e))
-            raise BrokenPipeError("internal error in bayes_opt maximize occured. {}".format(e))
-
-    def convert_searchspace(self, hyperparameter):
-        LOG.debug("convert input parameter\n\n\t{}\n".format(pformat(hyperparameter)))
-        self._searchspace = hyperparameter
-        pbounds = {}
-        for name, param in hyperparameter.items():
-            if param["domain"] != "categorical":
-                if param["domain"] != "uniform":
-                    msg = "Warning: BayesOpt cannot handle {} domain. Only uniform and categorical domains are supported!".format(param["domain"])
-                    warnings.warn(msg)
-                    LOG.warning(msg)
-                pbounds[name] = (param["data"][0], param["data"][1])
-            else:
-                pbounds[name] = (0, len(param["data"])-1)
-        return pbounds
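The deleted file also illustrates the interface-consistency mechanism: every solver declares its expected settings and hyperparameter signature in define_interface, and mismatches with the project raise at run time. A hypothetical minimal solver following that pattern; the class name and sampling body are illustrative only, while the method names come from the code above:

```python
# Hypothetical skeleton of a custom solver, mirroring the
# define_interface pattern of the removed BayesOptSolver. The uniform
# sampling body is a placeholder, not a recommended strategy.
import random
from hyppopy.solvers.HyppopySolver import HyppopySolver


class MyUniformSolver(HyppopySolver):

    def define_interface(self):
        # declared members and signatures are checked against the
        # project when the solver runs; mismatches raise an exception
        self.add_member("max_iterations", int)
        self.add_hyperparameter_signature(name="domain", dtype=str, options=["uniform"])
        self.add_hyperparameter_signature(name="data", dtype=list)
        self.add_hyperparameter_signature(name="type", dtype=type)

    def convert_searchspace(self, hyperparameter):
        # no conversion needed for this sketch
        return hyperparameter

    def loss_function_call(self, params):
        return self.blackbox(**params)

    def execute_solver(self, searchspace):
        # loss_function is assumed to accept keyword parameters, as in
        # the bayes_opt usage above; a real solver would also maintain
        # trials bookkeeping
        best_loss = None
        for _ in range(self.max_iterations):
            params = {name: random.uniform(p["data"][0], p["data"][1])
                      for name, p in searchspace.items()}
            loss = self.loss_function(**params)
            if best_loss is None or loss < best_loss:
                best_loss, self.best = loss, params
```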
diff --git a/hyppopy/tests/test_bayesoptsolver.py b/hyppopy/tests/test_bayesoptsolver.py
deleted file mode 100644
index 9d4b46e..0000000
--- a/hyppopy/tests/test_bayesoptsolver.py
+++ /dev/null
@@ -1,66 +0,0 @@
-# DKFZ
-#
-#
-# Copyright (c) German Cancer Research Center,
-# Division of Medical Image Computing.
-# All rights reserved.
-#
-# This software is distributed WITHOUT ANY WARRANTY; without
-# even the implied warranty of MERCHANTABILITY or FITNESS FOR
-# A PARTICULAR PURPOSE.
-#
-# See LICENSE
-
-import unittest
-
-from hyppopy.solvers.BayesOptSolver import *
-from hyppopy.VirtualFunction import VirtualFunction
-from hyppopy.HyppopyProject import HyppopyProject
-
-
-class BayesOptSolverTestSuite(unittest.TestCase):
-
-    def setUp(self):
-        pass
-
-    def test_solver_complete(self):
-        config = {
-            "hyperparameter": {
-                "axis_00": {
-                    "domain": "uniform",
-                    "data": [0, 800],
-                    "type": float
-                },
-                "axis_01": {
-                    "domain": "uniform",
-                    "data": [-1, 1],
-                    "type": float
-                },
-                "axis_02": {
-                    "domain": "uniform",
-                    "data": [0, 10],
-                    "type": float
-                }
-            },
-            "max_iterations": 10,
-        }
-
-        project = HyppopyProject(config)
-        solver = BayesOptSolver(project)
-        vfunc = VirtualFunction()
-        vfunc.load_default()
-        solver.blackbox = vfunc
-        solver.run(print_stats=False)
-        df, best = solver.get_results()
-        self.assertTrue(0 <= best['axis_00'] <= 800)
-        self.assertTrue(-1 <= best['axis_01'] <= 1)
-        self.assertTrue(0 <= best['axis_02'] <= 10)
-
-        for status in df['status']:
-            self.assertTrue(status)
-        for loss in df['losses']:
-            self.assertTrue(isinstance(loss, float))
-
-
-if __name__ == '__main__':
-    unittest.main()
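With the bayesopt backend gone, the same smoke test can be pointed at one of the remaining solvers. A sketch mirroring the deleted test above with OptunaSolver; any registered solver name would do:

```python
# Sketch: the removed bayesopt smoke test, redirected to OptunaSolver.
# Config, virtual test function and bounds mirror the deleted file.
from hyppopy.solvers.OptunaSolver import OptunaSolver
from hyppopy.VirtualFunction import VirtualFunction
from hyppopy.HyppopyProject import HyppopyProject

config = {
    "hyperparameter": {
        "axis_00": {"domain": "uniform", "data": [0, 800], "type": float},
        "axis_01": {"domain": "uniform", "data": [-1, 1], "type": float},
        "axis_02": {"domain": "uniform", "data": [0, 10], "type": float}
    },
    "max_iterations": 10,
}

project = HyppopyProject(config)
solver = OptunaSolver(project)
vfunc = VirtualFunction()
vfunc.load_default()
solver.blackbox = vfunc
solver.run(print_stats=False)
df, best = solver.get_results()
assert 0 <= best["axis_00"] <= 800
```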
diff --git a/hyppopy/tests/test_randomsearchsolver.py b/hyppopy/tests/test_randomsearchsolver.py
index c1db803..10a7117 100644
--- a/hyppopy/tests/test_randomsearchsolver.py
+++ b/hyppopy/tests/test_randomsearchsolver.py
@@ -1,170 +1,165 @@
# DKFZ
#
#
# Copyright (c) German Cancer Research Center,
# Division of Medical Image Computing.
# All rights reserved.
#
# This software is distributed WITHOUT ANY WARRANTY; without
# even the implied warranty of MERCHANTABILITY or FITNESS FOR
# A PARTICULAR PURPOSE.
#
# See LICENSE

import unittest
import matplotlib.pylab as plt

from hyppopy.solvers.RandomsearchSolver import *
from hyppopy.VirtualFunction import VirtualFunction
from hyppopy.HyppopyProject import HyppopyProject


class RandomsearchTestSuite(unittest.TestCase):

    def setUp(self):
        pass

    def test_draw_uniform_sample(self):
-       param = {"data": [0, 1, 10],
-                "type": float}
+       param = {"data": [0, 1, 10], "type": float}
        values = []
        for i in range(10000):
            values.append(draw_uniform_sample(param))
            self.assertTrue(0 <= values[-1] <= 1)
            self.assertTrue(isinstance(values[-1], float))
        hist = plt.hist(values, bins=10, density=True)
        std = np.std(hist[0])
        mean = np.mean(hist[0])
        self.assertTrue(std < 0.05)
        self.assertTrue(0.9 < mean < 1.1)

-       param = {"data": [0, 10, 11],
-                "type": int}
+       param = {"data": [0, 10, 11], "type": int}
        values = []
        for i in range(10000):
            values.append(draw_uniform_sample(param))
            self.assertTrue(0 <= values[-1] <= 10)
            self.assertTrue(isinstance(values[-1], int))
        hist = plt.hist(values, bins=11, density=True)
        std = np.std(hist[0])
        mean = np.mean(hist[0])
        self.assertTrue(std < 0.05)
        self.assertTrue(0.09 < mean < 0.11)

    def test_draw_normal_sample(self):
-       param = {"data": [0, 10, 11],
-                "type": int}
+       param = {"data": [0, 10, 11], "type": int}
        values = []
        for i in range(10000):
            values.append(draw_normal_sample(param))
            self.assertTrue(0 <= values[-1] <= 10)
            self.assertTrue(isinstance(values[-1], int))
        hist = plt.hist(values, bins=11, density=True)
        for i in range(1, 5):
            self.assertTrue(hist[0][i-1] - hist[0][i] < 0)
        for i in range(5, 10):
            self.assertTrue(hist[0][i] - hist[0][i+1] > 0)

    def test_draw_loguniform_sample(self):
-       param = {"data": [1, 1000, 11],
-                "type": float}
+       param = {"data": [1, 1000, 11], "type": float}
        values = []
        for i in range(10000):
            values.append(draw_loguniform_sample(param))
            self.assertTrue(1 <= values[-1] <= 1000)
            self.assertTrue(isinstance(values[-1], float))
        hist = plt.hist(values, bins=11, density=True)
        for i in range(4):
            self.assertTrue(hist[0][i] > hist[0][i+1])

    def test_draw_categorical_sample(self):
-       param = {"data": [1, 2, 3],
-                "type": int}
+       param = {"data": [1, 2, 3], "type": int}
        values = []
        for i in range(10000):
            values.append(draw_categorical_sample(param))
            self.assertTrue(values[-1] == 1 or values[-1] == 2 or values[-1] == 3)
            self.assertTrue(isinstance(values[-1], int))
        hist = plt.hist(values, bins=3, density=True)
        for i in range(3):
            self.assertTrue(0.45 < hist[0][i] < 0.55)

    def test_solver_uniform(self):
        config = {
            "hyperparameter": {
                "axis_00": {
                    "domain": "uniform",
                    "data": [0, 800],
                    "type": float
                },
                "axis_01": {
                    "domain": "uniform",
                    "data": [-1, 1],
                    "type": float
                },
                "axis_02": {
                    "domain": "uniform",
                    "data": [0, 10],
                    "type": float
                }
            },
            "max_iterations": 300
        }

        project = HyppopyProject(config)
        solver = RandomsearchSolver(project)
        vfunc = VirtualFunction()
        vfunc.load_default()
        solver.blackbox = vfunc
        solver.run(print_stats=False)
        df, best = solver.get_results()
        self.assertTrue(0 <= best['axis_00'] <= 800)
        self.assertTrue(-1 <= best['axis_01'] <= 1)
        self.assertTrue(0 <= best['axis_02'] <= 10)

        for status in df['status']:
            self.assertTrue(status)
        for loss in df['losses']:
            self.assertTrue(isinstance(loss, float))

    def test_solver_normal(self):
        config = {
            "hyperparameter": {
                "axis_00": {
                    "domain": "normal",
                    "data": [500, 650],
                    "type": float
                },
                "axis_01": {
                    "domain": "normal",
                    "data": [0, 1],
                    "type": float
                },
                "axis_02": {
                    "domain": "normal",
                    "data": [4, 5],
                    "type": float
                }
            },
            "max_iterations": 500,
        }

        project = HyppopyProject(config)
        solver = RandomsearchSolver(project)
        vfunc = VirtualFunction()
        vfunc.load_default()
        solver.blackbox = vfunc
        solver.run(print_stats=False)
        df, best = solver.get_results()
        self.assertTrue(500 <= best['axis_00'] <= 650)
        self.assertTrue(0 <= best['axis_01'] <= 1)
        self.assertTrue(4 <= best['axis_02'] <= 5)

        for status in df['status']:
            self.assertTrue(status)
        for loss in df['losses']:
            self.assertTrue(isinstance(loss, float))


if __name__ == '__main__':
    unittest.main()
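To verify the adapted random search suite after these changes, standard unittest discovery is enough. A sketch, assuming execution from the repository root:

```python
# Run the adjusted random search tests; assumes the working directory
# is the repository root so that hyppopy/tests is discoverable.
import unittest

suite = unittest.TestLoader().discover("hyppopy/tests", pattern="test_randomsearchsolver.py")
unittest.TextTestRunner(verbosity=2).run(suite)
```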