1D problem example

Objective function

To demonstrate how surrogate modeling can outperform random sampling, we set up a simple 1-dimensional (single-parameter) problem which can be found in this blog. Since HYPPO generally targets machine learning problems, where the loss function is the objective to be minimized, we use the negated version of the function from the blog.

[2]:
import numpy
def problem(X, **kwargs):
    # Negate the blog's objective f(x) = -sin(3x) - x^2 + 0.7x so it becomes a loss to minimize
    return -(-numpy.sin(3*X) - X**2 + 0.7*X)

Let’s consider the X values to lie between -1 and 2.

[3]:
import matplotlib.pyplot as plt
X = numpy.arange(-1, 2, 0.01)
plt.style.use('seaborn-v0_8')
plt.figure(figsize=(6,4),dpi=100)
plt.plot(X, problem(X), 'black')
plt.show()
../_images/analysis_1d_problem_4_0.png
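For reference, the location and value of the true minimum over [-1, 2] can be approximated with a dense grid search (a plain NumPy sketch; the exact optimum is not stated in the original blog):

```python
import numpy

def problem(X, **kwargs):
    return -(-numpy.sin(3*X) - X**2 + 0.7*X)

# Dense grid over the search interval [-1, 2]
X = numpy.arange(-1, 2, 0.0001)
Y = problem(X)
i = Y.argmin()
print('x* ~ %.3f, loss ~ %.3f' % (X[i], Y[i]))  # roughly x* ~ -0.36, loss ~ -0.50
```

This is the target value the surrogate models below should converge to.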

Surrogate modeling

On this page, we compare the performance of surrogate modeling against random sampling. The comparison is done with what we call a "convergence plot", which shows the best loss value found so far after each evaluated parameter set. The code below first evaluates 3 randomly sampled parameter sets and then runs the surrogate modeling for 27 iterations. Note that the search bounds are expressed as integers scaled by the mult factor: xlow = -1000 and xup = 2000 with mult = 0.001 correspond to X values between -1 and 2.
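The "best value found so far" curve is simply the running minimum of the loss sequence, which NumPy computes directly with minimum.accumulate (a sketch on made-up losses, independent of HYPPO):

```python
import numpy

# Hypothetical sequence of losses, one per evaluated parameter set
losses = numpy.array([0.8, 0.3, 0.5, -0.1, 0.2, -0.4])

# Best loss found after each evaluation (monotonically non-increasing)
best_so_far = numpy.minimum.accumulate(losses)
print(best_so_far)  # [ 0.8  0.3  0.3 -0.1 -0.1 -0.4]
```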

Gaussian Process

[ ]:
import hyppo
config = {
    'trainer':problem,
    'prms':{
        'nevals' : 3,
        'names'  : [ 'X'],
        'mult'   : [0.001],
        'xlow'   : [-1000],
        'xup'    : [ 2000],
    },
    'hpo':{
        'loops':27,
        'surrogate':'gp'
    }
}
hyppo.inline_job(config,run_mode=2,loops=2,store_path='1d_problem/gp')
[8]:
import hyppo
hpo_path   = 'logs/1d_problem/gp/001/logs'
evaluation = hyppo.extract('%s/evaluation*_01.log' % hpo_path, raw=True).sort_values(by=['nevals'])
surrogate  = hyppo.extract('%s/surrogate*_01.log' % hpo_path, raw=True).sort_values(by=['nevals'])

The animated GIF can be created with the convert command from ImageMagick, which can be installed into the conda environment from conda-forge as follows:

conda config --add channels conda-forge
conda install imagemagick
[ ]:
import os
import numpy
X = numpy.arange(-1, 2, 0.001)
Y = -(-numpy.sin(3*X) - X**2 + 0.7*X)
hyppo.model_visualization_1d(X,Y,hpo_path,evaluation,surrogate,model_file_name="gpr",imax=10,sigma=True)
os.system('convert -delay 50 -loop 0 '+hpo_path+'/plots/iter_*.png gp1d.gif')

(animated GIF: gp1d.gif, showing the Gaussian Process surrogate fit at each iteration)

Radial Basis Function

[ ]:
import hyppo
config = {
    'trainer':problem,
    'prms':{
        'nevals' : 3,
        'names'  : [ 'X'],
        'mult'   : [0.001],
        'xlow'   : [-1000],
        'xup'    : [ 2000],
    },
    'hpo':{
        'loops':27,
        'surrogate':'rbf'
    }
}
hyppo.inline_job(config,run_mode=2,loops=20,store_path='1d_problem/rbf')
[ ]:
import hyppo
hpo_path   = 'logs/1d_problem/rbf/002/logs'
evaluation = hyppo.extract('%s/evaluation*_01.log' % hpo_path, raw=True).sort_values(by=['nevals'])
surrogate  = hyppo.extract('%s/surrogate*_01.log' % hpo_path, raw=True).sort_values(by=['nevals'])
[ ]:
import os
import numpy
X = numpy.arange(-1, 2, 0.001)
Y = -(-numpy.sin(3*X) - X**2 + 0.7*X)
hyppo.model_visualization_1d(X,Y,hpo_path,evaluation,surrogate,model_file_name="rbf",imax=10)
os.system('convert -delay 50 -loop 0 '+hpo_path+'/plots/iter_*.png rbf1d.gif')

(animated GIF: rbf1d.gif, showing the Radial Basis Function surrogate fit at each iteration)

Random sampling

[ ]:
import hyppo
config = {
    'trainer':problem,
    'prms':{
        'nevals' : 30,
        'names'  : [  'X'],
        'mult'   : [0.001],
        'xlow'   : [-1000],
        'xup'    : [ 2000],
    },
}
hyppo.inline_job(config,run_mode=0,loops=20,store_path='1d_problem/random')
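Conceptually, random sampling simply draws parameter values uniformly from the search interval and keeps track of the best loss seen so far (a self-contained sketch of that baseline, independent of HYPPO):

```python
import numpy

def problem(X, **kwargs):
    return -(-numpy.sin(3*X) - X**2 + 0.7*X)

rng = numpy.random.default_rng(0)

# Draw 30 candidate values uniformly in [-1, 2], matching nevals above
X = rng.uniform(-1, 2, size=30)
losses = problem(X)

# Convergence curve: best loss found after each evaluation
best_so_far = numpy.minimum.accumulate(losses)
print('best loss after 30 random evaluations: %.3f' % best_so_far[-1])
```

Because the samples are independent, random sampling needs many evaluations to land near the optimum, whereas the surrogate models above use past evaluations to guide the next one.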

Convergence plots

[ ]:
import hyppo
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
plt.style.use('seaborn-v0_8')
plt.figure(figsize=(6,4),dpi=200)
for i,(path,color) in enumerate([['logs/1d_problem/gp','blue'],['logs/1d_problem/rbf','green'],['logs/1d_problem/random','red']]):
    mean_loss, mean_sdev = hyppo.get_convergence(path,with_surrogate=(i<2),sigma=True)
    plt.plot(mean_loss,color=color,lw=1,zorder=1,label=path,drawstyle='steps-post')
    plt.fill_between(range(len(mean_loss)),mean_loss-mean_sdev/2,mean_loss+mean_sdev/2,
                     alpha=0.1,lw=0,color=color,step='post')
plt.xlim(0,len(mean_loss))
plt.ylabel('Loss')
plt.xlabel('Index of function evaluations')
plt.legend(loc='upper right')
plt.axvline(3,color='k',lw=1,ls='dashed')
plt.tight_layout(h_pad=0.1)
plt.show()
../_images/analysis_1d_problem_20_0.png