Configuration File ¶
Below is a complete example of a configuration file. This is the only input file needed to execute the program. The information therein can be divided into 6 main sections covering the following aspects: (1) trainer mode, (2) data setup, (3) model architecture, (4) hyperparameter sampling, (5) surrogate modeling, and (6) distributed setting. Not all sections are required; which content is required depends on what the user wants to do.
trainer: internal
data:
  dataset : generic
  data_path : temperature
  n_timestamp : 100
  n_out : 1
  verbose : False
model:
  trial : 5
  library : pt
  dl_type : mlp
  update : True
  verbose : False
  validate : True
  save_model : False
  obj : mse
prms:
  nevals : 32
  names : [epochs,nodes]
  mult : [ 1, 1]
  xlow : [ 1, 1]
  xup : [ 50, 100]
  record : samples.txt
  salib : False
default:
  layers : 1
  batch : 64
  dropout : 0
  activation : relu
  optimizer : sgd
  opt_args:
    lr : 0.01
hpo:
  surrogate : gp # gp or rbf
  Fbest : inf
  phifunction : linear # necessary for rbf
  polynomial : linear # necessary for rbf
  NumberNewSamples : 1 # number of new samples evaluated in each iteration
  Ncand : 500 # number of candidates generated in each iteration
  loops : 2
uq:
  uq_on : True
  uq_hpo : False # if True, hpo[surrogate] should be set to rbf
  uq_weights : [0.5, 0.5]
  data_noise : 0.0
dist:
  node_type : gpu
  backend : nccl
  nsteps : 16
  ntasks : 1
  module : pytorch/1.7.1
  conda : software
  cd : ./
  operation : evaluation
sbatch:
  account : m0001
  constraint : gpu
  qos : regular
  job-name : hpo-gpu
  time : 30
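Since this is a standard YAML file, it can also be inspected programmatically. Below is a minimal sketch, assuming the file above is saved as config.yaml (the name is arbitrary) and PyYAML is installed:

import yaml

# Parse the configuration file into a nested dictionary.
# The file name 'config.yaml' is an assumption; any path works.
with open('config.yaml') as f:
    config = yaml.safe_load(f)

print(config['trainer'])           # 'internal'
print(config['hpo']['surrogate'])  # 'gp'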
Trainer mode ¶
The trainer mode specifies how the training is performed: through the built-in architectures available in the HYPPO software, via an external package's function, or via an external SLURM script that calls a separate configuration file. If internal models are being used, this key should be set to `internal` as follows:
trainer : internal
If you want to use an external package (follow the guideline here on how to convert a non-package repository into an importable Python package), the path to the method that executes the training should be given, for instance:
trainer : package.module.method
Perhaps the user's own project repository is already set up to be executed through a SLURM script, in which a different YAML-style configuration file will be called by the user's external program. If that is the case, the HYPPO software can handle such a setup if the path to the SLURM script is specified in the trainer section:
trainer : path/to/slurm_script.slr
The `hyppo.train.train_evaluation()` method in the HYPPO software will initiate the training according to the approach selected by the user.
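For illustration purposes only (this is not the actual implementation of `hyppo.train.train_evaluation()`), the dispatch between the three trainer styles can be pictured as follows:

import importlib

# Illustrative sketch of trainer dispatch; not HYPPO's actual code.
def resolve_trainer(trainer):
    if trainer == 'internal':
        return 'use built-in architectures'
    if trainer.endswith('.slr'):
        return 'submit the external SLURM script'
    # Otherwise, treat the string as a dotted path to a callable,
    # e.g. 'package.module.method'
    module_path, method_name = trainer.rsplit('.', 1)
    module = importlib.import_module(module_path)
    return getattr(module, method_name)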
Data Setup ¶
Fake data ¶
data:
  dataset : fake
  size : 4000
  limit_low : 0
  limit_high : 0.48
  n_timestamp : 100
  n_out : 1
  record : data.txt
  verbose : False
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `dataset` | Name of dataset to be used | | | |
| `size` | Size of dataset to be used | | | |
| `limit_low` | | | | 0 |
| `limit_high` | | | | 0.48 |
| `n_timestamp` | | | | |
| `n_out` | | | | |
| `record` | | | | None |
| `verbose` | | | | False |
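To make these parameters concrete, the following sketch mimics what a fake-data generator could look like; it is a hypothetical illustration, not HYPPO's actual generator:

import numpy as np

# Hypothetical fake-data generator (not HYPPO's actual code).
size, limit_low, limit_high = 4000, 0.0, 0.48
n_timestamp, n_out = 100, 1

# Random series bounded by [limit_low, limit_high]
series = np.random.uniform(limit_low, limit_high, size)

# Sliding windows: n_timestamp input values predicting n_out values
n_windows = size - n_timestamp - n_out + 1
X = np.array([series[i:i + n_timestamp] for i in range(n_windows)])
y = np.array([series[i + n_timestamp:i + n_timestamp + n_out]
              for i in range(n_windows)])
print(X.shape, y.shape)  # (3900, 100) (3900, 1)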
Model Architecture ¶
model:
  trial : 1
  library : pt
  dl_type : mlp
  update : True
  verbose : False
  validate : True
  save_model : False
  obj : mse
  transform : 10*torch.sigmoid(300*(test_loss-0.035))
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `library` | Machine learning library to be used | | | |
| `dl_type` | Type of neural network architecture | | | |
| `update` | Use prediction-on-prediction inference | | | |
| `verbose` | Print results and create extra figures | | | |
| `validate` | Use validation set across epochs | | | |
| `save_model` | Save model at each epoch | | | |
| `obj` | Objective function to be used | | | |
| `transform` | Function to transform loss output | | | |
Loss transform ¶
This feature allows the user to transform the loss value resulting from the training before it is used as the outer objective function for surrogate modeling. This can be useful when losses are very close to each other, which makes the surrogate model hard to optimize. For instance, consider losses that lie in the range [0.02, 0.05]: a `transform` value of `10*torch.sigmoid(300*(test_loss-0.035))` applies a sigmoid function centered on this range, as follows:
import numpy, torch
import matplotlib.pyplot as plt

# Original losses spanning the range of interest
x = numpy.linspace(0.02, 0.05)
# Sigmoid transform centered at 0.035, rescaled to [0, 10]
y = 10*torch.sigmoid(torch.Tensor(300*(x-0.035)))
plt.style.use('seaborn')
plt.plot(x, y)
plt.xlabel('Original Loss')
plt.ylabel('Scaled Loss')
plt.show()

Built-in models ¶
Several built-in architectures were implemented to provide an easy way for scientists to explore ML applications in the early phases of their research projects. The following architectures have already been implemented in the software:
- Multi-Layer Perceptron (MLP)
- Long Short-Term Memory (LSTM)
- Recurrent Neural Network (RNN)
To allow flexibility in how this software is used, we implemented both PyTorch and TensorFlow versions of each architecture so that users can work with the ML library they are most comfortable with.
Warning
While this functionality is perfect for exploring different models for a new science project, it may well be that a more complex architecture will be required and will need to be customized to fit the science goal. Fortunately, the HYPPO software was designed to work with model and training modules that are external to the software. For more information on how to configure the program to run with an external package, see this section.
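As an illustration of what such a built-in model looks like, here is a minimal PyTorch MLP driven by the `nodes` and `layers` settings discussed on this page. This is a sketch of the general idea, not HYPPO's internal class:

import torch.nn as nn

# Minimal MLP sketch (illustrative; not HYPPO's internal implementation).
def make_mlp(n_timestamp=100, n_out=1, nodes=50, layers=1):
    blocks = [nn.Linear(n_timestamp, nodes), nn.ReLU()]
    for _ in range(layers - 1):
        blocks += [nn.Linear(nodes, nodes), nn.ReLU()]
    blocks.append(nn.Linear(nodes, n_out))
    return nn.Sequential(*blocks)

model = make_mlp()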
Hyperparameter Sampling ¶
prms:
  names : [epochs,nodes]
  mult : [ 1, 1]
  xlow : [ 1, 1]
  xup : [50,100]
  record : samples.pickle
  salib : True
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `names` | Parameters to be optimized | | | |
| `mult` | | | | |
| `xlow` | Lower bound for corresponding hyperparameter | | | |
| `xup` | Upper bound for corresponding hyperparameter | | | |
| `record` | | | | |
| `salib` | Use low-discrepancy sampling | | | |
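When `salib` is set to True, hyperparameter samples are drawn with a low-discrepancy sequence, which covers the search space more evenly than purely random draws. The sketch below illustrates the idea with SciPy's Sobol sampler; HYPPO's own sampler may differ:

from scipy.stats import qmc

# Low-discrepancy sampling illustration (SciPy Sobol sequence;
# not necessarily the sampler HYPPO uses internally).
xlow, xup = [1, 1], [50, 100]                 # bounds for [epochs, nodes]
sampler = qmc.Sobol(d=2, scramble=True)
unit_samples = sampler.random(n=32)           # points in [0, 1)^2
samples = qmc.scale(unit_samples, xlow, xup)  # rescale to the bounds
samples = samples.round().astype(int)         # integer-valued hyperparameters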
Change default values ¶
default:
  layers : 1
  batch : 64
  dropout : 0
  activation : relu
  optimizer : sgd
  loss : sparse_categorical_crossentropy
  opt_args:
    lr : 0.01
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `layers` | Number of hidden layers | | | |
| `batch` | Batch size used during training | | | |
| `dropout` | Dropout rate | | | |
| `activation` | Activation function | | | |
| `optimizer` | Optimizer used for training | | | |
| `loss` | Loss function | | | |
Optimizer arguments (`opt_args`) are specific to the optimizer selected.
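In practice this means the keys under `opt_args` must match the keyword arguments of the chosen optimizer. A short PyTorch sketch (the placeholder model and the dictionary of optimizers are assumptions for illustration):

import torch

model = torch.nn.Linear(10, 1)  # placeholder model for illustration
optimizers = {'sgd': torch.optim.SGD, 'adam': torch.optim.Adam}
opt_args = {'lr': 0.01}         # must match the optimizer's signature
optimizer = optimizers['sgd'](model.parameters(), **opt_args)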
Surrogate Modeling ¶
hpo:
  surrogate : gp # gp or rbf
  Fbest : inf
  phifunction : linear # necessary for rbf
  polynomial : linear # necessary for rbf
  NumberNewSamples : 1 # number of new samples evaluated in each iteration
  Ncand : 500 # number of candidates generated in each iteration
  loops : 2
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `surrogate` | Select surrogate model to use | | gp, rbf | |
| `Fbest` | | | | |
| `phifunction` | Necessary for rbf | | | |
| `polynomial` | Necessary for rbf | | | |
| `NumberNewSamples` | Number of new samples evaluated in each iteration | | | |
| `Ncand` | Number of candidates generated in each iteration | | | |
| `loops` | | | | |
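Conceptually, each loop fits the surrogate on the hyperparameter sets evaluated so far, scores `Ncand` candidates, and selects `NumberNewSamples` of them for actual training. The following is a hedged sketch of one such iteration using scikit-learn's Gaussian process; it does not reproduce HYPPO's surrogate code:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# One conceptual surrogate iteration (sketch only).
X = np.random.uniform([1, 1], [50, 100], size=(32, 2))  # evaluated samples
y = np.random.rand(32)                                  # their losses (mock)

gp = GaussianProcessRegressor().fit(X, y)               # surrogate : gp
candidates = np.random.uniform([1, 1], [50, 100], size=(500, 2))  # Ncand
new_sample = candidates[np.argmin(gp.predict(candidates))]  # NumberNewSamples = 1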
Uncertainty Quantification ¶
uq:
  uq_on : True
  uq_hpo : False # if True, hpo[surrogate] should be set to rbf
  uq_weights : [0.5, 0.5]
  data_noise : 0.0
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `uq_on` | Option to perform uncertainty quantification | | | |
| `uq_hpo` | | | | |
| `uq_weights` | | | | |
| `data_noise` | | | | |
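How the two `uq_weights` entries are combined is not detailed on this page; the sketch below is therefore a loudly hypothetical reading in which they weight two uncertainty contributions:

# Hypothetical illustration only: the interpretation of uq_weights as
# weighting two uncertainty components is an assumption, not documented here.
uq_weights = [0.5, 0.5]
model_uncertainty, data_uncertainty = 0.08, 0.02  # made-up values
total = uq_weights[0]*model_uncertainty + uq_weights[1]*data_uncertainty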
Distributed Setting ¶
dist:
  node_type : gpu
  backend : nccl
  nsteps : 16
  ntasks : 1
  module : pytorch/1.7.1
  conda : software
  cd : ./
  operation : evaluation
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `node_type` | Type of node to run HYPPO code on | | cpu, gpu | |
| `backend` | Communication backend; the choice depends on the library and node type: one option for TensorFlow/PyTorch on CPU, one for TensorFlow on GPU, and one (nccl) for PyTorch on GPU | | | None |
| `nsteps` | | | | 1 |
| `ntasks` | | | | 1 |
| `module` | Module for the DL package, in CPU or GPU version | | | |
| `conda` | Name of anaconda environment | | | |
| `cd` | | | | None |
| `operation` | | | | |
Warning
Specify both the node type and backend fields, otherwise the software may default to a different configuration.
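On the PyTorch side, the backend value feeds into the standard distributed initialization call. A minimal sketch, assuming SLURM exposes the rank and world size through its usual environment variables:

import os
import torch.distributed as dist

# Sketch: initialize PyTorch distributed training with the configured
# backend (nccl for GPU nodes). Assumes SLURM_PROCID and SLURM_NTASKS
# are set; MASTER_ADDR and MASTER_PORT must also be defined for the
# default env:// rendezvous.
dist.init_process_group(
    backend='nccl',  # from dist[backend]
    rank=int(os.environ['SLURM_PROCID']),
    world_size=int(os.environ['SLURM_NTASKS']),
)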
Specify SLURM directives ¶
sbatch:
  account : m0001
  constraint : gpu
  qos : regular
  job-name : hpo-gpu
  time : 30
| Parameter | Description | Type | Choices | Default |
|---|---|---|---|---|
| `account` | Project to charge for computing resources | | | Set in Iris |
| `constraint` | Type of resource | | | |
| `qos` | Quality of service | | | |
| `job-name` | Job name (will be visible under job status) | | | |
| `time` | Amount of time to request for the job (min) | | | 10 |
More details on SLURM directives can be found here. Each user sets a default project to charge on Iris.
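Each key in the `sbatch` mapping translates directly into a #SBATCH directive in the generated job script. A small sketch of that translation (illustrative; not HYPPO's actual script writer):

# Render the sbatch mapping into SLURM directives (illustrative only).
sbatch = {'account': 'm0001', 'constraint': 'gpu',
          'qos': 'regular', 'job-name': 'hpo-gpu', 'time': 30}
header = '\n'.join(f'#SBATCH --{key}={value}' for key, value in sbatch.items())
print(header)
# #SBATCH --account=m0001
# #SBATCH --constraint=gpu
# ...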