Configuration File

Below is a complete example of a configuration file. This is the only input file needed to execute the program. The information therein can be divided into 7 main sections covering the following aspects: (1) trainer mode, (2) data setup, (3) model architecture, (4) hyperparameter sampling, (5) surrogate modeling, (6) uncertainty quantification, and (7) distributed setting. Not all sections are required; which content is required depends on what the user wants to do.

trainer: internal

data:
  dataset     : generic
  data_path   : temperature
  n_timestamp : 100
  n_out       : 1
  verbose     : False

model:
  trial      : 5
  library    : pt
  dl_type    : mlp
  update     : True
  verbose    : False
  validate   : True
  save_model : False
  obj        : mse

prms:
  nevals : 32
  names  : [epochs,nodes]
  mult   : [     1,    1]
  xlow   : [     1,    1]
  xup    : [    50,  100]
  record : samples.txt
  salib  : False
  default:
    layers     : 1
    batch      : 64
    dropout    : 0
    activation : relu
    optimizer  : sgd
    opt_args:
        lr : 0.01

hpo:
  surrogate        : gp # gp or rbf
  Fbest            : inf
  phifunction      : linear # necessary for rbf
  polynomial       : linear # necessary for rbf
  NumberNewSamples : 1 # number of new samples evaluated in each iteration
  Ncand            : 500 # number of candidates generated in each iteration
  loops            : 2

uq:
  uq_on      : True
  uq_hpo     : False # if set to True, hpo[surrogate] should be set to rbf
  uq_weights : [0.5, 0.5]
  data_noise : 0.0

dist:
  node_type : gpu
  backend   : nccl
  nsteps    : 16
  ntasks    : 1
  module    : pytorch/1.7.1
  conda     : software
  cd        : ./
  operation : evaluation
  sbatch:
    account    : m0001
    constraint : gpu
    qos        : regular
    job-name   : hpo-gpu
    time       : 30

Trainer mode

The trainer mode specifies how the training is performed: through the built-in architectures available in the HYPPO software, via an external package’s function, or via an external SLURM script that calls a separate configuration file. If built-in models are used, this key should be set to internal as follows:

trainer : internal

To use an external package instead (follow the guidelines here on how to convert a non-package repository into an importable Python package), provide the path to the method that executes the training, for instance:

trainer : package.module.method
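
For illustration, here is a minimal sketch of what such an importable package could look like; the names package, module, and method are placeholders, and the exact signature HYPPO expects from the training method is an assumption here:

# package/module.py -- hypothetical external training package.
# HYPPO would import this module and call method() to run the training.

def method(config):
    """Run one training job and return the objective value to HYPPO."""
    epochs = config.get('epochs', 10)
    # ... build and train the actual model here ...
    loss = 1.0 / epochs  # dummy objective, for illustration only
    return loss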

The user’s own project repository may already be set up to be executed through a SLURM script, in which a different YAML-style configuration file is called by the user’s external program. In that case, the HYPPO software can handle such a setup: simply specify the path to the SLURM script in the trainer section:

trainer : path/to/slurm_script.slr

The hyppo.train.train_evaluation() method in the HYPPO software initiates the training according to the approach selected by the user.
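
Conceptually, the dispatch can be pictured as follows; this is an illustrative sketch, not HYPPO’s actual source code:

import importlib

def resolve_trainer(trainer):
    # Sketch of how the trainer value could be dispatched
    # (an assumption, not HYPPO's implementation).
    if trainer == 'internal':
        return 'built-in models'    # use HYPPO's own architectures
    if trainer.endswith('.slr'):
        return 'slurm script'       # submit the user's SLURM script
    # Otherwise, treat the value as package.module.method and import it
    module_path, method_name = trainer.rsplit('.', 1)
    module = importlib.import_module(module_path)
    return getattr(module, method_name)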

Data Setup

Fake data

data:
  dataset     : fake
  size        : 4000
  limit_low   : 0
  limit_high  : 0.48
  n_timestamp : 100
  n_out       : 1
  record      : data.txt
  verbose     : False

| Parameter   | Description                              | Type  | Choices     | Default |
|-------------|------------------------------------------|-------|-------------|---------|
| dataset     | Name of dataset to be used               | str   |             |         |
| size        | Size of dataset to be used               | int   |             |         |
| limit_low   | Lower limit of the generated data values | float |             | 0       |
| limit_high  | Upper limit of the generated data values | float |             | 0.48    |
| n_timestamp | Number of timestamps                     | int   |             |         |
| n_out       | Number of outputs                        | int   |             |         |
| record      | File in which to record the data         | str   |             | None    |
| verbose     | Print extra information                  | bool  | True, False | False   |

Model Architecture

model:
  trial : 1
  library : pt
  dl_type : mlp
  update : True
  verbose : False
  validate : True
  save_model : False
  obj : mse
  transform : 10*torch.sigmoid(300*(test_loss-0.035))

| Parameter  | Description                             | Type | Choices             | Default |
|------------|-----------------------------------------|------|---------------------|---------|
| library    | Machine learning library to be used    | str  | pt, tf              |         |
| dl_type    | Type of neural network architecture    | str  | mlp, cnn, rnn, lstm |         |
| update     | Use prediction-on-prediction inference | bool | True, False         | False   |
| verbose    | Print results and create extra figures | bool | True, False         | False   |
| use_val    | Use validation set across epochs       | bool | True, False         | False   |
| save_model | Save model at each epoch               | bool | True, False         | False   |
| obj        | Objective function to be used          | str  | mse                 | mse     |
| transform  | Function to transform loss output      | str  |                     | None    |

Loss transform

This feature allows the user to transform the loss value resulting from training before it is used as the outer objective function for surrogate modeling. This can be useful when losses are very close to each other, which makes the surrogate model hard to optimize. For instance, consider losses that lie in the range [0.02, 0.05]: a transform value of 10*torch.sigmoid(300*(test_loss-0.035)) applies a sigmoid function centered on this range, as follows:

import numpy
import torch
import matplotlib.pyplot as plt

# Original losses spanning the range [0.02, 0.05]
x = numpy.linspace(0.02, 0.05)
# Sigmoid transform centered at 0.035, scaled to [0, 10]
y = 10 * torch.sigmoid(torch.Tensor(300 * (x - 0.035)))
plt.plot(x, y)
plt.xlabel('Original Loss')
plt.ylabel('Scaled Loss')
plt.show()
(Figure: original loss versus scaled loss after applying the sigmoid transform.)

Built-in models

Several built-in architectures were implemented to provide an easy way for scientists to explore ML applications in the early phase of their research projects. The following architectures have already been implemented in the software: multi-layer perceptrons (mlp), convolutional neural networks (cnn), recurrent neural networks (rnn), and long short-term memory networks (lstm).

To allow flexibility in how this software is used, we implemented both PyTorch and TensorFlow versions of each architecture, so that users can work with the ML library they are most comfortable with.
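
As an illustration, with the defaults shown later in this document (1 layer, relu activation, sgd optimizer with lr 0.01, mse objective), the mlp option could correspond to a PyTorch model along the following lines; this is a sketch of the general shape, not HYPPO’s exact built-in architecture:

import torch

# Dimensions follow the data section (n_timestamp inputs, n_out outputs);
# 'nodes' is one of the hyperparameters optimized in the prms section.
n_timestamp, n_out, nodes = 100, 1, 64

model = torch.nn.Sequential(
    torch.nn.Linear(n_timestamp, nodes),  # one hidden layer (layers: 1)
    torch.nn.ReLU(),                      # activation: relu
    torch.nn.Linear(nodes, n_out),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # optimizer: sgd
loss_fn = torch.nn.MSELoss()                              # obj: mse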

Warning

While this functionality is well suited to exploring different models for a new science project, it may well be that a more complex architecture will be required and will need to be customized to fit the science goal. Fortunately, the HYPPO software was designed to work with model and training modules that are external to the software. For more information on how to configure the program to run with an external package, see this section.

Hyperparameter Sampling

prms:
  names : [epochs,nodes]
  mult : [ 1,  1]
  xlow : [ 1,  1]
  xup :  [50,100]
  record : samples.pickle
  salib : True

| Parameter | Description                                  | Type | Choices     | Default |
|-----------|----------------------------------------------|------|-------------|---------|
| names     | Parameters to be optimized                   | dict |             |         |
| mult      |                                              | dict |             |         |
| xlow      | Lower bound for corresponding hyperparameter | dict |             |         |
| xup       | Upper bound for corresponding hyperparameter | dict |             |         |
| record    | File in which to record the samples          | str  |             |         |
| salib     | Use low-discrepancy sampling                 | bool | True, False |         |
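
When salib is set to True, samples are drawn with a low-discrepancy scheme. As a point of reference, here is a minimal sketch of Latin hypercube sampling over the bounds above using the SALib library; whether HYPPO issues this exact call internally is an assumption:

from SALib.sample import latin

# Search space taken from the prms section above
problem = {
    'num_vars': 2,
    'names': ['epochs', 'nodes'],
    'bounds': [[1, 50], [1, 100]],
}

# Draw 32 low-discrepancy samples and round to integers, since both
# hyperparameters are integer-valued
samples = latin.sample(problem, 32).round().astype(int)
print(samples[:5])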

Change default values

default:
  layers : 1
  batch : 64
  dropout : 0
  activation : relu
  optimizer : sgd
  loss : sparse_categorical_crossentropy
  opt_args:
    lr : 0.01

| Parameter  | Description              | Type  | Choices | Default |
|------------|--------------------------|-------|---------|---------|
| layers     | Number of hidden layers  | int   |         |         |
| batch      | Batch size               | int   |         |         |
| dropout    | Dropout rate             | float |         |         |
| activation | Activation function      | str   |         |         |
| optimizer  | Optimizer to be used     | str   |         |         |
| loss       | Loss function to be used | str   |         |         |

Optimizer arguments (opt_args) are specific to the optimizer selected.

Surrogate Modeling

hpo:
  surrogate : gp # gp or rbf
  Fbest : inf
  phifunction: linear # necessary for rbf
  polynomial: linear # necessary for rbf
  NumberNewSamples : 1 # number of new samples evaluated in each iteration
  Ncand : 500 # number of candidates generated in each iteration
  loops : 2

| Parameter        | Description                                       | Type | Choices                     | Default |
|------------------|---------------------------------------------------|------|-----------------------------|---------|
| surrogate        | Select surrogate model to use                     | str  | gp, rbf                     |         |
| Fbest            |                                                   |      |                             |         |
| phifunction      | RBF function type (necessary for rbf)             | str  | linear, cubic, thinplate    |         |
| polynomial       | Polynomial type (necessary for rbf)               | str  | constant, linear, quadratic |         |
| NumberNewSamples | Number of new samples evaluated in each iteration | int  |                             |         |
| Ncand            | Number of candidates generated in each iteration  | int  |                             |         |
| loops            | Number of optimization loops                      | int  |                             |         |
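
To build intuition, the surrogate model is fit to the (hyperparameter, loss) pairs evaluated so far, and its predictions guide where to sample next. Here is a minimal sketch of a gp-style surrogate using scikit-learn; HYPPO’s own implementation may differ:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Toy history of evaluated hyperparameters (epochs, nodes) and losses
X = np.array([[10, 20], [25, 50], [40, 80]])
y = np.array([0.05, 0.03, 0.04])

gp = GaussianProcessRegressor().fit(X, y)

# Predict mean and uncertainty at candidate points; such scores can
# drive the selection of the NumberNewSamples next evaluations
candidates = np.array([[15, 30], [35, 90]])
mean, std = gp.predict(candidates, return_std=True)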

Uncertainty Quantification

uq:
  uq_on: True
  uq_hpo: False # if set to True, hpo[surrogate] should be set to rbf
  uq_weights: [0.5, 0.5]
  data_noise: 0.0

| Parameter  | Description                                               | Type  | Choices     | Default |
|------------|-----------------------------------------------------------|-------|-------------|---------|
| uq_on      | Option to perform uncertainty quantification              | bool  | True, False |         |
| uq_hpo     | Perform UQ during HPO (requires hpo surrogate set to rbf) | bool  | True, False |         |
| uq_weights |                                                           | dict  |             |         |
| data_noise |                                                           | float |             |         |

Distributed Setting

dist:
  node_type : gpu
  backend   : nccl
  nsteps    : 16
  ntasks    : 1
  module    : pytorch/1.7.1
  conda     : software
  cd        : ./
  operation : evaluation

| Parameter | Description                       | Type | Choices                                    | Default    |
|-----------|-----------------------------------|------|--------------------------------------------|------------|
| node_type | Type of node to run HYPPO code on | str  | cpu, gpu                                   | cpu        |
| backend   | For TensorFlow/PyTorch on CPU     | str  | mpi                                        | mpi        |
|           | For TensorFlow on GPU             | str  | hvd                                        | hvd        |
|           | For PyTorch on GPU                | str  | nccl, gloo                                 | None       |
| nsteps    |                                   | int  |                                            | 1          |
| ntasks    |                                   | int  |                                            | 1          |
| module    | DL package if using CPU version   | str  | tensorflow/intel-2.2.0-py37, pytorch/1.7.1 |            |
|           | DL package if using GPU version   | str  | tensorflow/2.4.0-gpu, pytorch/1.9.0-gpu    |            |
| conda     | Name of anaconda environment      | str  |                                            |            |
| cd        |                                   | str  |                                            | None       |
| operation |                                   | str  | evaluation, surrogate                      | evaluation |
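
For reference, the backend value corresponds to the process-group backend of the underlying library. For PyTorch on GPU nodes, the initialization looks roughly as follows; this is a generic PyTorch sketch, not HYPPO’s code, and the rank, world size, and master address are normally provided by the launcher (e.g., SLURM):

import os
import torch.distributed as dist

# Illustrative single-process setup; real runs get these from the launcher
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')

# backend matches the dist section: nccl for GPU, gloo also works on CPU
dist.init_process_group(backend='nccl', rank=0, world_size=1)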

Warning

Specify both the node type and backend fields, otherwise the software may default to a different configuration.

Specify SLURM directives

sbatch:
    account    : m0001
    constraint : gpu
    qos        : regular
    job-name   : hpo-gpu
    time       : 30

| Parameter  | Description                                 | Type | Choices                     | Default     |
|------------|---------------------------------------------|------|-----------------------------|-------------|
| account    | Project to charge for computing resources   | str  |                             | Set in Iris |
| constraint | Type of resource                            | str  | gpu, knl, haswell           |             |
| qos        | Quality of service                          | str  | regular, interactive, debug | debug       |
| job-name   | Job name (will be visible under job status) | str  |                             |             |
| time       | Amount of time to request for the job (min) | int  |                             | 10          |
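
The sbatch entries map one-to-one to #SBATCH directives in the submission script. With the values above, the generated header would look roughly like this (standard SLURM syntax; the actual script produced by HYPPO may contain additional directives):

#!/bin/bash
#SBATCH --account=m0001
#SBATCH --constraint=gpu
#SBATCH --qos=regular
#SBATCH --job-name=hpo-gpu
#SBATCH --time=30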

More details on SLURM directives can be found here. Each user sets a default project to charge in Iris.