External Package

Many users will already have built their own code to train machine learning models on their scientific data. The HYPPO software is designed to work with such external modules: an external training module can be called through the trainer option in the configuration file. However, the user's external module must be importable as a package from the system's PYTHONPATH environment variable. Below we describe how this can be done.
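For instance, pointing HYPPO at a user-defined training function takes a single line in the configuration file (shown here as a preview, using the example package built below; a complete configuration file is discussed at the end of this section):

trainer: mycode.trainer.train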

Non-package repository

Let's suppose that the user's routines live in a folder under the HOME directory, organized as follows:

$HOME
└── mycode
    ├── models
    │   └── mlp.py
    └── trainer.py

In this example, the trainer.py module is the main script that will be called through the HYPPO software. This script contains a function called train that does the actual training. The argument list of the train function should include, among other possible variables, the hyperparameters to be optimized; the trailing **kwargs argument lets the function silently ignore any extra parameters that may be passed to it. Below is an example of how such a script might look:

import torch
import argparse
from models.mlp import TwoLayerModel

def parse_args():
    parser = argparse.ArgumentParser('trainer.py')
    parser.add_argument('-n','--nodes', type=int, help='Number of nodes per layer', required=True)
    parser.add_argument('-d','--dropout', type=float, default=0, help='Dropout rate')
    parser.add_argument('-v','--verbose', action='store_true', help='Do verbose')
    return parser.parse_args()

def train(nodes,dropout=0,lr=0.01,epochs=10,verbose=False,**kwargs):
    # Random tensors stand in for an actual dataset in this toy example
    target, label = torch.rand(100,10,10), torch.rand(100,1)
    mymodel = TwoLayerModel(nodes,dropout)
    if verbose: print(mymodel)
    loss_function = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(mymodel.parameters(),lr=lr)
    for epoch in range(epochs):
        mymodel.train()
        mymodel.zero_grad()
        out = mymodel(target.float())
        loss = loss_function(out, label.float())
        if verbose: print('Epoch %i/%i Loss: %.5f'%(epoch+1,epochs,loss.item()))
        loss.backward()
        optimizer.step()
    # The final loss is the objective value returned to HYPPO
    return loss.item()

if __name__=="__main__":
    args = parse_args()
    train(**vars(args))

The user may well have built additional modules containing other functions to be imported during the execution of the main script. In the above example, for instance, the mlp.py module contains the class that builds the neural network model, and it is imported by the trainer.py module:

import torch

class TwoLayerModel(torch.nn.Module):
    def __init__(self,nodes,dropout,input_size=100,output_size=1):
        super(TwoLayerModel, self).__init__()
        # Two hidden layers of equal width, with dropout before the output layer
        layers = []
        layers.append(torch.nn.Linear(input_size,nodes))
        layers.append(torch.nn.ReLU())
        layers.append(torch.nn.Linear(nodes,nodes))
        layers.append(torch.nn.ReLU())
        layers.append(torch.nn.Dropout(p=dropout))
        layers.append(torch.nn.Linear(nodes,output_size))
        self.layers = torch.nn.Sequential(*layers)

    def forward(self,data):
        # Flatten each sample to a vector while keeping the batch dimension
        data = data.view(data.shape[0],-1)
        out = self.layers(data)
        return out
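As a quick sanity check (a hypothetical snippet, not part of the user's repository), the model should map a batch of 10x10 inputs to one scalar output per sample:

import torch
from models.mlp import TwoLayerModel

model = TwoLayerModel(nodes=50, dropout=0)
x = torch.rand(8, 10, 10)   # batch of eight 10x10 samples
print(model(x).shape)       # torch.Size([8, 1])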

Very likely, the user made the main script (trainer.py) executable from the command line. For example, in the following snippet, the script is called with 50 nodes per layer and verbose output enabled:

>>> python trainer.py -n 50 -v
TwoLayerModel(
  (layers): Sequential(
    (0): Linear(in_features=100, out_features=50, bias=True)
    (1): ReLU()
    (2): Linear(in_features=50, out_features=50, bias=True)
    (3): ReLU()
    (4): Dropout(p=0, inplace=False)
    (5): Linear(in_features=50, out_features=1, bias=True)
  )
)
Epoch 1/10 Loss: 0.29712
Epoch 2/10 Loss: 0.27439
Epoch 3/10 Loss: 0.25366
Epoch 4/10 Loss: 0.23482
Epoch 5/10 Loss: 0.21773
Epoch 6/10 Loss: 0.20225
Epoch 7/10 Loss: 0.18822
Epoch 8/10 Loss: 0.17563
Epoch 9/10 Loss: 0.16435
Epoch 10/10 Loss: 0.15434

Convert folder into package

In order to convert the above repository into a workable Python package, three steps should be followed: (1) add an __init__.py file under every folder of the repository that contains Python modules, (2) adapt the import statements for local modules, and (3) add the parent directory to the PYTHONPATH. Below we go through each step to clarify the process.

Initialization file

The initialization file, or __init__.py Python module, marks a directory as a package so that the Python interpreter can properly import the modules it contains. Below we show how the original structure should be modified to include an __init__.py file in every directory where other Python modules exist:

$HOME
└── mycode
    ├── __init__.py
    ├── models
    │   ├── __init__.py
    │   └── mlp.py
    └── trainer.py
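Both initialization files can simply be left empty; for instance, they can be created with the touch command:

touch $HOME/mycode/__init__.py $HOME/mycode/models/__init__.py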

Module import adaptation

In Python 3, implicit relative imports are no longer supported, so the import statements must be adapted wherever local modules are used. For instance, the main script trainer.py imports a class from the local submodule mlp.py. For this import to work properly within the new package-style structure, a dot should be added in front of the relative path to the local submodule, as shown below:

import torch
import argparse
from .models.mlp import TwoLayerModel

# ... the rest of trainer.py is unchanged ...
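Note that once the relative import is in place, the script can no longer be launched directly as python trainer.py (Python 3 raises an ImportError when a top-level script attempts a relative import). Instead, it can be executed as a module from the parent directory:

>>> python -m mycode.trainer -n 50 -v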

PYTHONPATH export

Finally, in order for the package to be found by the HYPPO software, the parent directory in which the mycode repository is located should be included in the system's PYTHONPATH. This can easily be achieved by adding the following line to the shell startup file (e.g., ~/.bashrc), where $HOME should be replaced by whichever path actually contains the repository:

export PYTHONPATH=$PYTHONPATH:$HOME
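To verify that the package is now visible, one can check that it imports cleanly from any directory (in a new shell or after sourcing the startup file; no output means success):

>>> python -c "import mycode"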

HYPPO on external package

Import external package

Once the external repository has been successfully converted into a proper Python package, the user should be able to import the train function directly from a Python script or interpreter outside the package, as shown below.

>>> from mycode.trainer import train
>>> loss = train(nodes=50,verbose=True)
TwoLayerModel(
  (layers): Sequential(
    (0): Linear(in_features=100, out_features=50, bias=True)
    (1): ReLU()
    (2): Linear(in_features=50, out_features=50, bias=True)
    (3): ReLU()
    (4): Dropout(p=0, inplace=False)
    (5): Linear(in_features=50, out_features=1, bias=True)
  )
)
Epoch 1/10 Loss: 0.49820
Epoch 2/10 Loss: 0.45433
Epoch 3/10 Loss: 0.41454
Epoch 4/10 Loss: 0.37843
Epoch 5/10 Loss: 0.34569
Epoch 6/10 Loss: 0.31577
Epoch 7/10 Loss: 0.28842
Epoch 8/10 Loss: 0.26350
Epoch 9/10 Loss: 0.24092
Epoch 10/10 Loss: 0.22040
>>> loss
0.22039563953876495

Configuration file

A simple configuration file can then be created that points to the newly created external package and lists the hyperparameters to be optimized, together with a per-parameter multiplier (mult) and the lower (xlow) and upper (xup) bounds of the search range:

trainer: mycode.trainer.train

prms:
  names : [epochs, nodes, dropout]
  mult  : [     1,     5,    0.01]
  xlow  : [     1,     1,       1]
  xup   : [    10,    10,      10]

hpo:
  nevals : 3

The following shows an example of a HYPPO execution using the external package:

>>> hyppo evaluation config.yaml
2021-06-14 23:16:44,472 INFO Configuration: {'trainer': 'mycode.trainer.train', 'prms': {'names': ['epochs', 'nodes', 'dropout'], 'mult': [1, 5, 0.01], 'xlow': [1, 1, 1], 'xup': [10, 10, 10]}, 'hpo': {'nevals': 3}}
2021-06-14 23:16:44,473 INFO EVALUATION   1 / 3
2021-06-14 23:16:44,474 INFO Samples: [ 5  4 10]
2021-06-14 23:16:44,962 INFO Output of objective function: 0.43182
2021-06-14 23:16:44,962 INFO Execution time: 0.48868 seconds.
2021-06-14 23:16:44,963 INFO EVALUATION   2 / 3
2021-06-14 23:16:44,963 INFO Samples: [10  5 10]
2021-06-14 23:16:44,972 INFO Output of objective function: 0.17925
2021-06-14 23:16:44,972 INFO Execution time: 0.00903 seconds.
2021-06-14 23:16:44,972 INFO EVALUATION   3 / 3
2021-06-14 23:16:44,973 INFO Samples: [9 4 5]
2021-06-14 23:16:44,981 INFO Output of objective function: 0.31150
2021-06-14 23:16:44,981 INFO Execution time: 0.00834 seconds.