Dataset Preparation

When using built-in models, the user must specify which data to use. This section describes how public or custom datasets can be loaded.

Public Datasets

Daily Temperatures

CIFAR10 dataset

hyppo.datasets.cifar10.get_data(library=None, data_path='./data', **kwargs)

Loads the CIFAR10 dataset. This image dataset consists of 60,000 32x32 colour images across 10 classes and can be used to test image classification models. The public dataset is loaded in the form appropriate to the machine learning library selected through the library argument.

Parameters:
library : str
    Machine learning library.
data_path : str
    Path to the data repository.

Returns:
data : dict
    Training, validation and testing datasets.

Examples

>>> from hyppo.datasets.cifar10 import get_data
>>> get_data(library='pt')
{'dataset': 'cifar10',
 'train': <torch.utils.data.dataset.Subset at 0x11deb3090>,
 'valid': <torch.utils.data.dataset.Subset at 0x11deb3fd0>,
 'test': Dataset CIFAR10
     Number of datapoints: 10000
     Root location: ./data
     Split: Test
     StandardTransform
 Transform: Compose(
                ToTensor()
            )}

Warning

When using the CIFAR10 dataset for image classification, remember that the classification is done over 10 classes. The output layer of the neural network should therefore contain 10 neurons, one for each class. The Cross-Entropy Loss is used for training.
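
As a rough illustration of the warning above, the sketch below computes the cross-entropy loss for a single sample from 10 raw output scores (logits), one per CIFAR10 class, using plain Python. The names NUM_CLASSES and cross_entropy are hypothetical and are not part of hyppo. In PyTorch, the same quantity is computed from raw logits by torch.nn.CrossEntropyLoss.

```python
import math

NUM_CLASSES = 10  # CIFAR10 has 10 classes, so the output layer has 10 neurons


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample (hypothetical helper, not hyppo API).

    logits: list of NUM_CLASSES raw scores from the output layer.
    target: integer index of the true class (0 to NUM_CLASSES - 1).
    """
    if len(logits) != NUM_CLASSES:
        raise ValueError(f"expected {NUM_CLASSES} logits, got {len(logits)}")
    if not 0 <= target < NUM_CLASSES:
        raise ValueError(f"target must be in [0, {NUM_CLASSES - 1}]")
    # Log-sum-exp with the maximum subtracted for numerical stability.
    shift = max(logits)
    log_norm = shift + math.log(sum(math.exp(z - shift) for z in logits))
    # Negative log-probability of the true class under the softmax.
    return log_norm - logits[target]


# A network that scores all 10 classes equally gives a loss of ln(10).
uniform_loss = cross_entropy([0.0] * NUM_CLASSES, target=3)
print(round(uniform_loss, 4))  # 2.3026
```

A loss close to ln(10) at the start of training is therefore what one would expect from a network whose 10 outputs are not yet informative. An output layer with any other number of neurons is rejected by the length check.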

Custom Datasets

Fake time-series data

hyppo.datasets.fake.get_data(n_out, n_timestamp, record=None, verbose=False, n=4000, limit_low=0, limit_high=0.48, **kwargs)

https://stackoverflow.com/questions/36286566/how-to-generate-noisy-mock-time-series-or-signal-in-python
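
The linked discussion is about generating noisy mock signals. A minimal sketch of that idea follows, under stated assumptions: a sine wave with additive Gaussian noise is generated and then cut into sliding windows of n_timestamp input values followed by n_out target values. The helpers noisy_signal and make_windows are hypothetical and are not hyppo functions. Only the names n, n_timestamp and n_out are borrowed from the signature above, and their exact meaning in hyppo may differ.

```python
import math
import random


def noisy_signal(n=4000, period=50.0, noise_std=0.1, seed=0):
    """Return n samples of a sine wave with additive Gaussian noise.

    Hypothetical helper; period, noise_std and seed are illustrative choices.
    """
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_std)
            for t in range(n)]


def make_windows(series, n_timestamp, n_out):
    """Split a series into (input, target) pairs with a sliding window.

    Each input holds n_timestamp consecutive values and each target holds
    the n_out values that immediately follow it.
    """
    if n_timestamp < 1 or n_out < 1:
        raise ValueError("n_timestamp and n_out must be positive")
    pairs = []
    last_start = len(series) - n_timestamp - n_out
    for start in range(last_start + 1):
        split = start + n_timestamp
        pairs.append((series[start:split], series[split:split + n_out]))
    return pairs


series = noisy_signal(n=4000)
pairs = make_windows(series, n_timestamp=10, n_out=2)
print(len(pairs))  # 3989
```

With 4000 samples, 10 input steps and 2 output steps, the window can start at 3989 different positions, which is the number of pairs produced.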

Network data

hyppo.datasets.network.get_data(data_path, link, n_out, n_timestamp, verbose=False, **kwargs)

PyTorch data loaders