deepdow.data module

Collection of functions related to data.

class Compose(transforms)

Bases: object

Meta transform inspired by torchvision.

Parameters

transforms (list) – List of callables that represent transforms to be composed.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Transformed version of X_sample.

  • y_sample_new (torch.Tensor) – Transformed version of y_sample.

  • timestamps_sample_new (datetime) – Transformed version of timestamps_sample.

  • asset_names_new – Transformed version of asset_names.
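As a rough illustration of the chaining described above, here is a minimal, dependency-free sketch. ComposeSketch, double_features and upper_case_names are hypothetical names introduced for this example, and nested lists stand in for torch.Tensor; the only thing taken from the documentation is the four-argument call signature that every transform shares.

```python
class ComposeSketch:
    """Hypothetical stand-in that applies transforms one after another."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, X_sample, y_sample, timestamps_sample, asset_names):
        # Each transform takes and returns the same four-element tuple,
        # so the output of one can be fed directly into the next.
        for transform in self.transforms:
            X_sample, y_sample, timestamps_sample, asset_names = transform(
                X_sample, y_sample, timestamps_sample, asset_names
            )
        return X_sample, y_sample, timestamps_sample, asset_names


def double_features(X_sample, y_sample, timestamps_sample, asset_names):
    # Hypothetical transform: doubles every feature, leaves the rest untouched.
    X_new = [[[2 * v for v in row] for row in channel] for channel in X_sample]
    return X_new, y_sample, timestamps_sample, asset_names


def upper_case_names(X_sample, y_sample, timestamps_sample, asset_names):
    # Hypothetical transform: only changes the asset names.
    return X_sample, y_sample, timestamps_sample, [n.upper() for n in asset_names]


composed = ComposeSketch([double_features, upper_case_names])
X = [[[1.0, 2.0], [3.0, 4.0]]]  # shape (n_channels=1, lookback=2, n_assets=2)
y = [[[0.5, 0.5]]]              # shape (n_channels=1, horizon=1, n_assets=2)
X_new, y_new, ts_new, names_new = composed(X, y, "2020-01-01", ["aapl", "msft"])
```

Because every transform shares one signature, a composed object can itself be passed wherever a single transform is expected.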

class Dropout(p=0.2, training=True)

Bases: object

Set random elements of the input to zero with probability p.

Parameters
  • p (float) – Probability of setting an element to zero.

  • training (bool) – If False, dropout is disabled regardless of the value of p.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some elements being set to zero.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.
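The element-wise behaviour described above can be sketched without torch as follows. dropout_sketch and its seed argument are hypothetical, nested lists stand in for tensors, and the sketch assumes surviving elements are left unscaled; whether the actual transform rescales them is not stated in this reference.

```python
import random


def dropout_sketch(X_sample, p=0.2, training=True, seed=None):
    """Zero each element independently with probability p (illustrative only)."""
    if not training:
        # With training=False the input passes through untouched, whatever p is.
        return X_sample
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so p=0 never drops and p=1 always drops.
    return [
        [[0.0 if rng.random() < p else v for v in row] for row in channel]
        for channel in X_sample
    ]


X = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]  # shape (n_channels=1, lookback=2, n_assets=3)
X_eval = dropout_sketch(X, p=0.9, training=False)
X_all_dropped = dropout_sketch(X, p=1.0)
X_none_dropped = dropout_sketch(X, p=0.0)
```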

class FlexibleDataLoader(dataset, indices=None, n_assets_range=None, lookback_range=None, horizon_range=None, asset_ixs=None, scaler=None, **kwargs)

Bases: torch.utils.data.dataloader.DataLoader

Flexible data loader.

Flexible data loader is well suited for training because one can train the network on different lookbacks, horizons and assets. However, it is not well suited for validation.

Parameters
  • dataset (InRAMDataset) – Dataset containing the actual data.

  • indices (list or None) – List of indices to consider from the provided dataset which is inherently ordered. If None then considering all the samples.

  • n_assets_range (tuple or None) – Only used if asset_ixs is None. Minimum and maximum (only left included) number of assets that are randomly subselected.

  • lookback_range (tuple or None) – Minimum and maximum (only left included) of the lookback that is uniformly sampled. If not specified then using (2, dataset.lookback + 1) which is the biggest range.

  • horizon_range (tuple or None) – Minimum and maximum (only left included) of the horizon that is uniformly sampled. If not specified then using (2, dataset.horizon + 1) which is the biggest range.

  • asset_ixs (None or list) – If None, and n_assets_range specified then n_assets sampled randomly based on n_assets_range. If list then it represents the indices of desired assets - no randomness. If both asset_ixs and n_assets_range are None then asset_ixs automatically assumed to be all possible indices.

  • scaler (None or {'standard', 'percent'}) – If None then no scaling applied. If string then a specific scaling scheme. Only applied to X_batch.

property hparams

Generate dictionary of relevant parameters.
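To make the "only left included" convention of lookback_range and horizon_range concrete, here is a small sketch of how an unspecified range could fall back to the widest one, (2, dataset_max + 1). resolve_range and dataset_max are hypothetical names used only for illustration; they are not part of the loader's interface.

```python
def resolve_range(user_range, dataset_max):
    """Return the user's range, or the widest left-inclusive range if None."""
    if user_range is None:
        # The right end is excluded, hence the + 1 to make dataset_max reachable.
        return (2, dataset_max + 1)
    return user_range


lookback_range = resolve_range(None, dataset_max=20)
possible_lookbacks = list(range(*lookback_range))  # values that can be sampled
custom_range = resolve_range((5, 10), dataset_max=20)
```

With a dataset lookback of 20, the fallback range admits every lookback from 2 up to and including 20.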

class InRAMDataset(X, y, timestamps=None, asset_names=None, transform=None)

Bases: torch.utils.data.dataset.Dataset

Dataset that lives entirely in RAM.

Parameters
  • X (np.ndarray) – Full features dataset of shape (n_samples, n_input_channels, lookback, n_assets).

  • y (np.ndarray) – Full targets dataset of shape (n_samples, n_input_channels, horizon, n_assets).

  • timestamps (None or array-like) – If not None then of shape (n_samples,) representing a timestamp for each sample.

  • asset_names (None or array-like) – If not None then of shape (n_assets, ) representing the names of assets.

  • transform (None or callable) – If provided, then a callable that transforms a single sample.

class Multiply(c=100)

Bases: object

Transform multiplying the feature tensor X with a constant.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) multiplied by a constant self.c.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.

class Noise(frac=0.2)

Bases: object

Add noise to each of the channels.

Random (Gaussian) noise is added to the original features X. One can control the standard deviation of the noise via the frac parameter. Mathematically, std(X_noise) = std(X) * frac for each channel.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some added noise.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.
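The per-channel relation std(X_noise) = std(X) * frac can be sketched as below. add_noise_sketch and its seed argument are hypothetical, nested lists stand in for tensors, and the use of the sample standard deviation is an assumption made for this example.

```python
import random
import statistics


def add_noise_sketch(X_sample, frac=0.2, seed=None):
    """Add zero-mean Gaussian noise whose std is frac times each channel's std."""
    rng = random.Random(seed)
    X_new = []
    for channel in X_sample:
        # Flatten the (lookback, n_assets) grid to measure the channel's spread.
        values = [v for row in channel for v in row]
        noise_std = frac * statistics.stdev(values)  # assumption: sample std
        X_new.append(
            [[v + rng.gauss(0.0, noise_std) for v in row] for row in channel]
        )
    return X_new


X = [[[1.0, 2.0], [3.0, 4.0]]]  # shape (n_channels=1, lookback=2, n_assets=2)
X_noisy = add_noise_sketch(X, frac=0.2, seed=0)
X_unchanged = add_noise_sketch(X, frac=0.0)
```

Setting frac to zero gives noise with zero standard deviation, so the features come back unchanged.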

class RigidDataLoader(dataset, asset_ixs=None, indices=None, lookback=None, horizon=None, scaler=None, **kwargs)

Bases: torch.utils.data.dataloader.DataLoader

Rigid data loader.

Rigid data loader is well suited for validation purposes since the horizon, lookback and assets are all frozen. However, it might not be that good for training since it forces the user to choose a single setup.

Parameters
  • dataset (torch.utils.data.Dataset) – Instance of our dataset. See InRAMDataset for more details.

  • asset_ixs (list or None) – Represents indices of considered assets (not asset names). If None then considering all assets.

  • indices (list or None) – List of indices to consider (not timestamps) from the provided dataset which is inherently ordered. If None then consider all the samples.

  • lookback (int or None) – How many time steps we look back. If None then taking the maximum lookback from dataset.

  • horizon (int or None) – How many time steps we look forward. If None then taking the maximum horizon from dataset.

  • scaler (None or {'standard', 'percent'}) – If None then no scaling applied. If string then a specific scaling scheme. Only applied to X_batch.

property hparams

Generate dictionary of relevant parameters.

collate_uniform(batch, n_assets_range=(5, 10), lookback_range=(2, 20), horizon_range=(3, 15), asset_ixs=None, random_state=None, scaler=None)

Create batch of samples.

Randomly (from a uniform distribution) selects assets, lookback and horizon. If asset_ixs is specified then the assets are kept constant.

Parameters
  • batch (list) – List of tuples representing (X_sample, y_sample, timestamp_sample, asset_names). Note that the sample dimension is not present and all the other dimensions are full (as determined by the dataset).

  • n_assets_range (tuple) – Minimum and maximum (only left included) number of assets that are randomly subselected. Ignored if asset_ixs specified.

  • lookback_range (tuple) – Minimum and maximum (only left included) of the lookback that is randomly selected.

  • horizon_range (tuple) – Minimum and maximum (only left included) of the horizon that is randomly selected.

  • asset_ixs (None or list) – If None, then n_assets sampled randomly. If list then it represents the indices of desired assets - no randomness and n_assets_range is not used.

  • random_state (int or None) – Random state.

  • scaler (None or {'standard', 'percent'}) – If None then no scaling applied. If string then a specific scaling scheme. Only applied to X_batch.

Returns

  • X_batch (torch.Tensor) – Features batch of shape (batch_size, n_input_channels, sampled_lookback, n_sampled_assets).

  • y_batch (torch.Tensor) – Targets batch of shape (batch_size, n_input_channels, sampled_horizon, n_sampled_assets).

  • timestamps_batch (list) – List of timestamps (per sample).

  • asset_names_batch (list) – List of asset names in the batch (same for each sample).
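The sampling step described above might look roughly like the following sketch. sample_setup_sketch and n_assets_total are hypothetical names, and Python's random module is used in place of whatever random number generator the function actually relies on; only the left-inclusive ranges and the asset_ixs override come from the parameter descriptions.

```python
import random


def sample_setup_sketch(
    n_assets_total,
    n_assets_range=(5, 10),
    lookback_range=(2, 20),
    horizon_range=(3, 15),
    asset_ixs=None,
    random_state=None,
):
    """Draw one (lookback, horizon, asset_ixs) setup shared by the whole batch."""
    rng = random.Random(random_state)
    # randrange includes the left end and excludes the right end.
    lookback = rng.randrange(*lookback_range)
    horizon = rng.randrange(*horizon_range)
    if asset_ixs is None:
        # Pick how many assets to keep, then which ones (without replacement).
        n_assets = rng.randrange(*n_assets_range)
        asset_ixs = sorted(rng.sample(range(n_assets_total), n_assets))
    # When asset_ixs is given, n_assets_range is ignored and no randomness is used.
    return lookback, horizon, asset_ixs


lookback, horizon, chosen = sample_setup_sketch(n_assets_total=12, random_state=0)
_, _, fixed = sample_setup_sketch(n_assets_total=12, asset_ixs=[0, 3], random_state=0)
```

Since one setup is drawn per batch, every sample in the batch ends up with the same lookback, horizon and assets, which is what allows the samples to be stacked into a single tensor.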

scale_features(X, approach='standard')

Scale feature matrix.

Parameters
  • X (torch.Tensor) – Tensor of shape (n_samples, n_channels, lookback, n_assets). Unscaled.

  • approach (str, {'standard', 'percent'}) – How to scale features.

Returns

X_scaled – Tensor of shape (n_samples, n_channels, lookback, n_assets). Scaled.

Return type

torch.Tensor
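This reference does not spell out what the two approaches compute, so the sketch below rests on explicit assumptions: 'standard' is taken to mean zero mean and unit standard deviation per sample and channel, and 'percent' is taken to mean multiplication by 100. scale_features_sketch is a hypothetical name and nested lists stand in for tensors.

```python
import statistics


def scale_features_sketch(X, approach="standard"):
    """Scale a nested list of shape (n_samples, n_channels, lookback, n_assets)."""
    if approach not in {"standard", "percent"}:
        raise ValueError(f"Unsupported approach: {approach}")
    X_scaled = []
    for sample in X:
        new_sample = []
        for channel in sample:
            if approach == "percent":
                # Assumption: 'percent' turns fractions into percentage points.
                new_sample.append([[100 * v for v in row] for row in channel])
            else:
                # Assumption: 'standard' is a z-score over (lookback, n_assets).
                # A constant channel has zero std and would fail here.
                values = [v for row in channel for v in row]
                mean = statistics.fmean(values)
                std = statistics.stdev(values)
                new_sample.append(
                    [[(v - mean) / std for v in row] for row in channel]
                )
        X_scaled.append(new_sample)
    return X_scaled


X = [[[[1.0, 2.0], [3.0, 4.0]]]]  # shape (1, 1, lookback=2, n_assets=2)
X_percent = scale_features_sketch(X, approach="percent")
X_standard = scale_features_sketch(X, approach="standard")
```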