deepdow.data.augment module

Collection of callable functions that augment deepdow tensors.

class Compose(transforms)

Bases: object

Meta transform inspired by torchvision.

Parameters

transforms (list) – List of callables that represent transforms to be composed.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Transformed version of X_sample.

  • y_sample_new (torch.Tensor) – Transformed version of y_sample.

  • timestamps_sample_new (datetime) – Transformed version of timestamps_sample.

  • asset_names_new – Transformed version of asset_names.
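
The composition described above can be illustrated with a minimal sketch. It assumes only what the signature states: every transform accepts and returns the four-element tuple (X_sample, y_sample, timestamps_sample, asset_names). The names ComposeSketch, double_features and add_one are hypothetical and not part of deepdow, and plain Python lists stand in for tensors.

```python
class ComposeSketch:
    """Hypothetical stand-in for Compose: applies the transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, X_sample, y_sample, timestamps_sample, asset_names):
        # Each transform receives and returns the same four-element tuple,
        # so the output of one can be fed directly into the next.
        sample = (X_sample, y_sample, timestamps_sample, asset_names)
        for transform in self.transforms:
            sample = transform(*sample)
        return sample


def double_features(X_sample, y_sample, timestamps_sample, asset_names):
    # Hypothetical transform: doubles the features, leaves the rest untouched.
    return [2 * v for v in X_sample], y_sample, timestamps_sample, asset_names


def add_one(X_sample, y_sample, timestamps_sample, asset_names):
    # Hypothetical transform: shifts the features by one.
    return [v + 1 for v in X_sample], y_sample, timestamps_sample, asset_names


pipeline = ComposeSketch([double_features, add_one])
X_new, y_new, ts_new, names_new = pipeline(
    [1.0, 2.0], [0.5], "2020-01-01", ["A", "B"]
)
```

The order of the list matters: doubling first and then adding one yields a different result than the reverse.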

class Dropout(p=0.2, training=True)

Bases: object

Set random elements of the input to zero with probability p.

Parameters
  • p (float) – Probability of setting an element to zero.

  • training (bool) – If False, dropout is disabled regardless of p. If True, dropout is enabled and the elements that are kept are scaled by 1/(1 - p).

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some elements being set to zero.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.
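
As a rough illustration of the dropout behaviour, the sketch here uses NumPy arrays as stand-ins for torch tensors. It assumes the usual inverted-dropout convention (the one followed by torch.nn.functional.dropout), in which the elements that are kept are scaled by 1/(1 - p). The function dropout_sketch is hypothetical and not the library's implementation.

```python
import numpy as np


def dropout_sketch(X_sample, p=0.2, training=True, rng=None):
    """Hypothetical NumPy version of inverted dropout."""
    if not training or p == 0:
        # Dropout disabled: the features pass through unchanged.
        return X_sample.copy()
    rng = np.random.default_rng() if rng is None else rng
    # Keep each element independently with probability 1 - p.
    keep = rng.random(X_sample.shape) >= p
    # Rescale the kept elements so the expected value is preserved (assumption).
    return X_sample * keep / (1 - p)


X_sample = np.ones((2, 5, 3))  # (n_channels, lookback, n_assets)
X_new = dropout_sketch(X_sample, p=0.2, rng=np.random.default_rng(0))
X_eval = dropout_sketch(X_sample, p=0.2, training=False)
```

With an input of ones and p=0.2, every element of X_new is either 0 or 1.25, while X_eval is identical to the input.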

class Multiply(c=100)

Bases: object

Transform multiplying the feature tensor X with a constant.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) multiplied by a constant self.c.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.

class Noise(frac=0.2)

Bases: object

Add noise to each of the channels.

Random (Gaussian) noise is added to the original features X. One can control the standard deviation of the noise via the frac parameter. Mathematically, std(X_noise) = std(X) * frac for each channel.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some added noise.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.
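
The relation std(X_noise) = std(X) * frac can be checked numerically. The sketch here is one possible reading of it, with NumPy arrays standing in for torch tensors: the standard deviation is taken per channel over the lookback and asset dimensions, and Gaussian noise with that spread times frac is added. The function add_noise_sketch is hypothetical and not the library's code.

```python
import numpy as np


def add_noise_sketch(X_sample, frac=0.2, rng=None):
    """Hypothetical per-channel Gaussian noise, scaled by frac."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-channel standard deviation, kept as (n_channels, 1, 1) for broadcasting.
    channel_std = X_sample.std(axis=(1, 2), keepdims=True)
    noise = frac * channel_std * rng.standard_normal(X_sample.shape)
    return X_sample + noise


rng = np.random.default_rng(0)
# Two channels with very different scales: (n_channels, lookback, n_assets).
X_sample = rng.standard_normal((2, 50, 40)) * np.array([1.0, 10.0])[:, None, None]
X_new = add_noise_sketch(X_sample, frac=0.2, rng=rng)

# Ratio of noise spread to feature spread, one value per channel.
noise = X_new - X_sample
ratio = noise.std(axis=(1, 2)) / X_sample.std(axis=(1, 2))
```

Both entries of ratio come out close to 0.2 even though the channels differ in scale by a factor of ten, which is the point of scaling the noise per channel.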

class Scale(center, scale)

Bases: object

Scale input features.

The input features are centered to zero and scaled to one, channel by channel. We use the same terminology as scikit-learn; the equivalent in torchvision is Normalize.

Parameters
  • center (np.ndarray) – 1D array of shape (n_channels,) representing the center of the features (mean or median). Needs to be precomputed.

  • scale (np.ndarray) – 1D array of shape (n_channels,) representing the scale of the features (standard deviation or quantile range). Needs to be precomputed.

__call__(X_sample, y_sample, timestamps_sample, asset_names)

Perform transform.

Parameters
  • X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).

  • y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).

  • timestamps_sample (datetime) – Time stamp of the sample.

  • asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.

Returns

  • X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) scaled appropriately.

  • y_sample (torch.Tensor) – Same as input.

  • timestamps_sample (datetime) – Same as input.

  • asset_names – Same as input.
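
A minimal sketch of the per-channel scaling, assuming it amounts to (X - center) / scale with both arrays broadcast over the lookback and asset dimensions. NumPy arrays stand in for torch tensors and scale_sketch is a hypothetical helper. For brevity the statistics are computed from the sample itself; in practice they would be precomputed on the training data.

```python
import numpy as np


def scale_sketch(X_sample, center, scale):
    """Hypothetical per-channel centering and scaling."""
    # Reshape (n_channels,) to (n_channels, 1, 1) so each channel
    # is shifted and scaled by its own statistics.
    return (X_sample - center[:, None, None]) / scale[:, None, None]


X_sample = np.arange(24, dtype=float).reshape(2, 4, 3)  # (n_channels, lookback, n_assets)
center = X_sample.mean(axis=(1, 2))  # shape (n_channels,)
scale = X_sample.std(axis=(1, 2))    # shape (n_channels,)
X_new = scale_sketch(X_sample, center, scale)
```

After the transform each channel of X_new has mean zero and standard deviation one, while the shape is unchanged.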

prepare_robust_scaler(X, overlap=False, indices=None, percentile_range=(25, 75))

Compute median and percentile range for each channel.

Parameters
  • X (np.ndarray) – Full features array of shape (n_samples, n_channels, lookback, n_assets).

  • overlap (bool) – If False, only the most recent timestep of each sample is used. This guarantees that the same observation is not counted multiple times.

  • indices (list or None) – List of indices to consider from the X.shape[0] dimension. If None, all samples are considered.

  • percentile_range (tuple) – The lower and upper percentiles to consider. Both need to be in [0, 100].

Returns

  • medians (np.ndarray) – Median of each channel. Shape (n_channels,).

  • ranges (np.ndarray) – Interquantile range for each channel. Shape (n_channels,).
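
The parameters above suggest the following computation, written here as a hedged sketch and not as the library's actual code. It assumes that overlap=False keeps only the last lookback step of every sample and that the range is the upper percentile minus the lower one. The name robust_stats_sketch is hypothetical.

```python
import numpy as np


def robust_stats_sketch(X, overlap=False, indices=None, percentile_range=(25, 75)):
    """Hypothetical per-channel median and percentile range."""
    lookback = X.shape[2]
    # Without overlap only the most recent timestep of each sample is used.
    start = 0 if overlap else lookback - 1
    subset = X if indices is None else X[indices]
    considered = subset[:, :, start:, :]
    # Reduce over samples, timesteps and assets; one value per channel remains.
    medians = np.median(considered, axis=(0, 2, 3))
    lower, upper = np.percentile(considered, percentile_range, axis=(0, 2, 3))
    return medians, upper - lower


# Toy data: (n_samples, n_channels, lookback, n_assets), zeros except the last timestep.
X = np.zeros((4, 2, 3, 5))
X[:, 0, -1, :] = np.arange(20).reshape(4, 5)
X[:, 1, -1, :] = 10 * np.arange(20).reshape(4, 5)

medians, ranges = robust_stats_sketch(X)
medians_overlap, _ = robust_stats_sketch(X, overlap=True)
```

In this toy data the earlier timesteps are all zero, so including them with overlap=True pulls both medians down to zero, whereas the default uses only the values 0..19 (and ten times those) from the last timestep.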

prepare_standard_scaler(X, overlap=False, indices=None)

Compute mean and standard deviation for each channel.

Parameters
  • X (np.ndarray) – Full features array of shape (n_samples, n_channels, lookback, n_assets).

  • overlap (bool) – If False, only the most recent timestep of each sample is used. This guarantees that the same observation is not counted multiple times.

  • indices (list or None) – List of indices to consider from the X.shape[0] dimension. If None, all samples are considered.

Returns

  • means (np.ndarray) – Mean of each channel. Shape (n_channels,).

  • stds (np.ndarray) – Standard deviation of each channel. Shape (n_channels,).
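
A sketch of how the indices argument might be used to restrict the statistics to training samples. It assumes the population standard deviation (NumPy's default, ddof=0) and the same overlap convention as described above. The function standard_stats_sketch and the variable train_indices are hypothetical.

```python
import numpy as np


def standard_stats_sketch(X, overlap=False, indices=None):
    """Hypothetical per-channel mean and standard deviation."""
    lookback = X.shape[2]
    # Without overlap only the most recent timestep of each sample is used.
    start = 0 if overlap else lookback - 1
    subset = X if indices is None else X[indices]
    considered = subset[:, :, start:, :]
    # Reduce over samples, timesteps and assets; one value per channel remains.
    means = considered.mean(axis=(0, 2, 3))
    stds = considered.std(axis=(0, 2, 3))  # ddof=0 is an assumption
    return means, stds


# Toy data: (n_samples, n_channels, lookback, n_assets), zeros except the last timestep.
X = np.zeros((4, 2, 3, 5))
X[:, 0, -1, :] = np.arange(20).reshape(4, 5)
X[:, 1, -1, :] = 10 * np.arange(20).reshape(4, 5)

# Use only the first two samples, for instance the training part of a split.
train_indices = [0, 1]
means, stds = standard_stats_sketch(X, indices=train_indices)
```

The resulting arrays have shape (n_channels,) and could then serve as the center and scale arguments of Scale.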