deepdow.data module
Collection of functions related to data.
-
class Compose(transforms)
Bases: object
Meta transform inspired by torchvision.
- Parameters
transforms (list) – List of callables that represent transforms to be composed.
-
__call__(X_sample, y_sample, timestamps_sample, asset_names)
Transform.
- Parameters
X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).
y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).
timestamps_sample (datetime) – Time stamp of the sample.
asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.
- Returns
X_sample_new (torch.Tensor) – Transformed version of X_sample.
y_sample_new (torch.Tensor) – Transformed version of y_sample.
timestamps_sample_new (datetime) – Transformed version of timestamps_sample.
asset_names_new – Transformed version of asset_names.
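The composition can be pictured as a simple loop over the callables, each receiving the output of the previous one. The following is a minimal sketch rather than the library's implementation: ComposeSketch, double_features and add_one are hypothetical names, and plain Python lists stand in for tensors so that the snippet runs without torch.

```python
class ComposeSketch:
    """Hypothetical stand-in for Compose: applies transforms left to right."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, X_sample, y_sample, timestamps_sample, asset_names):
        # Each transform takes and returns the same 4-tuple.
        for transform in self.transforms:
            X_sample, y_sample, timestamps_sample, asset_names = transform(
                X_sample, y_sample, timestamps_sample, asset_names
            )
        return X_sample, y_sample, timestamps_sample, asset_names


def double_features(X, y, timestamp, names):
    # Illustrative transform: only the features change.
    return [2 * v for v in X], y, timestamp, names


def add_one(X, y, timestamp, names):
    # Illustrative transform: only the features change.
    return [v + 1 for v in X], y, timestamp, names


pipeline = ComposeSketch([double_features, add_one])
X_new, y_new, ts_new, names_new = pipeline([1.0, 2.0], [0.5], "2020-01-01", ["A", "B"])
```

Because the transforms run in list order, the features are doubled first and incremented second.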
-
class Dropout(p=0.2, training=True)
Bases: object
Set random elements of the input to zero with probability p.
- Parameters
p (float) – Probability of setting an element to zero.
training (bool) – If False, dropout is disabled regardless of the value of p.
-
__call__(X_sample, y_sample, timestamps_sample, asset_names)
Perform transform.
- Parameters
X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).
y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).
timestamps_sample (datetime) – Time stamp of the sample.
asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.
- Returns
X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some elements being set to zero.
y_sample (torch.Tensor) – Same as input.
timestamps_sample (datetime) – Same as input.
asset_names – Same as input.
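As an illustration of the behaviour described above, the sketch below zeroes each feature element independently with probability p and leaves the input untouched when training is False. It is an assumption-laden stand-in: dropout_sketch is a hypothetical name, NumPy arrays replace torch tensors, and the surviving elements are not rescaled (torch.nn.functional.dropout rescales them by 1 / (1 - p); whether Dropout does the same is not stated here).

```python
import numpy as np


def dropout_sketch(X_sample, p=0.2, training=True, rng=None):
    """Hypothetical sketch: zero each element of X_sample with probability p."""
    if not training:
        # Dropout is disabled, whatever the value of p.
        return X_sample.copy()
    rng = np.random.default_rng() if rng is None else rng
    # True where the element is kept, False where it is zeroed.
    keep_mask = rng.random(X_sample.shape) >= p
    return X_sample * keep_mask


X_demo = np.ones((2, 5, 3))  # (n_channels, lookback, n_assets)
X_dropped = dropout_sketch(X_demo, p=0.5, rng=np.random.default_rng(0))
X_untouched = dropout_sketch(X_demo, p=0.9, training=False)
```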
-
class FlexibleDataLoader(dataset, indices=None, n_assets_range=None, lookback_range=None, horizon_range=None, asset_ixs=None, scaler=None, **kwargs)
Bases: torch.utils.data.dataloader.DataLoader
Flexible data loader.
Flexible data loader is well suited for training because one can train the network on different lookbacks, horizons and assets. However, it is not well suited for validation.
- Parameters
dataset (InRAMDataset) – Dataset containing the actual data.
indices (list or None) – List of indices to consider from the provided dataset which is inherently ordered. If None then considering all the samples.
n_assets_range (tuple or None) – Only used if asset_ixs is None. Minimum and maximum (only left included) number of assets that are randomly subselected.
lookback_range (tuple or None) – Minimum and maximum (only left included) of the lookback that is uniformly sampled. If not specified then using (2, dataset.lookback + 1) which is the biggest range.
horizon_range (tuple or None) – Minimum and maximum (only left included) of the horizon that is uniformly sampled. If not specified then using (2, dataset.horizon + 1) which is the biggest range.
asset_ixs (None or list) – If None and n_assets_range is specified, then n_assets is sampled randomly based on n_assets_range. If list, then it represents the indices of the desired assets (no randomness). If both asset_ixs and n_assets_range are None, then asset_ixs is automatically assumed to be all possible indices.
scaler (None or {'standard', 'percent'}) – If None then no scaling is applied. If a string then the corresponding scaling scheme is used. Only applied to X_batch.
-
property hparams
Generate dictionary of relevant parameters.
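The "only left included" convention for the ranges means that the minimum can be drawn while the maximum cannot. A small sketch of that half-open sampling, using only the standard library, is given below. The names sample_from_range and dataset_lookback are hypothetical, and the loader's actual random number source may differ.

```python
import random


def sample_from_range(value_range, rng):
    """Hypothetical helper: draw an integer from [low, high), high excluded."""
    low, high = value_range
    return rng.randrange(low, high)


rng = random.Random(0)
dataset_lookback = 20  # assumed maximum lookback stored in the dataset
lookback_range = (2, dataset_lookback + 1)  # the default "biggest range"

# Every draw lies in 2..20; the right end, 21, is never produced.
draws = [sample_from_range(lookback_range, rng) for _ in range(1000)]
```

With the default range (2, dataset.lookback + 1), the full lookback of the dataset is therefore still reachable.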
-
class InRAMDataset(X, y, timestamps=None, asset_names=None, transform=None)
Bases: torch.utils.data.dataset.Dataset
Dataset that lives entirely in RAM.
- Parameters
X (np.ndarray) – Full features dataset of shape (n_samples, n_input_channels, lookback, n_assets).
y (np.ndarray) – Full targets dataset of shape (n_samples, n_input_channels, horizon, n_assets).
timestamps (None or array-like) – If not None then of shape (n_samples,) representing a timestamp for each sample.
asset_names (None or array-like) – If not None then of shape (n_assets, ) representing the names of assets.
transform (None or callable) – If provided, then a callable that transforms a single sample.
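To make the expected shapes concrete, the sketch below builds X and y from a synthetic matrix of returns using rolling windows and a single channel. This is only one possible way to prepare the arrays: the variable names, the synthetic data and the windowing scheme are assumptions, and the resulting arrays would then be passed as the X and y arguments.

```python
import numpy as np

n_timesteps, n_assets = 50, 4
lookback, horizon = 10, 5

# Synthetic returns of shape (n_timesteps, n_assets), for illustration only.
returns = np.random.default_rng(0).normal(size=(n_timesteps, n_assets))

X_list, y_list = [], []
for i in range(lookback, n_timesteps - horizon + 1):
    X_list.append(returns[i - lookback:i, :])  # the past `lookback` steps
    y_list.append(returns[i:i + horizon, :])   # the next `horizon` steps

# Insert a channel axis so the arrays have four dimensions.
X = np.stack(X_list)[:, None, :, :]  # (n_samples, 1, lookback, n_assets)
y = np.stack(y_list)[:, None, :, :]  # (n_samples, 1, horizon, n_assets)

timestamps = list(range(lookback, n_timesteps - horizon + 1))  # one per sample
asset_names = ["asset_{}".format(j) for j in range(n_assets)]
```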
-
class Multiply(c=100)
Bases: object
Transform multiplying the feature tensor X with a constant.
-
__call__(X_sample, y_sample, timestamps_sample, asset_names)
Perform transform.
- Parameters
X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).
y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).
timestamps_sample (datetime) – Time stamp of the sample.
asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.
- Returns
X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) multiplied by a constant self.c.
y_sample (torch.Tensor) – Same as input.
timestamps_sample (datetime) – Same as input.
asset_names – Same as input.
-
-
class Noise(frac=0.2)
Bases: object
Add noise to each of the channels.
Random (Gaussian) noise is added to the original features X. One can control the standard deviation of the noise via the frac parameter. Mathematically, std(X_noise) = std(X) * frac for each channel.
-
__call__(X_sample, y_sample, timestamps_sample, asset_names)
Perform transform.
- Parameters
X_sample (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets).
y_sample (torch.Tensor) – Target vector of shape (n_channels, horizon, n_assets).
timestamps_sample (datetime) – Time stamp of the sample.
asset_names – Asset names corresponding to the last dimension of X_sample and y_sample.
- Returns
X_sample_new (torch.Tensor) – Feature vector of shape (n_channels, lookback, n_assets) with some added noise.
y_sample (torch.Tensor) – Same as input.
timestamps_sample (datetime) – Same as input.
asset_names – Same as input.
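The relation std(X_noise) = std(X) * frac can be sketched as follows: the standard deviation of each channel is computed over the lookback and asset dimensions, and Gaussian noise scaled by that value times frac is added. The function name noise_sketch is hypothetical and NumPy arrays stand in for torch tensors, so details such as the exact standard deviation estimator may differ from the library.

```python
import numpy as np


def noise_sketch(X_sample, frac=0.2, rng=None):
    """Hypothetical sketch: add per-channel Gaussian noise to X_sample."""
    rng = np.random.default_rng() if rng is None else rng
    # One standard deviation per channel, shape (n_channels, 1, 1).
    channel_std = X_sample.std(axis=(1, 2), keepdims=True)
    noise = rng.standard_normal(X_sample.shape) * channel_std * frac
    return X_sample + noise


rng = np.random.default_rng(0)
# Two channels with very different scales: (n_channels, lookback, n_assets).
X_demo = rng.normal(size=(2, 200, 50)) * np.array([1.0, 10.0]).reshape(2, 1, 1)
X_noisy = noise_sketch(X_demo, frac=0.2, rng=rng)

# Ratio of noise std to signal std per channel; close to frac by construction.
ratio = (X_noisy - X_demo).std(axis=(1, 2)) / X_demo.std(axis=(1, 2))
```

Scaling the noise per channel keeps its relative magnitude the same for channels that live on different scales.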
-
-
class RigidDataLoader(dataset, asset_ixs=None, indices=None, lookback=None, horizon=None, scaler=None, **kwargs)
Bases: torch.utils.data.dataloader.DataLoader
Rigid data loader.
Rigid data loader is well suited for validation purposes since the horizon, lookback and assets are all frozen. However, it might not be that good for training since it forces the user to choose a single setup.
- Parameters
dataset (torch.utils.data.Dataset) – Instance of our dataset. See InRAMDataset for more details.
asset_ixs (list or None) – Represents indices of considered assets (not asset names). If None then considering all assets.
indices (list or None) – List of indices to consider (not timestamps) from the provided dataset which is inherently ordered. If None then consider all the samples.
lookback (int or None) – How many time steps do we look back. If None then taking the maximum lookback from dataset.
horizon (int or None) – How many time steps we look forward. If None then taking the maximum horizon from dataset.
scaler (None or {'standard', 'percent'}) – If None then no scaling is applied. If a string then the corresponding scaling scheme is used. Only applied to X_batch.
-
property hparams
Generate dictionary of relevant parameters.
-
collate_uniform(batch, n_assets_range=(5, 10), lookback_range=(2, 20), horizon_range=(3, 15), asset_ixs=None, random_state=None, scaler=None)
Create batch of samples.
Randomly (from uniform distribution) selects assets, lookback and horizon. If assets are specified then assets kept constant.
- Parameters
batch (list) – List of tuples representing (X_sample, y_sample, timestamp_sample, asset_names). Note that the sample dimension is not present and all the other dimensions are full (as determined by the dataset).
n_assets_range (tuple) – Minimum and maximum (only left included) number of assets that are randomly subselected. Ignored if asset_ixs specified.
lookback_range (tuple) – Minimum and maximum (only left included) of the lookback that is randomly selected.
horizon_range (tuple) – Minimum and maximum (only left included) of the horizon that is randomly selected.
asset_ixs (None or list) – If None, then n_assets sampled randomly. If list then it represents the indices of desired assets - no randomness and n_assets_range is not used.
random_state (int or None) – Random state.
scaler (None or {'standard', 'percent'}) – If None then no scaling is applied. If a string then the corresponding scaling scheme is used. Only applied to X_batch.
- Returns
X_batch (torch.Tensor) – Features batch of shape (batch_size, n_input_channels, sampled_lookback, n_sampled_assets).
y_batch (torch.Tensor) – Targets batch of shape (batch_size, n_input_channels, sampled_horizon, n_sampled_assets).
timestamps_batch (list) – List of timestamps (per sample).
asset_names_batch (list) – List of asset names in the batch (same for each sample).
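The sampling and slicing described above might look roughly like the sketch below. It is not the library's code: collate_uniform_sketch is a hypothetical name, NumPy replaces torch, scaling is omitted, and two conventions are assumed rather than documented here, namely that the most recent lookback steps of X and the first horizon steps of y are kept.

```python
import numpy as np


def collate_uniform_sketch(batch, n_assets_range=(5, 10), lookback_range=(2, 20),
                           horizon_range=(3, 15), asset_ixs=None, random_state=None):
    """Hypothetical sketch of uniform sampling of lookback, horizon and assets."""
    rng = np.random.default_rng(random_state)
    X_batch = np.stack([sample[0] for sample in batch])
    y_batch = np.stack([sample[1] for sample in batch])
    timestamps_batch = [sample[2] for sample in batch]
    asset_names = batch[0][3]

    # rng.integers excludes the upper bound, matching "only left included".
    lookback = int(rng.integers(*lookback_range))
    horizon = int(rng.integers(*horizon_range))
    if asset_ixs is None:
        n_assets = int(rng.integers(*n_assets_range))
        chosen = rng.choice(X_batch.shape[-1], size=n_assets, replace=False)
        asset_ixs = sorted(chosen.tolist())

    # Assumed convention: keep the latest lookback steps and earliest horizon steps.
    X_batch = X_batch[:, :, -lookback:, :][..., asset_ixs]
    y_batch = y_batch[:, :, :horizon, :][..., asset_ixs]
    asset_names_batch = [asset_names[ix] for ix in asset_ixs]
    return X_batch, y_batch, timestamps_batch, asset_names_batch


rng = np.random.default_rng(0)
names = ["asset_{}".format(j) for j in range(12)]
# Four samples with 1 channel, full lookback 25, full horizon 20 and 12 assets.
demo_batch = [(rng.normal(size=(1, 25, 12)), rng.normal(size=(1, 20, 12)), i, names)
              for i in range(4)]
X_b, y_b, ts_b, names_b = collate_uniform_sketch(demo_batch, random_state=1)
```

Every sample in a batch shares the same sampled lookback, horizon and assets, which is what allows the samples to be stacked into a single array.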
-
scale_features(X, approach='standard')
Scale feature matrix.
- Parameters
X (torch.Tensor) – Unscaled tensor of shape (n_samples, n_channels, lookback, n_assets).
approach (str, {'standard', 'percent'}) – How to scale features.
- Returns
X_scaled – Scaled tensor of shape (n_samples, n_channels, lookback, n_assets).
- Return type
torch.Tensor
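The two approaches are not defined in this reference, so the sketch below shows only one plausible reading, stated as an assumption: 'standard' subtracts the mean and divides by the standard deviation computed per sample and channel (over lookback and assets), and 'percent' multiplies by 100. The function name scale_features_sketch and the small constant added to the standard deviation are hypothetical, and NumPy stands in for torch.

```python
import numpy as np


def scale_features_sketch(X, approach="standard"):
    """Hypothetical sketch of the two scaling approaches (assumed semantics)."""
    if approach == "standard":
        # Statistics per sample and channel, taken over lookback and assets.
        means = X.mean(axis=(2, 3), keepdims=True)
        stds = X.std(axis=(2, 3), keepdims=True) + 1e-6  # avoid division by zero
        return (X - means) / stds
    if approach == "percent":
        # Assumed meaning: express fractional values as percentages.
        return X * 100
    raise ValueError("Unknown approach: {}".format(approach))


X_demo = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(3, 2, 10, 4))
X_standard = scale_features_sketch(X_demo, approach="standard")
X_percent = scale_features_sketch(X_demo, approach="percent")
```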