deepdow.data.load module

Collection of functions related to data.

class FlexibleDataLoader(dataset, indices=None, n_assets_range=None, lookback_range=None, horizon_range=None, asset_ixs=None, batch_size=1, drop_last=False, **kwargs)[source]

Bases: torch.utils.data.DataLoader

Flexible data loader.

The flexible data loader is well suited for training because the network can be trained on varying lookbacks, horizons and asset subsets. However, it is not well suited for validation.

Parameters
  • dataset (InRAMDataset) – Dataset containing the actual data.

  • indices (list or None) – List of indices to consider from the provided dataset, which is inherently ordered. If None, all samples are considered.

  • n_assets_range (tuple or None) – Only used if asset_ixs is None. Minimum (inclusive) and maximum (exclusive) number of assets that are randomly subselected.

  • lookback_range (tuple or None) – Minimum (inclusive) and maximum (exclusive) of the lookback that is uniformly sampled. If not specified, (2, dataset.lookback + 1) is used, which is the largest possible range.

  • horizon_range (tuple or None) – Minimum (inclusive) and maximum (exclusive) of the horizon that is uniformly sampled. If not specified, (2, dataset.horizon + 1) is used, which is the largest possible range.

  • asset_ixs (None or list) – If None and n_assets_range is specified, the number of assets is sampled randomly based on n_assets_range. If a list, it represents the indices of the desired assets and no randomness is involved. If both asset_ixs and n_assets_range are None, asset_ixs is assumed to be all possible indices.

  • batch_size (int) – Number of samples in a batch.

  • drop_last (bool) – If True, then the last batch that does not have batch_size samples is dropped.
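To make the per-batch randomness concrete, here is a minimal numpy sketch of the sampling idea: each batch draws its own lookback and asset subset within the configured ranges. All names and dimensions below are illustrative assumptions, not the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset dimensions (for illustration only).
n_samples, n_channels, max_lookback, n_assets = 100, 1, 30, 12
X = rng.normal(size=(n_samples, n_channels, max_lookback, n_assets))

lookback_range = (2, max_lookback + 1)   # minimum inclusive, maximum exclusive
n_assets_range = (5, 10)

# Each batch gets its own randomly sampled lookback and asset subset.
lookback = int(rng.integers(*lookback_range))
asset_ixs = rng.choice(n_assets, size=int(rng.integers(*n_assets_range)), replace=False)

# Crop the most recent `lookback` steps and keep only the sampled assets.
batch_ix = rng.choice(n_samples, size=4, replace=False)
X_batch = X[batch_ix][:, :, -lookback:, :][..., asset_ixs]
print(X_batch.shape)  # (4, n_channels, lookback, len(asset_ixs))
```

The key point is that `lookback` and `asset_ixs` are redrawn per batch, so successive batches expose the network to different temporal and cross-sectional configurations.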

batch_size: Optional[int]
dataset: torch.utils.data.dataset.Dataset[T_co]
drop_last: bool
property hparams

Generate dictionary of relevant parameters.

num_workers: int
pin_memory: bool
prefetch_factor: int
sampler: torch.utils.data.sampler.Sampler
timeout: float
class InRAMDataset(X, y, timestamps=None, asset_names=None, transform=None)[source]

Bases: torch.utils.data.Dataset

Dataset that lives entirely in RAM.

Parameters
  • X (np.ndarray) – Full features dataset of shape (n_samples, n_input_channels, lookback, n_assets).

  • y (np.ndarray) – Full targets dataset of shape (n_samples, n_input_channels, horizon, n_assets).

  • timestamps (None or array-like) – If not None then of shape (n_samples,) representing a timestamp for each sample.

  • asset_names (None or array-like) – If not None then of shape (n_assets, ) representing the names of assets.

  • transform (None or callable) – If provided, then a callable that transforms a single sample.
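The shape conventions above can be illustrated with a toy stand-in for the dataset. The class below is a hypothetical sketch assuming indexing along the first (sample) axis; it is not the library's implementation.

```python
import numpy as np

# Toy arrays following the documented shape conventions (random values).
n_samples, n_channels, lookback, horizon, n_assets = 50, 1, 10, 5, 4
X = np.random.normal(size=(n_samples, n_channels, lookback, n_assets))
y = np.random.normal(size=(n_samples, n_channels, horizon, n_assets))

class ToyInRAMDataset:
    """Minimal stand-in for InRAMDataset, indexing along the sample axis."""

    def __init__(self, X, y, timestamps=None, asset_names=None):
        self.X, self.y = X, y
        self.timestamps = timestamps if timestamps is not None else list(range(len(X)))
        self.asset_names = (asset_names if asset_names is not None
                            else [str(i) for i in range(X.shape[-1])])

    def __len__(self):
        return len(self.X)

    def __getitem__(self, ix):
        # A single sample drops the leading sample dimension.
        return self.X[ix], self.y[ix], self.timestamps[ix], self.asset_names

ds = ToyInRAMDataset(X, y)
X_sample, y_sample, ts, names = ds[0]
print(X_sample.shape, y_sample.shape)  # (1, 10, 4) (1, 5, 4)
```

Note that a single sample has shape (n_input_channels, lookback, n_assets); the sample dimension only appears once a collate function stacks samples into a batch.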

class RigidDataLoader(dataset, asset_ixs=None, indices=None, lookback=None, horizon=None, drop_last=False, batch_size=1, **kwargs)[source]

Bases: torch.utils.data.DataLoader

Rigid data loader.

The rigid data loader is well suited for validation purposes since the horizon, lookback and assets are all frozen. However, it is less suitable for training since it forces the user to choose a single setup.

Parameters
  • dataset (torch.utils.data.Dataset) – Instance of our dataset. See InRAMDataset for more details.

  • asset_ixs (list or None) – Indices of the considered assets (not asset names). If None, all assets are considered.

  • indices (list or None) – List of indices (not timestamps) to consider from the provided dataset, which is inherently ordered. If None, all samples are considered.

  • lookback (int or None) – How many time steps to look back. If None, the maximum lookback of the dataset is used.

  • horizon (int or None) – How many time steps to look forward. If None, the maximum horizon of the dataset is used.

  • batch_size (int) – Number of samples in a batch.

  • drop_last (bool) – If True, then the last batch that does not have batch_size samples is dropped.
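When the frozen lookback or horizon is smaller than the dataset's maximum, a natural way to realize it is by cropping each full sample. The sketch below assumes (this is an assumption, not a statement about the library's internals) that the most recent lookback steps of the features and the first horizon steps of the targets are kept.

```python
import numpy as np

# A full sample as stored in the dataset (illustrative shapes).
n_channels, max_lookback, max_horizon, n_assets = 1, 30, 10, 8
X_sample = np.arange(n_channels * max_lookback * n_assets, dtype=float).reshape(
    n_channels, max_lookback, n_assets)
y_sample = np.zeros((n_channels, max_horizon, n_assets))

lookback, horizon = 12, 4          # frozen for the whole loader
asset_ixs = [0, 2, 5]              # fixed asset subset

# Most recent `lookback` steps of the features, first `horizon` steps of targets.
X_crop = X_sample[:, -lookback:, :][..., asset_ixs]
y_crop = y_sample[:, :horizon, :][..., asset_ixs]
print(X_crop.shape, y_crop.shape)  # (1, 12, 3) (1, 4, 3)
```

Because lookback, horizon and asset_ixs never change between batches, validation metrics computed with this loader are directly comparable across epochs.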

batch_size: Optional[int]
dataset: torch.utils.data.dataset.Dataset[T_co]
drop_last: bool
property hparams

Generate dictionary of relevant parameters.

num_workers: int
pin_memory: bool
prefetch_factor: int
sampler: torch.utils.data.sampler.Sampler
timeout: float
collate_uniform(batch, n_assets_range=(5, 10), lookback_range=(2, 20), horizon_range=(3, 15), asset_ixs=None, random_state=None)[source]

Create batch of samples.

Randomly (from a uniform distribution) selects the assets, lookback and horizon. If asset_ixs is specified, the assets are kept constant.

Parameters
  • batch (list) – List of tuples representing (X_sample, y_sample, timestamp_sample, asset_names). Note that the sample dimension is not present and all the other dimensions are full (as determined by the dataset).

  • n_assets_range (tuple) – Minimum (inclusive) and maximum (exclusive) number of assets that are randomly subselected. Ignored if asset_ixs is specified.

  • lookback_range (tuple) – Minimum (inclusive) and maximum (exclusive) of the lookback that is randomly selected.

  • horizon_range (tuple) – Minimum (inclusive) and maximum (exclusive) of the horizon that is randomly selected.

  • asset_ixs (None or list) – If None, the number of assets is sampled randomly. If a list, it represents the indices of the desired assets; no randomness is involved and n_assets_range is not used.

  • random_state (int or None) – Random state.

Returns

  • X_batch (torch.Tensor) – Features batch of shape (batch_size, n_input_channels, sampled_lookback, n_sampled_assets).

  • y_batch (torch.Tensor) – Targets batch of shape (batch_size, n_input_channels, sampled_horizon, n_sampled_assets).

  • timestamps_batch (list) – List of timestamps (per sample).

  • asset_names_batch (list) – List of asset names in the batch (same for each sample).
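The collation logic can be sketched end to end. The function below is a toy numpy re-implementation of the idea described above, not the library's code: one lookback, horizon and asset subset are drawn per batch and applied to every sample. (The real collate_uniform returns torch.Tensor batches; numpy arrays are used here to keep the sketch dependency-light.)

```python
import numpy as np

def collate_uniform_sketch(batch, lookback_range=(2, 20), horizon_range=(3, 15),
                           n_assets_range=(5, 10), random_state=None):
    """Toy sketch of the collation idea.

    `batch` is a list of (X_sample, y_sample, timestamp, asset_names) tuples
    with full lookback/horizon/assets, as documented for the batch parameter.
    """
    rng = np.random.default_rng(random_state)
    n_assets_total = batch[0][0].shape[-1]

    # One draw per batch: all samples share the same configuration.
    lookback = int(rng.integers(*lookback_range))
    horizon = int(rng.integers(*horizon_range))
    asset_ixs = np.sort(rng.choice(n_assets_total,
                                   size=int(rng.integers(*n_assets_range)),
                                   replace=False))

    X_batch = np.stack([X[:, -lookback:, :][..., asset_ixs] for X, *_ in batch])
    y_batch = np.stack([y[:, :horizon, :][..., asset_ixs] for _, y, *_ in batch])
    timestamps_batch = [ts for _, _, ts, _ in batch]
    asset_names_batch = [batch[0][3][i] for i in asset_ixs]
    return X_batch, y_batch, timestamps_batch, asset_names_batch

# Example batch: 4 samples, 1 channel, lookback 25, horizon 15, 12 assets.
rng = np.random.default_rng(0)
names = [f"asset_{i}" for i in range(12)]
batch = [(rng.normal(size=(1, 25, 12)), rng.normal(size=(1, 15, 12)), t, names)
         for t in range(4)]
X_b, y_b, ts, nm = collate_uniform_sketch(batch, random_state=1)
print(X_b.shape, y_b.shape)
```

Passing random_state makes the draw reproducible, which is useful when debugging batch shapes.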