deepdow.data.load module
Collection of functions related to data.
class FlexibleDataLoader(dataset, indices=None, n_assets_range=None, lookback_range=None, horizon_range=None, asset_ixs=None, batch_size=1, drop_last=False, **kwargs)
Bases: Generic[torch.utils.data.dataloader.T_co]
Flexible data loader.
The flexible data loader is well suited for training because the network can be trained on varying lookbacks, horizons and assets. However, because these quantities are sampled randomly, it is not well suited for validation.
Parameters
dataset (InRAMDataset) – Dataset containing the actual data.
indices (list or None) – List of indices to consider from the provided dataset, which is inherently ordered. If None, all samples are considered.
n_assets_range (tuple or None) – Only used if asset_ixs is None. Minimum and maximum (minimum included, maximum excluded) number of assets that are randomly subselected.
lookback_range (tuple or None) – Minimum and maximum (minimum included, maximum excluded) of the uniformly sampled lookback. If None, defaults to (2, dataset.lookback + 1), the widest possible range.
horizon_range (tuple or None) – Minimum and maximum (minimum included, maximum excluded) of the uniformly sampled horizon. If None, defaults to (2, dataset.horizon + 1), the widest possible range.
asset_ixs (None or list) – If None and n_assets_range is specified, the number of assets is sampled randomly from n_assets_range. If a list, it holds the indices of the desired assets and no randomness is involved. If both asset_ixs and n_assets_range are None, asset_ixs defaults to all possible asset indices.
batch_size (int) – Number of samples in a batch.
drop_last (bool) – If True, the last batch is dropped when it contains fewer than batch_size samples.
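To make the documented defaults and half-open ranges concrete, here is a minimal NumPy sketch of how one (lookback, horizon, assets) setup could be drawn for a batch. The helpers `default_ranges` and `sample_setup` are hypothetical and not part of deepdow; they only mirror the parameter descriptions above, and drawing assets without replacement is an assumption.

```python
import numpy as np


def default_ranges(max_lookback, max_horizon, n_assets):
    """Hypothetical helper mirroring the documented FlexibleDataLoader defaults.

    All ranges are half-open: the minimum is included, the maximum is excluded.
    """
    lookback_range = (2, max_lookback + 1)  # default when lookback_range is None
    horizon_range = (2, max_horizon + 1)  # default when horizon_range is None
    asset_ixs = list(range(n_assets))  # default when asset_ixs and n_assets_range are None
    return lookback_range, horizon_range, asset_ixs


def sample_setup(lookback_range, horizon_range, n_assets_range, n_assets, rng):
    """Draw one (lookback, horizon, asset_ixs) configuration for a single batch."""
    lookback = int(rng.integers(*lookback_range))  # upper bound excluded
    horizon = int(rng.integers(*horizon_range))
    n_sampled = int(rng.integers(*n_assets_range))
    # Assumption: assets are drawn uniformly without replacement.
    asset_ixs = rng.choice(n_assets, size=n_sampled, replace=False).tolist()
    return lookback, horizon, asset_ixs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lb_range, hz_range, all_assets = default_ranges(max_lookback=40, max_horizon=20, n_assets=10)
    for _ in range(3):
        # A different setup per batch is what makes the loader "flexible".
        print(sample_setup(lb_range, hz_range, (5, 8), len(all_assets), rng))
```

Because the upper bound is excluded, the default (2, dataset.lookback + 1) lets the sampled lookback reach exactly dataset.lookback but never exceed it.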
batch_size: Optional[int]
dataset: torch.utils.data.dataset.Dataset[T_co]
drop_last: bool
property hparams – Generate a dictionary of relevant parameters.
num_workers: int
pin_memory: bool
prefetch_factor: int
sampler: torch.utils.data.sampler.Sampler
timeout: float
class InRAMDataset(X, y, timestamps=None, asset_names=None, transform=None)
Bases: Generic[torch.utils.data.dataset.T_co]
Dataset that lives entirely in RAM.
Parameters
X (np.ndarray) – Full features dataset of shape (n_samples, n_input_channels, lookback, n_assets).
y (np.ndarray) – Full targets dataset of shape (n_samples, n_input_channels, horizon, n_assets).
timestamps (None or array-like) – If not None, an array of shape (n_samples,) holding a timestamp for each sample.
asset_names (None or array-like) – If not None, an array of shape (n_assets,) holding the name of each asset.
transform (None or callable) – If provided, a callable applied to each individual sample.
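A simplified stand-in for InRAMDataset helps illustrate the documented shapes and the per-sample tuple (X_sample, y_sample, timestamp_sample, asset_names) that collate_uniform receives. `ToyInRAMDataset`, its fallback timestamps and asset names, and the transform signature `transform(X_sample, y_sample)` are all assumptions made for this sketch, not deepdow's actual implementation.

```python
import numpy as np


class ToyInRAMDataset:
    """Hypothetical, simplified stand-in for InRAMDataset (not deepdow's code)."""

    def __init__(self, X, y, timestamps=None, asset_names=None, transform=None):
        # Documented shapes: X is (n_samples, n_input_channels, lookback, n_assets)
        # and y is (n_samples, n_input_channels, horizon, n_assets).
        if X.ndim != 4 or y.ndim != 4:
            raise ValueError("X and y must be 4-dimensional")
        if (X.shape[0], X.shape[1], X.shape[3]) != (y.shape[0], y.shape[1], y.shape[3]):
            raise ValueError("X and y must agree on samples, channels and assets")

        n_samples, _, self.lookback, n_assets = X.shape
        self.horizon = y.shape[2]
        self.X, self.y = X, y
        # Assumption: fall back to integer timestamps and generic asset names.
        self.timestamps = list(range(n_samples)) if timestamps is None else list(timestamps)
        self.asset_names = (
            [f"asset_{i}" for i in range(n_assets)] if asset_names is None else list(asset_names)
        )
        if len(self.timestamps) != n_samples or len(self.asset_names) != n_assets:
            raise ValueError("timestamps or asset_names length does not match X")
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, ix):
        X_sample, y_sample = self.X[ix], self.y[ix]
        if self.transform is not None:
            # Assumption: the transform maps (X_sample, y_sample) to a new pair.
            X_sample, y_sample = self.transform(X_sample, y_sample)
        # Tuple layout matching the batch elements described for collate_uniform.
        return X_sample, y_sample, self.timestamps[ix], self.asset_names


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dataset = ToyInRAMDataset(rng.normal(size=(100, 1, 30, 8)), rng.normal(size=(100, 1, 10, 8)))
    X_s, y_s, ts, names = dataset[0]
    print(len(dataset), X_s.shape, y_s.shape, ts, len(names))  # 100 (1, 30, 8) (1, 10, 8) 0 8
```

Keeping the whole arrays in memory makes indexing a single sample a cheap view into X and y, which is the main appeal of an in-RAM design.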
class RigidDataLoader(dataset, asset_ixs=None, indices=None, lookback=None, horizon=None, drop_last=False, batch_size=1, **kwargs)
Bases: Generic[torch.utils.data.dataloader.T_co]
Rigid data loader.
The rigid data loader is well suited for validation because the horizon, lookback and assets are all fixed. However, it may be less suitable for training because it forces the user to commit to a single setup.
Parameters
dataset (torch.utils.data.Dataset) – Instance of the dataset. See InRAMDataset for more details.
asset_ixs (list or None) – Indices of the considered assets (not asset names). If None, all assets are considered.
indices (list or None) – List of indices (not timestamps) to consider from the provided dataset, which is inherently ordered. If None, all samples are considered.
lookback (int or None) – Number of time steps to look back. If None, the maximum lookback of the dataset is used.
horizon (int or None) – Number of time steps to look forward. If None, the maximum horizon of the dataset is used.
batch_size (int) – Number of samples in a batch.
drop_last (bool) – If True, the last batch is dropped when it contains fewer than batch_size samples.
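The fixed setup can be sketched in plain NumPy as slicing every sample to the same lookback, horizon and assets, then grouping the indices into batches. `rigid_batches` and `n_batches` are hypothetical helpers, not deepdow's API. Keeping the last `lookback` steps of X and the first `horizon` steps of y is an assumption not stated in this reference, while the batch count follows the usual PyTorch DataLoader behaviour for drop_last.

```python
import math

import numpy as np


def n_batches(n_indices, batch_size, drop_last):
    """Number of batches a loader yields over n_indices samples."""
    return n_indices // batch_size if drop_last else math.ceil(n_indices / batch_size)


def rigid_batches(X, y, indices=None, asset_ixs=None, lookback=None, horizon=None,
                  batch_size=1, drop_last=False):
    """Hypothetical rigid batching over in-RAM arrays (not deepdow's code).

    X: (n_samples, n_input_channels, max_lookback, n_assets)
    y: (n_samples, n_input_channels, max_horizon, n_assets)
    """
    # Documented defaults: all samples, all assets, maximum lookback and horizon.
    indices = list(range(len(X))) if indices is None else list(indices)
    asset_ixs = list(range(X.shape[3])) if asset_ixs is None else list(asset_ixs)
    lookback = X.shape[2] if lookback is None else lookback
    horizon = y.shape[2] if horizon is None else horizon

    for start in range(0, len(indices), batch_size):
        batch_ix = indices[start:start + batch_size]
        if drop_last and len(batch_ix) < batch_size:
            break  # incomplete final batch is dropped
        # Assumption: keep the latest `lookback` steps of X and the first `horizon` steps of y.
        X_batch = X[batch_ix][:, :, -lookback:][..., asset_ixs]
        y_batch = y[batch_ix][:, :, :horizon][..., asset_ixs]
        yield X_batch, y_batch


if __name__ == "__main__":
    X = np.arange(10 * 1 * 6 * 4, dtype=float).reshape(10, 1, 6, 4)
    y = np.arange(10 * 1 * 3 * 4, dtype=float).reshape(10, 1, 3, 4)
    batches = list(rigid_batches(X, y, asset_ixs=[0, 2], lookback=4, horizon=2,
                                 batch_size=4, drop_last=True))
    print(len(batches), batches[0][0].shape, batches[0][1].shape)  # 2 (4, 1, 4, 2) (4, 1, 2, 2)
```

With 10 samples and batch_size=4, drop_last=True discards the final batch of 2, leaving 2 batches; with drop_last=False there would be 3.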
batch_size: Optional[int]
dataset: torch.utils.data.dataset.Dataset[T_co]
drop_last: bool
property hparams – Generate a dictionary of relevant parameters.
num_workers: int
pin_memory: bool
prefetch_factor: int
sampler: torch.utils.data.sampler.Sampler
timeout: float
collate_uniform(batch, n_assets_range=(5, 10), lookback_range=(2, 20), horizon_range=(3, 15), asset_ixs=None, random_state=None)
Create a batch of samples.
Randomly selects assets, lookback and horizon, each from a uniform distribution. If asset_ixs is specified, the assets are kept constant.
Parameters
batch (list) – List of tuples (X_sample, y_sample, timestamp_sample, asset_names). The sample dimension is not present, and all other dimensions have their full size as determined by the dataset.
n_assets_range (tuple) – Minimum and maximum (minimum included, maximum excluded) number of assets that are randomly subselected. Ignored if asset_ixs is specified.
lookback_range (tuple) – Minimum and maximum (minimum included, maximum excluded) of the randomly selected lookback.
horizon_range (tuple) – Minimum and maximum (minimum included, maximum excluded) of the randomly selected horizon.
asset_ixs (None or list) – If None, the assets are sampled randomly. If a list, it holds the indices of the desired assets; no randomness is involved and n_assets_range is not used.
random_state (int or None) – Random state (seed) controlling the sampling.
Returns
X_batch (torch.Tensor) – Features batch of shape (batch_size, n_input_channels, sampled_lookback, n_sampled_assets).
y_batch (torch.Tensor) – Targets batch of shape (batch_size, n_input_channels, sampled_horizon, n_sampled_assets).
timestamps_batch (list) – List of timestamps (per sample).
asset_names_batch (list) – List of asset names in the batch (same for each sample).
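The behaviour described for collate_uniform can be approximated in NumPy as follows. `collate_uniform_sketch` is a hypothetical re-creation, not deepdow's code: it returns NumPy arrays instead of torch.Tensor objects, and it assumes the most recent `lookback` steps of X and the first `horizon` steps of y are kept.

```python
import numpy as np


def collate_uniform_sketch(batch, n_assets_range=(5, 10), lookback_range=(2, 20),
                           horizon_range=(3, 15), asset_ixs=None, random_state=None):
    """Hypothetical NumPy re-creation of the documented collate_uniform behaviour.

    Each batch element is (X_sample, y_sample, timestamp_sample, asset_names) with
    X_sample of shape (n_input_channels, max_lookback, n_assets) and y_sample of
    shape (n_input_channels, max_horizon, n_assets). Ranges are half-open [min, max).
    """
    rng = np.random.default_rng(random_state)
    _, max_lookback, n_assets_max = batch[0][0].shape
    max_horizon = batch[0][1].shape[1]

    if asset_ixs is None:
        # Random subset of assets; the same subset is used for every sample.
        n_assets = int(rng.integers(*n_assets_range))
        asset_ixs = rng.choice(n_assets_max, size=n_assets, replace=False)
    asset_ixs = np.asarray(asset_ixs)

    # One lookback and one horizon per batch, shared by all samples.
    lookback = int(rng.integers(*lookback_range))
    horizon = int(rng.integers(*horizon_range))
    if lookback > max_lookback or horizon > max_horizon:
        raise ValueError("sampled lookback/horizon exceeds what the dataset provides")

    # Assumption: keep the latest `lookback` steps of X and the first `horizon` steps of y.
    X_batch = np.stack([sample[0][:, -lookback:, asset_ixs] for sample in batch])
    y_batch = np.stack([sample[1][:, :horizon, asset_ixs] for sample in batch])
    timestamps_batch = [sample[2] for sample in batch]
    asset_names_batch = [batch[0][3][i] for i in asset_ixs]
    return X_batch, y_batch, timestamps_batch, asset_names_batch


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    names = [f"asset_{i}" for i in range(12)]
    batch = [(rng.normal(size=(2, 30, 12)), rng.normal(size=(2, 20, 12)), t, names) for t in range(4)]
    X_b, y_b, ts_b, names_b = collate_uniform_sketch(batch, random_state=0)
    print(X_b.shape, y_b.shape, ts_b, names_b)
```

Sampling lookback, horizon and assets once per batch (rather than per sample) keeps every sample in the batch the same shape, so they can be stacked into a single array.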