Networks¶
The main goal of deepdow is to provide easy access to building end-to-end differentiable portfolio allocation neural networks. This section proposes multiple networks and shows how to create new ones. To better understand this section, we encourage the reader to go through the previous sections first. In particular, Basics discusses the overall pipeline, and Layers describes in detail the building blocks of the networks.
A portfolio allocation neural network \(F\) is a function that takes as input a raw feature tensor x of shape (n_channels, lookback, n_assets) and outputs an allocation vector w of shape (n_assets,) such that \(\sum_{i} w_{i} = 1\). The last requirement is that this function is parametrized by a vector \(\theta\).
Ideally, this network should propose the best portfolio w to be held for horizon time steps, given what happened in the market up until now (captured by x). What best actually means depends on the loss function the network was trained with.
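As a minimal illustration of this input/output contract, consider the sketch below (including the sample dimension discussed in the warning further down). The uniform allocator is only a stand-in for a real network \(F\); the point is the shapes and the sum constraint.
import torch

n_samples, n_channels, lookback, n_assets = 4, 2, 30, 10
x = torch.randn(n_samples, n_channels, lookback, n_assets)

# Stand-in for F: allocate uniformly regardless of x.
weights = torch.full((n_samples, n_assets), 1 / n_assets)

assert weights.shape == (n_samples, n_assets)
assert torch.allclose(weights.sum(dim=1), torch.ones(n_samples))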
In this section, we propose multiple architectures. They are by no means optimal and should merely serve as examples.
Existing networks¶
deepdow offers multiple networks that attempt to demonstrate the versatility of the framework. Note that these networks are by no means ideal, and users are encouraged to write custom networks better suited to their use case; see Writing custom networks below. For the exact usage, see the deepdow.nn module.
Warning
All deepdow
networks assume that the input and output tensors have an extra dimension
in the front—the sample dimension. We omit this dimension on purpose to make the examples
and sketches simpler.
BachelierNet¶
This network relies on an RNN to extract features and on a convex optimizer to find the allocation.
To determine the inputs of this convex optimizer (NumericalMarkowitz), we make alpha and gamma learnable but independent of the sample. On the other hand, rets and covmat are functions of the input sample x.
This network has non-trivial branching:
Covariance matrix:

- input x: (n_channels, lookback, n_assets)
- normalized: (n_channels, lookback, n_assets)
- first channel: (lookback, n_assets)
- computed covariance matrix over assets (sketched below): (n_assets, n_assets)

Expected returns:

- input x: (n_channels, lookback, n_assets)
- normalized: (n_channels, lookback, n_assets)
- 1D RNN hidden states: (hidden_size, lookback, n_assets)
- dropped out: (hidden_size, lookback, n_assets)
- attention over lookback: (hidden_size, n_assets)
- average over channels: (n_assets,)
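To make the covariance branch concrete, here is a plain-torch sketch of its final step. The actual CovarianceMatrix layer may differ in details (for instance optional shrinkage), so treat this as an illustration of the idea only.
import torch

lookback, n_assets = 30, 10
first_channel = torch.randn(lookback, n_assets)  # normalized first channel of x

# Sample covariance over assets: each column is one asset's time series.
centered = first_channel - first_channel.mean(dim=0, keepdim=True)
covmat = centered.T @ centered / (lookback - 1)

assert covmat.shape == (n_assets, n_assets)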
from deepdow.nn import BachelierNet
n_input_channels = 2
n_assets = 10
max_weight = 0.5
hidden_size = 32
network = BachelierNet(n_input_channels, n_assets, hidden_size=hidden_size, max_weight=max_weight)
print(network)
BachelierNet(
(norm_layer): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(transform_layer): RNN(
(cell): LSTM(2, 16, bidirectional=True)
)
(dropout_layer): Dropout(p=0.5, inplace=False)
(time_collapse_layer): AttentionCollapse(
(affine): Linear(in_features=32, out_features=32, bias=True)
(context_vector): Linear(in_features=32, out_features=1, bias=False)
)
(covariance_layer): CovarianceMatrix()
(channel_collapse_layer): AverageCollapse()
(portfolio_opt_layer): NumericalMarkowitz(
(cvxpylayer): CvxpyLayer()
)
)
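Continuing the snippet above, a forward pass maps a batch of feature tensors to valid allocations. This is only a sketch with random data; real inputs would come from a data loader.
import torch

n_samples, lookback = 4, 30
x = torch.randn(n_samples, n_input_channels, lookback, n_assets)
weights = network(x)

assert weights.shape == (n_samples, n_assets)
assert torch.allclose(weights.sum(dim=1), torch.ones(n_samples), atol=1e-4)
assert (weights <= max_weight + 1e-4).all()  # enforced by the convex optimizer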
KeynesNet¶
This network connects 1D convolutions (or an RNN) with softmax allocation. Note that this network learns the temperature parameter used inside the SoftmaxAllocator.
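To build intuition for what the temperature does, below is a standalone sketch of a temperature-scaled softmax, which is the core idea behind SoftmaxAllocator (the layer itself offers more options).
import torch

logits = torch.tensor([1.0, 2.0, 3.0])
for temperature in (0.1, 1.0, 10.0):
    weights = torch.softmax(logits / temperature, dim=0)
    print(temperature, weights)  # low temperature -> concentrated, high -> near-uniform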
The activations have the following shapes (omitting the sample dimension):

- input x: (n_channels, lookback, n_assets)
- instance normalized: (n_channels, lookback, n_assets)
- extracted features (RNN or 1D Conv): (hidden_size, lookback, n_assets)
- group normalized: (hidden_size, lookback, n_assets)
- relu: (hidden_size, lookback, n_assets)
- average over lookback: (hidden_size, n_assets)
- average over channels: (n_assets,)
- softmax allocation: (n_assets,)
from deepdow.nn import KeynesNet
n_input_channels = 2
hidden_size = 32
n_groups = 4
transform_type = 'Conv'
network = KeynesNet(n_input_channels,
hidden_size=hidden_size,
transform_type=transform_type,
n_groups=n_groups)
print(network)
KeynesNet(
(transform_layer): Conv(
(conv): Conv1d(2, 32, kernel_size=(3,), stride=(1,), padding=(1,))
)
(norm_layer_1): InstanceNorm2d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=False)
(norm_layer_2): GroupNorm(4, 32, eps=1e-05, affine=True)
(time_collapse_layer): AverageCollapse()
(channel_collapse_layer): AverageCollapse()
(portfolio_opt_layer): SoftmaxAllocator(
(layer): Softmax(dim=1)
)
)
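Switching the feature extractor only requires changing transform_type. A sketch of the RNN variant, assuming 'RNN' is the other accepted value (as the description above suggests):
network_rnn = KeynesNet(n_input_channels,
                        hidden_size=hidden_size,
                        transform_type='RNN',
                        n_groups=n_groups)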
LinearNet¶
This network is very particular since it uses no structural information contained in the input x. In other words, if we randomly shuffled all our inputs along any dimension and retrained the network, it would yield the same predictions. Note that this network learns the temperature parameter used inside the SoftmaxAllocator.
The activations have the following shapes (omitting the sample dimension):

- input x: (n_channels, lookback, n_assets)
- flattened: (n_channels * lookback * n_assets,)
- normalized: (n_channels * lookback * n_assets,)
- dropped out: (n_channels * lookback * n_assets,)
- after dense layer (multivariate linear model): (n_assets,)
- after allocation: (n_assets,)
from deepdow.nn import LinearNet
n_channels, lookback, n_assets = 2, 30, 10
network = LinearNet(n_channels, lookback, n_assets)
print(network)
LinearNet(
(norm_layer): BatchNorm1d(600, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(dropout_layer): Dropout(p=0.5, inplace=False)
(linear): Linear(in_features=600, out_features=10, bias=True)
(allocate_layer): SoftmaxAllocator(
(layer): Softmax(dim=1)
)
)
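The 600 features appearing in BatchNorm1d and Linear above come directly from flattening: n_channels * lookback * n_assets = 2 * 30 * 10 = 600. A forward pass sketch (more than one sample is needed in training mode because of BatchNorm1d):
import torch

n_samples = 4
x = torch.randn(n_samples, n_channels, lookback, n_assets)
weights = network(x)

assert weights.shape == (n_samples, n_assets)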
MinimalNet¶
MinimalNet
is the simplest network. It does not pay any attention to input features
and only learns a fixed weight vector that is predicted for all samples. It is a wrapper
around the WeightNorm layer.
The activations have the following shapes (omitting the sample dimension):

- input x: (n_channels, lookback, n_assets)
- output w: (n_assets,)
Note
The reason why we still need to feed the feature tensor x during the forward pass is to extract the required number of samples (x.shape[0]).
from deepdow.nn import MinimalNet
n_assets = 10
network = MinimalNet(n_assets)
print(network)
assert sum(p.numel() for p in network.parameters() if p.requires_grad) == n_assets
MinimalNet(
(allocate_layer): WeightNorm()
)
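Because the allocation is a single learned vector, every sample in a batch receives identical weights. A quick sketch reusing the network from above:
import torch

x = torch.randn(3, 2, 30, n_assets)
weights = network(x)

assert weights.shape == (3, n_assets)
assert torch.allclose(weights, weights[0].repeat(3, 1))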
ThorpNet¶
The goal of this network is to demonstrate the possibility of using deepdow
to create a special case of
networks that do not depend on the input tensor x. All the important variables for the portfolio allocation are
learned when training. This means that this network learns a single optimal set of parameters for the entire
training set.
Specifically, we use the NumericalMarkowitz allocator (see Layers for more details). We need to learn the following parameters:

- matrix - square root of the covariance matrix, initial value is the identity matrix
- exp_returns - expected returns, initial value is 1
- gamma_sqrt - risk and return trade-off, initial value is 1
- alpha - weight regularization, initial value is 1
Note that to avoid numerical issues, one can set force_symmetric=True at construction. This way, matrix is multiplied by its own transpose, which guarantees that the input to the allocator is symmetric and positive semi-definite.
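A quick standalone check of the linear algebra behind this: for any square matrix M, the product of M and its transpose is symmetric and positive semi-definite.
import torch

M = torch.randn(5, 5)
A = M @ M.T

assert torch.allclose(A, A.T)  # symmetric
assert (torch.linalg.eigvalsh(A) >= -1e-6).all()  # non-negative eigenvalues up to numerics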
from deepdow.nn import ThorpNet
n_assets = 10
max_weight = 0.5
force_symmetric = True
network = ThorpNet(n_assets, max_weight=max_weight, force_symmetric=force_symmetric)
print(network)
n_parameters = 0
n_parameters += n_assets # Expected returns
n_parameters += n_assets * n_assets # Covariance matrix
n_parameters += 1 # gamma
n_parameters += 1 # alpha
true_n_parameters = sum(p.numel() for p in network.parameters() if p.requires_grad)
assert n_parameters == true_n_parameters
ThorpNet(
(portfolio_opt_layer): NumericalMarkowitz(
(cvxpylayer): CvxpyLayer()
)
)
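Similarly to MinimalNet, the output should not depend on the values inside x, only on the number of samples. A sketch, assuming (per the description above) that the forward pass ignores everything but x.shape[0]:
import torch

x1 = torch.randn(3, 1, 30, n_assets)
x2 = torch.randn(3, 1, 30, n_assets)

assert torch.allclose(network(x1), network(x2), atol=1e-6)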
Writing custom networks¶
One can create infinitely many architectures using deepdow and torch layers. The bare minimum is to subclass both torch.nn.Module and deepdow.benchmarks.Benchmark and to implement the forward method. See the example below:
import torch

from deepdow.benchmarks import Benchmark


class AmazingNetwork(torch.nn.Module, Benchmark):
    """Amazing network.

    Parameters
    ----------
    hyper_param : float
        A hyperparameter.

    Attributes
    ----------
    learnable_param : torch.Tensor
        A parameter to be learned during training.
    """

    def __init__(self, hyper_param):
        super().__init__()
        self.hyper_param = hyper_param
        self.learnable_param = torch.nn.Parameter(torch.ones(1), requires_grad=True)

    def forward(self, x):
        """Perform forward pass.

        Parameters
        ----------
        x : torch.Tensor
            Tensor of shape `(n_samples, n_channels, lookback, n_assets)` representing the input features.

        Returns
        -------
        weights : torch.Tensor
            Tensor of shape `(n_samples, n_assets)` representing the final allocation.
        """
        x = self.learnable_param * torch.sin(x + self.hyper_param)
        means = abs(x.mean([1, 2])) + 1e-6
        weights = means / means.sum(dim=1, keepdim=True)

        return weights

    @property
    def hparams(self):
        return {'hyper_param': self.hyper_param}
network = AmazingNetwork(2.4)
n_samples, n_channels, lookback, n_assets = 10, 2, 20, 5
x = torch.randn(n_samples, n_channels, lookback, n_assets)
weights = network(x)
print(weights)
assert sum(p.numel() for p in network.parameters() if p.requires_grad) == 1
tensor([[0.2186, 0.1135, 0.2441, 0.2321, 0.1917],
[0.2096, 0.1877, 0.1719, 0.2010, 0.2297],
[0.1996, 0.2330, 0.1879, 0.1923, 0.1871],
[0.1911, 0.2407, 0.1675, 0.2020, 0.1986],
[0.2495, 0.1988, 0.1833, 0.1703, 0.1981],
[0.2418, 0.1710, 0.1773, 0.1950, 0.2149],
[0.1715, 0.2285, 0.3046, 0.0921, 0.2034],
[0.1825, 0.1882, 0.1603, 0.2631, 0.2058],
[0.2012, 0.1889, 0.1665, 0.2128, 0.2306],
[0.1924, 0.2749, 0.1898, 0.1486, 0.1942]], grad_fn=<DivBackward0>)
Note that one always needs to implement the forward method assuming the input shape is (n_samples, n_channels, lookback, n_assets). Additionally, the sample dimension should be independent: shuffling the input x along the sample dimension only results in shuffling of the output weights.
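One can sanity-check this sample independence directly on the AmazingNetwork instance from above, reusing network, x and n_samples:
perm = torch.randperm(n_samples)

assert torch.allclose(network(x[perm]), network(x)[perm])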