Layers

Introduction

As described in Basics, our goal is to construct a network that takes as input a 3D tensor x of shape (n_channels, lookback, n_assets) and outputs a 1D tensor w of shape (n_assets,). One can achieve this by creating a pipeline of layers. See below an example of such a pipeline

https://i.imgur.com/RPhxF4j.png
  • L1 - 1D convolution shared across assets, no change in dimensionality

  • L2 - mean over the channels (3D -> 2D)

  • L3 - maximum over timesteps (2D -> 1D)

  • L4 - covariance matrix of columns of h2

  • L5 - given h3 and h4 solves convex optimization problem

deepdow groups all custom layers into 4 categories:

  • Transform

    Feature extractors that do not change the dimensionality of the input tensor. L1 in the example.

  • Collapse

    Remove an entire dimension of the input via some aggregation scheme. L2 and L3 in the example.

  • Allocate

    Given input tensors, these layers generate the final portfolio weights. L5 in the example.

  • Misc

    Helper layers. L4 in the example.

Note that all custom layers are simply subclasses of torch.nn.Module and one can freely use them together with official PyTorch layers.

Warning

Almost all deepdow layers assume that the input and output tensors have an extra dimension in the front—the sample dimension. We often omit this dimension on purpose to make the examples and sketches simpler.

Transform layers

Transform layers are supposed to extract useful features from input tensors. For the exact usage see deepdow.layers.transform module.

Conv

This layer supports both 1D and 2D convolutions, controlled via the method parameter. In the forward pass one provides tensors of shape (n_samples, n_input_channels, lookback) for the 1D case and (n_samples, n_input_channels, lookback, n_assets) for the 2D case. The padding is derived automatically from kernel_size so that the output tensor keeps the same size (exactly for odd kernel_size, approximately for even).

import torch

from deepdow.layers import Conv

n_samples, n_input_channels, lookback, n_assets = 2, 4, 20, 11
n_output_channels = 8
x = torch.rand(n_samples, n_input_channels, lookback, n_assets)

layer = Conv(n_input_channels=n_input_channels,
             n_output_channels=n_output_channels,
             kernel_size=3,
             method='1D')

# Apply the same Conv1D layer to all assets
result = torch.stack([layer(x[..., i]) for i in range(n_assets)], dim=-1)

assert result.shape == (n_samples, n_output_channels, lookback, n_assets)

RNN

This layer runs the same recurrent network over all assets and then stacks the hidden states back together. It provides both the vanilla RNN and the LSTM; the choice is controlled via the cell_type parameter. The user specifies the number of output channels via hidden_size. This number corresponds to the actual hidden state dimensionality if bidirectional=False, otherwise to one half of it.

import torch

from deepdow.layers import RNN

n_samples, n_input_channels, lookback, n_assets = 2, 4, 20, 11
hidden_size = 8
x = torch.rand(n_samples, n_input_channels, lookback, n_assets)

layer = RNN(n_channels=n_input_channels,
            hidden_size=hidden_size,
            cell_type='LSTM')

result = layer(x)

assert result.shape == (n_samples, hidden_size, lookback, n_assets)

Collapse layers

Collapse layers remove an entire dimension of the input tensor. For the exact usage see the deepdow.layers.collapse module.

AttentionCollapse

AverageCollapse

ElementCollapse

ExponentialCollapse

MaxCollapse

SumCollapse
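
As an illustration, the sketch below uses AverageCollapse to remove the lookback dimension. The collapse_dim argument selecting which dimension to aggregate over is an assumption; check deepdow.layers.collapse for the exact signature.

import torch

from deepdow.layers import AverageCollapse

n_samples, n_channels, lookback, n_assets = 2, 4, 20, 11
x = torch.rand(n_samples, n_channels, lookback, n_assets)

# Collapse the lookback dimension (index 2) via averaging (collapse_dim assumed)
layer = AverageCollapse(collapse_dim=2)
result = layer(x)

assert result.shape == (n_samples, n_channels, n_assets)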

Allocation layers

For the exact usage see deepdow.layers.allocate module.

AnalyticalMarkowitz

The AnalyticalMarkowitz layer has two modes. If the user provides only the covariance matrix \(\boldsymbol{\Sigma}\), it returns the minimum variance portfolio. If one additionally supplies the expected return vector \(\boldsymbol{\mu}\), it computes the tangency portfolio (also known as the maximum Sharpe ratio portfolio). Note that the risk-free rate is assumed to be zero.

\[\begin{aligned}\textbf{w}_{\text{minvar}} &= \frac{\boldsymbol{\Sigma}^{-1} \textbf{1}}{\textbf{1}^{T} \boldsymbol{\Sigma}^{-1} \textbf{1}}\\\textbf{w}_{\text{maxsharpe}} &= \frac{\boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}}{\textbf{1}^{T} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}}\end{aligned}\]

Note that this allocator cannot enforce any additional constraints, e.g. a maximum weight per asset. For more details and derivations see [LectureNotes].
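
A minimal sketch of both modes is given below; the forward pass is assumed to accept a batched covariance matrix and, optionally, batched expected returns (check deepdow.layers.allocate for the exact signature).

import torch

from deepdow.layers import AnalyticalMarkowitz

n_samples, n_assets = 2, 3
covmat = torch.stack([torch.eye(n_assets) for _ in range(n_samples)])
rets = torch.rand(n_samples, n_assets)

layer = AnalyticalMarkowitz()

w_minvar = layer(covmat)           # minimum variance portfolio
w_maxsharpe = layer(covmat, rets)  # tangency (maximum Sharpe ratio) portfolio

assert w_minvar.shape == (n_samples, n_assets)
assert torch.allclose(w_minvar.sum(1), torch.ones(n_samples))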

NCO

The NCO allocator is heavily inspired by the Nested Clustered Optimization proposed in [Prado2019]. The main idea is to group assets into n_clusters clusters and run AnalyticalMarkowitz inside each of them. In the second step, we compute the asset allocation across these n_clusters resulting portfolios. Note that the clustering is currently done via the KMeans layer (see KMeans).
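
The sketch below is only illustrative; the n_clusters argument and a forward pass taking the covariance matrix plus optional expected returns are assumptions based on the description above, so consult deepdow.layers.allocate for the exact interface.

import torch

from deepdow.layers import NCO

n_samples, n_assets = 2, 6
covmat = torch.stack([torch.eye(n_assets) for _ in range(n_samples)])
rets = torch.rand(n_samples, n_assets)

# Two clusters of assets, Markowitz inside each cluster and across clusters
layer = NCO(n_clusters=2)
w = layer(covmat, rets)

assert w.shape == (n_samples, n_assets)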

NumericalMarkowitz

While AnalyticalMarkowitz gives us the benefit of analytical solutions, it does not allow for any additional constraints. NumericalMarkowitz is a generic convex optimization solver built on top of cvxpylayers (see [Agrawal2019] for more details). The statement of the problem is shown below. It is motivated by [Bodnar2013].

\[\begin{split}\begin{aligned} \max_{\textbf{w}} \quad & \textbf{w}^{T}\boldsymbol{\mu} - \gamma {\textbf{w}}^{T} \boldsymbol{\Sigma} \textbf{w} - \alpha \textbf{w}^{T} \textbf{w} \\ \textrm{s.t.} \quad & \sum_{i=1}^{N}w_i = 1 \\ \quad & w_i \geq 0, \quad i \in \{1,...,N\}\\ \quad & w_i \leq w_{\text{max}}, \quad i \in \{1,...,N\}\\ \end{aligned}\end{split}\]

The user needs to provide n_assets (\(N\) in the above formulation) and max_weight (\(w_{\text{max}}\)) when constructing this layer. To perform a forward pass one passes the following tensors (batched along the sample dimension):

  • rets - Corresponds to the expected returns vector \(\boldsymbol{\mu}\)

  • covmat_sqrt - Corresponds to a (matrix) square root of the covariance matrix \(\boldsymbol{\Sigma}\)

  • gamma_sqrt - Corresponds to a square root of \(\gamma\) and controls risk aversion

  • alpha - Corresponds to \(\alpha\) and determines the regularization power. Internally, its absolute value is used to prevent sign changes.

Warning

The major downside of using this allocator is a significant decrease in speed.
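
Below is a hedged sketch of a forward pass. The tensor names follow the bullet list above; the assumed constructor call NumericalMarkowitz(n_assets, max_weight) should be verified against deepdow.layers.allocate, and cvxpylayers must be installed.

import torch

from deepdow.layers import CovarianceMatrix, NumericalMarkowitz

n_samples, n_assets = 2, 5

rets = torch.rand(n_samples, n_assets)
# Square root of the covariance matrix estimated from fake returns of shape (n_samples, lookback, n_assets)
covmat_sqrt = CovarianceMatrix(sqrt=True)(torch.rand(n_samples, 20, n_assets))
gamma_sqrt = torch.ones(n_samples)  # risk aversion per sample
alpha = torch.ones(n_samples)       # regularization strength per sample

# Constructor signature assumed from the description above
layer = NumericalMarkowitz(n_assets=n_assets, max_weight=0.5)
w = layer(rets, covmat_sqrt, gamma_sqrt, alpha)

assert w.shape == (n_samples, n_assets)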

Resample

The Resample layer is inspired by [Michaud2007]. It is a meta-allocator that expects an instance of a base allocator as input. Currently supported base allocators are:

  • AnalyticalMarkowitz

  • NCO

  • NumericalMarkowitz

The premise of this meta-allocator is that \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) are just noisy estimates of their population counterparts. Parametric bootstrapping is therefore applied. We sample n_portfolios * n_draws new vectors from the distribution \(\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\). We then create estimates \(\boldsymbol{\mu}_{1}, ...,\boldsymbol{\mu}_{\text{n_portfolios}}\) and \(\boldsymbol{\Sigma}_{1}, ..., \boldsymbol{\Sigma}_{\text{n_portfolios}}\) and run the base allocator for each of the pairs. This yields multiple allocations \(\textbf{w}_{1}, ...,\textbf{w}_{\text{n_portfolios}}\). The final allocation is simply their average \(\textbf{w} = \frac{1}{\text{n_portfolios}}\sum_{i=1}^{\text{n_portfolios}}\textbf{w}_i\).
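
The following sketch wraps AnalyticalMarkowitz in Resample. The keyword arguments n_draws and n_portfolios as well as the forward signature (covariance matrix plus optional expected returns) are assumptions derived from the description above rather than the API reference; double-check them against deepdow.layers.allocate.

import torch

from deepdow.layers import AnalyticalMarkowitz, Resample

n_samples, n_assets = 2, 4
covmat = torch.stack([torch.eye(n_assets) for _ in range(n_samples)])
rets = torch.rand(n_samples, n_assets)

# Each of the 5 portfolios is estimated from 10 bootstrapped draws, the results are averaged
layer = Resample(AnalyticalMarkowitz(), n_draws=10, n_portfolios=5)
w = layer(covmat, rets)

assert w.shape == (n_samples, n_assets)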

SoftmaxAllocator

Inspired by portfolio optimization with reinforcement learning (e.g. [Jiang2017]), the SoftmaxAllocator performs a softmax over the input. Additionally, one can provide a custom temperature.

\[w_j = \frac{e^{\frac{z_{j}}{\text{temperature}}}}{\sum_{i} e^{\frac{z_i}{\text{temperature}}}}\]

Note that one can provide a single temperature at construction time that is shared across all samples. Alternatively, one can provide a per-sample temperature during the forward pass.

import torch

from deepdow.layers import SoftmaxAllocator

layer = SoftmaxAllocator(temperature=None)
x = torch.tensor([[1, 2.3], [2, 4.2]])
temperature = torch.tensor([0.2, 1])

w = layer(x, temperature=temperature)

assert w.shape == (2, 2)
assert torch.allclose(w.sum(1), torch.ones(2))

Misc layers

For the exact usage see deepdow.layers.misc module.

Cov2Corr

Conversion of a covariance matrix into a correlation matrix.
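
Each entry is simply the covariance normalized by the corresponding standard deviations:

\[C_{ij} = \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}\]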

import torch

from deepdow.layers import Cov2Corr

layer = Cov2Corr()
covmat = torch.tensor([[[4, 3], [3, 9.0]]])
corrmat = layer(covmat)

assert torch.allclose(corrmat, torch.tensor([[[1.0, 0.5], [0.5, 1.0]]]))

CovarianceMatrix

Computes a sample covariance matrix. One can also apply shrinkage, i.e.

\[\boldsymbol{\Sigma}_{\text{shrink}} = (1 - \delta) F + \delta S\]

Here \(F\) is a highly structured matrix, whereas \(S\) is the sample covariance matrix. The constant \(\delta\) (shrinkage_coef in the constructor) determines how the two matrices are weighted. See [Ledoit2004] for additional background. deepdow offers multiple preset matrices \(F\) that can be selected via the shrinkage_strategy parameter:

  • None - no shrinkage applied (can lead to non-PSD matrix)

  • diagonal - diagonal of \(S\) with off-diagonal elements being zero

  • identity - identity matrix

  • scaled-identity - diagonal filled with average variance in \(S\) and off-diagonal elements set to zero

After performing shrinkage, one can also compute the (matrix) square root of the shrunk matrix. This is controlled by the boolean sqrt.

Note

One can also omit the shrinkage_coef in the constructor (shrinkage_coef=None) and pass it dynamically as a torch.Tensor during a forward pass.

import torch

from deepdow.layers import CovarianceMatrix

torch.manual_seed(3)

x = torch.rand(1, 10, 3) * 100
layer = CovarianceMatrix(sqrt=False)
layer_sqrt = CovarianceMatrix(sqrt=True)

covmat = layer(x)
covmat_sqrt = layer_sqrt(x)

assert torch.allclose(covmat[0], covmat_sqrt[0] @ covmat_sqrt[0], atol=1e-2)
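
The snippet below sketches shrinkage with a dynamically supplied coefficient, as mentioned in the note above; passing a per-sample shrinkage_coef tensor in the forward pass is an assumption based on that note.

import torch

from deepdow.layers import CovarianceMatrix

n_samples, lookback, n_assets = 2, 25, 4
x = torch.rand(n_samples, lookback, n_assets)

# shrinkage_coef omitted in the constructor and supplied per sample in the forward pass
layer = CovarianceMatrix(sqrt=False, shrinkage_strategy='identity', shrinkage_coef=None)
shrinkage_coef = torch.tensor([0.3, 0.8])

covmat = layer(x, shrinkage_coef)

assert covmat.shape == (n_samples, n_assets, n_assets)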

KMeans

A version of the well-known clustering algorithm. The deepdow interface is very similar to that of scikit-learn [sklearnkmeans]. Most importantly, one needs to decide on n_clusters.

import torch

from deepdow.layers import KMeans

x = torch.tensor([[0, 0], [0.5, 0], [0.5, 1], [1, 1.0]])
manual_init = torch.tensor([[0, 0], [1, 1]])

kmeans_layer = KMeans(n_clusters=2, init='manual')
cluster_ixs, cluster_centers = kmeans_layer(x, manual_init=manual_init)

assert torch.allclose(cluster_ixs, torch.tensor([0, 0, 1, 1]))

Warning

This layer does not support the additional (sample) dimension. Batching can be implemented with a naive for loop and stacking.
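
A naive batching sketch consistent with the warning above (the init='random' option is an assumption; see deepdow.layers.misc for the supported initializations):

import torch

from deepdow.layers import KMeans

x_batched = torch.rand(5, 10, 2)  # (n_samples, n_observations, n_features)
kmeans_random = KMeans(n_clusters=2, init='random')

# Run the layer once per sample and stack the cluster indices back together
cluster_ixs_batched = torch.stack([kmeans_random(x_batched[i])[0] for i in range(5)])

assert cluster_ixs_batched.shape == (5, 10)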

References

LectureNotes

http://faculty.washington.edu/ezivot/econ424/portfolioTheoryMatrix.pdf

Prado2019

Lopez de Prado, M. (2019). A Robust Estimator of the Efficient Frontier. Available at SSRN 3469961.

Jiang2017

Jiang, Zhengyao, and Jinjun Liang. “Cryptocurrency portfolio management with deep reinforcement learning.” 2017 Intelligent Systems Conference (IntelliSys). IEEE, 2017

Agrawal2019

Agrawal, Akshay, et al. “Differentiable convex optimization layers.” Advances in Neural Information Processing Systems. 2019.

Michaud2007

Michaud, Richard O., and Robert Michaud. “Estimation error and portfolio optimization: a resampling solution.” Available at SSRN 2658657 (2007).

Ledoit2004

Ledoit, Olivier, and Michael Wolf. “Honey, I shrunk the sample covariance matrix.” The Journal of Portfolio Management 30.4 (2004): 110-119.

sklearnkmeans

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Bodnar2013

Bodnar, Taras, Nestor Parolya, and Wolfgang Schmid. “On the equivalence of quadratic optimization problems commonly used in portfolio theory.” European Journal of Operational Research 229.3 (2013): 637-644.