The data dimension

For a model to compile on Vollo, it must satisfy a constraint on the data dimension in addition to the usual tensor rank and extent constraints on algorithms and functions. Conceptually, the data dimension is the contiguous dimension of a tensor; it transforms and constrains algorithms according to the rules below.

In most use cases the data dimension is completely opaque: the Vollo compiler deduces it and the program compiles without changes.

For the remainder of this page:

We represent an n-dimensional (a.k.a. rank-n) tensor as [a b! c], in this example a rank-3 tensor with extents a, b, and c.
The dimension with a ! is the data dimension.
Tensors that are compile-time constants (like weights) don't have a data dimension.
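
As a minimal sketch of the notation above, a tensor can be modelled in plain Python as a list of extents plus the index of its data dimension. The helper `format_tensor` and the `data_dim` parameter are hypothetical names used for illustration only; they are not part of the Vollo SDK.

```python
def format_tensor(extents, data_dim=None):
    """Render extents in this page's notation.

    `data_dim` is the index of the data dimension, or None for a
    compile-time constant (hypothetical representation, not a Vollo API).
    """
    parts = [f"{e}!" if i == data_dim else str(e) for i, e in enumerate(extents)]
    return "[" + " ".join(parts) + "]"


print(format_tensor(["a", "b", "c"], data_dim=1))  # [a b! c]
print(format_tensor(["i", "j"]))                   # [i j]
```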

Pointwise

Shape and data dimension must match.

[a b! c] (*) [a b! c] -> [a b! c]

Where (*) is any pointwise operation, e.g. +, -, *, /, maximum, minimum, the pointwise overload of max and min, etc.

Slicing

This preserves the data dimension.

A slice on the data dimension:

[a b c!][:, :, :n] -> [a b n!]

Non data-dimension slice:

[a b! c][:, :, :n] -> [a b! n]

A non data-dimension slice is free (no compute).

Unsqueeze (a.k.a. new-axis)

You can add a new axis anywhere; the new axis is never the data dimension:

[a! b].unsqueeze(dim=0) -> [1 a! b]
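
The effect on the data dimension can be sketched in plain Python. The function `unsqueeze` below is a hypothetical model of the rule, not a Vollo or PyTorch call; it tracks extents and the index of the data dimension, and only handles non-negative `dim`.

```python
def unsqueeze(extents, data_dim, dim):
    """Insert an extent-1 axis at `dim`; return (new_extents, new_data_dim)."""
    if not 0 <= dim <= len(extents):
        raise ValueError("dim out of range")
    new_extents = list(extents[:dim]) + [1] + list(extents[dim:])
    # The data dimension stays on the same axis. Its index shifts right by
    # one when the new axis is inserted at or before it.
    new_data_dim = data_dim + 1 if dim <= data_dim else data_dim
    return new_extents, new_data_dim


# [a! b].unsqueeze(dim=0) -> [1 a! b], here with a=3, b=5
print(unsqueeze([3, 5], data_dim=0, dim=0))  # ([1, 3, 5], 1)
```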

Broadcasting

You can broadcast along a non data-dimension:

[1 b!] -> [n b!]

Or along the data dimension:

[a 1!] -> [a n!]

Broadcasting a non data-dimension is free (no compute), and broadcasting the data dimension is close to free.

Concatenation

Similar to a pointwise operation, shape and data dimension of each concatenated tensor must match.

Along the data dimension:

[a! b c].repeat(n, dim=0) -> [(a * n)! b c]

Non data-dimension:

[a! b c].repeat(n, dim=1) -> [a! (b * n) c]

A non data-dimension concatenation is free (no compute).

Note: stacking is the same but with a new-axis before the concatenation.
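
The concatenation rules can be sketched as a small plain-Python check. The function `concatenate` and its `(extents, data_dim)` tuple representation are hypothetical and only model the rules stated above; they are not a Vollo API.

```python
def concatenate(tensors, dim):
    """Model concatenating identical-shape tensors along `dim`.

    `tensors` is a list of (extents, data_dim) tuples.
    Returns (out_extents, out_data_dim, is_free).
    """
    first = tensors[0]
    if any(t != first for t in tensors[1:]):
        raise ValueError("shape and data dimension of every input must match")
    extents, data_dim = first
    out = list(extents)
    out[dim] = extents[dim] * len(tensors)
    # Only a concatenation along a non data-dimension is stated to be free.
    return out, data_dim, dim != data_dim


t = ((2, 3, 4), 0)                 # [a! b c] with a=2, b=3, c=4
print(concatenate([t, t, t], 0))   # ([6, 3, 4], 0, False)
print(concatenate([t, t, t], 1))   # ([2, 9, 4], 0, True)
```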

Reduction

In general reductions preserve the position of the data dimension.

Along the data dimension:

[a! b].sum(dim=0) -> [1! b]

Non data-dimension reduction (generally slower):

[a! b].sum(dim=1) -> [a!]

Note: keepdim must be used in the former and is optional in the latter.
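
A minimal plain-Python sketch of these reduction rules follows. The function `reduce_dim` is hypothetical. The index shift applied when a dimension to the left of the data dimension is removed is an assumption that follows from the data dimension staying on the same axis; the text above does not show that case explicitly.

```python
def reduce_dim(extents, data_dim, dim, keepdim=False):
    """Model a reduction (e.g. sum) over `dim`; return (out_extents, out_data_dim)."""
    if dim == data_dim and not keepdim:
        raise ValueError("reducing along the data dimension requires keepdim=True")
    if keepdim:
        out = list(extents)
        out[dim] = 1
        return out, data_dim
    out = [e for i, e in enumerate(extents) if i != dim]
    # Assumption: removing an axis left of the data dimension shifts its index.
    out_data_dim = data_dim - 1 if dim < data_dim else data_dim
    return out, out_data_dim


# [a! b].sum(dim=0, keepdim=True) -> [1! b], here with a=3, b=5
print(reduce_dim([3, 5], data_dim=0, dim=0, keepdim=True))  # ([1, 5], 0)
# [a! b].sum(dim=1) -> [a!]
print(reduce_dim([3, 5], data_dim=0, dim=1))                # ([3], 0)
```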

Matrix multiplication

These operations transform the data dimension in non-obvious ways. Here we use * to denote any number of commensurate broadcast dimensions, none of which are allowed to be the data dimension.

With one side a compile-time constant, in this case the LHS (WLOG):

[* i j] @ [* j! k] -> [* i! k]

That is, the data dimension must be along the contracted dimension of the runtime tensor. The output data dimension is along the "replaced" index.

Note: a linear layer is a special case of the above with the k dimension squeezed out.
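
The constant-times-activation rule can be sketched in plain Python. The function `constant_matmul` is hypothetical and simplified: it assumes a rank-2 constant on the LHS and takes any leading broadcast dimensions from the activation only.

```python
def constant_matmul(const_extents, act_extents, act_data_dim):
    """Model [i j] @ [* j! k] -> [* i! k]; return (out_extents, out_data_dim)."""
    i, j = const_extents
    *batch, j_act, k = act_extents
    if j != j_act:
        raise ValueError("contracted extents do not match")
    contracted = len(act_extents) - 2  # index of j in the activation
    if act_data_dim != contracted:
        raise ValueError("data dimension is not on the contracted dimension")
    # The output data dimension sits where j was "replaced" by i.
    return [*batch, i, k], len(batch)


# [4 8] @ [8! 16] -> [4! 16]
print(constant_matmul([4, 8], [8, 16], act_data_dim=0))     # ([4, 16], 0)
# [4 8] @ [2 8! 16] -> [2 4! 16]
print(constant_matmul([4, 8], [2, 8, 16], act_data_dim=1))  # ([2, 4, 16], 1)
```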

Dynamic weights

Vollo supports a broader range of matrix multiplication via the Dynamic Weights feature. We define a Constant to be any compile-time constant tensor and an Activation to be any tensor that is not a compile-time constant. The Dynamic Weights feature covers the Constant x Activation case where contraction is not along the data dimension of the activation as well as Activation x Activation matrix multiplication.

This is an advanced feature with non-obvious performance characteristics and is gated behind the allow_dynamic_weights flag.

Constant x Activation with data dimension on the non-contracted dimension:
```
[* i j] @ [* j k!] -> [* i k!]
```
That is, the data dimension of the non-constant tensor is preserved in the output. This potentially has a higher latency and consumes more tensor RAM than contracting along the data dimension.

Matrix Activation x vector Activation:
```
[* i! j] @ [j!] -> [* i!]
[* i j!] @ [j!] -> [* i!]
[j!] @ [* j! k] -> [* k!]
[j!] @ [* j k!] -> [* k!]
```

That is, the output data dimension is along the "replaced" index.

Activation x Activation with opposite data dimensions:
```
[* i! j] @ [* j! k] -> [* i! k]
[* i j!] @ [* j k!] -> [* i k!]
```
That is, the contracted dimension is the data dimension of exactly one of the input tensors. The output data dimension is that of the side whose data dimension was not contracted.

Activation x Activation with contracted data dimensions:
```
[* i j!] @ [* j! k] -> [* i k!]
```
Note that in this case the output data dimension is, by convention, the rightmost output dimension. It can be changed by transposing and swapping the matmul inputs in the model:
```
[* j! k].t() -> [* k j!]

                   @     -> [* k i!]

[* i j!].t() -> [* j! i]
```
Unsupported Activation x Activation:
```
[* i! j] @ [* j k!] -> #
```
This case is unsupported.
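
The four matrix Activation x Activation cases above can be summarised as a small decision function. This is a sketch in plain Python; `activation_matmul_data_dim` is a hypothetical helper that names the data dimension of each input by its index letter (`"i"` or `"j"` for the LHS, `"j"` or `"k"` for the RHS) and returns the index letter of the output data dimension.

```python
def activation_matmul_data_dim(lhs_data, rhs_data):
    """Model [* i j] @ [* j k] for two activations.

    `lhs_data` is "i" or "j"; `rhs_data` is "j" or "k".
    Returns "i" or "k", or raises for the unsupported case.
    """
    lhs_contracted = lhs_data == "j"
    rhs_contracted = rhs_data == "j"
    if lhs_contracted and rhs_contracted:
        return "k"  # by convention, the rightmost output dimension
    if rhs_contracted:
        return "i"  # LHS data dimension was not contracted
    if lhs_contracted:
        return "k"  # RHS data dimension was not contracted
    raise ValueError("unsupported: neither data dimension is contracted")


print(activation_matmul_data_dim("i", "j"))  # i
print(activation_matmul_data_dim("j", "k"))  # k
print(activation_matmul_data_dim("j", "j"))  # k
```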

Transpose

Tensors can be transposed without restriction. If one of the transposed dimensions is the data dimension, the data dimension is transposed to that dimension:

[a! b c d].transpose(0, 2) -> [c b a! d]
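
As a plain-Python sketch of this rule, the hypothetical function `transpose` below swaps two extents and moves the data dimension index with them when it is one of the swapped dimensions.

```python
def transpose(extents, data_dim, dim0, dim1):
    """Swap `dim0` and `dim1`; return (out_extents, out_data_dim)."""
    out = list(extents)
    out[dim0], out[dim1] = out[dim1], out[dim0]
    if data_dim == dim0:
        out_data_dim = dim1
    elif data_dim == dim1:
        out_data_dim = dim0
    else:
        out_data_dim = data_dim
    return out, out_data_dim


# [a! b c d].transpose(0, 2) -> [c b a! d]
print(transpose(["a", "b", "c", "d"], data_dim=0, dim0=0, dim1=2))
# (['c', 'b', 'a', 'd'], 2)
```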

Reshape

For a given tensor's extents, e.g. [a b c], each dimension has a stride equal to the product of the extents of all dimensions to its right, i.e. [(b * c) c 1]. A reshape must preserve the stride of the data dimension, and the compiler deduces the output data dimension accordingly. For example:

[a! (b * c)] -> [a! b c]

Is valid because the stride of the data dimension is b * c before and after.

[a b c!] -> [b a c!]

Is valid because the stride of the data dimension is 1 before and after.

[a b! c] -> [(a * b)! c]

Is valid because the stride of the data dimension is c before and after, similarly:

[a (n * b)! c] -> [a n b! c]

Note that the resulting data dimension is deduced to be b! rather than n! in order to uphold the stride requirement.

However:

[a b! c] -> [a (b * c)!]

Is invalid because the stride of the data dimension is c before but 1 after.

If the output shape has multiple candidate dimensions with the input data dimension's stride (note that these candidate dimensions are all consecutive and all but the leftmost have extent 1), the leftmost of them will be chosen as the output data dimension:

[a! b] -> [a! 1 b]

A reshape that doesn't change the extent of the data dimension is free (no compute).

Note: the strides discussed in this subsection are conceptual and not related to the strides of the tensors queryable from PyTorch etc.
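
The stride rule can be checked mechanically. The following plain-Python sketch uses hypothetical helpers `conceptual_strides` and `reshape_data_dim` that model the rule described in this subsection; they are not part of the Vollo SDK and do not query PyTorch strides.

```python
from math import prod


def conceptual_strides(extents):
    """Stride of each dimension: product of the extents to its right."""
    return [prod(extents[i + 1:]) for i in range(len(extents))]


def reshape_data_dim(in_extents, data_dim, out_extents):
    """Return (out_data_dim, is_free), or raise if the reshape is invalid."""
    if prod(in_extents) != prod(out_extents):
        raise ValueError("element counts differ")
    target = conceptual_strides(in_extents)[data_dim]
    for i, stride in enumerate(conceptual_strides(out_extents)):
        if stride == target:
            # Leftmost candidate wins; free if the data extent is unchanged.
            return i, in_extents[data_dim] == out_extents[i]
    raise ValueError("no output dimension preserves the data dimension's stride")


a, b, c, n = 2, 3, 4, 5
print(reshape_data_dim([a, b * c], 0, [a, b, c]))        # (0, True)
print(reshape_data_dim([a, b, c], 1, [a * b, c]))        # (0, False)
print(reshape_data_dim([a, n * b, c], 1, [a, n, b, c]))  # (2, False)
print(reshape_data_dim([a, b], 0, [a, 1, b]))            # (0, True)
```

With these example extents, `reshape_data_dim([a, b, c], 1, [a, b * c])` raises `ValueError`, matching the invalid case above where the stride changes from c to 1.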

Vollo SDK User Guide