1D Convolutional neural networks (CNN)

We benchmark a simple 1D convolutional model in which every layer is wrapped in a residual connection.

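import torch.nn as nn
import vollo_torch


class ConvBlock(nn.Module):
    """One 1D convolution followed by ReLU, with a residual (skip) connection."""

    def __init__(self, channels, kernel_size):
        super().__init__()
        # Same channel count in and out, so the output can be added to the input.
        self.conv = vollo_torch.nn.PaddedConv1d(channels, channels, kernel_size)

    def forward(self, inp):
        x = self.conv(inp)
        return nn.functional.relu(x) + inp


class CNN(nn.Module):
    """A stack of num_layers ConvBlocks, all with the same channel count."""

    def __init__(self, num_layers, kernel_size, channels):
        super().__init__()
        assert num_layers >= 1

        self.cnn = nn.Sequential(
            *[ConvBlock(channels, kernel_size) for _ in range(num_layers)],
        )

    def forward(self, x):
        x = self.cnn(x)  # N x channels x T
        return x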

The kernel size for all models is 8. The batch size and sequence length are both set to 1 (i.e., we benchmark a single timestep). Consecutive inferences are spaced apart rather than issued back to back, which minimises latency.
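To make the measurement setup concrete, here is a minimal host-side sketch of timing spaced-out inferences and summarising them as a mean and a 99th percentile. It is not the harness used to produce the figures in this section: `measure_latency_us`, `run_inference`, `num_samples` and `spacing_s` are hypothetical names introduced for illustration, and the workload shown is a stand-in rather than a call into an accelerator.

```python
import statistics
import time


def measure_latency_us(run_inference, num_samples=1000, spacing_s=0.0001):
    """Time spaced-out calls to run_inference; return (mean, p99) in microseconds.

    run_inference is a hypothetical zero-argument callable that performs one
    inference and returns once the result is available.
    """
    samples_us = []
    for _ in range(num_samples):
        start = time.perf_counter_ns()
        run_inference()
        samples_us.append((time.perf_counter_ns() - start) / 1_000)
        # Idle before the next call so inferences are not issued back to back.
        time.sleep(spacing_s)

    mean_us = statistics.fmean(samples_us)
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    p99_us = statistics.quantiles(samples_us, n=100)[-1]
    return mean_us, p99_us


# Stand-in workload; replace with a call into the real inference runtime.
mean_us, p99_us = measure_latency_us(lambda: sum(range(100)), num_samples=200)
print(f"mean {mean_us:.1f} us, p99 {p99_us:.1f} us")
```

Timing on the host with `time.perf_counter_ns` includes Python call overhead, so a sketch like this illustrates the method rather than reproducing microsecond-level numbers.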

V80: 6 cores, block size 32

Note: V80 PCIe optimisations are underway, with improvements coming in the next release.
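| Model     | Layers | Channels | Parameters | Mean latency (μs) | 99th percentile latency (μs) |
|-----------|--------|----------|------------|-------------------|------------------------------|
| cnn_tiny  | 3      | 128      | 393K       | 3.1               | 3.3                          |
| cnn_small | 3      | 256      | 1.6M       | 3.3               | 3.4                          |
| cnn_med   | 6      | 256      | 3.1M       | 3.8               | 4.0                          |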

IA-840F: 3 cores, block size 64

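| Model     | Layers | Channels | Parameters | Mean latency (μs) | 99th percentile latency (μs) |
|-----------|--------|----------|------------|-------------------|------------------------------|
| cnn_tiny  | 3      | 128      | 393K       | 2.2               | 2.2                          |
| cnn_small | 3      | 256      | 1.6M       | 2.4               | 2.5                          |
| cnn_med   | 6      | 256      | 3.1M       | 3.0               | 3.2                          |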

IA-420F: 6 cores, block size 32

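| Model     | Layers | Channels | Parameters | Mean latency (μs) | 99th percentile latency (μs) |
|-----------|--------|----------|------------|-------------------|------------------------------|
| cnn_tiny  | 3      | 128      | 393K       | 2.2               | 2.3                          |
| cnn_small | 3      | 256      | 1.6M       | 2.8               | 3.0                          |
| cnn_med   | 6      | 256      | 3.1M       | 3.9               | 3.9                          |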
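The Parameters column in the tables above can be checked with a little arithmetic: each layer is a convolution from `channels` to `channels` with kernel size 8, so it holds `channels * channels * 8` weights. The sketch below assumes the counts cover convolution weights only; whether the layer also carries a bias term is an assumption left open here, and the helper `conv_stack_params` is a hypothetical name for illustration. A bias would add only `channels` parameters per layer, which does not change the rounded figures.

```python
def conv_stack_params(num_layers, channels, kernel_size=8, bias=False):
    """Parameter count for a stack of channels -> channels 1D convolutions."""
    weights = channels * channels * kernel_size
    biases = channels if bias else 0  # assumption: bias is optional
    return num_layers * (weights + biases)


# (layers, channels) for each model, taken from the benchmark tables.
models = {"cnn_tiny": (3, 128), "cnn_small": (3, 256), "cnn_med": (6, 256)}
counts = {
    name: conv_stack_params(layers, channels)
    for name, (layers, channels) in models.items()
}

for name, count in counts.items():
    print(f"{name}: {count:,}")
# cnn_tiny: 393,216
# cnn_small: 1,572,864
# cnn_med: 3,145,728
```

These values round to the 393K, 1.6M and 3.1M listed in the tables.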