## Long Short-Term Memory (LSTM) networks

We benchmark an LSTM model consisting of a stack of LSTMs followed by a linear layer.

```python
import torch.nn as nn
import vollo_torch


class LSTM(nn.Module):
    def __init__(self, num_layers, input_size, hidden_size, output_size):
        super().__init__()
        assert num_layers >= 1
        # Stack of LSTM layers. Note that vollo_torch.nn.LSTM returns only
        # the output sequence, so its result can be passed straight to the
        # linear layer below.
        self.lstm = vollo_torch.nn.LSTM(
            input_size, hidden_size, num_layers=num_layers
        )
        # Project the hidden representation down to the output size
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.lstm(x)
        x = self.fc(x)
        return x
```
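As a minimal usage sketch, the snippet below instantiates the `tiny_lstm` configuration from the tables that follow and runs a dummy input through it. The `(batch, sequence, feature)` input layout is our assumption here (matching `torch.nn.LSTM` with `batch_first=True`); check the `vollo_torch` documentation for the layout it actually expects.

```python
import torch

# tiny_lstm configuration from the benchmark tables: 2 layers, hidden size
# 128, input size equal to the hidden size, output size 32.
model = LSTM(num_layers=2, input_size=128, hidden_size=128, output_size=32)

# Dummy input; the (batch, sequence, feature) layout is an assumption.
x = torch.randn(1, 16, 128)
y = model(x)
print(y.shape)  # torch.Size([1, 16, 32]) under the assumed layout
```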

Our LSTM models have also been benchmarked and audited as part of a STAC-ML submission, in which we hold the lowest latency across all models. Please refer to our STAC-ML submissions for more details.

### IA-840F: 3 big cores

| Model         | Layers | Hidden size | Parameters | Mean latency (μs) | 99th percentile latency (μs) |
|---------------|--------|-------------|------------|-------------------|------------------------------|
| tiny_lstm     | 2      | 128         | 266K       | 4.0               | 4.8                          |
| small_lstm    | 3      | 256         | 1.6M       | 5.2               | 5.9                          |
| med_lstm      | 3      | 480         | 5.5M       | 8.3               | 9.0                          |
| med_lstm_deep | 6      | 320         | 4.9M       | 7.8               | 8.6                          |

The input size is the same as the hidden size for all models, and the output size is set to 32. "Layers" refers to the number of LSTM layers in the stack.
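As a rough sanity check, the parameter counts in the table are consistent with summing the LSTM weight matrices (four gates per layer) plus the final linear layer's weights. The sketch below assumes a weights-only convention (biases excluded), which is our assumption rather than something the benchmark states; it happens to reproduce the rounded table values.

```python
def lstm_param_count(num_layers, input_size, hidden_size, output_size):
    # Each LSTM layer has four gates, each with a weight matrix of shape
    # (hidden_size, layer_input + hidden_size). Biases are excluded here,
    # an assumption that matches the rounded figures in the table.
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += 4 * hidden_size * (layer_input + hidden_size)
        layer_input = hidden_size
    # Weights of the final linear layer
    total += hidden_size * output_size
    return total

print(lstm_param_count(2, 128, 128, 32))  # 266240  -> ~266K (tiny_lstm)
print(lstm_param_count(3, 256, 256, 32))  # 1581056 -> ~1.6M (small_lstm)
```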

### IA-420F: 6 small cores

| Model      | Layers | Hidden size | Parameters | Mean latency (μs) | 99th percentile latency (μs) |
|------------|--------|-------------|------------|-------------------|------------------------------|
| tiny_lstm  | 2      | 128         | 266K       | 4.6               | 5.3                          |
| small_lstm | 3      | 256         | 1.6M       | 8.5               | 9.2                          |

As above, the input size is the same as the hidden size, and the output size is set to 32.

The two medium models are not supported on the IA-420F accelerator card as they are too large to fit in the accelerator memory.