Selecting an IP core configuration
The Vollo IP core is scalable by the number of cores and the size of each core. To identify the best configuration for your application, we advise the following steps:
-
Test your ML models by running them through the Vollo compiler with different configurations. The virtual machine will give you performance metrics for each configuration.
-
Reference the resource usage table below to understand the resource requirements for each configuration.
Note, for initial integration and testing, it is recommended to start with a single core and a small block size. This will allow for quicker iteration and debugging of the system.
Compiler support
To test an IP Core configuration with the Vollo compiler, you can create the configuration in python as follows:
import vollo_compiler
ip_core_config = vollo_compiler.Config.ip_core(num_cores = 1, block_size = 32)
Once you have received your Vollo IP core, it comes bundled with a Vollo accelerator configuration file in JSON format,
in vollo_ip_core/vollo-ip-core-config.json
.
This can be loaded into an accelerator configuration in python as follows:
import os
os.mkdir("vollo_ip_core")
vollo_compiler.Config.save(vollo_compiler.Config.ip_core(num_cores = 1, block_size = 32), "vollo_ip_core/vollo-ip-core-config.json")
ip_core_config = vollo_compiler.Config.load("vollo_ip_core/vollo-ip-core-config.json")
os.remove("vollo_ip_core/vollo-ip-core-config.json")
os.rmdir("vollo_ip_core")
From here, Vollo programs can be generated using the same workflow as described in example 1
using the config imported above in the call to to_program
:
import torch
import torch.nn as nn
import torch.nn.functional as F
import vollo_torch
class MLP(nn.Module):
def __init__(self, input_size, output_size, hidden_size):
super().__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.fc2 = nn.Linear(hidden_size, hidden_size)
self.out = nn.Linear(hidden_size, output_size)
def forward(self, x):
x = F.relu(self.fc1(x))
residual = x
x = F.relu(self.fc2(x)) + residual
return self.out(x)
# Make a pytorch model
input_size = 784
output_size = 10
hidden_size = 128
model = MLP(input_size, output_size, hidden_size)
# Annotate the model with activation shapes
input = torch.randn(input_size)
(model, expected_output) = vollo_torch.fx.prepare_shape(model, input)
# Compile to a Vollo program using the loaded ip_core_config
nnir = vollo_torch.fx.nnir.to_nnir(model)
program = nnir.to_program(ip_core_config)
Run the VM to get the cycle count:
vm = program.to_vm()
vm_output = vm.run(input.detach().numpy())
torch.testing.assert_close(expected_output, torch.from_numpy(vm_output))
print("cycle count:", vm.cycle_count())
Resource usage
The following table shows the resource usage of the Vollo IP Core for different configurations. Note, these resources may vary depending on the Vollo SDK version.
The block size determines the side of the matrix block. The core scales with the square of this parameter e.g. a block size 64 core is around 4 times larger than a block size 32 core.
Cores | Block size | ALMs | M20Ks | DSPs |
---|---|---|---|---|
1 | 32 | 43K | 1084 | 624 |
2 | 32 | 78K | 2000 | 1248 |
3 | 32 | 115K | 2932 | 1872 |
4 | 32 | 152K | 3880 | 2496 |
5 | 32 | 194K | 4844 | 3120 |
6 | 32 | 231K | 5824 | 3744 |
1 | 64 | 106K | 3065 | 2400 |
2 | 64 | 207K | 5840 | 4800 |
3 | 64 | 308K | 8631 | 7200 |