vollo_compiler

class vollo_compiler.Config

A Vollo accelerator configuration.

Each Vollo bitstream contains a specific configuration of the Vollo accelerator, e.g. the number of cores, the size of each core, etc. Programs need to be compiled for the accelerator configuration that they will be run on.

For the bitstreams included in the Vollo SDK, use the preset configs ia_420f_c6b32() and ia_840f_c3b64().
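A minimal usage sketch (assuming the Vollo SDK is installed; the printed values follow from the preset described below):

```python
import vollo_compiler

# Preset for the IA-420f bitstream: 6 cores, block size 32.
config = vollo_compiler.Config.ia_420f_c6b32()
print(config.num_cores)   # 6
print(config.block_size)  # 32
```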

static ia_420f_c6b32()

IA-420f configuration with 6 cores and block size 32

static ia_840f_c3b64()

IA-840f configuration with 3 cores and block size 64

save(json_path: str) None

Save a hardware configuration to a JSON file

static load(json_path: str, check_version_matches: bool = True) Config

Load a hardware configuration from a JSON file

Parameters:
  • json_path – Path to the JSON file

  • check_version_matches – if the provided JSON file is versioned, check that the version matches the current version
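For example, a configuration can be round-tripped through JSON (a sketch; the file path is hypothetical):

```python
import vollo_compiler

config = vollo_compiler.Config.ia_840f_c3b64()
config.save("config.json")

# Reload the same configuration; version checking can be disabled
# when loading a file written by a different SDK version.
loaded = vollo_compiler.Config.load("config.json", check_version_matches=False)
```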

block_size

Size of the tensor blocks

num_cores

Number of cores

tensor_ram_depth

Amount of tensor RAM per-core (in blocks)

tensor_descriptor_count

Maximum number of tensors per core

weight_store_depth

Amount of weight store per-core (in blocks)

accum_store_depth

Amount of accumulator store per-core (in blocks)

cell_state_depth

Amount of LSTM cell state per-core (in blocks)

clamp_store_depth

Amount of clamp store per-core (i.e. the maximum number of different clamp configurations that can be used on a single core)

max_read_size

Maximum size of data that instructions can perform operations on (in blocks)

io_size

Minimum size of IO packet (in values)

class vollo_compiler.NNIR

Neural Network Intermediate Representation

The representation of neural networks in the Vollo compiler. It can be built from a PyTorch model using vollo_torch.fx.nnir.to_nnir(), or from an ONNX model using from_onnx().

static from_onnx(onnx_path: str, overwrite_input_shape: Optional[list[int]]) NNIR

Load an ONNX model from a file and convert it to an NNIR graph

streaming_transform(streaming_axis: int) NNIR

Performs the streaming transform, converting the NNIR to a streaming model

to_program(config: Config, name: Optional[str] = None, *, optimize_transforms: bool = False, output_buffer_capacity: int = 64, write_queue_capacity: int = 32) Program

Compile an NNIR graph to a Program.

Note that the NNIR model given must be a streaming model.

Parameters:
  • config – The hardware configuration to compile the program for

  • name – The name of the program

  • optimize_transforms – Whether to run the VM to decide whether or not to apply certain transformations

  • output_buffer_capacity – The size of the output buffer in the VM (only used when optimize_transforms is true)

  • write_queue_capacity – The size of the write queue in the VM (only used when optimize_transforms is true)
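The compilation pipeline can be sketched end-to-end (the ONNX path, streaming axis, and program name here are hypothetical and depend on your model):

```python
import vollo_compiler

config = vollo_compiler.Config.ia_420f_c6b32()

# Load an ONNX model and convert it to an NNIR graph.
nnir = vollo_compiler.NNIR.from_onnx("model.onnx", None)

# to_program requires a streaming model, so apply the streaming
# transform first (which axis to stream over depends on the model).
nnir = nnir.streaming_transform(0)

program = nnir.to_program(config, name="my-model", optimize_transforms=True)
```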

__new__(**kwargs)

class vollo_compiler.Program

A program which can be run on a VM, or can be used with the Vollo Runtime to put the program onto hardware.

hw_config() Config

static io_only_test(config: Config, input_values: int, output_values: int) Program

Make a new program that does no compute and arranges IO such that output only starts when all the input is available on the accelerator

static load(input_path: str) Program

metrics() Metrics

Static Metrics

model_input_shape(model_index: int = 0, input_index: int = 0) Tuple[int]

Get the shape of the input at input_index in model at model_index

model_input_streaming_dim(model_index: int = 0, input_index: int = 0) Optional[int]

Get the streaming dimension of the input at input_index in model at model_index, if it has one

model_num_inputs(model_index: int = 0) int

Get the number of inputs the model at model_index uses.

model_num_outputs(model_index: int = 0) int

Get the number of outputs the model at model_index uses.

model_output_shape(model_index: int = 0, output_index: int = 0) Tuple[int]

Get the shape of the output at output_index in model at model_index

model_output_streaming_dim(model_index: int = 0, output_index: int = 0) Optional[int]

Get the streaming dimension of the output at output_index in model at model_index, if it has one

num_models() int

The number of models in the program.
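These accessors can be combined to enumerate the IO of every model in a program (a sketch, assuming `program` is an already-compiled Program):

```python
for m in range(program.num_models()):
    for i in range(program.model_num_inputs(m)):
        print("input", m, i,
              program.model_input_shape(m, i),
              program.model_input_streaming_dim(m, i))
    for o in range(program.model_num_outputs(m)):
        print("output", m, o,
              program.model_output_shape(m, o),
              program.model_output_streaming_dim(m, o))
```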

save(output_path: str)

to_vm(write_queue_capacity: int = 32, output_buffer_capacity: int = 64, bf16_precision: bool = False) VM

Construct a stateful Virtual Machine for simulating a Vollo Program

Parameters:

bf16_precision (bool) – Use bf16 precision instead of fp32 to simulate the Vollo accelerator more accurately. Defaults to False.

transform_to_io_only_test() Program

Make a new program that is IO-compatible with this one but does no compute: an IO-only test

class vollo_compiler.Metrics

Static metrics of a program.

clamp_store_depth

Total amount of clamp store available on each core

clamp_store_used

Amount of clamp store used by the program on each core

input_bytes

Number of bytes input per-inference for each model

model_names

The name of each model if specified

num_instrs

Number of instructions on each core

output_bytes

Number of bytes output per-inference for each model

tensor_ram_depth

Total amount of tensor RAM available on each core

tensor_ram_used

Amount of tensor RAM used by the program on each core

weight_store_depth

Total amount of weight store available on each core

weight_store_used

Amount of weight store used by the program on each core

class vollo_compiler.VM

A wrapper around a Program and the state of a VM

compute_duration_us(clock_mhz: int = 320) float

Translate the VM’s cycle count to a figure in microseconds by dividing it by the clock speed.

Parameters:

clock_mhz – Clock frequency of the Vollo accelerator in MHz.
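The conversion is a plain division, since a clock in MHz runs that many cycles per microsecond. A self-contained sketch of the arithmetic (the cycle count here is a made-up example, not a measured figure):

```python
def cycles_to_us(cycle_count: int, clock_mhz: int = 320) -> float:
    # clock_mhz cycles elapse per microsecond, so dividing the
    # cycle count by the clock speed gives a duration in µs.
    return cycle_count / clock_mhz

print(cycles_to_us(6400))  # 20.0 µs at the default 320 MHz clock
```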

cycle_count() int

The number of cycles performed on the VM.

metrics() Metrics

Get the static metrics of the program held by the VM

run(input: numpy.ndarray, model_index: int = 0) numpy.ndarray

Run the VM on a shaped input.

run_flat(input: numpy.ndarray, model_index: int = 0) numpy.ndarray

Run the VM on a 1D input.

run_flat_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int = 0) numpy.ndarray

Run the VM on multiple timesteps of input.

Parameters:
  • input_timestep_dim (int) – The dimension over which to split the input into timesteps.

  • output_timestep_dim (int) – The dimension over which to build up the output timesteps, i.e. the timesteps are stacked along this dimension.
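The splitting and stacking behaviour can be sketched in plain numpy, with a stand-in function in place of the VM (hypothetical, for illustration only):

```python
import numpy as np

def run_one_timestep(x: np.ndarray) -> np.ndarray:
    # Stand-in for one VM inference; the identity keeps the sketch simple.
    return x

def run_timesteps_sketch(inp: np.ndarray,
                         input_timestep_dim: int,
                         output_timestep_dim: int) -> np.ndarray:
    # Split the input into single timesteps along input_timestep_dim...
    steps = np.split(inp, inp.shape[input_timestep_dim], axis=input_timestep_dim)
    outs = [run_one_timestep(np.squeeze(s, axis=input_timestep_dim)) for s in steps]
    # ...and stack the per-timestep outputs along output_timestep_dim.
    return np.stack(outs, axis=output_timestep_dim)

x = np.arange(12, dtype=np.float32).reshape(4, 3)
y = run_timesteps_sketch(x, input_timestep_dim=0, output_timestep_dim=0)
print(y.shape)  # (4, 3)
```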

run_timesteps(input: numpy.ndarray, input_timestep_dim: int, output_timestep_dim: int, model_index: int) numpy.ndarray

Run the VM on multiple timesteps with a shaped input.

exception vollo_compiler.AllocationError

Failed to allocate memory during compilation.

This can happen if a model requires more space to store weights/activations, etc. than is available for the accelerator configuration.

exception vollo_compiler.SaveError

Failed to save program.

exception vollo_compiler.LoadError

Failed to load program.