IP Core Interface
The Vollo IP Core is programmed with a neural network model (Vollo program) via a configuration bus. Once this is done, the IP Core can run the model by streaming input data to the IP Core and receiving the output data.
The IP Core has the following interfaces:
- Config clock and reset signals. This is a clock which is expected to run at around 100 MHz. It is used only for configuration, not for running the model.
- Config bus. This is a 32-bit wide AXI4-Lite bus used to activate the device with a license key and to configure the IP Core. It is synchronous to the config clock.
- Compute clock and reset signals. This is the clock used for running the model. In the example design this clock frequency is set to 320MHz.
- Input data bus. This is an AXI4-Stream bus used to stream input data to the IP Core. Its width varies depending on the block size of the cores in the IP. For a design with block size 32, it is 512 bits wide (32 values at 16 bits per value, using brainfloat 16). It is synchronous to the compute clock.
- Model selection bus. This is an AXI4-Stream interface for providing the model index. It is synchronous to the compute clock.
- Output data bus. This is an AXI4-Stream bus used to stream output data from the IP Core. It is synchronous to the compute clock.
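The relationship between block size and stream width can be sketched in a few lines of Python. This is only an illustration: the helper name `stream_widths` is hypothetical, and it assumes that the width scales linearly with block size (one block of brainfloat 16 values per word) and that there is one tkeep bit per data byte, as in standard AXI4-Stream. Only the block size 32 case is confirmed by the signal listings in this document.

```python
BFLOAT16_BITS = 16  # from the text: 16 bits per value using brainfloat 16


def stream_widths(block_size: int):
    """Return (tdata width in bits, tkeep width in bits) for a block size.

    Assumption: one block of brainfloat 16 values per stream word and one
    tkeep bit per data byte.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    tdata_bits = block_size * BFLOAT16_BITS
    tkeep_bits = tdata_bits // 8
    return tdata_bits, tkeep_bits


print(stream_widths(32))  # (512, 64)
```

For block size 32 this gives the 512-bit tdata and 64-bit tkeep widths that appear in the port list.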
Configuration bus
The configuration bus is a 32-bit wide AXI4-Lite bus. The normal rules for AXI4-Lite buses should be followed with the following exceptions:
- Write strobe: The write strobe should either be fully asserted or fully deasserted. Partially asserted write strobes are not supported.
- The protection signals, config_awprot and config_arprot, are unused and ignored.
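A host-side driver or testbench model can check these constraints before issuing a write. The following is a minimal sketch, not part of any Vollo software: the function name `check_config_write` is hypothetical, and the 21-bit address width and 4-bit strobe width are taken from the config_awaddr and config_wstrb ports of the IP Core.

```python
CONFIG_ADDR_BITS = 21   # config_awaddr / config_araddr are [20:0]
CONFIG_STRB_BITS = 4    # config_wstrb is [3:0]
FULL_STRB = (1 << CONFIG_STRB_BITS) - 1


def check_config_write(addr: int, data: int, wstrb: int) -> None:
    """Raise ValueError if a write would break the rules listed above."""
    if not 0 <= addr < (1 << CONFIG_ADDR_BITS):
        raise ValueError(f"address {addr:#x} does not fit in {CONFIG_ADDR_BITS} bits")
    if not 0 <= data < (1 << 32):
        raise ValueError("data does not fit in 32 bits")
    # The strobe must be fully asserted or fully deasserted.
    if wstrb not in (0, FULL_STRB):
        raise ValueError(f"partial write strobe {wstrb:#06b} is not supported")


check_config_write(0x10, 0xDEADBEEF, 0xF)  # accepted: full strobe
```

A call such as `check_config_write(0x10, 0, 0x3)` raises `ValueError`, because only two of the four byte lanes are strobed.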
Verilog signals:
// Config interface clock and active-high synchronous reset:
input logic config_clock
, input logic config_reset
// Config AXI4-Lite interface.
// The config_awprot and config_arprot inputs are unused
// and ignored.
, input logic config_awvalid
, output logic config_awready
, input logic [20:0] config_awaddr
, input logic [2:0] config_awprot
, input logic config_wvalid
, output logic config_wready
, input logic [31:0] config_wdata
, input logic [3:0] config_wstrb
, input logic config_arvalid
, output logic config_arready
, input logic [20:0] config_araddr
, input logic [2:0] config_arprot
, output logic config_rvalid
, input logic config_rready
, output logic [31:0] config_rdata
, output logic [1:0] config_rresp
, output logic config_bvalid
, input logic config_bready
, output logic [1:0] config_bresp
The AXI4-Lite interface is used to configure the Vollo IP Core with a Vollo program. It must be
accessible from a system running the vollo_cfg software that can drive the config bus (usually the
host). See Vollo configuration API.
You can load a new program by re-running the configuration.
Input and Output Streams
The input and output streams are AXI4-Stream interfaces. Each input and output is packed as
a flattened tensor and padded to the next multiple of the block size. The data should be packed in
little-endian format. The output stream includes tkeep and tlast signals to indicate the
end of the packet and which bytes are valid (i.e. not padding).
For example, an input of tensor dimension [62] to an IP Core with block size 32 should be provided
as two words, the first with a full 32 brainfloat values, and the second with the remaining 30
brainfloat values and 2 padding values. They should be packed as follows:
| Word | 511:496 | 495:480 | 479:464 | ... | 31:16 | 15:0 |
|---|---|---|---|---|---|---|
| 0 | input[31] | input[30] | input[29] | ... | input[1] | input[0] |
| 1 | X | X | input[61] | ... | input[33] | input[32] |
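The layout in the table can be reproduced on the host with a short Python sketch. The helper names `float_to_bf16_bits` and `pack_tensor` are hypothetical. Two assumptions are made: float32 values are converted to brainfloat 16 by truncating to the upper 16 bits (a real flow may round instead), and padding lanes are written as zero, although the table marks them as don't care.

```python
import struct

BLOCK_SIZE = 32  # values per 512-bit word in the example above


def float_to_bf16_bits(x: float) -> int:
    """Upper 16 bits of the IEEE-754 float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def pack_tensor(values, block_size=BLOCK_SIZE):
    """Pack a flattened tensor into a list of integers, one per stream word.

    Value i of a word occupies bits [16*i + 15 : 16*i]. Padding lanes are
    left as zero here (assumption; the table marks them X).
    """
    words = []
    for start in range(0, len(values), block_size):
        word = 0
        for lane, x in enumerate(values[start:start + block_size]):
            word |= float_to_bf16_bits(x) << (16 * lane)
        words.append(word)
    return words


words = pack_tensor([float(i) for i in range(62)])
print(len(words))                        # 2
print(hex((words[0] >> 16) & 0xFFFF))    # 0x3f80, the encoding of input[1] = 1.0
```

If a byte buffer is needed, `word.to_bytes(64, "little")` yields the 64 bytes of a word with bits 7:0 first.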
When a model has multiple inputs or outputs, they should be in order, and each input or output padded to the next multiple of block size.
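Because each tensor is padded separately, the number of stream words in a job is the sum of the per-tensor word counts, not the total element count divided by the block size. A minimal sketch, with hypothetical helper names and example shapes chosen only for illustration:

```python
import math


def words_per_tensor(shape, block_size=32):
    """Stream words for one tensor: flattened size rounded up to whole blocks."""
    n = math.prod(shape)
    return (n + block_size - 1) // block_size


def job_word_count(shapes, block_size=32):
    """Total stream words for one job whose tensors are sent in order."""
    return sum(words_per_tensor(s, block_size) for s in shapes)


print(words_per_tensor([62]))            # 2
print(job_word_count([[62], [4, 10]]))   # 4
```

Here the [62] tensor takes two words and the [4, 10] tensor (40 values) takes two more, even though 102 values would fit in four words only if packed with per-tensor padding as described.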
Verilog signals:
// Core clock and active-high synchronous reset:
, input logic compute_clock
, input logic compute_reset
// Input AXI4-Stream interface:
, input logic input_tvalid
, output logic input_tready
, input logic [511:0] input_tdata
// Output AXI4-Stream interface:
, output logic output_tvalid
, input logic output_tready
, output logic output_tlast
, output logic [63:0] output_tkeep
, output logic [511:0] output_tdata
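On the receiving side, tkeep can be used to drop the padding from an output word. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that tkeep bit i covers data byte i (the standard AXI4-Stream convention), so a 16-bit lane is valid when both of its tkeep bits are set.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen a 16-bit brainfloat pattern to float32 by appending 16 zero bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return x


def unpack_word(tdata: int, tkeep: int, block_size: int = 32):
    """Return the valid values of one output word, lowest lane first.

    Assumption: tkeep bit i covers data byte i, two bytes per value.
    """
    values = []
    for lane in range(block_size):
        if ((tkeep >> (2 * lane)) & 0b11) == 0b11:
            values.append(bf16_bits_to_float(tdata >> (16 * lane)))
    return values


# Two valid lanes holding 1.0 (0x3f80) and 2.0 (0x4000); the rest is padding.
print(unpack_word((0x4000 << 16) | 0x3F80, 0xF))  # [1.0, 2.0]
```

Under the same assumption, the final word of a 62-value output would carry 30 valid values, that is, a tkeep with its low 60 bits set.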
Model Selection
The model_select bus picks which model is to be used for the next compute job, and should be provided once per job. It can be driven before or after driving data to the IP Core. Even if the IP Core is programmed with a single model, there will be no output for a job until the model index is provided.
// Model select AXI4-Stream interface:
, input logic model_select_tvalid
, output logic model_select_tready
, input logic [15:0] model_select_tdata
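The pairing rule described above can be illustrated with a toy Python model. It is not a description of the IP Core's internals: the class `JobPairing` is hypothetical, and it assumes that model indices and input jobs are matched in the order they arrive. It only shows that a job becomes runnable once both its input data and its model index have been provided, in either order.

```python
from collections import deque


class JobPairing:
    """Toy model: a job can start only when it has input data and a model index."""

    def __init__(self):
        self.pending_inputs = deque()
        self.pending_models = deque()

    def push_input(self, job_data):
        self.pending_inputs.append(job_data)
        return self._ready_jobs()

    def push_model_select(self, index):
        if not 0 <= index < (1 << 16):  # model_select_tdata is [15:0]
            raise ValueError("model index does not fit in 16 bits")
        self.pending_models.append(index)
        return self._ready_jobs()

    def _ready_jobs(self):
        # Pair the oldest model index with the oldest input (assumed FIFO order).
        jobs = []
        while self.pending_inputs and self.pending_models:
            jobs.append((self.pending_models.popleft(),
                         self.pending_inputs.popleft()))
        return jobs


core = JobPairing()
print(core.push_input("job A data"))   # [] : no model index yet, so no job
print(core.push_model_select(0))       # [(0, 'job A data')]
```

The first call returns an empty list even though a single model would be the only choice, mirroring the statement that no output is produced until the model index is provided.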
Verilog instantiation
The complete component interface is as follows:
module vollo_ip_core
(
// Config interface clock and active-high synchronous reset:
input logic config_clock
, input logic config_reset
// Config AXI4-Lite interface.
// The config_awprot and config_arprot inputs are unused
// and ignored.
, input logic config_awvalid
, output logic config_awready
, input logic [20:0] config_awaddr
, input logic [2:0] config_awprot
, input logic config_wvalid
, output logic config_wready
, input logic [31:0] config_wdata
, input logic [3:0] config_wstrb
, input logic config_arvalid
, output logic config_arready
, input logic [20:0] config_araddr
, input logic [2:0] config_arprot
, output logic config_rvalid
, input logic config_rready
, output logic [31:0] config_rdata
, output logic [1:0] config_rresp
, output logic config_bvalid
, input logic config_bready
, output logic [1:0] config_bresp
// Core clock and active-high synchronous reset:
, input logic compute_clock
, input logic compute_reset
// Model select AXI4-Stream interface:
, input logic model_select_tvalid
, output logic model_select_tready
, input logic [15:0] model_select_tdata
// Input AXI4-Stream interface:
, input logic input_tvalid
, output logic input_tready
, input logic [511:0] input_tdata
// Output AXI4-Stream interface:
, output logic output_tvalid
, input logic output_tready
, output logic output_tlast
, output logic [63:0] output_tkeep
, output logic [511:0] output_tdata
);