Vollo RT Example

The full code for this example can be found in example/identity.c.

Here we will work through it step by step.


First we need to get hold of a Vollo RT context:

////////////////////////////////////////////////// // Init vollo_rt_context_t ctx; EXIT_ON_ERROR(vollo_rt_init(&ctx));

Note: throughout this example we use EXIT_ON_ERROR, it is just a convenient way to handle errors


Then we need to add accelerators, the accelerator_index refers to the index of the Vollo accelerator in the sorted list of PCI addresses, simply use 0 if you have a single accelerator, or just want to use the first one.

////////////////////////////////////////////////// // Add accelerators size_t accelerator_index = 0; EXIT_ON_ERROR(vollo_rt_add_accelerator(ctx, accelerator_index));

This step will check the accelerator license and make sure the bitstream is the correct version and compatible with this version of the runtime.


Then we load a program:

////////////////////////////////////////////////// // Load program // Program for a block_size 64 accelerator const char* vollo_program_path = "./identity_b64.vollo"; EXIT_ON_ERROR(vollo_rt_load_program(ctx, vollo_program_path));

Here we're using a relative path (in the example directory) to one of the example Vollo program, a program that computes the identity function for a tensor of size 128. The program is specifically for a block_size 64 version of the accelerator such as the default configuration for the IA840F FPGA.


Then we setup some inputs and outputs for a single inference:

////////////////////////////////////////////////// // Setup inputs and outputs size_t model_index = 0; // Assert model only has a single input and a single output tensor assert(vollo_rt_model_num_inputs(ctx, model_index) == 1); assert(vollo_rt_model_num_outputs(ctx, model_index) == 1); assert(vollo_rt_model_input_num_elements(ctx, model_index, 0) == 128); assert(vollo_rt_model_output_num_elements(ctx, model_index, 0) == 128); float input_tensor[128]; float output_tensor[128]; for (size_t i = 0; i < 128; i++) { input_tensor[i] = 42.0; }

We check that the program metadata matches our expectations and we setup an input and output buffer.


Then we run a single inference:

////////////////////////////////////////////////// // Run an inference single_shot_inference(ctx, input_tensor, output_tensor);

Where we define a convenience function to run this type of simple synchronous inference on top of the asynchronous Vollo RT API:

// A small wrapper around the asynchronous Vollo RT API to block on a single inference // This assume a single model with a single input and output tensor static void single_shot_inference(vollo_rt_context_t ctx, const float* input, float* output) { size_t model_index = 0; const float* inputs[1] = {input}; float* outputs[1] = {output}; // user_ctx is not needed when doing single shot inferences // it can be used when doing multiple jobs concurrently to keep track of which jobs completed uint64_t user_ctx = 0; // Register a new job EXIT_ON_ERROR(vollo_rt_add_job_fp32(ctx, model_index, user_ctx, inputs, outputs)); // Poll until completion size_t num_completed = 0; const uint64_t* completed_buffer = NULL; size_t poll_count = 0; while (num_completed == 0) { EXIT_ON_ERROR(vollo_rt_poll(ctx, &num_completed, &completed_buffer)); poll_count++; if (poll_count > 1000000) { EXIT_ON_ERROR("Timed out while polling"); } } }

This function does 2 things. First it registers a new job with the Vollo RT context and then it polls in a loop until that job is complete.

For a more thorough overview of how to use this asynchronous API to run multiple jobs concurrently take a look at example/example.c


And finally we print out the newly obtained results and cleanup the Vollo RT context:

////////////////////////////////////////////////// // Print outputs printf("Output values: ["); for (size_t i = 0; i < 128; i++) { if (i % 8 == 0) { printf("\n "); } printf("%.1f, ", output_tensor[i]); } printf("\n]\n"); ////////////////////////////////////////////////// // Release resources / Cleanup vollo_rt_destroy(ctx);