StencilStream v3.0.0
SYCL-based Stencil Simulation Framework Targeting FPGAs
StencilStream is a SYCL-based simulation framework for iterative stencil codes, primarily targeting FPGAs. With StencilStream, application developers and domain scientists can merely write down their stencil code definition and obtain a fully functional and optimized, FPGA-accelerated application.
There are many stencil acceleration frameworks available, even for FPGAs. However, many of them use customized toolchains to support domain-specific languages, which makes them both hard to use for real-world applications and hard to extend. Therefore, StencilStream uses the SYCL/oneAPI framework and C++ templates, so that it is:
We have implemented multiple example applications. One is a simple sketch to show how to get started, one is a benchmark, and two are proper applications that use StencilStream's advanced features. They are presented in the following:
Our implementation of Conway's Game of Life is found in the subfolder examples/conway. It reads the current state of a grid from standard input, computes a requested number of iterations, and then writes the resulting grid to standard output.
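To give a feeling for what a stencil code definition looks like, here is a minimal sketch of the Game of Life rule in plain C++. It does not use the StencilStream API: the names `Cell`, `Neighbourhood`, `conway_transition`, and `conway_step` are hypothetical and chosen for illustration only, and treating cells outside the grid as dead is an assumption of this sketch. The point is the split between a per-cell transition function and a driver that applies it to the whole grid.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for this sketch; not StencilStream's own classes.
using Cell = std::uint8_t;  // 1 = alive, 0 = dead
using Neighbourhood = std::array<std::array<Cell, 3>, 3>;

// Transition function: next state of the central cell of a 3x3 neighbourhood.
Cell conway_transition(const Neighbourhood& nb) {
    int alive_neighbours = 0;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            if (!(r == 1 && c == 1)) alive_neighbours += nb[r][c];
        }
    }
    const bool alive = nb[1][1] != 0;
    if (alive) return (alive_neighbours == 2 || alive_neighbours == 3) ? 1 : 0;
    return (alive_neighbours == 3) ? 1 : 0;
}

// Driver: applies the transition function to every cell of a row-major grid.
// Cells outside the grid are treated as dead (assumption of this sketch).
std::vector<Cell> conway_step(const std::vector<Cell>& grid,
                              std::size_t height, std::size_t width) {
    assert(grid.size() == height * width);
    std::vector<Cell> next(grid.size());
    for (std::size_t r = 0; r < height; ++r) {
        for (std::size_t c = 0; c < width; ++c) {
            Neighbourhood nb{};
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const std::ptrdiff_t rr = static_cast<std::ptrdiff_t>(r) + dr;
                    const std::ptrdiff_t cc = static_cast<std::ptrdiff_t>(c) + dc;
                    const bool inside =
                        rr >= 0 && cc >= 0 &&
                        rr < static_cast<std::ptrdiff_t>(height) &&
                        cc < static_cast<std::ptrdiff_t>(width);
                    nb[dr + 1][dc + 1] =
                        inside ? grid[static_cast<std::size_t>(rr) * width +
                                      static_cast<std::size_t>(cc)]
                               : Cell{0};
                }
            }
            next[r * width + c] = conway_transition(nb);
        }
    }
    return next;
}
```

Calling `conway_step` repeatedly yields the requested number of iterations. In a framework like StencilStream, only the transition function would be written by the user; the grid traversal is the part the framework is meant to take over.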
This is our implementation of the HotSpot benchmark from the Rodinia Benchmark Suite, found in the subfolder examples/hotspot. It achieves an arithmetic throughput of more than 1 TFLOPS on a Bittware 520N accelerator card with an Intel Stratix 10 GX 2800 FPGA.
The FDTD application in examples/fdtd is used to simulate the behavior of electromagnetic waves within micro-cavities. The computed experiment is highly configurable, using configuration files written in JSON. Computationally, it is interesting because it uses StencilStream's time-dependent value feature to precompute the source wave and the sub-iterations feature to alternate between an electric and a magnetic field update. Below, you find a rendering of the final magnetic field, computed for the "Max Grid" experiment:
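The two features named for the FDTD application can be illustrated with a toy one-dimensional example in plain C++. This is a sketch under assumptions, not the code in examples/fdtd and not the StencilStream API: `FieldCell`, `precompute_source`, `sub_iteration_step`, and `simulate` are hypothetical names, the update coefficient 0.5 and the sine source are arbitrary choices, and fields outside the grid are taken to be zero. The source amplitude depends only on the iteration index, so it is computed once per iteration rather than once per cell, and each iteration is split into two sub-iterations that update the electric and the magnetic field in turn.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cell type for this sketch: one electric and one magnetic value.
struct FieldCell {
    float e;
    float h;
};

// "Time-dependent value": depends only on the iteration index, so it can be
// precomputed once per iteration instead of being recomputed for every cell.
std::vector<float> precompute_source(std::size_t n_iterations, float omega) {
    std::vector<float> source(n_iterations);
    for (std::size_t i = 0; i < n_iterations; ++i) {
        source[i] = std::sin(omega * static_cast<float>(i));
    }
    return source;
}

// One sub-iteration over the whole grid.
// Sub-iteration 0 updates the electric field, sub-iteration 1 the magnetic field.
std::vector<FieldCell> sub_iteration_step(const std::vector<FieldCell>& grid,
                                          std::size_t sub_iteration,
                                          float source_value,
                                          std::size_t source_pos) {
    assert(sub_iteration < 2);
    std::vector<FieldCell> next = grid;
    const std::size_t n = grid.size();
    for (std::size_t c = 0; c < n; ++c) {
        if (sub_iteration == 0) {
            const float h_left = (c > 0) ? grid[c - 1].h : 0.0f;
            next[c].e = grid[c].e + 0.5f * (grid[c].h - h_left);
            if (c == source_pos) next[c].e += source_value;  // inject the source wave
        } else {
            const float e_right = (c + 1 < n) ? grid[c + 1].e : 0.0f;
            next[c].h = grid[c].h + 0.5f * (e_right - grid[c].e);
        }
    }
    return next;
}

// Runs n_iterations iterations, each consisting of two sub-iterations.
std::vector<FieldCell> simulate(std::vector<FieldCell> grid,
                                std::size_t n_iterations, float omega,
                                std::size_t source_pos) {
    const std::vector<float> source = precompute_source(n_iterations, omega);
    for (std::size_t i = 0; i < n_iterations; ++i) {
        for (std::size_t s = 0; s < 2; ++s) {
            grid = sub_iteration_step(grid, s, source[i], source_pos);
        }
    }
    return grid;
}
```

Keeping both field updates in one function that branches on the sub-iteration index mirrors the idea of a single transition function whose behavior alternates between passes over the grid.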
The convection app, found in examples/convection, simulates the convection within Earth's mantle. It is a port of an example app for the ParallelStencil.jl framework and can also be configured using a JSON file. Below, you find the animated output of the default experiment.
Metric | HotSpot (Monotile) | HotSpot (Tiling) | FDTD (Monotile) | FDTD (Tiling) | Convection (Monotile) |
---|---|---|---|---|---|
Replications (Total PEs) | 280 (280) | 224 (224) | 100 (200) | 95 (190) | 8 (24) |
Grid Size (height × width) | 720 × 1024 | 16384 × 16384 | 512 × 512 | 4608 × 4608 | 512 × 1536 |
No. of Iterations | 280 | 1120 | 204580 | 184911 | 100 |
Modeled Update Rate | 73.51 GCells/s | 48.23 GCells/s | 21.79 GCells/s | 14.30 GCells/s | 1.26 GCells/s |
Measured Update Rate | 73.40 GCells/s | 37.54 GCells/s | 21.09 GCells/s | 14.20 GCells/s | 0.83 GCells/s |
Throughput [GFLOPS] | 1085.97 | 563.11 | 421.80 | 284.00 | 54.78 |
Clock Frequency | 337.50 MHz | 317.50 MHz | 308.33 MHz | 284.38 MHz | 335.00 MHz |
ALMs (% of 706k) | 426k (60.4%) | 481k (68.1%) | 304k (43.2%) | 444k (62.9%) | 462k (65.5%) |
BRAMs (% of 9094) | 3415 (37.6%) | 6025 (66.3%) | 4073 (44.8%) | 6780 (74.6%) | 2848 (31.3%) |
DSPs (% of 4713) | 2522 (53.5%) | 2240 (47.5%) | 1725 (36.7%) | 1734 (38.8%) | 1031 (22.1%) |
All applications were built and benchmarked at commit 5d82883fe3302f6bbf7a1adcc353ed464dd1d35e, using Intel oneAPI 23.2.0, Boost 1.81.0, and the Bittware 520N HPC board support package 20.4.0.
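Two derived quantities help when reading the table: how closely the measured update rate matches the modeled one, and how many floating-point operations the reported throughput implies per cell update. The sketch below computes both in C++. The helper names are hypothetical, and the second function assumes that throughput equals the measured update rate times a fixed operation count per cell update; the table does not state this, although the FDTD columns are consistent with it (421.80 / 21.09 and 284.00 / 14.20 both give 20.0).

```cpp
#include <cassert>

// Hypothetical helper: ratio of measured to modeled update rate (1.0 = exact match).
double model_accuracy(double measured_gcells, double modeled_gcells) {
    assert(modeled_gcells > 0.0);
    return measured_gcells / modeled_gcells;
}

// Hypothetical helper: floating-point operations implied per cell update,
// assuming throughput = measured update rate * operations per cell update.
double flops_per_cell_update(double throughput_gflops, double measured_gcells) {
    assert(measured_gcells > 0.0);
    return throughput_gflops / measured_gcells;
}
```

For example, `model_accuracy(73.40, 73.51)` for HotSpot (Monotile) is just below 1, while `model_accuracy(37.54, 48.23)` for HotSpot (Tiling) shows a noticeably larger gap between model and measurement.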
StencilStream is published under the MIT license, as found in LICENSE.md. When using StencilStream for a scientific publication, please cite the following: