Industry-Standard Benchmarks for Embedded Systems
EEMBC, an industry alliance, develops benchmarks to help system designers select the optimal processors and understand the performance and energy characteristics of their systems. EEMBC has benchmark suites targeting cloud and big data, mobile devices (for phones and tablets), networking, ultra-low power microcontrollers, the Internet of Things (IoT), digital media, automotive, and other application areas. EEMBC also has benchmarks for general-purpose performance analysis including CoreMark, MultiBench (multicore), and FPMark (floating-point).

Heterogeneous Compute
an EEMBC Benchmark

Heterogeneous Compute FAQs

Is this also a benchmark for GPGPU?

Essentially, yes. The term 'heterogeneous compute' encompasses many different types of architectures. The use of a GPU is not disappearing anytime soon. There are product categories where it makes more sense to have a dedicated DSP, but for products that demand flexibility between graphics and compute processing, the GPGPU story is strong. We do, however, see GPU cores evolving to support more functionality in their shaders: rather than being graphics processors that can run compute as a secondary objective, they are increasingly designed for dual-purpose use.

Will this EEMBC benchmark evolve to utilize newer APIs?

API support will change over the years. Once the market embraces Vulkan and it becomes easy to use through the evolution of support layers and SDKs, we can expect SPIR-V to become the next-generation GPGPU language and also gain adoption in the DSP market. But until at least 2018, OpenCL 1.2 EP (Embedded Profile) will remain the most widespread compute API.
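For reference, a host application can check which profile and API version a device exposes through the standard OpenCL query calls. The minimal C sketch below is purely illustrative (error handling omitted) and is not part of the benchmark itself:

    /* Query a device's OpenCL profile and version, e.g.
       "EMBEDDED_PROFILE" / "OpenCL 1.2 ...". Illustrative only. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_device_id   device;
        char profile[64], version[64];

        clGetPlatformIDs(1, &platform, NULL);
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);

        clGetDeviceInfo(device, CL_DEVICE_PROFILE, sizeof profile, profile, NULL);
        clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof version, version, NULL);

        printf("Profile: %s\nVersion: %s\n", profile, version);
        return 0;
    }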

For this benchmark, wouldn't OpenMP be much easier to use and deploy? Furthermore, wouldn't OpenMP 4 work better for heterogeneous multiprocessing?

There has been some activity to get OpenMP to output OpenCL, but it is not very mature yet. OpenMP was primarily designed to accelerate multiprocessing using C/C++ and Fortran. OpenMP 4 adds accelerator support, with CUDA being the primary GPU compute back end enabled. Since a majority of the workgroup currently prefers OpenCL 1.2 EP as the common compute API for the devices they represent, OpenMP is not seen as a target for this benchmark.
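For context, OpenMP 4.x expresses that accelerator support through target directives that map data to a device and run a loop there. The following minimal C sketch (a vector addition, not a Hetmark kernel) is illustrative only:

    /* Minimal OpenMP 4.x offload sketch. Falls back to the host
       when no accelerator device is available. */
    #include <stdio.h>

    #define N (1 << 20)

    static float a[N], b[N], c[N];

    int main(void)
    {
        for (int i = 0; i < N; i++) {
            a[i] = (float)i;
            b[i] = 2.0f * (float)i;
        }

        /* Map inputs to the device, run the loop there, map the
           result back to the host. */
        #pragma omp target teams distribute parallel for \
                map(to: a[0:N], b[0:N]) map(from: c[0:N])
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", (double)c[42]);
        return 0;
    }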

Is this benchmark more focused on imaging rather than computer vision?

The Phase 1 proof-of-concept development proposes a focus on image-conversion and stitching algorithm performance, but the long-term goal is to add more processing steps for pedestrian detection in the automotive case, and to implement facial-feature recognition in the mobile case. By actively participating in the group, you have the opportunity to influence (and contribute to) the processing algorithms in the benchmark.
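To give a flavor of what an image-conversion step looks like, here is a hypothetical OpenCL C kernel for RGBA-to-grayscale conversion. It is an illustration of the kind of workload involved, not the actual benchmark code:

    /* Hypothetical image-conversion kernel: RGBA8888 to grayscale.
       Not an actual Hetmark uBenchmark. */
    __kernel void rgba_to_gray(__global const uchar4 *src,
                               __global uchar *dst,
                               const int num_pixels)
    {
        int i = get_global_id(0);
        if (i >= num_pixels)
            return;
        uchar4 p = src[i];
        /* BT.601 luma weights, computed in float for precision */
        float y = 0.299f * p.x + 0.587f * p.y + 0.114f * p.z;
        dst[i] = convert_uchar_sat(y);
    }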

Heterogeneous architectures are very complex, so how can an industry benchmark be representative of a real-world application?

Many heterogeneous-architecture approaches are great for optimizing a company's own software and hardware, but to the Tier-1s they carry little weight in the sales dialogue. No one trusts a vendor's own benchmarks; customers want a common benchmark that enables comparison of devices from multiple vendors. This goes back to the roots of EEMBC Hetmark. There is a need for a benchmark that allows the customer to make an apples-to-apples comparison, but at the same time we do not want to be limited by a rigid benchmark that is biased toward the architecture it was initially written on. Thus Hetmark can run both unoptimized, for a 1:1 comparison, and optimized, to show the best possible performance for the identical use case across devices.

Does the benchmark cover mono/stereo vision, radar, or lidar?

On their own, the individual sensor technologies each have their own strengths and weaknesses; combined, they give you the best of all worlds. Radar and lidar data also need to be processed to eliminate noise and to recognize objects and patterns – it's the same story as with camera sensors. For this benchmark, a next-generation sensor-fusion application use case could take multiple parallel inputs and help identify how to best parallelize the data processing on the heterogeneous architecture.

This benchmark measures frames per second (FPS). When measuring FPS in image recognition, how can performance be compared when different implementations have different levels of accuracy? In other words, what about trading off accuracy for performance?

We measure FPS as well as the time to process each uBenchmark, plus the latencies introduced by memory copies and by the orchestration of the parallel processing. The uBenchmarks run with a defined (and required) input and output precision of data formats. If a vendor chooses to change the algorithm implementation to use lower precision, then EEMBC no longer blesses that optimization. We do not allow the “-cl-unsafe-math-optimizations” and “-cl-fast-relaxed-math” build options, which would break compliance with the IEEE 754 standard and the numerical-compliance requirements of the OpenCL specification. If the driver cheats internally and lowers the precision of the compiled kernels, our randomized conformance tests will show a difference between the reference implementation and the output of the benchmark on the vendor hardware. Long story short: to be an EEMBC-blessed implementation, the implementation must stay within the strict precision requirements of the OpenCL specification. If a vendor enables optimizations that reduce precision, we no longer bless the implementation.
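To illustrate the idea (this is a sketch, not the actual EEMBC conformance harness), such a check boils down to comparing the vendor output against the reference implementation within a defined tolerance; the function name and the mixed absolute/relative bound below are assumptions for illustration:

    /* Hypothetical reference-vs-vendor output comparison in C.
       The real conformance tests define their own precision bounds
       per uBenchmark. */
    #include <math.h>
    #include <stdio.h>

    static int outputs_match(const float *ref, const float *out,
                             size_t n, float tol)
    {
        for (size_t i = 0; i < n; i++) {
            float err   = fabsf(out[i] - ref[i]);
            float bound = tol * fmaxf(fabsf(ref[i]), 1.0f);
            if (err > bound) {
                fprintf(stderr, "mismatch at %zu: ref=%g out=%g\n",
                        i, (double)ref[i], (double)out[i]);
                return 0; /* implementation is out of tolerance */
            }
        }
        return 1; /* within the required precision */
    }

Building kernels with the default (empty) options string to clBuildProgram, rather than the relaxed-math flags above, keeps the compiled code within the OpenCL specification's numerical requirements.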