The BenchPress
The Original Embedded Microprocessor Benchmark Newsletter, from EEMBC
Issue #64 - Q4'2020


Benchmark Update: IoTMark Wi-Fi and a New Assessment of IoT Battery Life

The IoTMark-Wi-Fi release is in its home stretch and final beta-testing is underway. For those of you unfamiliar with it, IoTMark-Wi-Fi is the second in our series of IoT-specific benchmarks that targets the energy costs associated with wireless capability—in this case 802.11—in low-power edge nodes.

During development, one of our clever members discovered that this benchmark also exposes vendor-specific variations among access points (routers) that can have a significant impact on the battery life of IoT devices. Initial studies reveal that some top-selling routers cause an IoT device to drain its battery noticeably faster than others, due to router configuration options that either aren’t exposed, or are non-obvious to the typical consumer. This discovery really flips the script, adding a new dimension to the benchmark: the ability to assess a router’s "IoT Friendliness". How to quantify this fairly and succinctly is still under investigation, so stay tuned.

Collaboration: EEMBC and tinyMLperf Update

In our last newsletter, we mentioned engaging with MLCommons in order to align our machine-learning energy benchmark, called ULPMark-ML, with MLCommons’s machine-learning performance benchmark, called tinyMLperf. Over the past six months, our teams have aligned on such areas as model selection, datasets, run rules, and accuracy equations, to name a few. A key element that EEMBC brings to this effort, in addition to years of energy measurement experience, is the know-how for making embedded software automation seamless.

With desktop benchmarks, a user simply compiles and runs an executable. On an embedded platform, by contrast, this one simple task requires cross-compilation, a flash/FPGA programmer, and a debugger IDE to collect results. This, in turn, requires a firmware API to speak with various vendor SDKs, and a portable communication mechanism. Furthermore, in the case of machine learning, it isn’t enough to perform a single inference: potentially thousands of inputs are required to determine an accuracy metric, which would turn into a tedious task if the user had to repeat these steps by hand.
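To make that concrete, the accuracy pass alone boils down to a loop like the one below. This is only an illustrative sketch, not the actual ULPMark-ML metric: the function name, the simple top-1 comparison, and the idea of collecting device predictions into an array are all assumptions made for the example.

```c
/* Illustrative sketch only: a top-1 accuracy metric computed over a batch
 * of predictions collected from the device under test. In a real run, each
 * prediction would require a full download/infer/upload round-trip with the
 * device, which is exactly the tedium a benchmark framework automates. */
double top1_accuracy(const int *predicted, const int *labels, int n_inputs)
{
    int correct = 0;
    for (int i = 0; i < n_inputs; i++)
        if (predicted[i] == labels[i])
            correct++;
    return (double)correct / n_inputs;
}
```

With five stub predictions of which four match their labels, the function returns 0.8; scale n_inputs up to the thousands of images or keyword samples a real dataset contains, and the need for automation is obvious.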

The EEMBC benchmark framework, called IoTConnect, solves exactly this problem: it provides a tiny firmware skeleton API for connecting to the framework, it defines a simple text-based protocol, and it uses a cross-platform Host UI runner application for coordinating execution of the benchmark, which usually requires participation of several other devices (like an energy monitor or wireless gateway). First developed five years ago, versions of the IoTConnect framework have been deployed globally in all of our modern IoT benchmarks.
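As a flavor of what that looks like on the device side, consider the dispatcher below. The command names, reply strings, and framing are invented for illustration; they are not the actual IoTConnect protocol, merely a sketch of a simple text-based command/response scheme between a host runner and firmware.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical device-side dispatcher for a simple text-based protocol.
 * The host runner sends a command over the serial link; the firmware
 * writes its reply into a buffer for transmission back to the host.
 * All command and reply strings here are invented for illustration. */
void th_dispatch(const char *cmd, char *reply, size_t reply_len)
{
    if (strcmp(cmd, "name") == 0) {
        /* Identify the device under test to the host runner. */
        snprintf(reply, reply_len, "m-name-dut");
    } else if (strncmp(cmd, "infer ", 6) == 0) {
        /* Run N inferences, e.g. "infer 1000". */
        int n = 0;
        sscanf(cmd + 6, "%d", &n);
        snprintf(reply, reply_len, "m-infer-done[%d]", n);
    } else {
        snprintf(reply, reply_len, "e-unknown-command");
    }
}
```

A skeleton like this compiles against any vendor SDK as long as the port supplies a basic transmit/receive hook, which is the kind of minimal porting surface that keeps the firmware side tiny.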

With this capability, ML embedded benchmarking will become repeatable, consistent, and most of all, easy.

The basic test-harness firmware is on our GitHub site, and a version of the Host UI runner for ULPMark-ML / tinyMLperf is currently in development, slated for first release in Q1’2021.

Tech Talk: When Your Benchmark Requires RF Isolation

All benchmark writers have to face the facts: run-to-run variation exists, and its sources accumulate as the sophistication of the benchmark increases. If not properly addressed, score variation challenges the believability of measurements, and weakens trust in the benchmark. At EEMBC, we’ve deployed many different responses to these challenges over three generations of benchmarks. With our latest dive into the RF spectrum, we’ve had some help from the engineers at octoScope, a company known for its ubiquitous RF testbeds.

To better understand the issue, consider the three main categories our benchmarks fall into, and their associated sources of variation:

  1. First generation benchmarks focus on MCU performance, and are simplistic compile-and-run algorithms with zero run-to-run ISA variation[1], but they are subject to compilers’ optimization diversity and operating-system latency—if not running bare-metal;
  2. Second generation benchmarks examine energy efficiency. They inherit the first-generation problems and must also account for the impact of temperature, as well as both stochastic and systematic silicon manufacturing variations;
  3. Third generation benchmarks communicate externally, where they now accrete parasitic load variations (via wired connectivity), or electromagnetic interference (via the RF spectrum).

All of these sources of variation need to be addressed programmatically by the developer, mitigated through run rules, or, as a last resort, bounded by a margin-of-error based on empirical analysis. However, RF interference in the Wi-Fi spectrum has proven a remarkably challenging beast. Our mitigation efforts have reduced susceptibility and improved detection to the point where simply re-running the benchmark may be sufficient in most environments. But that’s not a particularly graceful user experience, and most labs are really noisy in the S-band ISM spectrum. Short of moving to a remote island (don’t tempt us), RF isolation makes life easier for everyone involved.

octoScope specializes in building wireless testbeds used by a very long list of Wi-Fi operators, equipment vendors, chipset vendors, and labs. EEMBC uses their chambers simply for RF isolation, but their intended usage is a bit more complex, extending into 802.11 MIMO verification with dozens of additional test modules and an extensive software automation suite.

While this plug is due to our gratitude for their generous loan of a BOX-26, its availability has also helped us view the problem in different ways and explore additional functional-testing possibilities, such as the impact of injected network traffic on battery life, because keeping that receiver powered up during long contention windows costs precious micro-Joules! Check out their website and YouTube channel for some really interesting Wi-Fi test automation.

[1] With the exception of cycle-accurate architecture (or RTL) simulators that lack fuzzing.

Old School Cool: AutoBench Gets a Tune Up

AutoBench V1.1 is one of EEMBC’s most widely used benchmarks, not just by semiconductor vendors, but by compiler designers as well. This is because it contains the actual source code (not approximations) of many real-world algorithms used by auto manufacturers in electronic control units, such as tooth-to-spark, angle-to-time, and road-speed calculations. This code cannot easily be optimized away by modern compilers, a weakness that plagues many synthetic benchmarks.

First published in 2002, Version 1.1 has adhered to EEMBC’s policy of "living with errata": errata that have an impact on scores are intentionally left uncorrected and only documented. Clearly this desire to maintain the comparability of scores over time comes at the expense of clean code. In this case, however, the aforementioned "time" is approaching 20 years, so we have begun making some long-needed updates. For existing AutoBench licensees, a release candidate for V1.2 is in the download area. RC3 contains about 40% of the most-requested corrections from the errata document, and we expect to clean up the remainder later in 2021.

Proof again that here at EEMBC, we take long-term support seriously.

Academics and Non-profits Around the World using EEMBC Benchmarks

Since our zero-fee academic licensing structure went into effect last June, we've seen a large increase in new universities and non-profit organizations signing up to use EEMBC benchmarks. Among the most recent additions:

And my personal favorite:

Score Certifications

EEMBC offers a certification program that verifies the results of a benchmark. While most of our benchmarks self-check, some Run Rules cannot be explicitly enforced remotely. The certification process is performed at the EEMBC lab, and recreates the scores on the actual platforms. Here we scrutinize the implementation with logic analyzers and power probes to verify correctness, even going so far as to hand-analyze assembly code if a custom compiler is used. EEMBC certification guarantees that a score is valid.

Since the last newsletter, several ULPMark and CoreMark scores have been published:

Renesas certified a member of the RA2L1 group, the R7FA2L1 with ULPMark scores of 244 at 3.0V and 304 at 1.8V:

The RA2L1 group is based on the Arm® Cortex®-M23 core, the most energy-efficient CPU among Arm Cortex-M cores today. The optimized processing and Renesas’ low-power process technology make it the industry's most energy-efficient ultra-low-power microcontroller. The RA2L1 group supports a wide operating voltage range of 1.6V to 5.5V and a maximum CPU clock frequency of 48MHz, with low active-mode and standby-mode currents. The RA2L1 group also features an enhanced Capacitive Touch Sensing Unit (CTSU2), a set of serial communication interfaces, and highly accurate converters and timers. The products are available with pin counts ranging from 48-pin to 100-pin.

The Nanjing Low Power IC Technology Institute certified a ULPMark-CP score of 856 at 3.0V for their LP5100 device:

The LP5100 is a low-power MCU for power-sensitive applications, such as wearable devices and battery-powered IoT devices. Near-threshold low-voltage technology is applied in this chip, which helps achieve low power consumption in both active and sleep states. Benefiting from its low sleep power, this chip is well suited to heavy duty-cycle scenarios.

STMicroelectronics had several certifications, starting first with the STM32H72x/73x CoreMark score of 2778:

The STM32H7 Series is the first series of STMicroelectronics microcontrollers in 40 nm-process technology. This technology enables STM32H7 devices to integrate high-density embedded Flash memory and SRAM that decrease the resource constraints typically complicating high-end embedded development. It also unleashes the performance of the core and enables ultra-fast data transfers through the system while realizing major power savings.

Followed by the STM32WLEx/5x with a pair of CoreMark scores executing from SRAM or Flash, and two ULPMark scores: 216 at 3.0V and 313 at 1.8V. These devices are described by their datasheets:

...long-range wireless and ultra-low-power devices embed a powerful and ultra-low-power LPWAN-compliant radio solution, enabling the following modulations: LoRa®, (G)FSK, (G)MSK, and BPSK