EEMBC & CoreMark Blog

August 20, 2010

Atmel MCU: How low can you go?

Filed under: Coremark — Markus Levy @ 07:28

Not too long ago, one of our CoreMark users submitted a score for the Atmel AT89C51RE2, an 8051 derivative. It’s truly amazing that these devices are still around, with an original architecture that dates back more than three decades. As you’d expect in any benchmark contest, these devices are not blazing fast. With a CoreMark/MHz rating of 0.107, this AT89C51RE2 is at the bottom of our list (with the exception of another 8-bit microcontroller). But what makes this AT89C51RE2 interesting is that it is representative of the classic 8051 class of CPUs. In fact, it is actually faster than the classic, since the classic used 12 clocks per machine cycle and this part uses 6.

Also interesting: even though CoreMark’s size requirements are minimal, many 8051s don’t have enough flash or RAM to run it, but this AT89C51RE2 does. Newer 8-bit CPUs such as the Atmel AVR series are more RISC-like and execute most instructions in one clock, giving them an advantage over the classic 8051 by a factor of probably 5 to 10. Furthermore, 8051s are handicapped by a small stack that shares internal RAM and is typically limited to 128 bytes. C compilers for the 8051, such as Keil C51, use workarounds such as a fake stack in RAM, but efficiency suffers with such workarounds. Well, that wraps it up for today’s history lesson.

July 19, 2010

On inlining and other compiler optimizations

Filed under: Coremark, EEMBC — shay@eembc.org @ 08:17

CoreMark run rules allow for any compiler optimizations. Unlike Dhrystone, CoreMark relies on a design that forces computation to happen at run time: every computation chain starts from a value that is not available at compile time and ends with an output.

While compilers can find more efficient ways of implementing those computations, the computations cannot be done at compile time, and thus the actual operations cannot be “optimized away”. Dhrystone, for example, was split into multiple files because inlining allowed compilers to avoid performing the very tasks the benchmark was trying to analyze. Since compilers can now analyze a program even when it is split into multiple files, this cure did not work…

In fact, many modern architectures rely on the compiler to create code that can make efficient use of hardware resources, and restricting the compiler from optimizing is not a reasonable restriction. CoreMark does not attempt to force a specific number of branches or loads, rather only that all computations are performed at run time. Complex architectures are going to find different ways of doing those computations (e.g. SIMD, predicated execution and more), while simple architectures are going to perform the operations head on. As long as the operation is performed, the benchmark is useful to analyze the performance potential of the core.
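As a rough illustration of the idea, here is a minimal sketch in C. It is not CoreMark's actual code: the names `seed_value`, `mix16` and `run_chain` are made up for this example, and the mixing step is only CRC-like. The point is that the chain starts from a value the compiler cannot know (a `volatile` seed) and ends in a returned result, so every step has to be executed at run time, whatever optimizations are applied.

```c
#include <stdint.h>

/* Hypothetical seed. 'volatile' tells the compiler the value may change
   outside its view, so it must be loaded at run time. */
volatile int32_t seed_value = 0x66;

/* Small CRC-style mixing step (illustrative, not CoreMark's exact routine). */
static uint16_t mix16(uint16_t data, uint16_t crc)
{
    for (int i = 0; i < 16; i++) {
        uint16_t bit = (uint16_t)((data ^ crc) & 1u);
        crc >>= 1;
        if (bit)
            crc ^= 0xA001u;
        data >>= 1;
    }
    return crc;
}

/* Computation chain: run-time input -> work -> output. */
uint16_t run_chain(int iterations)
{
    uint16_t value = (uint16_t)seed_value; /* not known at compile time */
    uint16_t crc = 0;
    for (int i = 0; i < iterations; i++) {
        value = (uint16_t)(value * 3u + 1u);
        crc = mix16(value, crc);
    }
    return crc; /* the result depends on every step above */
}
```

The compiler is still free to unroll, vectorize or inline `run_chain`; what it cannot do is replace the call with a constant.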

June 7, 2010

CoreMark Analytic Evaluation – Interview

Filed under: Coremark, EEMBC — shay@eembc.org @ 19:02

Recently Van Smith of Canalabs submitted several scores to the CoreMark website. We asked him about his choices for run parameters…

[NOTE: To put this blog into context, refer to the scores submitted on April 12 for Intel Atom N450, VIA Nano L3050, AMD Mobile Athlon XP-M (Barton), and Freescale i.MX515 at http://coremark.org/benchmark/index.php?pg=benchmark]

Why did you use FORK rather than PTHREADS?

Answer: I used the same set of CoreMark compiler flags and settings that I had come up with several months ago after experimenting on an AMD Phenom II system. However, these settings came directly from the best-performing Intel system on the CoreMark website. Of course, forking is the most reliable way of getting multicore scaling across a broad range of platforms, which is my goal. Threading can lead to headaches in those situations.

*NOTE from Shay Gal-On: CoreMark actually contains 3 separate methods to use concurrency. All of them have been thoroughly validated, so feel free to pick any method you wish and tell us why you picked it…

Why did you choose to oversubscribe the cores with 4 threads?

Answer: I wanted to use exactly the same settings across all platforms under test. The only platform in my ARM versus x86 report to benefit from forking was the Intel Atom; if all CPUs had been single-core without HyperThreading, I would have only used one thread.  As you know, threading/forking places greater pressure on  caches and memory and I wanted to keep this consistent across all systems under test.

* We like this approach, though what is a fair comparison? As mentioned in previous posts, CoreMark is not really focused on multiple cores. We highly recommend using EEMBC MultiBench when trying to evaluate performance in a MultiCore environment.

Why did you under-clock the N450 to 1GHz when it normally runs at 1.66GHz? Did it change the core/bus ratio?

Answer: The ARM Cortex-A8 system ran at 800MHz and I wanted to have all of the platforms operate at this speed for a fair IPC performance comparison.

Unfortunately, the Atom N450 can only be downclocked to 1GHz, so that’s what I was forced to use.  Only the multiplier was changed on the Atom and the VIA Nano L3050; adjusting the bus clock / system clock should always be avoided in performance benchmarking for many, many obvious reasons.

* For processors with even a small cache, CoreMark will operate entirely inside the cache, and will not be affected by memory performance at all. Memory performance is a critical concern for small embedded devices though, as mentioned in a previous blog post.

All in all, nice to see that industry analysts are picking up CoreMark and using it to test processors at any level. Check the news section to see the latest…

March 8, 2010

On CPU and Memory tangles

Filed under: Coremark, EEMBC — shay@eembc.org @ 16:59

Two CoreMark scores for the TI Stellaris were submitted recently. It is interesting to note that while the only difference between the submissions is the frequency, the CoreMark/MHz has changed (1.9 at 50MHz vs. 1.6 at 80MHz; a 16% drop). Since the device does not have cache, the CPU frequency to memory frequency ratio may come into effect, and indeed we find that the flash used on the device can only scale 1:1 with the CPU frequency up to 50MHz. Once frequency goes above 50MHz, the memory frequency scales 1:2 with the CPU.
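To make the arithmetic explicit, here is a small sketch in C (the helper names `coremark_total` and `relative_change` are just for this example). Multiplying the per-MHz ratings by the clock gives absolute scores of 95 at 50MHz and 128 at 80MHz: the clock goes up by 60%, but the score goes up by only about 35%.

```c
/* Absolute CoreMark score from a CoreMark/MHz rating and a clock in MHz. */
double coremark_total(double per_mhz, double mhz)
{
    return per_mhz * mhz;
}

/* Fractional change from 'before' to 'after' (0.6 means +60%). */
double relative_change(double before, double after)
{
    return (after - before) / before;
}
```

For example, `relative_change(coremark_total(1.9, 50.0), coremark_total(1.6, 80.0))` gives the change in absolute score, and `relative_change(1.9, 1.6)` gives the change in CoreMark/MHz.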

The memory to CPU frequency ratio is a common limitation, and various technological solutions are available. Cache is one answer, but expensive in terms of silicon area and resulting cost for the end product, especially critical in low-end microcontrollers. Other solutions may have wide reads (e.g. NXP ARM7 parts read 128 bits at a time) which will speed up execution of serial blocks of code, or more advanced techniques such as the “Enhanced Flash Memory Accelerator”  (see NXP LPC1759 CPU).

In general, performance will not increase linearly with frequency if the code and/or data the program needs resides in memory that cannot scale at the same ratio. This will be true in benchmarks and in ‘real’ life. Does it matter to your application?

March 6, 2010

EEMBC Director of Software Engineering takes on sumo wrestler with EEMBC power

Filed under: Coremark, EEMBC — Markus Levy @ 09:47

Shay Gal-On always wins when it comes to wrestling with EEMBC benchmarks. But he’s no match for a sumo wrestler.

http://www.youtube.com/watch?v=njBGUzAExo4

February 11, 2010

What do YOU use to edit your code?

Filed under: EEMBC — shay@eembc.org @ 08:49

Being the main software developer for EEMBC, I spend a lot of time writing code. For big projects, an IDE such as Eclipse, Visual Studio, or Multi is essential. But what about other editing needs? When you need a small Perl script? An HTML page with some JavaScript? A quick edit to a small C file? Editing a JavaScript file that was encoded so that one line is more than 16K chars?

Every programmer needs a code editor, and once you get used to the quirks of an editor, it is hard to switch. Many still use EMACS or even VI – on some embedded platforms I use VI since it is easily available on even the most minimal Linux distribution.

For a graphical environment though, I like the added convenience that comes with a GUI but am not willing to give up on some useful features:
- Syntax highlighting
- Regular expressions for find/replace
- Auto code layout (smart tabs, brace matching and their ilk)
- Browse/tags database support
- Performance (time to initial open, time to load/edit/search large files)
- Customization
- Macros (Define commands and key sequences)
- Bindings (the ability to customize any key combo to an editor command)

Other nice to have features:
- Code completion
- Column editing mode
- Tabs for documents
- Integration with external commands
- Code folding
- Integrated file explorer
- Integrated source control (SVN)
- Sessions (open up with whatever was in the editor when it was closed)
- Hex edit

Anything else is a bonus I will gladly take, but I am not willing to give up convenience or speed for it.

I count 60 editors on http://en.wikipedia.org/wiki/Comparison_of_text_editors#Programming_features. How do you choose the right one for you? I try to check what is available and try 3 new editors once a year. Sometimes I will even switch…

Currently on Windows I am using Notepad++. Originally I was drawn by the fact this had most of what I wanted and I had access to the source to make a few modifications. Since then this editor has matured and I no longer have a private version compiled from the source.

Things I wish were different but not enough to fix the source code:
- Regex for find/replace could be better (e.g. just use Perl Regex)
- Really long lines don’t display correctly (compressed JavaScript)

Other editors I have used in the past and switched from:
- Slickedit (awesome editor, but cost issues when I switched to a new company)
- Editplus (was not maintained for a long while. Looks like it is actively developed again, may need to check it out)
- Ultraedit (somehow it just did not measure up and I ended up dropping it after a mere 3 months of use)
- XEmacs (EMACS is extremely powerful, but the human interface of the GUI version doesn’t cut it)
- PSpad (performance issues caused me to drop this one)

Which 3 editors do you think I should try in 2010?

January 12, 2010

Data types

Filed under: Coremark — shay@eembc.org @ 15:27

People often ask about the applicability of CoreMark for 8-, 16-, and 32-bit processors. They wonder if it provides a realistic measure of performance for an 8-bit micro when it does calculations based on 32-bit data (and vice versa). CoreMark will work on any architecture, though 8b handling is most efficient on 8b processors, 16b data types are handled optimally on 16b processors, and similarly 32b processors are the best at handling 32b data.

Realistically though, all of those data types are commonly used in most C code. The compiler is in charge of making the best use of the processor resources, and making sure the end result is correct.

CoreMark intrinsically uses  several integer data types:

8b – used as data for the state machine (mostly in compares).

16b – used as data and status info during the list processing (read/write and bit manipulation) as well as input data for the matrix operations (computations), and for crc.

32b – used as result data type for matrix multiply operations.

This introduces a mix of the common programming data types and covers core integer functionality in the processor.
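As an illustration only (this is a sketch, not the benchmark source; `matrix_mul` and `count_digits` are names invented for this example), the mix might look like this in C: 8b data driving compares, 16b matrix inputs, and a 32b result type so products of 16b values do not overflow.

```c
#include <stdint.h>

/* 16b inputs, 32b results: row-major n x n matrices, C = A * B. */
void matrix_mul(uint32_t n, int32_t *c, const int16_t *a, const int16_t *b)
{
    for (uint32_t i = 0; i < n; i++) {
        for (uint32_t j = 0; j < n; j++) {
            int32_t sum = 0;
            for (uint32_t k = 0; k < n; k++)
                sum += (int32_t)a[i * n + k] * (int32_t)b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* 8b data, mostly compares: count the ASCII digits in a byte buffer. */
uint32_t count_digits(const uint8_t *buf, uint32_t len)
{
    uint32_t digits = 0;
    for (uint32_t i = 0; i < len; i++) {
        if (buf[i] >= '0' && buf[i] <= '9')
            digits++;
    }
    return digits;
}
```

On an 8b core the compiler has to synthesize the 16b multiplies and 32b adds from narrower operations, while a 32b core handles them natively but may spend extra instructions masking or extending the 8b and 16b values.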

September 4, 2009

Compilers Make a Difference

Filed under: Coremark — shay@eembc.org @ 10:05

With 6 scores posted for the PIC32 starter kit, I thought we should check out the results to understand the variance. The scores were submitted using 3 different compilers: MPLAB C32 1.0 (2007), MPLAB C 1.05 (2009), and Sourcery G++ 4.3 (2009).

Since the microcontroller is not using cache, the memory-to-CPU frequency ratio is going to make a difference. (more…)

July 30, 2009

Running Coremark

Filed under: Coremark — shay@eembc.org @ 09:34

For those who don’t like reading the readme

Running on a Linux platform with a native toolchain:

Open up linux/core_portme.h (or linux64/core_portme.h on a 64b platform), and check all predefined macros (for example endianness) for your platform.

Run make. Results are in run1.log and run2.log. (more…)

June 8, 2009

Using Coremark on Multiple Cores

Filed under: EEMBC — shay@eembc.org @ 09:00

Although we don’t recommend it, because it will only yield a linear performance increase, the CoreMark standard allows execution of multiple copies on multiple cores, and in fact already contains 3 common implementations (PThreads, Fork (with shared memory), and Sockets). The portable layer allows extension to other proprietary mechanisms as well. E5405 scores posted recently for 1, 2, and 4 cores show a linear speedup with the number of cores used. How come? (more…)
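For readers who want a feel for the fork approach, here is a minimal POSIX sketch in C. It is an assumption-laden illustration, not CoreMark's portable layer: `workload` is a stand-in for the benchmark body, `run_copies` is a made-up name, and results come back through exit status rather than shared memory. Each child runs its own private copy of the work, which is why nothing stops the copies from scaling with the number of cores.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in workload (hypothetical); a real run would execute the benchmark. */
static uint32_t workload(uint32_t iterations)
{
    volatile uint32_t acc = 1;
    for (uint32_t i = 0; i < iterations; i++)
        acc = acc * 1664525u + 1013904223u;
    return acc;
}

/* Fork up to 'copies' children that each run the workload independently.
   Returns the number of children that exited cleanly. */
int run_copies(int copies, uint32_t iterations)
{
    int started = 0;
    int ok = 0;

    for (int i = 0; i < copies; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* could not create more children */
        if (pid == 0) {
            (void)workload(iterations); /* child: private copy of the work */
            _exit(0);
        }
        started++;
    }

    /* Parent: wait for every child that was started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Timing `run_copies(1, n)` against `run_copies(4, n)` on a 4-core machine is one way to see the scaling for yourself.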
