EEMBC & CoreMark Blog

July 19, 2010

On inlining and other compiler optimizations

Filed under: Coremark, EEMBC — shay@eembc.org @ 08:17

CoreMark run rules allow any compiler optimization. Unlike Dhrystone, CoreMark is designed to force every computation to happen at run time: each computation chain starts from a value that is not available at compile time and ends with an output.

While compilers can find more efficient ways of implementing those computations, the computations cannot be done at compile time, so the actual work cannot be “optimized away”. Dhrystone, for example, was split into multiple files because the advent of inlining allowed compilers to skip the very tasks the benchmark was trying to measure. Since compilers can now analyze a program even when it is split into multiple files, this cure no longer works…
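
The difference between a chain the compiler can fold and one it cannot is easy to see in a small sketch. The functions below are hypothetical illustrations, not CoreMark source: `folded_work` uses only constants, so an optimizing compiler may reduce it to a single value, while `seeded_work` starts from a `volatile` seed (standing in for a seed supplied at run time) and therefore cannot be precomputed, although the compiler is still free to reshape the loop.

```c
#include <stdint.h>

/* Constant inputs only: a compiler that sees the whole chain (directly or
 * through inlining across files) may fold it to one constant, so no loop
 * runs at run time. */
static uint32_t folded_work(void) {
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000u; i++)
        acc += i * 3u;
    return acc;
}

/* Hypothetical seed source: the volatile qualifier stops the compiler from
 * assuming its value, much like a seed read from the command line. */
static volatile uint32_t runtime_seed = 7u;

/* The chain starts from a value unknown at compile time, so the work cannot
 * be precomputed; the compiler may still unroll or vectorize the loop. The
 * caller must consume the result (print or check it), otherwise the whole
 * chain is dead code and can be removed. */
static uint32_t seeded_work(void) {
    uint32_t seed = runtime_seed;
    uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000u; i++)
        acc += (i ^ seed) * 3u;
    return acc;
}
```

Both functions return the same value for these particular seeds; what differs is only whether the compiler is allowed to know that ahead of time.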

In fact, many modern architectures rely on the compiler to generate code that makes efficient use of hardware resources, so forbidding compiler optimization would not be a reasonable restriction. CoreMark does not attempt to force a specific number of branches or loads, only that all computations are performed at run time. Complex architectures will find different ways of doing those computations (e.g. SIMD, predicated execution and more), while simple architectures will perform the operations head-on. As long as the operations are performed, the benchmark is useful for analyzing the performance potential of the core.

June 7, 2010

CoreMark Analytic Evaluation – Interview

Filed under: Coremark, EEMBC — shay@eembc.org @ 19:02

Recently Van Smith of Canalabs submitted several scores to the CoreMark website. We asked him about his choices for run parameters…

[NOTE: To put this post into context, see the scores submitted on April 12 for the Intel Atom N450, VIA Nano L3050, AMD Mobile Athlon XP-M (Barton), and Freescale i.MX515 at http://coremark.org/benchmark/index.php?pg=benchmark]

Why did you use FORK rather than PTHREADS?

Answer: I used the same set of CoreMark compiler flags and settings that I had arrived at several months ago after experimenting on an AMD Phenom II system. These settings, however, originally came directly from the best-performing Intel system on the CoreMark website. Of course, forking is the most reliable way of getting multicore scaling across a broad range of platforms, which is my goal. Threading can lead to headaches in those situations.

*NOTE from Shay Gal-On: CoreMark actually contains three separate methods for using concurrency. All of them have been thoroughly validated, so feel free to pick any method you wish and tell us why you picked it…

Why did you choose to oversubscribe the cores with 4 threads?

Answer: I wanted to use exactly the same settings across all platforms under test. The only platform in my ARM versus x86 report to benefit from forking was the Intel Atom; if all CPUs had been single-core without HyperThreading, I would have used only one thread. As you know, threading/forking places greater pressure on caches and memory, and I wanted to keep this consistent across all systems under test.

* We like this approach, though what counts as a fair comparison? As mentioned in previous posts, CoreMark is not really focused on multiple cores. We highly recommend using EEMBC MultiBench to evaluate performance in a multicore environment.

Why did you under-clock the N450 to 1GHz when it normally runs at 1.66GHz? Did it change the core/bus ratio?

Answer: The ARM Cortex-A8 system ran at 800MHz and I wanted to have all of the platforms operate at this speed for a fair IPC performance comparison.

Unfortunately, the Atom N450 can only be downclocked to 1GHz, so that’s what I was forced to use. Only the multiplier was changed on the Atom and the VIA Nano L3050; adjusting the bus clock/system clock should always be avoided in performance benchmarking, for many, many obvious reasons.

* For processors with even a small cache, CoreMark runs entirely inside the cache and is essentially unaffected by memory performance. Memory performance is, however, a critical concern for small embedded devices, as mentioned in a previous blog post.

All in all, nice to see that industry analysts are picking up CoreMark and using it to test processors at any level. Check the news section to see the latest…

March 8, 2010

On CPU and Memory tangles

Filed under: Coremark, EEMBC — shay@eembc.org @ 16:59

Two CoreMark scores for the TI Stellaris were submitted recently. Interestingly, although the only difference between the submissions is the frequency, the CoreMark/MHz changed (1.9 at 50MHz vs. 1.6 at 80MHz; a 16% drop). Since the device does not have a cache, the ratio of CPU frequency to memory frequency may come into play, and indeed we find that the flash on the device can only scale 1:1 with the CPU frequency up to 50MHz. Above 50MHz, the memory frequency scales 1:2 with the CPU.
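
The arithmetic behind these numbers is worth spelling out. Multiplying CoreMark/MHz by the clock gives the absolute score: 1.9 × 50 = 95 and 1.6 × 80 = 128, so the score still rose by about 35% while the clock rose by 60%, and the per-MHz figure fell by (1.9 - 1.6) / 1.9 ≈ 15.8%. The sketch below encodes that arithmetic plus a simplified wait-state count; the helper names are hypothetical, and the wait-state formula is an assumed model built only on what the post states (flash at 1:1 up to 50MHz, 1:2 above).

```c
/* Absolute CoreMark score from a CoreMark/MHz figure and a clock in MHz. */
static double coremark_score(double cm_per_mhz, double mhz) {
    return cm_per_mhz * mhz;
}

/* Relative change from a to b as a fraction (negative means a drop). */
static double relative_change(double a, double b) {
    return (b - a) / a;
}

/* Simplified model (assumption): flash that keeps up 1:1 with the CPU up to
 * flash_max_mhz needs ceil(cpu_mhz / flash_max_mhz) CPU cycles per access,
 * i.e. that many minus one wait states. Both arguments must be > 0.
 * Real parts may add prefetch logic that hides some of this. */
static unsigned flash_wait_states(unsigned cpu_mhz, unsigned flash_max_mhz) {
    return (cpu_mhz + flash_max_mhz - 1u) / flash_max_mhz - 1u;
}
```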

The memory-to-CPU frequency ratio is a common limitation, and various technological solutions are available. Cache is one answer, but it is expensive in silicon area and therefore in end-product cost, which is especially critical for low-end microcontrollers. Other solutions include wide reads (e.g. NXP ARM7 parts read 128 bits at a time), which speed up execution of sequential blocks of code, or more advanced techniques such as the “Enhanced Flash Memory Accelerator” (see the NXP LPC1759 CPU).
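
Why wide reads help sequential code can be shown with a deliberately crude model. The hypothetical `stall_per_insn` below assumes every flash access costs a fixed number of wait states, that one access returns `fetch_bits` bits, and that code runs straight through with no branches, prefetch buffers or caches. Under those assumptions a 128-bit read amortizes one wait state over four 32-bit ARM instructions or eight 16-bit Thumb instructions.

```c
/* Average flash wait-state cycles per instruction for straight-line code.
 * Assumptions (a simplification, not vendor data): each flash access adds
 * `wait_states` CPU cycles, returns `fetch_bits` bits, and instructions are
 * `insn_bits` wide and executed strictly in sequence. Both widths must be
 * non-zero. */
static double stall_per_insn(unsigned wait_states,
                             unsigned fetch_bits,
                             unsigned insn_bits) {
    if (fetch_bits >= insn_bits) {
        /* One access delivers several whole instructions. */
        unsigned insns_per_fetch = fetch_bits / insn_bits;
        return (double)wait_states / (double)insns_per_fetch;
    }
    /* Instruction wider than one access: several accesses per instruction. */
    unsigned fetches_per_insn = (insn_bits + fetch_bits - 1u) / fetch_bits;
    return (double)wait_states * (double)fetches_per_insn;
}
```

Branches break the sequential assumption, which is why wide reads mostly help "serial blocks of code" and why more elaborate accelerators exist.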

In general, performance will not increase linearly with frequency if the code and/or data the program needs resides in memory that cannot scale at the same ratio. This will be true in benchmarks and in ‘real’ life. Does it matter to your application?

March 6, 2010

EEMBC Director of Software Engineering takes on sumo wrestler with EEMBC power

Filed under: Coremark, EEMBC — Markus Levy @ 09:47

Shay Gal-On always wins when it comes to wrestling with EEMBC benchmarks. But he’s no match for a sumo wrestler.

http://www.youtube.com/watch?v=njBGUzAExo4

February 11, 2010

What do YOU use to edit your code?

Filed under: EEMBC — shay@eembc.org @ 08:49

Being the main software developer for EEMBC, I spend a lot of time writing code. For big projects, an IDE such as Eclipse, Visual Studio, or Multi is essential. But what about other editing needs? When you need a small Perl script? An HTML page with some JavaScript? A quick edit to a small C file? Editing a JavaScript file that was encoded so that one line is more than 16K characters?

Every programmer needs a code editor, and once you get used to the quirks of an editor, it is hard to switch. Many still use EMACS or even VI – on some embedded platforms I use VI since it is easily available on even the most minimal Linux distribution.

For a graphical environment, though, I like the extra convenience that comes with a GUI but am not willing to give up some useful features:
- Syntax highlighting
- Regular expressions for find/replace
- Auto code layout (smart tabs, brace matching and their ilk)
- Browse/tags database support
- Performance (time to initial open, time to load/edit/search large files)
- Customization
- Macros (Define commands and key sequences)
- Bindings (the ability to customize any key combo to an editor command)

Other nice-to-have features:
- Code completion
- Column editing mode
- Tabs for documents
- Integration with external commands
- Code folding
- Integrated file explorer
- Integrated source control (SVN)
- Sessions (open up with whatever was in the editor when it was closed)
- Hex edit

Anything else is a bonus I will gladly take, but not at the cost of convenience or speed.

I count 60 editors on http://en.wikipedia.org/wiki/Comparison_of_text_editors#Programming_features. How do you choose the right one for you? I check what is available and try three new editors once a year. Sometimes I even switch…

Currently on Windows I am using Notepad++. Originally I was drawn by the fact that it had most of what I wanted and that I had access to the source to make a few modifications. Since then this editor has matured, and I no longer keep a private version compiled from the source.

Things I wish were different, though not enough to fix them in the source code myself:
- Regex for find/replace could be better (e.g. just use Perl Regex)
- Really long lines don’t display correctly (compressed JavaScript)

Other editors I have used in the past and switched from:
- SlickEdit (awesome editor, but cost issues when I switched to a new company)
- EditPlus (was not maintained for a long while; it looks like it is actively developed again, so I may need to check it out)
- UltraEdit (somehow it just did not measure up, and I ended up dropping it after a mere 3 months of use)
- XEmacs (EMACS is extremely powerful, but the human interface of the GUI version doesn’t cut it)
- PSPad (performance issues caused me to drop this one)

Which 3 editors do you think I should try in 2010?

June 8, 2009

Using Coremark on Multiple Cores

Filed under: EEMBC — shay@eembc.org @ 09:00

Although we don’t recommend it, because it will only show a linear performance increase, the CoreMark standard allows execution of multiple copies on multiple cores, and in fact already contains three common implementations (PThreads, Fork (with shared memory) and Sockets). The porting layer allows extension to other proprietary mechanisms as well. E5405 scores posted recently for 1, 2 and 4 cores show a linear speedup with the number of cores used. How come?
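
A likely reason independent copies scale linearly is that they share nothing: each copy works on private data that fits in cache, so N copies on N cores take roughly as long as one. The sketch below illustrates the fork-with-shared-memory approach in plain POSIX C; it is not CoreMark's implementation, and `independent_work` and `run_forked_copies` are hypothetical names. Each child process computes its result independently and writes it into its own slot of an anonymous shared mapping, and the parent reads results only after every child has exited.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for one benchmark copy: private state only. */
static uint32_t independent_work(uint32_t seed) {
    uint32_t acc = seed;
    for (uint32_t i = 0; i < 100000u; i++)
        acc = acc * 1664525u + 1013904223u;   /* LCG step */
    return acc;
}

/* Runs n_copies of independent_work in forked children (n_copies > 0).
 * Each child writes to its own slot of a shared anonymous mapping; the
 * parent reaps every child before reading results.
 * Returns 0 on success, -1 on any failure. */
static int run_forked_copies(unsigned n_copies, uint32_t *results) {
    if (n_copies == 0)
        return -1;
    size_t bytes = (size_t)n_copies * sizeof(uint32_t);
    uint32_t *shared = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    unsigned started = 0;
    int rc = 0;
    for (unsigned c = 0; c < n_copies; c++) {
        pid_t pid = fork();
        if (pid < 0) {            /* fork failed: stop launching copies */
            rc = -1;
            break;
        }
        if (pid == 0) {           /* child: run one copy, then exit at once */
            shared[c] = independent_work(c + 1u);
            _exit(0);
        }
        started++;
    }

    for (unsigned c = 0; c < started; c++) {   /* reap every started child */
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }

    if (rc == 0)
        for (unsigned c = 0; c < n_copies; c++)
            results[c] = shared[c];
    munmap(shared, bytes);
    return rc;
}
```

Separate processes give each copy its own address space, so the only memory the copies have in common is the result array, which each child writes exactly once.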

June 6, 2009

Concurrency Analysis on CoreMark

Filed under: Coremark, EEMBC — Markus Levy @ 12:52

Although the current run rules for CoreMark don’t allow messing with the code, it’s an interesting exercise to see how much concurrency can be extracted from this relatively sequential benchmark code. CriticalBlue took this on with its Prism tool, writing up an interesting article on the process and results. Check it out.

May 28, 2009

Quick CoreMark Rundown

Filed under: Coremark, EEMBC — shay@eembc.org @ 13:10

CoreMark includes 3 major algorithms looking at core characteristics:

1. List manipulation – pointers and data access through pointers.

2. Matrix manipulation – serial data access and potential use of instruction-level parallelism (ILP).

3. Simple state machine – FSM that exercises the branch unit in the pipeline.
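
To make the three categories concrete, here are tiny kernels in the same spirit. They are illustrative sketches only, not the CoreMark source, and all names are hypothetical: a pointer-chasing list walk and in-place reversal, a small integer matrix multiply whose multiply-adds for different output elements are independent, and a character-driven state machine that decides whether a string is an optionally signed integer.

```c
#include <stddef.h>
#include <stdint.h>

/* 1. List manipulation: data reached only through pointers. */
struct node {
    struct node *next;
    int16_t value;
};

/* Sum the list; each load address depends on the previous load. */
static int32_t list_sum(const struct node *head) {
    int32_t sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

/* Reverse the list in place and return the new head. */
static struct node *list_reverse(struct node *head) {
    struct node *prev = NULL;
    while (head != NULL) {
        struct node *next = head->next;
        head->next = prev;
        prev = head;
        head = next;
    }
    return prev;
}

/* 2. Matrix manipulation: serial access to row-major n x n arrays; the
 * multiply-adds for different (i, j) are independent (room for ILP). */
static void matrix_mul(size_t n, const int16_t *a, const int16_t *b,
                       int32_t *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int32_t acc = 0;
            for (size_t k = 0; k < n; k++)
                acc += (int32_t)a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}

/* 3. State machine: a data-dependent branch decision per character. */
enum scan_state { S_START, S_SIGN, S_INT, S_INVALID };

/* Accepts an optionally signed decimal integer such as "-123". Returns
 * S_INT for a complete integer; any other state means not accepted. */
static enum scan_state scan_integer(const char *s) {
    enum scan_state st = S_START;
    for (; *s != '\0' && st != S_INVALID; s++) {
        int digit = (*s >= '0' && *s <= '9');
        switch (st) {
        case S_START:
            st = digit ? S_INT : (*s == '+' || *s == '-') ? S_SIGN : S_INVALID;
            break;
        case S_SIGN:
        case S_INT:
            st = digit ? S_INT : S_INVALID;
            break;
        case S_INVALID:
            break;
        }
    }
    return st;
}
```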

The source code for each of those is in a single file. The other files take care of setting up the data for the benchmark, measuring the time, and spitting out a report with the results.

The platform-specific code is in a special folder, and there are several sample ports for self-hosted platforms using gcc – linux, linux64, cygwin and simple.
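
A port mainly has to tell the benchmark how to measure time. The sketch below shows what minimal timing hooks for a self-hosted POSIX target could look like; the `port_*` names are hypothetical stand-ins, so check the port files in the distribution for the actual hook names and signatures. Since the CoreMark score is reported in iterations per second, the last helper turns an iteration count and an elapsed time into that figure.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime on strict compilers */
#endif
#include <time.h>

/* Hypothetical porting-layer timer state. */
static struct timespec port_t0, port_t1;

/* Record the start and stop of the timed region with a monotonic clock. */
static void port_start_time(void) { clock_gettime(CLOCK_MONOTONIC, &port_t0); }
static void port_stop_time(void)  { clock_gettime(CLOCK_MONOTONIC, &port_t1); }

/* Seconds elapsed between the last start and stop calls. */
static double port_elapsed_secs(void) {
    return (double)(port_t1.tv_sec - port_t0.tv_sec)
         + (double)(port_t1.tv_nsec - port_t0.tv_nsec) / 1e9;
}

/* Iterations per second; returns 0 if no measurable time elapsed. */
static double port_iterations_per_sec(unsigned long iterations, double secs) {
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

On a bare-metal target the same hooks would typically read a hardware timer instead of calling clock_gettime.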

Check the documentation for more info, and check back next week for some interesting analysis…

Free is good and you also get what you pay for.

Filed under: Coremark, EEMBC — Markus Levy @ 12:32

Since 1997, EEMBC’s goal has been to provide a service to all constituents of the embedded world (including processor and compiler vendors and system developers). In our continuing effort to provide this service, we bring you the EEMBC CoreMark. Our original goal with CoreMark was to replace the once-useful-but-now-antiquated Dhrystone benchmark, and therefore, we have set it up so you can download CoreMark for free. In our initial testing, we have found that CoreMark provides a good starting point for analyzing a processor’s performance.
You wouldn’t judge a book by its cover, so don’t judge a processor [only] by its core. So, although CoreMark is free, you’ll definitely want to use EEMBC’s application benchmarks, which you can easily license for a fee.
