Marks for MultiBench - continued
is 10 times the geometric mean of the iterations per second achieved with the best configuration for each workload. (Note: 10 is a multiplication factor)
The workloads for this mark are:
- rgbcmyk-5x12M1worker
- ipres-72M1worker
- ippktcheck-64M-1worker
- md5-32M1worker
- rotate-16x4Ms1w1
- rotate-16x4Ms32w1
- rotate-16x4Ms4w1
- 64M-x264-1worker
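The calculation described above (take each workload's best configuration, then multiply the geometric mean of those best results by 10) can be sketched as follows. This is a minimal illustration, not the benchmark's own scoring code: the function names, the configuration labels, and the throughput numbers are made up for the example.

```python
import math

MARK_MULTIPLIER = 10  # the multiplication factor stated in the text


def best_throughput(results):
    """Return the highest iterations/second across one workload's configurations."""
    return max(results.values())


def mark_factor(per_workload_results):
    """Compute 10 x the geometric mean of each workload's best iterations/second."""
    best = [best_throughput(r) for r in per_workload_results.values()]
    if not best or any(v <= 0 for v in best):
        raise ValueError("need at least one workload, all with positive throughput")
    # Geometric mean via the mean of logarithms, which avoids overflow on long lists.
    log_mean = sum(math.log(v) for v in best) / len(best)
    return MARK_MULTIPLIER * math.exp(log_mean)


# Made-up iterations/second, for illustration only (not measured results).
# "config-a" and "config-b" are hypothetical configuration labels.
example = {
    "md5-32M1worker": {"config-a": 2.0, "config-b": 4.0},
    "ipres-72M1worker": {"config-a": 1.0, "config-b": 0.5},
}

print(round(mark_factor(example), 3))
```

Because a geometric mean is used, a workload with a very high iteration rate cannot dominate the mark the way it would with an arithmetic mean.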
MultiWorkerMark – This mark consolidates the best throughput of workloads with only one work item that uses multiple workers. The throughput factor is 10 times the geometric mean of the iterations per second achieved with the best configuration for each workload. (Note: 10 is a multiplication factor)
The workloads for this mark are:
- rgbcmyk-5x12M2workers
- rgbcmyk-5x12M4workers
- rgbcmyk-5x12M8workers
- ipres-72M2worker
- rotate-color-4M-90deg
- md5-32M2worker
- md5-32M4worker
- rotate-34kX512-90deg
- rotate-16x4Ms1w2
- rotate-16x4Ms1w4
- rotate-16x4Ms1w8
- rotate-16x4Ms32w2
- rotate-16x4Ms32w4
- rotate-16x4Ms32w8
- rotate-16x4Ms4w2
- rotate-16x4Ms4w4
- rotate-16x4Ms4w8
- 64M-x264-2workers
- 64M-x264-4workers
- 64M-x264-8workers
MultiItemMark – Perhaps the most telling mark, it consolidates the best throughput of workloads with multiple different work items. These workloads are closest to workloads on actual systems. The throughput factor is 10 times the geometric mean of the iterations per second achieved with the best configuration for each workload. (Note: 10 is a multiplication factor)
The workloads for this mark are:
- 64M-check-reassembly-tcp
- 64M-check-reassembly-tcp-cmyk
- 64M-check-reassembly-tcp-h264
- 64M-check-reassembly
- 64M-cmykw2-rotatew2
- 64M-rotatew2
- 64M-cmykw2
- 64M-tcp-mixed
- 64M-x264-2workers
- ipres-72M1worker
- ippktcheck-64M-1Worker
Sample Scores on Simulated 16-Core Platform
| Mark | Performance Factor | Scale Factor |
| --- | --- | --- |
| SingleWorkerMark | 10.9 | 8.9 |
| MultiWorkerMark | 10.5 | 4.7 |
| MultiItemMark | 4.5 | 8.8 |
Sample scores on a simulated platform with 16 cores show some interesting information even without diving into the details of specific workloads. For example, the SingleWorkerMark scale factor is 8.9, rather than a number closer to 16, which you might expect. This factor strongly hints that the system can use only a little over half of the computing resources available to it on any particular problem (8.9 / 16 ≈ 0.56). This may be related to memory bottlenecks, to the synchronization efficiency of the platform and operating system, or to both. A more detailed examination of the individual results will yield more answers and highlight trends that a single-number representation hides.
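The "about one half" reading can be checked by dividing each scale factor from the 16-core sample scores by the core count. The sketch below assumes that ideal scaling would give a scale factor equal to the number of cores, which the text implies but does not state as a definition; `parallel_efficiency` is a hypothetical helper name.

```python
CORES = 16  # the simulated platform described in the text

# Scale factors taken from the 16-core sample scores.
sample_scale_factors = {
    "SingleWorkerMark": 8.9,
    "MultiWorkerMark": 4.7,
    "MultiItemMark": 8.8,
}


def parallel_efficiency(scale_factor, cores):
    """Fraction of ideal scaling, assuming the ideal scale factor equals the core count."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return scale_factor / cores


for mark, scale in sample_scale_factors.items():
    print(f"{mark}: {parallel_efficiency(scale, CORES):.0%} of ideal scaling")
```

Under that assumption, SingleWorkerMark and MultiItemMark both land a little above one half, while MultiWorkerMark is well below it.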
In the table below, we compare the results from two dual-core platforms. The performance factors differ significantly between them. While both platforms use Linux and GCC, and both have the same core frequency, they are based on different architectures and have different memory hierarchies. Platform #1 has a shared L2 cache, while Platform #2 has separate L2 caches.
From a basic analysis of the results, we see that Platform #1's performance factor is roughly 37% higher on SingleWorkerMark (33.9 vs. 24.7) and 80% higher on MultiWorkerMark (24.8 vs. 13.8). A more detailed examination of the actual scores, as well as of the underlying architectures, can reveal some of the reasons, even without sophisticated performance monitoring tools.
Without affinity, the operating system may migrate a process between cores. On Platform #2, whose L2 caches are separate, a migrated process leaves its cached data behind and must refill the new core's L2, so migration likely costs more there than on Platform #1, where both cores share one L2. This is consistent with the roughly 37% higher SingleWorkerMark performance factor for Platform #1. The architectural differences show even more clearly with data decomposition, that is, when the data is split between multiple workers. In this case, the synchronization overhead inherent in the architecture of Platform #2, together with coherency traffic between its caches, results in a MultiWorkerMark performance factor that is 80% higher for Platform #1.
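To make the affinity point concrete, here is a minimal sketch of pinning a process to one core on Linux so the scheduler cannot migrate it. It is not how the benchmark itself sets affinity; it only illustrates the mechanism. `os.sched_setaffinity` and `os.sched_getaffinity` are standard-library calls available on Linux, and `pin_to_cpu` is a hypothetical helper name.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict a process (0 means the caller) to one CPU and return its new affinity set.

    Returns None on systems where Python does not expose the affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        # Pick a CPU this process is already allowed to run on.
        first_cpu = min(os.sched_getaffinity(0))
        print(pin_to_cpu(first_cpu))
    else:
        print("CPU affinity calls are not available on this system")
```

Once pinned, the process keeps running on the same core, so its working set stays in that core's cache instead of being rebuilt after each migration.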
Sample Scores From Two Dual-Core Platforms Running at 2 GHz:
| Platform | Mark | Performance Factor | Scale Factor |
| --- | --- | --- | --- |
| Platform #1 | SingleWorkerMark | 33.9 | 1.9 |
| Platform #1 | MultiWorkerMark | 24.8 | 1.9 |
| Platform #1 | MultiItemMark | 11.8 | 1.7 |
| Platform #2 | SingleWorkerMark | 24.7 | 1.8 |
| Platform #2 | MultiWorkerMark | 13.8 | 1.7 |
| Platform #2 | MultiItemMark | 7.2 | 1.5 |
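The platform comparison can be recomputed directly from the performance factors in the table above. The sketch below is only a convenience for checking the ratios; `percent_advantage` is a hypothetical helper name.

```python
# Performance factors copied from the two dual-core platform results.
platform_1 = {"SingleWorkerMark": 33.9, "MultiWorkerMark": 24.8, "MultiItemMark": 11.8}
platform_2 = {"SingleWorkerMark": 24.7, "MultiWorkerMark": 13.8, "MultiItemMark": 7.2}


def percent_advantage(a, b):
    """How much higher a is than b, expressed in percent."""
    if b <= 0:
        raise ValueError("baseline must be positive")
    return (a / b - 1.0) * 100.0


advantages = {
    mark: percent_advantage(platform_1[mark], platform_2[mark]) for mark in platform_1
}

for mark, pct in advantages.items():
    print(f"{mark}: Platform #1 performance factor is {pct:.0f}% higher")
```

The gap widens from SingleWorkerMark to MultiWorkerMark, which matches the observation that splitting data between workers exposes the cost of separate caches on Platform #2.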