Quick start
===========
This package contains some loop-based microbenchmarks. Each benchmark
contains code to verify that the results from the optimised loop are the
same as from an unoptimised one.
Cross environment with root access to target board
--------------------------------------------------
By default, the package is configured for a cross-testing environment.
The build runs on a workstation and the tests run on a target board that
has ssh access. The user is assumed to have root access to the target
board. If this matches your configuration, you can run the testsuite
as follows:
./run CPU=<cpu> BOARD=<board-hostname> > results.txt
E.g.:
./run CPU=cortex-a8 BOARD=beagle-04 > results.txt
This will run all tests at maximum scheduling priority and store
the results in a file called "results.txt".
Cross environment without root access to target board
-----------------------------------------------------
If you are using a cross environment but do not have root access
to the target board, you can instead use:
./run CPU=<cpu> BOARD=<board-hostname> RUNFLAGS= > results.txt
This allows you to log into the target board as your normal user,
but it means that the test will run at normal scheduling priority.
Native environment
------------------
If you are using a native environment, you can instead use:
./run CPU=<cpu> CC=gcc RUN= > results.txt
Or, if you have sudo access, you might prefer:
./run CPU=<cpu> CC=gcc RUN="sudo nice -20" > results.txt
Shortcuts
---------
Rather than specify things like CPU= and BOARD= each time,
you can put them into a file called "local.mk". E.g.:
CPU=cortex-a8
BOARD=beagle-04
Recompiling with new flags
--------------------------
If you want to rebuild and rerun the tests, use "make clean" and then
repeat the "./run" command.
You can use COPT= to override the default optimisation flags (see
"Makefile" for the current set). You can also use EXTRA_COPT= to
add additional flags. For example:
./run EXTRA_COPT=-fno-auto-inc-dec > new-results.txt
Again, COPT= and EXTRA_COPT= can be specified in "local.mk".
"./run" is really just a small wrapper around "make", and passes
all its arguments directly to "make". See "Makefile" for the
available rules.
Comparing results
-----------------
To compare the results between two runs, use:
./compare <first-results> <second-results>
where both files contain the redirected output from "./run" (such as
"results.txt" and "new-results.txt" in the examples above).
Adding new tests
================
Each test needs several things:
- a function to initialise the arrays
- a function to preload the arrays[*]
- two copies of the loop, one optimised and one unoptimised
- a function to check that the two copies produce the same results
[*] this is necessary to get consistent results on CPUs that don't
perform write allocation
In order to cut down on the amount of cut-&-paste, each loop is
specified in a text file that is then used to automatically
generate the required code.
The first step in adding a new loop is to create a new .txt
file in spec/. This file will be automatically picked up by
the build system.
The format of the text file is a series of "key value" pairs,
where the value may be a single word or text enclosed in braces.
The required keys are:
- count
A constant that controls the number of iterations in
the innermost loop. This value is available in C
as the COUNT macro.
- repeat
The number of times that the innermost loop should be
executed relative to the other tests. The idea is that
this value can be adjusted to make the tests take roughly
the same amount of time (or at least, to bring their
execution times into the same order of magnitude).
- loop
The loop code itself
Optional keys are:
- arrays
The list of arrays that the loop operates on. Each entry has
the form:
TYPE NAME[DIM1][DIM2]...;
See the "init_values" definition near the start of
"scripts/generate.tcl" for the recognised types.
You can also add more types to the list if you need them.
The benchmark will automatically initialise these arrays with
random data, and will ensure that the arrays are left in the
same state by the optimised and unoptimised versions of the loop.
The arrays are presented to the loop as pointer variables; e.g.:
uint8_t foo[10][16];
will be presented to the loop as:
uint8_t (*foo)[16];
By default, the pointer points to the start of the array,
i.e. to element 0 of the outermost array. You can specify
a different element using:
TYPE NAME[DIM1][DIM2]... : START;
- inputs
A list of scalar parameters to the loop. Each entry has the form:
TYPE NAME = VALUE;
VALUE is not exposed to the main loop code. Instead, NAME
behaves like a function parameter whose value is completely
unknown to the compiler when it optimises the loop.
- outputs
A list of scalar outputs from the loop. Each entry has the form:
TYPE NAME = VALUE;
VALUE is the value that should be used to initialise NAME
before the loop. (Unlike the inputs described above, this value
is exposed to the compiler when optimising the loop.)
The benchmark will check that NAME has the same value after
the optimised loop as it does after the unoptimised loop.
- decls
Other declarations that are needed by the main loop, such as
inline functions.
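Putting these keys together, a hypothetical spec file might look like
the following. The key names come from the list above, but the exact
layout and brace conventions should be checked against an existing
file in spec/:

    count 256
    repeat 16
    arrays {
        uint8_t src[COUNT];
        uint8_t dst[COUNT];
    }
    inputs {
        int scale = 3;
    }
    loop {
        for (int i = 0; i < COUNT; i++)
            dst[i] = src[i] * scale;
    }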