.. include:: <s5defs.txt>

====================
Linaro, GCC, and ARM
====================

:Author: Michael Hope <michael.hope@linaro.org>

.. image:: images/logo.png
   :align: center

Linaro
======
Not-for-profit company

ARM, Freescale, IBM, Samsung, ST-Ericsson, TI

Kick-started using Canonical's methods and CodeSourcery's talent

Engineering, engineering, engineering

.. class:: handout
   
   Linaro is about nine months old.  Things started up pretty quickly and we're up to
   around 100 people.  The next six months are where we have to prove ourselves.  Talk
   to me later if you'd like to know more about how things work and how the resources come
   in.

   We're not a foundation or committee, nor do we make a distribution.  It's all about the
   engineering, especially on common things that shouldn't be duplicated across the
   member companies like a good toolchain, bootloader, kernel, and multimedia approach.
   We cover many things but I'd rather point you at the wiki (the top
   website is more company-ish).

.. class:: incremental

   .. class:: center

      https://wiki.linaro.org/

Linaro
======
**What's next**: Cortex-A9, NEON, SMP

.. class:: incremental

   .. image:: images/orion.png

   .. image:: images/omap4.png

   .. image:: images/freescale.png

.. class:: handout

Linaro is about what's next, especially Cortex-A9, NEON, and SMP: getting ahead on the
tools so they're ready when you start a new product.  This means that we work well with
and bring out the best in...

Us
==
.. list-table::
   :class: borderless

   * - Kiko

       .. image:: images/kiko-flag.png

     - Michael Hope

       .. image:: images/michaelh-flag.png

.. class:: handout

Here at LCA we have Kiko, our Engineering VP, and myself, the toolchain lead.  Paul McKenney is
here with his IBM/Linaro kernel hat on as well.  Ask us anything.

Us
==
Around the world

.. image:: images/toolchain-map.png

Check out https://wiki.linaro.org/EngineeringTeam for mugshots

.. class:: handout

My group of twelve is spread around the world from China to Israel.  I'm the only southern
hemispherian so far.  Have a look at the engineering team page on the wiki if you'd like to
see who else is involved.

Why?
====
Better ARM toolchain

.. class:: handout

ARM themselves decided about a year ago to improve the GNU tools and bring them closer to
the commercial products, and that fits well with what we do.  The split at the moment is
with ARM adding support for things like the Cortex-A15 that are not-yet-released or
things that are considered too sensitive like pipeline descriptions, and we work on the
published architecture features like Thumb-2 and NEON.  It's a perfectly good split.

Done in the open

.. class:: handout

All of the Linaro work is done in the open.  Others are welcome to get involved if they
want, from the simplest bit of picking up and using the toolchain through to planning and
working on the implementation.  I'm interested in how we can encourage this and what has
worked for you guys before - I've been pleasantly surprised by people picking up the
compiler and making it an option in OpenEmbedded, Yocto, and OpenBricks, with a chance
that MeeGo and Gentoo will go the same way.  I've also had good suggestions from people who came
along to the planning summit, like on how NEON intrinsics can outperform hand-written code
or how libunwind might solve the ARM backtracing woes.  How can I keep that flow going?

Making new work available now

.. class:: handout

An issue with GCC is the long development cycle.  If you time things just wrong and make an
improvement in the 4.x stage 3, then the change won't be available until 4.y is released
up to 18 months later.

Approach
========
Performance:

* Don't want to be 'on balance, the best choice'
* Want to be the best on performance, and neutral on other areas
* Want to be 'The toolchain for ARM'

.. class:: handout

   It's good to have goals, even if they seem over the top.  Rather than try to be all things
   to all men, I've said we'll focus on time-based performance and be neutral on correctness.  There are
   a few caveats - we're monitoring for significant regressions in size, correctness on
   pre-v7 ARM, and correctness on x86 and x86_64.  This hasn't been a burden so far.  Note
   that we also support x86 and x86_64 as it helps with single sourcing and getting
   patches upstream.

   The hope is that we'll be the toolchain people use when working on ARM.  I'm very
   interested in what you guys look for in a toolchain and how we fit.  We're not set up
   to support end users directly, or for supplying binaries for every combination of
   architecture and target, or for providing long term support but we do have plans for
   many of these.

   I've skipped all the engineering approach parts like CI and test practices.  Let me
   know if you'd like to know more.

A benchmark...
==============
.. container:: animation

   .. image:: images/pybench-1.png
      :class: hidden slide-display

   .. class:: incremental hidden slide-display

      .. image:: images/pybench-2.png
      .. image:: images/pybench-3.png
      .. image:: images/pybench-4.png
      .. image:: images/pybench-5.png

   .. image:: images/pybench-6.png
      :class: incremental

.. class:: handout

   We're doing OK so far.  Here's a cherry-picked example of the Python 2.7 benchmark
   suite across a range of compilers and running on a Cortex-A8.  These are normalised
   against the 4.4 -O2 you get by default out of Python.

   4.4 at -O3 gains a couple of percent.  The other 4.4 based compilers show a worthwhile
   gain.  Our 4.4 is in the same league.

   Upstream 4.5 is a good step up over upstream 4.4, but the latest CodeSourcery 4.5 based
   compiler and Linaro release do even better.  There's a 15% gain to be had by
   switching from a 4.4 based compiler to a -O3, 4.5 based Linaro compiler.  EEMBC
   DENBench shows similar results.  CoreMark not so much, but it's a deeply embedded
   benchmark and we'll fix that up.  I'm interested in what you consider a valid benchmark
   for an ARM device.
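The normalisation used for the chart is not spelled out in these notes.  A minimal
sketch, assuming it is a plain ratio of run times against the baseline compiler, could
look like this (``relative_speed`` is a made-up name for illustration):

```c
/* Speed of a candidate build relative to a baseline build.
 * Assumption: both arguments are positive wall-clock times in seconds
 * for the same benchmark.  1.0 means equal; above 1.0 means the
 * candidate finished sooner than the baseline. */
double relative_speed(double baseline_seconds, double candidate_seconds)
{
    return baseline_seconds / candidate_seconds;
}
```

With this definition the baseline scores exactly 1.0, and a run that takes 8 seconds
against a 10 second baseline scores 1.25.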

How?
====
Mainly upstream

.. class:: handout

We're a performance branch of upstream GCC.  The best way of handling this is doing the
work upstream using the normal workflow, getting it accepted, and then backporting into
the consolidation branch.

Project is at http://launchpad.net/gcc-linaro

.. class:: handout

The branch is hosted on Launchpad, which gives us the normal features such as code hosting,
bug tracking, releases, and planning.  There are quite a few nice features like a DVCS
with good merge tracking, an SVN to bzr automatic import, an easy feature branch/merge
request workflow, and ties between the bits like a bug, the merge that fixes it, and the
release it comes out in.

with links to the others at https://launchpad.net/linaro-toolchain

.. class:: handout

There's a problem here with good tools vs commonly used tools.  Launchpad is a good
solution, but it can look like a fork instead of a branch and has a different flow to what
other GCC developers are used to.  I'm interested in thoughts on how this has been handled in
other projects, such as the easier cases when you've got one project with some in VCS X
and some in Y (where Y is normally git).

Outputs
=======
Consolidation branches

.. class:: handout

These need a better name, and Kiko's working on that, but these are the branches that we
backport to and release from.  There's a current and a previous branch, both supported,
with the current one being 4.5 and the previous being 4.4.  When we bring out a Linaro GCC
4.6, support for 4.4 will end and 4.5 will become the previous branch.  I'm interested: if you
pick up a compiler, how long do you want it supported for?  There's pent-up demand for 4.5, so we
may want to support it for longer.

Binary builds

.. class:: handout

Picking up the toolchain is easier if there are binary builds.  Install Ubuntu Maverick
today and you can run an 'apt-get install gcc-arm-linux-gnueabi'
and end up with a Linux-targeting cross compiler.  We're working on backports to the
Ubuntu LTS, Lucid, as well as backports as new versions come out.  I'm keen on a straight
binary tarball that will extract anywhere and run on anything.  You still need to sort the
target issues out but otherwise it reduces the need to roll your own.

Others?  What's useful?  Kernel?  Distros?  As a patch?  Windows?  IDE integration?

Methods
=======
.. class:: handout

We've already hit the point of diminishing returns.  GCC 4.5 is a good step up on 4.4,
which is a really good step up on 4.3.  Thanks to the Full Employment Theorem though
there's always work to do.  Some of the gains will come from starting to use architectural
features, but most will be benchmark driven.

Top-down: architectural features

.. class:: handout

  On the architectural features side, there are bits of the ARM architecture that we know
  aren't currently supported and should be.
  NEON is one - it's in there but it's a bit lowest-common-denominator.  Half of the
  improvements should come through the vectoriser, by taking its current PowerPC focus
  and adding support for the unique bits of ARM, and by taking the ARM backend and using every
  feature the vectoriser has.

  * Conditional execution
  * Core register set SIMD in ARMv6
  * Saturated operations in ARMv5 (minor)
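To make the last item concrete, here is a minimal sketch of what a saturating add looks
like when written in portable C.  ARMv5TE has a ``QADD`` instruction with these semantics;
whether a given GCC release turns this pattern into ``QADD`` is not something these notes
claim.  The function name ``sat_add32`` is made up for illustration.

```c
#include <stdint.h>

/* Saturating signed 32-bit add: the result is clamped to the
 * representable range instead of wrapping on overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    /* Do the sum in 64 bits so the true result is always representable. */
    int64_t wide = (int64_t)a + (int64_t)b;

    if (wide > INT32_MAX)
        return INT32_MAX;
    if (wide < INT32_MIN)
        return INT32_MIN;
    return (int32_t)wide;
}
```

The compare-and-clamp sequence is the work that a single saturating instruction would
replace.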

Bottom-up: benchmarking and regressions

.. class:: handout

   A tried, true, and productive method is picking good benchmarks, running them across a
   range of compilers, and seeing how you compare.  The biggest regression is then a good
   place to start and analyse.  This includes looking across historic revisions of the
   same compiler - there could be a version during the 4.5 development that did really
   well on benchmark X that got wiped out later in the cycle.  CoreMark is a good
   example - the best score at the moment comes from the 4.4 series compiler, as the 4.5
   doesn't unroll a particular bitwise CRC function.  The next step there is to poke the
   unroller and the passes before it to see why.
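For readers who haven't seen one, a bitwise CRC has roughly the shape below.  This is
an illustrative sketch, not the benchmark's own source: it implements the common CRC-16
with reflected polynomial 0xA001 and a zero initial value, and the name ``crc16_bitwise``
is made up.  The inner loop always runs eight times, which is what makes it a natural
candidate for unrolling.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-16 (reflected polynomial 0xA001, initial value 0). */
uint16_t crc16_bitwise(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        /* Fixed trip count of eight: one iteration per input bit. */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return crc;
}
```

Comparing the generated assembly for a function like this across compiler versions is
one way to see whether the inner loop was unrolled.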

Historical runs across everything

.. class:: handout

   It takes about three hours to do a non-bootstrapped build, or 24 hours for a more
   complete build and test run.  I've got a nice Gumstix Stagecoach build that holds seven
   OMAP3/NFS root boards and these chomp through the runs.  I recommend it.  Once we've
   done these historical builds a bit of automation should slice and dice the data to pick
   out the regressions.
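The slicing and dicing could start with something as simple as the sketch below, which
scans one benchmark's scores in revision order and reports the largest drop between
neighbouring revisions.  The function name, the data layout, and the assumption that
scores are positive with higher meaning better are all hypothetical; the real automation
is not described in these notes.

```c
#include <stddef.h>

/* Find the revision with the largest relative score drop versus the
 * revision just before it.
 *
 * scores:   one positive score per revision, oldest first, higher is better
 * count:    number of entries in scores
 * drop_out: if non-NULL, receives the fractional drop (0.10 means 10%)
 *
 * Returns the index of the regressing revision, or 0 if no revision
 * scored lower than its predecessor. */
size_t worst_regression(const double *scores, size_t count, double *drop_out)
{
    size_t worst = 0;
    double worst_drop = 0.0;

    for (size_t i = 1; i < count; i++) {
        double drop = (scores[i - 1] - scores[i]) / scores[i - 1];
        if (drop > worst_drop) {
            worst_drop = drop;
            worst = i;
        }
    }
    if (drop_out)
        *drop_out = worst_drop;
    return worst;
}
```

Running this once per benchmark and sorting by the reported drop gives a first list of
places to start analysing.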

   It's important to pick the right benchmarks to represent the profiles and workloads you
   care about.  We're focusing on handheld workloads like media and web browsing so EEMBC
   comes in heavily, along with CoreMark and SPEC as you can't avoid them.  I'm interested in
   what you use and what matches your workloads.

Asking!

.. class:: handout

   If you've written a hand-written assembly routine, why, and what speed-up did you get?
   Are there particular areas where GCC on ARM has regressed or underperforms?
   Side note:  we're fixing the inefficiencies in the GCC NEON intrinsics.  I recommend
   using these as one step up from assembly in the future - there are anecdotal reports that
   the intrinsics run faster as the compiler kicks in and does the scheduling and register
   allocation better than when done by hand.
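As a rough illustration of the intrinsics style, here is a small sketch that adds two
float arrays four lanes at a time.  The intrinsics ``vld1q_f32``, ``vaddq_f32``, and
``vst1q_f32`` come from ``arm_neon.h``; the function name ``add_f32`` and the scalar
fallback for non-NEON builds are additions for this example.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* dst[i] = a[i] + b[i] for n floats. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four floats per iteration; the compiler picks the registers
     * and schedules the loads, add, and store. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif

    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Unlike hand-written assembly, nothing here names a register or fixes an instruction
order, which leaves that work to the compiler.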

Being part of the community
===========================
Test run results

Reviewing patches

Helping 4.6 get out the door

.. class:: handout

   We want to be part of the GCC community.  This is a funny section to write, as some of
   the words like 'upstream' that I've used in this presentation imply a separation that
   isn't or shouldn't be there.  The goal is to work inside GCC and have a better ARM
   compiler long term.  Until now we've been in startup, 'get things done' mode so there's
   work to be done in terms of landing patches.  Past that, we're helping out by sending build
   farm test run results to gcc-testresults, reviewing patches, and working on the P1 bugs
   and other things to help GCC 4.6 get out the door.  Note that all of our work must go
   upstream - we won't carry any patches long term.  At any time Linaro GCC should be
   equivalent to a later FSF GCC version, just available a bit earlier.

   So for the other GCC people in the house, what areas could use some more manpower and
   how can we get involved?

Other bits
==========
.. class:: handout

Toolchain is more than GCC.  I've focused on GCC as it's a detailed, interesting area.
We're also doing significant work on other tools such as

GDB

.. class:: handout

GDB is a primary project like GCC and has the whole branch, monthly releases, and support
side.  It's more amenable to upstream work than GCC, but then again so is everything, so
the straight upstream GDB 7.2 is pretty good and 7.3 will be better.  The work is bringing
GDB up to scratch on ARM through supporting all of the basic features like NEON register
visibility, all registers in coredumps, backtracing through signal calls, hardware watchpoints,
prologue parsing, backtracing using the ARM specific unwind tables, etc.

QEMU

.. class:: handout

This was just recently promoted to a primary project, so expect to see usable monthly
releases coming out soon.  The work here is strengthening upstream and trying to reduce
the fragmentation in qemu-maemo, qemu-samsung, and perhaps Google's, so that upstream QEMU
has good, solid A9 support.  We'll be making releases along the way.

valgrind

.. class:: handout

The upstream 3.6.0 is now in a decent state on Thumb-2.  memcheck, cachegrind, and massif
work properly.  The other, lesser used tools don't due to the general backtracing on ARM
problem.  We have a plan here; ask me later if you're interested.

cortex-strings

.. class:: handout

We've got a package that is currently a collection of the fastest string routines from
around the web.  The plan is to get these plus a bunch of new work we've been doing out
into all the common C libraries so that everyone can have a nice, fast memcpy() and
strlen().  Perhaps even into the kernel?

ltrace

.. class:: handout

ltrace has a weak upstream, so getting things landed is tricky, but otherwise you can now
trace library calls on a Thumb-2 system.

Finding out more
================
Wiki!  https://wiki.linaro.org/

IRC

Mailing lists

Meetings, minutes, and recordings

Flyer

.. class:: handout

   We're #linaro on Freenode.  There's good 24 hour coverage on there and it's the best
   place to pop in and ask a question.  There are linaro-dev and linaro-toolchain email
   lists on lists.linaro.org.  There are two meetings a week, the main Monday one at 0900 UTC
   and the quick Wednesday standup at 1800 UTC.  The main one is probably best if you're
   in from Australia - the Wednesday one is a bit ungodly.  The meetings are minuted,
   recorded, and pushed up into a podcast which I know one person has listened to once.
   Which is better than I expected.  The wiki is your link into all of this - start at the
   front, find the toolchain group, and you'll find links off to the releases, meeting
   details, and everything else.

Future
======
GCC 4.6

Developer tools investigation

Toolchain Working Group, not GCC Group
 * LLVM?
 * OpenCL?

.. class:: handout

   I'm focused on the short term.  We will have a 4.6 shortly after upstream releases
   theirs.  I do want to think about what we'll be doing in two years' time as well - we're
   GCC focused at the moment as that's the best in class, but other efforts like LLVM and
   OpenCL are contenders.  I'm interested in where else we should look.  Is there a tool on
   Intel that you wish you had on ARM?  What fancy features are you hacking into a
   product because there's no good, general purpose support?

.. footer:: Michael Hope / LCA 2011 / Non-confidential