~ols-jenkaas-admins/ols-jenkaas/trunk

75 by Vincent Ladeuil
One more pass on the doc.
Inspiration
===========

"Six rules for setting up continuous integration systems"
https://rhonabwy.com/2016/01/31/six-rules-for-setting-up-continuous-integration-systems/
captures a lot of the motivations for what is implemented here. Below are a
few notes and ideas that fit in.


Rule #1: Keep all the logic under version control
=================================================

The only way to get a fully reproducible CI is to make its deployment fully
automated from version-controlled sources.

Rule #2: Leave no test failure unnoticed and unfixed.
=====================================================

Also, sleep() calls and timeouts should be closely monitored, limited to the
bare minimum and carefully tuned.
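One common way to keep sleeps short and timeouts explicit is to poll for a
condition against a deadline instead of sleeping for a fixed duration. The
helper below is a minimal sketch of that idea, not code from this project:
the name ``wait_until`` and its parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout, interval=0.05,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or `timeout` seconds have elapsed.

    Returns the time spent waiting, so callers can record it and tune the
    timeout. Raises TimeoutError once the deadline has passed.
    `clock` and `sleep` are injectable so tests need not really wait.
    """
    start = clock()
    deadline = start + timeout
    while True:
        if predicate():
            return clock() - start
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("condition not met after %.2fs" % timeout)
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Returning the elapsed time makes it cheap to log how long each wait really
took, which is what allows a timeout to be tuned rather than guessed.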

Rule #3: Fast gates directly impact velocity and productivity
=============================================================

Making gates faster is /not/ achieved by running fewer tests but by running
them concurrently.

The longest test is the ultimate barrier, not the sum of all test run times.

Only the slowest test needs to be optimized and its workflow
streamlined. Most of the time, it comes down to provisioning the right
testbed.
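The arithmetic behind this rule can be sketched with a small scheduling
estimate. The function below is an illustration only: it assumes independent
tests, identical workers and a simple longest-first assignment, and the
durations are made-up numbers, not measurements.

```python
import heapq


def gate_duration(test_durations, workers):
    """Estimate the wall-clock time of a gate run on `workers` workers.

    Tests are handed out longest first, each to the least loaded worker.
    """
    if workers < 1:
        raise ValueError("at least one worker is needed")
    loads = [0] * workers
    heapq.heapify(loads)
    for duration in sorted(test_durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Hypothetical test durations, in seconds.
durations = [120, 45, 30, 30, 15]
serial = gate_duration(durations, 1)               # sum of all run times
two_workers = gate_duration(durations, 2)
one_per_test = gate_duration(durations, len(durations))
```

With these numbers the serial run takes 240 seconds, while two workers
already bring the gate down to 120 seconds, the duration of the longest
test. Adding more workers changes nothing after that: only making the
120-second test faster does.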

Rule #4: Everything in CI can be reproduced locally
===================================================

CI itself should never fail.

That is, a developer should never be blocked by a bug in CI, nor should the
deployment pipeline.

That means every failure that can't be reproduced locally (i.e. outside
of CI) is a bug in CI.

Therefore, the same rule should apply to CI itself: every part should be
testable locally.

A consequence is that no Jenkins-specific part should be needed to run
tests and builds.

Keeping it as simple as possible is not optional.

This was quite apparent in the old UE CI, where both root causes led to
issues:

- devs were blocked by failures they couldn't reproduce locally,

- the CI infra couldn't be tested locally and needed constant care to stay
  up and running because of "emerging" bugs caused by several weaknesses in
  dependencies, external setup or complicated internal setup, which were
  never fixed (they couldn't be reproduced and were therefore never
  diagnosed properly).

So: Jenkins jobs should produce results from commands running in
environments air-gapped from Jenkins itself and kept under strict version
control.
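As a rough illustration of keeping commands independent from Jenkins, the
sketch below runs a command with the Jenkins-provided environment variables
removed, so that the same invocation behaves identically on a developer
machine. It is an assumption-laden example, not this project's
implementation: ``scrubbed_env`` and ``run_isolated`` are hypothetical
names, the prefix list is not exhaustive, and scrubbing the environment is
much weaker than the container or VM isolation a real air gap implies.

```python
import os
import subprocess

# Illustrative, non-exhaustive prefixes of variables that Jenkins exports
# to build steps (JENKINS_URL, BUILD_NUMBER, JOB_NAME, ...).
JENKINS_PREFIXES = ("JENKINS_", "HUDSON_", "BUILD_", "JOB_")


def scrubbed_env(environ):
    """Return a copy of `environ` without Jenkins-specific variables."""
    return {name: value for name, value in environ.items()
            if not name.startswith(JENKINS_PREFIXES)}


def run_isolated(argv, environ=None, **kwargs):
    """Run `argv` with an environment that carries no Jenkins state.

    Extra keyword arguments are passed through to subprocess.run().
    """
    env = scrubbed_env(os.environ if environ is None else environ)
    return subprocess.run(argv, env=env, **kwargs)
```

A job step that only passes under Jenkins but fails through
``run_isolated`` is relying on Jenkins state, which is exactly the kind of
failure a developer could not reproduce locally.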


Rule #5: Cascade fully automated builds and tests
=================================================

All manual interventions are technical debt.


Rule #6: Metrics
================

Many tests already capture meaningful metrics about basic operations: that's
the code they are exercising in precisely controlled environments.
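One lightweight way to collect such metrics is to time the operations a
test already performs. The class below is a minimal sketch under that
assumption: ``Metrics`` and ``timed`` are hypothetical names and nothing
here is taken from the project itself.

```python
import contextlib
import time


class Metrics:
    """Collect the durations (in seconds) of named operations."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the collector itself can be tested.
        self.clock = clock
        self.durations = {}

    @contextlib.contextmanager
    def timed(self, name):
        """Record how long the enclosed block takes, even if it raises."""
        start = self.clock()
        try:
            yield
        finally:
            elapsed = self.clock() - start
            self.durations.setdefault(name, []).append(elapsed)
```

A test would wrap an operation in ``with metrics.timed("deploy"):`` and the
recorded durations could then be stored per run to follow trends over time.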

Trends in the time needed to process merge proposals, build assets, deploy
servers and so on directly define the delay between a dev proposing a fix or
a feature and the time it's available to users.

Manual interventions elsewhere in the deployment pipeline can increase that
time, though.