75
by Vincent Ladeuil
One more pass on the doc.

Inspiration
===========

Six rules for setting up continuous integration systems
https://rhonabwy.com/2016/01/31/six-rules-for-setting-up-continuous-integration-systems/
captures many of the motivations for what is implemented here. Below are a
few notes and ideas that fit in.


Rule #1: Keep all the logic under version control
=================================================

The only way to get a fully reproducible CI is to make its deployment fully
automated from version-controlled sources.

Rule #2: Leave no test failure unnoticed and unfixed.
=====================================================

Also, sleep() calls and timeouts should be closely monitored, limited to the
bare minimum and carefully tuned.
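
One common way to keep sleep() calls to the bare minimum is to poll for the
expected condition under an explicit deadline instead of sleeping for a fixed,
pessimistic duration. The sketch below is only an illustration: the helper name
``wait_until`` and its parameters are hypothetical and not part of this
project.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is true or `timeout` seconds have elapsed.

    Returns the elapsed time on success so callers can log it and watch
    the trend; raises TimeoutError otherwise.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"condition not met after {elapsed:.2f}s")
        # Never sleep past the deadline.
        sleep(min(interval, timeout - elapsed))
```

Returning the elapsed time makes the wait itself observable: a value creeping
toward the timeout is a hint that the timeout needs tuning or that something
got slower.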

Rule #3: Fast gates directly impact velocity and productivity
=============================================================

Making gates faster is /not/ achieved by running fewer tests but by running
them concurrently.

The longest test is the ultimate barrier, not the sum of all test run times.

Only the slowest test needs to be optimized and its workflow
streamlined. Most of the time, it comes down to provisioning the right
testbed.
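
The arithmetic behind this can be sketched with a small scheduling model. The
function name ``gate_duration`` and the durations are made up for
illustration, and the model assumes tests are independent and that runners
are interchangeable.

```python
import heapq


def gate_duration(durations, workers):
    """Estimate the wall-clock time needed to run tests with the given
    durations on `workers` parallel runners, longest tests first."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Each heap entry is the time at which one runner becomes free.
    runners = [0.0] * workers
    heapq.heapify(runners)
    for duration in sorted(durations, reverse=True):
        free_at = heapq.heappop(runners)
        heapq.heappush(runners, free_at + duration)
    return max(runners)


durations = [30, 45, 600, 20, 15]  # seconds, hypothetical numbers
print(gate_duration(durations, 1))  # 710.0, the sum of all run times
print(gate_duration(durations, 2))  # 600.0, the longest test
print(gate_duration(durations, 5))  # 600.0, more runners no longer help
```

In this model, once the other tests fit alongside the longest one, adding
runners changes nothing: only making the 600 second test faster shortens the
gate.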

Rule #4: Everything in CI can be reproduced locally
===================================================

CI itself should never fail.

That is, a developer should never be blocked by a bug in CI, nor should the
deployment pipeline.

That means every failure that can't be reproduced locally (i.e. outside
of CI) is a bug in CI.

Therefore, the same rule should apply to CI itself: every part should be
testable locally.

A consequence is that no Jenkins-specific part should be needed to run
tests and builds.

Keeping it as simple as possible is not optional.

This was quite apparent in the old UE CI, where both root causes led to
issues:

- devs were blocked by failures they couldn't reproduce locally,

- the CI infra couldn't be tested locally and needed constant care to stay
  up and running because of "emerging" bugs caused by several weaknesses in
  dependencies, external setup or complicated internal setup which were
  never fixed (and weren't fixed because they couldn't be reproduced and
  therefore never diagnosed properly).

So: Jenkins jobs should produce results from commands happening in
environments air-gapped from Jenkins itself and kept under strict version
control.
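
As a rough illustration of the idea (not the mechanism used here, which the
text does not detail), a job can run its commands through a small wrapper
that strips the variables Jenkins injects, so the command sees the same
environment on a developer machine and on a CI node. The helper names and the
variable list below are assumptions; the list is illustrative and not
exhaustive, and a container or VM gives much stronger isolation than this.

```python
import os
import subprocess
import sys

# Variables Jenkins typically injects into a job environment.
# Illustrative only: adjust to the installed plugins.
JENKINS_PREFIXES = ("JENKINS_", "HUDSON_", "BUILD_", "JOB_")
JENKINS_NAMES = {"WORKSPACE", "EXECUTOR_NUMBER", "NODE_NAME", "NODE_LABELS"}


def scrubbed_env(environ):
    """Return a copy of `environ` without Jenkins-provided variables."""
    return {key: value for key, value in environ.items()
            if not key.startswith(JENKINS_PREFIXES)
            and key not in JENKINS_NAMES}


def run_isolated(cmd, environ=None):
    """Run `cmd` (a list of arguments) with a Jenkins-free environment."""
    env = scrubbed_env(os.environ if environ is None else environ)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # The same call works from a Jenkins job and from a developer shell.
    result = run_isolated([sys.executable, "--version"])
    print(result.stdout or result.stderr)
```

Because the wrapper is plain Python kept with the rest of the sources, it can
itself be tested locally, which is the point of the rule.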


Rule #5: Cascade fully automated builds and tests
=================================================

All manual interventions are technical debt.


Rule #6: Metrics
================

Many tests already capture meaningful metrics about basic operations, since
that is exactly the code they exercise in precisely controlled environments.
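
A minimal sketch of how a test run could record such timings and flag a
regression follows. The names ``timed`` and ``slower_than`` and the 20%
tolerance are hypothetical choices made for the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, sink, clock=time.monotonic):
    """Append a (name, seconds) pair to `sink` for the enclosed block."""
    start = clock()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are timed too.
        sink.append((name, clock() - start))


def slower_than(history, latest, tolerance=1.2):
    """True if `latest` exceeds the mean of `history` by the tolerance."""
    if not history:
        return False
    return latest > tolerance * (sum(history) / len(history))


metrics = []
with timed("build assets", metrics):
    sum(range(1000))  # stand-in for the operation under test
name, seconds = metrics[0]
print(name, seconds >= 0)
```

Storing the pairs from each run gives the history that ``slower_than``
compares against, turning existing tests into a cheap trend monitor.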

The trends in the time taken to process merge proposals, build assets, deploy
servers and so on directly define the delay between a dev proposing a fix or
a feature and the time it's available to users.

Manual interventions anywhere along the deployment pipeline can increase that
time, though.