===================================================================
How To Add Your Build Configuration To LLVM Buildbot Infrastructure
===================================================================

Introduction
============

This document describes how to add a build configuration and your own
buildbot-worker to the LLVM Buildbot Infrastructure.

Buildmasters
============

There are two buildmasters running.

* The main buildmaster at `<https://lab.llvm.org/buildbot>`_. All builders
  attached to this machine will notify commit authors every time they break
  the build.
* The staging buildmaster at `<https://lab.llvm.org/staging>`_. All builders
  attached to this machine will be completely silent by default when the build
  is broken. This buildmaster is reconfigured every two hours with any new
  commits from the llvm-zorg repository.

In order to remain connected to the main buildmaster (and thus notify
developers of failures), a buildbot must:

* Be building a supported configuration. Builders for experimental backends
  should generally be attached to the staging buildmaster.
* Be able to keep up with new commits to the main branch, or at a minimum
  recover to tip of tree within a couple of days of falling behind.

Additionally, we encourage all bot owners to point their bots towards the
staging master during maintenance windows, instability troubleshooting, and
similar situations.

Roles & Expectations
====================

Each buildbot has an owner who is responsible for addressing problems
which arise with that buildbot. We generally expect the bot owner to be
reasonably responsive.

For some bots, the ownership responsibility is split between a "resource owner"
who provides the underlying machine resource, and a "configuration owner" who
maintains the build configuration. Generally, operational responsibility lies
with the "config owner". We do expect "resource owners" - who are generally
the contact listed in a worker's attributes - to proxy requests to the relevant
"config owner" in a timely manner.

Most issues with a buildbot should be addressed directly with a bot owner
via email. Please CC `Galina Kistanova <mailto:gkistanova@gmail.com>`_.

Steps To Add Builder To LLVM Buildbot
=====================================

Volunteers can provide their build machines to work as build workers for the
public LLVM Buildbot.

Here are the steps you can follow to do so:

#. Check the existing build configurations to make sure the one you are
   interested in is not covered yet, or gets built on your computer much
   faster than on the existing one. We prefer faster builds so developers
   will get feedback sooner after changes get committed.

#. The computer you will be registering with the LLVM buildbot
   infrastructure should have all dependencies installed and be able to
   build your configuration successfully. Please check what degree
   of parallelism (the ``-j`` parameter) gives the fastest build. You can
   build multiple configurations on one computer.

#. Install buildbot-worker (currently we are using buildbot version 2.8.4).
   This specific version can be installed using ``pip``, with a command such
   as ``pip3 install buildbot-worker==2.8.4``.

#. Create a designated user account for your buildbot-worker to run under,
   and set appropriate permissions.

#. Choose the buildbot-worker root directory (all builds will be placed under
   it), and the buildbot-worker access name and password the buildmaster will
   use to authenticate your buildbot-worker.

#. Create a buildbot-worker in the context of that buildbot-worker account.
   Point it to the **lab.llvm.org** port **9994** (see `Buildbot documentation,
   Creating a worker
   <http://docs.buildbot.net/current/tutorial/firstrun.html#creating-a-worker>`_
   for more details) by running the following command:

   .. code-block:: bash

      $ buildbot-worker create-worker <buildbot-worker-root-directory> \
                   lab.llvm.org:9994 \
                   <buildbot-worker-access-name> \
                   <buildbot-worker-access-password>

   Only once a new worker is stable, and approval from Galina has been
   received (see last step), should it be pointed at the main buildmaster.

   Now start the worker:

   .. code-block:: bash

      $ buildbot-worker start <buildbot-worker-root-directory>

   This will cause your new worker to connect to the staging buildmaster,
   which is silent by default.

   Try this once, then check the log file
   ``<buildbot-worker-root-directory>/worker/twistd.log``. If your settings
   are correct you will see a refused connection. This is good and expected,
   as the credentials have not been established on both ends. Now stop the
   worker and proceed to the next steps.

#. Fill in the buildbot-worker description and admin name/e-mail. Here is an
   example of the buildbot-worker description::

       Windows 7 x64
       Core i7 (2.66GHz), 16GB of RAM

       g++.exe (TDM-1 mingw32) 4.4.0
       GNU Binutils 2.19.1
       cmake version 2.8.4
       Microsoft(R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86

   See `here <http://docs.buildbot.net/current/manual/installation/worker.html>`_
   for which files to edit.

#. Send a patch which adds your build worker and your builder to
   `zorg <https://github.com/llvm/llvm-zorg>`_. Use the typical LLVM
   `workflow <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.

   * workers are added to ``buildbot/osuosl/master/config/workers.py``
   * builders are added to ``buildbot/osuosl/master/config/builders.py``

   Please make sure your builder name and its builddir are unique throughout
   the file.

   All new builders should default to using the ``'collapseRequests': False``
   configuration. This causes the builder to build each commit individually
   and not merge build requests. To maximize quality of feedback to developers,
   we *strongly prefer* builders to be configured not to collapse requests.
   This flag should be removed only after all reasonable efforts have been
   exhausted to improve build times such that the builder can keep up with
   commit flow.

   It is possible to allow email addresses to unconditionally receive
   notifications on build failure; for this you'll need to add an
   ``InformativeMailNotifier`` to ``buildbot/osuosl/master/config/status.py``.
   This is particularly useful for the staging buildmaster, which is silent
   otherwise.

#. Send the buildbot-worker access name and the access password directly to
   `Galina Kistanova <mailto:gkistanova@gmail.com>`_, and wait until she
   lets you know that your changes are applied and the buildmaster is
   reconfigured.

#. Make sure you can start the buildbot-worker and successfully connect
   to the silent buildmaster. Then set up your buildbot-worker to start
   automatically at boot time. See the buildbot documentation
   for help. You may want to restart your computer to see if it works.

#. Check the status of your buildbot-worker on the `Waterfall Display (Staging)
   <http://lab.llvm.org/staging/#/waterfall>`_ to make sure it is
   connected, and the `Workers Display (Staging)
   <http://lab.llvm.org/staging/#/workers>`_ to see if administrator
   contact and worker information are correct.

#. At this point, you have a working builder connected to the staging
   buildmaster. You can now make sure it is reliably green and keeps
   up with the build queue. No notifications will be sent, so you can
   keep an unstable builder connected to staging indefinitely.

#. (Optional) Once the builder is stable on the staging buildmaster with
   several days of green history, you can choose to move it to the production
   buildmaster to enable developer notifications. Please email `Galina
   Kistanova <mailto:gkistanova@gmail.com>`_ for review and approval.

   To move a worker to production (once approved), stop your worker, edit the
   ``buildbot.tac`` file to change the port number from 9994 to 9990, and
   start it again.

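As a rough sketch of the step above that fills in the worker description and
admin contact: the Buildbot worker documentation describes an ``info``
subdirectory in the worker's base directory (the one containing
``buildbot.tac``), holding an ``admin`` file and a ``host`` file. The helper
name ``write_worker_info`` and the sample values in the usage comment are
hypothetical and only illustrate the idea.

.. code-block:: bash

   # Hypothetical helper: write the admin contact and the host description
   # into <worker-basedir>/info/admin and <worker-basedir>/info/host.
   write_worker_info() {
     local worker_basedir=$1 admin=$2 description=$3
     mkdir -p "$worker_basedir/info"
     printf '%s\n' "$admin" > "$worker_basedir/info/admin"
     printf '%s\n' "$description" > "$worker_basedir/info/host"
   }

   # Usage (example values only; substitute your own details):
   #   write_worker_info <worker-basedir> \
   #     "Your Name <you@example.com>" \
   #     "Ubuntu x86_64, 16 cores, 64GB of RAM"

A multi-line description, like the example shown in the step, can be written
by editing the ``host`` file by hand afterwards.
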
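The move to production described in the last step can be sketched as follows.
This assumes the generated ``buildbot.tac`` contains a line of the form
``port = 9994``, which should be verified in your own file before relying on
it; the helper name ``switch_buildmaster_port`` is hypothetical.

.. code-block:: bash

   # Hypothetical helper: rewrite the buildmaster port in buildbot.tac.
   # Assumes the file contains a line that reads exactly "port = <old_port>".
   switch_buildmaster_port() {
     local tac_file=$1 old_port=$2 new_port=$3
     # Write to a temporary file first so a failed sed leaves the original intact.
     sed "s/^port = ${old_port}\$/port = ${new_port}/" "$tac_file" > "$tac_file.tmp" &&
       mv "$tac_file.tmp" "$tac_file"
   }

   # Usage (only after approval; stop the worker first, then restart it):
   #   buildbot-worker stop <buildbot-worker-root-directory>
   #   switch_buildmaster_port <path-to>/buildbot.tac 9994 9990
   #   buildbot-worker start <buildbot-worker-root-directory>

Switching back to staging for a maintenance window is the same call with the
two port numbers swapped.
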
Best Practices for Configuring a Fast Builder
=============================================

As mentioned above, we generally have a strong preference for
builders which can build every commit as it comes in. This section
includes best practices and some recommendations as to how to achieve
that end.

The goal
  In 2020, the monorepo had just under 35 thousand commits. This works
  out to an average of 4 commits per hour. Already, we can see that a
  builder must cycle in less than 15 minutes to have a hope of being
  useful. However, those commits are not uniformly distributed. They
  tend to cluster strongly during US working hours. Looking at a couple
  of recent (Nov 2021) working days, we routinely see ~10 commits per
  hour during peak times, with occasional spikes as high as ~15 commits
  per hour. Thus, as a rule of thumb, we should plan for our builder to
  complete ~10-15 builds an hour.

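  The averages quoted above can be reproduced with a few lines of shell. This
  is only a sketch of the arithmetic; the helper names are hypothetical and
  the inputs are the figures from the text.

  .. code-block:: bash

    # Average commits per hour for a given number of commits per year.
    commits_per_hour() {
      awk -v commits="$1" 'BEGIN { printf "%.1f\n", commits / (365 * 24) }'
    }

    # Minutes available per build if every commit gets its own build.
    minutes_per_build() {
      awk -v rate="$1" 'BEGIN { printf "%.1f\n", 60 / rate }'
    }

    commits_per_hour 35000   # yearly commit count quoted above
    minutes_per_build 4      # average commit rate
    minutes_per_build 15     # occasional peak commit rate
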
Resource Appropriately
  At 10-15 builds per hour, we need to complete a new build on average every
  4 to 6 minutes. For anything except the fastest of hardware/build configs,
  this is going to be well beyond the ability of a single machine. In buildbot
  terms, we are likely going to need multiple workers to build requests in
  parallel under a single builder configuration. For some rough back of the
  envelope numbers, if your build config takes e.g. 30 minutes, you will need
  something on the order of 5-8 workers. If your build config takes ~2 hours,
  you'll need something on the order of 20-30 workers. The rest of this section
  focuses on how to reduce cycle times.

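  The worker counts above follow from multiplying the build time by the
  required build rate and rounding up. A minimal sketch of that estimate,
  using a hypothetical ``workers_needed`` helper and ignoring scheduling
  overhead:

  .. code-block:: bash

    # Workers needed so that builds of the given duration (in minutes)
    # can sustain the given number of builds per hour, rounded up.
    workers_needed() {
      local build_minutes=$1 builds_per_hour=$2
      echo $(( (build_minutes * builds_per_hour + 59) / 60 ))
    }

    workers_needed 30 10    # 30-minute build, 10 builds per hour
    workers_needed 30 15    # 30-minute build, 15 builds per hour
    workers_needed 120 10   # 2-hour build, 10 builds per hour
    workers_needed 120 15   # 2-hour build, 15 builds per hour

  The four calls give 5, 8, 20 and 30, matching the ranges quoted above.
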
Restrict what you build and test
  Think hard about why you're setting up a bot, and restrict your build
  configuration as much as you can. Basic functionality is probably
  already covered by other bots, and you don't need to duplicate that
  testing. You only need to be building and testing the *unique* parts
  of the configuration. (For example, for a multi-stage clang builder, you
  probably don't need to enable every target or build all the various
  utilities.)

  It can sometimes be worthwhile splitting a single builder into two or more,
  if you have multiple distinct purposes for the same builder. As an example,
  if you want to both a) confirm that all of LLVM builds with your host
  compiler, and b) do a multi-stage clang build on your target, you
  may be better off with two separate bots. Splitting increases resource
  consumption, but makes it easy for each bot to keep up with commit flow.
  Additionally, splitting bots may assist in triage by narrowing attention to
  relevant parts of the failing configuration.

  In general, we recommend Release build types with Assertions enabled. This
  generally provides a good balance between build times and bug detection for
  most buildbots. There may be room for including some debug info (e.g. with
  ``-gmlt``), but in general the balance between debug info quality and build
  times is a delicate one.

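  As an illustration of a restricted Release-with-assertions configuration,
  the sketch below collects a few standard LLVM CMake options in a shell
  array. The chosen values (``clang`` as the only project, ``X86`` as the only
  target) are purely illustrative assumptions; pick the ones that cover the
  unique parts of your own configuration.

  .. code-block:: bash

    # Illustrative values only: one project, one target, Release + assertions.
    cmake_args=(
      -DCMAKE_BUILD_TYPE=Release
      -DLLVM_ENABLE_ASSERTIONS=ON
      -DLLVM_ENABLE_PROJECTS=clang
      -DLLVM_TARGETS_TO_BUILD=X86
    )

    # Print the configure command for review; drop "echo" to actually run it.
    echo cmake "${cmake_args[@]}" "<path-to-llvm-project>/llvm"
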
Use Ninja & LLD
  Ninja really does help build times over Make, particularly for highly
  parallel builds. LLD helps to reduce both link times and memory usage
  during linking significantly. On a build machine with sufficient
  parallelism, link times tend to dominate the critical path of the build,
  and are thus worth optimizing.

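  A minimal sketch of selecting both tools at configure time, assuming
  ``ninja`` and ``lld`` are already installed on the worker. ``-G Ninja`` is
  the CMake generator switch and ``LLVM_USE_LINKER`` is the LLVM CMake option
  for choosing the linker; the array name is hypothetical.

  .. code-block:: bash

    # Generator and linker selection for the configure step.
    fast_build_args=(
      -G Ninja
      -DLLVM_USE_LINKER=lld
    )

    # Print the configure command for review; drop "echo" to actually run it.
    echo cmake "${fast_build_args[@]}" "<path-to-llvm-project>/llvm"
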
Use CCache and NOT incremental builds
  Using ccache materially improves average build times. Incremental builds
  can be slightly faster, but introduce the risk of build corruption due to
  e.g. state changes. At this point, the recommendation is not to
  use incremental builds and instead use ccache, as the latter captures the
  majority of the benefit with less risk of false positives.

  One of the non-obvious benefits of using ccache is that it makes the
  builder less sensitive to which projects are being monitored vs built.
  If a change triggers a build request, but doesn't change the build output
  (e.g. doc changes, python utility changes, etc.), the build will entirely
  hit in cache and the build request will complete in just the testing time.

  With multiple workers, it is tempting to try to configure a shared cache
  between the workers. Experience to date indicates this is difficult to
  do well, and that having local per-worker caches gets most of the benefit
  anyway. We don't currently recommend shared caches.

  CCache does depend on the builder hardware having sufficient IO to access
  the cache with reasonable access times - i.e. a fast disk, or enough memory
  for a RAM cache, etc. For builders without, incremental builds may be your
  best option, but are likely to require higher ongoing involvement from the
  sponsor.

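  One possible way to keep a local, per-worker cache is sketched below. It
  assumes ccache is installed, uses ccache's ``CCACHE_DIR`` environment
  variable to place the cache inside the worker's own directory, and enables
  ccache through the LLVM CMake option ``LLVM_CCACHE_BUILD``. The directory
  layout and the ``WORKER_ROOT`` variable are hypothetical.

  .. code-block:: bash

    # Hypothetical layout: keep the cache inside the worker's own directory
    # so that it is never shared between workers.
    worker_root=${WORKER_ROOT:-/path/to/worker-root}
    export CCACHE_DIR="$worker_root/ccache"

    ccache_args=(-DLLVM_CCACHE_BUILD=ON)

    # Print the configure command for review; drop "echo" to actually run it.
    echo cmake "${ccache_args[@]}" "<path-to-llvm-project>/llvm"

  The variable has to be visible in the environment the worker runs its
  builds in, not just in an interactive shell.
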
Enable batch builds
  As a last resort, you can configure your builder to batch build requests.
  This makes the build failure notifications markedly less actionable, and
  should only be done once all other reasonable measures have been taken.

Leave it on the staging buildmaster
  While most of this section has been biased towards builders intended for
  the main buildmaster, it is worth highlighting that builders can run
  indefinitely on the staging buildmaster. Such a builder may still be
  useful for the sponsoring organization, without the risk of negatively
  impacting the broader community. The sponsoring organization simply
  has to take on the responsibility of all bisection and triage.