1
= DTrace-based Master monitoring under Solaris 10 =
3
== Copyright statement ==
5
'''Note:''' This document is under Grid Engine project [http://gridengine.sunsource.net/license.html SISSL license].
9
Trouble-shooting in a distributed system spanning multiple thousands active components like Grid Engine can become a real challenge. In practice Grid Engine administrators have no apriori certainty about reasons for performance
10
degradation degregation in their production environments and reproducing such problems is not easy for those who wish to help. For that reason it is important to have a utility for on-site performance monitoring of the Grid Engine
11
master component. Thanks to [http://docs.sun.com/app/docs/doc/817-6223 DTrace(1) support in Solaris 10] a platform exists that is suited even for monitoring of multi-threaded/multi-processing components like Grid Engine master
12
(=qmaster+scheduler) :)
14
== Project Overview ==
18
Aim is to provide an easy-to-use DTrace-based bottleneck analysis first-aid kit to Grid Engine administrators. Though with this utility it will not be possible to entirely understand any single case where performance degregation
19
occurs. However in a majority of cases it will provide sufficient insight required by administrator to independently find an appropriate remedy. In other cases it shall deliver a base that facilitates more experienced people to
20
help in a determined and time-saving way.
22
=== Project Benefit ===
24
Grid Engine administrators will be able to independently tune their Grid Engine clusters and can get faster and better help from those who made comparable experiences in other challenging cases.
28
Imagine one of these nice block diagrams pictures. Components apparently are
31
* DTrace master monitor (D scripts)
35
== Performance Impact ==
37
There is no performance impact to Grid Engine when DTrace is not enabled. When DTrace is enabled, the overhead is negligible because DTrace does not stop and restart the qmaster for each traced function. Instead, DTrace collects
38
data in per-CPU buffers which dtrace(1M) asynchronously reads. (ref 1)
40
== User Documentation ==
42
That is kept under [http://gridengine.sunsource.net/source/browse/*checkout*/gridengine/source/scripts/dtrace/README-dtrace.txt?rev=HEAD README-dtrace.txt].
46
Master monitor will be part of the 'common' package in $SGE_ROOT/dtrace directory.
50
The master monitor soley depends on DTrace pid provider probes. These probes are used to track calls to throughput-sensitive functions in qmaster/scheduler. Note, for the DTrace pid provider based monitor to function it is required
51
these functions are not inlined as explained in DTrace documentation [http://bugs.opensolaris.org/view_bug.do?bug_id=6480235 CR #6480235].
53
The list of the functions tracked by the monitor is:
55
* cl_message_list_append_receive()
56
* cl_message_list_append_send()
57
* cl_message_list_remove_receive()
58
* cl_message_list_remove_send()
63
* select_assign_debit()
69
* sge_c_gdi_permcheck()
74
* sge_mirror_process_events()
76
* spool_delete_object()
77
* spool_write_object()
79
* spool_delete_script()
80
* spool_write_script()
87
The following monitoring output sample illustrates a case where a Grid Engine master bottleneck could be detected. With the at hand sample performance degregation happend between 17:40:32 and 17:41:05.
90
0 1 :BEGIN Time | #wrt wrt/ms |#rep #gdi #ack| #dsp dsp/ms #sad| #snd #rcv| #in++ #in-- #out++ #out--| #lck0 #ulck0 #lck1 #ulck1
91
0 36909 :tick-3sec 2006 Nov 24 17:39:23 | 43 3| 0 8 4| 3 691 121| 4 4| 11 11 15 15| 68 68 289 288
92
0 36909 :tick-3sec 2006 Nov 24 17:39:26 | 83 16| 0 10 3| 3 699 122| 3 3| 14 13 17 17| 90 90 681 681
93
0 36909 :tick-3sec 2006 Nov 24 17:39:29 | 117 24| 0 9 4| 4 1092 198| 4 4| 13 13 17 17| 71 71 591 591
94
0 36909 :tick-3sec 2006 Nov 24 17:39:32 | 19 4| 0 9 3| 3 591 147| 3 3| 12 12 15 15| 44 43 249 249
95
0 36909 :tick-3sec 2006 Nov 24 17:39:35 | 144 28| 0 9 4| 4 1012 173| 4 4| 13 13 17 17| 61 62 1246 1247
96
0 36909 :tick-3sec 2006 Nov 24 17:39:38 | 46 5| 0 8 3| 3 705 122| 3 3| 11 11 14 14| 67 67 293 293
97
0 36909 :tick-3sec 2006 Nov 24 17:39:41 | 154 31| 0 9 3| 4 894 198| 3 3| 13 13 16 16| 73 72 968 969
98
0 36909 :tick-3sec 2006 Nov 24 17:39:44 | 46 5| 0 10 4| 4 971 162| 4 4| 13 13 17 17| 71 72 304 304
99
0 36909 :tick-3sec 2006 Nov 24 17:39:47 | 154 29| 0 8 3| 3 739 158| 3 3| 11 11 14 14| 67 67 990 990
100
0 36909 :tick-3sec 2006 Nov 24 17:39:50 | 46 5| 0 10 4| 4 815 162| 4 4| 14 14 18 18| 76 76 692 693
101
0 36909 :tick-3sec 2006 Nov 24 17:39:53 | 74 15| 0 8 3| 3 746 136| 3 3| 12 12 15 15| 54 53 571 571
102
0 36909 :tick-3sec 2006 Nov 24 17:39:56 | 116 20| 0 11 4| 4 992 184| 4 4| 14 14 18 18| 80 81 669 669
103
0 36909 :tick-3sec 2006 Nov 24 17:39:59 | 87 18| 0 11 4| 4 851 176| 5 4| 15 15 21 21| 77 76 670 670
104
0 36909 :tick-3sec 2006 Nov 24 17:40:02 | 109 20| 0 12 5| 4 930 184| 4 5| 17 17 20 20| 77 78 624 624
105
0 36909 :tick-3sec 2006 Nov 24 17:40:05 | 88 15| 0 9 3| 4 995 176| 3 3| 12 12 15 15| 71 71 1026 1026
106
0 36909 :tick-3sec 2006 Nov 24 17:40:08 | 112 20| 0 12 4| 4 927 184| 5 4| 16 16 22 22| 81 81 652 652
107
0 36909 :tick-3sec 2006 Nov 24 17:40:11 | 32 6| 0 7 4| 3 618 121| 3 4| 11 11 13 13| 54 53 336 336
108
0 36909 :tick-3sec 2006 Nov 24 17:40:14 | 145 30| 0 11 4| 4 988 199| 4 4| 15 15 19 19| 64 65 827 827
109
0 36909 :tick-3sec 2006 Nov 24 17:40:17 | 43 3| 0 7 3| 3 618 121| 3 3| 10 10 13 13| 64 64 286 286
110
0 36909 :tick-3sec 2006 Nov 24 17:40:20 | 157 31| 0 11 4| 4 977 199| 4 4| 15 15 19 19| 80 80 1406 1408
111
0 36909 :tick-3sec 2006 Nov 24 17:40:23 | 43 4| 0 7 3| 3 701 121| 3 3| 10 10 13 13| 64 64 285 285
112
0 36909 :tick-3sec 2006 Nov 24 17:40:26 | 73 18| 0 11 4| 4 948 171| 4 4| 15 15 19 19| 77 77 700 700
113
0 36909 :tick-3sec 2006 Nov 24 17:40:29 | 127 31| 0 10 4| 4 968 189| 4 4| 14 14 18 18| 74 74 584 584
114
0 36909 :tick-3sec 2006 Nov 24 17:40:32 | 10 3| 0 6 0| 1 203 41| 0 0| 58 8 62 62| 23 22 106 106
115
0 36909 :tick-3sec 2006 Nov 24 17:40:35 | 19 5| 0 5 0| 0 0 0| 0 0| 8 5 13 13| 30 30 200 200
116
0 36909 :tick-3sec 2006 Nov 24 17:40:38 | 16 5| 0 5 1| 0 0 0| 0 0| 5 6 10 10| 27 26 558 559
117
0 36909 :tick-3sec 2006 Nov 24 17:40:41 | 1 0| 0 4 0| 0 0 0| 0 0| 7 4 11 11| 9 9 34 34
118
0 36909 :tick-3sec 2006 Nov 24 17:40:44 | 0 0| 0 4 0| 0 0 0| 0 0| 7 4 11 11| 8 8 28 28
119
0 36909 :tick-3sec 2006 Nov 24 17:40:47 | 0 0| 0 6 0| 1 744 81| 1 1| 10 6 15 15| 14 14 33 33
120
0 36909 :tick-3sec 2006 Nov 24 17:40:50 | 1 0| 0 5 1| 0 0 0| 0 0| 8 6 14 14| 11 11 49 49
121
0 36909 :tick-3sec 2006 Nov 24 17:40:53 | 0 0| 0 4 0| 0 0 0| 0 0| 9 4 12 12| 6 7 28 28
122
0 36909 :tick-3sec 2006 Nov 24 17:40:56 | 0 0| 0 5 0| 0 0 0| 0 0| 8 5 13 13| 12 12 420 420
123
0 36909 :tick-3sec 2006 Nov 24 17:40:59 | 0 0| 0 4 0| 0 0 0| 0 0| 8 4 12 12| 9 8 30 30
124
0 36909 :tick-3sec 2006 Nov 24 17:41:02 | 0 0| 0 4 1| 0 0 0| 0 0| 12 5 16 16| 7 8 25 25
125
0 36909 :tick-3sec 2006 Nov 24 17:41:05 | 165 41| 0 48 60| 0 0 0| 1 1| 23 106 71 71| 96 97 1236 1236
126
0 36909 :tick-3sec 2006 Nov 24 17:41:08 | 178 28| 0 15 53| 4 965 206| 4 4| 68 68 75 75| 130 130 1336 1336
127
0 36909 :tick-3sec 2006 Nov 24 17:41:11 | 106 23| 0 27 35| 4 855 166| 4 4| 82 82 91 91| 115 114 1040 1040
128
0 36909 :tick-3sec 2006 Nov 24 17:41:14 | 198 37| 0 41 70| 4 1189 196| 4 4| 185 185 185 185| 134 135 1327 1327
129
0 36909 :tick-3sec 2006 Nov 24 17:41:17 | 16 5| 0 9 5| 4 940 161| 3 3| 17 17 20 20| 43 42 234 234
130
0 36909 :tick-3sec 2006 Nov 24 17:41:20 | 162 35| 0 13 8| 4 958 200| 4 4| 23 23 28 28| 80 81 1018 1018
131
0 36909 :tick-3sec 2006 Nov 24 17:41:23 | 44 6| 0 6 3| 2 544 81| 3 3| 8 8 11 11| 63 63 747 747
132
0 36909 :tick-3sec 2006 Nov 24 17:41:26 | 150 34| 0 13 6| 4 921 199| 4 4| 21 21 25 25| 73 72 923 923
133
0 36909 :tick-3sec 2006 Nov 24 17:41:29 | 43 3| 0 5 2| 2 506 81| 2 2| 7 7 9 9| 57 57 260 260
134
0 36909 :tick-3sec 2006 Nov 24 17:41:32 | 157 37| 0 9 3| 4 978 199| 3 3| 13 13 16 16| 73 72 970 970
135
0 36909 :tick-3sec 2006 Nov 24 17:41:35 | 43 3| 0 7 3| 2 512 85| 3 3| 9 9 12 12| 61 62 274 274
136
0 36909 :tick-3sec 2006 Nov 24 17:41:38 | 127 29| 0 8 3| 4 994 185| 3 3| 11 11 14 14| 68 68 1265 1265
137
0 36909 :tick-3sec 2006 Nov 24 17:41:41 | 66 11| 0 10 4| 4 973 171| 4 4| 14 14 18 18| 67 67 354 354
138
0 36909 :tick-3sec 2006 Nov 24 17:41:44 | 48 10| 0 8 3| 3 785 128| 3 3| 11 11 14 14| 52 51 399 399
139
0 36909 :tick-3sec 2006 Nov 24 17:41:47 | 142 31| 0 12 4| 4 913 192| 5 4| 17 17 23 23| 89 90 830 830
140
0 36909 :tick-3sec 2006 Nov 24 17:41:50 | 64 13| 0 11 5| 4 853 168| 4 5| 15 15 18 18| 75 75 542 542
143
cause for the performance degregation were a couple of dozens 'qsub -sync y' as explained in [http://gridengine.sunsource.net/issues/show_bug.cgi?id=2125 qmaster performance gets throttled if qsub -sync y is used when many jobs are
149
This case shows how the 6.0u8+ based DTrace script ([http://gridengine.sunsource.net/files/documents/7/97/SGE60-dtrace.tar.gz SGE60-dtrace.tar.gz])
150
helped to understand the cause of sporadic 'qmaster alive timeout expiring' loggings
153
Time | #wrt wrt/ms|#rep #gdi #ack| #dsp dsp/ms #sad| #snd #rcv| #lck0 #ulck0 #lck1 #ulck1
154
2006 Oct 21 04:04:52 | 80 2020| 77 30 15| 15 810 30| 15 15| 199 199 1660 1660
155
2006 Oct 21 04:05:07 | 101 2563| 76 30 15| 15 808 36| 15 15| 197 197 1701 1702
156
2006 Oct 21 04:05:22 | 130 3502| 100 31 15| 15 818 40| 15 15| 232 232 2028 2027
157
2006 Oct 21 04:05:37 | 161 4798| 103 30 15| 15 804 47| 15 15| 224 224 2050 2050
158
2006 Oct 21 04:05:52 | 169 4926| 108 30 15| 15 818 50| 15 15| 230 230 2133 2136
159
2006 Oct 21 04:06:07 | 134 3809| 98 30 15| 15 801 41| 15 15| 219 219 1942 1942
160
2006 Oct 21 04:06:22 | 192 5838| 113 31 15| 15 836 53| 15 15| 245 245 2172 2170
161
2006 Oct 21 04:06:37 | 346 13390| 185 27 13| 14 828 89| 13 13| 296 295 3064 3063
162
2006 Oct 21 04:06:52 | 364 13975| 178 13 6| 7 449 80| 7 7| 231 231 2879 2879
163
2006 Oct 21 04:07:07 | 355 14263| 182 7 4| 3 243 72| 3 3| 212 212 2991 2992
164
2006 Oct 21 04:07:22 | 335 14190| 179 5 2| 2 187 70| 2 2| 207 207 2857 2858
165
2006 Oct 21 04:07:37 | 297 14203| 178 3 1| 2 206 90| 2 2| 190 190 2639 2641
166
2006 Oct 21 04:07:52 | 383 13728| 133 4 2| 3 419 219| 2 3| 153 153 2442 2442
167
2006 Oct 21 04:08:07 | 432 13680| 115 3 1| 2 328 190| 1 2| 126 126 1930 1931
168
2006 Oct 21 04:08:22 | 498 13451| 226 6 2| 2 281 142| 3 2| 251 251 3086 3085
169
2006 Oct 21 04:08:37 | 411 13331| 89 4 2| 2 422 245| 2 2| 114 114 1938 1939
170
2006 Oct 21 04:08:52 | 473 13595| 136 3 1| 2 420 264| 1 2| 147 147 2194 2193
171
2006 Oct 21 04:08:57 | sge_log(4, qmaster alive timeout expired)
172
2006 Oct 21 04:09:07 | 322 11950| 85 3 1| 0 0 0| 2 1| 100 100 1561 1561
173
2006 Oct 21 04:09:22 | 512 9825| 291 9 1| 1 356 215| 2 1| 325 325 3850 3849
174
2006 Oct 21 04:09:37 | 226 14398| 0 0 0| 1 353 215| 0 1| 0 0 843 843
175
2006 Oct 21 04:09:40 | sge_log(4, qmaster alive timeout expired)
176
2006 Oct 21 04:09:52 | 200 13072| 30 1 0| 0 0 0| 1 1| 35 35 1301 1301
177
2006 Oct 21 04:10:07 | 516 11657| 139 2 0| 0 0 0| 1 0| 148 148 1900 1900
178
2006 Oct 21 04:10:22 | 302 9965| 434 3 0| 0 0 0| 0 0| 449 450 4439 4441
179
2006 Oct 21 04:10:37 | 404 12147| 10 5 1| 2 759 456| 1 2| 26 25 1214 1214
180
2006 Oct 21 04:10:44 | sge_log(4, qmaster alive timeout expired)
181
2006 Oct 21 04:10:52 | 225 14003| 0 0 0| 0 0 0| 0 1| 0 0 834 834
182
2006 Oct 21 04:11:07 | 381 12033| 120 1 0| 0 0 0| 1 0| 127 127 1733 1733
183
2006 Oct 21 04:11:22 | 502 12540| 338 2 0| 0 0 0| 1 0| 346 346 3492 3493
184
2006 Oct 21 04:11:37 | 434 12547| 151 7 1| 2 708 422| 1 2| 179 179 2964 2963
185
2006 Oct 21 04:11:47 | sge_log(4, qmaster alive timeout expired)
186
2006 Oct 21 04:11:52 | 187 14300| 0 0 0| 0 0 0| 0 1| 0 0 706 706
187
2006 Oct 21 04:12:07 | 378 12619| 93 1 0| 0 0 0| 1 0| 100 100 1483 1483
188
2006 Oct 21 04:12:22 | 474 13377| 357 2 0| 0 0 0| 1 0| 365 365 3897 3897
189
2006 Oct 21 04:12:37 | 384 8363| 154 8 1| 1 371 220| 1 1| 184 184 2809 2809
190
2006 Oct 21 04:12:52 | 203 14475| 0 0 0| 1 376 220| 0 1| 0 0 770 770
191
2006 Oct 21 04:12:52 | sge_log(4, qmaster alive timeout expired)
192
2006 Oct 21 04:13:07 | 234 13167| 98 1 0| 0 0 0| 1 1| 105 105 1556 1556
193
2006 Oct 21 04:13:22 | 430 11001| 21 1 0| 0 0 0| 0 0| 24 24 1080 1080
194
2006 Oct 21 04:13:37 | 332 14139| 387 1 0| 0 0 0| 1 0| 394 394 4101 4101
195
2006 Oct 21 04:13:52 | 420 12256| 141 8 1| 2 820 488| 1 2| 169 169 2227 2227
196
2006 Oct 21 04:14:03 | sge_log(4, qmaster alive timeout expired)
197
2006 Oct 21 04:14:07 | 212 14377| 0 0 0| 0 0 0| 0 1| 0 0 788 788
198
2006 Oct 21 04:14:22 | 274 12997| 86 1 0| 0 0 0| 1 0| 93 93 1537 1537
199
2006 Oct 21 04:14:37 | 376 11234| 23 1 0| 0 0 0| 0 0| 26 26 1020 1020
200
2006 Oct 21 04:14:52 | 273 14371| 325 1 0| 0 0 0| 1 0| 332 332 3274 3274
203
here inadequately set-up spooling (classic over NFS) caused that qmaster was almost entirely busy with IO. This can be concluded from the interval being 15 seconds with 'wrt/ms' being up to 14.475 seconds. The qmaster alive timeout
204
was a side implication. The bug is described under [http://gridengine.sunsource.net/issues/show_bug.cgi?id=2108 qmaster logging "scheduler tried to remove a incomplete"]
207
* [http://www.sun.com/books/catalog/solaris_perf_tools.xml McDougall, Richard; Mauro, Jim; and Gregg, Brendan. Solaris Performance and Tools, page 57. Prentice Hall PTR, 2007.]
210
* [https://dtrace.developer.network.com/nonav/using-dtrace-in-grid.html Sun Grid Application Observability Using DTrace]