2011-09-22 Steven Dake <sdake@redhat.com>

Deliver all messages from my_high_seq_received to the last gap
This patch passes two test cases:

Two node cluster - run cpgbench on each node

modify totemsrp with following defines:

start 5 nodes randomly at about same time, start 5 nodes randomly at about
same time, wait 10 seconds and attempt to send a message. If message blocks
on "TRY_AGAIN" likely a message loss has occurred. Wait a few minutes without
cycling the nodes and see if the TRY_AGAIN state becomes unblocked.

If it doesn't, the test case has failed.

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 2ec4ddb039b310b308a8748c88332155afd62608)

2011-07-14 Russell Bryant <russell@russellbryant.net>

Resolve a deadlock between the timer and serialize locks.
This patch resolves a deadlock between the serialize lock (in
exec/main.c) and the timer lock (in exec/timer.c). I observed this
deadlock happening fairly quickly on a cluster using the EVT service
from OpenAIS. (OpenAIS 1.1.4, Corosync 1.4.1)

In prioritized_timer_thread(), it was grabbing:

In another thread, you have:
1) grab the serialize lock in deliver_fn() of exec/main.c
2) grab the timer lock in corosync_timer_add_duration().

The patch just swaps the locking order in the timer thread.

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
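
The lock-ordering idea behind this fix can be sketched roughly as follows. This is a minimal illustration, not the corosync code: the mutexes and the functions timer_thread_tick() and deliver_add_timer() are hypothetical stand-ins, and it assumes the fixed order is "serialize lock first, then timer lock", matching the two steps listed for the other thread.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative stand-ins for the serialize lock and the timer lock. */
static pthread_mutex_t serialize_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;

static int pending_timers; /* timers waiting to expire */
static int expired_timers; /* timers already handled */

/* Delivery path: serialize lock first, then timer lock. */
int deliver_add_timer(void)
{
    int pending;

    pthread_mutex_lock(&serialize_lock);
    pthread_mutex_lock(&timer_lock);
    pending = ++pending_timers;
    pthread_mutex_unlock(&timer_lock);
    pthread_mutex_unlock(&serialize_lock);
    return pending;
}

/* Timer thread path: uses the SAME order, so neither thread can hold
 * one lock while waiting forever for the other. */
int timer_thread_tick(void)
{
    int expired;

    pthread_mutex_lock(&serialize_lock);
    pthread_mutex_lock(&timer_lock);
    if (pending_timers > 0) {
        pending_timers--;
        expired_timers++;
    }
    expired = expired_timers;
    pthread_mutex_unlock(&timer_lock);
    pthread_mutex_unlock(&serialize_lock);
    return expired;
}
```

A deadlock needs two threads taking the same pair of locks in opposite orders; once every path agrees on one order, that cycle cannot form.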

2011-09-08 Jan Friesse <jfriesse@redhat.com>

totemconfig: change minimum RRP threshold
The RRP threshold can now be set to a value lower than 5.

Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
(cherry picked from commit f6c2a8dab786c50ece36dd3424e258e93a1000d3)

2011-09-05 Steven Dake <sdake@redhat.com>

Ignore memb_join messages during flush operations
A memb_join operation that occurs during flushing can result in an
entry into the GATHER state from the RECOVERY state. This results in the
regular sort queue being used instead of the recovery sort queue, resulting

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 48ffa8892daac18935d96ae46a72aebe2fb70430)

2011-09-01 Jan Friesse <jfriesse@redhat.com>

rrp: Higher threshold in passive mode for mcast
There were too many false positives with passive mode rrp when a high
number of messages was received.

The patch adds a new configurable variable, rrp_problem_count_mcast_threshold,
which defaults to 10 times rrp_problem_count_threshold and is
used as the threshold for multicast packets in passive mode. The variable is
unused in active mode.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 752239eaa1edd68695a6e40bcde60471f34a02fd)

rrp: Handle endless loop if all ifaces are faulty
If all interfaces were faulty, passive_mcast_flush_send and related
functions ended in an endless loop. This is now handled: if there is no
live interface, the message is dropped.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 0eade8de79b6e5b28e91604d4d460627c7a61ddd)

2011-08-18 Tim Beale <tim.beale@alliedtelesis.co.nz>

A CPG client can sometimes lock up if the local node is in the downlist
In a 10-node cluster where all nodes are booting up and starting corosync
at the same time, sometimes during this process corosync detects a node as
leaving and rejoining the cluster.

Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
means it no longer sends CPG events to the CPG client.

Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 08f07be323b777118264eb37413393065b360f8e)

Display ring-ID consistently in debug
Ring ID was being displayed both as hex and decimal in places. Update so
it's displayed consistently (I chose hex) to make debugging easier.

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 370d9bcecf2716e52c8f729a53e9600fe6cc6aa4)

Add code comment mapping for message handler defines
As a corosync-newbie it can be hard to bridge the gap between where a
particular message is sent and where the receive handler processes it,

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 5a724a9c39465f7e63888f33375261506f69bd02)

2011-08-17 Jan Friesse <jfriesse@redhat.com>

cfg: Handle errors from totem_mcast
The totem_mcast function can return -1 if corosync is overloaded. Sadly,
in many calls of this function the error code was either not handled at
all, or handled by assert.

This commit changes the behaviour to either return CS_ERR_TRY_AGAIN or pass
the error code on to later layers to handle it.

Reviewed-by: Steven Dake <sdake@redhat.com>

cpg: Handle errors from totem_mcast
The totem_mcast function can return -1 if corosync is overloaded. Sadly, in
many calls of this function the error code was either not handled at all, or

This commit changes the behaviour to either return CS_ERR_TRY_AGAIN or pass
the error code on to later layers to handle it.

Reviewed-by: Steven Dake <sdake@redhat.com>

coroipcc: use malloc for path in service_connect
Coroipcc appropriately uses PATH_MAX sized variables for various data
structures handling files in the initialization of the client. Due to
the use of 12 of these structures declared as stack variables, the
application stack balloons to over 12*4k. This is especially problematic
if threads are used by long running daemons to restart the connection
to corosync so as to be resilient in the face of system services
restarting (service corosync restart).

A simple alternative is to allocate temporary memory to avoid
requirements of large thread stacks.

Original patch by Dan Clark <2clarkd@gmail.com>

Reviewed-by: Steven Dake <sdake@redhat.com>
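
As a rough illustration of the heap-instead-of-stack approach described for coroipcc, the sketch below builds a path in a malloc'ed PATH_MAX buffer. The helper name build_path() and the example path are hypothetical and are not taken from the corosync sources.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096 /* fallback for platforms that do not define it */
#endif

/* Hypothetical helper: builds "<dir>/<name>" in a heap buffer of
 * PATH_MAX bytes, so the caller's stack frame stays small.
 * Returns NULL on allocation failure or if the path does not fit.
 * The caller must free() the result. */
char *build_path(const char *dir, const char *name)
{
    char *path = malloc(PATH_MAX);
    int len;

    if (path == NULL) {
        return NULL;
    }
    len = snprintf(path, PATH_MAX, "%s/%s", dir, name);
    if (len < 0 || len >= PATH_MAX) {
        free(path);
        return NULL;
    }
    return path;
}
```

A thread that reconnects through such a helper needs only a pointer on its stack per path rather than a 4k array, which is the saving the entry describes.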

2011-07-26 Jan Friesse <jfriesse@redhat.com>

main: let poll really stop before totempg_finalize
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit d4fb83e971b6fa9af0447ce0a70345fb20064dc1)

Revert "totemsrp: Remove recv_flush code"
This reverts commit 1a7b7a39f445be63c697170c1680eeca9834de39.

Reversion is needed to remove overflow of receive buffers and dropping

(cherry picked from commit ddb5214c2c57194fe8e12d775398bfc5726743c4)

2011-07-26 MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>

totemsrp: fix buffer overflows for large clusters (> 100 nodes)
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 1d9f444feced36ec6118b4df5560f093ec44aba8)

2011-07-21 Jan Friesse <jfriesse@redhat.com>

specfile: Install corosync-signals.conf for dbus
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 2d75c7058f32b0f58aa5c825c13187103fcde1b2)

specfile: use _datadir as var expansion not exec
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit a197e7b1cec329701167def358f8c603b5dd054c)

specfile: Correct URL and source0
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit f103fb29b3e062dff67338fd75b91ea59ceb4972)

2011-07-21 Tim Beale <tim.beale@alliedtelesis.co.nz>

Add some more stats for debugging
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid parameter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg-queue_reserved - messages currently reserved at the Totem layer

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 04f37df2f774b0d25540e27102c8a60527aa7125)

2011-07-18 Jan Friesse <jfriesse@redhat.com>

rrp: Handle rollover in passive rrp properly
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit ad5cda223c0916ea517d6f9f6c0ff4af3cd32246)

rrp: handle rollover in active rrp properly
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit d02d2887471423bd23247895d96a0d687255aa55)

totemconfig: Change default FAIL_TO_RECV_CONST
Previous default (50) was too low for most modern switch hardware. This
may trigger abort because the aru doesn't increase for 50 token
rotations combined with a defect in how failed to recv conditions are
handled. By increasing this tunable, the condition should no longer
trigger the errant code.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit a48c8e517d82d099bfd3f4a8ebc11716eeb3962b)

2011-07-18 Steven Dake <sdake@redhat.com>

Correct missing poll functions from service handler struct needed for confdb APIs
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit c544e87bb0bfdb6e2c5a43ec01113f814e738550)

Fix problem where corosync will segfault if there are gaps in recovery queue
Fixes a problem where there are gaps in the recovery queue. Example: my_aru = 5,
but there are messages at 7,8. 8 = my_high_seq_received which results
in data slots taken up in new message queue. What should really happen
is these last messages should be delivered after a transitional
configuration to maintain SAFE agreement. We don't have support for
SAFE atm, so it is probably safe just to throw these messages away. Without
this change, the new message queue on a new configuration change is out of sync.

Tested-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit a3d98f1652011d6bc75101c7b8aa098c2d2977e4)

2011-07-15 Jan Friesse <jfriesse@redhat.com>

Ensure that strings are null terminated after strncpy().
From the strcpy(3) man page, the following warning is given:
The strncpy() function is similar, except that at most n bytes of src
are copied. Warning: If there is no null byte among the first n bytes
of src, the string placed in dest will not be null-terminated.

The current corosync code base does not take this warning into account
when using strncpy, potentially resulting in non-null terminated strings.

Reviewed-by: Steven Dake <sdake@redhat.com>
(backported from commit a609f79f1f8d23f8e57fe2afb383bd62621545f6)
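
One common way to address the strncpy() warning quoted above is to copy at most size - 1 bytes and then terminate explicitly. The sketch below shows that pattern; the helper copy_terminated() is a hypothetical name for illustration and may not match how the actual patch is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes)
 * and guarantees a terminating null byte, truncating if necessary. */
void copy_terminated(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* Leave room for the terminator: strncpy() does not add one when
     * src has no null byte within the first n bytes. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte destination, copying "corosync" yields "cor" followed by the terminator instead of four unterminated bytes.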

cfgtool: print list of IP with space between items
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit b4bef1cbf533ec1b8bdefb21a7987c6f69a40b3d)

cpgtool: print list of IP with space between items
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit f6df7823fafed80f555b6a7d76643d2b555d17bb)

cfg_get_node_addrs: Return correct addresses
Zero element array behavior is very different from normal array or
pointer. This behavior is the root of the problem of not returning a correctly
filled array of addresses. This appeared only in rrp mode, where more
than one address is returned.

All memcpy's are now correctly converted to copy pointer to char.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 033f7ced1061e39647b0b9d07e1eecb74839cd8a)

corosync-fplay: use uint32_t and remove bit-shift
The flight recorder records all data in 32 bit words. Use uint32_t type
rather than unsigned int. Also remove bit-shift with multiply by sizeof

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 12163b62d2d84ec438f35f5b942d3e8525585755)

corosync-fplay: Use size_t length mod in printf
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit d3e9382d57e02724b44ea5f5736f42deb6c65a82)

corosync-fplay: handle too large rec_size
Corrupted files may contain items with rec_size larger than g_record
buffer and/or flt_data_size.

Also g_record array size is now defined as constant.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 7b0517f5e97af89ecb0a1c3145ad1db2a35475f5)

logsys: Properly lock flt data before dump
Data needs to be locked, otherwise resulting fdata file may be

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit c5e823732504e0c6e9e0eb66870bcacafde080c9)

logsys: Don't leak fd on successful fdata dump
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 88515e3d20d9b34cc7a15e8da717aeb0a9965900)

Handle "nocluster" kernel parameter in init script
Init script checks kernel parameters and refuses to start corosync if
the nocluster parameter exists at boot time. The init script will
continue to work as expected from console/tty after boot.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit fbbb3f01cbb7b5a6a105dbc4fe1541ca8bdb5e4d)

2011-07-08 Jan Friesse <jfriesse@redhat.com>

totemiba: free send_buf on ibv_reg_mr failure
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 57749ec02a081b21218508355f139315bb95b652)

2011-07-07 Florian Haas <florian.haas@linbit.com>

build: disable RDMA support in RPMs by default
Rather than curiously disable RDMA support by default in configure and
enable it by default in RPM builds, streamline the default
configuration to always turn RDMA support off. It can be enabled in
RPM builds with "--with rdma".

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 051bca82df29d3448c55b772a4c4935c70c83643)

build: set RDMA related _LIBS and _CFLAGS only if building with RDMA support
Having to force {ibverbs,rdmacm}_{LIBS,CFLAGS} looks positively odd;
so this may warrant further review. However, they are definitely not
needed if building without RDMA support.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit e715a455b6fc2582f505f4b24ac1500068687ba9)

build: make RDMA support an RPM build conditional
Enable RDMA in RPM builds by default to maintain the previous behavior
(which always included --enable-rdma in the %configure invocation).

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 17fb819af1168d2d271a4d49a3f2536addcb80ed)

build: force LC_ALL=C correctly for dates
Failure to force "C" dates will have RPM et al. complain about invalid
dates and timestamps.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit b8809eaf270196ecb061fefa043c7bca8af75b06)

2011-07-07 Tim Beale <tim.beale@alliedtelesis.co.nz>

Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED < 1
For the case where _POSIX_THREAD_PROCESS_SHARED < 1, the code doesn't compile
for corosync v1.3.1. And when it does compile, it crashes on our system - our
version of uClibc seems to always expect a 4th arg. The man page suggests
the 4th arg is optional, but does say: 'For greater portability it is best to
always call semctl() with four arguments', which is what this patch does.
Also removed semop as it's an unused variable.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 77f7e5b0fe40338e6e5760feb12768defa6b0cf9)

getpwnam_r()/getgrnam_r() returns ERANGE for some systems
On our system the expected buffer length is 256. This means calls to
getpwnam_r()/getgrnam_r() return ERANGE error and corosync fails to start up.
These 2 functions return ERANGE when insufficient buffer space is supplied.
Judging by the man page for getpwnam_r, the correct way to determine the
buffer size on any given system is to use sysconf().

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit ba107f0a33fd5e6ef4073b9cc5539740e6ae3c12)
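
The sysconf() approach mentioned for getpwnam_r() might look roughly like the sketch below. The helper lookup_uid() is hypothetical, and the fallback size and the retry-on-ERANGE loop are additions for illustration that the entry itself does not describe.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: looks up the uid for a user name.
 * Returns 0 and stores the uid on success, -1 if the user is unknown
 * or an error occurs. */
int lookup_uid(const char *name, uid_t *uid)
{
    /* Ask the system for a suggested buffer size; sysconf() may
     * return -1 when there is no fixed limit. */
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t size = (hint > 0) ? (size_t)hint : 1024;

    while (size <= (1u << 20)) {
        struct passwd pwd;
        struct passwd *result = NULL;
        char *buf = malloc(size);
        int rc;

        if (buf == NULL) {
            return -1;
        }
        rc = getpwnam_r(name, &pwd, buf, size, &result);
        if (rc == 0 && result != NULL) {
            *uid = pwd.pw_uid;
            free(buf);
            return 0;
        }
        free(buf);
        if (rc != ERANGE) {
            return -1; /* not found, or an error other than "too small" */
        }
        size *= 2; /* ERANGE: buffer too small, retry with a larger one */
    }
    return -1;
}
```

The same pattern applies to getgrnam_r() with _SC_GETGR_R_SIZE_MAX.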

2011-07-07 Jiaju Zhang <jjzhang.linux@gmail.com>

RRP: redundant ring automatic recovery
This patch automatically recovers redundant ring failures.

Please note that this patch introduces rrp_autorecovery_check_timeout
in the totem config and hence breaks the internal ABI. The internal ABI users
of totem.h need to rebuild their binaries.

Tested-by: Jan Friesse <jfriesse@redhat.com>
Tested-by: Florian Haas <florian.haas@linbit.com>
Tested-by: Jiaju Zhang <jjzhang@suse.de>
(cherry picked from commit 5dc33c2824e9fd2b8c18e2e30cf60210c5e8617e)

2011-07-07 Jan Friesse <jfriesse@redhat.com>

flatiron: enable compile with --enable-fatal-warnings
Reviewed-by: Steven Dake <sdake@redhat.com>

2011-07-04 Tim Serong <tserong@novell.com>

Correct mailing list address in corosync_overview manpage
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 5a3a42dd2b6f9c12af4a653f6bd1b0b808581690)

2011-07-04 Masatake YAMATO <yamato@redhat.com>

fix typos in cpg_mcast_joined.3 and cpg_zcb_mcast_joined.3
(cherry picked from commit 7ba892dac323f9656c16981e02d3612f521bfbdb)

2011-07-04 Steven Dake <sdake@redhat.com>

Add coverity target to corosync makefile.am
Allow a make coverity target for those developers with coverity tools

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 899052484eaf3cee08d0a56b6579b73bf2ce99a0)

2011-06-29 Jan Friesse <jfriesse@redhat.com>

coroipcc: Test _SC_PAGESIZE result
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 94d934e0e0fa55027a974eb709a488802ee6134e)

Spinlocks are now removed: although a spinlock can improve
speed in some special cases, in most cases it makes corosync CPU usage
much more intensive and less responsive than if only mutexes are used.

What we were doing is:

Reviewed-by: Steven Dake <sdake@redhat.com>
(backported from commit 8c717c22b2f137f81ac5ae1a3437d73b62bb451d)

votequorum: free newly allocated node if nodeid==0
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 5458d4f27ad956d23a27a0d83b9cf9a6e36e68d0)

2011-06-28 Jerome Flesch <jerome.flesch@netasq.com>

Fix usage of strerror_r()/perror()
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(backported from commit 00434a4f10f0a0b0dfb1714504860d7ef560f7fb)

2011-06-28 Steven Dake <sdake@redhat.com>

sched_params log message incorrect
The sched_params parameter was set before being printed.

Reviewed-by: <sdake@redhat.com>
(cherry picked from commit ae4a3af3407ec185f88172fdc88cc6227647565b)

2011-06-28 Jan Friesse <jfriesse@redhat.com>

confdb: Resolve dispatch deadlock
The following situation could happen:
- one thread is waiting for a write operation to finish (line 853), objdb is
- flush (done in objdb_notify_dispatch) is called in main thread, but
this call will never appear because main thread is waiting for objdb

In this situation a deadlock appears.

The commit solves this by:
- setting pipe to non-blocking mode
- pipe is used only as trigger for coropoll
- dispatch messages are stored in list
- main thread is processing messages from list

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit b5d2f4578a239c6ee500e43542a93d0fa48d7fb6)

objdb: save copy of handles in object_find_create
The following situation could happen:
- process 1 thru confdb creates find handle
- calls find iteration once
- different process 2 deletes object pointed by process 1 iterator
- process 1 calls iteration again ->
object_find_instance->find_child_list is invalid pointer

Now object_find_create creates an array of matching object handlers and
object_find_next uses that array together with a check for name. This
prevents the situation where between steps 2 and 3 a new object is created
with a different name but sadly with the same handle.

Also good to note that this patch is more or less a quick hack rather
than a proper solution. The real proper solution is to not use pointers
and rather use handles everywhere. This is a big TODO.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit e8000c7b9b93b2ac4e6bec39df26755fdd4a8cf0)

2011-06-28 Jiaju Zhang <jjzhang.linux@gmail.com>

RRP: Fix ring initialization issue for UDPU mode
Redundant ring has a problem in the UDP unicast mode. The problem
is that the second ring has not been successfully initialized: the
second time iface_changes happens, the member list for that interface
has not been added, so that ring cannot transmit normal
messages. So the second ring cannot take over the work if the first
ring is down. This patch fixes this issue.

Comments from review:
More work is needed probably in totemnet where totemnet maintains the
list of nodes and an iterator for them, and totemudpu_member_add adds
state information to a context for the iteration.

In any regard, that is somewhat difficult to test, so I'll merge this
patch for now - keep in mind interface changes on the bindnetaddr will
cause problems with udpu after this patch has been committed.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit c6bfc6b5d62d19686104265e8a1b2409f4c1eaf8)

2011-06-28 Jan Friesse <jfriesse@redhat.com>

coroipcc: check recvmsg result in socket_recv
According to the specification, recvmsg can return 0, which means that
the connection is closed. We had this check, but limited only to systems
other than Linux. recvmsg can return 0 even on Linux, so the check is now
applied on all systems.

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 2e5dc5f322a5a852d31f5775debbf51e062695ba)
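
The recvmsg() check described above can be illustrated with the following sketch, which treats a return value of 0 as a closed connection on every platform. The function recv_all() and the recv_status values are hypothetical names for illustration, not the actual socket_recv code.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

enum recv_status { RECV_OK = 0, RECV_CLOSED = 1, RECV_ERROR = 2 };

/* Hypothetical wrapper: reads exactly len bytes from a stream socket. */
enum recv_status recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        struct iovec iov;
        struct msghdr msg;
        ssize_t n;

        iov.iov_base = p + done;
        iov.iov_len = len - done;
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        n = recvmsg(fd, &msg, MSG_WAITALL);
        if (n == 0) {
            /* Peer closed the connection: applies on all systems. */
            return RECV_CLOSED;
        }
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry */
            }
            return RECV_ERROR;
        }
        done += (size_t)n;
    }
    return RECV_OK;
}
```

Without the n == 0 branch, a loop like this would spin forever once the peer has gone away, because recvmsg() keeps returning 0 without making progress.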

confdb: Properly check result of object_find_create
In confdb_object_iter the result of object_find_create is now properly
checked. object_find_create can return -1 if the object doesn't exist.
Without this check, an incorrect handle (memory garbage) was directly
passed to object_find_next.

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 9afb4bdaa84aa3e7b48aa0a5136ee039dc73e19a)

crypto: rng_make_prng prevent buf overflow
With bits set to 1023, buf of 256 bytes was filled by rng_get_bytes
up to 257 bytes. Buf is now 258 bytes so it's no longer a problem.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 50f05bfa15622e10f58511e8b0b8dadfe670e12f)

mainconfig: Check retval of logsys_format_set
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit afa0398ca4a605c0896b0d02b02805db736c0090)

testcpgzc: fgets buffer to really allocated size
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit aa23d20125ed9845186471e417bbe010978b7c29)

cpg: do_proc_join change list_slice to list_add
In this concrete case the result is equivalent but makes coverity happy.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit f95d3b3bf206995d0bc04ae4b1855932eaaa4911)

totemudp: memset of proper size
In totemudp_mcast_thread_state_constructor memset to
sizeof(struct totemudp_mcast_thread_state) instead of size of

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 531e81602f8b47846aec8573dc57cb8941100367)

coroipcs: init buf in coroipcs_handler_dispatch
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit ea0a24866ccf27a4010edf75c5d0d223a84c80cd)

coroparse: don't leak dirent
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit c2a39cb8e2b3cc717dfe273425df3f2b4d0b48c0)

logsys: _logsys_wthread_create never returns != 0
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit d76bb76d1fef350eef74ada4f834c2011a70889e)

notifyd: Check retval of corosync_cfg_initialize
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 844c8759d72637e1c7776d598744343ddee62e2e)

totemconfig: discard check of objdb_get_string ret
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 6b9297131cda9ae874effa4e27ad70601a56d977)

coroipcc: proper path size in coroipcc_zcb_alloc
memory_map function internally limits maximum path size to
PATH_MAX but coroipcc_zcb_alloc passed smaller buffer.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 0273c54054f7e8c83b165daa1a4ded13f78f0515)

libquorum: memset/memcpy proper size of callbacks
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 6af98e79ee7f0278b641cb8f0cd8d8499988e373)

iazc: Reduce number of mem alloc and memcpy
X86 processors are able to handle unaligned memory access. Improve
performance by using that feature on i386 and x86_64 compatible
processors, and use old aligning code on different processors.

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 77d98081251d1821ff62777dffd4543700737e02)

2011-06-28 Jerome Flesch <jerome.flesch@netasq.com>

logsys: When corosync is compiled with --enable-small-memory-footprint, also reduce the size of the logsys SHM
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 6bec0aa2276530d25a1984e90f7bd274f8d0c75b)

coroipcc_dispatch_get(): Fix --enable-small-memory-footprint support
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit 795aa5e24cee83c88b8a6ea3a3fd06e754f55010)

coroipcs_handler_dispatch(): Fix conn_info->service security value: -1 is not a good security value since it's equal to SOCKET_SERVICE_INIT
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(cherry picked from commit b11267211515e4fc50882acd5f2afe493c363708)

coroipcc: Fix unhandled BSD EOF in coroipcc_dispatch_get()
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 76426d7901def8bd7f3da8b07107f765dd8572d4)

Corosync: Fix build when done with --enable-fatal-warnings
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
(backported from commit fe51e703675232a69009245cd9e0523bb1858dd6)

2011-06-28 Russell Bryant <russell@russellbryant.net>

logsys.c: Use snprintf() instead of sprintf().
Change a couple of string functions to use the output length
limiting counterpart.

(cherry picked from commit a53e402912a7c4c4039b928d3b741fe8239ab2f7)
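
For illustration, the length-limited formatting this entry refers to follows the usual snprintf() pattern below. The helper format_tag() and its message layout are hypothetical and unrelated to the real logsys.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "[subsys] text" into out.
 * Returns 0 if the whole string fit, -1 if it was truncated or an
 * encoding error occurred. Unlike sprintf(), snprintf() never writes
 * more than out_size bytes, including the terminating null byte. */
int format_tag(char *out, size_t out_size, const char *subsys, const char *text)
{
    int n = snprintf(out, out_size, "[%s] %s", subsys, text);

    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Comparing the return value against the buffer size is what lets the caller detect truncation instead of silently overrunning the buffer.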

2011-06-28 Jan Friesse <jfriesse@redhat.com>

corosync-objctl: Option to display binary data
Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 801717e46391af0b4d3103746b721e663f6db167)

2011-06-28 Angus Salkeld <asalkeld@redhat.com>

cpg: fix sync master selection when one node paused.
If one node is paused it can miss a config change and
thus report a larger old_members than expected.

The solution is to use the left_nodes field.

Master selection used to be "choose node with":
1) largest previous membership
2) (then as a tie-breaker) node with smallest nodeid

1) largest (previous #nodes - #nodes known to have left)
2) (then as a tie-breaker) node with smallest nodeid

(cherry picked from commit 956a1dcb4236acbba37c07e2ac0b6c9ffcb32577)
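
The second selection rule listed above can be sketched as a small comparison loop. The struct node_info and its field names are assumptions made for this example (modelled on the old_members and left_nodes terms in the entry), not the actual cpg data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node summary used only for this illustration. */
struct node_info {
    uint32_t nodeid;
    uint32_t old_members; /* size of the previous membership it reports */
    uint32_t left_nodes;  /* number of nodes it knows to have left */
};

/* Previous membership minus nodes known to have left (never negative). */
static uint32_t effective_members(const struct node_info *n)
{
    return (n->old_members > n->left_nodes) ? n->old_members - n->left_nodes : 0;
}

/* Returns the index of the chosen master, or -1 if the list is empty.
 * Rule 1: largest effective membership. Rule 2: smallest nodeid wins ties. */
int choose_master(const struct node_info *nodes, size_t count)
{
    int best = -1;
    size_t i;

    for (i = 0; i < count; i++) {
        if (best < 0) {
            best = (int)i;
            continue;
        }
        uint32_t cur = effective_members(&nodes[i]);
        uint32_t top = effective_members(&nodes[best]);

        if (cur > top || (cur == top && nodes[i].nodeid < nodes[best].nodeid)) {
            best = (int)i;
        }
    }
    return best;
}
```

In this sketch a paused node that reports 5 old members but 2 departures scores the same as a node reporting 3 members and no departures, so the tie-breaker on nodeid decides rather than the stale, larger count.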

2011-06-28 Jan Friesse <jfriesse@redhat.com>

totemsrp: Enhance mcast failure detection
memb_state_gather_enter increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (local firewall in most cases) and not if many
nodes join at one time.

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
(cherry picked from commit 61d83cd719bcc93a78eb5c718a138b96c325cc3e)

coroipcs: Deny connect to service without initfn
If a library connects to a service with no init function, coroipcs will try
to dereference a NULL pointer. Now we correctly return error code

Reviewed-by: Steven Dake <sdake@redhat.com>
(cherry picked from commit 719fddd8e16b6da8694fa84dd2fafbb202401200)

2011-04-15 Tim Serong <tserong@novell.com>

Add ipc_refcnt to message_handler_req_{exec, lib}_cfg_ringreenable()
Without refcounting the conn pointer here, corosync will segfault
if one kills a running instance of "corosync-cfgtool -r" (rhbz#695191)

Reviewed-by: Steven Dake <sdake@redhat.com>

Fix typo in RRP faulty error messages
Reviewed-by: Russell Bryant <russell@russellbryant.net>

2011-04-15 Steven Dake <sdake@redhat.com>

Align ipc on 8 byte boundaries
Align all ipc messages on 8 byte boundaries. This alignment will remove bus
errors on systems that can't access non-byte aligned data and should improve

Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
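
As a rough sketch of what 8 byte alignment of messages involves, the example below rounds lengths up to a multiple of 8 and pads a header struct to the same boundary. The macro IPC_ALIGNMENT, the function align_up_8() and struct example_header are illustrative names only; the aligned attribute is a GCC/Clang extension and is assumed to be available.

```c
#include <assert.h>
#include <stddef.h>

#define IPC_ALIGNMENT 8 /* illustrative constant for this sketch */

/* Rounds len up to the next multiple of 8, so that whatever follows a
 * message in a buffer also starts on an 8 byte boundary. */
size_t align_up_8(size_t len)
{
    return (len + (IPC_ALIGNMENT - 1)) & ~(size_t)(IPC_ALIGNMENT - 1);
}

/* Hypothetical message header: the attribute makes the compiler align
 * the struct on 8 bytes and pad its size to a multiple of 8. */
struct example_header {
    int id;
    int size;
    char flag;
} __attribute__((aligned(8)));
```

A 13 byte payload would occupy 16 bytes in a buffer laid out this way, keeping the next header aligned for architectures that fault on unaligned access.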

Fix problem where unaligned totemip address access would result in bus error on non-unaligned-safe architectures.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>

2011-04-15 Greg Walton <corosync@gwalton.net>

Clean up ENDIAN ifdef tests
Reviewed-by: Steven Dake <sdake@redhat.com>

2011-04-13 Angus Salkeld <asalkeld@redhat.com>

IPC: place calls to stats functions outside of mutexes
This is to prevent nasty deadlocks between IPC and objdb.

Reviewed-by: Steven Dake <sdake@redhat.com>

2011-04-12 Zane Bitter <zane.bitter@gmail.com>

Provide better checking of the message type
A negative value for the message type (on systems where char is signed)
would cause a crash. This is highly probable if the cluster is, for example,
misconfigured to have encryption enabled on some nodes but not others.

Reviewed-by: Steven Dake <sdake@redhat.com>
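
The kind of check this entry describes can be illustrated as follows: the type byte is read as unsigned and range-checked before it is used as an index into a handler table. The table size, handler type and function names are hypothetical and do not reflect the actual totem message handlers.

```c
#include <assert.h>
#include <stddef.h>

#define MESSAGE_TYPE_COUNT 6 /* illustrative table size */

typedef int (*message_handler_t)(const char *msg, size_t len);

/* Placeholder handler: just reports the message length. */
static int handle_any(const char *msg, size_t len)
{
    (void)msg;
    return (int)len;
}

static const message_handler_t handlers[MESSAGE_TYPE_COUNT] = {
    handle_any, handle_any, handle_any, handle_any, handle_any, handle_any
};

/* Returns the handler's result, or -1 for an empty message or an
 * out-of-range type. */
int dispatch_message(const char *msg, size_t len)
{
    unsigned char type;

    if (len == 0) {
        return -1;
    }
    /* Read the type as unsigned: with a signed char, a byte such as
     * 0xF3 would become a negative index into the table. */
    type = (unsigned char)msg[0];
    if (type >= MESSAGE_TYPE_COUNT) {
        return -1;
    }
    return handlers[type](msg, len);
}
```

Data that was encrypted by a peer but is read as plaintext looks like random bytes, so roughly half of such messages would carry a "negative" type on a signed-char platform.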

2011-03-29 Steven Dake <sdake@redhat.com>

Fix problem in previous commit leading to compiler error
commit 78ae800f80fa9cd0fe593724f5c64138c205fec5 was backported from master
without addressing the lack of a few services in flatiron.

2011-03-29 Angus Salkeld <asalkeld@redhat.com>

Fix shutdown when a confdb client is still connected
If you are connected to corosync and registered for
object notifications and corosync is then asked to shut down,
the IPC server will get stuck. This is because the pipe
is closed and the refcount is increased. This leaves ipcs
with a connection that it can't destroy.

1) if a write to the pipe fails (pipe closed) decrement the refcounter.
2) fix the object_track_stop() - it was not working as the functions
did not match up. (this caused the late callbacks).
3) in ipcs call exit_fn() then stats_destroy_connection() so that
the service engine can have time to call object_track_stop()
before the object gets destroyed.

Reviewed-by: Steven Dake <sdake@redhat.com>

STATS: add the service name to the connection name.
This helps to quickly identify what service the application

The object will now look like:
runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11
runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654

This also makes it clearer to receivers of the dbus/snmp events

Reviewed-by: Steven Dake <sdake@redhat.com>

NOTIFYD: prevent duplicate quorate events.
Reviewed-by: Steven Dake <sdake@redhat.com>

NOTIFYD: fix retrieving the application's parent name.
Reviewed-by: Steven Dake <sdake@redhat.com>

2011-03-24 Angus Salkeld <asalkeld@redhat.com>

confdb: send notifications from the main thread not IPC thread
corosync-notifyd has exposed an issue with confdb notifications.

The normal state of affairs is:
IPC thread > lock > objdb > lock

objdb notifications, whilst really useful, turn things around:
<middle of big call chain>
objdb > lock > confdb > ipc > lock

This reverse ordering of locks causes a horrible deadlock.

I see this patch as a workaround until corosync-2.0
when most of the threads and locking disappear.

This patch adds a pipe to the confdb service. When we get an
objdb notification a struct gets written to the pipe.
The poll loop then runs the dispatch in the main thread.
In the dispatch we call the real ipc_dispatch_send().

Reviewed-by: Steven Dake <sdake@redhat.com>
795
2011-03-24 Steven Dake <sdake@redhat.com>
797
totemsrp: free messages originated in recovery rather than rely on messages_free
798
Relying on messages_free may seem like it should work, but it leads to a
799
situation where every node has released the messages, yet some nodes think
800
messages are missing. The output then looks like "Retransmit: #" in
801
repetition. This patch frees those messages immediately during the transition
802
to the OPERATIONAL state and sets the internal variables totemsrp depends
803
upon to the proper values.
805
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
807
totemsrp: Only restore old ring id information one time
808
The current code stores the current ring information every time a commit
809
token is generated. This causes the old ring id used for comparison purposes
810
to increase if a token is lost in commit or recovery, resulting in failure of
811
totem. This patch changes the behavior to only store the old ring id one
812
time when the commit token is received, and then further commit token ring
813
id saves are not done until OPERATIONAL is reached.
815
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
817
totemsrp: Remove recv_flush code
818
The recv_flush code is no longer necessary because of the miss_count_const
819
addition. It can in some cases lead to register corruption because of
820
interactions with -fstack-protector, the recursive nature of how this code
821
works, and interactions with the optimizer in some versions of gcc.
823
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
825
2011-03-21 Steven Dake <sdake@redhat.com>
827
Resolve abort during simultaneous stopping of at least 4 nodes
830
node 3,4 stopped (by random stopping) node 1,2,5 form new configuration
831
and during recovery node 1 and node 2 are stopped (via service
832
corosync stop). This causes node 5 never to finish recovery within the timeout
833
period, triggering a token loss in recovery. Bug #623176 resolved an assert
834
which happens because the full ring id was being restored. The resolution
835
to Bug #623176 was to not restore the full ring id, and instead operate
836
(according to specifications) on the new ring id. Unfortunately this exposes
837
a problem whereby the restarting of nodes 1-4 generates the same ring id.
838
This ring id gets to the recovery failed node 5 which is now in gather,
839
and triggers a condition not accounted for in the original totem specification.
841
It appears later work from Dr. Agarwal's PhD dissertation considers this
842
scenario. That solution entails rejecting the regular token in the above
843
condition. Since the ring id is also used to make decisions for commit token
844
acceptance, we must also take care to reject the regular token in all cases
845
after transitioning from OPERATIONAL.
847
Reviewed-by: Steven Dake <sdake@redhat.com>
849
2011-03-21 Angus Salkeld <asalkeld@redhat.com>
851
notifyd: dispatch only one message at a time.
852
This is to avoid getting stuck in the dispatch processing
853
messages when the user is trying to shut down the service.
855
Reviewed-by: Steven Dake <sdake@redhat.com>
857
2011-03-15 Angus Salkeld <asalkeld@redhat.com>
859
Remove the ttl option from udpu and rely on the kernel ttl setting.
860
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
862
Fix the ttl defaults and range
863
1) both IPv4 and IPv6 mcast should default to ttl=1
864
2) the range should be 0..255
865
0 is valid, meaning localhost only (cluster of one)
867
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
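
The default and range stated in the ttl entry above can be sketched as a small validation helper. This is a Python illustration, not corosync's C code; parse_ttl and the constant names are hypothetical.

```python
DEFAULT_TTL = 1            # stated default for both IPv4 and IPv6 multicast
TTL_MIN, TTL_MAX = 0, 255  # stated valid range; 0 means localhost only


def parse_ttl(value=None):
    """Return a validated ttl, falling back to the default when unset."""
    if value is None:
        return DEFAULT_TTL
    ttl = int(value)
    if not TTL_MIN <= ttl <= TTL_MAX:
        raise ValueError(f"ttl {ttl} outside {TTL_MIN}..{TTL_MAX}")
    return ttl
```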
869
2011-03-08 Steven Dake <sdake@redhat.com>
871
Fix abort when token is lost in RECOVERY state
872
A commit token should be rejected when a token is lost in the recovery
873
state. This occurs naturally because the ring id increases by 4 for
874
every new ring. Prior to this patch, if the token was lost, the old
875
ring id information was restored, causing a commit token to be accepted
876
when it should be rejected. This erroneously accepted commit token would
877
lead to an assertion which is fixed by this patch.
879
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
881
2011-02-28 Jan Friesse <jfriesse@redhat.com>
883
objdb: destroy all handles in _clear_object
884
Patch replaces free for object_instance with handle_destroy to remove
885
leaks in handles (and also memory leak).
887
Reviewed-by: Steven Dake <sdake@redhat.com>
889
Iterate all items in object_reload_notification
890
Reviewed-by: Steven Dake <sdake@redhat.com>
892
2011-02-24 Steven Dake <sdake@redhat.com>
894
Don't assert when ring id file is less than 8 bytes
895
If the ring id file for the processor is less than 8 bytes, totemsrp would
896
assert. Our speculation is that this condition happens during a fencing
897
operation or local filesystem corruption.
899
With this patch, Corosync will create fresh ring id file data when the
900
incorrect number of bytes is read from the ring id file.
902
Amended to use sizeof for the strerror string length and PATH_MAX for the path length.
904
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
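
A minimal sketch of the short-read handling described in the ring id entry above: if fewer than 8 bytes come back, fresh data is written instead of asserting. This is Python, not the totemsrp code. The function name load_ring_seq, the single 64-bit layout, and the fresh value of 0 are assumptions made for illustration.

```python
import struct

RING_ID_SIZE = 8  # bytes expected in the ring id file


def load_ring_seq(path, fresh_seq=0):
    """Return the stored value, rewriting the file when it is short or missing."""
    try:
        with open(path, "rb") as f:
            data = f.read(RING_ID_SIZE)
    except FileNotFoundError:
        data = b""
    if len(data) == RING_ID_SIZE:
        return struct.unpack("=Q", data)[0]
    # Short read: create fresh ring id data rather than aborting.
    with open(path, "wb") as f:
        f.write(struct.pack("=Q", fresh_seq))
    return fresh_seq
```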
906
snmp: Allow building of corosync on already existing older install of corosync
907
When building corosync against older libraries already installed on the system,
908
the corosync-notifyd application uses the wrong Makefile.am commands. This
909
results in the SNMPLIBS (which includes -L/usr/lib64) coming before the proper
910
LDADD flags. The result is an inability to compile on an already existing
913
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
915
2011-02-10 Angus Salkeld <asalkeld@redhat.com>
917
Fix merge markers in spec file
919
2011-02-09 Angus Salkeld <asalkeld@redhat.com>
921
Make node state a string (not an integer)
922
Ryan noticed this inconsistency: all other statuses
923
are strings, so this should be too.
925
Reviewed-by: Steven Dake <sdake@redhat.com>
926
Reviewed-by: Ryan O'Hara <rohara@redhat.com>
928
objdb: fix some strange types (uint8_t* -> void*).
929
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3045 fd59a12c-fef9-0310-b244-a6a79926bd2f
931
2011-02-04 Steven Dake <sdake@redhat.com>
933
Conflicts previously resolved were not merged.
935
2011-02-04 Angus Salkeld <asalkeld@redhat.com>
937
MIB: expand the descriptions of the notifications
938
Reviewed-by: Steven Dake <sdake@redhat.com>
940
2011-02-04 Lon Hohberger <lhh@redhat.com>
942
Match up MIB to notifyd & add SNMP quorum events
943
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
945
Make SNMP MIB match what is being sent over DBUS
946
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
948
2011-02-04 Steven Dake <sdake@redhat.com>
950
Add dbus and snmp notifier
951
This is to send dbus events on major cluster events:
953
- application connect/disconnect from corosync
956
dbus events can then be converted into snmp traps by foghorn or
957
corosync-notifyd can be run to directly send snmp traps.
959
Reviewed-by: Steven Dake <sdake@redhat.com>
960
Reviewed-by: Russell Bryant <russell@russellbryant.net>
961
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
969
Conflicts Reviewed-by: Steven Dake <sdake@redhat.com>
971
2011-02-04 Angus Salkeld <asalkeld@redhat.com>
973
CONFDB: add confdb_object_name_get()
974
This is useful when tracking object changes.
976
Reviewed-by: Steven Dake <sdake@redhat.com>
978
STATS: fix key name length on "join_count"
979
Reviewed-by: Steven Dake <sdake@redhat.com>
981
STATS: increase the space for application names
982
Reviewed-by: Steven Dake <sdake@redhat.com>
984
2011-01-26 Angus Salkeld <asalkeld@redhat.com>
986
CPG: make sure coroipcc_service_disconnect() is always called.
987
This prevents a shared mem leak if corosync dies while clients
990
Calling cpg_finalize() did not release the shared mem as
991
coroipcc_msg_send_reply_receive() returned an error and
992
thus coroipcc_service_disconnect() did not get called.
994
Reviewed-by: Steven Dake <sdake@redhat.com>
996
IPC: send failure message to client if memory maps fail
997
Reviewed-by: Steven Dake <sdake@redhat.com>
999
2011-01-26 Jan Friesse <jfriesse@redhat.com>
1001
Add objdb firewall_enabled_or_nic_failure
1002
New objdb var runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure
1003
is set to 1 if continuous_gather is larger than MAX_NO_CONT_GATHER.
1004
Under normal conditions, value of variable is 0.
1006
Reviewed-by: Steven Dake <sdake@redhat.com>
1008
Display warning when not possible to form cluster
1009
This may typically happen if a local firewall is enabled. The patch adds a new
1010
item to statistics called continuous_gather, which counts the number of
1011
consecutive entries into the gather state. If this number is bigger than
1012
MAX_NO_CONT_GATHER, warning message is displayed. This is also used on
1013
exiting, so stopping corosync is now possible even with a firewall enabled.
1015
Reviewed-by: Steven Dake <sdake@redhat.com>
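
The counter described in the two entries above can be sketched as follows. This is Python, not corosync's C. The threshold value of 3 is hypothetical (the real MAX_NO_CONT_GATHER is not given here), and resetting on reaching the operational state is an assumption; the entries only say the flag is 0 under normal conditions.

```python
MAX_NO_CONT_GATHER = 3  # hypothetical value; the real constant is not stated above


class GatherStats:
    def __init__(self):
        self.continuous_gather = 0
        self.firewall_enabled_or_nic_failure = 0

    def enter_gather(self):
        """Count one more consecutive entry into the gather state."""
        self.continuous_gather += 1
        if self.continuous_gather > MAX_NO_CONT_GATHER:
            self.firewall_enabled_or_nic_failure = 1

    def enter_operational(self):
        # Assumption: forming a ring ends the streak and clears the flag.
        self.continuous_gather = 0
        self.firewall_enabled_or_nic_failure = 0
```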
1017
2011-01-26 Angus Salkeld <asalkeld@redhat.com>
1019
Add totem/interface/ttl config option.
1020
This adds a per-interface config option to
1023
Reviewed-by: Steven Dake <sdake@redhat.com>
1025
2011-01-11 Steven Dake <sdake@redhat.com>
1027
Handle delayed multicast packets that occur with switches
1028
Some switches delay multicast packets vs the unicast token. This patch works
1029
around that problem by providing a new tuneable called miss_count_const. This
1030
tuneable works by counting the number of times a message is found missing
1031
and once reaching the const value, marks it as missing in the retransmit list.
1033
This improves performance and doesn't display warning messages about missed
1034
multicast messages when operating in these switching environments.
1036
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
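
A minimal sketch of the miss counting described in the entry above: a gap is only treated as missing for retransmit purposes after it has been seen missing a set number of times. This is Python, not the totemsrp implementation. MissTracker, its method names, and the threshold of 5 are hypothetical.

```python
from collections import defaultdict

MISS_COUNT_CONST = 5  # hypothetical threshold; the configured value may differ


class MissTracker:
    def __init__(self, threshold=MISS_COUNT_CONST):
        self.threshold = threshold
        self.miss_counts = defaultdict(int)

    def note_missing(self, seq):
        """Record one sighting of a gap; return True once a retransmit is due."""
        self.miss_counts[seq] += 1
        return self.miss_counts[seq] >= self.threshold

    def note_received(self, seq):
        """A delayed multicast arrived after all: forget the gap."""
        self.miss_counts.pop(seq, None)
```

With this shape, a multicast packet that is merely delayed behind the unicast token can arrive before the threshold is reached, so no retransmit request is made for it.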
1038
2010-12-01 Fabio M. Di Nitto <fdinitto@redhat.com>
1040
build: fix make srpm from release tarball