~percona-dev/percona-xtradb-cluster/galera-3.x : changes


From Revision 218 to 199

Rev	Summary	Authors	Date
218	Port the split packaging from 2.x tree	Raghavendra D Prabhu	9 years ago
217	Port the metapackages from 2.x tree	Raghavendra D Prabhu	9 years ago
216	Bump the versions	Raghavendra D Prabhu	9 years ago
215	Merge galera-3.x up to revno 183, also reverting changes made in revno 209 for Bug#1285380	Raghavendra D Prabhu	10 years ago
214	Merge galera-3.x up to revno 177	Raghavendra D Prabhu	10 years ago
213	Fix the conflicts	Raghavendra D Prabhu	10 years ago
212	Add garbd2 to conflicts	Raghavendra D Prabhu	10 years ago
211	Bump versions	Raghavendra D Prabhu	10 years ago
210	Merge galera-3.x up to revno 176	Raghavendra D Prabhu	10 years ago
209	Bug#1285380: Donor in desynced state makes the joiner wait indefinitely	Raghavendra D Prabhu	10 years ago

So, this is how it looks:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
do {
    end = strchr(begin, ',');

    int len;

    if (NULL == end) {
        len = str_len - (begin - str);
    }
    else {
        len = end - begin;
    }

    assert (len >= 0);

    int const idx = len > 0 ? /* consider empty name as "any" */
        group_find_node_by_name (group, joiner_idx, begin, len, status) :
        /* err == -EAGAIN here means that at least one of the nodes in the
         * list will be available later, so don't try others. */
        (err == -EAGAIN ?
         err : group_find_node_by_state(group, joiner_idx, status));

    if (idx >= 0) return idx;

    /* once we hit -EAGAIN, don't try to change error code: this means
     * that at least one of the nodes in the list will become available. */
    if (-EAGAIN != err) err = idx;

    begin = end + 1; /* skip comma */

} while (end != NULL);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Based on my tests, when wsrep_sst_donor='A1,A2,' and A1 is unavailable (non-SYNCED), A3 does SST from A2 without any issues. However, if wsrep_sst_donor='A1,' and A1 is unavailable, then it keeps looping without bound (there is no strict limit on the number of retries):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
do
{
    tries++;

    gcs_seqno_t seqno_l;

    ret = gcs_.request_state_transfer(req->req(), req->len(), sst_donor_,
                                      &seqno_l);

    if (ret < 0)
    {
        if (!retry_str(ret))
        {
            log_error << "Requesting state transfer failed: "
                      << ret << "(" << strerror(-ret) << ")";
        }
        else if (1 == tries)
        {
            log_info << "Requesting state transfer failed: "
                     << ret << "(" << strerror(-ret) << "). "
                     << "Will keep retrying every " << sst_retry_sec_
                     << " second(s)";
        }
    }

    if (seqno_l != GCS_SEQNO_ILL)
    {
        /* Check that we're not running out of space in monitor.
         */
        if (local_monitor_.would_block(seqno_l))
        {
            long const seconds = sst_retry_sec_ * tries;
            log_error << "We ran out of resources, seemingly because "
                      << "we've been unsuccessfully requesting state "
                      << "transfer for over " << seconds << " seconds. "
                      << "Please check that there is "
                      << "at least one fully synced member in the group. "
                      << "Application must be restarted.";
            ret = -EDEADLK;
        }
        else
        {
            // we are already holding local monitor
            LocalOrder lo(seqno_l);
            local_monitor_.self_cancel(lo);
        }
    }
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

From log:

140309 0:01:32 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff528 after receiving -11
140309 0:01:32 [Note] WSREP: galera/src/replicator_str.cpp:send_state_request():560: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:16001,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20140309 00:01:32.391)
140309 0:01:33 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff592 after receiving -11
140309 0:01:34 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff5fc after receiving -11
140309 0:01:35 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff666 after receiving -11
140309 0:01:36 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff6d0 after receiving -11
140309 0:01:37 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff73a after receiving -11
140309 0:01:38 [Note] [Debug] WSREP: gcs/src/gcs.c:gcs_replv():1568: Freeing gcache buffer 0x7f528bfff7a4 after receiving -11

So, according to the current design, this behaves as intended: if a node listed in wsrep_sst_donor is unavailable, the joiner does not fall back (to group_find_node_by_state) at all.

However, that design has a flaw. wsrep_sst_donor gives the user a choice of whether to leave a dangling comma or not. Leaving the comma implies checking all the listed nodes and then trying the fallback logic, whereas omitting it implies strict checking. Currently, irrespective of whether the comma exists, only strict membership checking is done for the donor, without ever falling back. When the user provides a dangling comma (for precisely that reason), as in "A1,A2,", it should check A1 and A2 and, if both are unavailable, check the others as well (maybe A3 or A5 is available in a cluster of A1,A2,A3,A4,A5). The number of retries should also be bounded, but that is for another bug.

208	Merge galera-3.x up to revno 174	Raghavendra D Prabhu	10 years ago
207	Bump versions in spec file	Raghavendra D Prabhu	10 years ago
206	Fix build issues with ssl=0	Raghavendra D Prabhu	10 years ago
205	Fix the merge fragment	Raghavendra D Prabhu	10 years ago
204	Merge galera-3.x up to revno 172	Raghavendra D Prabhu	10 years ago
203	Bump the RPM's version	Raghavendra D Prabhu	10 years ago
202	Bumped the version from 3.2 to 3.3	Raghavendra D Prabhu	10 years ago
201	Replace crc32 with sse4.2 in gcc CFLAGS for build failures on ubuntu-lucid-32/64 and debian6-32/64	Raghavendra D Prabhu	10 years ago
200	Merge Galera tree up to revno 171	Raghavendra D Prabhu	10 years ago
199	Few config cleanups for garbd	Raghavendra D Prabhu	10 years ago
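
The donor-selection semantics argued for in revision 209 (named donors are tried in order, and a dangling comma means "fall back to any synced node") can be sketched in isolation. This is a simplified illustration and not the Galera implementation: node_t, find_by_name, find_any and select_donor are hypothetical names, node state is reduced to a single synced flag, and the -EAGAIN handling of the real loop is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node record; the real Galera group structures differ. */
typedef struct {
    const char *name;
    int         synced;   /* 1 if the node could act as a donor right now */
} node_t;

/* Index of the synced node whose name is name[0..len), or -1. */
static int find_by_name(const node_t *nodes, int n,
                        const char *name, size_t len)
{
    for (int i = 0; i < n; i++) {
        if (strlen(nodes[i].name) == len &&
            strncmp(nodes[i].name, name, len) == 0 &&
            nodes[i].synced)
            return i;
    }
    return -1;
}

/* Index of any synced node other than the joiner, or -1. */
static int find_any(const node_t *nodes, int n, int joiner)
{
    for (int i = 0; i < n; i++) {
        if (i != joiner && nodes[i].synced) return i;
    }
    return -1;
}

/* Walk the comma-separated donor list. A non-empty item is matched by
 * name only; an empty item (for example the one after a dangling comma)
 * means "any synced node". Returns a node index, or -1 if none fits. */
static int select_donor(const node_t *nodes, int n, int joiner,
                        const char *list)
{
    assert(nodes != NULL && list != NULL);

    const char *begin = list;
    const char *end;

    do {
        end = strchr(begin, ',');
        size_t len = end ? (size_t)(end - begin) : strlen(begin);

        int idx = len > 0 ? find_by_name(nodes, n, begin, len)
                          : find_any(nodes, n, joiner);
        if (idx >= 0) return idx;

        if (end != NULL) begin = end + 1;   /* skip comma */
    } while (end != NULL);

    return -1;
}
```

Under these assumptions, in a cluster A1..A5 where A3 is the joiner and only A4 and A5 are synced, select_donor with the list "A1,A2," returns the index of A4, while the strict list "A1,A2" returns -1.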
