-There is a swift-stats-report tool for measuring overall cluster health. This
-is accomplished by checking if a set of deliberately distributed containers and
-objects are currently in their proper places within the cluster.
+There is a swift-dispersion-report tool for measuring overall cluster health.
+This is accomplished by checking if a set of deliberately distributed
+containers and objects are currently in their proper places within the cluster.
 For instance, a common deployment has three replicas of each object. The health
 of that object can be measured by checking if each replica is in its proper
 The first thing that needs to be done to provide this health value is create a
 new account solely for this usage. Next, we need to place the containers and
 objects throughout the system so that they are on distinct partitions. The
-swift-stats-populate tool does this by making up random container and object
-names until they fall on distinct partitions. Last, and repeatedly for the life
-of the cluster, we need to run the swift-stats-report tool to check the health
-of each of these containers and objects.
+swift-dispersion-populate tool does this by making up random container and
+object names until they fall on distinct partitions. Last, and repeatedly for
+the life of the cluster, we need to run the swift-dispersion-report tool to
+check the health of each of these containers and objects.
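As a rough illustration of the populate step in this hunk, the sketch below makes up random names until the requested number of them fall on distinct partitions. It is a simplified stand-in, not the tool's actual code: `partition_for`, `pick_distinct_names`, the `dispersion_` prefix, and the small `part_power` are all hypothetical, and a real ring hashes the full storage path with a cluster-specific suffix rather than a bare name.

```python
import hashlib
import uuid


def partition_for(name, part_power=8):
    """Map a name to a partition using the top bits of its MD5 hash.

    Assumption: a simplified version of ring hashing, for illustration only.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - part_power)


def pick_distinct_names(count, part_power=8, prefix="dispersion_"):
    """Generate random names until `count` of them land on distinct partitions."""
    if count > 2 ** part_power:
        raise ValueError("cannot cover more partitions than exist")
    chosen = {}  # partition number -> name that landed on it
    while len(chosen) < count:
        name = prefix + uuid.uuid4().hex
        # Keep only the first name seen for each partition; discard collisions.
        chosen.setdefault(partition_for(name, part_power), name)
    return chosen


names_by_partition = pick_distinct_names(10)
```

Each key in `names_by_partition` is a partition number and each value is the one name chosen for it, so a later report pass would know exactly where every copy ought to be.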
 These tools need direct access to the entire cluster and to the ring files
 (installing them on a proxy server will probably do). Both
-swift-stats-populate and swift-stats-report use the same configuration file,
-/etc/swift/stats.conf. Example conf file::
+swift-dispersion-populate and swift-dispersion-report use the same
+configuration file, /etc/swift/dispersion.conf. Example conf file::
 auth_url = http://saio:11000/auth/v1.0
 auth_key = testing
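For readers who want to inspect such a conf file programmatically, here is a minimal sketch using Python's standard `configparser`. The `[dispersion]` section header and the helper name `load_dispersion_conf` are assumptions made for the example; only the two settings visible in the excerpt are included.

```python
import configparser

# Assumption: the settings live under a [dispersion] section; only the two
# keys visible in the excerpt are reproduced here.
SAMPLE_CONF = """\
[dispersion]
auth_url = http://saio:11000/auth/v1.0
auth_key = testing
"""


def load_dispersion_conf(text, section="dispersion"):
    """Parse INI-style text and return one section as a plain dict."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser[section])


conf = load_dispersion_conf(SAMPLE_CONF)
```

Reading from a string keeps the sketch self-contained; against a real file, `parser.read(path)` would take the place of `read_string`.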
 There are also options for the conf file for specifying the dispersion coverage
-(defaults to 1%), retries, concurrency, CSV output file, etc. though usually
-the defaults are fine.
+(defaults to 1%), retries, concurrency, etc. though usually the defaults are
+fine.
-Once the configuration is in place, run `swift-stats-populate -d` to populate
+Once the configuration is in place, run `swift-dispersion-populate` to populate
 the containers and objects throughout the cluster.
 Now that those containers and objects are in place, you can run
-`swift-stats-report -d` to get a dispersion report, or the overall health of
+`swift-dispersion-report` to get a dispersion report, or the overall health of
 the cluster. Here is an example of a cluster in perfect health::
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 19s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
 $ swift-ring-builder object.builder set_weight d0 200
 $ swift-ring-builder object.builder rebalance
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 8s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
 place and then rerun the dispersion report::
 ... start object replicators and monitor logs until they're caught up ...
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 17s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
 100.00% of object copies found (7857 of 7857)
 Sample represents 1.00% of the object partition space
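The percentages in the report follow from simple arithmetic: with three replicas, 2621 sampled containers should yield 2621 x 3 = 7863 copies. The sketch below reproduces that calculation; `dispersion_summary` is a hypothetical helper written for this example, not part of the tool, and the replica count of three is taken from the "common deployment" mentioned earlier.

```python
def dispersion_summary(kind, items_queried, replicas, copies_found):
    """Format a summary line in the style of the report output shown above."""
    expected = items_queried * replicas  # every sampled item should have all replicas
    percent = 100.0 * copies_found / expected
    return "%.2f%% of %s copies found (%d of %d)" % (
        percent, kind, copies_found, expected)


healthy = dispersion_summary("container", 2621, 3, 7863)
# healthy == "100.00% of container copies found (7863 of 7863)"
```

Any copies that are not in their proper place lower `copies_found` and therefore the percentage, which is what makes the figure usable as a health indicator.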
-So that's a summation of how to use swift-stats-report to monitor the health of
-a cluster. There are a few other things it can do, such as performance
-monitoring, but those are currently in their infancy and little used. For
-instance, you can run `swift-stats-populate -p` and `swift-stats-report -p` to
-get performance timings (warning: the initial populate takes a while). These
-timings are dumped into a CSV file (/etc/swift/stats.csv by default) and can
-then be graphed to see how cluster performance is trending.
-------------------------------------
-Additional Cleanup Script for Swauth
-------------------------------------
-With Swauth, you'll want to install a cronjob to clean up any
-orphaned expired tokens. These orphaned tokens can occur when a "stampede"
-occurs where a single user authenticates several times concurrently. Generally,
-these orphaned tokens don't pose much of an issue, but it's good to clean them
-up once a "token life" period (default: 1 day or 86400 seconds).
-This should be as simple as adding `swauth-cleanup-tokens -A
-https://<PROXY_HOSTNAME>:8080/auth/ -K swauthkey > /dev/null` to a crontab
-entry on one of the proxies that is running Swauth; but run
-`swauth-cleanup-tokens` with no arguments for detailed help on the options
 ------------------------
 Debugging Tips and Tools