~david-goetz/swift/wal_again

Viewing changes to doc/source/admin_guide.rst

Committer: Tarmac
Author(s): gholt
Date: 2011-06-16 21:12:04 UTC
mfrom: (291.19.6 postcopy)
mto: This revision was merged to the branch mainline in revision 294.
Revision ID: tarmac-20110616211204-s5slh4h8nt9mrd2v

You can specify X-Newest: true on GETs and HEADs to indicate you want Swift to query all backend copies and return the newest version retrieved.
Object COPY requests now always copy the newest object they can find.
Object POSTs are now implemented as COPYs by default (you can revert to the previous implementation with conf object_post_as_copy = false).
Account and container GETs and HEADs now shuffle the nodes they use to balance load.
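
As a minimal sketch of the X-Newest behaviour described above, the snippet below builds (but does not send) a GET or HEAD request carrying the header. The helper name `build_newest_request`, the storage URL, the token value, and the object path are all hypothetical placeholders, not taken from this revision; only the `X-Newest: true` header comes from the commit message.

```python
import urllib.request


def build_newest_request(storage_url, token, path, method="GET"):
    """Build (without sending) a request that asks Swift for the newest copy.

    storage_url, token and path are illustrative placeholders; only the
    X-Newest header itself is taken from the commit message.
    """
    if method not in ("GET", "HEAD"):
        # The commit message only describes X-Newest for GETs and HEADs.
        raise ValueError("X-Newest is described for GET and HEAD requests only")
    url = storage_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {"X-Auth-Token": token, "X-Newest": "true"}
    return urllib.request.Request(url, headers=headers, method=method)


if __name__ == "__main__":
    req = build_newest_request("http://saio:8080/v1/AUTH_test", "example-token",
                               "container/object", method="HEAD")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here because it needs a running cluster and a valid token.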

files added:
CHANGELOG
bin/swift
bin/swift-container-stats-logger
bin/swift-dispersion-populate
bin/swift-dispersion-report
etc/dispersion.conf-sample
swift/common/middleware/tempauth.py
swift/stats/db_stats_collector.py
test/unit/common/test_init.py
test/unit/stats/test_db_stats_collector.py

files removed:
bin/st
bin/swauth-add-account
bin/swauth-add-user
bin/swauth-cleanup-tokens
bin/swauth-delete-account
bin/swauth-delete-user
bin/swauth-list
bin/swauth-prep
bin/swauth-set-account-service
swift/common/middleware/swauth.py
swift/stats/account_stats.py
test/unit/stats/test_account_stats.py

files renamed:
test/unit/common/middleware/test_swauth.py => test/unit/common/middleware/test_tempauth.py

files modified:
bin/swift-account-stats-logger
bin/swift-log-uploader
bin/swift-ring-builder
bin/swift-stats-populate
bin/swift-stats-report
doc/source/_theme/layout.html
doc/source/admin_guide.rst
doc/source/debian_package_guide.rst
doc/source/deployment_guide.rst
doc/source/development_auth.rst
doc/source/development_saio.rst
doc/source/howto_installmultinode.rst
doc/source/misc.rst
doc/source/overview_auth.rst
doc/source/overview_large_objects.rst
doc/source/overview_stats.rst
etc/log-processor.conf-sample
etc/proxy-server.conf-sample
etc/stats.conf-sample
setup.py
swift/__init__.py
swift/account/server.py
swift/common/bench.py
swift/common/bufferedhttp.py
swift/common/client.py
swift/common/daemon.py
swift/common/db.py
swift/common/db_replicator.py
swift/common/direct_client.py
swift/common/middleware/catch_errors.py
swift/common/middleware/staticweb.py
swift/common/ring/builder.py
swift/common/utils.py
swift/container/server.py
swift/obj/auditor.py
swift/obj/replicator.py
swift/obj/server.py
swift/proxy/server.py
swift/stats/log_processor.py
swift/stats/log_uploader.py
test/functional/swift.py
test/functional/tests.py
test/probe/common.py
test/probe/test_account_failures.py
test/probe/test_container_failures.py
test/probe/test_object_handoff.py
test/unit/__init__.py
test/unit/account/test_server.py
test/unit/common/middleware/test_except.py
test/unit/common/middleware/test_ratelimit.py
test/unit/common/middleware/test_staticweb.py
test/unit/common/test_bufferedhttp.py
test/unit/common/test_utils.py
test/unit/container/test_server.py
test/unit/obj/test_replicator.py
test/unit/proxy/test_server.py
test/unit/stats/test_log_uploader.py

--- doc/source/admin_guide.rst
+++ doc/source/admin_guide.rst
@@ -134,9 +134,9 @@
 Cluster Health
 --------------
 
-There is a swift-stats-report tool for measuring overall cluster health. This
-is accomplished by checking if a set of deliberately distributed containers and
-objects are currently in their proper places within the cluster.
+There is a swift-dispersion-report tool for measuring overall cluster health.
+This is accomplished by checking if a set of deliberately distributed
+containers and objects are currently in their proper places within the cluster.
 
 For instance, a common deployment has three replicas of each object. The health
 of that object can be measured by checking if each replica is in its proper
@@ -153,15 +153,15 @@
 The first thing that needs to be done to provide this health value is create a
 new account solely for this usage. Next, we need to place the containers and
 objects throughout the system so that they are on distinct partitions. The
-swift-stats-populate tool does this by making up random container and object
-names until they fall on distinct partitions. Last, and repeatedly for the life
-of the cluster, we need to run the swift-stats-report tool to check the health
-of each of these containers and objects.
+swift-dispersion-populate tool does this by making up random container and
+object names until they fall on distinct partitions. Last, and repeatedly for
+the life of the cluster, we need to run the swift-dispersion-report tool to
+check the health of each of these containers and objects.
 
 These tools need direct access to the entire cluster and to the ring files
 (installing them on a proxy server will probably do). Both
-swift-stats-populate and swift-stats-report use the same configuration file,
-/etc/swift/stats.conf. Example conf file::
+swift-dispersion-populate and swift-dispersion-report use the same
+configuration file, /etc/swift/dispersion.conf. Example conf file::
 
 [stats]
 auth_url = http://saio:11000/auth/v1.0
@@ -169,17 +169,17 @@
 auth_key = testing
 
 There are also options for the conf file for specifying the dispersion coverage
-(defaults to 1%), retries, concurrency, CSV output file, etc. though usually
-the defaults are fine.
+(defaults to 1%), retries, concurrency, etc. though usually the defaults are
+fine.
 
-Once the configuration is in place, run `swift-stats-populate -d` to populate
+Once the configuration is in place, run `swift-dispersion-populate` to populate
 the containers and objects throughout the cluster.
 
 Now that those containers and objects are in place, you can run
-`swift-stats-report -d` to get a dispersion report, or the overall health of
+`swift-dispersion-report` to get a dispersion report, or the overall health of
 the cluster. Here is an example of a cluster in perfect health::
 
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 19s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
@@ -195,7 +195,7 @@
 $ swift-ring-builder object.builder set_weight d0 200
 $ swift-ring-builder object.builder rebalance
 ...
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 8s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
@@ -212,7 +212,7 @@
 place and then rerun the dispersion report::
 
 ... start object replicators and monitor logs until they're caught up ...
-$ swift-stats-report -d
+$ swift-dispersion-report
 Queried 2621 containers for dispersion reporting, 17s, 0 retries
 100.00% of container copies found (7863 of 7863)
 Sample represents 1.00% of the container partition space
@@ -221,29 +221,6 @@
 100.00% of object copies found (7857 of 7857)
 Sample represents 1.00% of the object partition space
 
-So that's a summation of how to use swift-stats-report to monitor the health of
-a cluster. There are a few other things it can do, such as performance
-monitoring, but those are currently in their infancy and little used. For
-instance, you can run `swift-stats-populate -p` and `swift-stats-report -p` to
-get performance timings (warning: the initial populate takes a while). These
-timings are dumped into a CSV file (/etc/swift/stats.csv by default) and can
-then be graphed to see how cluster performance is trending.
-
-------------------------------------
-Additional Cleanup Script for Swauth
-------------------------------------
-
-With Swauth, you'll want to install a cronjob to clean up any
-orphaned expired tokens. These orphaned tokens can occur when a "stampede"
-occurs where a single user authenticates several times concurrently. Generally,
-these orphaned tokens don't pose much of an issue, but it's good to clean them
-up once a "token life" period (default: 1 day or 86400 seconds).
-
-This should be as simple as adding `swauth-cleanup-tokens -A
-https://<PROXY_HOSTNAME>:8080/auth/ -K swauthkey > /dev/null` to a crontab
-entry on one of the proxies that is running Swauth; but run
-`swauth-cleanup-tokens` with no arguments for detailed help on the options
-available.
 
 ------------------------
 Debugging Tips and Tools
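
The dispersion report lines quoted in the diff above can be checked by hand: with three replicas per container, 2621 sampled containers give 2621 x 3 = 7863 expected copies. The following is a rough sketch of pulling those counts out of the report text. The function name `copies_found` and the regular expression are assumptions derived only from the sample output shown in this diff, not from the tool's source, so the real output format may differ.

```python
import re

# Pattern inferred from the sample report lines in the diff (an assumption).
COPIES_RE = re.compile(r"of (container|object) copies found \((\d+) of (\d+)\)")


def copies_found(report_text):
    """Return {kind: (found, expected, percent)} parsed from report output."""
    results = {}
    for kind, found, expected in COPIES_RE.findall(report_text):
        found, expected = int(found), int(expected)
        # Guard against an empty sample rather than dividing by zero.
        percent = round(100.0 * found / expected, 2) if expected else 0.0
        results[kind] = (found, expected, percent)
    return results


# Report lines copied from the example output in the admin guide diff.
SAMPLE = """\
Queried 2621 containers for dispersion reporting, 17s, 0 retries
100.00% of container copies found (7863 of 7863)
Sample represents 1.00% of the container partition space
100.00% of object copies found (7857 of 7857)
Sample represents 1.00% of the object partition space
"""

if __name__ == "__main__":
    for kind, (found, expected, percent) in copies_found(SAMPLE).items():
        print(f"{kind}: {found}/{expected} copies ({percent:.2f}%)")
```

A percentage below 100 in either entry would indicate replicas that are not yet in their proper places, which is the situation the guide resolves by running the replicators and rerunning the report.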