1
<style type="text/css">
15
border: 1 solid black;
23
<title>Chirp Protocol Version 2</title>
25
<h1>Chirp Protocol Version 2</h2>
26
<address>Douglas Thain<br>
27
University of Notre Dame<br>
33
Chirp is a simple and lightweight remote I/O protocol with multiple implementations. It is used in several distributed computing systems where users wish to access data over the wide area using a Unix filesystem like protocol. This document lays out the underlying protocol so as to promote interoperability between implementations, and permit additional implementations by other parties. At the time of writing, there are four independent implementations used for both research and production:
36
<li> A standalone C server and client maintained by Douglas Thain at the University of Notre Dame.
37
<li> A Java client implementation by Justin Wozniak used by the GEMS project at the University of Notre Dame.
38
<li> A C server and Java client embedded in the Condor distributed processing system at the University of Wisconsin.
39
<li> A C client implementation that is part of the ROOT I/O library used in the high energy physics community.
42
<h2>Authentication Protocol</h2>
44
Chirp is carried over a stream protocol such as TCP or Unix domain sockets. A server waits for a client to connect by the appropriate mechanism, and then waits for the client to identify itself. There are currently two methods of authentication: cookie authentication and negotiated authentication.
46
<h3>Cookie Authentication</h3>
48
In certain situations, the client may be provided in advance with a "cookie" that provides expedited access to a server with a fixed identity. This method is typically used within Condor when communicating with the embedded Chirp proxy.
50
In this situation, the client is provided with a file named <tt>chirp.config</tt> which contains the host and port of the server, and a cookie string. The client must connect to the server and immediately send <tt>cookie</tt>, a space, the cookie string, and a newline. If the response from the server is <tt>0\n</tt> then the cookie is accepted, and the client may proceed to the main Chirp protocol described below. Any other response indicates the cookie was rejected, and the client must abort.
52
<h3>Negotiated Authentication</h3>
54
In the general case, the client must prove its identity to the server. There are several possible methods of authentication, and both client and server may only accept some subset of them. The client drives the negotiation process by simply sending the requested authentication type as a plain string followed by a newline. If the server does not support the method, it responds <tt>no\n</tt> and waits for the client to request another. If it does support the method, it responds <tt>yes\n</tt> and then performs that method. At the time of writing, the supported authentication methods are <tt>hostname</tt>, <tt>unix</tt>, <tt>kerberos</tt>, and <tt>globus</tt>.
56
In this example, a client first requests <tt>kerberos</tt> authentication, and then <tt>hostname</tt>:
64
<h4>Hostname Authentication</h4>
66
This method of authentication simply asks that the server identify by the client by the name of the host it is calling from. The server responds by performing a remote domain name lookup on the caller�s IP address. If this succeeds, the server responds <tt>yes\n</tt>, otherwise <tt>no\n</tt> and fails (see below). If the client is authorized to connect, the server sends an additional <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails.
68
<h4>Unix Authentication</h4>
70
This method of authentication causes the server the challenge the client to prove its identity by creating a file on the local file system. The server transmits the name of a non-existent file in a writable directory. The client must then attempt to create it. If successful, the client should respond <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails. The server then checks that the file was actually created, and if satisfied responds <tt>yes\n</tt> otherwise <tt>no\n</tt> and fails.
72
<h4>Kerberos Authentication</h4>
74
This method is performed by invoking the standard Kerberos 5 libraries to authenticate in the usual manner, and is defined by that standard.
76
<h4>Globus Authentication</h4>
78
This method is performed by invoking the Secure Sockets Layer to authenticate using Globus credentials and is defined by that standard.
80
<h3>Completing Authentication</h3>
82
If the authentication method succeeds, then the server will send two lines to the client, giving the successful authentication method and the client's identity. The client may then proceed to the main protocol below. If the method fails, then the server and client return to negotiating a method.
84
<h2>Chirp Protocol</h2>
86
A request consists of an LF-terminated line possibly followed by binary data. At a minimum, a Chirp server must accept requests of 1024 characters. It may accept longer lines. If a line exceeds a server's internal maximum, it must gracefully consume the excess characters and return a TOO_BIG error code, defined below. Certain requests are immediately followed by binary data. The number of bytes is dependent on the semantics of the request, also described below.
88
A request is parsed into words separated by any amount of white space consisting of spaces and tabs. The first word of the request is called the "command" and the rest are called "arguments."
91
Words fall into two categories: strings and decimals. A string is any arbitrary sequence of characters, with whitespaces and unprintables encoding according to RFC 2396. (Very briefly, RFC 2396 says that whitespace must be encoding using a percent sign and two hex digits: <tt>ab cd</tt> becomes <tt>ab%20cd</tt>.) A decimal is any sequence of the digits 0-9, optionally prefixed by a single + or -.
94
At the protocol level, words have no maximum size. They may stretch on to fill any number of bytes. Of course, each implementation has limits on concepts such a file name lengths and integer sizes. If a receiver cannot parse, store, or execute a word contained in a request, it must gracefully consume the excess characters and return a TOO_BIG error response, defined below.
96
A response consists of an LF-terminated line, bearing an ASCII integer. A valid response may also contain whitespace and additional optional material following the response. If the response is greater than or equal to zero, the response indicates the operation succeeded. If negative, the operation failed, and the exact value tells why:
99
-1 NOT_AUTHENTICATED The client has not authenticated its identity.
100
-2 NOT_AUTHORIZED The client is not authorized to perform that action.
101
-3 DOESNT_EXIST There is no object by that name.
102
-4 ALREADY_EXISTS There is already an object by that name.
103
-5 TOO_BIG That request is too big to execute.
104
-6 NO_SPACE There is not enough space to store that.
105
-7 NO_MEMORY The server is out of memory.
106
-8 INVALID_REQUEST The form of the request is invalid.
107
-9 TOO_MANY_OPEN There are too many resources in use.
108
-10 BUSY That object is in use by someone else.
109
-11 TRY_AGAIN A temporary condition prevented the request.
110
-12 BAD_FD The file descriptor requested is invalid.
111
-13 IS_DIR A file-only operation was attempted on a directory.
112
-14 NOT_DIR A directory operation was attempted on a file.
113
-15 NOT_EMPTY A directory cannot be removed because it is not empty.
114
-16 CROSS_DEVICE_LINK A hard link was attempted across devices.
115
-17 OFFLINE The requested resource is temporarily not available.
116
-127 UNKNOWN An unknown error occurred.
120
Negative values beyond -17 are reserved for future expansion. The
121
receipt of such a values should be interpreted in the same way as UNKNOWN.
123
<h2>Chirp Commands</h2>
125
Following are the available commands. Each argument to a command is specific
126
Here are the available commands that are standardized. Note that each implementation may support some additional commands which are not (yet) documented because they are still experimental. Each argument to a command is specified with a type and a name.
128
<div id=cmd>open (string:name) (string:flags) (decimal:mode)</div>
130
Open the file named "name" "mode" is interpreted in the same manner as a POSIX file mode. Note that the mode is printed in decimal representation, although the UNIX custom is to use octal. For example, the octal mode 0755, representing readable and executable by everyone would be printed as 493. "flags" may contain any of these characters which affect the nature of the call:
133
<li> r - open for reading
134
<li> w - open for writing
135
<li> a - force all writes to append
136
<li> t - truncate before use
137
<li> c - create if it doesn't exist
138
<li> x - fail if 'c' is given and the file already exists
141
The open command returns an integer file description which may be used with later calls referring to open files. The implementation may optionally return file metadata in the same form as the stat command, immediately following the result. A client implementation must be prepared for both forms of result. On failure, returns a negative value in the response.
143
Note that a file descriptor returned by open only has meaning within the current connection between the client and the server. If the connection is lost, the server will implicitly close all open files. A file descriptor opened in one connection has no meaning in another connection.
145
<div id=cmd>close (decimal:fd)</div>
147
Complete access to this file descriptor. If the connection between a client an server is lost, all files are implicitly closed.
149
<div id=cmd>read (decimal:fd) (decimal:length)</div>
151
Read up to "length" bytes from the file descriptor "fd." If successful, the server may respond with any value between zero and "length." Immediately following the response will be binary data of exactly as many bytes indicated in the response. If zero, the end of the file has been reached. If any other value is given, no assumptions about the file size may be made.
153
<div id=cmd>pread (decimal:fd) (decimal:length) (decimal:offset)</div>
155
Read up to �length� bytes from the file descriptor �fd�, starting at �offset�. Return value is identical to that of read.
157
<div id=cmd>read (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)</div>
159
Read up to "length" bytes from the file descriptor "fd", starting at �offset�, retrieving �stride_length� bytes every �stride_skip� bytes. Return value is identical to that of read.
161
<div id=cmd>write (decimal:fd) (decimal:length)</div>
163
Write up to "length" bytes to the file descriptor "fd." This request should be immediately followed by "length" binary bytes. If successful, may return any value between 0 and "length," indicating the number of bytes actually accepted for writing. An efficient server should avoid accepting fewer bytes than requested, but the client must be prepared for this possibility.
165
<div id=cmd>pwrite (decimal:fd) (decimal:length) (decimal:offset)</div>
167
Write up to "length" bytes to the file descriptor "fd" at offset �offset�. Return value is identical to that of write.
169
<div id=cmd>swrite (decimal:fd) (decimal:length) (decimal:offset) (decimal:stride_length) (decimal:stride_skip)</div>
171
Write up to "length" bytes to the file descriptor "fd", starting at �offset�, writing �stride_length� bytes every �stride_skip� bytes. Return value is identical to that of write.
173
<div id=cmd>fstat (decimal:fd)</div>
177
<div id=cmd>fsync (decimal:fd)</div>
179
Block until all uncommitted changes to this file descriptor have been written to stable storage.
181
<div id=cmd>lseek (decimal:fd) (decimal:offset) (decimal:whence)</div>
183
Move the current seek pointer of "fd" by "offset" bytes from the base given by "whence." "whence" may be:
186
<li> 0 - from the beginning of the file
187
<li> 1 - from the current seek position
188
<li> 2 - from the end of the file.
191
Returns the current seek position.
193
<div id=cmd>rename (string:oldpath) (string:newpath)</div>
195
Rename the file "oldpath" to be renamed to "newpath".
197
<div id=cmd>unlink (string:path)</div>
199
Delete the file named "path".
201
<div id=cmd>rmdir (string:path)</div>
203
Delete a directory by this name.
205
<div id=cmd>rmall (string:path)</div>
207
Recursively delete an entire directory.
209
<div id=cmd>mkdir (string:name) (decimal:mode)</div>
211
Create a new directory by this name. "mode" is interpreted in the same manner as a POSIX file mode. Note that mode is expressed in decimal rather than octal form.
213
<div id=cmd>fstat (decimal:fd)</div>
215
Get metadata describing an open file. If the response line indicates success, it is followed by a second line of thirteen integer values, indicating the following values:
217
<li> st_dev - Device number.
218
<li> st_ino - Inode number
219
<li> st_mode - Mode bits.
220
<li> st_nlink - Number of hard links.
221
<li> st_uid - User ID of the file�s owner.
222
<li> st_gid - Group ID of the file.
223
<li> st_rdev - Device number, if this file represents a device.
224
<li> st_size - Size of the file in bytes.
225
<li> st_blksize - Recommended transfer block size for accessing this file.
226
<li> st_blocks - Number of physical blocks consumed by this file.
227
<li> st_atime - Last time file was accessed in Unix time() format.
228
<li> st_mtime - Last time file data was modified in Unix time() format.
229
<li> st_ctime - Last time the inode was changed, in Unix time() format.
232
Note that not all fields may have meaning to all implementations. For example, in the standalone Chirp server, st_uid has no bearing on access control, and the user should employ setacl and getacl instead.
234
<div id=cmd>fstatfs (string:path)</div>
236
Get filesystem metadata for an open file. If the response line indicates success, it is followed by a second line of seven decimals with the following interpretation:
239
<li>f_type - The integer type of the filesystem.
240
<li>f_blocks - The total number of blocks in the filesystem.
241
<li>f_bavail - The number of blocks available to an ordinary user.
242
<li>f_bsize - The size in bytes of a block.
243
<li>f_bfree - The number of blocks free.
244
<li>f_files - The maximum number of files (inodes) on the filesystem.
245
<li>f_ffree - The number of files (inodes) currently in use
248
<div id=cmd>fchown (decimal:fd) (decimal:uid) (decimal:gid)</div>
250
Change the ownership of an open file to the UID and GID indicated.
252
<div id=cmd>fchmod (decimal:fd) (decimal:mode)</div>
254
Change the Unix mode bits on an open file.
256
<div id=cmd>ftruncate (decimal:fd) (decimal:length)</div>
258
Truncate an open file.
260
<div id=cmd>getfile (string:path)</div>
262
Retrieves an entire file efficiently. A positive response indicates the number of bytes in the file, and is followed by exactly that many bytes of binary data which are the file�s contents.
264
<div id=cmd>putfile (string:path) (decimal:mode) (decimal:length)</div>
266
Stores an entire file efficiently. The client first sends the request line indicating the file name, the desired mode bits, and the length in bytes. The response indicates whether the client may proceed. If it indicates success, the client must write exactly �length� bytes of data, which are the file�s contents. If the response indicates failure, the client must not send any additional data.
268
<div id=cmd>getlongdir (string:path)</div>
270
Lists a directory and all metadata. If the response indicates success, it will be followed by a series of lines, alternating the name of a directory entry with its metadata in the same form as returned by fstat. The end of the list is indicated by a single blank line.
272
<div id=cmd>getdir (string:path)</div>
274
Lists a directory. If the response indicates success, it will be followed by a series of lines indicating the name of each directory entry. The end of the list is indicated by a single blank line.
276
<div id=cmd>getacl (string:path)</div>
278
Get an access control list. If the response indicates success, it will be followed by a series of lines indicating each entry of the access control list. The end of the list is indicated by a single blank line.
280
<div id=cmd>setacl (string:path) (string:subject) (string:rights)</div>
282
Modify an access control list. On an object identified by path, set rights rights for a given subject. The string �-� may be used to indicate no rights.
284
<div id=cmd>whoami</div>
286
Get the user�s current identity with respect to this server. If the response is greater than zero, it is followed by exactly that many bytes in data, containing the user�s identity.
288
<div id=cmd>whoareyou (string:rhost)</div>
290
Get the server�s current identity with respect to a remote host. If the response is greater than zero, it is followed by exactly that many bytes in data, containing the server�s identity.
292
<div id=cmd>link (string:oldpath) (string:newpath)</div>
294
Create a hard link from newpath to oldpath.
296
<div id=cmd>symlink (string:oldpath) (string:newpath)</div>
298
Create a symlink from newpath to oldpath.
300
<div id=cmd>readlink (string:path)</div>
302
Read the contents of a symbolic link. If the response is greater than zero, it is followed by exactly that many bytes in data, containing the contents of the link.
304
<div id=cmd>mkdir (string:path) (decimal:mode)</div>
306
Create a new directory with the given Unix mode bits.
308
<div id=cmd>stat (string:path)</div>
310
Get file status. If the response indicates success, it is followed by a second line in the same format as the command fstat. If the pathname represents a symbolic link, this command examines the target of the link.
312
<div id=cmd>lstat (string:path)</div>
314
Get file status. If the response indicates success, it is followed by a second line in the same format as the command fstat. If the pathname represents a symbolic link, this command examines the link itself.
316
<div id=cmd>statfs (string:path)</div>
318
Get filesystem status. If the response indicates success, it is followed by a second line in the same format as the command fstatfs.
320
<div id=cmd>access (string:path) (decimal:mode)</div>
322
Check access permissions. Returns success if the user may access the file according to the specified mode. The �mode� may be any of the values logical-or�d together:
324
<li> 0 (F_OK) Test for existence of the file.
325
<li> 1 (X_OK) Test if the file is executable.
326
<li> 2 (W_OK) Test if the file is readable.
327
<li> 4 (R_OK) Test if the file is executable.
330
<div id=cmd>chmod (string:path) (decimal:mode)</div>
332
Change the Unix mode bits on a given path. (NOTE 1)
334
<div id=cmd>chown (string:path) (decimal:uid) (decimal:gid)</div>
336
Change the Unix UID or GID on a given path. If the path is a symbolic link, change its target. (NOTE 1)
338
<div id=cmd>lchown (string:path) (decimal:uid) (decimal:gid)</div>
340
Change the Unix UID or GID on a given path. If the path is a symbolic link, change the link. (NOTE 1)
342
<div id=cmd>truncate (string:path) (decimal:length)</div>
344
Truncate a file to a given number of bytes.
346
<div id=cmd>utime (string:path) (decimal:actime) (decimal:mtime)</div>
348
Change the access and modification times of a file.
350
<div id=cmd>md5 (string:path)</div>
352
Checksum a remote file using the MD5 message digest algorithm. If successful, the response will be 16, and will be followed by 16 bytes of data representing the checksum in binary form.
354
<div id=cmd>thirdput (string:path) (string:remotehost) (string:remotepath)</div>
356
Direct the server to transfer the path to a remote host and remote path. If the indicated path is a directory, it will be transferred recursively, preserving metadata such as access control lists.
358
<div id=cmd>mkalloc (string:path) (decimal:size) (decimal:mode)</div>
360
Create a new space allocation at the given path that can contain �size� bytes of data and has initial mode of �mode�.
362
<div id=cmd>lsalloc (string:path)</div>
364
List the space allocation state on a directory. If the response indicates success, it is followed by a second line which gives the path of the containing allocation, the total size of the allocation, and the
366
<b>(NOTE 1)</b> The standalone Chirp server ignores the traditional Unix mode bits when performing access control. Calls such as fchmod, chmod, chown, and fchown are essentially ignored, and the caller should employ setacl and getacl instead.