NAStore Internal Design Specification
NAStore
Virtual Volume Manager
Internal Design Specification
Bill Ross, Network Archive Systems
The Virtual Volume Manager (VVM) provides a disk cache
layer between clients and removable volumes (e.g. tape)
in the form of "virtual volumes" (VV's).
Note: removable volumes are referred to as "VSN's" (based on
volman terminology: Volume Serial Number).
Inter-Process Communication (IPC)
The VVM components may run on different hosts sharing
filesystem access to the disk cache; they communicate
via the machine-independent XDR protocol using
root-only Internet sockets (an additional communication layer may
be added later to provide non-root clients secure access).
The VVM configuration file, VVCONF, specifies which hosts are acceptable.
The client library hides a synchronous request/response protocol.
An asynchronous protocol could be built using the lower-level
routines in the library, but NOTE that the XDR interface is
buffered. A protocol that blocks on select() instead of on
VVCLRecv() must check the result of VVCLNextRecord() after each
VVCLRecv() and call VVCLRecv() again if the result is nonzero.
Otherwise any buffered packet will be ignored until the next
fresh one hits the underlying socket.
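The buffering caveat above can be illustrated with a small simulation. This is only a sketch under stated assumptions: BufferedChannel, its recv() and next_record() methods, and drain_ready() are hypothetical stand-ins for the roles the text gives to VVCLRecv() and VVCLNextRecord(), whose real signatures are not shown here, and a simple 4-byte length prefix replaces real XDR record marking.

```python
import select
import socket
import struct


class BufferedChannel:
    """Hypothetical stand-in for the buffered stream behind VVCLRecv()."""

    def __init__(self, sock):
        self.sock = sock
        self.buf = b""

    def next_record(self):
        """Return 1 if a complete record is already buffered, else 0
        (the role the text gives to VVCLNextRecord())."""
        if len(self.buf) < 4:
            return 0
        (length,) = struct.unpack("!I", self.buf[:4])
        return 1 if len(self.buf) >= 4 + length else 0

    def recv(self):
        """Return one record, reading the socket only when the buffer
        does not already hold a complete one (the role of VVCLRecv())."""
        while not self.next_record():
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            self.buf += chunk
        (length,) = struct.unpack("!I", self.buf[:4])
        record = self.buf[4:4 + length]
        self.buf = self.buf[4 + length:]
        return record


def drain_ready(channel, timeout=1.0):
    """Block on select(), then collect every record that wakeup delivered."""
    records = []
    readable, _, _ = select.select([channel.sock], [], [], timeout)
    if readable:
        records.append(channel.recv())
        # Without this loop, records already in the buffer would sit
        # unseen until fresh data made the socket readable again.
        while channel.next_record():
            records.append(channel.recv())
    return records


def demo():
    """Send two records in one write and read both after one select()."""
    a, b = socket.socketpair()
    try:
        messages = (b"first", b"second")
        a.sendall(b"".join(struct.pack("!I", len(m)) + m for m in messages))
        return drain_ready(BufferedChannel(b))
    finally:
        a.close()
        b.close()
```

If the inner while loop is dropped, the second record stays in the buffer and select() does not report it, which is the failure described above.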
Client Library
The client library provides Virtual Volume equivalents of
open(), lseek(), read(), write() and close(). The
vv_openwrite() and vv_openread() routines return handles
for virtual volumes, which are used by the other routines.
In addition to these analogs to standard filesystem
routines, there is a vv_finish() routine which permanently
closes the virtual volume for writing, at which point it
is saved to a physical volume such as tape. These routines
communicate with the VVMD as necessary to get approval and
to cause the VVMD's database to be updated. They are detailed
in the External Reference Specification.
Virtual Volume Manager Daemon
The Virtual Volume Manager Daemon (VVMD) manages client
requests for VV's, maintaining a database of VV's and
physical volumes. It mounts physical volumes as necessary
via the Volume Manager and copies VV's to and from these
physical volumes via the Virtual Volume Manager Mover Daemons
described below. It also manages the VV disk cache: it frees
unused VV's when space is needed, and it declares long-idle
'hot' VV's 'finished' so that they can be copied to physical
volumes. The copy protects the data against disk problems and
allows the disk version to be freed.
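The two cache-management duties can be sketched as a single planning function. Everything specific here is an assumption made for illustration and is not taken from the VVMD source: the record fields (name, state, last_used, in_use, size), the "copied" state, the idle_limit threshold, and the oldest-first order in which disk copies are freed.

```python
def plan_cache_actions(vvs, now, idle_limit, bytes_needed):
    """Return (names to declare finished, names whose disk copy to free).

    vvs is a list of dicts with hypothetical fields:
    name, state ('hot' or 'copied'), last_used, in_use, size.
    """
    # Long-idle 'hot' VV's are declared 'finished' so they can be copied out.
    to_finish = [v["name"] for v in vvs
                 if v["state"] == "hot" and now - v["last_used"] >= idle_limit]

    # Assumption: only VV's already copied to physical volumes and not
    # in use may be freed, least recently used first.
    candidates = sorted(
        (v for v in vvs if v["state"] == "copied" and not v["in_use"]),
        key=lambda v: v["last_used"],
    )
    to_free = []
    reclaimed = 0
    for v in candidates:
        if reclaimed >= bytes_needed:
            break
        to_free.append(v["name"])
        reclaimed += v["size"]
    return to_finish, to_free
```

The function only plans; in the design described here, the copy to a physical volume would have to complete before a finished VV's disk version could be freed.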
VVMD Initialization
Upon initialization, the VVMD reads its configuration
file, VVCONF, starts its database and the volman connection,
then mounts 'hot' physical volumes in each of the storage
classes so that 'finished' VV's can be written to them.
It then checks the database for finished VV's that still need
writing (including ones not yet copied to a physical volume
of each type in the storage class) and starts any such
write requests. The VVMD then begins to accept client
connections.
VVMD Database
The VVMD Database consists of two 'tables', each with its
own B-tree indexes. The tables are ASCII; each record has
single spaces separating the fields and a newline character
at the end. The record size is an integer factor of the
disk block size, so a record never crosses a disk block
boundary and a machine crash cannot leave a record
partially written.
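The record layout can be sketched as follows. The 512-byte block size, the 128-byte record size, the sample field values, and the use of trailing spaces as padding are assumptions for illustration; the actual fields are specified in vvm_db.h.

```python
BLOCK_SIZE = 512    # assumed disk block size, in bytes
RECORD_SIZE = 128   # assumed record size; must divide BLOCK_SIZE evenly


def pack_record(fields, record_size=RECORD_SIZE, block_size=BLOCK_SIZE):
    """Join fields with single spaces and pad to a fixed-size record
    that ends in a newline."""
    if block_size % record_size != 0:
        raise ValueError("record size must be an integer factor of the block size")
    for field in fields:
        if not field or " " in field or "\n" in field:
            raise ValueError("fields must be non-empty, without spaces or newlines")
    body = " ".join(fields).encode("ascii")
    if len(body) > record_size - 1:
        raise ValueError("record too long")
    # Pad with spaces (an assumption) so the newline is always the last byte.
    return body.ljust(record_size - 1, b" ") + b"\n"


def unpack_record(raw):
    """Split a fixed-size record back into its fields."""
    return raw.decode("ascii").rstrip("\n").split()


def crosses_block(record_number, record_size=RECORD_SIZE, block_size=BLOCK_SIZE):
    """Return True if the record at this index would straddle a block boundary."""
    start = record_number * record_size
    end = start + record_size - 1
    return start // block_size != end // block_size
```

With a record size that divides the block size, crosses_block() is false for every record number; with a size such as 200 bytes against 512-byte blocks, the third record (index 2) would straddle the first boundary.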
VV Table
The VV Table has a record for each VV. Its contents are
specified in vvm_db.h. The indexes on it are:
- VV name
    unique key = VV name
- 'hot' VV's (available for writing)
    non-unique key = class + client_id_num
    (used for looking up writable VV's)
- 'finished' VV's not fully written to VSN's
    unique key = VV name
    (used for recovery on startup)
- VSN
    non-unique key = VSN this VV has been written to
    (there are normally multiple VSN's for a VV)
- ID
    non-unique key = integer ID assigned by client
    (for RASH, this would be RASH client UID)
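The five indexes listed above can be modeled with ordinary dictionaries to show which keys are unique and which are not. This is only a sketch: plain dicts stand in for the B-trees, and the record field names (name, vv_class, client_id_num, state, fully_written, vsns, id) are hypothetical, since the real layout lives in vvm_db.h.

```python
from collections import defaultdict


def build_vv_indexes(records):
    """Build the five VV Table indexes as dicts (a stand-in for B-trees)."""
    by_name = {}                 # unique: VV name
    hot = defaultdict(list)      # non-unique: (class, client_id_num)
    unwritten = {}               # unique: VV name, finished but not fully written
    by_vsn = defaultdict(list)   # non-unique: VSN the VV has been written to
    by_id = defaultdict(list)    # non-unique: integer ID assigned by client
    for rec in records:
        name = rec["name"]
        if name in by_name:
            raise ValueError("duplicate VV name: " + name)
        by_name[name] = rec
        if rec["state"] == "hot":
            hot[(rec["vv_class"], rec["client_id_num"])].append(name)
        elif rec["state"] == "finished" and not rec["fully_written"]:
            unwritten[name] = rec
        for vsn in rec["vsns"]:
            by_vsn[vsn].append(name)
        by_id[rec["id"]].append(name)
    return by_name, hot, unwritten, by_vsn, by_id
```

A lookup such as hot[("A", 7)] then returns the writable VV's for class A and client 7, and iterating over unwritten gives the VV's that need attention during startup recovery.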
VSN Table
This table contains information on removable volumes
(referred to as "VSN's" based on volman terminology:
Volume Serial Number). Its contents are also specified
in vvm_db.h.
The indexes on it are:
- VSN
    unique key = VSN name
- 'hot' VSN's
    non-unique key = storage_class_id + 'slot'
Virtual Volume Manager Mover Daemons
In the NAStore 3 design, tape drives are mounted on dedicated
hosts in order to get maximum bandwidth. Each such host has a
Virtual Volume Manager Mover Daemon (VVMVD) running on it to
copy VV's between tape and the VVM's shared on-disk VV cache.
The VVMD selects a VVMVD after the Volume Manager has mounted
a tape on a drive that is attached to that VVMVD's host.
Once the VVMVD receives a request to copy a VV from/to a given
mounted VSN, it forks a child to do the work and checks
periodically to make sure that the child is not hung; if
successful, the child responds directly to the VVMD. (The VVMVD
is modeled on the volman volnd.) When writing a VV to tape, the
VVMVD writes a label before and after the VV data; these labels
are checked later when the VV is read from the tape.
VV Label Format
The label format is somewhat analogous to the ANSI format,
but gets all the information needed in a single label.
data            chars   format
label           3       'HDR' or 'EOF'
vv class        1       alphanum
vv serial       10      decimal, left-justified
id              10      decimal
file number     10      decimal
size            16      64-bit hex
time finished   10      unix time in 40-bit hex, like dbase
time written    10      ditto
version         10      (remaining chars)
(all alpha chars uppercase)
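The field widths in the table sum to 80 characters, so a label can be built and parsed by fixed offsets. The sketch below follows the table, with some assumptions that the table does not settle: the id and file number fields are left-justified and space-padded like the vv serial, the version field is space-padded, and the function and field names are hypothetical.

```python
FIELDS = [
    ("label", 3), ("vv_class", 1), ("vv_serial", 10), ("id", 10),
    ("file_number", 10), ("size", 16), ("time_finished", 10),
    ("time_written", 10), ("version", 10),
]
LABEL_LEN = sum(width for _, width in FIELDS)  # 80 characters


def build_label(kind, vv_class, vv_serial, vv_id, file_number, size,
                time_finished, time_written, version):
    """Build one 80-character label; kind is 'HDR' or 'EOF'."""
    if kind not in ("HDR", "EOF"):
        raise ValueError("label must be 'HDR' or 'EOF'")
    parts = [
        kind,
        vv_class.upper(),
        str(vv_serial).ljust(10),          # decimal, left-justified
        str(vv_id).ljust(10),              # justification assumed
        str(file_number).ljust(10),        # justification assumed
        format(size, "016X"),              # 64-bit value as uppercase hex
        format(time_finished, "010X"),     # 40-bit unix time as hex
        format(time_written, "010X"),
        version.upper().ljust(10),         # padding assumed
    ]
    label = "".join(parts)
    # Every part is at least its field width, so any overflow shows up here.
    if len(label) != LABEL_LEN:
        raise ValueError("a field does not fit its width")
    return label


def parse_label(label):
    """Split a label into fields by fixed offsets and decode the numbers."""
    if len(label) != LABEL_LEN:
        raise ValueError("label must be %d characters" % LABEL_LEN)
    out = {}
    pos = 0
    for name, width in FIELDS:
        out[name] = label[pos:pos + width]
        pos += width
    if out["label"] not in ("HDR", "EOF"):
        raise ValueError("not a VV label")
    for name in ("vv_serial", "id", "file_number"):
        out[name] = int(out[name])
    for name in ("size", "time_finished", "time_written"):
        out[name] = int(out[name], 16)
    out["version"] = out["version"].rstrip()
    return out
```

A reader could parse the label found before the VV data and the one found after it and compare the fields, which is one plausible form of the check mentioned above.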
Author (mail): Bill Ross