NAStore Internal Design Specification

NAStore
Virtual Volume Manager
Internal Design Specification


Bill Ross, Network Archive Systems

The Virtual Volume Manager (VVM) provides a disk cache layer between clients and removable volumes (e.g. tape) in the form of "virtual volumes" (VV's).

Note: removable volumes are referred to as "VSN's" (based on volman terminology: Volume Serial Number).


Inter-Process Communication (IPC)

The VVM components may run on different hosts that share filesystem access to the disk cache. They communicate via the machine-independent XDR protocol over root-only Internet sockets (an additional communication layer may be added later to give non-root clients secure access). The VVM configuration file, VVCONF, specifies which hosts are acceptable.

The client library hides a synchronous request/response protocol. An asynchronous protocol could be built using the lower-level routines in the library. Note, however, that the XDR interface is buffered: a protocol that blocks on select() instead of on VVCLRecv() must check the result of VVCLNextRecord() after each VVCLRecv() and call VVCLRecv() again if it is nonzero. Otherwise any buffered packet is ignored until the next fresh one arrives on the underlying socket.
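
The draining rule described above can be sketched as follows. This is an illustration in Python only, not the library's code: BufferedConn, its recv() and next_record() methods, and the newline-delimited records are hypothetical stand-ins for the buffered XDR stream, VVCLRecv() and VVCLNextRecord(), whose real signatures are not given in this document.

```python
import select
import socket


class BufferedConn:
    """Hypothetical stand-in for the library's buffered XDR stream."""

    def __init__(self, sock):
        self.sock = sock
        self.pending = []  # records already read off the socket

    def recv(self):
        """Plays the role of VVCLRecv(): return one record."""
        if not self.pending:
            data = self.sock.recv(4096)
            if not data:
                raise ConnectionError("peer closed the connection")
            # Assumption for the sketch: whole records arrive together.
            self.pending = data.decode("ascii").splitlines()
        return self.pending.pop(0)

    def next_record(self):
        """Plays the role of VVCLNextRecord(): nonzero if records are buffered."""
        return len(self.pending)


def drain(conn, timeout=1.0):
    """Block on select(), then read every record that one wakeup delivered."""
    records = []
    readable, _, _ = select.select([conn.sock], [], [], timeout)
    if readable:
        records.append(conn.recv())
        # Without this loop, buffered records would sit unread until
        # fresh data made the socket readable again.
        while conn.next_record() != 0:
            records.append(conn.recv())
    return records
```

If two replies reach the socket together, a single select() wakeup followed by a single receive would return only the first. The inner loop is what picks up the second.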


Client Library

The client library provides Virtual Volume equivalents of open(), lseek(), read(), write() and close(). The vv_openwrite() and vv_openread() routines return handles for virtual volumes, which the other routines then use. In addition to these analogs of standard filesystem routines, there is a vv_finish() routine that permanently closes the virtual volume for writing, at which point it is saved to a physical volume such as tape. These routines communicate with the VVMD as necessary to get approval and to have the VVMD's database updated. They are detailed in the External Reference Specification.

Virtual Volume Manager Daemon

The Virtual Volume Manager Daemon (VVMD) manages client requests for VV's, maintaining a database of VV's and physical volumes. It mounts physical volumes as necessary via the Volume Manager and copies VV's to and from these physical volumes via the Virtual Volume Manager Mover Daemons described below. It also manages the VV disk cache: it frees unused VV's when space is needed, and declares long-idle 'hot' VV's 'finished' so that they can be copied to physical volumes, which protects them from disk problems and allows the disk version to be freed.
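
The two cache-management duties can be sketched roughly as below. The 'hot' and 'finished' states come from the text above. Everything else is an assumption made for illustration: the CachedVV record, the on_tape and in_use flags, the idle_limit parameter, and the least-recently-used freeing order. The VVMD's actual policy is not specified here.

```python
from dataclasses import dataclass


@dataclass
class CachedVV:
    """Hypothetical in-memory view of one VV in the disk cache."""
    name: str
    state: str            # "hot" (still writable) or "finished"
    last_used: float      # time of last client access, in seconds
    size: int             # bytes occupied in the cache
    on_tape: bool = False  # assumed flag: already copied to a physical volume
    in_use: bool = False   # assumed flag: a client currently has it open


def finish_idle(vvs, now, idle_limit):
    """Declare long-idle hot VVs finished so they can be copied out."""
    finished = []
    for vv in vvs:
        if vv.state == "hot" and not vv.in_use and now - vv.last_used >= idle_limit:
            vv.state = "finished"
            finished.append(vv.name)
    return finished


def free_space(vvs, bytes_needed):
    """Pick unused VVs to free, oldest first (assumed policy)."""
    candidates = sorted(
        (vv for vv in vvs
         if vv.state == "finished" and vv.on_tape and not vv.in_use),
        key=lambda vv: vv.last_used,
    )
    freed = []
    for vv in candidates:
        if bytes_needed <= 0:
            break
        freed.append(vv.name)
        bytes_needed -= vv.size
    return freed
```

In this sketch only VVs that already have a copy on a physical volume are eligible for freeing, so freeing the disk version never discards the only copy.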

VVMD Initialization

Upon initialization, the VVMD reads its configuration file, VVCONF, starts its database and the volman connection, then mounts 'hot' physical volumes in each of the storage classes, to which 'finished' VV's will be written. It then checks the database for finished VV's that still need to be written (including ones that were previously not copied to a physical volume of each type in the storage class), and starts any such write requests. The VVMD then begins to accept client connections.

VVMD Database

The VVMD Database consists of two 'tables', each with its own B-tree indexes. The tables are ASCII; each record has single spaces separating the fields and a newline character at the end. The record size is an integer factor of the disk block size, so no record crosses a disk block boundary and a machine crash cannot leave a record partially written.
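
The block-alignment property can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 512-byte block size, the 128-byte record size, the sample field values, and the use of trailing spaces as padding are all assumptions. The real record layout is defined in vvm_db.h.

```python
BLOCK_SIZE = 512   # assumed disk block size in bytes
RECORD_SIZE = 128  # assumed record size; must divide BLOCK_SIZE evenly


def pack_record(fields, record_size=RECORD_SIZE, block_size=BLOCK_SIZE):
    """Build one fixed-size ASCII record: space-separated fields, newline last."""
    if block_size % record_size != 0:
        raise ValueError("record size must be an integer factor of block size")
    body = " ".join(fields)
    if len(body) + 1 > record_size:
        raise ValueError("fields do not fit in one record")
    # Assumption: pad with spaces so every record has the same length.
    return (body.ljust(record_size - 1) + "\n").encode("ascii")


def crosses_block(index, record_size=RECORD_SIZE, block_size=BLOCK_SIZE):
    """True if record number `index` would span two disk blocks."""
    start = index * record_size
    end = start + record_size - 1
    return start // block_size != end // block_size
```

With a record size of 128 and a block size of 512, crosses_block() is false for every index. A record size of 96 does not divide 512, so record 5 (bytes 480 to 575) would straddle the first block boundary.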

VV Table

The VV Table has a record for each VV. Its contents are specified in vvm_db.h. The indexes on it are:

VSN Table

This table contains information on removable volumes (referred to as "VSN's" based on volman terminology: Volume Serial Number). Its contents are also specified in vvm_db.h. The indexes on it are:


Virtual Volume Manager Mover Daemons

In the NAStore 3 design, tape drives are mounted on dedicated hosts in order to get maximum bandwidth. Each such host has a Virtual Volume Manager Mover Daemon (VVMVD) running on it to copy VV's between tape and the VVM's shared on-disk VV cache. A VVMVD is selected by the VVMD after the Volume Manager has mounted a tape on a drive that is attached to its host. Once it receives a request to copy a VV from/to a given mounted VSN, it forks a child to do the work and checks periodically to make sure that it is not hung; if successful, the child responds directly to the VVMD. (The VVMVD is modeled on the volman volnd.) When writing a VV to tape, the VVMVD writes a label before and after the VV data; these labels are checked later when the VV is read from the tape.
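
The "start a child, then check on it periodically" pattern might look like the sketch below. It is not the VVMVD's code. The function name, the use of a wall-clock deadline as the test for a hung child, and the polling interval are assumptions, and the child's direct reply to the VVMD is not modeled.

```python
import subprocess
import sys
import time


def run_copy_with_watchdog(cmd, poll_interval=0.1, max_seconds=5.0):
    """Run `cmd` as a child process; kill it if it exceeds the deadline.

    Returns the child's exit status, or None if it was judged hung.
    """
    child = subprocess.Popen(cmd)
    deadline = time.monotonic() + max_seconds
    while True:
        status = child.poll()          # non-blocking check on the child
        if status is not None:
            return status              # child finished on its own
        if time.monotonic() >= deadline:
            child.kill()               # assumed response to a hung child
            child.wait()               # reap it so no zombie is left
            return None
        time.sleep(poll_interval)
```

For example, run_copy_with_watchdog([sys.executable, "-c", "pass"]) returns 0, while a child that sleeps past max_seconds is killed and None is returned.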

VV Label Format

The label format is somewhat analogous to the ANSI format, but gets all the information needed in a single label.

        data            chars           format

        label           3               'HDR' or 'EOF'
        vv class        1               alphanum
        vv serial       10              decimal, left-justified
        id              10              decimal
        file number     10              decimal
        size            16              64-bit hex
        time finished   10              unix time in 40-bit hex, like dbase
        time written    10              ditto
        version         10              (remaining chars)
 
(all alpha chars uppercase)
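
The field widths in the table add up to 80 characters. As a rough illustration, a label could be built and parsed as below. The field order and widths follow the table. The function names, the space padding of the decimal and version fields, and the zero padding of the hex fields are assumptions, since the table gives the justification for the vv serial field only.

```python
def make_label(kind, vv_class, serial, vv_id, file_number,
               size, finished, written, version):
    """Build an 80-character VV label following the table's field widths."""
    if kind not in ("HDR", "EOF"):
        raise ValueError("label must be 'HDR' or 'EOF'")
    label = "".join([
        kind,                        # 3 chars
        vv_class.upper(),            # 1 char, alphanumeric
        str(serial).ljust(10),       # decimal, left-justified
        str(vv_id).ljust(10),        # decimal (justification assumed)
        str(file_number).ljust(10),  # decimal (justification assumed)
        format(size, "016X"),        # 64-bit value as 16 hex digits
        format(finished, "010X"),    # 40-bit unix time as 10 hex digits
        format(written, "010X"),     # 40-bit unix time as 10 hex digits
        version.upper().ljust(10),   # remaining 10 chars
    ])
    if len(label) != 80:
        raise ValueError("a field overflowed its width")
    return label


def parse_label(label):
    """Split a label back into its fields, for checking on read."""
    if len(label) != 80 or label[:3] not in ("HDR", "EOF"):
        raise ValueError("not a VV label")
    return {
        "label": label[:3],
        "vv_class": label[3],
        "serial": int(label[4:14]),
        "id": int(label[14:24]),
        "file_number": int(label[24:34]),
        "size": int(label[34:50], 16),
        "finished": int(label[50:60], 16),
        "written": int(label[60:70], 16),
        "version": label[70:80].rstrip(),
    }
```

Because every field has a fixed width, a reader can slice the label by position without searching for delimiters, and the length check catches any field that overflowed.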

Author (mail): Bill Ross

