venti – archival storage server
Venti is a block storage server intended for archival data.
In a Venti server, the SHA1 hash of a block’s contents acts
as the block identifier for read and write operations.
This approach enforces a write-once policy, preventing
accidental or malicious destruction of data. In addition,
duplicate copies of a block are coalesced, reducing the
consumption of storage and simplifying the implementation
of clients.
This manual page documents the basic concepts of
block storage using Venti as well as the Venti network protocol.
Venti(1) documents some simple clients.
Vac(1), vacfs(4), and vbackup(8) are more complex clients.
Venti(3) describes a C library interface for accessing
Venti servers and manipulating Venti data structures.
Venti(8) describes the programs used to run a Venti server.
The SHA1 hash that identifies a block is called its score.
The score of the zero-length block is called the
zero score.
Scores may have an optional label:
prefix, typically used to
describe the format of the data.
For example, vac(1) uses a vac:
prefix, while vbackup(8) uses prefixes corresponding to the file system
types: ext2:, ffs:, and so on.
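As an illustration of the definitions above, the following minimal Python sketch computes a score and splits off an optional label prefix. The helper names score and split_label are hypothetical and are not part of any Venti library; the sketch assumes scores are written as 40 hexadecimal digits.

```python
import hashlib

def score(block):
    """Return the score of a block: the SHA1 hash of its contents, in hex."""
    return hashlib.sha1(block).hexdigest()

# The zero score is the score of the zero-length block:
# da39a3ee5e6b4b0d3255bfef95601890afd80709
ZERO_SCORE = score(b"")

def split_label(text):
    """Split an optional 'label:' prefix from a textual score.

    Returns (label, score); label is "" when no prefix is present.
    (Hypothetical helper, assuming the label contains no further structure.)
    """
    label, _sep, rest = text.rpartition(":")
    return label, rest
```

For example, split_label applied to a vac: score returns the pair ("vac", score), and applied to a bare score returns an empty label.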
Files and Directories
Venti accepts blocks up to 56 kilobytes in size.
By convention, Venti clients use hash trees of blocks to
represent arbitrary-size data files.
The data to be stored is split into fixed-size
blocks and written to the server, producing a list
of scores.
The resulting list of scores is split into fixed-size pointer
blocks (using only an integral number of scores per block)
and written to the server, producing a smaller list
of scores.
The process continues, eventually ending with the
score for the hash tree’s top-most block.
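The construction just described can be sketched in Python as follows. This is a simplified model, not the code of any Venti client: the store is an in-memory dictionary standing in for a server, the names put, write_tree, and read_tree are hypothetical, the default block sizes are arbitrary, and the truncation of short blocks is ignored.

```python
import hashlib

SCORE_SIZE = 20  # a SHA1 score is 20 bytes

def put(store, block):
    """Write a block to the stand-in store and return its score."""
    s = hashlib.sha1(block).digest()
    store.setdefault(s, block)  # duplicate blocks are coalesced
    return s

def write_tree(store, data, dsize=8192, psize=8192):
    """Store data as a hash tree; return (top-most score, tree depth)."""
    per_block = psize // SCORE_SIZE  # integral number of scores per pointer block
    if dsize < 1 or per_block < 2:
        raise ValueError("block sizes too small to build a tree")
    blocks = [data[i:i + dsize] for i in range(0, len(data), dsize)] or [b""]
    scores = [put(store, b) for b in blocks]
    depth = 0
    while len(scores) > 1:
        groups = [scores[i:i + per_block]
                  for i in range(0, len(scores), per_block)]
        scores = [put(store, b"".join(g)) for g in groups]
        depth += 1
    return scores[0], depth

def read_tree(store, top, depth):
    """Walk the tree from the top-most score back down to the data."""
    scores = [top]
    for _ in range(depth):
        joined = b"".join(store[s] for s in scores)
        scores = [joined[i:i + SCORE_SIZE]
                  for i in range(0, len(joined), SCORE_SIZE)]
    return b"".join(store[s] for s in scores)
```

A file that fits in one data block yields depth 0, and its top-most score is simply the score of that block.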
Each file stored this way is summarized by a VtEntry
structure recording the top-most score, the depth
of the tree, the data block size, and the pointer block size.
One or more VtEntry
structures can be concatenated
and stored as a special file called a directory.
In this manner, arbitrary trees of files can be constructed
and stored.
Scores passed between programs conventionally refer
to VtRoot blocks, which contain descriptive information
as well as the score of a directory block containing a small number
of directory entries.
Conventionally, programs do not mix data and directory entries
in the same file. Instead, they keep two separate files, one with
directory entries and one with metadata referencing those
entries by position.
Keeping this parallel representation is a minor annoyance
but makes it possible for general programs like venti/copy (see venti(1))
to traverse the block tree without knowing the specific details
of any particular program’s data.
To allow programs to traverse these structures without
needing to understand their higher-level meanings,
Venti tags each block with a type. The types are:
VtDataType      000   data
VtDataType+1    001   scores of VtDataType blocks
VtDataType+2    002   scores of VtDataType+1 blocks
VtDirType       010   VtEntry structures
VtDirType+1     011   scores of VtDirType blocks
VtDirType+2     012   scores of VtDirType+1 blocks
VtRootType      020   VtRoot structure
The octal numbers listed are the type numbers used
by the commands below.
(For historical reasons, the type numbers used on
disk and on the wire are different from the above.
They do not distinguish VtDataType+n blocks from VtDirType+n blocks.)
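The octal type numbers in the table can be modeled as follows. This Python sketch assumes, based only on the listed values, that the low three bits of a type number give the pointer depth and the remaining bits select data, directory, or root; the function describe is hypothetical.

```python
from enum import IntEnum

class BlockType(IntEnum):
    # Octal values taken from the table above.
    VtDataType = 0o00
    VtDirType = 0o10
    VtRootType = 0o20

def describe(t):
    """Return the table's description for an octal block type number."""
    base = t & ~0o7   # assumed: upper bits select the family
    depth = t & 0o7   # assumed: low three bits give the pointer depth
    if base == BlockType.VtRootType and depth == 0:
        return "VtRoot structure"
    if base == BlockType.VtDataType:
        name, leaf = "VtDataType", "data"
    elif base == BlockType.VtDirType:
        name, leaf = "VtDirType", "VtEntry structures"
    else:
        raise ValueError("unknown block type %o" % t)
    if depth == 0:
        return leaf
    below = name if depth == 1 else "%s+%d" % (name, depth - 1)
    return "scores of %s blocks" % below
```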
To avoid storing the same short data blocks padded with
differing numbers of zeros, Venti clients working with fixed-size
blocks conventionally
‘zero truncate’ the blocks before writing them to the server.
For example, if a 1024-byte data block contains the
11-byte string
‘hello world’
followed by 1013 zero bytes,
a client would store only the 11-byte block.
When the client later read the block from the server,
it would append zero bytes to the end as necessary to
reach the expected size.
When truncating pointer blocks,
trailing zero scores are removed
instead of trailing zero bytes.
Because of the truncation convention,
any file consisting entirely of zero bytes,
no matter what its length, will be represented by the zero score:
the data blocks contain all zeros and are thus truncated
to the empty block, and the pointer blocks contain all zero scores
and are thus also truncated to the empty block,
and so on up the hash tree.
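The truncation convention can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that a zero score in a pointer block means the 20-byte score of the zero-length block.

```python
import hashlib

SCORE_SIZE = 20
ZERO_SCORE = hashlib.sha1(b"").digest()  # score of the zero-length block

def truncate_data(block):
    """Drop trailing zero bytes from a data block before writing it."""
    return block.rstrip(b"\x00")

def extend_data(block, size):
    """Pad a block read from the server back to its expected size."""
    return block + b"\x00" * (size - len(block))

def truncate_pointers(block):
    """Drop trailing zero scores from a pointer block before writing it."""
    n = len(block) - len(block) % SCORE_SIZE
    while n >= SCORE_SIZE and block[n - SCORE_SIZE:n] == ZERO_SCORE:
        n -= SCORE_SIZE
    return block[:n]

def extend_pointers(block, nscores):
    """Pad a pointer block read from the server back to nscores scores."""
    missing = nscores - len(block) // SCORE_SIZE
    return block + ZERO_SCORE * max(missing, 0)
```

With these helpers, a data block of all zeros truncates to the empty block, whose score is the zero score; a pointer block holding only zero scores likewise truncates to the empty block, which is why an all-zero file of any length is represented by the zero score.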
A Venti session begins when a client
connects to the network address served by a Venti server;
the conventional address is tcp!$venti!venti (the venti
port is 17034).
Both client and server begin by sending a version
string of the form venti-versions-comment\n.
The versions
field is a list of acceptable versions separated by
colons.
The protocol described here is version 02.
The client is responsible for choosing a common
version and sending it in the VtThello
message, described below.
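A minimal Python sketch of the version exchange follows. It assumes the version string has the shape venti-versions-comment terminated by a newline, with the acceptable versions separated by colons; the function names and the preference order are hypothetical.

```python
def parse_version_line(line):
    """Split a version string into (list of versions, comment).

    Assumes the form 'venti-<versions>-<comment>' followed by a newline,
    with versions separated by colons.
    """
    if not line.endswith("\n"):
        raise ValueError("version string must end with a newline")
    parts = line[:-1].split("-", 2)
    if len(parts) != 3 or parts[0] != "venti":
        raise ValueError("not a Venti version string")
    return parts[1].split(":"), parts[2]

def choose_version(ours, theirs):
    """Return the first version in our preference list the peer also accepts."""
    for v in ours:
        if v in theirs:
            return v
    return None
```

The client would then send the chosen version in its hello request; if choose_version returns None there is no common version and the session cannot proceed.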
After the initial version exchange, the client transmits
requests (T-messages)
to the server, which subsequently returns
replies (R-messages)
to the client.
The combined act of transmitting (receiving) a request
of a particular type, and receiving (transmitting) its reply
is called a transaction
of that type.
Each message consists of a sequence of bytes.
Two-byte fields hold unsigned integers represented
in big-endian order (most significant byte first).
Data items of variable lengths are represented by
a one-byte field specifying a count, n,
followed by n
bytes of data.
Text strings are represented similarly,
using a two-byte count with
the text itself stored as a UTF-encoded sequence
of Unicode characters (see utf(7)).
Text strings are not NUL-terminated: n
counts the bytes of UTF data, which include no final
zero byte.
The NUL
character is illegal in text strings in the Venti protocol.
The maximum string length in Venti is 1024 bytes.
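The encoding rules above can be sketched in Python as follows. The function names are hypothetical; the limits come from the text (a one-byte count for data items, a two-byte big-endian count for strings, no NUL characters, at most 1024 bytes per string).

```python
import struct

MAX_STRING = 1024  # maximum string length in bytes, per the text above

def pack_var(data):
    """Encode a variable-length data item: one-byte count, then the bytes."""
    if len(data) > 255:
        raise ValueError("data item too long for a one-byte count")
    return bytes([len(data)]) + data

def pack_string(text):
    """Encode a text string: two-byte big-endian count, then UTF-8 bytes."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("NUL is illegal in Venti text strings")
    if len(data) > MAX_STRING:
        raise ValueError("string longer than %d bytes" % MAX_STRING)
    return struct.pack(">H", len(data)) + data

def unpack_string(buf, offset=0):
    """Decode a text string at offset; return (text, offset after it)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    data = buf[start:start + n]
    if len(data) != n:
        raise ValueError("buffer too short for string")
    return data.decode("utf-8"), start + n
```

Note that the count covers the encoded bytes rather than the number of characters, so a five-character string with one accented letter occupies six bytes plus the two-byte count.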
Each Venti message begins with a two-byte size field
specifying the length in bytes of the message,
not including the length field itself.
The next byte is the message type, one of the constants
in the enumeration in the include file <venti.h>.
The next byte is an identifying tag,
used to match responses to requests.
The remaining bytes are parameters of different sizes.
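The framing just described (a two-byte size, then a type byte, a tag byte, and the parameters) can be sketched in Python. The function names are hypothetical, and the type numbers used in any call are placeholders; the real values are the constants from the include file.

```python
import struct

def pack_message(mtype, tag, params=b""):
    """Frame a message: size[2], type[1], tag[1], then the parameters.

    The size counts everything after the size field itself.
    """
    body = bytes([mtype, tag]) + params
    if len(body) > 0xFFFF:
        raise ValueError("message too large for a two-byte size")
    return struct.pack(">H", len(body)) + body

def unpack_message(buf):
    """Parse one framed message; return (type, tag, parameter bytes)."""
    if len(buf) < 4:
        raise ValueError("short message")
    (size,) = struct.unpack_from(">H", buf, 0)
    if size < 2 or len(buf) < 2 + size:
        raise ValueError("bad size field")
    return buf[2], buf[3], bytes(buf[4:2 + size])
```

A message with no parameters therefore occupies four bytes on the wire, with a size field of 2.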
In the message descriptions, the number of bytes in a field
is given in brackets after the field name.
The notation parameter[n] where n
is not a constant represents a variable-length parameter:
n[1] followed by n
bytes of data forming the parameter.
The notation string[s]
(using a literal s character)
is shorthand for s[2] followed by s
bytes of UTF-8 text.
The notation parameter[]
where parameter
is the last field in the message represents a
variable-length field that comprises all remaining
bytes in the message.
All Venti RPC messages are prefixed with a field size[2]
giving the length of the message that follows
(not including the size field itself).
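The message bodies are:
    VtThello tag[1] version[s] uid[s] strength[1] crypto[n] codec[n]
    VtRhello tag[1] sid[s] rcrypto[1] rcodec[1]
    VtTping tag[1]
    VtRping tag[1]
    VtTread tag[1] score[20] type[1] pad[1] count[2]
    VtRread tag[1] data[]
    VtTwrite tag[1] type[1] pad[3] data[]
    VtRwrite tag[1] score[20]
    VtTsync tag[1]
    VtRsync tag[1]
    VtRerror tag[1] error[s]
    VtTgoodbye tag[1]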
Each T-message has a one-byte tag
field, chosen and used by the client to identify the message.
The server will echo the request’s tag
field in the reply.
Clients should arrange that no two outstanding
messages have the same tag field so that responses
can be distinguished.
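One way a client might keep tags unique among outstanding requests is a small pool, sketched here in Python. The class TagPool is hypothetical, and it assumes all 256 one-byte values are usable as tags.

```python
class TagPool:
    """Hand out one-byte tags so no two outstanding requests share one."""

    def __init__(self):
        self.free = list(range(256))  # assumed: every byte value is a valid tag
        self.pending = {}             # tag -> request awaiting its reply

    def alloc(self, request):
        """Reserve a tag for a request that is about to be sent."""
        if not self.free:
            raise RuntimeError("too many outstanding requests")
        tag = self.free.pop(0)
        self.pending[tag] = request
        return tag

    def complete(self, tag):
        """Match a reply's echoed tag to its request and release the tag."""
        request = self.pending.pop(tag)
        self.free.append(tag)
        return request
```

A tag becomes reusable only after its reply has arrived, which is what lets replies be matched to requests even when they return out of order.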
The type of an R-message will either be one greater than
the type of the corresponding T-message or VtRerror,
indicating that the request failed.
In the latter case, the error
field contains a string describing the reason for failure.
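The rule for reply types can be checked mechanically, as in this Python sketch. The numeric value given to VT_RERROR is an assumption made for illustration (the authoritative value is in the include file), the function check_reply is hypothetical, and the error string is assumed to be the first parameter, encoded with a two-byte count.

```python
import struct

VT_RERROR = 1  # assumed numeric value, for illustration only

class VentiError(Exception):
    """Raised when the server reports that a request failed."""

def check_reply(t_type, r_type, params):
    """Validate a reply's type against the request that produced it."""
    if r_type == t_type + 1:
        return params
    if r_type == VT_RERROR:
        # The error field is a text string: two-byte count, then UTF-8.
        (n,) = struct.unpack_from(">H", params, 0)
        raise VentiError(params[2:2 + n].decode("utf-8"))
    raise VentiError("unexpected reply type %d" % r_type)
```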
Venti connections must begin with a hello transaction.
The VtThello
message contains the protocol version
that the client has chosen to use.
The fields strength, crypto, and codec
could be used to add authentication, encryption,
and compression to the Venti session
but are currently ignored.
The rcrypto and rcodec
fields in the VtRhello
response are similarly ignored.
The uid and sid
fields are intended to be the identity
of the client and server but, given the lack of
authentication, should be treated only as advisory.
The initial hello
should be the only hello
transaction during the session.
The ping
message has no effect and
is used mainly for debugging.
Servers should respond immediately to pings.
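The read
message requests a block with the given score and type.
Use vttodisktype and vtfromdisktype (see venti(3))
to convert a block type enumeration value (VtDataType, etc.)
to the type used on disk and in the protocol.
The count
field specifies the maximum expected size
of the block.
The data
in the reply is the block’s contents.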
The write
message writes a new block of the given type
with contents data
to the server.
The response includes the score
to use to read the block,
which should be the SHA1 hash of data.
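The read and write semantics can be modeled with a toy in-memory server, sketched in Python below. This is not how a real Venti server is implemented: ToyVenti and checked_write are hypothetical names, blocks are keyed by the pair of score and type as a simplification, and 56 kilobytes is taken to mean 56*1024 bytes.

```python
import hashlib

class ToyVenti:
    """In-memory stand-in for a Venti server (illustration only)."""

    MAX_BLOCK = 56 * 1024  # assumed reading of the 56-kilobyte limit

    def __init__(self):
        self.blocks = {}  # (score, type) -> contents

    def write(self, btype, data):
        """Store a block and return the score to use to read it."""
        if len(data) > self.MAX_BLOCK:
            raise ValueError("block too large")
        s = hashlib.sha1(data).digest()
        self.blocks.setdefault((s, btype), data)  # write-once, coalesced
        return s

    def read(self, score, btype, count):
        """Return a block's contents; count is the maximum expected size."""
        data = self.blocks[(score, btype)]
        if len(data) > count:
            raise ValueError("block larger than count")
        return data

def checked_write(server, btype, data):
    """Write a block and verify that the returned score matches the data."""
    s = server.write(btype, data)
    if s != hashlib.sha1(data).digest():
        raise ValueError("server returned an unexpected score")
    return s
```

Because the score is derived from the contents, a client can always verify both a write reply and the data returned by a read without trusting the server.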
The Venti server may buffer written blocks in memory,
waiting until after responding to the write
message before writing them to
permanent storage.
The server will delay the response to a sync
message until after all blocks in earlier write
messages have been written to permanent storage.
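The relationship between buffered writes and sync can be pictured with another toy model in Python. BufferingServer is a hypothetical class; the dictionary named durable merely stands in for permanent storage.

```python
import hashlib

class BufferingServer:
    """Toy model of write buffering and sync (illustration only)."""

    def __init__(self):
        self.pending = []  # blocks acknowledged but not yet durable
        self.durable = {}  # stands in for permanent storage

    def write(self, data):
        """Acknowledge a write immediately; the block may still be in memory."""
        s = hashlib.sha1(data).digest()
        self.pending.append((s, data))
        return s

    def sync(self):
        """Return only after every earlier write has reached storage."""
        for s, data in self.pending:
            self.durable.setdefault(s, data)
        self.pending.clear()
```

In this model a client that needs durability issues a sync after its last write and treats the data as safely stored only once the sync reply arrives.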
The goodbye
message ends a session. There is no VtRgoodbye:
upon receiving the VtTgoodbye
message, the server terminates the connection.
Sean Quinlan and Sean Dorward,
“Venti: a new approach to archival storage”,
Usenix Conference on File and Storage Technologies, 2002.