XLattice's Crypto Facilities
What follows is a summary description; it will be improved on as the
crypto API stabilizes. Much more information is available from the
Javadoc descriptions, but probably the most useful documentation is
the suite of tests. As with all XLattice software components, unit
tests are to be found in a parallel directory structure. In this
contains the software being tested and
and subdirectories the corresponding unit tests. The software is compiled and tested by running
For the near future much of XLattice's crypto component will be just a thin wrapper around Java 1.4's crypto functionality as documented in
It is likely that much of this functionality will be replaced by equivalent XLattice code, but this will probably have to wait until XLattice has servers outside of the United States.
SHA-1 Hash Generation
The Secure Hash Algorithm (SHA) maps a byte array of arbitrary size into a small fixed-size byte array, a hash of the input. For SHA-1 the hash is 160 bits (20 bytes) long.
In XLattice terminology, the 20-byte SHA1 hash of a data file's contents is referred to as a content hash or content key. Because the chance of a collision is so incredibly small (approximately 0.7E-48), the content key is for all practical purposes unique to a given document and therefore can be used as a unique identifier.
Bloom Filters using SHA1 Keys
The principal application of XLattice's Bloom filters is in cheaply determining whether an item with a given data key is on another node. Each such item (file) has a 20-byte key. This is most often an SHA1 hash of the file's contents. Rather than sending one another lists of keys, nodes can exchange Bloom filters which can be used to test whether a given file is present. The test never gives a false negative, but sometimes may give a false positive: it indicates that a key is present when it is not. The false positive rate decreases with increasing filtersize.
A Bloom filter uses a set of k hash functions to determine set membership. Each hash function produces a value in the range 0..M-1. The filter is of size M bits. To add a member (the 20-byte key) to the set, apply each function to the new member and set the corresponding bit in the filter. For M very large relative to k, this will normally set k bits in the filter; that is, it is very improbable that two or more functions will set the same bit.
To check whether x is a member of the set, apply each of the k hash functions to x and check whether the corresponding bits are set in the filter. If any are not set, x is definitely not a member. If all are set, x may be a member. The probability of error (the false positive rate) is f = (1 - e^(-kN/M))^k, where N is the number of set members.
The XLattice class takes advantage of the fact that SHA1 digests are good- quality pseudo-random numbers. The k hash functions are the values of distinct sets of bits taken from the 20-byte SHA1 hash. The number of bits in the filter, M, is constrained to be a power of 2; M == 2^m. The number of bits in each hash function may not exceed floor(m/k).
There are actually two XLattice Bloom filter classes. BloomSHA1 is a conventional filter and the type best used for summarizing contents to neighboring nodes. CountingBloom is a counting Bloom filter, one in which a four-bit counter is associated with each filter bit. With very high confidence this can be used to both add and subtract members. Typically a node will keep a counting Bloom filter to track its contents and then compress the filter to the non-counting form for copying to neighbors. This is much cheaper than generating a new BloomSHA1 on each neighbor request.
The Bloom filters are *not* part of Sun's Java crypto functionality.
RSA Keys and Pair Generation
The only digital signature algorithm currently supported by XLattice is SHA1withRSA.
A SignedList is typically a dated and digitally signed list of documents. The SignedList itself has a unique identifier based on SHA1. This signed key is not a content hash but a function of the public part of the RSA key used to sign the list and the title of the document.
The digital signature covers the entire document,
- the RSA public key, base64-encoded
- the title
- the timestamp in YY-MM-DD HH:MM:SS format
- and the content lines themselves
To make it easier for a human to read, the String version of the SignedList precedes and follows the content lines with delimiter lines.
Because the SignedList has a unique 20-byte identifier, it can be stored and retrieved in exactly the same way as other documents, using its signed hash as a key.
The `SignedList` and its subclasses are *not* part of the Sun Java libraries.