Appendix A: Inside The Almond Doubler

The key feature that sets the Almond Doubler apart from other file compression products is that it decompresses files automatically as soon as a user accesses the data. Most other compression/expansion software products force the user to first run a special decompression program on the compressed data file (e.g., "gunzip" or "uncompress"), and only then access the data. The Almond Doubler performs this first step with no user intervention; the mere act of attempting to look at or change the data in the file is all it takes for the Almond Doubler to expand the compressed file.

This special ability of the Almond Doubler is called "transparent decompression", or "on-the-fly decompression". It is a feature that is commonly found in PC and Macintosh disk doublers (e.g. Stacker), but is rarely found in UNIX products. Some UNIX disk doubler programs have created a transparent decompression capability by actually modifying the disk device drivers or file system, and intercepting all disk access requests in order to see if the request pertains to a compressed file. Although this approach "works", it is not a desirable method, because:

* it requires the actual modification of device drivers, which is an inherently risky thing to do

* it requires different software for each different disk device

* it requires intimate knowledge of kernel programming for each version of Unix, and is therefore not portable

The Almond Doubler uses a safer and more portable method for achieving transparent decompression, with these important goals:

1. The Almond Doubler software must be portable to all Unix platforms.

2. The Almond Doubler software must run at the user level, and not use any special kernel modifications.

3. The Almond Doubler software must not modify any device drivers.

4. All files must have positive verification of correct compression prior to removal of the original uncompressed version.

5. The software must gracefully handle the situation where disks fill up and there is no more disk space.

How did we achieve all these goals? We relied upon a UNIX standard: the NFS protocol, which is built on Remote Procedure Calls (RPC). NFS allows different machines to communicate with each other through UNIX standard Remote Procedure Calls. When a file system exported by one NFS host machine is mounted on another machine, a user can view the files on the remote host "transparently." From the user's perspective, the remote files look as if they are actually on the local disk.

This is exactly the model that we wanted the Almond Doubler to use: users should be able to access the compressed data "transparently". We achieved this by creating our own NFS server program within the Almond Doubler software. This server program is called "dblrd", and is launched whenever the Almond Doubler is started. A second program, called "dblrc", is the compression daemon program. The compression daemon finds files that need to be compressed, based on the parameters established by the system administrator. Once a file is found that needs to be compressed, the compression daemon creates a new compressed version of that data file. It then sends the compressed data to the "dblrd" daemon using the built-in NFS capabilities of UNIX operating systems.
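The file-finding step of the compression daemon can be pictured with a short sketch. This is not the "dblrc" source; it is a minimal illustration that assumes two administrator parameters, a minimum file size and a minimum idle time, and the names find_candidates, min_size and min_idle_seconds are hypothetical.

```python
import os
import stat
import time


def find_candidates(root, min_size, min_idle_seconds, now=None):
    """Yield regular files under root that are big enough and idle long enough.

    min_size and min_idle_seconds stand in for the administrator's
    parameters; both names are assumptions made for this sketch.
    """
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect the link itself, not its target
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symbolic links and special files
            if st.st_size < min_size:
                continue  # too small to be worth compressing
            if now - st.st_atime < min_idle_seconds:
                continue  # accessed too recently
            yield path
```

Using lstat rather than stat means that symbolic links are never followed, so a file that has already been replaced by a link is not picked up a second time.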

For all intents and purposes, the "dblrd" NFS server looks and acts like a remote NFS host machine. It stores the compressed data in its own database, and sends a message back to the compression daemon that the compressed file is safely tucked away. The compression daemon then removes the original (uncompressed) data, and creates a link to the compressed data file. This link has a pathname which specifies the mounted remote NFS host "dblrd" program as the file system on which the data now resides.
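The order of operations described above (store, confirm, and only then remove the original) can be sketched as follows. This is an illustration under stated assumptions, not the product's code: a plain local directory stands in for the "dblrd" database, zlib stands in for the actual compression method, and the function name compress_and_link is hypothetical. Because the stand-in store is an ordinary directory, reading through the link here returns compressed bytes; in the real product the NFS server decompresses on access.

```python
import os
import zlib


def compress_and_link(path, store_dir):
    """Compress path into store_dir, verify it, then replace path with a link.

    Returns the stored pathname, or None if the compressed copy could not
    be written or verified (the original file is then left untouched).
    """
    with open(path, "rb") as f:
        original = f.read()

    # Mirror the file's absolute pathname under the store directory.
    stored = os.path.join(os.path.abspath(store_dir),
                          os.path.abspath(path).lstrip(os.sep))
    os.makedirs(os.path.dirname(stored), exist_ok=True)

    try:
        with open(stored, "wb") as f:
            f.write(zlib.compress(original))
            f.flush()
            os.fsync(f.fileno())
        # Positive verification: read back, decompress, compare.
        with open(stored, "rb") as f:
            verified = zlib.decompress(f.read()) == original
    except (OSError, zlib.error):
        verified = False  # e.g. the disk filled up during the write

    if not verified:
        if os.path.exists(stored):
            os.remove(stored)  # discard the partial or bad copy
        return None

    os.remove(path)            # only now is the original removed
    os.symlink(stored, path)   # the link takes the original's place
    return stored
```

If the write fails, for example because the disk is full, the sketch discards the partial copy and leaves the original file in place.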

Whenever a user accesses the data in the compressed file (for example, the user may open the file for editing), the UNIX operating system realizes that the data is resident on a remote NFS host machine. The kernel generates various NFS requests and sends them to the "dblrd" NFS server program. The "dblrd" daemon, upon receiving requests for data, decompresses the file and returns the original (uncompressed) data to the user. After the user is no longer accessing the data in the file, the Almond Doubler daemon leaves the data uncompressed in its original location. In this manner, files that are accessed frequently will be left uncompressed (to avoid thrashing the NFS server); files that are not being accessed are compressed until they are needed by a user.
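The "leave the data uncompressed in its original location" step can be pictured with the sketch below. It is an assumption-laden illustration rather than the "dblrd" implementation: the compressed data is taken to be a zlib-compressed file that the link points at, the function name restore is hypothetical, and discarding the stored copy afterwards is also an assumption.

```python
import os
import zlib


def restore(link_path):
    """Replace a link to compressed data with the uncompressed file itself."""
    stored = os.readlink(link_path)
    if not os.path.isabs(stored):
        # A relative link target is relative to the link's own directory.
        stored = os.path.join(os.path.dirname(link_path), stored)

    with open(stored, "rb") as f:
        data = zlib.decompress(f.read())

    # Write the data next to the link, then rename over it. rename()
    # replaces the link itself, never the file the link points at.
    tmp = link_path + ".restore"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, link_path)

    os.remove(stored)  # assumption: the compressed copy is no longer kept
    return len(data)
```

Writing to a temporary name and renaming means that the pathname always refers to either the link or the complete uncompressed file, never to a half-written one.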

Users can tell if a particular file has been compressed by The Almond Doubler. If a file has been compressed, the user will see a symbolic (soft) link to the file which includes the pathname of the NFS dblrd server. This pathname is always "/XXX/dblr", where /XXX is the installation directory for The Almond Doubler. If there is no symbolic link, then the file is not compressed.
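The same check can be made from a script. The sketch below is one way to do it; the function name is_doubled is hypothetical, and install_dir is whatever directory the Almond Doubler was installed in (the /XXX above).

```python
import os


def is_doubled(path, install_dir):
    """Return True if path is a symbolic link into install_dir/dblr."""
    if not os.path.islink(path):
        return False  # no symbolic link, so the file is not compressed
    prefix = os.path.join(install_dir, "dblr")
    target = os.readlink(path)
    # Match the prefix on a whole path component, not just a string prefix.
    return target == prefix or target.startswith(prefix + os.sep)
```

For an installation directory of "/Doubler", a link to "/Doubler/dblr/home/bigfile" is reported as compressed, while a regular file or a link to some unrelated location is not.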

Let's say that there are two files in the directory /home: "bigfile" and "smallfile". "bigfile" was automatically compressed by the Almond Doubler. "smallfile" was not compressed (because its size was smaller than the parameter set for the smallest file allowed for compression). Here is what the "ls -l" command shows for the /home directory:

myhost$ ls -l
lrwxrwxrwx 1 mfd staff 47 Mar 28 08:16 bigfile -> /Doubler/dblr/home/bigfile
-rw-r--r-- 1 mfd staff 970 Mar 27 08:14 smallfile

Note that the first part of the pathname "/Doubler/dblr" for bigfile specifies the NFS host machine for the Almond Doubler's NFS server program. The rest of the pathname is the normal pathname for the file.

If the user then issues a command which accesses the data in bigfile, e.g. "cat bigfile", the file is automatically decompressed and the data in bigfile is returned to the user. Here is what the "ls -l" command shows for the /home directory after "bigfile" has been accessed:

myhost$ ls -l
-rw-r--r-- 1 mfd staff 6283 Mar 27 08:16 bigfile
-rw-r--r-- 1 mfd staff 970 Mar 27 08:14 smallfile

Note both that the size of "bigfile" has increased (to its original size) and that the symbolic link has disappeared. The data is no longer compressed.