Discussion of "dbm" data base system

Jerry Leichter leichter at yale-com.UUCP
Tue Jan 17 06:41:40 AEST 1984


The algorithm used in the "dbm" system was described in TODS (Transactions on
Database Systems) about 4 years ago, in an article titled "Extendible Hashing"
by a couple of people at IBM (whose names will not come back to me now).
It's a beautiful algorithm, and sparked a series of other articles on hashing
schemes that can grow and shrink their hash tables dynamically.  The TODS
article contains some theoretical and simulation results that indicate that
asymptotically, extendible hashing and B-trees require about the same amount
of space, on average.  Extendible hashing has the big advantage that it takes
at most 2 disk accesses to get to any data item, independent of the size of
the data-base; B-trees have logarithmically-growing access times, and typically
require 3 or 4 accesses for real-life, large databases.
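
To make the two-access claim concrete, here is a minimal sketch (in C) of
the lookup step, assuming an in-memory directory of 2^global_depth bucket
pointers and one bucket per disk block; the names and layout here are
illustrative, not dbm's actual code.

    #define BUCKET_SIZE 4           /* records per disk block (toy value) */

    typedef struct {
        unsigned hash;              /* full hash of the key */
        long     datum;             /* stands in for the stored record */
    } Record;

    typedef struct {
        int    local_depth;         /* hash bits this bucket discriminates on */
        int    count;
        Record recs[BUCKET_SIZE];
    } Bucket;

    typedef struct {
        int      global_depth;      /* directory has 2^global_depth entries */
        Bucket **dir;               /* several entries may point at one bucket */
    } Directory;

    /* Index the directory by the low global_depth bits of the hash, then
     * follow the single pointer to the data bucket.  One access reads the
     * directory page (if it is not cached), one reads the bucket. */
    Bucket *lookup_bucket(const Directory *d, unsigned hash)
    {
        unsigned index = hash & ((1u << d->global_depth) - 1);
        return d->dir[index];
    }

When a bucket overflows and its local depth equals the global depth, the
directory doubles; otherwise only that one bucket splits, which is what lets
the table grow without rehashing everything.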

The problem of blocks filling up with records that have identical hash keys
is an exact analogue of the B-tree problem of blocks filling up with
identically keyed records.  The solution is the same in both cases:  You "logically
extend" the leaf block with a continuation pointer.  This costs you at
least one extra data access, so you hope it is rare.
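
For concreteness, here is a rough sketch of that continuation-pointer scheme
in C; the structure and names are illustrative guesses, not dbm's actual
block layout.

    #include <string.h>

    #define BUCKET_SIZE 4               /* records per leaf block (toy value) */

    typedef struct Bucket {
        int   count;
        char *keys[BUCKET_SIZE];
        char *vals[BUCKET_SIZE];
        struct Bucket *next;            /* continuation block; NULL almost always */
    } Bucket;

    /* Walk the (usually empty) continuation chain.  Each link followed
     * costs one extra disk access, so the split policy should keep chains
     * at length zero in all but pathological cases. */
    const char *bucket_find(const Bucket *b, const char *key)
    {
        for (; b != NULL; b = b->next)
            for (int i = 0; i < b->count; i++)
                if (strcmp(b->keys[i], key) == 0)
                    return b->vals[i];
        return NULL;
    }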

It can be shown that for large numbers of bits in the hash, the chance of this
happening is very small; of course, the code to handle it MUST be there;
dbm is just a poor implementation.

I don't know the origin of the dbm "magic hash function", but if the author
had followed the advice of the authors of the "Extendible Hashing" paper, he would
have read a paper on "Universal Classes of Hash Functions", by Carter and
Wegman, which shows how to construct excellent hash functions "almost all the
time".

If you want further information, like the exact references, write.
							-- Jerry
					decvax!yale-comix!leichter leichter at yale


