seeks and dbm

Chris Torek chris at umcp-cs.UUCP
Tue May 20 06:35:25 AEST 1986


In article <297 at maynard.UUCP> campbell at maynard.UUCP (Larry Campbell) writes:
>... the manual describes the [lseek] offset as a long (NOT a "simple
>integer" nor a "magic cookie").  Second, if the offsets AREN'T integers,
>then almost any database library or package won't work (like dbm(3)),
>because they often hash keys into offsets in the index file.

It is true that the current dbm code assumes that lseek offsets
are simple `long's; but it would be trivial to get dbm to run on
a system where this were not the case, as long as there were a
simple mapping from an integer index to a `block' of some fixed
size.  The dbm scheme (`extensible hashing') is to convert a string
(`datum') into a 32 bit hash index, then to start with no bits of
the hash, and increase the number of bits used until it has reached
a `leaf' value for that hash index.  There is a bit map of size
2^{max-n-bits-ever-used}-1 that has a bit set for each value that
is no longer a leaf.  When you reach a leaf, the datum is either
in the corresponding block, or not present in the database.

The only really tricky thing is storing a datum, because if a
leaf block fills up, it must be turned into a non-leaf block.
If the hash values are `good', approximately half of the objects
in the block will move to a new leaf (the other half will stay
because they have a zero in the next bit of their hash).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at mimsy.umd.edu



More information about the Comp.unix.wizards mailing list