Hashing (?) 8-)

Sun Sep 11 04:09:16 AEST 1988

Michael Steiner writes:
>I don't know what hashing is. Could someone tell me a little about what it is.

I remember when hashing was first explained to me at university...

Lecturer: As you are all aware searching for an item in a linear list
  is an order N operation.

Students: Groan.  Kids stuff.

Lecturer: I am going to discuss today a method of storing and retrieving
  information that is much faster than this.  Storing and retrieving an
  item from a binary tree can be done as a order log N operation.

Captive Audience: Not trees again!

Lecturer: The method I will discuss can be achieved with order one
  operations.

Audience (agog): What? In your dreams perhaps!

Lecturer: First we select a field from the item to be stored, it
  doesn't really matter what the data type of the field is, but for
  the purposes of this discussion it will be an integer...

Audience: (cries of 'Rigged!')

Lecturer: ...then we use it as a seed for a pseudo uniform random
  number generator.  The random number that is generated is used
  as an index into a table where the item is then stored.  Providing
  the index fields are different for each item then the items will be
  stored evenly throughout the table.

Students: What's the use of that?  The data is now stored at some random
  spot in the table and it will need an order N search to get to it
  again.

Lecturer: To access an item in order 1 you merely give its index
  field to the random number generator, which returns the same index
  returned when you originally stored the item.  You then look up the
  table using the index.

Students: (getting very excitable) What happens if the table fills up?
  What if the random number generator returns the same value for two
  different index fields? Where does hashing come into this?
  What a stupid idea.

Lecturer: This whole process is called 'hashing'.  The table is called
  a 'hash table' and the random number generator is called a 'hash
  function', other types of strategies can be used to generate hash
  numbers.  Obviously it is going to be hard to design a hash function
  that will generate completely uniform hash numbers across the hash
  table.  At some stage an item will be hashed to a spot that is already
  used.  This is called a 'collision' and a number of strategies can
  be used for 'collision resolution'.
      If you are sure that there are more empty entries in the table than
  items to be stored you can keep re-hashing the index field until you
  reach an empty spot in the table.
      If it is possible to completely fill up the table then you can
  store items in a linear list at the table entry they hash to.  So
  looking up an entry would involve hashing the index field, then
  doing a linear search for the correct item in the list stored at
  that table entry.  You are not limited to using a linear list either,
  you can implement the storage for items that have collided as a
  binary tree, or even as another hash table.

Students: Unbelievable.  What will they think of next?
-- 
{lsuc,utzoo,utcsri}!hcr!hcrvax!stacey Stacey Campbell, HCR Corporation,
130 Bloor St W, Toronto, Ontario, Canada. +1 416 922 1937 X48