15.4. Open hashing

One collision avoidance strategy is separate chaining. In separate chaining the hash table is implemented as an array of variable sized containers that can hold however many elements that have actually collided at each array location.

A linked list is a typical choices for implementing the chain, which is where the term “chaining” actually originates.

Fruit set

When the ADT is a map, the process is similar. In a map ADT, the value hashed is the map key, since this is what uniquely identifies map items.

Each bucket provides access to one or more map entries (key-value pairs).

Fruit inventory map

The linked lists allows the hash table to be dynamically sized, and each array element is its own bucket.

A set provides a simple demonstration of the capabilities of a hashed data structure. Recall that set defines a container that stores unique items.

The template variables for a hash set defines the type of data stored in the set: the Key. This hash table will be fixed size, so we denote that with the non-type template parameter N. The Comparator allows the template to accept a function used to find items in the chain. The default is equal, but another binary predicate can be substituted.

template <class Key,
          size_t N,
          class Comparator=std::equal_to<Key>>
class hash_set
{
  public:
    using Container = std::list<Key>;
    using value_type = Key;
    using key_type   = Key;
    using iterator   = typename Container::iterator;
    using const_iterator  = const iterator;

    hash_set () = default;

  private:
    std::array<Container, N> buckets;
    Comparator compare;
    int sz = 0;

};

Finding anything in a hash table using separate chaining is a two step process. Consider the following hash table:

Simple hash table

How does the software find the value 34 in this data structure?

First we need to find the right bucket. The hash override is used to compute the bucket. In this case the bucket is at index position 8.

Note we use std::hash<> here. Any Key type stored in this set must override std::hash.

private:
   Container& find_bucket (const Key& value)
   {
     return buckets[std::hash<Key>()(value) % N];
   }

Next, we search through the list stored in that bucket looking for a specific value. Each element in the list stored in the bucket is evaluated using operator== - the default for std::equal_to. As soon as operator== evaluates to true the value is returned.

iterator find (const Key& value)
{
  Container& b = find_bucket(value);
  return find_if(b.begin(), b.end(),
           [this, &value](Key x) {
             return compare(x, value);
           });
}

It should be clear that the performance of this data structure is highly dependent upon the quality of the hash function. Always returning 42 is a legitimate value for a hash, but an extremely poor one, because your hash table is no better than a linked list.

Insert is similar to find. We use find_bucket to get the correct array element, if it exists. The we search to see if the value already exists in the linked list. If it does, replace the existing value with the new one. Otherwise, we add it to the list.

void insert (const Key& value)
{
  Container& b = find_bucket(value);
  iterator pos =
    find_if(b.begin(), b.end(),
        [this, &value](Key x) { return compare(x, value); });
  if (pos == b.end()) {
    b.push_back(value);
    ++sz;
  }
  else {
    *pos = value;
  }
}

Erase is similar to insert.

  1. Find the bucket

  2. Search for the value

  3. If you find it, erase it.

    Otherwise, do nothing.

void erase (const Key& value)
{
Container& b = find_bucket(value);
iterator pos =
  find_if(b.begin(), b.end(),
      [this, &value](Key x) { return compare(x, value); });
if (pos != b.end()) {
  b.erase(pos);
  --sz;
  }
}

More to Explore

  • The content on this page was adapted from Resolving Collisions, by Steven J. Zeil for his data structures course CS361.

You have attempted of activities on this page