15.4. Open hashing¶
One collision avoidance strategy is separate chaining. In separate chaining the hash table is implemented as an array of variable sized containers that can hold however many elements that have actually collided at each array location.
A linked list is a typical choices for implementing the chain, which is where the term “chaining” actually originates.
When the ADT is a map, the process is similar. In a map ADT, the value hashed is the map key, since this is what uniquely identifies map items.
Each bucket provides access to one or more map entries (key-value pairs).
The linked lists allows the hash table to be dynamically sized, and each array element is its own bucket.
A set provides a simple demonstration of the capabilities of a hashed data structure. Recall that set defines a container that stores unique items.
The template variables for a hash set defines the type
of data stored in the set: the Key
.
This hash table will be fixed size, so we denote that with
the non-type template parameter N
.
The Comparator allows the template to accept a function
used to find items in the chain.
The default is equal, but another
binary predicate can be substituted.
template <class Key,
size_t N,
class Comparator=std::equal_to<Key>>
class hash_set
{
public:
using Container = std::list<Key>;
using value_type = Key;
using key_type = Key;
using iterator = typename Container::iterator;
using const_iterator = const iterator;
hash_set () = default;
private:
std::array<Container, N> buckets;
Comparator compare;
int sz = 0;
};
Finding anything in a hash table using separate chaining is a two step process. Consider the following hash table:
How does the software find the value 34
in this data structure?
First we need to find the right bucket.
The hash
override is used to compute the bucket.
In this case the bucket is at index position 8
.
Note we use std::hash<>
here.
Any Key
type stored in this set must override std::hash
.
private:
Container& find_bucket (const Key& value)
{
return buckets[std::hash<Key>()(value) % N];
}
Next, we search through the list stored in that bucket
looking for a specific value.
Each element in the list stored in the bucket is evaluated using
operator==
- the default for std::equal_to
.
As soon as operator==
evaluates to true
the value is returned.
iterator find (const Key& value)
{
Container& b = find_bucket(value);
return find_if(b.begin(), b.end(),
[this, &value](Key x) {
return compare(x, value);
});
}
It should be clear that the performance of this data structure is
highly dependent upon the quality of the hash function.
Always returning 42
is a legitimate value for a hash,
but an extremely poor one,
because your hash table is no better than a linked list.
Insert is similar to find.
We use find_bucket
to get the correct array element,
if it exists.
The we search to see if the value already exists in the linked list.
If it does, replace the existing value with the new one.
Otherwise, we add it to the list.
void insert (const Key& value)
{
Container& b = find_bucket(value);
iterator pos =
find_if(b.begin(), b.end(),
[this, &value](Key x) { return compare(x, value); });
if (pos == b.end()) {
b.push_back(value);
++sz;
}
else {
*pos = value;
}
}
Erase is similar to insert.
Find the bucket
Search for the value
If you find it, erase it.
Otherwise, do nothing.
void erase (const Key& value)
{
Container& b = find_bucket(value);
iterator pos =
find_if(b.begin(), b.end(),
[this, &value](Key x) { return compare(x, value); });
if (pos != b.end()) {
b.erase(pos);
--sz;
}
}
More to Explore
The content on this page was adapted from Resolving Collisions, by Steven J. Zeil for his data structures course CS361.