Explain in simple terms how Hash Tables are implemented?

VietMX Staff asked 3 years ago

Hash tables are often implemented as arrays of linked lists. If we imagine a table storing people’s names, after a few insertions it might be laid out in memory as below, where numbers in parentheses, like (42), are hash values of the name.

bucket#  bucket content / linked list

[0]      --> "sue"(780) --> null
[1]      null
[2]      --> "fred"(42) --> "bill"(9282) --> "jane"(42) --> null
[3]      --> "mary"(73) --> null
[4]      null
[5]      --> "masayuki"(75) --> "sarwar"(105) --> null
[6]      --> "margaret"(2626) --> null
[7]      null
[8]      --> "bob"(308) --> null
[9]      null

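The bucket assignments in the diagram can be checked with a quick sketch (the hash values here are the ones shown above, not the ones Python's built-in `hash()` would produce):

```python
# Hash values as shown in the diagram (illustrative, not Python's hash()).
hashes = {"sue": 780, "fred": 42, "bill": 9282, "jane": 42,
          "mary": 73, "masayuki": 75, "sarwar": 105,
          "margaret": 2626, "bob": 308}

NUM_BUCKETS = 10  # the diagram has buckets [0]..[9]

for name, h in hashes.items():
    # Each value goes into bucket [hash % number_of_buckets].
    print(f'"{name}" ({h}) -> bucket [{h % NUM_BUCKETS}]')
```

Running this reproduces the layout above: "fred" (42), "bill" (9282), and "jane" (42) all land in bucket [2], while buckets [1], [4], [7], and [9] stay empty.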
A few points:

  • each of the array entries (indices [0][1]…) is known as a bucket, and starts a – possibly empty – linked list of values (aka elements, in this example – people’s names)
  • each value (e.g. "fred" with hash 42) is linked from bucket [hash % number_of_buckets], e.g. 42 % 10 == [2]; % is the modulo operator – the remainder when divided by the number of buckets
  • multiple values may collide at, and be linked from, the same bucket – most often because their hash values collide after the modulo operation (e.g. 42 % 10 == [2], and 9282 % 10 == [2]), but occasionally because the hash values themselves are the same (e.g. "fred" and "jane" are both shown with hash 42 above)
    • most hash tables handle collisions – with slightly reduced performance but no functional confusion – by comparing the full value (here text) of a value being sought or inserted to each value already in the linked list at the hashed-to bucket
  • as the number of stored values grows, hash tables implemented as above tend to resize themselves (i.e. create a bigger array of buckets, build new/updated linked lists from the old entries, then delete the old array) to keep the ratio of values to buckets (aka the load factor) somewhere in the 0.5 to 1.0 range
  • the buckets don’t necessarily have to be linked lists or arrays – they can be any container type, such as a balanced BST, which gives O(log n) worst-case lookups within a bucket. Even so, it’s important to choose a good hash function, to avoid funnelling too many elements into any one bucket
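As a rough sketch of how the points above fit together – bucketing by hash % number_of_buckets, comparing full keys to resolve collisions, and resizing when the load factor gets too high – here is a minimal separate-chaining table (the names `HashTable`, `put`, `get`, and `_resize` are my own, not from any particular library):

```python
class HashTable:
    """Minimal separate-chaining hash table (illustrative sketch only)."""

    def __init__(self, num_buckets=8):
        # Each bucket is a Python list standing in for a linked list
        # of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0  # number of stored values

    def _index(self, key):
        # Map the key's hash to a bucket: hash % number_of_buckets.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # full-key comparison handles collisions
                bucket[i] = (key, value) # key already present: overwrite
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > 1.0:  # load factor above 1.0
            self._resize()

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self):
        # Create a bigger bucket array and re-link every entry into it.
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for key, value in bucket:
                self.buckets[self._index(key)].append((key, value))
```

Usage is the usual put/get shape: `t = HashTable(); t.put("fred", 42); t.get("fred")` returns 42. Note that `_resize` must recompute every bucket index, because hash % number_of_buckets changes when the number of buckets changes.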