Hashing Data Structure
Definition and Concepts
Hashing is a technique used to map data of arbitrary size to fixed-size values, typically for efficient storage and retrieval in a data structure called a hash table. A hash table uses a hash function to compute an index (or hash code) for each key, which determines where the key-value pair is stored in an array. The goal is to enable fast access to data using keys, such as strings or numbers. You can visualize a hash table as a set of labeled drawers, where each label (key) quickly directs you to the corresponding drawer (value). Collisions, where multiple keys map to the same index, are resolved using techniques like separate chaining (using linked lists) or open addressing (using probing). Hashing is fundamental for applications requiring quick lookups, insertions, and deletions.
Why Use It?
Hashing is used to achieve fast data retrieval, insertion, and deletion with an average time complexity of O(1). It is ideal for scenarios where you need to associate keys with values and access them efficiently without iterating through the entire dataset. Hash tables are versatile and widely used in databases, caches, and other systems requiring rapid data access.
Where to Use? (Real-Life Examples)
- Database Indexing: Databases use hashing to index records, allowing quick retrieval of data based on keys like user IDs.
- Caches: Web browsers and content delivery networks use hash tables to cache web pages, mapping URLs to cached content for fast access.
- Symbol Tables in Compilers: Compilers use hash tables to store variable names and their attributes during code compilation.
- Password Storage: Systems store hashed passwords to verify user credentials quickly without storing plain text.
Explain Operations
- Insert: This operation adds a key-value pair to the hash table. The hash function computes an index for the key, and the pair is stored at that index (or in a linked list if a collision occurs). It has an average time complexity of O(1).
- Lookup: This operation retrieves the value associated with a given key by computing its hash and checking the corresponding index. It has an average time complexity of O(1).
- Delete: This operation removes a key-value pair by finding the key’s index and removing it from the bucket (or linked list). It has an average time complexity of O(1).
- isEmpty: This operation checks whether the hash table is empty. It has a time complexity of O(1).
- Size: This operation returns the number of key-value pairs in the hash table. It has a time complexity of O(1).
Java Implementation
The following Java code implements a hash table using separate chaining (linked lists for collision resolution).
public class HashTable {
private static class Node {
String key; // Key for the hash table entry
int value; // Value associated with the key
Node next; // Reference to the next node in case of collision
Node(String key, int value) {
this.key = key;
this.value = value;
this.next = null;
}
}
private Node[] buckets; // Array of linked lists for storing key-value pairs
private int capacity; // Size of the buckets array
private int size; // Number of key-value pairs in the hash table
// Constructor to initialize the hash table with a given capacity
public HashTable(int capacity) {
this.capacity = capacity;
this.buckets = new Node[capacity];
this.size = 0;
}
// Hash function to compute index for a key
private int hash(String key) {
return Math.abs(key.hashCode() % capacity); // Ensures non-negative index
}
// Insert: Adds a key-value pair to the hash table
public void insert(String key, int value) {
int index = hash(key); // Computes the index for the key
if (buckets[index] == null) { // If bucket is empty, create new node
buckets[index] = new Node(key, value);
size++;
return;
}
Node current = buckets[index];
while (current != null) { // Traverses the linked list
if (current.key.equals(key)) { // Updates value if key exists
current.value = value;
return;
}
if (current.next == null) break; // Reaches the end of the list
current = current.next;
}
current.next = new Node(key, value); // Adds new node at the end
size++;
}
// Lookup: Retrieves the value for a given key
public int lookup(String key) {
int index = hash(key); // Computes the index for the key
Node current = buckets[index];
while (current != null) { // Traverses the linked list
if (current.key.equals(key)) {
return current.value; // Returns value if key is found
}
current = current.next;
}
throw new IllegalArgumentException("Key not found in the hash table.");
}
// Delete: Removes a key-value pair from the hash table
public void delete(String key) {
int index = hash(key); // Computes the index for the key
Node current = buckets[index];
Node prev = null;
while (current != null) { // Traverses the linked list
if (current.key.equals(key)) {
if (prev == null) { // If key is at the head
buckets[index] = current.next;
} else {
prev.next = current.next; // Bypasses the node
}
size--;
return;
}
prev = current;
current = current.next;
}
throw new IllegalArgumentException("Key not found in the hash table.");
}
// isEmpty: Checks if the hash table is empty
public boolean isEmpty() {
return size == 0;
}
// Size: Returns the number of key-value pairs in the hash table
public int size() {
return size;
}
}
How It Works
- Initialization: The constructor initializes an array of
Nodeobjects (buckets) with a givencapacityand setssizeto 0, indicating an empty hash table. - Hash Function: The
hashmethod computes an index by applying Java’shashCodeto the key and using modulocapacityto fit within the array bounds, ensuring a non-negative index. - Insert Operation:
- The method computes the index for the key using the hash function.
- If the bucket at that index is empty, it creates a new
Nodewith the key-value pair. - If the bucket contains a linked list, it traverses to check if the key exists (updating the value if found) or appends a new node at the end.
- It increments
sizefor new entries.
- Lookup Operation:
- The method computes the index for the key and traverses the linked list at that index.
- If the key is found, it returns the associated value; otherwise, it throws an exception.
- Delete Operation:
- The method computes the index and traverses the linked list to find the key.
- If found, it removes the node by updating pointers (either setting the bucket to the next node or bypassing the node) and decrements
size. - If the key is not found, it throws an exception.
- isEmpty Operation: The method returns true if
sizeequals 0, indicating an empty hash table, and false otherwise. - Size Operation: The method returns
size, which represents the number of key-value pairs.
Complexity Analysis Table
| Operation | Average Time Complexity | Worst-Case Time Complexity | Space Complexity |
|---|---|---|---|
| Insert | O(1) | O(n) | O(1) |
| Lookup | O(1) | O(n) | O(1) |
| Delete | O(1) | O(n) | O(1) |
| isEmpty | O(1) | O(1) | O(1) |
| Size | O(1) | O(1) | O(1) |
Note: Worst-case time complexity occurs when many keys hash to the same index, causing long linked lists.
Key Differences / Notes
- Separate Chaining vs. Open Addressing:
- The implementation above uses separate chaining (linked lists) to handle collisions, which is simpler and handles high load factors well.
- Open addressing resolves collisions by probing for the next available slot, which can be more memory-efficient but requires careful handling of deletions.
- Load Factor: The load factor (size/capacity) affects performance. A high load factor increases collisions, slowing operations. Resizing the hash table (doubling capacity) can maintain efficiency.
- Thread Safety: The provided implementation is not thread-safe. For concurrent applications, use Java’s
ConcurrentHashMapor synchronize access. - Java’s Built-in Hash Table: Java provides
HashMapandHashtableinjava.util.HashMapis preferred for non-thread-safe applications, whileConcurrentHashMapis used for thread safety.
✅ Tip: Choose a good hash function to minimize collisions, as this directly impacts performance. Java’s
hashCodeis a reasonable default, but custom hash functions may be needed for specific key types.
⚠ Warning: A poorly designed hash function or a small table size can lead to frequent collisions, degrading performance to O(n) in the worst case. Monitor the load factor and resize the table if necessary.
Exercises
- Word Frequency Counter: Write a Java program that uses the hash table implementation above to count the frequency of words in a given text. Use words as keys and their counts as values, then print the results.
- Phone Book Application: Create a program that implements a phone book using the hash table. Allow users to insert, lookup, and delete contacts (name as key, phone number as value).
- Collision Analysis: Modify the hash table to track the number of collisions (multiple keys mapping to the same index). Test it with a set of keys and report the collision count.
- Two Sum Problem: Implement a solution to the Two Sum problem (find two numbers in an array that add up to a target sum) using the hash table. Return the indices of the two numbers.
- Custom Hash Function: Create a custom hash function for a specific key type (e.g., strings with a fixed format) and integrate it into the hash table implementation. Compare its performance with Java’s default
hashCode.