Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Hashing Data Structure

Definition and Concepts

Hashing is a technique used to map data of arbitrary size to fixed-size values, typically for efficient storage and retrieval in a data structure called a hash table. A hash table uses a hash function to compute an index (or hash code) for each key, which determines where the key-value pair is stored in an array. The goal is to enable fast access to data using keys, such as strings or numbers. You can visualize a hash table as a set of labeled drawers, where each label (key) quickly directs you to the corresponding drawer (value). Collisions, where multiple keys map to the same index, are resolved using techniques like separate chaining (using linked lists) or open addressing (using probing). Hashing is fundamental for applications requiring quick lookups, insertions, and deletions.

Why Use It?

Hashing is used to achieve fast data retrieval, insertion, and deletion with an average time complexity of O(1). It is ideal for scenarios where you need to associate keys with values and access them efficiently without iterating through the entire dataset. Hash tables are versatile and widely used in databases, caches, and other systems requiring rapid data access.

Where to Use? (Real-Life Examples)

  • Database Indexing: Databases use hashing to index records, allowing quick retrieval of data based on keys like user IDs.
  • Caches: Web browsers and content delivery networks use hash tables to cache web pages, mapping URLs to cached content for fast access.
  • Symbol Tables in Compilers: Compilers use hash tables to store variable names and their attributes during code compilation.
  • Password Storage: Systems store hashed passwords to verify user credentials quickly without storing plain text.

Explain Operations

  • Insert: This operation adds a key-value pair to the hash table. The hash function computes an index for the key, and the pair is stored at that index (or in a linked list if a collision occurs). It has an average time complexity of O(1).
  • Lookup: This operation retrieves the value associated with a given key by computing its hash and checking the corresponding index. It has an average time complexity of O(1).
  • Delete: This operation removes a key-value pair by finding the key’s index and removing it from the bucket (or linked list). It has an average time complexity of O(1).
  • isEmpty: This operation checks whether the hash table is empty. It has a time complexity of O(1).
  • Size: This operation returns the number of key-value pairs in the hash table. It has a time complexity of O(1).

Java Implementation

The following Java code implements a hash table using separate chaining (linked lists for collision resolution).

public class HashTable {
    private static class Node {
        String key; // Key for the hash table entry
        int value; // Value associated with the key
        Node next; // Reference to the next node in case of collision

        Node(String key, int value) {
            this.key = key;
            this.value = value;
            this.next = null;
        }
    }

    private Node[] buckets; // Array of linked lists for storing key-value pairs
    private int capacity; // Size of the buckets array
    private int size; // Number of key-value pairs in the hash table

    // Constructor to initialize the hash table with a given capacity
    public HashTable(int capacity) {
        this.capacity = capacity;
        this.buckets = new Node[capacity];
        this.size = 0;
    }

    // Hash function to compute index for a key
    private int hash(String key) {
        return Math.abs(key.hashCode() % capacity); // Ensures non-negative index
    }

    // Insert: Adds a key-value pair to the hash table
    public void insert(String key, int value) {
        int index = hash(key); // Computes the index for the key
        if (buckets[index] == null) { // If bucket is empty, create new node
            buckets[index] = new Node(key, value);
            size++;
            return;
        }
        Node current = buckets[index];
        while (current != null) { // Traverses the linked list
            if (current.key.equals(key)) { // Updates value if key exists
                current.value = value;
                return;
            }
            if (current.next == null) break; // Reaches the end of the list
            current = current.next;
        }
        current.next = new Node(key, value); // Adds new node at the end
        size++;
    }

    // Lookup: Retrieves the value for a given key
    public int lookup(String key) {
        int index = hash(key); // Computes the index for the key
        Node current = buckets[index];
        while (current != null) { // Traverses the linked list
            if (current.key.equals(key)) {
                return current.value; // Returns value if key is found
            }
            current = current.next;
        }
        throw new IllegalArgumentException("Key not found in the hash table.");
    }

    // Delete: Removes a key-value pair from the hash table
    public void delete(String key) {
        int index = hash(key); // Computes the index for the key
        Node current = buckets[index];
        Node prev = null;
        while (current != null) { // Traverses the linked list
            if (current.key.equals(key)) {
                if (prev == null) { // If key is at the head
                    buckets[index] = current.next;
                } else {
                    prev.next = current.next; // Bypasses the node
                }
                size--;
                return;
            }
            prev = current;
            current = current.next;
        }
        throw new IllegalArgumentException("Key not found in the hash table.");
    }

    // isEmpty: Checks if the hash table is empty
    public boolean isEmpty() {
        return size == 0;
    }

    // Size: Returns the number of key-value pairs in the hash table
    public int size() {
        return size;
    }
}

How It Works

  1. Initialization: The constructor initializes an array of Node objects (buckets) with a given capacity and sets size to 0, indicating an empty hash table.
  2. Hash Function: The hash method computes an index by applying Java’s hashCode to the key and using modulo capacity to fit within the array bounds, ensuring a non-negative index.
  3. Insert Operation:
    • The method computes the index for the key using the hash function.
    • If the bucket at that index is empty, it creates a new Node with the key-value pair.
    • If the bucket contains a linked list, it traverses to check if the key exists (updating the value if found) or appends a new node at the end.
    • It increments size for new entries.
  4. Lookup Operation:
    • The method computes the index for the key and traverses the linked list at that index.
    • If the key is found, it returns the associated value; otherwise, it throws an exception.
  5. Delete Operation:
    • The method computes the index and traverses the linked list to find the key.
    • If found, it removes the node by updating pointers (either setting the bucket to the next node or bypassing the node) and decrements size.
    • If the key is not found, it throws an exception.
  6. isEmpty Operation: The method returns true if size equals 0, indicating an empty hash table, and false otherwise.
  7. Size Operation: The method returns size, which represents the number of key-value pairs.

Complexity Analysis Table

OperationAverage Time ComplexityWorst-Case Time ComplexitySpace Complexity
InsertO(1)O(n)O(1)
LookupO(1)O(n)O(1)
DeleteO(1)O(n)O(1)
isEmptyO(1)O(1)O(1)
SizeO(1)O(1)O(1)

Note: Worst-case time complexity occurs when many keys hash to the same index, causing long linked lists.

Key Differences / Notes

  • Separate Chaining vs. Open Addressing:
    • The implementation above uses separate chaining (linked lists) to handle collisions, which is simpler and handles high load factors well.
    • Open addressing resolves collisions by probing for the next available slot, which can be more memory-efficient but requires careful handling of deletions.
  • Load Factor: The load factor (size/capacity) affects performance. A high load factor increases collisions, slowing operations. Resizing the hash table (doubling capacity) can maintain efficiency.
  • Thread Safety: The provided implementation is not thread-safe. For concurrent applications, use Java’s ConcurrentHashMap or synchronize access.
  • Java’s Built-in Hash Table: Java provides HashMap and Hashtable in java.util. HashMap is preferred for non-thread-safe applications, while ConcurrentHashMap is used for thread safety.

✅ Tip: Choose a good hash function to minimize collisions, as this directly impacts performance. Java’s hashCode is a reasonable default, but custom hash functions may be needed for specific key types.

⚠ Warning: A poorly designed hash function or a small table size can lead to frequent collisions, degrading performance to O(n) in the worst case. Monitor the load factor and resize the table if necessary.

Exercises

  1. Word Frequency Counter: Write a Java program that uses the hash table implementation above to count the frequency of words in a given text. Use words as keys and their counts as values, then print the results.
  2. Phone Book Application: Create a program that implements a phone book using the hash table. Allow users to insert, lookup, and delete contacts (name as key, phone number as value).
  3. Collision Analysis: Modify the hash table to track the number of collisions (multiple keys mapping to the same index). Test it with a set of keys and report the collision count.
  4. Two Sum Problem: Implement a solution to the Two Sum problem (find two numbers in an array that add up to a target sum) using the hash table. Return the indices of the two numbers.
  5. Custom Hash Function: Create a custom hash function for a specific key type (e.g., strings with a fixed format) and integrate it into the hash table implementation. Compare its performance with Java’s default hashCode.