Word Frequency Counter

Problem Statement

Write a Java program that uses a hash table to count the frequency of words in a given text. The hash table should use words (case-insensitive) as keys and their frequency counts as values. The program should process the input text by splitting it into words, ignoring punctuation, and print the word-frequency pairs in a readable format. Test the implementation with different text inputs, including empty text, single-word text, and text with repeated words. You can visualize this as analyzing a book page to count how often each word appears, storing the results in a dictionary-like structure for quick lookup.

Input:

A string of text containing words separated by spaces, punctuation, or other delimiters. Output: A string representation of the hash table’s word-frequency pairs, or "Empty" if no words are present. Constraints:
The text length is between 0 and 10^5 characters.
Words are sequences of alphabetic characters (a-z, A-Z).
Punctuation and non-alphabetic characters are ignored.
Words are case-insensitive (e.g., "The" and "the" are the same). Example:
Input: Text = "The quick brown fox, the quick!"
Output: "the: 2, quick: 2, brown: 1, fox: 1"
Explanation: Words are counted case-insensitively, ignoring punctuation.
Input: Text = ""
Output: "Empty"
Explanation: No words in empty text.
Input: Text = "hello Hello HELLO"
Output: "hello: 3"
Explanation: Case-insensitive, all "hello" variants count as one word.

Pseudocode

FUNCTION countWordFrequencies(text)
    CREATE hashTable as new HashMap
    IF text is empty THEN
        RETURN "Empty"
    ENDIF
    SET words to split text by non-alphabetic characters
    FOR each word in words
        IF word is not empty THEN
            SET word to lowercase(word)
            IF hashTable contains word THEN
                INCREMENT hashTable[word] by 1
            ELSE
                SET hashTable[word] to 1
            ENDIF
        ENDIF
    ENDFOR
    IF hashTable is empty THEN
        RETURN "Empty"
    ENDIF
    CREATE result as new StringBuilder
    FOR each entry in hashTable
        APPEND entry.key and ":" and entry.value to result
        IF not last entry THEN
            APPEND ", " to result
        ENDIF
    ENDFOR
    RETURN result as string
ENDFUNCTION

FUNCTION main()
    SET testCases to array of text inputs
    FOR each testCase in testCases
        PRINT test case details
        CALL countWordFrequencies(testCase.text)
        PRINT word frequencies
    ENFOR
ENDFUNCTION

Algorithm Steps

Define countWordFrequencies: a. Create a HashMap<String, Integer> to store word frequencies. b. If text is empty, return "Empty". c. Split text into words using non-alphabetic characters as delimiters. d. For each non-empty word:
- Convert to lowercase for case-insensitive counting.
- If word exists in hash table, increment its count.
- Otherwise, add word with count 1. e. If hash table is empty, return "Empty". f. Build a string of word-frequency pairs, separated by commas.
In main, test with different texts: a. Normal text with repeated words. b. Empty text. c. Single-word text. d. Text with punctuation and mixed case. e. Text with only non-alphabetic characters.

Java Implementation

import java.util.*;

public class WordFrequencyCounter {
    // Counts word frequencies in the given text
    public String countWordFrequencies(String text) {
        HashMap<String, Integer> hashTable = new HashMap<>();
        if (text == null || text.trim().isEmpty()) {
            return "Empty";
        }
        // Split text into words, ignoring non-alphabetic characters
        String[] words = text.split("[^a-zA-Z]+");
        for (String word : words) {
            if (!word.isEmpty()) {
                word = word.toLowerCase();
                hashTable.put(word, hashTable.getOrDefault(word, 0) + 1);
            }
        }
        if (hashTable.isEmpty()) {
            return "Empty";
        }
        // Build result string
        StringBuilder result = new StringBuilder();
        int index = 0;
        for (Map.Entry<String, Integer> entry : hashTable.entrySet()) {
            result.append(entry.getKey()).append(": ").append(entry.getValue());
            if (index < hashTable.size() - 1) {
                result.append(", ");
            }
            index++;
        }
        return result.toString();
    }

    // Main method to test word frequency counter
    public static void main(String[] args) {
        WordFrequencyCounter counter = new WordFrequencyCounter();

        // Test cases
        String[] testCases = {
            "The quick brown fox, the quick!",           // Normal text with repeats
            "",                                         // Empty text
            "hello Hello HELLO",                        // Case-insensitive repeats
            "hello",                                    // Single word
            "!!! --- 123",                              // No words
        };

        // Run test cases
        for (int i = 0; i < testCases.length; i++) {
            System.out.println("Test case " + (i + 1) + ":");
            System.out.println("Input text: \"" + testCases[i] + "\"");
            String result = counter.countWordFrequencies(testCases[i]);
            System.out.println("Word frequencies: " + result + "\n");
        }
    }
}

Output

Running the main method produces:

Test case 1:
Input text: "The quick brown fox, the quick!"
Word frequencies: the: 2, quick: 2, brown: 1, fox: 1

Test case 2:
Input text: ""
Word frequencies: Empty

Test case 3:
Input text: "hello Hello HELLO"
Word frequencies: hello: 3

Test case 4:
Input text: "hello"
Word frequencies: hello: 1

Test case 5:
Input text: "!!! --- 123"
Word frequencies: Empty

Explanation:

Test case 1: Counts "the" (2), "quick" (2), "brown" (1), "fox" (1), ignoring punctuation.
Test case 2: Empty text returns "Empty".
Test case 3: Counts "hello" (3) case-insensitively.
Test case 4: Single word "hello" has frequency 1.
Test case 5: No alphabetic words, returns "Empty".

How It Works

countWordFrequencies:
- Creates a HashMap<String, Integer> for word frequencies.
- Handles null or empty text by returning "Empty".
- Splits text using regex [^a-zA-Z]+ to ignore non-alphabetic characters.
- Processes each non-empty word:
  - Converts to lowercase.
  - Updates count in hash table using getOrDefault.
- Builds a comma-separated string of word-frequency pairs or returns "Empty" if none.
Example Trace (Test case 1):
- Input: "The quick brown fox, the quick!".
- Split: ["The", "quick", "brown", "fox", "the", "quick"].
- Process: hashTable={the:2, quick:2, brown:1, fox:1}.
- Output: "the: 2, quick: 2, brown: 1, fox: 1".
Main Method: Tests normal text, empty text, repeated words, single word, and non-alphabetic text.

Complexity Analysis Table

Operation	Time Complexity	Space Complexity
Count Frequencies	O(n)	O(m)

Note:

n is the length of the input text (characters).
m is the number of unique words.
Time complexity: O(n) for splitting and processing text; HashMap operations (put, get) are O(1) average case.
Space complexity: O(m) for HashMap storing unique words; O(n) for output string in worst case.
Worst case: O(n) time, O(n) space for many unique words or long output.

✅ Tip: Use a regular expression like [^a-zA-Z]+ to split text and handle punctuation effectively. Convert words to lowercase to ensure case-insensitive counting.

⚠ Warning: Handle edge cases like empty text or non-alphabetic input to avoid empty hash tables. Ensure regex splitting correctly separates words to prevent invalid tokens.

Keyboard shortcuts

Java Data Structures and Algorithms for Beginners