Word Frequency Counter
Problem Statement
Write a Java program that uses a hash table to count the frequency of words in a given text. The hash table should use words (case-insensitive) as keys and their frequency counts as values. The program should process the input text by splitting it into words, ignoring punctuation, and print the word-frequency pairs in a readable format. Test the implementation with different text inputs, including empty text, single-word text, and text with repeated words. You can visualize this as analyzing a book page to count how often each word appears, storing the results in a dictionary-like structure for quick lookup.
Input:
- A string of text containing words separated by spaces, punctuation, or other delimiters. Output: A string representation of the hash table’s word-frequency pairs, or "Empty" if no words are present. Constraints:
- The text length is between 0 and 10^5 characters.
- Words are sequences of alphabetic characters (a-z, A-Z).
- Punctuation and non-alphabetic characters are ignored.
- Words are case-insensitive (e.g., "The" and "the" are the same). Example:
- Input: Text = "The quick brown fox, the quick!"
- Output: "the: 2, quick: 2, brown: 1, fox: 1"
- Explanation: Words are counted case-insensitively, ignoring punctuation.
- Input: Text = ""
- Output: "Empty"
- Explanation: No words in empty text.
- Input: Text = "hello Hello HELLO"
- Output: "hello: 3"
- Explanation: Case-insensitive, all "hello" variants count as one word.
Pseudocode
FUNCTION countWordFrequencies(text)
CREATE hashTable as new HashMap
IF text is empty THEN
RETURN "Empty"
ENDIF
SET words to split text by non-alphabetic characters
FOR each word in words
IF word is not empty THEN
SET word to lowercase(word)
IF hashTable contains word THEN
INCREMENT hashTable[word] by 1
ELSE
SET hashTable[word] to 1
ENDIF
ENDIF
ENDFOR
IF hashTable is empty THEN
RETURN "Empty"
ENDIF
CREATE result as new StringBuilder
FOR each entry in hashTable
APPEND entry.key and ":" and entry.value to result
IF not last entry THEN
APPEND ", " to result
ENDIF
ENDFOR
RETURN result as string
ENDFUNCTION
FUNCTION main()
SET testCases to array of text inputs
FOR each testCase in testCases
PRINT test case details
CALL countWordFrequencies(testCase.text)
PRINT word frequencies
ENFOR
ENDFUNCTION
Algorithm Steps
- Define
countWordFrequencies: a. Create aHashMap<String, Integer>to store word frequencies. b. If text is empty, return "Empty". c. Split text into words using non-alphabetic characters as delimiters. d. For each non-empty word:- Convert to lowercase for case-insensitive counting.
- If word exists in hash table, increment its count.
- Otherwise, add word with count 1. e. If hash table is empty, return "Empty". f. Build a string of word-frequency pairs, separated by commas.
- In
main, test with different texts: a. Normal text with repeated words. b. Empty text. c. Single-word text. d. Text with punctuation and mixed case. e. Text with only non-alphabetic characters.
Java Implementation
import java.util.*;
public class WordFrequencyCounter {
// Counts word frequencies in the given text
public String countWordFrequencies(String text) {
HashMap<String, Integer> hashTable = new HashMap<>();
if (text == null || text.trim().isEmpty()) {
return "Empty";
}
// Split text into words, ignoring non-alphabetic characters
String[] words = text.split("[^a-zA-Z]+");
for (String word : words) {
if (!word.isEmpty()) {
word = word.toLowerCase();
hashTable.put(word, hashTable.getOrDefault(word, 0) + 1);
}
}
if (hashTable.isEmpty()) {
return "Empty";
}
// Build result string
StringBuilder result = new StringBuilder();
int index = 0;
for (Map.Entry<String, Integer> entry : hashTable.entrySet()) {
result.append(entry.getKey()).append(": ").append(entry.getValue());
if (index < hashTable.size() - 1) {
result.append(", ");
}
index++;
}
return result.toString();
}
// Main method to test word frequency counter
public static void main(String[] args) {
WordFrequencyCounter counter = new WordFrequencyCounter();
// Test cases
String[] testCases = {
"The quick brown fox, the quick!", // Normal text with repeats
"", // Empty text
"hello Hello HELLO", // Case-insensitive repeats
"hello", // Single word
"!!! --- 123", // No words
};
// Run test cases
for (int i = 0; i < testCases.length; i++) {
System.out.println("Test case " + (i + 1) + ":");
System.out.println("Input text: \"" + testCases[i] + "\"");
String result = counter.countWordFrequencies(testCases[i]);
System.out.println("Word frequencies: " + result + "\n");
}
}
}
Output
Running the main method produces:
Test case 1:
Input text: "The quick brown fox, the quick!"
Word frequencies: the: 2, quick: 2, brown: 1, fox: 1
Test case 2:
Input text: ""
Word frequencies: Empty
Test case 3:
Input text: "hello Hello HELLO"
Word frequencies: hello: 3
Test case 4:
Input text: "hello"
Word frequencies: hello: 1
Test case 5:
Input text: "!!! --- 123"
Word frequencies: Empty
Explanation:
- Test case 1: Counts "the" (2), "quick" (2), "brown" (1), "fox" (1), ignoring punctuation.
- Test case 2: Empty text returns "Empty".
- Test case 3: Counts "hello" (3) case-insensitively.
- Test case 4: Single word "hello" has frequency 1.
- Test case 5: No alphabetic words, returns "Empty".
How It Works
- countWordFrequencies:
- Creates a
HashMap<String, Integer>for word frequencies. - Handles null or empty text by returning "Empty".
- Splits text using regex
[^a-zA-Z]+to ignore non-alphabetic characters. - Processes each non-empty word:
- Converts to lowercase.
- Updates count in hash table using
getOrDefault.
- Builds a comma-separated string of word-frequency pairs or returns "Empty" if none.
- Creates a
- Example Trace (Test case 1):
- Input: "The quick brown fox, the quick!".
- Split: ["The", "quick", "brown", "fox", "the", "quick"].
- Process: hashTable={the:2, quick:2, brown:1, fox:1}.
- Output: "the: 2, quick: 2, brown: 1, fox: 1".
- Main Method: Tests normal text, empty text, repeated words, single word, and non-alphabetic text.
Complexity Analysis Table
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Count Frequencies | O(n) | O(m) |
Note:
- n is the length of the input text (characters).
- m is the number of unique words.
- Time complexity: O(n) for splitting and processing text; HashMap operations (put, get) are O(1) average case.
- Space complexity: O(m) for HashMap storing unique words; O(n) for output string in worst case.
- Worst case: O(n) time, O(n) space for many unique words or long output.
✅ Tip: Use a regular expression like
[^a-zA-Z]+to split text and handle punctuation effectively. Convert words to lowercase to ensure case-insensitive counting.
⚠ Warning: Handle edge cases like empty text or non-alphabetic input to avoid empty hash tables. Ensure regex splitting correctly separates words to prevent invalid tokens.