LeetCode #2157 — HARD

Groups of Strings

Break down a hard problem into reliable checkpoints, edge-case handling, and complexity trade-offs.

Solve on LeetCode
The Problem

Problem Statement

You are given a 0-indexed array of strings words. Each string consists of lowercase English letters only. No letter occurs more than once in any string of words.

Two strings s1 and s2 are said to be connected if the set of letters of s2 can be obtained from the set of letters of s1 by any one of the following operations:

  • Adding exactly one letter to the set of the letters of s1.
  • Deleting exactly one letter from the set of the letters of s1.
  • Replacing exactly one letter from the set of the letters of s1 with any letter, including itself.

The array words can be divided into one or more non-intersecting groups. A string belongs to a group if any one of the following is true:

  • It is connected to at least one other string of the group.
  • It is the only string present in the group.

Note that the strings in words should be grouped in such a manner that a string belonging to a group cannot be connected to a string present in any other group. It can be proved that such an arrangement is always unique.

Return an array ans of size 2 where:

  • ans[0] is the maximum number of groups words can be divided into, and
  • ans[1] is the size of the largest group.

Example 1:

Input: words = ["a","b","ab","cde"]
Output: [2,3]
Explanation:
- words[0] can be used to obtain words[1] (by replacing 'a' with 'b'), and words[2] (by adding 'b'). So words[0] is connected to words[1] and words[2].
- words[1] can be used to obtain words[0] (by replacing 'b' with 'a'), and words[2] (by adding 'a'). So words[1] is connected to words[0] and words[2].
- words[2] can be used to obtain words[0] (by deleting 'b'), and words[1] (by deleting 'a'). So words[2] is connected to words[0] and words[1].
- words[3] is not connected to any string in words.
Thus, words can be divided into 2 groups ["a","b","ab"] and ["cde"]. The size of the largest group is 3.  

Example 2:

Input: words = ["a","ab","abc"]
Output: [1,3]
Explanation:
- words[0] is connected to words[1].
- words[1] is connected to words[0] and words[2].
- words[2] is connected to words[1].
Since all strings are connected to each other, they should be grouped together.
Thus, the size of the largest group is 3.

Constraints:

  • 1 <= words.length <= 2 * 104
  • 1 <= words[i].length <= 26
  • words[i] consists of lowercase English letters only.
  • No letter occurs more than once in words[i].
Patterns Used

Roadmap

  1. Brute Force Baseline
  2. Core Insight
  3. Algorithm Walkthrough
  4. Edge Cases
  5. Full Annotated Code
  6. Interactive Study Demo
  7. Complexity Analysis
Step 01

Brute Force Baseline

Problem summary: You are given a 0-indexed array of strings words. Each string consists of lowercase English letters only. No letter occurs more than once in any string of words. Two strings s1 and s2 are said to be connected if the set of letters of s2 can be obtained from the set of letters of s1 by any one of the following operations: Adding exactly one letter to the set of the letters of s1. Deleting exactly one letter from the set of the letters of s1. Replacing exactly one letter from the set of the letters of s1 with any letter, including itself. The array words can be divided into one or more non-intersecting groups. A string belongs to a group if any one of the following is true: It is connected to at least one other string of the group. It is the only string present in the group. Note that the strings in words should be grouped in such a manner that a string belonging to a group cannot be

Baseline thinking

Start with the most direct exhaustive search. That gives a correctness anchor before optimizing.

Pattern signal: Array · Hash Map · Bit Manipulation · Union-Find

Example 1

["a","b","ab","cde"]

Example 2

["a","ab","abc"]

Related Problems

  • Word Ladder II (word-ladder-ii)
  • Similar String Groups (similar-string-groups)
  • Largest Component Size by Common Factor (largest-component-size-by-common-factor)
Step 02

Core Insight

What unlocks the optimal approach

  • Can we build a graph from words, where there exists an edge between nodes i and j if words[i] and words[j] are connected?
  • The problem now boils down to finding the total number of components and the size of the largest component in the graph.
  • How can we use bit masking to reduce the search space while adding edges to node i?
Interview move: turn each hint into an invariant you can check after every iteration/recursion step.
Step 03

Algorithm Walkthrough

Iteration Checklist

  1. Define state (indices, window, stack, map, DP cell, or recursion frame).
  2. Apply one transition step and update the invariant.
  3. Record answer candidate when condition is met.
  4. Continue until all input is consumed.
Use the first example testcase as your mental trace to verify each transition.
Step 04

Edge Cases

Minimum Input
Single element / shortest valid input
Validate boundary behavior before entering the main loop or recursion.
Duplicates & Repeats
Repeated values / repeated states
Decide whether duplicates should be merged, skipped, or counted explicitly.
Extreme Constraints
Largest constraint values
Re-check complexity target against constraints to avoid time-limit issues.
Invalid / Corner Shape
Empty collections, zeros, or disconnected structures
Handle special-case structure before the core algorithm path.
Step 05

Full Annotated Code

Source-backed implementations are provided below for direct study and interview prep.

// Accepted solution for LeetCode #2157: Groups of Strings
class Solution {
    private Map<Integer, Integer> p;
    private Map<Integer, Integer> size;
    private int mx;
    private int n;

    public int[] groupStrings(String[] words) {
        p = new HashMap<>();
        size = new HashMap<>();
        n = words.length;
        mx = 0;
        for (String word : words) {
            int x = 0;
            for (char c : word.toCharArray()) {
                x |= 1 << (c - 'a');
            }
            p.put(x, x);
            size.put(x, size.getOrDefault(x, 0) + 1);
            mx = Math.max(mx, size.get(x));
            if (size.get(x) > 1) {
                --n;
            }
        }
        for (int x : p.keySet()) {
            for (int i = 0; i < 26; ++i) {
                union(x, x ^ (1 << i));
                if (((x >> i) & 1) != 0) {
                    for (int j = 0; j < 26; ++j) {
                        if (((x >> j) & 1) == 0) {
                            union(x, x ^ (1 << i) | (1 << j));
                        }
                    }
                }
            }
        }
        return new int[] {n, mx};
    }

    private int find(int x) {
        if (p.get(x) != x) {
            p.put(x, find(p.get(x)));
        }
        return p.get(x);
    }

    private void union(int a, int b) {
        if (!p.containsKey(b)) {
            return;
        }
        int pa = find(a), pb = find(b);
        if (pa == pb) {
            return;
        }
        p.put(pa, pb);
        size.put(pb, size.get(pb) + size.get(pa));
        mx = Math.max(mx, size.get(pb));
        --n;
    }
}
Step 06

Interactive Study Demo

Use this to step through a reusable interview workflow for this problem.

Press Step or Run All to begin.
Step 07

Complexity Analysis

Time
O(n)
Space
O(1)

Approach Breakdown

SORT + SCAN
O(n log n) time
O(n) space

Sort the array in O(n log n), then scan for the missing or unique element by comparing adjacent pairs. Sorting requires O(n) auxiliary space (or O(1) with in-place sort but O(n log n) time remains). The sort step dominates.

BIT MANIPULATION
O(n) time
O(1) space

Bitwise operations (AND, OR, XOR, shifts) are O(1) per operation on fixed-width integers. A single pass through the input with bit operations gives O(n) time. The key insight: XOR of a number with itself is 0, which eliminates duplicates without extra space.

Shortcut: Bit operations are O(1). XOR cancels duplicates. Single pass → O(n) time, O(1) space.
Coach Notes

Common Mistakes

Review these before coding to avoid predictable interview regressions.

Off-by-one on range boundaries

Wrong move: Loop endpoints miss first/last candidate.

Usually fails on: Fails on minimal arrays and exact-boundary answers.

Fix: Re-derive loops from inclusive/exclusive ranges before coding.

Mutating counts without cleanup

Wrong move: Zero-count keys stay in map and break distinct/count constraints.

Usually fails on: Window/map size checks are consistently off by one.

Fix: Delete keys when count reaches zero.