int kmpSearch(String text, String pattern) {
int[] lps = new int[pattern.length()];
for (int i = 1, len = 0; i < pattern.length(); ) {
if (pattern.charAt(i) == pattern.charAt(len))
lps[i++] = ++len;
else if (len > 0) len = lps[len - 1];
else i++;
}
for (int i = 0, j = 0; i < text.length(); ) {
if (text.charAt(i) == pattern.charAt(j)) { i++; j++; }
if (j == pattern.length()) return i - j;
if (i < text.length()
&& text.charAt(i) != pattern.charAt(j)) {
j = j > 0 ? lps[j - 1] : 0;
if (j == 0 && text.charAt(i) != pattern.charAt(0)) i++;
}
}
return -1;
} func kmpSearch(text, pattern string) int {
lps := make([]int, len(pattern))
for i, length := 1, 0; i < len(pattern); {
if pattern[i] == pattern[length] {
length++; lps[i] = length; i++
} else if length > 0 {
length = lps[length-1]
} else { i++ }
}
i, j := 0, 0
for i < len(text) {
if text[i] == pattern[j] { i++; j++ }
if j == len(pattern) { return i - j }
if i < len(text) && text[i] != pattern[j] {
if j > 0 { j = lps[j-1] } else { i++ }
}
}
return -1
} def kmp_search(text, pattern):
# Build failure function
lps = [0] * len(pattern)
length, i = 0, 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
elif length:
length = lps[length - 1]
else:
i += 1
# Search
i = j = 0
while i < len(text):
if text[i] == pattern[j]:
i += 1; j += 1
if j == len(pattern):
return i - j # match found
elif i < len(text) and text[i] != pattern[j]:
j = lps[j - 1] if j else 0
if not j: i += 1
return -1 fn kmp_search(text: &str, pattern: &str) -> i32 {
let (t, p) = (text.as_bytes(), pattern.as_bytes());
let mut lps = vec![0usize; p.len()];
let (mut i, mut len) = (1, 0);
while i < p.len() {
if p[i] == p[len] { len += 1; lps[i] = len; i += 1; }
else if len > 0 { len = lps[len - 1]; }
else { i += 1; }
}
let (mut i, mut j) = (0usize, 0usize);
while i < t.len() {
if t[i] == p[j] { i += 1; j += 1; }
if j == p.len() { return (i - j) as i32; }
if i < t.len() && t[i] != p[j] {
if j > 0 { j = lps[j - 1]; } else { i += 1; }
}
}
-1
} function kmpSearch(text: string, pattern: string): number {
const lps = new Array(pattern.length).fill(0);
for (let i = 1, len = 0; i < pattern.length; ) {
if (pattern[i] === pattern[len]) lps[i++] = ++len;
else if (len) len = lps[len - 1];
else i++;
}
for (let i = 0, j = 0; i < text.length; ) {
if (text[i] === pattern[j]) { i++; j++; }
if (j === pattern.length) return i - j;
if (i < text.length && text[i] !== pattern[j]) {
if (j) j = lps[j - 1]; else i++;
}
}
return -1;
} Find a pattern inside a text efficiently using KMP or Rabin-Karp, avoiding the brute-force O(n*m) trap.
String matching is the problem of finding all occurrences of a short pattern string inside a longer text string. The naive approach slides the pattern one character at a time and compares every character, giving O(n*m) time. Efficient algorithms like KMP and Rabin-Karp reduce this to O(n + m) by reusing information from previous comparisons.
Align the pattern under the text and compare character by character. When a mismatch occurs, instead of restarting from scratch, use precomputed information to skip ahead intelligently.
Why does this matter? In the brute force, after a mismatch you throw away everything you learned about the text. KMP and Rabin-Karp ensure you never re-examine a text character (KMP) or compare entire substrings in O(1) (Rabin-Karp). This is the difference between O(n*m) and O(n+m).
Look for these recognition signals in the problem statement. If you spot one, a string matching algorithm is very likely the intended approach.
"Find the first occurrence" or "indexOf"
"Repeated pattern" or "periodic string"
"Longest prefix which is also a suffix"
"Subtree matching" or "subarray matching"
The key clue: You need to check if one string (or serialized structure) appears inside another. A brute-force nested loop would be O(n*m). KMP solves it in O(n+m) by precomputing a failure function. Rabin-Karp achieves the same average-case with a rolling hash, and is easier to extend to multi-pattern or 2D matching.
KMP precomputes a failure function (also called the "partial match table" or "prefix function") for the pattern. For each position j in the pattern, fail[j] stores the length of the longest proper prefix of pattern[0..j] that is also a suffix. When a mismatch occurs at position j, we jump to fail[j-1] instead of restarting at 0.
We compare the pattern against itself to find prefix-suffix overlaps.
Why this works: When text[i] mismatches pattern[j], we know that pattern[0..j-1] matched the text. The failure function tells us the longest prefix of the pattern that is also a suffix of what we already matched — so we can jump j = fail[j-1] and continue without re-scanning any text character.
Instead of comparing characters one by one, Rabin-Karp computes a hash of the pattern and a rolling hash of each text window. If the hashes match, we verify character by character (to handle collisions). The rolling hash updates in O(1) as the window slides.
Hash of "ABD" = hp. Slide a window of length 3 across the text.
When to choose Rabin-Karp over KMP: Rabin-Karp shines when you need to search for multiple patterns simultaneously (each has its own hash), when doing 2D pattern matching, or when the problem involves finding duplicate substrings (binary search on length + rolling hash). KMP is preferred for single-pattern search with guaranteed O(n+m) worst case.
Here are the annotated Java templates for both KMP and Rabin-Karp. KMP is the go-to for interview problems that need guaranteed linear time.
// Build the failure (prefix) function for pattern int[] buildFail(String pat) { int m = pat.length(); int[] fail = new int[m]; int len = 0; // length of prev longest prefix-suffix int i = 1; while (i < m) { if (pat.charAt(i) == pat.charAt(len)) { len++; fail[i] = len; i++; } else { if (len > 0) { len = fail[len - 1]; // fall back, don't increment i } else { fail[i] = 0; i++; } } } return fail; }
// Build the failure (prefix) function for pattern func buildFail(pat string) []int { p := []rune(pat) m := len(p) fail := make([]int, m) length := 0 // length of prev longest prefix-suffix i := 1 for i < m { if p[i] == p[length] { length++ fail[i] = length i++ } else { if length > 0 { length = fail[length-1] // fall back, don't increment i } else { fail[i] = 0 i++ } } } return fail }
# Build the failure (prefix) function for pattern def build_fail(pat: str) -> list[int]: m = len(pat) fail = [0] * m length = 0 # length of prev longest prefix-suffix i = 1 while i < m: if pat[i] == pat[length]: length += 1 fail[i] = length i += 1 else: if length > 0: length = fail[length - 1] # fall back, don't increment i else: fail[i] = 0 i += 1 return fail
// Build the failure (prefix) function for pattern fn build_fail(pat: &str) -> Vec<usize> { let p: Vec<char> = pat.chars().collect(); let m = p.len(); let mut fail = vec![0usize; m]; let mut length: usize = 0; // length of prev longest prefix-suffix let mut i: usize = 1; while i < m { if p[i] == p[length] { length += 1; fail[i] = length; i += 1; } else if length > 0 { length = fail[length - 1]; // fall back, don't increment i } else { fail[i] = 0; i += 1; } } fail }
// Build the failure (prefix) function for pattern function buildFail(pat: string): number[] { const m = pat.length; const fail = new Array<number>(m).fill(0); let len = 0; // length of prev longest prefix-suffix let i = 1; while (i < m) { if (pat[i] === pat[len]) { len++; fail[i] = len; i++; } else { if (len > 0) { len = fail[len - 1]; // fall back, don't increment i } else { fail[i] = 0; i++; } } } return fail; }
// Return first index where pattern appears in text, or -1 int kmpSearch(String text, String pat) { int[] fail = buildFail(pat); int i = 0, j = 0; // i for text, j for pattern while (i < text.length()) { if (text.charAt(i) == pat.charAt(j)) { i++; j++; if (j == pat.length()) { return i - j; // match found } } else { if (j > 0) { j = fail[j - 1]; // use failure function } else { i++; // no fallback, advance text } } } return -1; // no match }
// Return first index where pattern appears in text, or -1 func kmpSearch(text, pat string) int { t := []rune(text) p := []rune(pat) n, m := len(t), len(p) fail := buildFail(pat) i, j := 0, 0 // i for text, j for pattern for i < n { if t[i] == p[j] { i++ j++ if j == m { return i - j // match found } } else if j > 0 { j = fail[j-1] // use failure function } else { i++ // no fallback, advance text } } return -1 // no match }
# Return first index where pattern appears in text, or -1 def kmp_search(text: str, pat: str) -> int: fail = build_fail(pat) i, j = 0, 0 # i for text, j for pattern while i < len(text): if text[i] == pat[j]: i += 1; j += 1 if j == len(pat): return i - j # match found else: if j > 0: j = fail[j - 1] # use failure function else: i += 1 # no fallback, advance text return -1 # no match
// Return first index where pattern appears in text, or -1 fn kmp_search(text: &str, pat: &str) -> i32 { let t: Vec<char> = text.chars().collect(); let p: Vec<char> = pat.chars().collect(); let n = t.len(); let m = p.len(); let fail = build_fail(pat); let mut i: usize = 0; let mut j: usize = 0; // i for text, j for pattern while i < n { if t[i] == p[j] { i += 1; j += 1; if j == m { return (i - j) as i32; // match found } } else if j > 0 { j = fail[j - 1]; // use failure function } else { i += 1; // no fallback, advance text } } -1 // no match }
// Return first index where pattern appears in text, or -1 function kmpSearch(text: string, pat: string): number { const fail = buildFail(pat); let i = 0, j = 0; // i for text, j for pattern while (i < text.length) { if (text[i] === pat[j]) { i++; j++; if (j === pat.length) { return i - j; // match found } } else { if (j > 0) { j = fail[j - 1]; // use failure function } else { i++; // no fallback, advance text } } } return -1; // no match }
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions int rabinKarp(String text, String pat) { int n = text.length(), m = pat.length(); long B = 31, MOD = 1_000_000_007L; // base and modulus long patHash = 0, winHash = 0, power = 1; for (int i = m - 1; i >= 0; i--) { // compute hash + highest power patHash = (patHash + pat.charAt(i) * power) % MOD; winHash = (winHash + text.charAt(i) * power) % MOD; if (i > 0) power = power * B % MOD; } for (int i = 0; i <= n - m; i++) { if (winHash == patHash && text.substring(i, i + m).equals(pat)) return i; // verify on hash match if (i < n - m) { // roll the hash forward winHash = (winHash - text.charAt(i) * power % MOD + MOD) % MOD; winHash = (winHash * B + text.charAt(i + m)) % MOD; } } return -1; }
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions func rabinKarp(text, pat string) int { tb := []byte(text) pb := []byte(pat) n, m := len(tb), len(pb) const B, MOD = 31, 1_000_000_007 // base and modulus var patHash, winHash, power int = 0, 0, 1 for i := m - 1; i >= 0; i-- { // compute hash + highest power patHash = (patHash + int(pb[i])*power) % MOD winHash = (winHash + int(tb[i])*power) % MOD if i > 0 { power = power * B % MOD } } for i := 0; i <= n-m; i++ { if winHash == patHash && string(tb[i:i+m]) == pat { return i // verify on hash match } if i < n-m { // roll the hash forward winHash = (winHash - int(tb[i])*power%MOD + MOD) % MOD winHash = (winHash*B + int(tb[i+m])) % MOD } } return -1 }
# Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions def rabin_karp(text: str, pat: str) -> int: n, m = len(text), len(pat) B, MOD = 31, 1_000_000_007 # base and modulus pat_hash = win_hash = 0 power = 1 for i in range(m - 1, -1, -1): # compute hash + highest power pat_hash = (pat_hash + ord(pat[i]) * power) % MOD win_hash = (win_hash + ord(text[i]) * power) % MOD if i > 0: power = power * B % MOD for i in range(n - m + 1): if win_hash == pat_hash and text[i:i + m] == pat: return i # verify on hash match if i < n - m: # roll the hash forward win_hash = (win_hash - ord(text[i]) * power % MOD + MOD) % MOD win_hash = (win_hash * B + ord(text[i + m])) % MOD return -1
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions fn rabin_karp(text: &str, pat: &str) -> i32 { let tb = text.as_bytes(); let pb = pat.as_bytes(); let n = tb.len(); let m = pb.len(); const B: u64 = 31; const MOD: u64 = 1_000_000_007; // base and modulus let (mut pat_hash, mut win_hash, mut power) = (0u64, 0u64, 1u64); for i in (0..m).rev() { // compute hash + highest power pat_hash = (pat_hash + pb[i] as u64 * power) % MOD; win_hash = (win_hash + tb[i] as u64 * power) % MOD; if i > 0 { power = power * B % MOD; } } for i in 0..=n - m { if win_hash == pat_hash && &tb[i..i + m] == pb { return i as i32; // verify on hash match } if i < n - m { // roll the hash forward win_hash = (win_hash + MOD - tb[i] as u64 * power % MOD) % MOD; win_hash = (win_hash * B + tb[i + m] as u64) % MOD; } } -1 }
// Rabin-Karp: average O(n+m), worst O(n*m) on hash collisions function rabinKarp(text: string, pat: string): number { const n = text.length, m = pat.length; const B = 31, MOD = 1_000_000_007; // base and modulus let patHash = 0, winHash = 0, power = 1; for (let i = m - 1; i >= 0; i--) { // compute hash + highest power patHash = (patHash + pat.charCodeAt(i) * power) % MOD; winHash = (winHash + text.charCodeAt(i) * power) % MOD; if (i > 0) power = power * B % MOD; } for (let i = 0; i <= n - m; i++) { if (winHash === patHash && text.slice(i, i + m) === pat) return i; // verify on hash match if (i < n - m) { // roll the hash forward winHash = (winHash - text.charCodeAt(i) * power % MOD + MOD) % MOD; winHash = (winHash * B + text.charCodeAt(i + m)) % MOD; } } return -1; }
KMP guarantees O(n+m) worst case. Rabin-Karp is O(n+m) on average but can degrade to O(n*m) with hash collisions. In interviews, KMP is the safer choice for single-pattern search. Use Rabin-Karp when you need hashing flexibility (multi-pattern, duplicate substrings).
Let us trace KMP on text = "ABABABABC" and pattern = "ABABC", showing how the failure function prevents re-scanning text characters.
Failure function: fail = [0, 0, 1, 2, 0]
| STEP | COMPARE | ACTION | i, j |
|---|---|---|---|
| 1 | text[0]='A' == pat[0]='A' | Match. i++, j++ | 1, 1 |
| 2 | text[1]='B' == pat[1]='B' | Match. i++, j++ | 2, 2 |
| 3 | text[2]='A' == pat[2]='A' | Match. i++, j++ | 3, 3 |
| 4 | text[3]='B' == pat[3]='B' | Match. i++, j++ | 4, 4 |
| 5 | text[4]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 4, 2 |
| 6 | text[4]='A' == pat[2]='A' | Match. i++, j++ | 5, 3 |
| 7 | text[5]='B' == pat[3]='B' | Match. i++, j++ | 6, 4 |
| 8 | text[6]='A' ≠ pat[4]='C' | Mismatch! j = fail[3] = 2 | 6, 2 |
| 9 | text[6]='A' == pat[2]='A' | Match. i++, j++ | 7, 3 |
| 10 | text[7]='B' == pat[3]='B' | Match. i++, j++ | 8, 4 |
| 11 | text[8]='C' == pat[4]='C' | Match! j == m. Found at index 4. | 9, 5 |
Notice: At step 5, i stays at 4 while j falls back from 4 to 2. The text pointer never moves backward. This is the fundamental guarantee of KMP — it processes each text character at most twice (once on match, once triggering a fallback), giving O(n+m) total comparisons.
These are classic LeetCode problems that use string matching algorithms, listed in rough order of difficulty.
Practice tip: Start with #28 (direct KMP application). Then try #459, which uses a clever property of the failure function: a string has a repeating unit if and only if n % (n - fail[n-1]) == 0. For #214 (Shortest Palindrome), concatenate pattern + "#" + reverse(pattern) and compute the failure function to find the longest palindromic prefix.
Enter a text and pattern, then step through the KMP algorithm. Watch the failure function being built, then see pattern alignment shift on mismatches without ever moving backward in the text.
Building the failure function: O(m). The pointer i advances at most m times, and len decreases at most as many times as it increased. Total operations bounded by 2m.
Searching: O(n). The text pointer i advances at most n times and never moves backward. The pattern pointer j can fall back via the failure function, but each fallback corresponds to a previous advance, so total fallbacks are bounded by n.
Space: The failure function array uses O(m) space. Rabin-Karp uses O(1) extra space (just hash values) but has worst-case O(n*m) due to hash collisions.
The key insight: Both algorithms achieve linear time by ensuring the text pointer never backtracks. KMP does this via the failure function; Rabin-Karp does it via O(1) hash updates. The brute force backtracks the text pointer on every mismatch, which is why it degrades to O(n*m).
The failure function is the heart of KMP. It precomputes, for each position in the pattern, the longest proper prefix that is also a suffix. This single array enables O(n+m) matching by telling you exactly where to continue after a mismatch.
The text pointer never moves backward in KMP. This is the fundamental property that guarantees O(n+m). Each text character is examined at most a constant number of times. Contrast this with brute force, which re-scans text characters.
Rabin-Karp trades guarantees for flexibility. Average O(n+m) with O(1) space. Excels at multi-pattern search, 2D matching, and duplicate substring detection via binary search + rolling hash.
The failure function reveals string structure. Beyond matching, it detects repeated patterns (n % (n - fail[n-1]) == 0), finds the longest palindromic prefix (via reverse concatenation), and computes the longest happy prefix directly.
Many "non-obvious" problems reduce to string matching. Subtree matching (serialize trees as strings), repeated string match (repeat + KMP), and shortest palindrome (reverse + failure function) all become straightforward once you recognize the pattern.