Off-by-one on range boundaries
Wrong move: Loop endpoints miss first/last candidate.
Usually fails on: minimal arrays and exact-boundary answers.
Fix: Re-derive loops from inclusive/exclusive ranges before coding.
Move from brute-force thinking to an efficient single-pass bit-manipulation approach.
Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).
A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:
For a 1-byte character, the first bit is a 0, followed by its Unicode code. For an n-byte character, the first n bits are all one's, the (n + 1)-th bit is 0, followed by n - 1 bytes with the most significant 2 bits being 10. This is how the UTF-8 encoding would work:
Number of Bytes | UTF-8 Octet Sequence
| (binary)
--------------------+-----------------------------------------
1 | 0xxxxxxx
2 | 110xxxxx 10xxxxxx
3 | 1110xxxx 10xxxxxx 10xxxxxx
4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
x denotes a bit in the binary form of a byte that may be either 0 or 1.
Note: The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
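To make the table concrete, here is a small sketch (the function name is illustrative, not part of the problem) that maps a leading byte to the sequence length it announces, using the same shift comparisons the rules describe:

```python
def expected_length(byte: int) -> int:
    """Return the sequence length a leading byte announces, or -1 if invalid."""
    if byte >> 7 == 0b0:        # 0xxxxxxx: a 1-byte character
        return 1
    if byte >> 5 == 0b110:      # 110xxxxx: start of a 2-byte character
        return 2
    if byte >> 4 == 0b1110:     # 1110xxxx: start of a 3-byte character
        return 3
    if byte >> 3 == 0b11110:    # 11110xxx: start of a 4-byte character
        return 4
    return -1                   # 10xxxxxx or 11111xxx cannot start a character

print(expected_length(197))  # 197 = 0b11000101, so 2
```

Continuation bytes (10xxxxxx) deliberately return -1 here: they are only valid after a multi-byte header, never as a start.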
Example 1:
Input: data = [197,130,1] Output: true Explanation: data represents the octet sequence: 11000101 10000010 00000001. It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.
Example 2:
Input: data = [235,140,4] Output: false Explanation: data represents the octet sequence: 11101011 10001100 00000100. The first 3 bits are all one's and the 4th bit is 0, which means it is a 3-byte character. The next byte is a continuation byte which starts with 10, and that's correct. But the second continuation byte does not start with 10, so it is invalid.
Constraints:
1 <= data.length <= 2 * 10^4
0 <= data[i] <= 255
Start with the most direct exhaustive check: apply the prefix table literally, byte by byte. That gives a correctness anchor before optimizing.
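One way to build that anchor is a string-based brute force: render each integer as an 8-bit binary string and consume prefixes greedily. This is slower than bit shifts but easy to verify by eye (a sketch with an illustrative name, not the reference solution):

```python
def valid_utf8_bruteforce(data):
    # Render each byte as an 8-bit binary string, e.g. 197 -> "11000101".
    bits = [format(v & 0xFF, "08b") for v in data]
    i = 0
    while i < len(bits):
        b = bits[i]
        if b.startswith("0"):
            n = 1
        elif b.startswith("110"):
            n = 2
        elif b.startswith("1110"):
            n = 3
        elif b.startswith("11110"):
            n = 4
        else:
            return False  # 10xxxxxx or 11111xxx cannot start a character
        # The next n - 1 bytes must all be continuation bytes (10xxxxxx).
        if i + n > len(bits) or any(not c.startswith("10") for c in bits[i + 1:i + n]):
            return False
        i += n
    return True
```

The `i + n > len(bits)` guard is what catches a truncated trailing sequence; the optimized solutions below encode the same check as a final counter test.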
Pattern signal: Array · Bit Manipulation
[197,130,1]
[235,140,4]
Source-backed implementations are provided below for direct study and interview prep.
// Accepted solution for LeetCode #393: UTF-8 Validation
class Solution {
    public boolean validUtf8(int[] data) {
        int cnt = 0; // continuation bytes still expected
        for (int v : data) {
            if (cnt > 0) {
                if (v >> 6 != 0b10) {
                    return false;
                }
                --cnt;
            } else if (v >> 7 == 0) {
                cnt = 0;
            } else if (v >> 5 == 0b110) {
                cnt = 1;
            } else if (v >> 4 == 0b1110) {
                cnt = 2;
            } else if (v >> 3 == 0b11110) {
                cnt = 3;
            } else {
                return false;
            }
        }
        return cnt == 0;
    }
}
// Accepted solution for LeetCode #393: UTF-8 Validation
func validUtf8(data []int) bool {
	cnt := 0 // continuation bytes still expected
	for _, v := range data {
		if cnt > 0 {
			if v>>6 != 0b10 {
				return false
			}
			cnt--
		} else if v>>7 == 0 {
			cnt = 0
		} else if v>>5 == 0b110 {
			cnt = 1
		} else if v>>4 == 0b1110 {
			cnt = 2
		} else if v>>3 == 0b11110 {
			cnt = 3
		} else {
			return false
		}
	}
	return cnt == 0
}
# Accepted solution for LeetCode #393: UTF-8 Validation
from typing import List

class Solution:
    def validUtf8(self, data: List[int]) -> bool:
        cnt = 0  # continuation bytes still expected
        for v in data:
            if cnt > 0:
                if v >> 6 != 0b10:
                    return False
                cnt -= 1
            elif v >> 7 == 0:
                cnt = 0
            elif v >> 5 == 0b110:
                cnt = 1
            elif v >> 4 == 0b1110:
                cnt = 2
            elif v >> 3 == 0b11110:
                cnt = 3
            else:
                return False
        return cnt == 0
// Accepted solution for LeetCode #393: UTF-8 Validation
struct Solution;

impl Solution {
    fn valid_utf8(data: Vec<i32>) -> bool {
        let mut count = 0; // continuation bytes still expected
        for x in data {
            if count == 0 {
                if x >> 3 & 0b11111 == 0b11110 {
                    count += 3;
                    continue;
                }
                if x >> 4 & 0b1111 == 0b1110 {
                    count += 2;
                    continue;
                }
                if x >> 5 & 0b111 == 0b110 {
                    count += 1;
                    continue;
                }
                if x >> 7 & 0b1 == 0 {
                    continue;
                }
            } else {
                if x >> 6 & 0b11 == 0b10 {
                    count -= 1;
                    continue;
                }
            }
            return false;
        }
        count == 0
    }
}

#[test]
fn test() {
    let data = vec![197, 130, 1];
    assert_eq!(Solution::valid_utf8(data), true);
    let data = vec![235, 140, 4];
    assert_eq!(Solution::valid_utf8(data), false);
}
// Accepted solution for LeetCode #393: UTF-8 Validation
function validUtf8(data: number[]): boolean {
    let cnt = 0; // continuation bytes still expected
    for (const v of data) {
        if (cnt > 0) {
            if (v >> 6 !== 0b10) {
                return false;
            }
            --cnt;
        } else if (v >> 7 === 0) {
            cnt = 0;
        } else if (v >> 5 === 0b110) {
            cnt = 1;
        } else if (v >> 4 === 0b1110) {
            cnt = 2;
        } else if (v >> 3 === 0b11110) {
            cnt = 3;
        } else {
            return false;
        }
    }
    return cnt === 0;
}
Use this to step through a reusable interview workflow for this problem.
A brute-force take converts each integer to an 8-bit binary string and matches the prefixes against the table. That still runs in O(n) time but allocates O(n) auxiliary strings; its value is that the rules are easy to verify by eye. No sorting is involved: the input must be consumed strictly left to right, since byte order is what defines the sequences.
Bitwise operations (AND, OR, XOR, shifts) are O(1) per operation on fixed-width integers. Comparing v >> 7, v >> 5, v >> 4, and v >> 3 against the fixed prefixes classifies each byte without building strings, so a single pass gives O(n) time and O(1) extra space. The key insight: the only state worth carrying is a counter of continuation bytes still owed by the current character.
Review these before coding to avoid predictable interview regressions.
Wrong move: Loop endpoints miss first/last candidate.
Usually fails on: minimal arrays and exact-boundary answers.
Fix: Re-derive loops from inclusive/exclusive ranges before coding.
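For this problem, the classic boundary slip is dropping the final check: a multi-byte header at the end of the array with too few continuation bytes must fail. The sketch below is a minimal Python port of the counter loop used in the solutions above, shown only to make the tail case explicit:

```python
def valid_utf8(data):
    cnt = 0  # continuation bytes still expected
    for v in data:
        if cnt > 0:
            if v >> 6 != 0b10:
                return False
            cnt -= 1
        elif v >> 7 == 0:
            pass                    # 1-byte character, nothing owed
        elif v >> 5 == 0b110:
            cnt = 1
        elif v >> 4 == 0b1110:
            cnt = 2
        elif v >> 3 == 0b11110:
            cnt = 3
        else:
            return False
    return cnt == 0  # a truncated trailing sequence leaves cnt > 0

print(valid_utf8([197]))       # False: 2-byte header with no continuation byte
print(valid_utf8([197, 130]))  # True
```

Returning `True` unconditionally at the end (instead of `cnt == 0`) passes both sample inputs yet fails exactly on these boundary cases.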