Move from brute-force thinking to an efficient approach: a hash map keyed by file content turns pairwise comparison into a single pass.
Given a list paths of directory info, including the directory path, and all the files with contents in this directory, return all the duplicate files in the file system in terms of their paths. You may return the answer in any order.
A group of duplicate files consists of at least two files that have the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt) with content (f1_content, f2_content ... fn_content) respectively in the directory "root/d1/d2/.../dm". Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of groups of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input: paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)","root 4.txt(efgh)"]
Output: [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Example 2:
Input: paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)"]
Output: [["root/a/2.txt","root/c/d/4.txt"],["root/a/1.txt","root/c/3.txt"]]
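The input format above can be traced with a minimal parsing sketch. The helper name `parse_dir_info` is hypothetical, introduced only for illustration; it turns one directory-info string into (full path, content) pairs:

```python
def parse_dir_info(info: str) -> list[tuple[str, str]]:
    """Split one directory-info string into (full_path, content) pairs."""
    parts = info.split(' ')
    directory = parts[0]
    result = []
    for token in parts[1:]:                   # e.g. "1.txt(abcd)"
        open_paren = token.index('(')
        name = token[:open_paren]
        content = token[open_paren + 1:-1]    # strip the surrounding parentheses
        result.append((directory + '/' + name, content))
    return result

# parse_dir_info("root/a 1.txt(abcd) 2.txt(efgh)")
# → [("root/a/1.txt", "abcd"), ("root/a/2.txt", "efgh")]
```

All of the solutions below do some version of this split, differing only in how they slice out the name and content.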
Constraints:
1 <= paths.length <= 2 * 10^4
1 <= paths[i].length <= 3000
1 <= sum(paths[i].length) <= 5 * 10^5
paths[i] consist of English letters, digits, '/', '.', '(', ')', and ' '.
Follow up:
Start with the most direct exhaustive search. That gives a correctness anchor before optimizing.
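As a correctness anchor, here is a minimal brute-force sketch (the function name `find_duplicate_brute` is hypothetical): flatten all files into (path, content) pairs, then compare every content against every other, which is quadratic in the number of files:

```python
def find_duplicate_brute(paths: list[str]) -> list[list[str]]:
    # Flatten every directory-info string into (full_path, content) pairs.
    files = []
    for info in paths:
        directory, *entries = info.split(' ')
        for token in entries:
            i = token.index('(')
            files.append((directory + '/' + token[:i], token[i + 1:-1]))

    # Quadratic pass: for each content not yet grouped, scan all files for matches.
    ans, done = [], set()
    for _, content in files:
        if content in done:
            continue                               # group already emitted
        done.add(content)
        group = [p for p, c in files if c == content]
        if len(group) > 1:
            ans.append(group)
    return ans
```

The inner list comprehension rescans all files per distinct content, so this is O(F^2) in the worst case; the accepted solutions below replace it with a single hash-map pass.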
Pattern signal: Array · Hash Map
Related problem: delete-duplicate-folders-in-system. Source-backed implementations are provided below for direct study and interview prep.
// Accepted solution for LeetCode #609: Find Duplicate File in System
import java.util.*;

class Solution {
    public List<List<String>> findDuplicate(String[] paths) {
        Map<String, List<String>> d = new HashMap<>();
        for (String p : paths) {
            String[] ps = p.split(" ");
            for (int i = 1; i < ps.length; ++i) {
                int j = ps[i].indexOf('(');
                String content = ps[i].substring(j + 1, ps[i].length() - 1);
                String name = ps[0] + '/' + ps[i].substring(0, j);
                d.computeIfAbsent(content, k -> new ArrayList<>()).add(name);
            }
        }
        List<List<String>> ans = new ArrayList<>();
        for (var e : d.values()) {
            if (e.size() > 1) {
                ans.add(e);
            }
        }
        return ans;
    }
}
// Accepted solution for LeetCode #609: Find Duplicate File in System
// (requires the standard library "strings" import)
func findDuplicate(paths []string) [][]string {
	d := map[string][]string{}
	for _, p := range paths {
		ps := strings.Split(p, " ")
		for i := 1; i < len(ps); i++ {
			j := strings.IndexByte(ps[i], '(')
			content := ps[i][j+1 : len(ps[i])-1]
			name := ps[0] + "/" + ps[i][:j]
			d[content] = append(d[content], name)
		}
	}
	ans := [][]string{}
	for _, e := range d {
		if len(e) > 1 {
			ans = append(ans, e)
		}
	}
	return ans
}
# Accepted solution for LeetCode #609: Find Duplicate File in System
from collections import defaultdict
from typing import List

class Solution:
    def findDuplicate(self, paths: List[str]) -> List[List[str]]:
        d = defaultdict(list)
        for p in paths:
            ps = p.split()
            for f in ps[1:]:
                i = f.find('(')
                name, content = f[:i], f[i + 1 : -1]
                d[content].append(ps[0] + '/' + name)
        return [v for v in d.values() if len(v) > 1]
// Accepted solution for LeetCode #609: Find Duplicate File in System
use std::collections::HashMap;

struct Solution;

impl Solution {
    fn find_duplicate(paths: Vec<String>) -> Vec<Vec<String>> {
        let mut hm: HashMap<String, Vec<String>> = HashMap::new();
        for s in paths {
            let mut s_iter = s.split_whitespace();
            let dir: &str = s_iter.next().unwrap();
            for f in s_iter {
                let mut f_iter = f.chars();
                // take_while also consumes the '(' / ')' delimiters
                let name: String = f_iter.by_ref().take_while(|&c| c != '(').collect();
                let content: String = f_iter.by_ref().take_while(|&c| c != ')').collect();
                hm.entry(content)
                    .or_default()
                    .push(format!("{}/{}", dir, name));
            }
        }
        hm.into_iter()
            .filter_map(|(_, v)| if v.len() > 1 { Some(v) } else { None })
            .collect()
    }
}
#[test]
fn test() {
    let paths: Vec<String> = [
        "root/a 1.txt(abcd) 2.txt(efgh)",
        "root/c 3.txt(abcd)",
        "root/c/d 4.txt(efgh)",
        "root 4.txt(efgh)",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    let mut res: Vec<Vec<String>> = [
        vec!["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"],
        vec!["root/a/1.txt", "root/c/3.txt"],
    ]
    .iter()
    .map(|v| v.iter().map(|s| s.to_string()).collect())
    .collect();
    let mut ans = Solution::find_duplicate(paths);
    res.sort();
    ans.sort();
    assert_eq!(ans, res);
}
// Accepted solution for LeetCode #609: Find Duplicate File in System
function findDuplicate(paths: string[]): string[][] {
    const d = new Map<string, string[]>();
    for (const p of paths) {
        const [root, ...fs] = p.split(' ');
        for (const f of fs) {
            const [name, content] = f.split(/\(|\)/g).filter(Boolean);
            const t = d.get(content) ?? [];
            t.push(root + '/' + name);
            d.set(content, t);
        }
    }
    return [...d.values()].filter(e => e.length > 1);
}
Use this to step through a reusable interview workflow for this problem.
Two nested loops check every pair: the outer loop fixes one file, the inner loop compares its content against every other file's. For F files this gives up to F²/2 comparisons. No extra space, but the quadratic time is prohibitive at this problem's scale (up to 2 * 10^4 directory strings).
Most array problems have an O(n²) brute force (nested loops) and a near-linear optimal (single pass with clever state tracking). The key is identifying what information to maintain as you scan: a running max, a prefix sum, a hash map of seen values, or two pointers. Here, the scan state is a map from file content to the list of paths seen so far with that content.
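To make the nested-loops-versus-state-tracking contrast concrete on the simplest instance of the pattern, here is a hedged illustrative sketch (both function names are hypothetical) of duplicate detection in an array:

```python
def has_duplicate_quadratic(nums: list[int]) -> bool:
    # Brute force: every pair, O(n^2) time, O(1) extra space.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False

def has_duplicate_linear(nums: list[int]) -> bool:
    # Single pass: the tracked "state" is the set of values seen so far.
    seen = set()
    for x in nums:
        if x in seen:
            return True
        seen.add(x)
    return False
```

The duplicate-file problem is the same trade: the hash map of seen contents replaces the inner comparison loop.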
Review these before coding to avoid predictable interview regressions.
Off-by-one on range boundaries
Wrong move: Loop endpoints miss the first or last candidate.
Usually fails on: minimal arrays and exact-boundary answers.
Fix: Re-derive loops from inclusive/exclusive ranges before coding.
Stale zero-count map keys
Wrong move: Zero-count keys stay in the map and break distinct/count constraints.
Usually fails on: window/map size checks that are consistently off by one.
Fix: Delete keys when their count reaches zero.
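The zero-count pitfall is easiest to see in a sliding-window setting. A minimal sketch (the function name `count_distinct_in_windows` is hypothetical): `len(cnt)` is only a valid distinct count if keys are deleted the moment their count hits zero.

```python
from collections import Counter

def count_distinct_in_windows(nums: list[int], k: int) -> list[int]:
    """Distinct-element count for each window of size k."""
    cnt = Counter(nums[:k])
    ans = [len(cnt)]
    for i in range(k, len(nums)):
        cnt[nums[i]] += 1
        cnt[nums[i - k]] -= 1
        if cnt[nums[i - k]] == 0:
            del cnt[nums[i - k]]   # critical: otherwise len(cnt) overcounts
        ans.append(len(cnt))
    return ans

# count_distinct_in_windows([1, 2, 1, 3, 3], 3) → [2, 3, 2]
```

Without the `del`, a key that decrements to zero still inflates `len(cnt)`, producing exactly the off-by-one size checks described above.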