Winnowing
University of Illinois Chicago · University of California, Berkeley · +1 more institution
Abstract
Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we also give experimental results on Web data, and report experience with MOSS, a…
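The abstract names winnowing without describing it. The following Python sketch shows the basic idea as it is commonly described. First, hash every k-gram of a normalized document. Then slide a window over w consecutive hashes, keep the minimum hash in each window (taking the rightmost occurrence on ties), and record each selected position only once. Every window contributes a selected hash, so any shared substring of at least w + k - 1 characters (after normalization) yields at least one shared fingerprint. Several details are illustrative assumptions and do not come from the abstract: the normalization (lowercase, alphanumerics only), the polynomial hash (recomputed per k-gram rather than rolled as a Karp-Rabin hash would be), and the defaults k=5, w=4. The helper names are hypothetical.

```python
from typing import List, Set, Tuple

BASE = 257           # polynomial base (illustrative assumption)
MOD = (1 << 61) - 1  # large prime modulus (illustrative assumption)


def normalize(text: str) -> str:
    """Assumed preprocessing: lowercase and drop non-alphanumeric characters."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def kgram_hashes(text: str, k: int) -> List[int]:
    """Hash every k-gram with a polynomial hash, recomputed per k-gram for clarity."""
    hashes = []
    for i in range(len(text) - k + 1):
        h = 0
        for ch in text[i:i + k]:
            h = (h * BASE + ord(ch)) % MOD
        hashes.append(h)
    return hashes


def winnow(hashes: List[int], w: int) -> List[Tuple[int, int]]:
    """Return (hash, position) fingerprints.

    In each window of w consecutive hashes, select the minimum hash,
    breaking ties by taking the rightmost occurrence. A position that is
    the minimum of several consecutive windows is recorded only once.
    """
    if not hashes:
        return []
    w = min(w, len(hashes))  # short input: treat the whole sequence as one window
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        min_val = min(window)
        # rightmost index holding the minimum value
        offset = max(i for i, h in enumerate(window) if h == min_val)
        pos = start + offset
        if pos != last_pos:  # selected positions never move left, so this deduplicates
            fingerprints.append((min_val, pos))
            last_pos = pos
    return fingerprints


def fingerprint(text: str, k: int = 5, w: int = 4) -> Set[int]:
    """Set of selected hash values for a document (hypothetical helper)."""
    return {h for h, _ in winnow(kgram_hashes(normalize(text), k), w)}


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog."
    b = "My dog saw the quick brown fox jump."
    # The shared run "thequickbrownfoxjump" is longer than w + k - 1 = 8 characters,
    # so the two fingerprint sets must intersect.
    print(sorted(fingerprint(a) & fingerprint(b)))
```

For example, `winnow([77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98], 4)` returns `[(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]`. The minimum hash 17 is selected again at position 6 because the rightmost-tie rule prefers the later occurrence. Keeping positions alongside hashes lets a checker report where a match occurs, not just that one exists.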
Citation impact
- FWCI: 25.30
- Percentile: 100%
- References: 13
Authors: 3
Topics & keywords
- Winnowing
- Copying
- Computer science
- Theoretical computer science
- Algorithm
- Engineering