DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

University of California, Davis · École Normale Supérieure Paris-Saclay

Indexed in Crossref

Abstract

Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space $\mathbb{R}^n$ and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written…
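
The idea in the abstract (map each subtree to a numerical vector, then treat vectors that lie close together in Euclidean distance as similar) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from DECKARD itself: the tuple-based tree encoding, the `NODE_KINDS` list, the helper names, and the threshold value are all hypothetical. In particular, the greedy threshold grouping below is a simple stand-in for the efficient clustering algorithm the abstract refers to.

```python
from collections import Counter
from math import sqrt

# Hypothetical set of node kinds; each one becomes a vector dimension.
NODE_KINDS = ("assign", "call", "if", "name", "num")


def vectorize(tree):
    """Count how often each node kind occurs in a tree given as (kind, [children])."""
    counts = Counter()
    stack = [tree]
    while stack:
        kind, children = stack.pop()
        counts[kind] += 1
        stack.extend(children)
    return tuple(counts[k] for k in NODE_KINDS)


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def group_similar(vectors, threshold):
    """Greedy stand-in for clustering: join the first group whose
    first member is within `threshold`, otherwise start a new group."""
    groups = []
    for i, v in enumerate(vectors):
        for group in groups:
            if distance(vectors[group[0]], v) <= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Three toy subtrees: a and b differ in a single leaf, c has a different shape.
a = ("assign", [("name", []), ("call", [("name", []), ("num", [])])])
b = ("assign", [("name", []), ("call", [("name", []), ("name", [])])])
c = ("if", [("name", []),
            ("assign", [("name", []), ("num", [])]),
            ("assign", [("name", []), ("num", [])])])

vectors = [vectorize(t) for t in (a, b, c)]
groups = group_similar(vectors, threshold=1.5)
```

With these toy inputs, `a` and `b` land in the same group because replacing one leaf moves the vector only a short distance, while `c` starts a group of its own. This quadratic pairwise comparison would not scale to large code bases; it is meant only to make the vector-and-distance idea concrete.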

Citation impact

Total citations: 1,022
FWCI: 62.98
Percentile: 100%
References: 31

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Scalability
  • Code (set theory)
  • Tree (set theory)
  • Java
  • Source code
  • Theoretical computer science
  • Programming language

Funding