On finding duplication and near-duplication in large software systems

Baker, Brenda S.

doi:10.1109/wcre.1995.514697

Article · Nov 19, 2002 · Closed access

Brenda S. Baker

Nokia (United States) · AT&T (United States)

Indexed in Crossref

Abstract

This paper describes how a program called dup can be used to locate instances of duplication or near-duplication in a software system. Dup reports both textually identical sections of code and sections that are the same textually except for systematic substitution of one set of variable names and constants for another. Further processing locates longer sections of code that are the same except for other small modifications. Experimental results from running dup on millions of lines from two large software systems show dup to be both effective at locating duplication and fast. Applications could include identifying sections of code that should be replaced by procedures, elimination of duplication during…
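The second kind of match described in the abstract, code that is textually the same except for a systematic substitution of names and constants, can be illustrated with a small sketch. This is not dup's implementation; it is a minimal illustration that assumes a simple regex tokenizer, a small hypothetical keyword set, and hypothetical function names (`normalize`, `is_parameterized_match`). Each identifier or numeric constant is replaced by a placeholder numbered by its first occurrence, so two fragments compare equal only when one can be turned into the other by a consistent one-to-one renaming.

```python
import re

# Assumed tokenizer: identifiers, integer constants, or any single
# non-space character (operators, punctuation).
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")

# Hypothetical keyword set; keywords are kept literally, not treated as parameters.
KEYWORDS = {"if", "else", "while", "for", "return"}


def normalize(fragment):
    """Replace each identifier or constant with a placeholder
    numbered by order of first occurrence in the fragment."""
    seen = {}
    out = []
    for tok in TOKEN.findall(fragment):
        is_name = tok[0].isalpha() or tok[0] == "_"
        is_param = (is_name or tok.isdigit()) and tok not in KEYWORDS
        if is_param:
            if tok not in seen:
                seen[tok] = f"P{len(seen)}"
            out.append(seen[tok])
        else:
            out.append(tok)
    return out


def is_parameterized_match(a, b):
    """True if the fragments are identical up to a consistent,
    one-to-one renaming of identifiers and constants."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(is_parameterized_match("x = y + x;", "a = b + a;"))  # consistent renaming
    print(is_parameterized_match("x = y + x;", "a = b + b;"))  # inconsistent renaming
```

Comparing whole fragments this way only tests a given pair; locating all such pairs across millions of lines, as the abstract reports, requires a more efficient search structure than this sketch provides.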

Citation impact

Total citations: 723

FWCI: 65.97
Percentile: 100%
References: 21

Authors

Brenda S. Baker (corresponding author)
Nokia (United States), AT&T (United States)

Topics & keywords

Keywords

dup
Gene duplication
Computer science
Debugging
Code (set theory)
Software
Data deduplication
Software system
