Article · May 20, 2014 · Closed access

The promises and perils of mining GitHub

University of Victoria · Delft University of Technology

Indexed in Crossref

Abstract

With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers are starting to mine the information stored in GitHub's event logs, trying to understand how its users employ the site to collaborate on software. However, so far there have been no studies describing the quality and properties of the data available from GitHub. We document the results of an empirical study aimed at understanding the characteristics of the repositories in GitHub and how users take advantage of GitHub's main features: commits, pull requests, and issues. Our results indicate that, while GitHub is a rich source of data on software development,…

Citation impact

  • Total citations: 748
  • Field-Weighted Citation Impact (FWCI): 127.11
  • Percentile: 100%
  • References: 39

Authors: 6

Topics & keywords

Keywords
  • Computer science
  • World Wide Web
  • Software
  • The Internet
  • Data science
  • Set (abstract data type)
  • Event (particle physics)
UN Sustainable Development Goals
  • Industry, innovation and infrastructure