Dataset · TIB Data Manager · Nov 21, 2012 · Green OA

New York Times Annotated Corpus

Clinical Research Consortium

Indexed in DataCite

Abstract

The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007. Article metadata is provided by the New York Times Newsroom, the New York Times Indexing Service and the online production staff at nytimes.com. The corpus includes: over 1.8 million articles (excluding wire-service articles that appeared during the covered period); over 650,000 article summaries; over 1,500,000 articles manually tagged by library scientists with tags drawn from a normalized indexing vocabulary of people, organizations, locations and topic descriptors; over 275,000 algorithmically tagged articles that have been hand-verified by the online…

Citation impact

625 total citations
0 references

Authors: 1

Topics & keywords

Keywords
  • Computer science
  • Automatic summarization
  • Search engine indexing
  • Vocabulary
  • Parsing
  • Library science
  • Information retrieval
  • World Wide Web