Learning to classify short and sparse text & web with hidden topics from large-scale data collections

Phan, Xuan-Hieu; Nguyen, Le-Minh; Horiguchi, Susumu

doi:10.1145/1367497.1367510

Article · Apr 21, 2008 · Closed access

Xuan-Hieu Phan, Le-Minh Nguyen, Susumu Horiguchi

Tohoku University · Japan Advanced Institute of Science and Technology

Indexed in Crossref

Abstract

This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections. The main motivation of this work is that many classification tasks working with short segments of text & Web, such as search snippets, forum & chat messages, blog & news feeds, product reviews, and book & movie summaries, fail to achieve high accuracy due to the data sparseness. We, therefore, come up with an idea of gaining external knowledge to make the data more related as well as expand the coverage of classifiers to handle future data better. The underlying idea of the framework is that for each classification…

Citation impact

Total citations: 745

FWCI: 41.71
Percentile: 100%
References: 39

Authors: 3

Topics & keywords

Keywords

Computer science
Scale (ratio)
Information retrieval
Artificial intelligence
Geography

UN Sustainable Development Goals

Quality Education
