The ICSI Meeting Corpus
International Computer Science Institute · University of California, Berkeley · 3 other institutions
Abstract
We have collected a corpus of data from natural meetings that occurred at the International Computer Science Institute (ICSI) in Berkeley, California over the last three years. The corpus contains audio recorded simultaneously from head-worn and table-top microphones, word-level transcripts of meetings, and various metadata on participants, meetings, and hardware. Such a corpus supports work in automatic speech recognition, noise robustness, dialog modeling, prosody, rich transcription, information retrieval, and more. We present details on the contents of the corpus, as well as rationales for the decisions that led to its configuration. The corpus was delivered to the Linguistic Data Consortium (LDC).
Citation impact
- FWCI: 41.27
- Percentile: 100%
- References: 8
Authors (11)
- Adam Janin (corresponding author)
International Computer Science Institute, University of California, Berkeley
- Don Baron
International Computer Science Institute, University of California, Berkeley
- Jane A. Edwards
University of California, Berkeley, International Computer Science Institute
- Daniel P. W. Ellis
Columbia University, International Computer Science Institute
- David Gelbart
International Computer Science Institute, University of California, Berkeley
Topics & keywords
- Computer science
- Metadata
- Natural language processing
- Prosody
- Transcription (linguistics)
- Speech corpus
- Artificial intelligence
- Speech recognition