Article · IEEE Transactions on Audio Speech and Language Processing · Jan 1, 2026 · Closed access

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model With Self-Generated Cross-Modal Alignment

K C Lu, Zhehuai Chen, Szu‐Wei Fu, Chao-Han Huck Yang, Sung-Feng Huang

National Taiwan University · National Taipei University · +2 more institutions

Indexed in Crossref

Abstract

No abstract available for this paper.

Citation impact

Total citations: 4

FWCI: 122.92
Percentile: 100%
References: 72


Authors

28

K C Lu (Corresponding)
National Taiwan University, National Taipei University
Zhehuai Chen
Nvidia (United States)
Szu‐Wei Fu
Nvidia (United States)
Chao-Han Huck Yang
National Taiwan University, Nvidia (United States), National Taipei University
Sung-Feng Huang
Nvidia (United States)

Topics & keywords

Keywords

Language model
Audio signal processing
Context model
Speech processing
Audio analyzer
Natural language
