ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification

Desplanques, Brecht; Thienpondt, Jenthe; Demuynck, Kris

doi:10.21437/interspeech.2020-2650

Article | Oct 25, 2020 | Green OA

Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck

Ghent University

Indexed in: arXiv, Crossref

Abstract

Current speaker verification techniques rely on a neural network to extract speaker representations. The successful x-vector architecture is a Time Delay Neural Network (TDNN) that applies statistics pooling to project variable-length utterances into fixed-length speaker characterizing embeddings. In this paper, we propose multiple enhancements to this architecture based on recent trends in the related fields of face verification and computer vision. Firstly, the initial frame layers can be restructured into 1-dimensional Res2Net modules with impactful skip connections. Similarly to SE-ResNet, we introduce Squeeze-and-Excitation blocks in these modules to explicitly model channel interdependencies. The SE…
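The two ideas the abstract names explicitly, statistics pooling and Squeeze-and-Excitation (SE) channel gating, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the layer sizes, and the random weights are hypothetical, the SE block follows the generic squeeze (time average) and excitation (bottleneck plus sigmoid gate) recipe, and the pooling is the plain mean and standard deviation pooling associated with x-vectors.

```python
import numpy as np


def squeeze_excitation(frames, w1, b1, w2, b2):
    """Rescale each channel of a (channels, time) feature map.

    Generic SE recipe (an assumption, not the paper's exact layer):
    squeeze by averaging over time, then excite through a small
    bottleneck that ends in a sigmoid gate per channel.
    """
    squeezed = frames.mean(axis=1)                     # (channels,)
    hidden = np.maximum(0.0, w1 @ squeezed + b1)       # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))   # values in (0, 1)
    return frames * gate[:, None]                      # broadcast over time


def statistics_pooling(frames):
    """Map (channels, time) to a fixed-length vector of size 2 * channels."""
    mean = frames.mean(axis=1)
    std = frames.std(axis=1)
    return np.concatenate([mean, std])


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
channels, bottleneck = 8, 2
w1 = rng.normal(size=(bottleneck, channels))
b1 = np.zeros(bottleneck)
w2 = rng.normal(size=(channels, bottleneck))
b2 = np.zeros(channels)

# Two "utterances" with different numbers of frames.
short_utt = rng.normal(size=(channels, 50))
long_utt = rng.normal(size=(channels, 300))

gated_short = squeeze_excitation(short_utt, w1, b1, w2, b2)
gated_long = squeeze_excitation(long_utt, w1, b1, w2, b2)

emb_short = statistics_pooling(gated_short)
emb_long = statistics_pooling(gated_long)

print(emb_short.shape, emb_long.shape)
```

Both embeddings have length 2 * channels regardless of how many frames went in, which is the sense in which pooling projects variable-length utterances onto fixed-length vectors.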

Citation impact

Total citations: 1,402

FWCI: 76.84
Percentile: 100%
References: 36

Authors: 3

Topics & keywords

Keywords

Computer science
Pooling
Artificial neural network
Speech recognition
Time delay neural network
Channel (broadcasting)
Pattern recognition (psychology)
Frame (networking)

UN Sustainable Development Goals

Quality Education
