Depth fuels expertise, breadth sparks innovation.

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng, Layne Berry*, Yi-Ting Chen*, I-Hsiang Chiu*, Hsuan-Hao Lin*, Max Liu*, Puyuan Peng*, Yi-Jen Shih*, Hung-Yu Wang*, Haibin Wu*, Po-Yao Huang, Shang-Wen Li, David Harwath, Yu Tsao, Shinji Watanabe, Abdelrahman Mohamed, Chi-Luen Feng
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
We propose the AV-SUPERB benchmark that enables general-purpose evaluation of unimodal audio/visual and bimodal fusion representations on 7 datasets covering 5 audio-visual tasks in speech and audio processing.
[ arXiv ]
[ Code ]
Diffusion Model-Augmented Behavioral Cloning
Frontiers4LCD Workshop at International Conference on Machine Learning (ICML), 2023
We propose a novel imitation learning method that incorporates a diffusion model, and show that it can achieve better performance than previous imitation learning methods.
[ arXiv ]
Controllable User Dialogue Act Augmentation for Dialogue State Tracking
23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 2022
We propose a data augmentation method for dialogue state tracking (DST) that improves on the state-of-the-art performance on MultiWOZ 2.1.
[ arXiv ]
[ Code ]