About Me

Yidi Li is an Associate Professor in the College of Computer Science and Technology at Taiyuan University of Technology (太原理工大学, 计算机科学与技术学院) and the director of the Multimodal Intelligent Human-Robot Interaction Laboratory (多模态智能人机交互实验室, MIHRI Lab). She received her Ph.D. in Computer Science and Technology from Peking University in 2023, under the supervision of Prof. Hong Liu. She has published more than 30 papers in top international conferences and SCI journals in the field of artificial intelligence. She serves as a reviewer for journals and conferences in the field, including CAAI TRIT, PR, AAAI, and ICASSP, and as an expert judge for various national-level college student competitions.

Yidi Li’s research centers on audio-visual fusion. It aims to improve model adaptability and robustness in complex dynamic scenarios by investigating the fusion of heterogeneous multimodal data and the multilevel interaction of multimodal information. Her research interests include audio-visual learning, sound source localization, speech recognition, and speaker tracking.

📣📣 Call for members📣📣

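The Multimodal Intelligent Human-Robot Interaction Laboratory (MIHRI Lab) is now recruiting graduate students enrolling in 2026 and 2027, as well as outstanding first- and second-year undergraduates!

We are looking for outstanding students, admitted either by recommendation or through the postgraduate entrance examination, who are passionate about artificial intelligence, robotics, computer vision, and related fields. Students with strong programming skills, hands-on deep learning experience, or a background in programming competitions or research, and who aspire to pursue a master's or doctoral degree or to study abroad, are welcome to contact me (send your CV to liyidi@tyut.edu.cn). Outstanding first- and second-year undergraduates are also welcome to join the group.

MIHRI Lab offers its members:

Cutting-edge research: take part in a wide range of frontier research topics.
Academic exchange: attend domestic and international conferences to broaden academic horizons.
International collaboration: joint supervision by collaborating experts from leading overseas universities.
Study visits: outstanding students may be recommended for study or academic visits at well-known institutions in China and abroad.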

🥳 News

2026.1 📄Three ICASSP papers accepted!
2025.12📄One CAAI TRIT paper accepted! (🚩SCI Q1 TOP)
2025.9 🏆MIHRI Lab won the Best Student Paper Award at the 2025 ACAIT!
2025.5 📄Two ICIC papers accepted!
2025.4 📄Two ESWA papers accepted! (🚩SCI Q1 TOP)
2024.12📄One ICASSP paper accepted!
2024.12📄Two CAAI TRIT papers accepted! (🚩SCI Q1 TOP)
2024.9 📄One TMM paper accepted! (🚩SCI Q1 TOP)
2024.7 🏆Dr. Li awarded the 2024 ACM Rising Star Award (Taiyuan)!
2024.4 📄One TMM paper accepted! (🚩SCI Q1 TOP)

📜 Research Area

Multi-modal Learning:

Speaker tracking, Sound source localization, Speech recognition, Audio-visual event localization, Emotion recognition

Computer Vision:

Industrial vision, Action recognition, Object tracking

💻 Research Experiences

2013.09 - 2017.07: B.Sc. in Statistics, Taiyuan University of Technology, China
2017.09 - 2023.07: Ph.D. in Computer Science, Peking University, China
2023.07 - Present: Associate Professor, Taiyuan University of Technology, China
2025.10 - Present: Postdoctoral Researcher, Osaka University, Japan

📝 Publications

[22] Yihan Li, Yidi Li, et al. AVCLNet: Multimodal Multi-Speaker Tracking Network Using Audio-Visual Contrastive Learning, CAAI Transactions on Intelligence Technology (CAAI TRIT), 2025. (SCI Q1-top)
[21] Yidi Li, Jiahao Wen, Rui Gong, et al. PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection. Expert Systems With Applications, 2025. (SCI Q1-top)
[20] Yidi Li, Hong Liu, Bing Yang. STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking. IEEE Transactions on Multimedia (TMM), 2024. (SCI Q1-top)
[19] Yidi Li, Wenkai Zhao, Zeyu Wang, Zhenhuan Xu, Bin Ren, Nicu Sebe. Multi-Stage Multimodal Distillation for Audio-Visual Speaker Tracking, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
[18] Jie Xiang, Ang Zhao, Xia Li, Xubin Wu, Yanqing Dong, Yan Niu, Yidi Li, Xin Wen, Enhancing Brain MRI Super-Resolution through Multi-Slice Aware Matching and Fusion, CAAI Transactions on Intelligence Technology, 2024.
[17] Zihao Mi, Xueyu Liu, Jianan Zhang, Guangze Shi, Yidi Li, and Yongfei Wu. Multi-instance Curriculum Learning for Histopathology Image Classifications with Hard Negative Mining and Positive Augmentation. BIBM, 2024. (CCF B)
[16] Linhui Sun, Yidi Li, Xujiao Zhao, Kaiyi Wang, Hao Guo*. Event-RGB Fusion for Insulator Defect Detection Based on Improved YOLOv8. ACAIT, 2024.
[15] Yidi Li, Kairan Zhang, Chenxu Yang, Sizhou Liu, Hao Guo, Hongfei Zhang. A Synthesis Library Subset Screening Method for High Energy Efficiency Requirements. CWSN, 2024.
[14] Tao Wang, Mengyuan Liu, Hong Liu, Wenhao Li, Miaoju Ban, Tianyu Guo and Yidi Li. Feature Completion Transformer for Occluded Person Re-identification. IEEE Transactions on Multimedia (TMM), 2024. (SCI Q1-top)
[13] Zhenhuan Xu, Yongfei Wu, Liming Zhang, Yidi Li, Adaptive Fourier Decomposition Based Signal Extraction on Weak Electromagnetic Field, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024. (CCF B)
[12] Ruijia Fan, Hong Liu, Yidi Li, ATTA-NET: Attention Aggregation Network for Audio-Visual Emotion Recognition, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024. (CCF B)
[11] Yidi Li, Guoquan Wang, Zhan Chen, Hao Tang, and Hong Liu. On-Device Audio-Visual Multi-Person Wake Word Spotting, CAAI Transactions on Intelligence Technology (CAAI TRIT), 2023. (JCR Q1)
[10] Yidi Li, Jiale Ren, Yawei Wang, Xia Li, and Hong Liu. Audio-Visual Keyword Transformer for Unconstrained Sentence-Level Keyword Spotting, CAAI Transactions on Intelligence Technology (CAAI TRIT), 2023. (JCR Q1)
[9] Guoquan Wang, Hong Liu, Tianyu Guo, Jingwen Guo, Ti Wang, Yidi Li. Self-supervised 3D Skeleton Representation Learning with Active Sampling and Adaptive Relabeling for Action Recognition. Proceedings of IEEE International Conference on Image Processing (ICIP), 2023.
[8] Xingyue Shi, Hong Liu, Wei Shi, Zihui Zhou, Yidi Li. Boosting Person Re-Identification with Viewpoint Contrastive Learning and Adversarial Training. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023. (CCF B)
[7] Wanruo Zhang, Hong Liu, Jianbing Wu, Yidi Li. MVSSC: Meta Reinforcement Learning Based Visual Indoor Navigation Using Multi-view Semantic Spatial Context, Pattern Recognition Letters (PRL), 2023. (CAAI B)
[6] Jian Zhang, Ge Yang, Runwei Ding, Yidi Li. Cascade RDN: Towards Accurate Localization in Industrial Visual Anomaly Detection with Structural Anomaly Generation, IEEE Robotics and Automation Letters (RAL), 2023. (CAAI B)
[5] Yidi Li, Hong Liu, Hao Tang. Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) (Oral). 2022. (CCF A)
[4] Peini Guo, Zhengyan Chen, Yidi Li, and Hong Liu. Audio-Visual Fusion Network Based on Conformer for Multimodal Emotion Recognition. Artificial Intelligence (CICAI), 2022. (CAAI A)
[3] Hong Liu, Yongheng Sun, Yidi Li, Bing Yang. 3D Audio-Visual Speaker Tracking with a Novel Particle Filter. Proceedings of International Conference on Pattern Recognition (ICPR), 2021.
[2] Yidi Li, Hong Liu, Bing Yang, Runwei Ding, Yang Chen. Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter. Complexity, 2020.
[1] Hong Liu, Yidi Li, Bing Yang. 3D Audio-Visual Speaker Tracking with a Two-Layer Particle Filter. Proceedings of IEEE International Conference on Image Processing (ICIP), 2019.

🌟 Projects

[1] Young Scientists Fund of the National Natural Science Foundation of China, 2024.
[2] Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi, 2024.
[3] Shanxi Provincial Department of Science and Technology Basic Research Project, 2024.
[4] Open Research Project of Guangdong Provincial Key Laboratory, 2025.

🏅 Certifications and Awards

2024 ACM Rising Star Award. (Taiyuan chapter)
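National 3D Digital Innovation Design Competition (Category A discipline competition), 2024, 2025. National First Prize. (Supervisor)
National College Student Information Security and Countermeasure Technology Competition (Category A discipline competition), 2024. National First Prize. (Supervisor)
China Robot and Artificial Intelligence Competition (Category A discipline competition), 2025. National First Prize. (Supervisor)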

Yidi Li (李一迪)