Wu, Xihong
Professor
Research Interests: Auditory perception, computational auditory models, speech and language processing
Office Phone: 86-10-6275 9989
Email: wxh@cis.pku.edu.cn
Wu, Xihong is a professor in the Department of Machine Intelligence, School of EECS. He received his B.Sc. from Jilin University in 1989 and his Ph.D. from Peking University in 1995. His research interests cover auditory mechanisms for speech perception, language processing mechanisms, speech signal processing, artificial intelligence, and robotics.
Prof. Wu has published nearly 200 papers, many of them in authoritative international journals and major international conferences. He was named a "New Century Talent" by the Ministry of Education in 2005 and won the "Science and Technology Progress Award" from China Shipbuilding Industry Corporation in 2007. He also received the Okawa Research Grant in 2009, the First Prize at the Robot Skills International Competition of the International Joint Conference on Artificial Intelligence (IJCAI-13) in 2013, the Best Paper Award of the ISCSLP International Conference in 2014, and the Best Paper Finalist Award at the IEEE ICIA International Conference in 2016. He has won first prize more than ten times in international academic evaluations held by the National Institute of Standards and Technology (NIST) and other authoritative international organizations. He is a senior member of IEEE and of the China Electronics Society, deputy director of the Chinese Academy of Audiological Rehabilitation (CAAR), deputy director of the Chinese Rehabilitation Engineering and Auxiliary Technical Committee, a member of the National Standardization Technical Committee of Acoustics, a member of the Human Biometrics Application Subcommittee of the National Standardization Technical Committee of Safety and Alarm System, and a review expert on medical instruments for the National Drug Administration. He serves as an editor of international journals such as "Neural Networks" and "Neurocomputing", executive editor of the domestic journal "Chinese Journal of Listening and Rehabilitation Science", and an editor of the "Journal of Electronics" and the "Journal of Automation". He has been invited many times to chair international and domestic academic conferences.
His research achievements are summarized as follows:
1) Binaural processing in complex scenes
Listeners gain a significant benefit from binaural hearing when sound sources are spatially separated, a benefit termed spatial release from masking (SRM). It has been documented that the benefit arises from the physical head-shadow effect for high-frequency components and from the perceptual localization effect for low-frequency components. SRM is also found in reverberant environments, where these physical and perceptual cues cannot be taken into account directly. We conducted a series of studies exploring SRM with a paradigm that simulates reverberation via the precedence effect. In addition, a new head-related transfer function (HRTF) database was built to support these studies and has been widely cited internationally.
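To make the HRTF-based setup concrete, the following is a minimal, hypothetical Python sketch (not the lab's actual pipeline) of how a database entry is typically used: a mono source is convolved with the left and right head-related impulse responses (HRIRs) measured for a given direction to produce a binaural signal. The array shapes and the synthetic HRIRs below are assumptions for illustration only.

import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, hrir_left, hrir_right):
    # Convolve a mono signal with a pair of HRIRs to obtain a two-channel
    # (left/right) binaural signal for headphone presentation.
    left = fftconvolve(mono, hrir_left, mode="full")
    right = fftconvolve(mono, hrir_right, mode="full")
    return np.stack([left, right], axis=0)  # shape: (2, n_samples)

# Example with synthetic data: a 1-second noise burst and 256-tap dummy HRIRs
# standing in for one measured direction drawn from an HRTF database.
fs = 44100
mono = np.random.randn(fs).astype(np.float32)
hrir_l = np.random.randn(256).astype(np.float32) * 0.01
hrir_r = np.random.randn(256).astype(np.float32) * 0.01
binaural = spatialize(mono, hrir_l, hrir_r)
print(binaural.shape)  # (2, 44355)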
2) Speech perception in noisy environments
Speech intelligibility degrades when competing sounds are present simultaneously. The extent to which interfering sounds reduce intelligibility depends on several factors, such as the acoustic characteristics of the speech signals, the listening environment, and the prior knowledge of listeners. To reveal the effects of these factors and the underlying mechanisms, studies were conducted ranging from human behavioral performance to neurophysiological responses. The main findings include: 1) the pitch contour of speech plays an important role in speech intelligibility in background noise; 2) the capability to store temporal fine structure determines the amount of masking release; 3) a priming cue carrying speaker characteristics can significantly improve speech intelligibility.
3) Applications in machine intelligence
Deep neural networks (DNNs) have been introduced into automatic speech recognition (ASR), and significant progress has been made in recent years. Building on DNNs, several of our works have improved ASR performance: 1) a GMM-free method for training DNN acoustic models was proposed, with performance comparable to GMM-based training; 2) a deep recurrent neural network acoustic model was developed and achieved state-of-the-art performance; 3) a multi-scale convolutional neural network acoustic model was developed to analyze speech at multiple time scales and improved ASR accuracy.
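As a rough illustration of the multi-scale idea (the layer sizes, kernel widths, and number of output senones below are illustrative assumptions, not the published model), parallel convolution branches with different kernel widths can process the same frame sequence over short and long temporal contexts, and their outputs are concatenated before the per-frame classification layer. A minimal PyTorch sketch:

import torch
import torch.nn as nn

class MultiScaleAcousticModel(nn.Module):
    def __init__(self, n_feats=40, n_senones=3000, channels=64, kernel_sizes=(3, 5, 9)):
        super().__init__()
        # One branch per temporal scale; "same" padding keeps the frame count.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(n_feats, channels, k, padding=k // 2),
                nn.ReLU(),
            )
            for k in kernel_sizes
        ])
        self.classifier = nn.Conv1d(channels * len(kernel_sizes), n_senones, 1)

    def forward(self, feats):              # feats: (batch, n_feats, frames)
        multi = torch.cat([b(feats) for b in self.branches], dim=1)
        return self.classifier(multi)      # per-frame senone logits

# Dummy forward pass: a batch of 8 utterances, 200 frames of 40-dim filterbanks.
model = MultiScaleAcousticModel()
logits = model(torch.randn(8, 40, 200))
print(logits.shape)  # torch.Size([8, 3000, 200])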