Li, Sujian

Associate Professor

Research Interests: Natural language processing, computational linguistics

Office Phone: 86-10-6276 5835-105

Email: lisujian@pku.edu.cn

Li, Sujian is an associate professor in the Department of Computer Science and Technology, School of EECS. She obtained her Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2002. Her research interests include natural language processing, automatic summarization, discourse parsing, and information extraction.

Dr. Li has published more than 100 research papers, many of them in top-tier conferences such as ACL, KDD, AAAI, SIGIR, and EMNLP. She has served on the program committees of various international conferences, including ACL, AAAI, IJCAI, EMNLP, COLING, and WWW, and as an area chair for ACL 2017, CCL 2015/2017, and NLPCC 2016.

Dr. Li has worked on 10 research projects, including NSFC projects and projects under the 973 and 863 programs. Her research achievements are summarized as follows:

1) Automatic summarization: The key problems in automatic summarization are sentence representation and sentence ranking. She proposed novel graph-based methods that incorporate more information when selecting important sentences, designed a ranking framework based on recursive neural networks to measure the salience of a sentence and its constituent phrases, and developed the concept of a summary prior, using convolutional neural networks to reduce the human labor of feature engineering. The proposed methods can efficiently extract important sentences to form a summary.

2) Discourse parsing: Constructing text-level discourse structure and classifying implicit discourse relations are the main challenges in building a practical discourse parser. She proposed the discourse dependency structure and constructed the corresponding corpus to facilitate research on discourse parsing. To reduce the human labor of labeling discourse relations, she proposed a semi-supervised method to automatically acquire typical implicit discourse relations. She also designed neural networks that simulate repeated reading to better capture text semantics when identifying discourse relations. These methods can improve text understanding and benefit tasks such as summarization and reading comprehension.

3) Keyphrase and term extraction: Phrases in a document are not independent in delivering the content of the document. To capture and make better use of their relationships in keyphrase extraction, she explored semantic graphs and semi-supervised methods to model both n-ary and binary relationships among phrases. She also designed a novel neural topic model to simultaneously extract terms and keyphrases, and proposed modeling the augmented dependency path structure for entity relation classification. The proposed solutions for term and keyphrase extraction can benefit text understanding, information retrieval, and information extraction.