Xiangci Li
I am an Applied Scientist at Amazon. My research interests are in natural language processing, specifically knowledge-intensive natural language processing, including scholarly document processing, dialogue generation, and fact verification.

李向磁
Natural Language Processing Researcher
- Company: Amazon Web Services
- Location: Bay Area, CA, USA
- Role: Applied Scientist II
- Email: lixiangci8 AT gmail DOT com
I am an Applied Scientist II at Amazon Web Services. I have had five internship experiences at Google, Amazon, Tencent, Baidu, and the Chan Zuckerberg Initiative. I received my Ph.D. from UT Dallas, advised by Dr. Jessica Ouyang. Previously, I was a master's student at the University of Southern California (USC) and a research assistant in Prof. Nanyun Peng's PLUS Lab at the Information Sciences Institute (the lab has since moved to UCLA). Before that, I was a full-time computational neuroscience research assistant in the Erlich Lab at New York University Shanghai, where I graduated with a Bachelor's degree in computer science and neuroscience.
Publications
Related Work and Citation Text Generation: A Survey
Xiangci Li & Jessica Ouyang
- Institution: University of Texas at Dallas
- Conference: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)
- Publication Date: 2024/11
- TLDR: We survey existing work on related work generation, including sentence-level citation text generation.
- Resources: Video
A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation
Xiangci Li, Linfeng Song, Haitao Mi, Lifeng Jin, Jessica Ouyang & Dong Yu
- Conference: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Publication Date: 2024/5
- Institution: Tencent America
- TLDR: We build a multi-source Wizard of Wikipedia (Ms.WoW) dataset as a test bed for multi-source open-domain dialogue generation and propose a challenge called dialogue knowledge plug-and-play to test a trained model's adaptability to newly available knowledge sources.
- Resources: Video, Repository
Contextualizing Generated Citation Texts
Biswadip Mandal, Xiangci Li & Jessica Ouyang
- Conference: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Publication Date: 2024/5
- Institution: University of Texas at Dallas
- TLDR: We show the benefit of generating citation contexts along with the target citation texts.
Minimal Evidence Group Identification for Claim Verification
Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang & Fan Zhang
- Institution: Google
- Year: 2024
- TLDR: We propose a novel task called minimal evidence group identification to address claim verification when there are multiple plausible sets of fully or partially supporting evidence.
Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
Xiangci Li, Zhiyu Chen, Jason Ingyu Choi, Nikhita Vedula, Besnik Fetahu, Oleg Rokhlenko & Shervin Malmasi
- Institution: Amazon
- Year: 2023
- TLDR: We propose a shopping dialogue generation approach using decision trees and large language models, which greatly improves downstream conversational product search performance.
Explaining Relationships Among Research Papers
Xiangci Li & Jessica Ouyang
- arXiv: 2402.13426
- Institution: University of Texas at Dallas
- TLDR: We explore literature review generation with large language models.
Cited Text Spans for Scientific Citation Text Generation
Xiangci Li, Yi-Hui Lee & Jessica Ouyang
- Conference: The Fourth Workshop on Scholarly Document Processing (SDP 2024)
- Publication Date: 2024/8
- Institution: University of Texas at Dallas
- TLDR: We show that distantly-retrieved cited text spans greatly improve citation text generation.
- Resources: Video
CORWA: A Citation Oriented Related Work Annotation Dataset
Xiangci Li, Biswadip Mandal, Jessica Ouyang
- Conference: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
- Publication Date: 2022/7/10
- Institution: University of Texas at Dallas
- TLDR: We collect a linguistically motivated, citation span-based dataset for related work generation. We develop a strong baseline model to automatically annotate unlabeled related work sections, and we propose a new task, citation span generation. Finally, we provide a big-picture view of a future related work generation system.
- Resources: Video, Repository
Automatic Related Work Generation: A Meta Study
Xiangci Li, Jessica Ouyang
- arXiv: 2201.01880
- Upload Date: 2022/1
- Institution: University of Texas at Dallas
- TLDR: We survey prior work on the related work generation task, along with selected prior work on other relevant tasks. We point out the limitations of existing work and suggest new directions to explore.
CASPR: A Commonsense Reasoning-based Conversational Socialbot
Kinjal Basu, Huaduo Wang, Nancy Dominguez, Xiangci Li, Fang Li, Sarat Chandra Varanasi, Gopal Gupta
- Venue: Alexa Prize Socialbot Grand Challenge 4 Proceedings
- Publication Date: 2021/7
- Institution: University of Texas at Dallas
- TLDR: We report on the design and development of the CASPR system, a socialbot designed to compete in the Amazon Alexa Socialbot Challenge 4.
Scientific Discourse Tagging for Evidence Extraction
Xiangci Li, Gully Burns, Nanyun Peng
- Conference: The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)
- Publication Date: 2021/4/19
- Institution: University of Southern California, Information Sciences Institute
- TLDR: We develop a state-of-the-art model for scientific discourse tagging and demonstrate its strong performance and transferability on a few datasets. We then demonstrate the benefit of leveraging scientific discourse tags on downstream tasks, using claim extraction and evidence fragment detection as two showcases.
- Resources: Video, Code, Poster
A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification
Xiangci Li, Gully Burns, Nanyun Peng
- Conference: The AAAI-21 Workshop on Scientific Document Understanding
- Publication Date: 2021/2/9
- Institution: University of Southern California, Information Sciences Institute
- TLDR: We propose a novel paragraph-level, multi-task learning model for the scientific claim verification task (SciFact) that directly computes a sequence of contextualized sentence embeddings from a BERT model and is jointly trained on rationale selection and stance prediction.
- Resources: Video, Code
Context-aware Stand-alone Neural Spelling Correction
Xiangci Li, Hairong Liu, Liang Huang
- Conference: Findings of the Association for Computational Linguistics: EMNLP 2020
- Publication Date: 2020/11/16
- Institution: Baidu USA
- TLDR: We present a simple yet powerful solution that jointly detects and corrects misspellings as a sequence labeling task by fine-tuning a pre-trained language model.
- Resources: Code
Building deep learning models for evidence classification from the open access biomedical literature
Gully Burns, Xiangci Li, Nanyun Peng
- Journal: Database
- Publication Year: 2019
- Institution: University of Southern California, Information Sciences Institute
- TLDR: We apply deep learning to experiment type classification of biomedical experimental descriptions.
- Resources: Code
Neural and computational mechanisms for task switching
Xiangci Li*, Chunyu Duan*, Ce Ma, Carlos Brody, Zheng Zhang, Jeffrey Erlich
- Institution: New York University Shanghai
- Conferences: Society for Neuroscience 2017, Computational and Systems Neuroscience (Cosyne) 2021 (Oral, 4.6% acceptance rate)
- TLDR: We trained rats and recurrent neural networks to perform a task-switching paradigm using similar procedures. Our results elucidate how ongoing activity, shaped by the recent experience of animals and artificial systems, can influence new tasks at hand.
Resume
Education
University of Texas at Dallas
2020.8 - 2024.11
Doctor of Philosophy
Computer Science
- Advisor: Professor Jessica Ouyang
- Research direction: knowledge-intensive natural language generation
- GPA: 3.96
University of Southern California
2018.5 - 2019.12
Master of Science
Computer Science
- Courses: Natural Language Processing, Machine Learning, Computer Vision, Information Integration, Algorithms, Web Technology, Database Systems
- Advisors: Professor Nanyun Peng & Dr. Gully Burns
- Research direction: scientific information extraction
New York University Shanghai
2013.8 - 2017.5
Bachelor of Science
Computer Science & Neuroscience
- Honor: Cum Laude
- GPA: 3.72
Professional Experience
Applied Scientist II
2024.5 - Present
Amazon Web Services, Inc.
Research Intern & Student Researcher
2023.9 - 2024.1 (Research Intern), 2024.1 - 2024.5 (Student Researcher)
Google LLC
- Minimal Evidence Group Identification for Fact Verification
- Supervised by Dr. Fan Zhang & Rajvi Kapadia.
- Collaboration with Sihao Chen.
Applied Scientist Intern
2023.5 - 2023.8
Amazon.com Services LLC
- Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching
- Supervised by Dr. Zhiyu Chen, Jason Ingyu Choi, Dr. Nikhita Vedula, Dr. Besnik Fetahu, Dr. Oleg Rokhlenko & Dr. Shervin Malmasi
Research Intern, Natural Language & Speech Processing
2022.5 - 2022.8
Tencent America, Tencent AI Lab
- Multi-source knowledge-based open domain dialogue generation
- Supervised by Dr. Linfeng Song, Dr. Lifeng Jin & Dr. Haitao Mi
Graduate Research Assistant
2020.8 - 2023.5
University of Texas at Dallas
- Advised by Professor Jessica Ouyang
- Scholarly document processing: automatic related work section generation in scientific papers
Graduate Research Assistant
2020.12 - 2021.5
University of Texas at Dallas
- Advised by Professor Gopal Gupta
- Member of UT Dallas CASPR team for Alexa Prize Competition Grand Challenge 4
Teaching Assistant
2020.8 - 2020.12
University of Texas at Dallas
- Undergraduate-level Machine Learning
- Graduate-level Semantic Web
Research Scientist Intern
2020.1 - 2020.5
Baidu USA
- Supervised by Professor Liang Huang & Dr. Hairong Liu
- Full-time internship on stand-alone neural spelling correction
- Paper accepted by Findings of EMNLP 2020
Graduate Student Worker & Research Assistant
2018.5 - 2020.8
University of Southern California, Information Sciences Institute
- Co-mentored by Professor Nanyun Peng and Dr. Gully Burns
- Worked on a series of projects leading to publications under evidX, which uses natural language processing techniques to extract scientific knowledge from biomedical literature, with an emphasis on evidence extraction
- Biomedical experimental type classification
- Scientific discourse tagging
- Evidence fragment delineation
- Automatic scientific claim-verification
- Worked on collecting a challenging commonsense reasoning dataset for natural language processing models
Visiting Researcher
2019.5 - 2019.8
Chan Zuckerberg Initiative
- Supervised by Dr. Gully Burns
- Worked under the Meta team, which develops a system to recommend biomedical papers to researchers
- Developed testing pipeline for a content-based recommendation system using clustering techniques
Research Assistant
2016.5 - 2018.4
Erlich lab, NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai
- Co-mentored by Professor Jeffrey Erlich and Professor Zheng Zhang
- Started as a student research assistant and switched to a full-time research staff position after graduation
- Worked on the Virtual Rat project, building a recurrent neural network to model the animal task-switch-cost phenomenon and analyzing the dynamics of the model
- Analyzed rats' behavioral data and found evidence supporting the computational model
- Submitting a first-author paper for the project: Neurophysiological and computational evidence for the task-set inertia theory of switch cost
- Trained mice and rats on behavioral tasks on a regular basis
- Contributed code for managing lab database using Python and MySQL
iGEM competition
2014.12 - 2015.9
NYU Shanghai iGEM team
- International Genetically Engineered Machine (iGEM) is a synthetic biology competition held by MIT.
- Developed project SYNTHESIZED (Bacteria Music Generator)
- Played a key role in the team, including designing and developing the main product and leading team outreach
- Attended the iGEM Giant Jamboree in Boston, MA, and won a silver medal
Skills
Computer Science Theories
Machine Learning, Deep Learning, Natural Language Processing, Large Language Models, Knowledge Graphs, Computer Vision, Theory of Computation
NLP, ML & CV Frameworks
OpenAI, TensorFlow, Keras, PyTorch, PaddlePaddle, scikit-learn, MinPy, NLTK & OpenCV
Programming
Python, Java, C, MATLAB, MySQL, C++, Web (HTML, CSS, JavaScript, Angular 2+, php, Node.js), iOS (Swift), Verilog
Languages
- Mandarin Chinese (native)
- Japanese (native, Japanese-Language Proficiency Test N1 certificate)
- English (fully educated in English since college)