Xiangci Li

I am a final-year Ph.D. student at UT Dallas, actively looking for full-time positions as a natural language processing scientist or engineer. My research interest is natural language processing, specifically knowledge-intensive NLP, including scholarly document processing, dialogue generation, and fact verification. My advisor is Dr. Jessica Ouyang.

李向磁

NLP Researcher, Ph.D. Student

  • University: UT Dallas
  • Major: Computer Science
  • Location: Dallas, TX, USA
  • Expected Degree: Ph.D.
  • Role: Student Researcher at Google
  • Email: lixiangci8 AT gmail DOT com

I have had five internships, at Google, Amazon, Tencent, Baidu, and the Chan Zuckerberg Initiative. Previously, I was a master's student at the University of Southern California (USC) and a research assistant in Prof. Nanyun Peng's PLUS Lab at the Information Sciences Institute (now moved to UCLA). Before that, I was a full-time computational neuroscience research assistant at the Erlich Lab at New York University Shanghai, where I graduated with a Bachelor's degree in computer science and neuroscience.

Publications

A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

Xiangci Li, Linfeng Song, Haitao Mi, Lifeng Jin, Jessica Ouyang & Dong Yu

  • Conference: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
  • Publication Date: 2024/5
  • Institution: Tencent America
  • TLDR: We build a multi-source Wizard of Wikipedia (Ms.WoW) dataset as a test bed for multi-source open-domain dialogue generation and propose a challenge called dialogue knowledge plug-and-play to test a trained model's adaptability to newly available knowledge sources.

Contextualizing Generated Citation Texts

Biswadip Mandal, Xiangci Li & Jessica Ouyang

  • Conference: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
  • Publication Date: 2024/5
  • Institution: University of Texas at Dallas
  • TLDR: We show the benefit of generating citation contexts along with target citation texts.

Minimal Evidence Group Identification for Fact Verification

Xiangci Li, Sihao Chen, Rajvi Kapadia & Fan Zhang

  • Institution: Google
  • Year: 2024
  • TLDR: We propose a novel task called minimal evidence group identification to address fact-verification with multiple plausible sets of fully or partially supporting evidence.

Wizard of Shopping: Target-Oriented E-commerce Dialogue Generation with Decision Tree Branching

Xiangci Li, Zhiyu Chen, Jason Ingyu Choi, Nikhita Vedura, Besnik Fetahu, Oleg Rokhlenko & Shervin Malmasi

  • Institution: Amazon
  • Year: 2023
  • TLDR: We propose a shopping dialogue generation approach using decision trees and large language models, which greatly improves downstream conversational product search performance.
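A rough sketch of the decision-tree branching idea (not the paper's actual implementation): a tree over product attributes can be walked to emit one clarifying-question turn per branching decision. The tree shape, attribute names, and wording below are all invented for illustration.

```python
# Toy sketch: turning a product-attribute decision tree into dialogue turns.
# The tree, attributes, and question wording are invented for illustration.

def tree_to_dialogue(tree, choices):
    """Walk a nested-dict decision tree, emitting one (question, answer)
    turn per branching decision until a leaf (a product) is reached."""
    turns = []
    node = tree
    while isinstance(node, dict):
        attribute = node["ask"]
        answer = choices[attribute]
        turns.append((f"What {attribute} would you like?", answer))
        node = node["branches"][answer]
    turns.append(("Here is a match:", node))
    return turns

toy_tree = {
    "ask": "category",
    "branches": {
        "shoes": {
            "ask": "color",
            "branches": {"red": "Red Runner X", "blue": "Blue Walker Y"},
        },
        "hats": "Plain Cap Z",
    },
}

dialogue = tree_to_dialogue(toy_tree, {"category": "shoes", "color": "red"})
```

Each root-to-leaf path yields one synthetic dialogue, so a single tree can branch into many training conversations.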

Explaining Relationships Among Research Papers

Xiangci Li & Jessica Ouyang

  • arXiv: 2402.13426
  • Institution: University of Texas at Dallas
  • TLDR: We explore literature review generation with large language models.

Cited Text Spans for Citation Text Generation

Xiangci Li, Yi-Hui Lee & Jessica Ouyang

  • arXiv: 2309.06365
  • Institution: University of Texas at Dallas
  • TLDR: We show that distantly retrieved cited text spans greatly improve citation text generation.

CORWA: A Citation Oriented Related Work Annotation Dataset

Xiangci Li, Biswadip Mandal, Jessica Ouyang

  • Conference: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2022)
  • Publication Date: 2022/7/10
  • Institution: University of Texas at Dallas
  • TLDR: We collect a linguistically motivated, citation-span-based dataset for related work generation. We develop a strong baseline model to automatically annotate unlabeled related work sections. We propose a new task, citation span generation. Finally, we sketch a big-picture vision for future related work generation systems.
  • Resources: Video, Repository

Automatic Related Work Generation: A Meta Study

Xiangci Li, Jessica Ouyang

  • arXiv: 2201.01880
  • Upload Date: 2022/1
  • Institution: University of Texas at Dallas
  • TLDR: We survey prior work on the related work generation task, along with selected prior work on other relevant tasks. We point out the limitations of existing work and suggest new directions to explore.

CASPR: A Commonsense Reasoning-based Conversational Socialbot

Kinjal Basu, Huaduo Wang, Nancy Dominguez, Xiangci Li, Fang Li, Sarat Chandra Varanasi, Gopal Gupta

  • Venue: Alexa Prize Socialbot Grand Challenge 4 Proceedings
  • Publication Date: 2021/7
  • Institution: University of Texas at Dallas
  • TLDR: We report on the design and development of the CASPR system, a socialbot designed to compete in the Amazon Alexa Socialbot Challenge 4.

Scientific Discourse Tagging for Evidence Extraction

Xiangci Li, Gully Burns, Nanyun Peng

  • Conference: The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)
  • Publication Date: 2021/4/19
  • Institution: University of Southern California, Information Sciences Institute
  • TLDR: We develop a state-of-the-art model for scientific discourse tagging and demonstrate its strong performance and transferability across several datasets. We then demonstrate the benefit of leveraging scientific discourse tags on downstream tasks, using claim extraction and evidence fragment detection as two showcases.
  • Resources: Video, Code, Poster

A Paragraph-level Multi-task Learning Model for Scientific Fact-Verification

Xiangci Li, Gully Burns, Nanyun Peng

  • Conference: The AAAI-21 Workshop on Scientific Document Understanding
  • Publication Date: 2021/2/9
  • Institution: University of Southern California, Information Sciences Institute
  • TLDR: We propose a novel, paragraph-level, multi-task learning model for a scientific claim verification task (SciFact), directly computing a sequence of contextualized sentence embeddings from a BERT model and jointly training the model on rationale selection and stance prediction.
  • Resources: Video, Code
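A minimal sketch of the joint-training objective (not the paper's code; the shapes, toy logits, and the 1.0 default weight are assumptions for illustration): per-sentence rationale-selection losses and a paragraph-level stance loss are combined into one training signal.

```python
import math

def softmax_nll(logits, gold):
    """Negative log-likelihood of the gold class under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold]

def joint_loss(rationale_logits, rationale_gold, stance_logits, stance_gold,
               rationale_weight=1.0):
    """Multi-task loss: average per-sentence rationale-selection loss
    (is this sentence a rationale?) plus paragraph-level stance loss,
    combined with a tunable weight (1.0 here is an assumed default)."""
    rationale_loss = sum(
        softmax_nll(l, g) for l, g in zip(rationale_logits, rationale_gold)
    ) / len(rationale_logits)
    stance_loss = softmax_nll(stance_logits, stance_gold)
    return rationale_weight * rationale_loss + stance_loss

# Toy example: two sentences' binary rationale logits, one 3-way stance.
loss = joint_loss([[2.0, -1.0], [0.0, 3.0]], [0, 1], [1.0, 0.5, -2.0], 0)
```

In the actual model, both heads would read contextualized sentence embeddings from a shared BERT encoder, so the summed loss trains one set of parameters for both tasks.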

Context-aware Stand-alone Neural Spelling Correction

Xiangci Li, Hairong Liu, Liang Huang

  • Conference: Findings of the Association for Computational Linguistics: EMNLP 2020
  • Publication Date: 2020/11/16
  • Institution: Baidu USA
  • TLDR: We present a simple yet powerful solution that jointly detects and corrects misspellings as a sequence labeling task by fine-tuning a pre-trained language model.
  • Resources: Code
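The sequence-labeling framing can be sketched as follows (a toy hand-made confusion dictionary stands in for the fine-tuned language model the paper actually uses): each input token gets one label, either KEEP or its corrected form, so detection and correction happen in a single tagging pass.

```python
# Toy sketch of spelling correction as sequence labeling: one label per
# input token, either "KEEP" or the corrected word. A real system would
# predict these labels with a fine-tuned language model; here a small
# hand-made confusion dictionary stands in for the model.

CORRECTIONS = {"teh": "the", "recieve": "receive", "adress": "address"}

def label_tokens(tokens):
    """Tag each token: 'KEEP' if it looks fine, else its correction."""
    return [CORRECTIONS.get(t.lower(), "KEEP") for t in tokens]

def apply_labels(tokens, labels):
    """Decode the tag sequence back into corrected text."""
    return [t if lab == "KEEP" else lab for t, lab in zip(tokens, labels)]

tokens = "I will recieve teh package".split()
labels = label_tokens(tokens)
corrected = apply_labels(tokens, labels)
# corrected == ["I", "will", "receive", "the", "package"]
```

Framing the problem this way keeps input and output aligned token by token, which is what makes a standard sequence-labeling head on a pre-trained encoder applicable.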

Building deep learning models for evidence classification from the open access biomedical literature

Gully Burns, Xiangci Li, Nanyun Peng

  • Journal: Database
  • Publication Year: 2019
  • Institution: University of Southern California, Information Sciences Institute
  • TLDR: We apply deep learning to experiment-type classification of biomedical experimental descriptions.
  • Resources: Code

Neural and computational mechanisms for task switching

Xiangci Li*, Chunyu Duan*, Ce Ma, Carlos Brody, Zheng Zhang, Jeffrey Erlich

  • Institution: New York University Shanghai
  • Conferences: Society for Neuroscience 2017, Computational and Systems Neuroscience (Cosyne) 2021 (Oral, 4.6% acceptance rate)
  • TLDR: We trained rats and recurrent neural networks to perform a task-switching paradigm using similar procedures. Our results elucidate how ongoing activity, shaped by the recent experience of animals and artificial systems, can influence new tasks at hand.

Resume

Education

University of Texas at Dallas

2020.8 - Present

Doctor of Philosophy

Computer Science

  • Advisor: Professor Jessica Ouyang
  • Research direction: knowledge-intensive natural language processing
  • GPA: 3.96

University of Southern California

2018.5 - 2019.12

Master of Science

Computer Science

  • Courses: Natural Language Processing, Machine Learning, Computer Vision, Information Integration, Algorithms, Web Technology, Database Systems
  • Advisors: Professor Nanyun Peng & Dr. Gully Burns
  • Research direction: scientific information extraction

New York University Shanghai

2013.8 - 2017.5

Bachelor of Science

Computer Science & Neuroscience

  • Honor: Cum Laude
  • GPA: 3.72

Professional Experience

Research Intern & Student Researcher

2023.9 - Present

Google LLC

  • Minimal Evidence Group Identification for Fact Verification
  • Supervised by Dr. Fan Zhang and Rajvi Kapadia.
  • Collaboration with Sihao Chen.

Applied Scientist Intern

2023.5 - 2023.8

Amazon.com Services LLC

Research Intern, Natural Language & Speech Processing

2022.5 - 2022.8

Tencent America, Tencent AI Lab

Graduate Research Assistant

2020.8 - 2023.5

University of Texas at Dallas

  • Advised by Professor Jessica Ouyang
  • Scholarly document processing: automatic related work section generation in scientific papers

Graduate Research Assistant

2020.12 - 2021.5

University of Texas at Dallas

  • Advised by Professor Gopal Gupta
  • Member of UT Dallas CASPR team for Alexa Prize Competition Grand Challenge 4

Teaching Assistant

2020.8 - 2020.12

University of Texas at Dallas

  • Undergraduate-level Machine Learning
  • Graduate-level Semantic Web

Research Scientist Intern

2020.1 - 2020.5

Baidu USA

  • Supervised by Professor Liang Huang & Dr. Hairong Liu
  • Full-time internship on stand-alone neural spelling correction
  • Paper accepted by Findings of EMNLP 2020

Graduate Student Worker & Research Assistant

2018.5 - 2020.8

University of Southern California, Information Sciences Institute

  • Co-mentored by Professor Nanyun Peng and Dr. Gully Burns
  • Worked on a series of projects leading to publications under evidX, which uses natural language processing techniques to extract scientific knowledge from biomedical literature, with an emphasis on evidence extraction
    • Biomedical experimental type classification
    • Scientific discourse tagging
    • Evidence fragment delineation
    • Automatic scientific claim-verification
  • Worked on collecting a challenging commonsense reasoning dataset for natural language processing models

Visiting Researcher

2019.5 - 2019.8

Chan Zuckerberg Initiative

  • Supervised by Dr. Gully Burns
  • Worked on the Meta team, which develops a system to recommend biomedical papers to researchers
  • Developed testing pipeline for a content-based recommendation system using clustering techniques

Research Assistant

2016.5 - 2018.4

Erlich lab, NYU-ECNU Institute of Brain and Cognitive Science at NYU Shanghai

  • Co-mentored by Professor Jeffrey Erlich and Professor Zheng Zhang
  • Started as a student research assistant and switched to full-time research staff after graduation
  • Worked on the Virtual Rat project, building a recurrent neural network to model the animal's task-switch cost phenomenon and analyzing the dynamics of the model
  • Analyzed rats' behavioral data and found evidence supporting the computational model
  • Submitting a first-author paper for the project: Neurophysiological and computational evidence for the task-set inertia theory of switch cost
  • Trained mice and rats with behavior task on a regular basis
  • Contributed code for managing lab database using Python and MySQL

iGEM competition

2014.12 - 2015.9

NYU Shanghai iGEM team

  • International Genetically Engineered Machine (iGEM) is a synthetic biology competition held by MIT.
  • Developed project SYNTHESIZED (Bacteria Music Generator)
  • Played a key role in the team, including designing and developing the main product and team outreach
  • Attended the iGEM Giant Jamboree in Boston, MA, and won a silver medal

Skills

Computer Science Theories

Machine Learning, Deep Learning, Natural Language Processing, Large Language Models, Knowledge Graphs, Computer Vision, Theory of Computation

NLP, ML & CV Frameworks

OpenAI API, TensorFlow, Keras, PyTorch, PaddlePaddle, scikit-learn, MinPy, NLTK & OpenCV

Programming

Python, Java, C, MATLAB, MySQL, C++, Web (HTML, CSS, JavaScript, Angular 2+, PHP, Node.js), iOS (Swift), Verilog

Neuroscience

  • Electro-physiology data analysis
  • Rodent behavior training

Languages

  • Mandarin Chinese (native)
  • Japanese (native, Japanese-Language Proficiency Test N1 certificate)
  • English (English-medium education since college)

Misc.

  • Piano
  • Swimming
  • Stock trading
  • Driving
  • International politics