Service and Contributions
Open Source Contributions
I have contributed to several open-source projects that advance Natural Language Processing (NLP) and Machine Learning (ML):
- Starcoder2 Project
  - Contributed to developing Starcoder2, an open-source coding LLM, as part of the BigCode project led by Hugging Face.
  - Impact: Recognized as one of the leading open-source LLMs for coding in 2024, with complete transparency in its training process and data usage.
  - GitHub Repository
- FACTUAL: Text Scene Graph Parsing
  - Built FACTUAL, a text scene graph parser, in collaboration with Adobe and Wuhan University.
  - Impact: Downloaded over 33,000 times from PyPI and 130,000 times from Hugging Face Models.
  - GitHub Repository
- SCAR: Data Selection Tool for LLM Fine-tuning
  - Developed SCAR, a state-of-the-art tool for data selection that enhances LLM performance while reducing dataset size.
  - Impact: Downloaded over 5,000 times from PyPI.
  - GitHub Repository
Peer Review Activities
I contribute to the academic community by reviewing papers for top-tier conferences and journals:
ACL, EMNLP, ICLR, EACL, NAACL, ACL Rolling Review, AAAI, AJI
Talks and Workshops
I have delivered talks and organized workshops to share insights on NLP and LLM research with academic and industry audiences:
2024
- Detect Automatic AI-Generated Sentences for Human-AI Hybrid Articles
  - Event: Shared Task at ALTA 2024, Canberra
  - Details
- Make Sense of Textual Data
  - Event: ResBaz Workshop, Victoria
  - Details
- Synthetic or Human Data: Optimizing Data Curation for Alignment of Large Language Models
  - Event: Virtual Talk at Ant Group
  - Details
2020
- Context-Dependent Semantic Parsing
  - Event: Conference Oral Presentation at COLING 2020
  - Details
