Service and Contributions
Academic Service
Chairing
- Senior Area Chair: EMNLP 2026
- Area Chair: NeurIPS 2026, ACL Rolling Review 2025–2026, NLPCC 2026
Conference and Workshop Organization
- Program Chair: PersonaLLM Workshop at NeurIPS 2025
- Publication Chair: Australasian Language Technology Association (ALTA) Workshop 2026
- Shared Task Organizer: ALTA 2024 Shared Task, Detecting AI-Generated Sentences in Human–AI Hybrid Articles
Local Service
- Seminar Organizer: School of Computing Technologies, RMIT University
Reviewing
- ACL, EMNLP, NAACL, EACL, ICLR, NeurIPS, AAAI, IJCAI, and related NLP/AI venues
Software and Open Source Contributions
FactualSceneGraph
Toolkit for faithful and consistent textual scene-graph parsing, connecting single-sentence FACTUAL parsing with discourse-level DiscoSG refinement.
Repository · FACTUAL paper, Findings of ACL 2023 · DiscoSG paper, EMNLP 2025 Outstanding Paper Award
StarCoder 2 / The Stack v2
Contributor to the BigCode open-science code LLM and corpus initiative. The Stack v2 spans 619 programming languages and underpins the StarCoder2 models, which were trained on 3.3–4.3T tokens, with model weights released for responsible open research and development.
Repository · Paper
SCAR
ACL 2025 data selection method and toolkit for efficient instruction tuning of LLMs. SCAR ranks instruction-response pairs by style consistency; in reported benchmarks, selecting as little as 0.7% of the full dataset can match or surpass full-data fine-tuning.
Repository · Paper
Talks, Panels, and Workshops
PersonaLLM Workshop at NeurIPS 2025
Program Chair. Details
Cultural Alignment & Low-Resource Languages: Adapting LLMs for Diverse Cultures
Panel, International AI Cooperation and Governance Forum, 2025.
Synthetic or Human Data: Optimizing Data Curation for Alignment of LLMs
Invited virtual talk, Ant Group, 2024.
Make Sense of Textual Data
Workshop, Research Bazaar Victoria, 2024.
Detect Automatic AI-Generated Sentences for Human-AI Hybrid Articles
Shared Task at ALTA 2024, Canberra. Details
