Journal of Information Resources Management (信息资源管理学报), 2024, Vol. 14, Issue (5): 45-58. doi: 10.13365/j.jirm.2024.05.045

• Special Topic: Intelligent Information Processing of Ancient Books with Large Language Models •

Research on Large Language Model Evaluation for Generation Tasks in Classical Chinese Natural Language Processing

朱丹浩1 赵志枭2 张一平1 孙光耀2 刘畅2 胡蝶2 王东波2   

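  1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031;
    2. School of Information Management, Nanjing Agricultural University, Nanjing 210095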
  • Publication date: 2024-09-26  Release date: 2024-10-15
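  • About the authors: Zhu Danhao, associate professor, Ph.D., research interests: natural language processing and machine learning; Zhao Zhixiao, master's student, research interests: digital humanities and intelligent information processing; Zhang Yiping, undergraduate, research interests: natural language processing and knowledge graphs; Sun Guangyao, master's student, research interests: natural language processing and text mining; Liu Chang, Ph.D. candidate, research interests: digital humanities and natural language processing; Hu Die, master's student, research interests: digital humanities, natural language processing, and text mining; Wang Dongbo (corresponding author), professor, doctoral supervisor, research interests: digital humanities and intelligent information processing. Email: db.wang@njau.edu.cn.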
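  • Funding:
    This paper is one of the outcomes of the Major Project of the National Social Science Fund of China "Research on the Construction and Application of a Cross-Language Knowledge Base for Ancient Chinese Books" (21&ZD331) and the Jiangsu Province Undergraduate Practice and Innovation Training Program project "Research on a Vertical Search Engine for Public Security Intranet Document Resources" (202210329046Y).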

Research on Large Language Model Evaluation for the Generation Task of Natural Language Processing in Classical Chinese

Zhu Danhao1 Zhao Zhixiao2 Zhang Yiping1 Sun Guangyao2 Liu Chang2 Hu Die2 Wang Dongbo2

  1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031;
    2. School of Information Management, Nanjing Agricultural University, Nanjing 210095
  • Online: 2024-09-26  Published: 2024-10-15
  • About the authors: Zhu Danhao, associate professor, Ph.D., research interests: natural language processing and machine learning; Zhao Zhixiao, master's student, research interests: digital humanities and intelligent information processing; Zhang Yiping, undergraduate, research interests: natural language processing and knowledge graphs; Sun Guangyao, master's student, research interests: natural language processing and text mining; Liu Chang, Ph.D. candidate, research interests: digital humanities and natural language processing; Hu Die, master's student, research interests: digital humanities, natural language processing, and text mining; Wang Dongbo (corresponding author), professor, Ph.D., doctoral supervisor, research interests: digital humanities and intelligent information processing. Email: db.wang@njau.edu.cn.
  • Supported by:
    This work is supported by the Major Project of the National Social Science Fund of China "Research on the Construction and Application of a Cross-Language Knowledge Base for Ancient Chinese Books" (21&ZD331) and the Jiangsu Province Undergraduate Practice and Innovation Training Program project "Research on a Vertical Search Engine for Public Security Intranet Document Resources" (202210329046Y).

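Abstract: The frequent release of large language models (LLMs) brings both opportunities and challenges to LLM evaluation research. Evaluation systems for general-domain LLMs are increasingly mature, whereas evaluation of LLMs in vertical domains is still in its infancy. Taking the classical Chinese domain as an entry point, this paper constructs a set of evaluation tasks for the ancient-book domain along two dimensions, language and knowledge, and evaluates 13 general-domain LLMs that perform well on the major current leaderboards. The results show that ERNIE-Bot leads all other models by a wide margin in ancient-book domain knowledge, GPT-4 performs best in language ability, and among open-source models the ChatGLM series performs best. By constructing these evaluation tasks and datasets, the study establishes a set of LLM evaluation criteria suited to the ancient-book domain, which provides a reference for evaluating LLM performance in this domain and a basis for choosing base models when training ancient-book LLMs in the future.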

Key words: Large language model, Generative tasks, Large model evaluation, Ancient books, Domain knowledge

Abstract: The frequent release of large language models (LLMs) presents both opportunities and challenges for their evaluation. While evaluation systems for general-domain LLMs are becoming increasingly mature, evaluation in vertical domains remains at an early stage. Taking classical Chinese as an entry point, this study constructs a set of evaluation tasks for the ancient-book domain along two dimensions, language and knowledge, and evaluates thirteen general-domain LLMs that perform well on major leaderboards. The results show that ERNIE-Bot leads the other models by a wide margin in ancient-book domain knowledge, while GPT-4 demonstrates the strongest language capabilities. Among open-source models, the ChatGLM series performs best. By developing tailored evaluation tasks and datasets, this study establishes a set of evaluation standards for LLMs in the ancient-book domain, offering a reference for future performance evaluations and a basis for selecting base models when training domain-specific LLMs for ancient books.

Key words: Large language model, Generative tasks, Large model evaluation, Ancient books, Domain knowledge
