Research on Large Language Model Evaluation for the Generation Task of Natural Language Processing in Classical Chinese

doi:10.13365/j.jirm.2024.05.045

Journal of Information Resources Management, 2024, Vol. 14, Issue (5): 45-58. doi: 10.13365/j.jirm.2024.05.045

• Special Topic: Intelligent Information Processing of Ancient Books under Large Language Models •


Zhu Danhao¹, Zhao Zhixiao², Zhang Yiping¹, Sun Guangyao², Liu Chang², Hu Die², Wang Dongbo²

1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing, 210031;
2. School of Information Management, Nanjing Agricultural University, Nanjing, 210095

Online: 2024-09-26 Published: 2024-10-15
About the authors: Zhu Danhao, associate professor, Ph.D., research areas: natural language processing and machine learning; Zhao Zhixiao, master's candidate, research areas: digital humanities and intelligent information processing; Zhang Yiping, undergraduate, research areas: natural language processing and knowledge graphs; Sun Guangyao, master's candidate, research areas: natural language processing and text mining; Liu Chang, Ph.D. candidate, research areas: digital humanities and natural language processing; Hu Die, master's candidate, research areas: digital humanities, natural language processing, and text mining; Wang Dongbo (corresponding author), professor, doctoral supervisor, research areas: digital humanities and intelligent information processing, Email: db.wang@njau.edu.cn.
Supported by:
This work is one of the research outcomes of the Major Project of the National Social Science Fund of China "Research on the Construction and Application of a Cross-Language Knowledge Base for Ancient Chinese Books" (21&ZD331) and of the Jiangsu Province College Students' Practice, Innovation and Entrepreneurship Training Program project "Research on Vertical Search Engine for Public Security Intranet Document Resources" (202210329046Y).

Abstract

The frequent release of large language models (LLMs) presents both opportunities and challenges for research on their evaluation. While evaluation systems for general-domain LLMs are maturing, the evaluation of LLMs in vertical domains is still at an early stage. Taking the classical Chinese domain as its entry point, this study constructs a set of evaluation tasks for ancient books along two dimensions, language and knowledge. Thirteen general-domain LLMs that perform well on current major leaderboards were selected for evaluation. The results show that ERNIE-Bot is far ahead of the other models in domain knowledge of ancient books, while GPT-4 demonstrates the strongest language capabilities. Among open-source models, the ChatGLM series performs best. By constructing evaluation tasks and datasets, this study establishes a set of standards for evaluating LLMs in the ancient-book domain, which offers a reference for assessing LLM performance in this domain and a basis for selecting base models when training ancient-book LLMs in the future.

Key words: large language model, generative tasks, large model evaluation, ancient books, domain knowledge

CLC number: G206

Zhu Danhao, Zhao Zhixiao, Zhang Yiping, Sun Guangyao, Liu Chang, Hu Die, Wang Dongbo. Research on Large Language Model Evaluation for the Generation Task of Natural Language Processing in Classical Chinese[J]. Journal of Information Resources Management, 2024, 14(5): 45-58.

