Journal of Information Resources Management ›› 2024, Vol. 14 ›› Issue (5): 45-58. doi: 10.13365/j.jirm.2024.05.045


Research on Large Language Model Evaluation for the Generation Task of Natural Language Processing in Classical Chinese

Zhu Danhao1 Zhao Zhixiao2 Zhang Yiping1 Sun Guangyao2 Liu Chang2 Hu Die2 Wang Dongbo2

  1. Department of Criminal Science and Technology, Jiangsu Police Institute, Nanjing 210031;
  2. School of Information Management, Nanjing Agricultural University, Nanjing 210095
  • Online: 2024-09-26  Published: 2024-10-15
  • About the authors: Zhu Danhao, associate professor, Ph.D., research areas: natural language processing and machine learning; Zhao Zhixiao, master's candidate, research areas: digital humanities and intelligent information organization; Zhang Yiping, undergraduate, research areas: natural language processing and knowledge graphs; Sun Guangyao, master's candidate, research areas: natural language processing and text mining; Liu Chang, Ph.D. candidate, research areas: digital humanities and natural language processing; Hu Die, master's candidate, research areas: digital humanities, natural language processing, and text mining; Wang Dongbo (corresponding author), professor, Ph.D., doctoral supervisor, research interests: digital humanities and intelligent information organization, Email: db.wang@njau.edu.cn.
  • Supported by:
    This work is supported by the Major Project of the National Social Science Fund of China "Research on the Construction and Application of a Cross-Language Knowledge Base for Ancient Chinese Books" (21&ZD331) and the Jiangsu Province College Students' Innovation and Entrepreneurship Training Program project "Research on a Vertical Search Engine for Public Security Intranet Document Resources" (202210329046Y).

Abstract: The rapid development of large language models (LLMs) presents both opportunities and challenges for their evaluation. While evaluation systems for general-domain LLMs are becoming increasingly refined, assessments in specialized fields remain at an early stage. This study evaluates LLMs in the domain of classical Chinese, designing a series of tasks along two key dimensions: language and knowledge. Thirteen leading general-domain LLMs were selected for evaluation on major benchmarks. The results show that ERNIE-Bot excels in domain-specific knowledge, while GPT-4 demonstrates the strongest language capabilities. Among open-source models, the ChatGLM series exhibits the best overall performance. By developing tailored evaluation tasks and datasets, this study provides a set of standards for evaluating LLMs in the classical Chinese domain, offering valuable reference points for future assessments. The findings also provide a foundation for selecting base models in future domain-specific LLM training.

Key words: Large language model, Generative tasks, Large model evaluation, Ancient books, Domain knowledge
