Journal of Information Resources Management ›› 2025, Vol. 15 ›› Issue (3): 108-121. doi: 10.13365/j.jirm.2025.03.108


Evaluation of Prompt Fine-Tuning Data Efficacy in Large Language Models: A Focus on Data Quality

Liu Xiaohui, Ran Congjing, Liu Xingshen, Li Wang

  1. School of Information Management, Wuhan University, Wuhan 430072, China
  • Online: 2025-05-26  Published: 2025-06-16
  • About author: Liu Xiaohui, Ph.D. candidate, research interests include information resource management, data elements, and informetrics; Ran Congjing (corresponding author), professor, Ph.D., doctoral supervisor, research interests include intellectual property and big data governance, Email: rancongjing@whu.edu.cn; Liu Xingshen, Ph.D. candidate, research interests include data science, natural language processing, and intellectual property; Li Wang, Ph.D. candidate, research interests include data science and intellectual property.
  • Supported by:
    This paper is one of the research outcomes of the Major Project of the National Social Science Fund of China "Research on the Construction of a Security System for Big Data Sovereignty" (21&ZD169) and the Youth Project of the National Social Science Fund of China "Research on the Intelligent Discrimination and Recommendation of University Patent Quality Based on Knowledge Units" (23CTQ028).

Abstract: Breakthroughs in generative artificial intelligence have given rise to phenomenally influential large language models (LLMs) such as ChatGPT, posing unprecedented challenges to traditional methods of data utility assessment. In response, this study evaluates the utility of instruction-tuning data for LLMs by establishing a multi-dimensional assessment framework that integrates three key dimensions (complexity, usability, and diversity) and proposes a corresponding data utility evaluation function. Experiments on several publicly available instruction-tuning datasets show that the proposed approach measures data quality reasonably and effectively, and that the reasoning loss of LLMs fine-tuned on different datasets is highly consistent with the proposed evaluation metrics. This work is the first to employ reasoning loss directly as a measure of the quality of LLM instruction-tuning data, and it introduces the three dimensions of complexity, usability, and diversity to characterize "high-quality data". By proposing new quantitative metrics, this study offers theoretical and practical guidance for improving the quality of instruction-tuning data for large language models and for related research applications.
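The abstract names three dimensions (complexity, usability, and diversity) that feed a data utility evaluation function, but it does not give the functional form here. As a hedged illustration only, the sketch below aggregates simple per-example proxies for the three dimensions into a single weighted score; every proxy, helper name, and weight in it is a hypothetical assumption of this sketch, not the paper's actual method.

```python
"""Illustrative sketch of a three-dimension data utility score.

The paper's exact formulas are not reproduced in this abstract, so the
proxies below (length-based complexity, lexical-overlap usability,
type-token-ratio diversity) and the equal weights are assumptions.
"""

import math


def complexity_score(instruction: str) -> float:
    # Hypothetical proxy: instruction length in tokens on a log scale.
    return math.log1p(len(instruction.split()))


def usability_score(instruction: str, response: str) -> float:
    # Hypothetical proxy: Jaccard overlap between instruction and response
    # vocabularies; the paper may well use a model-based measure instead.
    inst = set(instruction.lower().split())
    resp = set(response.lower().split())
    if not inst or not resp:
        return 0.0
    return len(inst & resp) / len(inst | resp)


def diversity_score(dataset: list[tuple[str, str]]) -> float:
    # Hypothetical dataset-level proxy: type-token ratio over instructions.
    tokens = [t for inst, _ in dataset for t in inst.lower().split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def utility(dataset: list[tuple[str, str]], w=(1 / 3, 1 / 3, 1 / 3)) -> float:
    # Hypothetical utility function: weighted sum of mean per-example
    # complexity and usability plus dataset-level diversity.
    n = len(dataset)
    c = sum(complexity_score(i) for i, _ in dataset) / n
    u = sum(usability_score(i, r) for i, r in dataset) / n
    d = diversity_score(dataset)
    return w[0] * c + w[1] * u + w[2] * d


if __name__ == "__main__":
    demo = [
        ("Summarize the following paragraph about data governance.",
         "The paragraph argues that data governance requires clear ownership."),
        ("Translate 'data quality' into French.",
         "qualite des donnees"),
    ]
    print(f"utility = {utility(demo):.4f}")
```

In the setting the abstract describes, such a score would then be compared, for each candidate dataset, against the reasoning loss of an LLM fine-tuned on that dataset, for example via rank correlation, to check the consistency the authors report.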

Key words: Data elements, Data utility, Data quality, Large language models, Prompt fine-tuning data
