Journal of Information Resources Management ›› 2026, Vol. 16 ›› Issue (2): 111-124. doi: 10.13365/j.jirm.2026.02.111

Research on the Early Identification Method of Breakthrough Achievements from the Perspective of Knowledge Flow

Ye Qing1,2, Wang Yezhu3, Xie Yundong4, Zhang Peng1

  1. Business School, Fuyang Normal University, Fuyang, 236041;
  2. School of Management, University of Science and Technology of China, Hefei, 230026;
  3. Library, University of Science and Technology of China, Hefei, 230026;
  4. School of Public Policy & Management, University of Science and Technology of China, Hefei, 230026
  • Online: 2026-03-26 Published: 2026-06-04
  • About author: Ye Qing, PhD, associate professor, with research interests in science and technology innovation management and evaluation; Wang Yezhu, PhD, librarian, with research interests in informatics; Xie Yundong, PhD, associate researcher, with research interests in intelligent decision-making based on science and technology big data; Zhang Peng (corresponding author), PhD, lecturer, with research interests in science of science, Email: zpbeidou@fynu.edu.cn.
  • Supported by:
    This work is supported by the Youth Project funded by the National Natural Science Foundation of China "Research on Early Feature Extraction and Identification Model Construction of Disruptive Scientific Achievements Based on Multi-source Data Fusion" (72404261), the Scientific Research Project of the Education Department of Anhui Province, China "Research on the Associated Network and Prediction Model of Academic Misconduct from a Multi-Source Fusion Perspective" (2025AHGXSK40135), and the New Liberal Arts Fund of USTC "Dynamic Prediction and Explainability of Disruptive Scientific Discoveries Based on Multimodal Temporal Semantic Enhancement" (FSSF-A-260107).

Abstract: Breakthrough research drives major advances in science and technology and expands the boundaries of human knowledge. The early identification of such achievements is essential for forward-looking basic research planning, the efficient allocation of scientific resources, and the formulation of national innovation strategies. However, existing studies on the identification of breakthrough achievements often focus on a single dimension rather than an integrated theoretical framework, and few adequately explore the early formation mechanisms of breakthrough achievements. To address these limitations, this study proposes a theoretical framework grounded in knowledge flow theory, encompassing the processes of knowledge input, knowledge production, and knowledge output. Based on this framework, a multi-dimensional early identification index system comprising 15 indicators is constructed by integrating features across the three stages. Multiple machine learning algorithms are then employed to build identification models, from which the most effective model is selected. Finally, SHAP is applied to interpret the model and to quantify the relative importance and contributions of different features in the identification process. The results indicate that: ① the CatBoost model demonstrates the best performance in early identification; ② three early signals are particularly influential: citations within five years and the disruption index within five years, both in the knowledge output stage, and the author's highest academic achievement in the knowledge production stage; ③ the model exhibits strong generalization capability when validated on APS milestone papers. Overall, this study proposes a novel paradigm that integrates predictive accuracy with interpretability for the early identification of breakthrough research, and provides evidence for understanding its early formation mechanisms.
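
One of the output-stage signals named in the abstract, the disruption index within five years, can be illustrated with a small sketch. The abstract does not state which formulation the authors use; the code below assumes the commonly used definition D = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts later papers citing the focal paper but none of its references, n_j counts those citing both, and n_k counts those citing only the references. The function name, the data layout, the inclusive five-year window, and the toy citation records are all hypothetical and serve only as an illustration.

```python
from typing import Dict, Set, Tuple


def disruption_index(
    focal_id: str,
    focal_year: int,
    focal_refs: Set[str],
    later_papers: Dict[str, Tuple[int, Set[str]]],
    window: int = 5,
) -> float:
    """Disruption index of a focal paper over a fixed citation window.

    later_papers maps a paper id to (publication year, set of cited ids).
    Assumption: a paper is inside the window when it was published between
    focal_year and focal_year + window, inclusive.
    """
    n_i = n_j = n_k = 0
    for paper_id, (year, cited) in later_papers.items():
        if paper_id == focal_id or not (0 <= year - focal_year <= window):
            continue  # skip the focal paper itself and out-of-window papers
        cites_focal = focal_id in cited
        cites_refs = bool(cited & focal_refs)
        if cites_focal and not cites_refs:
            n_i += 1  # builds on the focal paper only
        elif cites_focal and cites_refs:
            n_j += 1  # builds on the focal paper and its references
        elif cites_refs:
            n_k += 1  # bypasses the focal paper
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else 0.0


# Toy data (invented): focal paper "F" from 2000 citing "R1" and "R2".
papers = {
    "A": (2001, {"F"}),
    "B": (2002, {"F", "R1"}),
    "C": (2003, {"R2"}),
    "D": (2004, {"F"}),
    "E": (2010, {"F"}),  # outside the five-year window, ignored
}
score = disruption_index("F", 2000, {"R1", "R2"}, papers)
print(score)
```

Values near 1 suggest that later work cites the focal paper without its predecessors, while values near -1 suggest it consolidates existing work. In a pipeline like the one the abstract describes, such a score would be one feature among the 15 indicators rather than a classifier on its own.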

Key words: Breakthrough achievements, Knowledge flow theory, Early identification, Machine learning, SHAP
