Journal of Information Resources Management (信息资源管理学报), 2023, Vol. 13, Issue 5: 137-148. doi: 10.13365/j.jirm.2023.05.137

Research Paper

The Identification of the Core Factors of Highly Cited Papers

Xu Linyu

  1. School of Management, Xuzhou Medical University, Xuzhou, 221004
  • Online: 2023-09-26 Published: 2023-10-15
  • About the author: Xu Linyu, PhD, lecturer; research interests: user behavior and scientometrics. Email: 1123223036@qq.com.

Abstract: Highly cited papers carry considerable academic discourse power and reference value. Identifying their core influencing factors is therefore essential if academic papers are to attract citations over time and to build and strengthen a competitive advantage in being highly cited. This paper combines an objective method (extraction from the literature) with a subjective one (a questionnaire survey) to extract, screen, and assemble a set of internal and external factors that influence academic papers. It then uses logistic regression to examine the linear and nonlinear effects of these factors on whether a paper becomes highly cited. Finally, several classical machine learning classification algorithms are used to test the robustness of these results. The results show that reference quality and reference age both have a significant positive linear effect on a paper becoming highly cited, and as the variable values increase, the quadratic term strongly reinforces this linear effect. Journal quality has an approximately linear effect. Author reputation, usage count, and initial citation count also have a significant positive linear effect, but as the variable values increase, the quadratic term gradually weakens it, producing a half-inverted-U trend that first rises and then levels off. Classical classification algorithms such as decision trees, naive Bayes, and random forests all predict highly cited papers well, indicating that the findings are robust.

Key words: Highly cited papers, Core factors, Preferential attachment, Logistic regression, Machine learning classification algorithms
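The abstract distinguishes three patterns (reinforcing, approximately linear, half inverted U) by how the quadratic term in the logistic regression interacts with the linear one. The minimal Python sketch below shows how the signs of the coefficients map onto those patterns. It assumes a model of the form logit P(highly cited) = b0 + b1*x + b2*x^2 for a single factor x rescaled to a non-negative range [0, x_max]. The slope of the log-odds, b1 + 2*b2*x, grows with x when b2 > 0 and shrinks when b2 < 0, reaching zero at the vertex -b1/(2*b2). If that vertex lies at or beyond x_max, the curve rises and levels off within the observed range. The function names, the cut-off `tol` for calling a quadratic term negligible, and the coefficients in the usage example are hypothetical illustrations, not the paper's estimates. In practice one would judge "negligible" by the quadratic term's significance rather than a fixed cut-off.

```python
import math


def predicted_probability(x, b0, b1, b2):
    """P(highly cited | x) for a logit with a linear and a quadratic term in one factor x."""
    z = b0 + b1 * x + b2 * x ** 2
    # Numerically stable logistic function: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def marginal_log_odds(x, b1, b2):
    """Slope of the log-odds in x: d/dx (b1*x + b2*x**2) = b1 + 2*b2*x."""
    return b1 + 2 * b2 * x


def classify_shape(b1, b2, x_max, tol=1e-3):
    """Label a factor's effect on the log-odds over the assumed range [0, x_max].

    tol is a hypothetical cut-off for treating the quadratic term as negligible.
    """
    if b1 <= 0:
        return "not a positive linear effect"
    if abs(b2) < tol:
        return "approximately linear"
    if b2 > 0:
        # Slope b1 + 2*b2*x keeps growing: the quadratic term adds to the linear effect.
        return "reinforcing (quadratic term adds to the linear effect)"
    # b2 < 0: the slope shrinks as x grows and reaches zero at the vertex -b1 / (2*b2).
    vertex = -b1 / (2 * b2)
    if vertex >= x_max:
        return "half inverted U (rises, then levels off)"
    return "full inverted U (turns down inside the range)"


# Hypothetical coefficients for illustration only; these are not the paper's estimates.
examples = {
    "factor A": (0.8, 0.05),
    "factor B": (0.9, -0.04),
    "factor C": (0.5, 0.0002),
}
for name, (b1, b2) in examples.items():
    print(name, classify_shape(b1, b2, x_max=10.0))
```

With these illustrative numbers, factor A is labeled reinforcing, factor B a half inverted U (its vertex sits at 11.25, beyond x_max = 10), and factor C approximately linear. The sketch only interprets coefficients. Estimating them, for example with a logistic regression that includes a squared term for each factor, is a separate step.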
