Journal of Information Resources Management (信息资源管理学报) ›› 2020, Vol. 10 ›› Issue (6): 82-89. doi: 10.13365/j.jirm.2020.06.082

• Research Article •

A Name Disambiguation Algorithm for Chinese Authors Based on Career Experience and Citation Networks

Liu Weichen1 Shi Dongbo1 Li Jiang2

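  1. School of International and Public Affairs, Shanghai Jiao Tong University, Shanghai, 200030;
  2. School of Information Management, Nanjing University, Nanjing, 210023
  • Publication date: 2020-11-26 Release date: 2020-12-17
  • About the authors: Liu Weichen, PhD candidate, whose research interests are science and technology policy, talent mobility, and the economics of innovation; Shi Dongbo (corresponding author), PhD, distinguished associate research fellow, whose research interest is the economics of innovation, Email: shidongbo@sjtu.edu.cn; Li Jiang, PhD, professor, whose research interest is informetrics.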

Name Disambiguation for Chinese Authors Using Their Career Experience and Citation Networks

Liu Weichen1 Shi Dongbo1 Li Jiang2   

  1. School of International and Public Affairs, Shanghai Jiao Tong University, Shanghai, 200030;
  2. School of Information Management, Nanjing University, Nanjing, 210023
  • Online: 2020-11-26 Published: 2020-12-17

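Abstract (Chinese version): Author name ambiguity is a fundamental problem in research on scientific literature, and it has never been well solved for Chinese names. This study aims to improve the accuracy of name disambiguation algorithms for Chinese names. This paper proposes, for the first time, a name disambiguation algorithm based on authors' career experience and citation networks. On a purpose-built ground-truth set of Web of Science (WoS) papers by Chinese authors, the algorithm reaches an F1 value of 82.91%, although it has some limitations in data availability and large-scale use. The algorithm targets name disambiguation for Chinese authors in WoS. It is easy to operate, runs fast, does not depend on complex models, and is not constrained by computing resources, so it has good application prospects. The ground-truth dataset constructed in this paper is also a useful reference for follow-up research.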

Key words (Chinese version): name disambiguation, Chinese scholars, supervised learning, career experience, citation networks

Abstract: Author name ambiguity is a fundamental problem in research on scientific literature, and it has not been well solved for Chinese names. This study aims to improve the accuracy of name disambiguation for Chinese authors. This paper proposes, for the first time, a disambiguation algorithm based on authors' career experience and citation networks. On a ground-truth dataset of Chinese authors' papers from Web of Science (WoS), the algorithm reaches an F1-score of 80.88%. It has some limitations in data availability and large-scale use. The algorithm is designed specifically for disambiguating the names of Chinese authors in WoS. It is easy to operate, runs fast, does not depend on complex models, and is not constrained by computing resources, so it has good application prospects. The ground-truth dataset constructed in this paper is also a useful reference for follow-up research.

Key words: Name Disambiguation, Chinese Names, Supervised Learning, Career Experience, Citation Networks
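
The abstract reports its result as an F1-score on a ground-truth dataset, but does not say how that score is computed. As an illustration only, the sketch below uses pairwise F1, a common choice when evaluating author name disambiguation; that choice is an assumption, not something the abstract states. The function names (`same_author_pairs`, `pairwise_f1`) and the toy paper and author labels are hypothetical.

```python
from itertools import combinations


def same_author_pairs(labels):
    """Return the set of paper pairs assigned to the same author.

    `labels` maps a paper id to an author-cluster id.
    """
    papers = sorted(labels)
    return {
        (a, b)
        for a, b in combinations(papers, 2)
        if labels[a] == labels[b]
    }


def pairwise_f1(truth, predicted):
    """Pairwise precision, recall and F1 between two clusterings.

    Assumption: pairwise scoring; the paper may use another F1 variant.
    """
    true_pairs = same_author_pairs(truth)
    pred_pairs = same_author_pairs(predicted)
    correct = len(true_pairs & pred_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(true_pairs) if true_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: five papers signed with the same ambiguous name.
truth = {"p1": "A", "p2": "A", "p3": "A", "p4": "B", "p5": "B"}
predicted = {"p1": "x", "p2": "x", "p3": "y", "p4": "y", "p5": "y"}

precision, recall, f1 = pairwise_f1(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the prediction wrongly groups one paper of author A with author B's papers, so two of the four predicted pairs and two of the four true pairs match, which lowers both precision and recall.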
