Journal of Information Resources Management ›› 2020, Vol. 10 ›› Issue (6): 101-109. doi: 10.13365/j.jirm.2020.06.101

• Research Paper •

细分领域下面向专业人员的非常识性知识关联挖掘

肖 璐1 赵之辉1 陈 果2   

  1. 南京财经大学新闻学院,南京,210023
  2. 南京理工大学经济管理学院,南京,210094
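  • Online: 2020-11-26 Published: 2020-12-17
  • About the authors: Xiao Lu, associate professor, Ph.D., master's supervisor; research interests: data mining and knowledge aggregation; Email: ahjk_xiaolu@163.com. Zhao Zhihui, master's student; research interest: social media research. Chen Guo, associate professor, Ph.D., master's supervisor; research interest: domain knowledge analysis.
  • Funding:
    This paper is one of the research outcomes of the National Social Science Fund of China Youth Project "Research on Multiple Association Mining and Knowledge Aggregation in Academic Online Communities" (16CTQ025).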

Non-Common Knowledge Association Mining for Professionals in Subdivided Field

Xiao Lu1 Zhao Zhihui1 Chen Guo2   

  1. School of Journalism, Nanjing University of Finance & Economics, Nanjing, 210023
  2. School of Economics & Management, Nanjing University of Science and Technology, Nanjing, 210094
  • Online: 2020-11-26 Published: 2020-12-17

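Abstract: Applications of knowledge association mining in subdivided fields are commonly criticized on the grounds that "the results are already known without any mining." In response, this paper proposes a scheme for mining non-common-sense knowledge associations that better meets the needs of professionals. The scheme has three key points: (1) the data source is experience-sharing text written by professionals rather than common-sense encyclopedic text, so that the mining results meet the needs of professional problem solving; (2) large-scale pre-trained word vectors are fine-tuned on a small-scale corpus from the subdivided field, which supports better representation learning for domain terms and addresses both the shortage of subdivided-field corpora and the poor learning of out-of-vocabulary professional terms; (3) a domain knowledge base is used to remove common-sense knowledge associations from the mining results, so that professionals are offered latent, clue-like knowledge associations worth pursuing further. Taking the cardiovascular field as an example, knowledge associations mined from a small-scale corpus of physicians' experience-sharing texts fit better with experience in solving difficult clinical problems and with experimental findings in medical research, and can give professionals valuable clues for further knowledge exploration and use.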

Keywords: knowledge association mining, domain knowledge analysis, pre-training, representation learning, small-scale corpus

Abstract: Knowledge association mining in subdivided fields is often criticized because its results are already known without any mining. To address this, the paper proposes a scheme for mining non-common knowledge associations that better suits the needs of professionals. The scheme has three key points. First, to ensure that the mining results serve professional problem solving, experience-sharing texts written by professionals are used as the data source instead of common-sense encyclopedic texts. Second, to address the shortage of corpora in subdivided fields and the poor learning of out-of-vocabulary terminology, large-scale pre-trained word vectors are fine-tuned on a small-scale subdivided-field corpus, which yields better representation learning for domain terminology. Finally, common knowledge associations are removed from the mining results with the help of a domain knowledge base, so that professionals receive potential, clue-like knowledge associations. Taking the cardiovascular field as an example, knowledge associations mined from a small-scale corpus of experience-sharing texts fit better with experience in solving difficult clinical problems and with experimental findings in medical research, and provide professionals with valuable clues for further knowledge exploration and use.
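
The second and third key points can be illustrated with a minimal sketch. This is not the authors' implementation: the term names, the three-dimensional vectors, the similarity threshold, and the contents of the knowledge base are all hypothetical placeholders, and cosine similarity is assumed here as the association score. In practice the vectors would come from pre-trained word vectors fine-tuned on the domain corpus.

```python
import math

# Hypothetical toy embeddings standing in for fine-tuned domain term vectors.
TERM_VECTORS = {
    "term_a": [0.9, 0.1, 0.0],
    "term_b": [0.8, 0.2, 0.1],
    "term_c": [0.1, 0.9, 0.2],
    "term_d": [0.85, 0.15, 0.05],
}

# Hypothetical domain knowledge base: pairs already known to be associated.
KNOWN_ASSOCIATIONS = {frozenset(("term_a", "term_b"))}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_associations(vectors, threshold):
    """Return term pairs whose similarity reaches the threshold, best first."""
    terms = sorted(vectors)
    pairs = []
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            score = cosine(vectors[t1], vectors[t2])
            if score >= threshold:
                pairs.append((t1, t2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


def drop_known(pairs, known):
    """Remove pairs already recorded in the knowledge base."""
    return [p for p in pairs if frozenset(p[:2]) not in known]


candidates = candidate_associations(TERM_VECTORS, threshold=0.95)
novel = drop_known(candidates, KNOWN_ASSOCIATIONS)
for t1, t2, score in novel:
    print(f"{t1} ~ {t2}: {score:.3f}")
```

Storing known pairs as `frozenset` objects makes the lookup independent of term order, so a pair recorded as (A, B) also filters out (B, A).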

Key words: Knowledge association mining, Domain knowledge analysis, Pre-training, Representation learning, Small-scale corpus

CLC number: