Journal of Information Resources Management ›› 2024, Vol. 14 ›› Issue (6): 131-142. doi: 10.13365/j.jirm.2024.06.131

• Research Article •

知识重组视角下古诗典故词关联研究:一种融合细粒度共引关系和语义特征的方法

李晓敏1,2 王昊1,2 布文茹1,2  周抒1,2   

  1. 南京大学信息管理学院,南京,210023;
  2. 数据工程与知识服务江苏省高校重点实验室(南京大学),南京,210093
  • 出版日期:2024-11-26 发布日期:2024-12-20
  • 作者简介:李晓敏,博士研究生,研究方向为文本挖掘与知识组织;王昊(通讯作者),教授,博士,博士生导师,研究方向为自然语言处理、数据挖掘应用与本体学习研究,Email: ywhaowang@nju.edu.cn;布文茹,硕士研究生,研究方向为数据分析与挖掘;周抒,博士研究生,研究方向为自然语言处理。
  • 基金资助:
    本文系国家自然科学基金面上项目“关联数据驱动下我国非遗文本的语义解析与人文计算研究”(72074108);中央高校基本科研项目“面向人文计算的方志文本的语义分析和知识图谱研究”(010814370113)研究成果之一。

Linking Allusion Words in Ancient Poetry from the Perspective of Knowledge Reorganization: A Method of Integrating the Fine-grained Co-citation Relationships and the Semantic Features

Li Xiaomin (1,2), Wang Hao (1,2), Bu Wenru (1,2), Zhou Shu (1,2)

  1. School of Information Management, Nanjing University, Nanjing, 210023;
  2. Key Laboratory of Data Engineering and Knowledge Services in Jiangsu Provincial Universities (Nanjing University), Nanjing, 210093
  • Online: 2024-11-26 Published: 2024-12-20
  • About the authors: Li Xiaomin, Ph.D. candidate, with research interests in text mining and knowledge organization; Wang Hao (corresponding author), professor, Ph.D., doctoral supervisor, with research interests in natural language processing, data mining applications, and ontology learning, Email: ywhaowang@nju.edu.cn; Bu Wenru, master's student, with research interests in data analysis and mining; Zhou Shu, Ph.D. candidate, with research interests in natural language processing.
  • Supported by:
    This article is one of the outcomes of the General Program project "Research on Semantic Parsing and Humanities Computing of Chinese Intangible Cultural Heritage Text Driven by Linked Data" (72074108), supported by the National Natural Science Foundation of China, and of the project "The Semantic Analysis and Knowledge Graph Research of Local Chronicle Text Oriented to Humanistic Computing" (010814370113), supported by the Fundamental Research Funds for the Central Universities.

摘要: 在知识重组相关理论与技术的指导下,对典故文化资源进行语义挖掘和组织,促进典故文化的传承与利用。本文提出了一种融合细粒度共引关系和语义特征实现典故词关联的模型。首先依据古诗与典故词间的引用关系构建共引网络,再将细粒度共引关系位置共引和情感共引加入到共引网络中,初步构建细粒度共引网络,之后利用Doc2vec获得每个典故词的语义特征,并整合语义特征重构共引网络,最后利用链接预测算法遍历细粒度共引网络,实现典故词的语义关联与组织。同时从路径角度对关联结果进行分析,探索规律性领域知识。本研究构建了一个包含5869个节点和27032条边的共引网络,提出的加入位置共引和情感共引以及语义特征的典故词关联方法效果达到了0.963,且相较于细粒度共引关系,语义特征在典故词关联中作用更为显著。此外,路径角度分析关联结果发现共引网络中的典故词关系紧密,最短路径阶数与典故词对的数量以及相似度均呈负相关关系。

关键词: 知识重组, 典故词, 共引网络, 语义特征, 链路预测

Abstract: Guided by theories and techniques of knowledge reorganization, this study semantically mines and organizes allusion cultural resources to promote the inheritance and use of allusion culture. It proposes a model that links allusion words by integrating fine-grained co-citation relationships and semantic features. First, a co-citation network is constructed from the citation relationships between ancient poems and allusion words. Two fine-grained co-citation relationships, positional co-citation and emotional co-citation, are then added to form a preliminary fine-grained co-citation network. Next, Doc2vec is used to obtain the semantic features of each allusion word, and these features are integrated to reconstruct the co-citation network. Finally, a link prediction algorithm traverses the fine-grained co-citation network to achieve the semantic association and organization of allusion words. The association results are also analyzed from a path perspective to explore regular patterns in domain knowledge. The constructed co-citation network contains 5,869 nodes and 27,032 edges. The proposed method, which incorporates positional and emotional co-citation as well as semantic features, achieves an accuracy of 0.963 in linking allusion words, and semantic features contribute more to the linking than the fine-grained co-citation relationships do. In addition, the path-based analysis shows that allusion words in the co-citation network are closely connected, and that the shortest-path order is negatively correlated with both the number of allusion word pairs and their similarity.

Key words: Knowledge reorganization, Allusion words, Co-citation network, Semantic features, Link prediction
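
The pipeline summarized in the abstract (co-citation network, semantic features, link prediction, path analysis) can be illustrated with a minimal sketch. This is not the authors' model: the toy poems, the two-dimensional vectors standing in for Doc2vec output, the Jaccard neighbor score, and the mixing weight `alpha` are all assumptions made for illustration, and the positional and emotional co-citation relationships are omitted.

```python
from collections import deque
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each poem maps to the allusion words it cites.
poem_citations = {
    "poem_a": ["w1", "w2"],
    "poem_b": ["w2", "w3"],
    "poem_c": ["w3", "w4"],
}

# Hypothetical vectors standing in for Doc2vec semantic features.
semantic_vectors = {
    "w1": [1.0, 0.0],
    "w2": [1.0, 1.0],
    "w3": [0.0, 1.0],
    "w4": [0.0, 1.0],
}


def build_cocitation_graph(citations):
    """Connect two allusion words whenever a single poem cites both."""
    graph = {}
    for words in citations.values():
        for word in words:
            graph.setdefault(word, set())
        for u, v in combinations(sorted(set(words)), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(graph, u, v):
    """Share of common neighbors, a simple structural link predictor."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


def link_score(graph, vectors, u, v, alpha=0.5):
    """Blend structural and semantic evidence; alpha is an assumed weight."""
    return alpha * jaccard(graph, u, v) + (1 - alpha) * cosine(vectors[u], vectors[v])


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns None when target is unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


graph = build_cocitation_graph(poem_citations)

# Score every pair of allusion words that is not yet linked.
for u, v in combinations(sorted(graph), 2):
    if v not in graph[u]:
        score = link_score(graph, semantic_vectors, u, v)
        print(u, v, round(score, 3), shortest_path_length(graph, u, v))
```

In this toy network, pairs that are more path lengths apart tend to receive lower scores, which mirrors in miniature the negative relationship between shortest-path order and similarity reported in the abstract. A fuller version would replace the hand-written vectors with trained document embeddings and weight edges by the fine-grained co-citation relationships.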

CLC number: