Journal of Information Resources Management ›› 2024, Vol. 14 ›› Issue (3): 90-103. doi: 10.13365/j.jirm.2024.03.090


Multi-level Functional Structure Recognition of Scientific Literature

Liu Haotan1,2 Liu Jiawei1,2 Zhang Fan1,2 Lu Wei1,2   

  1. School of Information Management, Wuhan University, Wuhan, 430072
  2. Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan, 430072
  • Online: 2024-05-26 Published: 2024-06-14
  • About author: Liu Haotan, Ph.D. candidate, research on human-computer interaction (HCI), information retrieval, and text generation; Liu Jiawei, Ph.D. candidate, research on human-computer interaction (HCI) and information retrieval; Zhang Fan, associate professor, Ph.D., research on information retrieval evaluation and user behavior analysis; Lu Wei (corresponding author), professor, Ph.D., research on information retrieval, data intelligence, innovation evaluation, and related topics.
  • Supported by:
    This is an outcome of the Key Project "Data and Intelligence Empowered Theoretic Change of Scientific Information Resource and Knowledge Management Theory" (72234005) and the project "Argumentation Logic Recognition of Scientific Proposition Text based on Machine Reading Comprehension" (72174157), both supported by the National Natural Science Foundation of China.

Abstract: Automatic recognition of the functional structure of scientific literature improves the efficiency of tasks such as fine-grained information retrieval, keyword extraction, and citation analysis. Current research on this task faces two challenges: weak representation of dependencies within the text, and insufficient model generalization and transferability. To address the first, this paper uses graph convolutional networks (GCN) to capture the dependency information and topological structure among word nodes, strengthening the modeling and representation of scientific publications. To address the second, adversarial learning is introduced to improve the generalization ability of the recognition model. The ScienceDirect dataset is used to examine the recognition effectiveness of various models at three granularities: Header, Section, and Paragraph. The cross-domain transferability of multiple models is further tested on PubMED-20k, a dataset for functional structure recognition in medical abstracts. Experimental results show that BERT+GCN achieves the best performance at the Header level, with a score of 88%, a 3% improvement over the baseline models. At the Section level, the combination of BERT and GAN achieves the best performance, also a 3% improvement over the baseline models. At the Paragraph level, the score reaches 68%. BERT+GCN exhibits better cross-domain transferability than the other models, achieving a score of 90% on cross-domain data.
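
The abstract does not give the details of the graph component, so the following is only a minimal sketch of one generic graph convolution step over word nodes, assuming the widely used symmetric-normalized propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The function names, the toy graph, and the feature and weight values are illustrative assumptions, not the authors' implementation; in a BERT+GCN setup the node features would presumably come from BERT embeddings.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each word node keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def gcn_layer(adj, features, weights):
    """One graph convolution step: aggregate neighbours, project, apply ReLU."""
    a_norm = normalize_adjacency(adj)
    z = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


if __name__ == "__main__":
    # Hypothetical 3-word graph: word 0 - word 1 - word 2 (a path).
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    # Hypothetical 2-dimensional node features (stand-ins for embeddings).
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Hypothetical 2x2 weight matrix; learned during training in practice.
    weights = [[1.0, -1.0], [0.5, 0.5]]
    for row in gcn_layer(adj, features, weights):
        print(row)
```

Each output row mixes a word's own features with those of its graph neighbours, which is how a layer of this kind can expose dependency and topological information to a downstream classifier.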

Key words: Functional structure, Graph convolution network, Generative adversarial networks, Scientific literature, Information recognition
