Journal of Information Resources Management, 2020, Vol. 10, Issue (5): 23-29. doi: 10.13365/j.jirm.2020.05.023


Research on Text Resource Classification of Humanities and Social Sciences Thematic Database Based on Deep Learning: Taking “XinHua Silkroad” Database and “One Belt One Road” Database as Examples

Shi Qin, Li Yang

  1. School of Information Management, Nanjing University, Nanjing, 210023
  Online: 2020-09-26; Published: 2020-10-13

Abstract: As digital transformation deepens, the construction of thematic databases in the Humanities and Social Sciences continues to develop. Text resources are an important part of thematic database construction and the main way for Humanities and Social Sciences researchers to acquire domain knowledge. These texts have closely related themes, in-depth content and similar features. To address these characteristics, this paper proposes a classification model for Humanities and Social Sciences thematic text resources that combines the Long Short-Term Memory (LSTM) model with an attention mechanism. Word vectors are used to digitize the sample texts. The LSTM model then extracts semantic features, and the attention mechanism highlights key phrases to optimize the feature extraction process. Finally, a Softmax layer outputs the thematic classification results. The feasibility and superiority of the proposed model are verified on texts crawled from the "Xinhua SilkRoad" database and the "One Belt One Road" thematic database. The results show that the model combining LSTM with the attention mechanism outperforms the other models compared on both long-text and short-text classification tasks.
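The pipeline in the abstract runs from word vectors through an LSTM and attention to a Softmax output. The sketch below illustrates that pipeline and is not the authors' implementation. The abstract gives no dimensions, attention formula or training details, so several choices here are hypothetical assumptions: a single-layer unidirectional LSTM, dot-product attention against a learned context vector, toy sizes, and random untrained weights. The class name `AttentionLSTMClassifier` is also hypothetical. The code uses only the Python standard library, so every step of the computation is visible.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply matrix m (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    # Subtract the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionLSTMClassifier:
    """Hypothetical toy model: word vectors -> LSTM -> attention pooling -> Softmax.
    Weights are random and untrained; sizes are illustrative assumptions."""

    def __init__(self, vocab_size, emb_dim, hid_dim, n_classes, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        self.embed = rand(vocab_size, emb_dim)  # word-vector lookup table
        # One matrix per gate (input, forget, output, candidate), applied to
        # the concatenation [x_t; h_{t-1}; 1] so the last column acts as a bias.
        self.gates = {g: rand(hid_dim, emb_dim + hid_dim + 1) for g in "ifoc"}
        # Assumed attention query vector (dot-product attention).
        self.context = [rng.uniform(-0.1, 0.1) for _ in range(hid_dim)]
        self.out = rand(n_classes, hid_dim + 1)  # Softmax layer weights plus bias

    def lstm(self, token_ids):
        # Run the LSTM over the token sequence and return every hidden state.
        h = [0.0] * self.hid_dim
        c = [0.0] * self.hid_dim
        states = []
        for t in token_ids:
            z = self.embed[t] + h + [1.0]  # list concatenation: [x_t; h_{t-1}; 1]
            i = [sigmoid(a) for a in matvec(self.gates["i"], z)]
            f = [sigmoid(a) for a in matvec(self.gates["f"], z)]
            o = [sigmoid(a) for a in matvec(self.gates["o"], z)]
            g = [math.tanh(a) for a in matvec(self.gates["c"], z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            states.append(h)
        return states

    def forward(self, token_ids):
        states = self.lstm(token_ids)
        # Attention: score each hidden state against the context vector,
        # normalise the scores with softmax, then take the weighted sum.
        scores = [sum(hj * uj for hj, uj in zip(h, self.context)) for h in states]
        alpha = softmax(scores)
        pooled = [sum(a * h[j] for a, h in zip(alpha, states)) for j in range(self.hid_dim)]
        probs = softmax(matvec(self.out, pooled + [1.0]))
        return probs, alpha


if __name__ == "__main__":
    model = AttentionLSTMClassifier(vocab_size=50, emb_dim=8, hid_dim=16, n_classes=3)
    probs, alpha = model.forward([4, 17, 23, 8, 42])
    print("class probabilities:", [round(p, 3) for p in probs])
    print("attention weights:", [round(a, 3) for a in alpha])
```

The attention weights `alpha` form a distribution over the input tokens. In a trained model, larger weights would mark the key phrases that the abstract says the attention mechanism highlights. Because the weights here are untrained, the printed values only demonstrate the data flow.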

Key words: Humanities and Social Sciences, Thematic database, Thematic text classification, Long Short-Term Memory model, Attention mechanism
