Journal of Information Resources Management ›› 2023, Vol. 13 ›› Issue (1): 129-139. doi: 10.13365/j.jirm.2023.01.129


Automatic Classification of Product Review Texts Combining Short Text Extension and BERT

Li Xiangdong1,2, Sun Qianru1, Shi Jian1

  1. School of Information Management, Wuhan University;
  2. Center for Electronic Commerce Research and Development, Wuhan University, Wuhan, 430072
  • Online: 2023-01-26 Published: 2023-03-18

Abstract: Product review texts are short and use informal wording. This study explores how to classify product review texts automatically by product category and how to improve classification performance. A core word set is constructed from the training set using TF-IDF and the LDA model, and short texts are extended using a Word2Vec similarity calculation method. After extension, the product reviews are classified by product category with the Bidirectional Encoder Representations from Transformers (BERT) model. Comparative experiments are then designed to verify the effectiveness of the method. When the extended product reviews are classified with BERT, the F1 value obtained by the proposed method is 2.1 percent higher than that obtained when the reviews are not extended, and 0.9 percent higher than that obtained when the HowNet similarity calculation method is used. The reasons for the method's effectiveness are analyzed in terms of basic principles, different word similarity calculation methods, and the ways in which words are used. The proposed method can effectively improve the classification performance for product reviews when information is organized by product category, and it can be applied to the organization of e-commerce information and to research on related theories and methods.
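
The extension step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (tfidf_core_words, extend_short_text), the similarity threshold of 0.8, and the two-dimensional TOY_VECTORS that stand in for trained Word2Vec embeddings are all assumptions made for illustration. The LDA part of core word selection and the BERT classifier are omitted.

```python
import math
from collections import Counter


def tfidf_core_words(docs, top_k):
    """Rank words by their best TF-IDF score over tokenized docs; return the top_k."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    best = {}
    for doc in docs:
        for word, count in Counter(doc).items():
            score = (count / len(doc)) * math.log(n_docs / df[word])
            best[word] = max(best.get(word, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda w: (-best[w], w))[:top_k]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_short_text(tokens, core_words, vectors, threshold=0.8):
    """Append each core word that is similar enough to any token of the text."""
    extended = list(tokens)
    for core in core_words:
        if core in extended or core not in vectors:
            continue
        if any(t in vectors and cosine(vectors[t], vectors[core]) >= threshold
               for t in tokens):
            extended.append(core)
    return extended


# Assumption: hand-made toy vectors, NOT real Word2Vec output.
TOY_VECTORS = {
    "phone": [0.9, 0.1],
    "battery": [0.8, 0.2],
    "shirt": [0.1, 0.9],
    "cotton": [0.2, 0.8],
}
```

With these toy vectors, calling extend_short_text(["phone"], ["battery", "cotton"], TOY_VECTORS) appends "battery" but not "cotton", because only "battery" lies close to "phone" in the toy embedding space. In a fuller pipeline the extended token list would be joined back into a string and passed to the classifier.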

Key words: Product review texts, Short text, Feature extension, Word2Vec, BERT
