Journal of Information Resources Management (信息资源管理学报) ›› 2020, Vol. 10 ›› Issue (3): 78-91. doi: 10.13365/j.jirm.2020.03.078

• Research Article •

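Extracting New Product Attributes from Chinese Online Reviews (中文在线评论中的产品新属性识别研究)

Qin Chenglei (秦成磊), Zhang Chengzhi (章成志)

  1. Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094
  • Online: 2020-05-26 Published: 2020-05-26
  • About the authors: Qin Chenglei, male, doctoral student; research interests: natural language processing and text mining. Zhang Chengzhi, male, PhD, professor and doctoral supervisor; research interests: information organization, information retrieval, data mining, and natural language processing.
  • Funding:
    This paper is one of the outcomes of the Major Project of the National Social Science Fund of China, "Research on Data Science Theory and Methods for Knowledge Innovation Services" (16ZAD224).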

Abstract: The application of new materials, technologies, and processes means that newly launched products often have new attributes. Existing product attribute extraction methods usually focus on the main attributes of the evaluated object and have not examined new attribute identification in depth, which affects the experimental conclusions of research that builds on attribute extraction. To address this, we cast new product attribute identification as a classification task and apply classification models, a conditional random field (CRF), and a deep learning model that combines a bidirectional long short-term memory network with a CRF (Bi-LSTM-CRF). Based on an analysis of the experimental results, we use the CRF model to obtain candidate new attributes. We then apply four strongly constraining rules to filter noise and refine the model's output. Finally, to make the identified new attributes more interpretable, we cluster the new attributes together with seed attributes, following the idea of hierarchical clustering, so that the seed attributes explain the new ones. Experimental results show that the proposed scheme can effectively expand the set of product attributes.

Keywords: new attribute extraction, attribute clustering, conditional random field (CRF), Bi-LSTM-CRF
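
The final step described in the abstract, clustering new attributes together with seed attributes so that the seeds explain the new ones, can be illustrated with a small Python sketch. This is not the authors' implementation. The attribute names, the 2-D vectors, the use of cosine similarity, average linkage, and the 0.9 stopping threshold are all hypothetical choices made for illustration; the abstract states only that the clustering follows the idea of hierarchical clustering.

```python
from itertools import combinations
from math import sqrt

# Toy 2-D vectors invented for illustration only; the paper's actual
# attribute lists and features are not given in the abstract.
SEEDS = {"screen": (1.0, 0.1), "battery": (0.1, 1.0)}
NEW = {"notch display": (0.9, 0.2), "fast charging": (0.2, 0.9)}


def cosine(u, v):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def average_link(c1, c2, vectors):
    """Mean pairwise cosine similarity between two clusters."""
    sims = [cosine(vectors[a], vectors[b]) for a in c1 for b in c2]
    return sum(sims) / len(sims)


def agglomerate(vectors, threshold):
    """Bottom-up clustering: merge the most similar pair of clusters
    until no pair reaches the (assumed) similarity threshold."""
    clusters = [frozenset([name]) for name in sorted(vectors)]
    while len(clusters) > 1:
        best_pair = max(combinations(clusters, 2),
                        key=lambda p: average_link(p[0], p[1], vectors))
        if average_link(best_pair[0], best_pair[1], vectors) < threshold:
            break
        clusters = [c for c in clusters if c not in best_pair]
        clusters.append(best_pair[0] | best_pair[1])
    return clusters


def explain(seeds, new, threshold=0.9):
    """Map each new attribute to the seed attributes sharing its cluster."""
    clusters = agglomerate({**seeds, **new}, threshold)
    result = {}
    for cluster in clusters:
        cluster_seeds = sorted(name for name in cluster if name in seeds)
        for name in cluster:
            if name in new:
                result[name] = cluster_seeds
    return result


explanations = explain(SEEDS, NEW)
```

With these toy vectors, `explanations` maps "notch display" to `["screen"]` and "fast charging" to `["battery"]`. A new attribute that shares a cluster with no seed would map to an empty list, which in practice might signal an attribute with no existing counterpart.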

CLC number: