Journal of Information Resources Management ›› 2020, Vol. 10 ›› Issue (1): 29-. doi: 10.13365/j.jirm.2020.01.029

• Special Topic: Emergency Intelligence Analysis for Emergency Events •

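Constructing a Big Data-Driven Topic Classification Model of User Sentiment in Social Network Public Opinion: The Case of the “Immigration” Topic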

王晰巍1,2 邢云菲1 韦雅楠1 王铎1   

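  1. School of Management, Jilin University, Changchun 130022; 2. Big Data Management Research Center, Jilin University, Changchun 130022
  • Received: 2019-10-08 Online: 2020-01-26 Published: 2020-01-26
  • Corresponding author: Wang Xiwei, female, professor and doctoral supervisor; research interests: big data applications, public opinion analysis, and information behavior. E-mail: wxw_mail@163.com
  • About the authors: Wang Xiwei, female, professor and doctoral supervisor; research interests: big data applications, public opinion analysis, and information behavior; E-mail: wxw_mail@163.com. Xing Yunfei, female, doctoral candidate; research interests: social networks and public opinion analysis.
  • Funding:
    Major Project of the National Social Science Fund of China, “Research on the Construction of Big Data-Driven Social Network Public Opinion Topic Maps and Regulation Strategies” (18ZDA310).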

Research on the Topic Model Construction of Sentiment Classification of Public Opinion Users in Social Networks Driven by Big Data——Taking “Immigration” as the Topic

Wang Xiwei1,2 Xing Yunfei1 Wei Yanan1 Wang Duo1   

  1. School of Management, Jilin University, Changchun 130022; 2. Big Data Management Research Center, Jilin University, Changchun 130022
  • Received: 2019-10-08 Online: 2020-01-26 Published: 2020-01-26

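Abstract (translated from the Chinese): This paper builds a big data-driven model, based on a convolutional neural network (CNN), for classifying user sentiment by topic in social network public opinion. Crawlers collect the sentiment-bearing text that Weibo and Twitter users posted on the hot topic of “immigration”. Word2Vector is used to train Chinese word vectors and GloVe to train English word vectors; NLPIR and BosonNLP are used for word segmentation; and a user sentiment corpus is built around the “immigration” topic. The CNN is trained and tested on sentiment classification, and its results are compared with those of TimeLSTM and SVM to verify the superiority of CNN classification. The results show that the model classifies Chinese and English text effectively in a multilingual environment, that suitable choices of activation function and related parameters improve classification accuracy, and that the model has some advantage over traditional machine learning. For text classification on the “immigration” topic, the CNN outperforms the TimeLSTM model. The study provides a preliminary research framework for visual analysis of user sentiment topic maps of public opinion in cross-language social networks.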

Keywords: convolutional neural network, social networks, sentiment classification, topic model, public opinion monitoring, user research

Abstract: This paper builds a big data-driven topic model, based on a convolutional neural network (CNN), for classifying the sentiment of public opinion users in social networks. Crawlers extract the content posted by Weibo and Twitter users on the hot topic of immigration. Word2Vector is used to train Chinese word vectors and GloVe to train English word vectors. NLPIR and BosonNLP are used for word segmentation, and a user sentiment corpus is built on the immigration topic. Finally, the CNN is trained for sentiment classification, and its results are compared with those of TimeLSTM and SVM to verify the superiority of CNN classification. The results show that the proposed model classifies text effectively in a multilingual environment covering Chinese and English, and that properly setting the activation function and related parameters improves its accuracy. The model shows some advantage over traditional machine learning. For text classification on the immigration topic, the CNN outperforms the TimeLSTM model. The study provides a preliminary research framework for visual analysis of the sentiment knowledge map of public opinion users in cross-language social networks.
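The abstract names a CNN over word vectors as the classifier but gives no architecture details, so the following is only a minimal sketch of the general technique (convolution over word-vector windows, ReLU, max-over-time pooling, softmax), not the authors' model. The vocabulary, vector size, filter widths, random weights, and three sentiment classes are all hypothetical placeholders; in practice the vectors would come from a trained embedding and the weights from training.

```python
import math
import random


def conv1d_max_pool(embedded, filt, bias):
    """Slide one filter over a sentence and keep its strongest response.

    embedded: list of word vectors, one list of floats per token.
    filt: list of weight vectors; its length is the filter width in tokens.
    """
    width = len(filt)
    responses = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        total = bias
        for word_vec, weight_vec in zip(window, filt):
            total += sum(x * w for x, w in zip(word_vec, weight_vec))
        responses.append(max(0.0, total))  # ReLU activation
    return max(responses)  # max-over-time pooling


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens, embeddings, filters, biases, out_weights, out_bias):
    """Forward pass: embed, convolve, pool, then a linear layer and softmax."""
    dim = len(next(iter(embeddings.values())))
    unknown = [0.0] * dim  # assumption: unknown words map to a zero vector
    embedded = [embeddings.get(tok, unknown) for tok in tokens]
    # Pad short sentences so the widest filter fits at least once.
    widest = max(len(f) for f in filters)
    while len(embedded) < widest:
        embedded.append(unknown)
    features = [conv1d_max_pool(embedded, f, b) for f, b in zip(filters, biases)]
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(scores)


# Hypothetical toy setup: random values stand in for trained parameters.
random.seed(0)
DIM = 4
vocab = ["immigration", "policy", "welcome", "fear", "border"]
embeddings = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
filters = [
    [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
    for width in (2, 3)  # one filter spanning 2 tokens, one spanning 3
]
biases = [0.0, 0.0]
out_weights = [[random.uniform(-1, 1) for _ in filters] for _ in range(3)]
out_bias = [0.0, 0.0, 0.0]

probs = classify(["immigration", "policy", "fear"],
                 embeddings, filters, biases, out_weights, out_bias)
print(probs)  # three class probabilities
```

With untrained weights the output carries no meaning; the sketch only shows how a sentence of any length is reduced to one feature per filter before classification.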

Key words: Convolutional neural network, Social networks, Sentiment classification, Topic model, Public opinion monitoring, User research

CLC number: