Journal of Information Resources Management ›› 2023, Vol. 13 ›› Issue (6): 99-109, 124. doi: 10.13365/j.jirm.2023.06.099

• Research Article •

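The Speech Acts Classification of Social Media Users in the Context of Public Emergencies

Sun Ran1 An Lu1,2

  1. School of Information Management, Wuhan University, Wuhan, 430072;
  2. Center for Studies of Information Resources, Wuhan University, Wuhan, 430072
  • Online: 2023-11-26 Published: 2023-12-29
  • About the authors: Sun Ran, doctoral candidate, whose research focuses on online public opinion analysis; An Lu (corresponding author), Ph.D., professor and doctoral supervisor, whose research focuses on network data analysis and emergency intelligence research, Email: anlu97@163.com.
  • Funding:
    This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Identification and Intervention Methods for Disordered Online Information Dissemination in Crisis Contexts" (72174153) and the National Natural Science Foundation of China Innovative Research Group project "Information Resources Management" (71921002).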

Abstract: Automatic classification of speech acts helps to reveal the intentions and behaviors behind social media users' discourse and thereby to characterize the state of public opinion. Drawing on Speech Act Theory, this study provides a fine-grained classification of the intentions that social media users express in text. We manually annotate four thousand vaccine-related tweets and extract user, temporal, text vector, topic, and sentiment features, among others. We then build and evaluate speech act classification models for public emergencies using machine learning methods such as logistic regression, random forest, and XGBoost, as well as a combination of BERT and neural network models. The SHAP interpretation method is used to rank feature importance, and the nonparametric Kruskal-Wallis test is used to examine differences in sentiment and influence across speech acts. The XGBoost-based model reaches a classification accuracy of 0.792, outperforming the other baseline models. Text vector features are the most important for identifying speech acts. Retweet counts do not differ significantly across speech acts, whereas like counts and sentiment features do.

Key words: Speech act, Topic analysis, Public emergencies, XGBoost, Sentiment analysis
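
The abstract names the Kruskal-Wallis test as the method for comparing likes, retweets, and sentiment across speech acts. The following is a minimal sketch of how the tie-corrected H statistic can be computed, not the authors' implementation. The group labels and like counts are hypothetical placeholders, and the function names (`average_ranks`, `kruskal_wallis_h`, `chi2_sf_df2`) are introduced here for illustration only. In practice a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
from collections import defaultdict
from math import exp


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict of label -> observations."""
    labels, pooled = [], []
    for label, obs in groups.items():
        labels.extend([label] * len(obs))
        pooled.extend(obs)
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum the pooled ranks within each group.
    rank_sums = defaultdict(float)
    for label, rank in zip(labels, ranks):
        rank_sums[label] += rank

    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[label] ** 2 / len(obs) for label, obs in groups.items()
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied blocks.
    tie_counts = defaultdict(int)
    for value in pooled:
        tie_counts[value] += 1
    correction = 1 - sum(t**3 - t for t in tie_counts.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def chi2_sf_df2(h):
    """Asymptotic p-value for exactly three groups (chi-square, 2 d.o.f.)."""
    return exp(-h / 2)


# Hypothetical like counts per speech act class (illustrative data only).
likes = {
    "assertive": [3, 0, 12, 5, 1],
    "directive": [8, 15, 2, 30, 9],
    "expressive": [40, 22, 7, 18, 55],
}
h_stat = kruskal_wallis_h(likes)
print(f"H = {h_stat:.3f}, approximate p = {chi2_sf_df2(h_stat):.4f}")
```

The test works on ranks rather than raw values, which suits heavily skewed counts such as likes and retweets. The closed-form p-value above holds only for three groups; other group counts need a general chi-square survival function.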

CLC Number: