Journal of Information Resources Management ›› 2024, Vol. 14 ›› Issue (5): 75-90. doi: 10.13365/j.jirm.2024.05.075

• Research Article •

Induced Consent Analysis of Privacy Policy Based on Grounded Theory and Machine Learning

Chen Menglei¹, Luo Yingjia², Zhu Hou¹

  1. Information Management College, Sun Yat-sen University, Guangzhou, 510006;
  2. School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 637371
  • Online: 2024-09-26  Published: 2024-10-15
  • About the authors: Chen Menglei, master's candidate, specializing in semantic analysis of information resources; Luo Yingjia, master's candidate, specializing in semantic analysis of information resources; Zhu Hou (corresponding author), Ph.D., associate professor and master's supervisor, specializing in privacy management and computer simulation, Email: zhuhou3@mail.sysu.edu.cn.
  • Supported by:
    This work is an outcome of two projects: "Public Opinion Evolution Mechanism and Risk Control of Smart Media Based on Crowd-Algorithm Interaction" (23YJC630270), a General Project of Humanities and Social Sciences Research funded by the Ministry of Education of China, and "Research on Mechanism of Social Media Privacy Leakage from Multi-sources and Their Interaction Based on Computational Experiments" (71801229), a Youth Project funded by the National Natural Science Foundation of China.

Abstract: Analyzing the tendency of privacy policies to induce consent from the user's perspective, and exploring the mechanisms behind it, helps users identify unfair privacy terms and gives regulators guidance for standardizing how app privacy policies are drafted. This study applies grounded theory to analyze, from the user's perspective, how privacy policies induce consent, and builds a coding system for induced consent. After manually annotating a corpus, we train a K-BERT model with semi-supervised learning to automatically identify statements in privacy policies that tend to induce consent. Network analysis and sequential pattern mining are then used to explore the characteristics and underlying regularities of consent induction. The proposed model achieves the goal of automatically identifying consent-inducing statements. Empirical analysis shows that user opportunity cost, privacy management cost, and fuzzy concepts lie at the core of the relationship network of inducing dimensions. Fuzzy concepts and responsibility-shifting statements play an important role in the patterned, inducing wording of privacy policies: they typically appear densely, one after another, following other unfair statements. The induced-consent features of privacy policies for apps in the children's domain differ significantly from those in other domains, and privacy policies in some domains share common features, possibly because of similarities in how their services are delivered and in their business logic.

Key words: Privacy policy, Induced consent, Grounded theory, K-BERT, Network analysis, Sequential pattern mining
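
The abstract names network analysis and sequential pattern mining over labeled policy sentences but gives no implementation details. The following is only a minimal Python sketch of what such an analysis could look like, not the authors' method: the toy sequences, the label names (borrowed loosely from the dimensions mentioned in the abstract), and both helper functions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each inner list is the ordered sequence of
# induced-consent labels assigned to the flagged sentences of one policy.
policies = [
    ["opportunity_cost", "fuzzy_concept", "responsibility_shifting"],
    ["privacy_management_cost", "fuzzy_concept", "responsibility_shifting"],
    ["opportunity_cost", "privacy_management_cost", "fuzzy_concept"],
]


def cooccurrence_degree(sequences):
    """Weighted degree of each label in a label co-occurrence network.

    Two labels gain one unit of edge weight for every policy in which
    both appear at least once.
    """
    degree = Counter()
    for seq in sequences:
        for a, b in combinations(sorted(set(seq)), 2):
            degree[a] += 1
            degree[b] += 1
    return degree


def ordered_pair_support(sequences):
    """Fraction of policies in which label `a` occurs somewhere before label `b`."""
    counts = Counter()
    for seq in sequences:
        seen_pairs = set()  # count each ordered pair once per policy
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                if a != b:
                    seen_pairs.add((a, b))
        counts.update(seen_pairs)
    return {pair: n / len(sequences) for pair, n in counts.items()}


degree = cooccurrence_degree(policies)
support = ordered_pair_support(policies)

# Labels with the highest weighted degree sit closest to the network core.
print(degree.most_common())
# Ordered pairs with high support hint at recurring wording patterns.
print(sorted(support.items(), key=lambda kv: -kv[1]))
```

In this toy setup, a label's weighted degree stands in for how central it is to the dimension network, and the support of an ordered pair stands in for how often one kind of statement follows another. A real analysis would use the annotated corpus and a full sequential pattern mining algorithm rather than length-two patterns.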
