Journal of Information Resources Management, 2021, Vol. 11, Issue 6: 105-115. doi: 10.13365/j.jirm.2021.06.105

• Research Paper

Recognizing Clinical Named Entity from Chinese Electronic Medical Record Texts Based on Semi-Supervised Deep Learning

Jing Shenqi1,2,3  Zhao Youlin1

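  1. School of Information Management, Nanjing University, Nanjing, 210023;
    2. School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166;
    3. Center for Data Management, The First Affiliated Hospital of Nanjing Medical University (Jiangsu Province Hospital), Nanjing, 210096
  • Online: 2021-11-26  Published: 2022-01-18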
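  • About the authors: Jing Shenqi (corresponding author), Ph.D. candidate, Email: jingshenqi@jsph.org.cn; research interests: information resources management, knowledge organization, knowledge services, and data mining. Zhao Youlin, associate professor; research interest: knowledge organization.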
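  • Funding:
    National Key Research and Development Program of China, key special project "Research on the Prevention and Control of Major Chronic Non-communicable Diseases", project "Establishment and Demonstration Application of a Diabetes Information Management Platform and Dissemination System" (2018YFC1314900); Jiangsu Provincial Key Research and Development Program project "Construction and Demonstration of a Comprehensive Prevention and Control System for Major Chronic Diseases" (BE2020721).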

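Abstract: Electronic medical record (EMR) documents record the entire course of a patient's diagnosis and treatment in detail, and they hold the richest medical knowledge of any part of the EMR; mining their latent knowledge structure is therefore highly valuable. The first task in mining knowledge from unstructured EMRs is named entity recognition (NER). Existing medical NER methods suffer from annotated data that is both low in quality and insufficient in quantity. They also consider only the sequential nature of text and ignore the dependencies between words and between characters, which limits recognition performance. This paper proposes a medical NER method based on semi-supervised deep learning: it combines a semi-automatic entity annotation method built on an expert-authoritative Chinese encyclopedia with a BERT-GCN-CRF framework to recognize and extract medical named entities from EMR text. In experiments on real EMR text, the model clearly improves precision (P), recall (R), and F1, reaching overall averages of 84.6%, 84.0%, and 84.2%, respectively, while substantially reducing the manual annotation workload. The proposed method is of significant value for mining unstructured EMR text.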

Key words: Medical named entity recognition, Electronic medical record documents, Knowledge mining, Semi-supervised deep learning, BERT-GCN-CRF
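
The abstract names two data-side ingredients, encyclopedia-based semi-automatic annotation and P/R/F1 evaluation, without specifying how either is carried out. The following is only a sketch under stated assumptions, not the authors' implementation: it assumes character-level BIO tags, uses forward maximum matching against a tiny made-up lexicon (`LEXICON`) as a stand-in for encyclopedia entries, and scores entities by exact span-and-type match with micro-averaging. All names (`pre_annotate`, `extract_spans`, `prf1`) are hypothetical, and the BERT-GCN-CRF model itself is not shown.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical lexicon (entity string -> entity type). In the paper's setting the
# entries would come from an authoritative Chinese encyclopedia; these are made up.
LEXICON: Dict[str, str] = {
    "糖尿病": "DISEASE",
    "2型糖尿病": "DISEASE",
    "口渴": "SYMPTOM",
    "二甲双胍": "DRUG",
}


def pre_annotate(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Draft character-level BIO tags by forward maximum matching.

    At each position the longest lexicon entry starting there is tagged;
    unmatched characters get "O". The result is meant for human review.
    """
    max_len = max((len(k) for k in lexicon), default=0)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate substring first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                match = text[i:i + length]
                break
        if match is None:
            i += 1
            continue
        label = lexicon[match]
        tags[i] = "B-" + label
        for j in range(i + 1, i + len(match)):
            tags[j] = "I-" + label
        i += len(match)
    return tags


def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Turn a BIO sequence into (start, end_exclusive, type) entity spans."""
    spans: Set[Tuple[int, int, str]] = set()
    start, etype = 0, None  # type: int, Optional[str]
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # the current entity continues
        if etype is not None:
            spans.add((start, i, etype))
            etype = None
        if tag.startswith(("B-", "I-")):
            # A stray "I-" with no matching "B-" also opens a new entity.
            start, etype = i, tag[2:]
    return spans


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span and type)."""
    tp = n_pred = n_gold = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = extract_spans(g), extract_spans(p)
        tp += len(g_spans & p_spans)
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = "患者2型糖尿病，口渴，予二甲双胍"
    gold = pre_annotate(sentence, LEXICON)  # assume the reviewer accepted the draft
    # A weaker hypothetical tagger that only knows the shorter disease name.
    pred = pre_annotate(sentence, {"糖尿病": "DISEASE", "口渴": "SYMPTOM", "二甲双胍": "DRUG"})
    print(sorted(extract_spans(gold)))
    print([round(x, 3) for x in prf1([gold], [pred])])  # [0.667, 0.667, 0.667]
```

As a consistency check on the reported figures: F1 is the harmonic mean 2PR/(P+R) of a single precision/recall pair, and applying it to the averaged values 84.6% and 84.0% gives about 84.3%, slightly above the reported 84.2%. That gap is what one would expect if the three averages were computed separately (for example, across entity types), because averaging F1 scores never yields more than the F1 of the averaged P and R.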
