Journal of Information Resources Management (信息资源管理学报) ›› 2021, Vol. 11 ›› Issue (3): 4-17. doi: 10.13365/j.jirm.2021.03.004

• Feature Article •


黄水清1,2 王东波1,2   

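  1. College of Information Management, Nanjing Agricultural University, Nanjing, 210095;
  • Publication date: 2021-05-26 Release date: 2021-06-22
  • About the authors: Huang Shuiqing (corresponding author), professor and doctoral supervisor, whose research interests are text information processing and retrieval, digital libraries, and informetrics; Wang Dongbo, professor and doctoral supervisor, whose research interests are natural language processing and text mining, and informetrics.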

Review of Corpus Research in China

Huang Shuiqing1,2 Wang Dongbo1,2   

  1. College of Information Management, Nanjing Agricultural University, Nanjing, 210095;
  2. Research Center for Humanities and Social Computing, Nanjing Agricultural University, Nanjing, 210095
  • Online: 2021-05-26 Published: 2021-06-22

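Abstract: With the continued development of big data and artificial intelligence technologies, corpus research has attracted growing attention. From its beginnings as collections of language material for linguistic research to today's deeply annotated knowledge resources that support knowledge mining and discovery, corpora and related research have been thoroughly explored in both depth and breadth. Taking Chinese journal literature as its object, this paper first analyzes, from a quantitative perspective, the publication trends, author collaboration patterns, and research hotspots of each period in corpus research in China. It then reviews and discusses, from a qualitative perspective, the methods, workflows, and strategies for constructing and annotating corpora in China, and describes the current state of corpus applications in fields such as language teaching and information retrieval. Finally, it surveys the representative corpora of various types in China and summarizes the characteristics of their construction and development.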

Keywords: corpus, corpus construction, parallel corpus, corpus annotation, knowledge mining

Abstract: With the in-depth development of big data and artificial intelligence technology, corpus research has received increasing attention. From collections of language materials for linguistic research to deeply annotated knowledge resources supporting knowledge mining and discovery, corpora and related research have been fully explored in both depth and breadth. This article takes domestic journal literature as its object. First, we analyze the publishing trends of domestic corpus research, author collaboration, and the research hotspots of different eras from a quantitative perspective. Second, we discuss the methods, processes, and strategies for constructing and annotating domestic corpora from a qualitative perspective, and explain the application status of corpora in fields such as language teaching and information retrieval. Finally, we comprehensively review various representative corpora in China and summarize the characteristics of their construction and development.

Key words: Corpus, Corpus construction, Corpus application, Representative corpus, Knowledge mining
