The Mode of Automatically Crawling Web Data and its Open Source Solutions for Researchers

Journal of Information Resources Management, 2015, Vol. 5, Issue (2): 21-27. doi: 10.13365/j.jirm.2015.02.021

Zhang Tingting, Liu Kai, Wang Weijun

Received: 2014-09-02; Online: 2015-04-26; Published: 2015-04-26

Abstract

In the Big Data era, the quantity and quality of data often determine the quality of research findings and the success of an entire project, and are therefore becoming key factors in scientific competition. However, the automatic crawling of web data by researchers has not yet been studied systematically. To address this gap, this paper analyzes the basic patterns in which web crawling needs arise and identifies four basic web crawling modes for researchers: single-site static crawling, cross-site static crawling, single-site dynamic crawling, and cross-site dynamic crawling. It then introduces two kinds of solutions built on open source architectures: existing open source crawlers and crawlers custom-built by researchers. Finally, it discusses the software architecture and basic code of each solution in detail.

Key words: Researcher, Web crawler, Technical solution, Open source software

CLC Number: TP311.5

Zhang Tingting, Liu Kai, Wang Weijun. The Mode of Automatically Crawling Web Data and its Open Source Solutions for Researchers[J]. Journal of Information Resources Management, 2015, 5(2): 21-27.
