The Application of NLTK Library for Python Natural Language Processing in Corpus Research

Authors

  • Meng Wang, Jining Medical University
  • Fanghui Hu, Jining Medical University

DOI:

https://doi.org/10.17507/tpls.1109.09

Keywords:

corpus, Python, natural language processing, NLTK

Abstract

Corpora play an important role in linguistic research and foreign language teaching. At present, corpus research in China relies mainly on retrieval tools such as WordSmith and AntConc. The NLTK library for Python offers more flexible and richer research methods, and because it works with a unified data format, it spares researchers repeated conversions between data types. In addition, with the help of Python's many third-party libraries, it can compensate for the weaknesses of other tools in syntactic analysis, graphical rendering, regular-expression retrieval and other areas. Covering the main stages of corpus research, namely text cleaning, lemmatization, part-of-speech tagging, and text retrieval and frequency statistics, this paper uses the corpus of US presidential inaugural addresses as an example to show how the tool processes language data, and thereby introduces the application of the Python NLTK library in corpus research.
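The processing stages named in the abstract can be sketched with NLTK roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the sample text and the stopword list are hypothetical, Porter stemming (`PorterStemmer`) stands in for WordNet lemmatization, and a crude suffix-based `RegexpTagger` stands in for `nltk.pos_tag`. These substitutions keep the sketch runnable without extra downloads, since the WordNet lemmatizer, the statistical tagger and the inaugural corpus itself each require a separate `nltk.download(...)` call.

```python
import re

from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tag import RegexpTagger
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text (not a quotation). With the corpus data installed,
# a real address could be loaded instead:
#   nltk.download("inaugural")
#   text = nltk.corpus.inaugural.raw("1789-Washington.txt")
SAMPLE = """Fellow citizens: our nation grows, our citizens hope,
and our hopes for the nation are renewed today.<br> 2021"""

# Hypothetical stopword list; NLTK's full list is nltk.corpus.stopwords,
# which needs nltk.download("stopwords").
STOPWORDS = {"and", "are", "for", "our", "the", "today"}

# Crude suffix-based tagger standing in for nltk.pos_tag, which needs a
# downloaded model. Patterns are tried in order; the last one is a fallback.
TAGGER = RegexpTagger([
    (r".*ing$", "VBG"),  # gerunds / present participles
    (r".*ed$", "VBD"),   # past-tense forms
    (r".*s$", "NNS"),    # plural nouns (also mis-tags verbs such as "grows")
    (r".*", "NN"),       # default: singular noun
])


def clean(text: str) -> str:
    """Text cleaning: drop markup remnants and digits, then lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)  # e.g. a stray <br> tag
    text = re.sub(r"\d+", " ", text)      # numbers such as years
    return text.lower()


def tokenize(text: str) -> list[str]:
    """Keep alphabetic tokens only and remove stopwords."""
    tokens = RegexpTokenizer(r"[a-z]+").tokenize(text)
    return [t for t in tokens if t not in STOPWORDS]


def normalise(tokens: list[str]) -> list[str]:
    """Reduce inflected forms with the Porter stemmer.

    Stemming is a data-free stand-in for lemmatization; after
    nltk.download("wordnet") one could use WordNetLemmatizer().lemmatize.
    """
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]


def regex_search(tokens: list[str], pattern: str) -> list[str]:
    """Regular-expression retrieval: tokens that fully match `pattern`."""
    rx = re.compile(pattern)
    return [t for t in tokens if rx.fullmatch(t)]


tokens = tokenize(clean(SAMPLE))
tagged = TAGGER.tag(tokens)   # list of (token, tag) pairs
stems = normalise(tokens)
freq = FreqDist(stems)        # frequency statistics over normalised forms

print(tagged[:3])
print(freq.most_common(3))
print(regex_search(tokens, r"citizen\w*"))
```

Replacing the stand-ins with the downloaded components would change only the normalisation and tagging steps; the cleaning, the regular-expression search and the `FreqDist` counting would stay the same.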

Author Biographies

Meng Wang, Jining Medical University

School of Medical Information Engineering

Fanghui Hu, Jining Medical University

School of Foreign Languages

Published

2021-09-01

Section

Articles