A Corpus-Based Approach in Vocabulary Research: Defining the Word of the Year 2023 in Kazakh
DOI:
https://doi.org/10.17507/tpls.1503.02
Keywords:
The Word of the Year (WOTY), Kazakh, corpus, vocabulary, frequency indicator
Abstract
The Word of the Year (WOTY) is an event held in various countries and regions to determine the most relevant, significant, and popular words and expressions, which reflect not only the linguistic but also the socio-cultural aspects of a country. This paper aims to identify the words and phrases most frequently used in Kazakh in 2023 as nominees for the WOTY title. The research methods include media discourse analysis and quantitative analysis using a corpus-based approach. A computer program, #LancsBox 6.0, was used to generate a dataset: a research corpus consisting of 500 texts published on Kazakh news platforms throughout 2023. The results indicated that: 1) the conjunction “jáne” [and] had the highest frequency and occurrence in the research corpus; 2) extracted words with high frequency indicators, such as “Kazakhstan”, “jana” [new], “jyly” [year], “kerek” [need], and “jumys” [work], might serve as candidates for WOTY 2023; 3) “artificial intelligence”, named WOTY by other global sources, showed a high frequency indicator in Kazakh media texts. The study contributes a corpus of Kazakh media texts for 2023. Its significance lies in offering a pioneering linguistic assessment of the Kazakh language, based on an analysis of media discourse publications that draws on corpus outcomes.
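As an illustration of what a frequency indicator of this kind involves, the following minimal Python sketch counts word frequencies over a list of texts. It is not the #LancsBox 6.0 procedure used in the study; the function names, the regex tokenizer, the optional stopword filter, and the per-10,000-token normalization are all assumptions made for illustration. The sample texts are invented toy strings built from words quoted in the abstract, not corpus data.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a word is any run of letters, digits, or underscores.
TOKEN_RE = re.compile(r"\w+")


def normalize(text):
    """NFC-normalize and lowercase so that accented letters compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def tokenize(text):
    """Split a text into lowercase word tokens."""
    return TOKEN_RE.findall(normalize(text))


def frequency_table(texts, stopwords=frozenset(), per=10_000):
    """Return rows of (word, absolute frequency, relative frequency, range).

    Relative frequency is occurrences per `per` tokens of the whole corpus;
    range is the number of texts in which the word occurs at least once.
    Rows are ordered from most to least frequent.
    """
    stop = {normalize(word) for word in stopwords}
    counts = Counter()       # occurrences of each word in the corpus
    text_counts = Counter()  # number of texts containing each word
    total = 0                # corpus size in tokens, stopwords included
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        kept = [token for token in tokens if token not in stop]
        counts.update(kept)
        text_counts.update(set(kept))
    if total == 0:
        return []
    return [
        (word, n, n * per / total, text_counts[word])
        for word, n in counts.most_common()
    ]


# Hypothetical toy input, not taken from the research corpus.
toy_texts = ["jáne jumys jáne jana", "jana jyly jáne"]
for row in frequency_table(toy_texts, stopwords={"jáne"}):
    print(row)
```

Reporting range alongside frequency helps separate words that are widespread across texts from words concentrated in a few long ones. The stopword filter reflects the abstract's observation that a conjunction tops the raw list while the WOTY candidates are drawn from other high-frequency words.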