Clarifying Learner Englishes From Greater China Using Native Language Identification — A Pilot Study

Authors

  • Xiaoyun Li, University of Szeged

DOI:

https://doi.org/10.17507/tpls.1206.04

Keywords:

learner English, Greater China, native language identification, spoken monologue, written essays

Abstract

The purpose of this paper is to identify the characteristics of learner Englishes from the three major regions of Greater China, namely Mainland China, Hong Kong, and Taiwan. To this end, the three learner Englishes are compared using Native Language Identification (NLI). The average identification accuracy achieved is 60% on spoken monologues and 59.8% on written essays. Given these satisfactory accuracies, the paper profiles the three learner Englishes by examining their best-identifying indicators. The results show that learner English from Mainland China is characterized by a high degree of collectivistic involvement and uncertainty, low informativeness, and an underuse of conjunctions; learner English from Hong Kong is highly informative and impersonal; and the spoken and written learner English from Taiwan are similar in sharing an individualistically involved style, but differ in that the English essays by L2 learners from Taiwan score high on uncertainty and negation and low on informativeness and conjunction use.
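To make the two steps summarized above concrete (classifying texts by region, then reading off the features that best identify each region), here is a minimal Python sketch under stated assumptions. The abstract does not name the classifier or the exact features, so the sketch swaps in multinomial Naive Bayes over a handful of hypothetical indicator categories. The word lists, the region labels `CHN`, `HKG`, and `TWN`, the helper names (`profile`, `NaiveBayesNLI`, `indicators`, `accuracy`), and the toy sentences are all illustrative and are not taken from the study.

```python
import math
import re
from collections import Counter

# Hypothetical lexicons standing in for indicator categories named in the
# abstract; these word lists are illustrative, not the study's features.
CATEGORIES = {
    "we_involvement": {"we", "us", "our", "ours"},
    "i_involvement": {"i", "me", "my", "mine"},
    "uncertainty": {"maybe", "perhaps", "might", "possibly", "probably"},
    "negation": {"not", "no", "never"},
    "conjunction": {"and", "but", "because", "although", "however"},
}


def profile(text):
    """Count the tokens of `text` that fall into each indicator category."""
    counts = Counter()
    for tok in re.findall(r"[a-z']+", text.lower()):
        for cat, words in CATEGORIES.items():
            if tok in words:
                counts[cat] += 1
    return counts


class NaiveBayesNLI:
    """Multinomial Naive Bayes over category counts (a swapped-in classifier)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.labels}
        totals = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            totals[label].update(profile(text))
        self.log_lik = {}
        for c in self.labels:
            # Add-one (Laplace) smoothing keeps unseen categories above zero.
            denom = sum(totals[c].values()) + len(CATEGORIES)
            self.log_lik[c] = {
                cat: math.log((totals[c][cat] + 1) / denom) for cat in CATEGORIES
            }
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior score for `text`."""
        counts = profile(text)
        scores = {
            c: self.log_prior[c]
            + sum(k * self.log_lik[c][cat] for cat, k in counts.items())
            for c in self.labels
        }
        return max(scores, key=scores.get)

    def indicators(self, label, top=2):
        """Rank categories by how much likelier they are under `label`
        than, on average, under the other labels."""
        others = [c for c in self.labels if c != label]

        def margin(cat):
            rest = sum(self.log_lik[c][cat] for c in others) / len(others)
            return self.log_lik[label][cat] - rest

        return sorted(CATEGORIES, key=margin, reverse=True)[:top]


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Invented toy sentences, NOT data from the study.
train = [
    ("We think our class should work together, maybe we can.", "CHN"),
    ("Perhaps we might agree with our teachers.", "CHN"),
    ("The policy reduces cost and improves access, but it has limits.", "HKG"),
    ("Data show growth because demand rose, although prices fell.", "HKG"),
    ("I believe my opinion matters, I do not agree.", "TWN"),
    ("In my view I never liked it, no.", "TWN"),
]
texts = [t for t, _ in train]
labels = [y for _, y in train]

model = NaiveBayesNLI().fit(texts, labels)
print(accuracy(model, texts, labels))  # → 1.0 (training accuracy only)
print(model.indicators("CHN"))  # → ['we_involvement', 'uncertainty']
```

The accuracy printed here is training accuracy on six invented sentences and only demonstrates the mechanics; figures comparable to the 60% and 59.8% reported above would have to be measured on held-out texts, for example via cross-validation. Ranking categories by the gap in log-likelihood between one region and the others is one simple way to surface "best-identifying" features; the weights of a linear classifier could serve the same purpose.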

Author Biography

Xiaoyun Li, University of Szeged

Department of Theoretical Linguistics

Published

2022-06-01

Section

Articles