Evaluating Large Language Models and Neural Machine Translation Systems in Translating Liaozhaizhiyi: A Cross-Cultural Literary Study

Authors

  • Jing Zhao, Universiti Sains Malaysia
  • Mozhgan Ghassemiazghandi, Universiti Sains Malaysia
  • Shaidatul Akma Adi Kasuma, Universiti Sains Malaysia

DOI:

https://doi.org/10.17507/tpls.1605.34

Keywords:

ChatGPT, neural machine translation, Google Translate, Youdao Translate, Liaozhaizhiyi, Chinese classical literature, cross-cultural communication

Abstract

Neural machine translation has demonstrated strong performance in high-resource languages and commercial translation contexts. However, its effectiveness in translating classical Chinese literature remains insufficiently examined. This study conducts a comparative evaluation of three translation systems—ChatGPT, Google Translate, and Youdao Translate—using selected texts from Liaozhaizhiyi as the research corpus. The analysis focuses on four dimensions: semantic alignment, measured by BLEU scores; translation fluency; stylistic fidelity; and the detectability of machine-generated translation patterns. The results indicate that ChatGPT achieves superior performance in stylistic fidelity, particularly in preserving poetic tone, as well as in semantic alignment when translating idiomatic expressions and culturally embedded references. In addition, translations produced by ChatGPT exhibit fewer mechanical artefacts commonly associated with neural machine translation outputs. Further experiments demonstrate that structured prompt engineering strategies contribute to improved literary naturalness and greater cultural coherence in the translated texts. These findings suggest that large language models offer notable advantages in the translation of classical literary works and provide empirical insights into the role of artificial intelligence in facilitating cross-cultural interpretation and the international transmission of Chinese literary traditions.
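The semantic-alignment dimension in the abstract is measured with BLEU. As a rough illustration of what that metric computes, the sketch below is a minimal pure-Python sentence-level BLEU (uniform 1-4-gram precisions, a brevity penalty, and crude smoothing). It is an assumption-laden illustration, not the study's scoring code, and the example sentences are invented; real evaluations should use a standard tool such as sacreBLEU, whose tokenization and smoothing choices affect reported scores.

```python
# Minimal sentence-level BLEU sketch: geometric mean of 1..4-gram
# precisions, multiplied by a brevity penalty for short candidates.
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Whitespace-tokenized sentence BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_grams = ngrams(cand, n)
        ref_grams = ngrams(ref, n)
        # Clipped n-gram matches: each candidate n-gram counts at most
        # as often as it appears in the reference.
        overlap = sum(min(c, ref_grams[g]) for g, c in cand_grams.items())
        total = max(sum(cand_grams.values()), 1)
        # Crude smoothing so a single zero precision does not zero the score.
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty discourages trivially short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, while a candidate sharing no n-grams with the reference scores near zero; partially overlapping sentences fall in between.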

Author Biographies

Jing Zhao, Universiti Sains Malaysia

School of Languages, Literacies and Translation

Mozhgan Ghassemiazghandi, Universiti Sains Malaysia

School of Languages, Literacies and Translation

Shaidatul Akma Adi Kasuma, Universiti Sains Malaysia

School of Languages, Literacies and Translation

References

Aharoni, R., Koppel, M., & Goldberg, Y. (2014). Automatic Detection of Machine-translated Text and Translation Quality Estimation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 289–295). https://doi.org/10.3115/v1/P14-2048

Ahrenberg, L. (2017). Comparing Machine Translation and Human Translation: A Case Study. In Proceedings of the Workshop on Human-Informed Translation and Interpreting Technology (pp. 21–28). Association for Computational Linguistics: Copenhagen, Denmark. Retrieved December 29, 2025, from https://aclanthology.org/W17-7903/

Akabli, J., & Khaloufi, R. (2024). Translating identity in Leila Abouzeid’s Return to Childhood. AWEJ for Translation & Literary Studies, 8(2), 2–17. https://doi.org/10.24093/awejtls/vol8no2.1

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint, arXiv:1409.0473. https://arxiv.org/abs/1409.0473

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165

Castilho, S., Moorkens, J., Gaspari, F., Calixto, I., Tinsley, J., & Way, A. (2017). Is neural machine translation the new state of the art? Prague Bulletin of Mathematical Linguistics, 108, 109–120. https://doi.org/10.1515/pralin-2017-0013

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724–1734). https://doi.org/10.3115/v1/D14-1179

Christiano, P. F., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30, 4299–4307. https://arxiv.org/abs/1706.03741

Dankers, V., Lucas, C. G., & Titov, I. (2022). Can transformers be too compositional? Analysing idiom processing in neural machine translation. arXiv preprint, arXiv:2205.15301. https://doi.org/10.48550/arXiv.2205.15301

Deng, X., & Yu, Z. (2022). A systematic review of machine-translation-assisted language learning for sustainable education. Sustainability, 14(13), 7598. https://doi.org/10.3390/su14137598

Drobot, I.-A. (2021). Translating literature using machine translation: Is it really possible? Scientific Bulletin of the Politehnica University of Timișoara: Transactions on Modern Languages, 20(1), 57–64. https://doi.org/10.59168/FUAP6124

España-Bonet, C., Costa-Jussà, M. R., Rapp, R., Lambert, P., Eberle, K., Banchs, R. E., & Babych, B. (2016). Hybrid machine translation overview. In Hybrid Approaches to Machine Translation (pp. 1–24). Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8

Gao, R., Lin, Y., Zhao, N., & Cai, Z. G. (2024). Machine translation of Chinese classical poetry: A comparison among ChatGPT, Google Translate, and DeepL Translator. Humanities and Social Sciences Communications, 11(1), Article 835. https://doi.org/10.1057/s41599-024-03363-0

Gozzi, M., & Di Maio, F. (2024). Comparative analysis of prompt strategies for large language models: Single-task vs. multitask prompts. Electronics, 13(23), 4712. https://doi.org/10.3390/electronics13234712

Guerberof-Arenas, A., & Toral, A. (2022). Creativity in translation: Machine translation as a constraint for literary texts. Translation Spaces, 11, 184–212. https://doi.org/10.1075/ts.21025.gue

Hadley, J., Popović, M., Afli, H., & Way, A. (2019). Proceedings of the Qualities of Literary Machine Translation. European Association for Machine Translation, Dublin, Ireland. Retrieved December 7, 2025, from https://aclanthology.org/W19-7300/

Jiao, W., Wang, W., Huang, J., Wang, X., Shi, S., & Tu, Z. (2023). Is ChatGPT a good translator? Yes with GPT-4 as the engine: A preliminary study. arXiv preprint, arXiv:2301.08745. https://doi.org/10.48550/arXiv.2301.08745

Jing, Y., Yang, Y., Feng, Z., Ye, J., Yu, Y., & Song, M. (2019). Neural style transfer: A review. IEEE Transactions on Visualization and Computer Graphics, 26(11), 3365–3385. https://doi.org/10.1109/TVCG.2019.2921336

Karpinska, M., & Iyyer, M. (2023). Large language models effectively leverage document-level context for machine translation. In Proceedings of the Eighth Conference on Machine Translation (WMT 2023), Volume 1: Research Papers (pp. 478–489). Retrieved December 7, 2025, from https://aclanthology.org/2023.wmt-1.41/

Koehn, P. (2009). Statistical machine translation. MIT Press.

Kulkarni, A., Shivananda, A., Kulkarni, A., & Gudivada, D. (2023). The ChatGPT architecture: An in-depth exploration of OpenAI’s conversational language model. In Applied generative AI for beginners: Practical knowledge on diffusion models, ChatGPT, and other LLMs (pp. 55–77). Apress. https://doi.org/10.1007/978-1-4842-9994-4_4

Lau, J., Wang, Y., & Tang, G. (2024). Improving BERTScore for machine translation evaluation through contrastive learning. IEEE Access, 12, 77739–77749. https://doi.org/10.1109/ACCESS.2024.3406993

Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001

Naveen, P., & Trojovský, P. (2024). Overview and challenges of machine translation for contextually appropriate translations. iScience, 27(10), 110878. https://doi.org/10.1016/j.isci.2024.110878

Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318). https://doi.org/10.3115/1073083.1073135

Post, M. (2018). A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation (WMT) Volume 1: Research Papers (pp. 186–191). https://doi.org/10.48550/arXiv.1804.08771

Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., & Yang, D. (2023). Is ChatGPT a general-purpose natural language processing task solver? arXiv preprint, arXiv:2302.06476. https://doi.org/10.48550/arXiv.2302.06476

Ribeiro, M. T., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 4902–4912). Retrieved December 7, 2025, from https://aclanthology.org/2020.acl-main.442/

Si, S., Zhou, S., & Zhang, Y. (2024). Exploring the capabilities of ChatGPT in ancient Chinese translation and person name recognition. Corpus-Based Studies across Humanities, 2, 221–234. https://doi.org/10.1515/csh-2024-0017

Toral, A., & Way, A. (2018). What level of quality can neural machine translation attain on literary text? In J. Moorkens, S. Castilho, F. Gaspari, & S. Doherty (Eds.), Translation quality assessment: From principles to practice (pp. 263–287). Springer. https://doi.org/10.1007/978-3-319-91241-7_12

Weaver, W. (1955). Translation. In W. N. Locke & A. D. Booth (Eds.), Machine translation of languages: Fourteen essays (pp. 15–23). MIT Press.

Wang, Q. (2025). Evaluating Uighur literary translation: A comparative study of ChatGPT, Google Translate, and Bing Translator. PLoS ONE, 20, e0335261. https://doi.org/10.1371/journal.pone.0335261

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://doi.org/10.48550/arXiv.2201.11903

Zhang, B., Haddow, B., & Birch, A. (2023). Prompting large language model for machine translation: A case study. arXiv preprint, arXiv:2301.07069. https://doi.org/10.48550/arXiv.2301.07069

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint, arXiv:1904.09675. https://doi.org/10.48550/arXiv.1904.09675

Zhou, P., & Cheng, J. (2025). Stylistic variation across English translations of Chinese science fiction: Ken Liu versus ChatGPT. Frontiers in Artificial Intelligence, 8, Article 1576750. https://doi.org/10.3389/frai.2025.1576750

Published

2026-05-01

Section

Articles