An Evaluation of ChatGPT's Translation Accuracy Using BLEU Score

Mozhgan Ghassemiazghandi

doi:10.17507/tpls.1404.07

Authors

Mozhgan Ghassemiazghandi, Universiti Sains Malaysia

DOI:

https://doi.org/10.17507/tpls.1404.07

Keywords:

BLEU score evaluation, ChatGPT-4 translation, large language models, machine translation accuracy, translation quality assessment

Abstract

It has long been held that machine translation cannot match the quality and accuracy of human translators, especially for complex language pairs such as Persian and English. This study challenges that view by showing that ChatGPT-4, drawing on vast amounts of multilingual data and advanced large language model algorithms, outperforms a widely used open-source machine translation tool and approaches human translation quality. The research critically assesses the Persian-to-English translation accuracy of ChatGPT-4 against MateCat, a traditional open-source machine translation tool, to highlight advances in artificial intelligence-driven translation technology. Bilingual Evaluation Understudy (BLEU) scores provide a quantitative basis for comparing the accuracy and quality of the two systems' outputs. ChatGPT-4 achieves a BLEU score of 0.88 and an accuracy of 0.68, outperforming MateCat, which achieves a BLEU score of 0.82 and an accuracy of 0.49. The results indicate that the translations generated by ChatGPT-4 surpass those produced by MateCat and nearly match the quality of human translations, demonstrating the effectiveness of OpenAI's large language model algorithms in improving translation accuracy.
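The abstract reports BLEU as a fraction (0.88 and 0.82) but does not say which implementation, tokenization, or smoothing produced those figures. As a minimal sketch, assuming the standard corpus-level formulation of Papineni et al. (2002) (clipped n-gram precisions for n = 1 to 4, uniform weights, a brevity penalty, no smoothing) and pre-tokenized input, the metric can be computed as follows. The helper names `ngrams` and `corpus_bleu` are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Unsmoothed corpus-level BLEU with uniform weights over 1..max_n-grams.

    Assumption: this follows the textbook formulation, not the study's tool.
    candidates: list of token lists (system translations)
    references: list of lists of token lists (one or more references per candidate)
    Returns a score between 0 and 1.
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams produced by the system, per order
    cand_len = ref_len = 0

    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, preferring the shorter one on ties.
        ref_len += len(min(refs, key=lambda r: (abs(len(r) - len(cand)), len(r))))

        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # For each n-gram, the largest count seen in any single reference.
            max_ref_counts = Counter()
            for ref in refs:
                max_ref_counts |= ngrams(ref, n)
            # Clip each candidate count so repeated words are not over-rewarded.
            clipped[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())

    # Without smoothing, any order with zero matches drives the geometric mean to 0.
    if cand_len == 0 or min(clipped) == 0:
        return 0.0

    # Geometric mean of the modified precisions, computed in log space.
    log_mean = sum(math.log(clipped[i] / totals[i]) for i in range(max_n)) / max_n
    # Brevity penalty: penalize system output that is shorter than the references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    ref = "the cat is on the mat".split()
    print(corpus_bleu([ref], [[ref]]))                 # 1.0
    print(round(corpus_bleu([ref[:-1]], [[ref]]), 3))  # 0.819
```

Dropping the final word keeps every n-gram precision at 1 but triggers the brevity penalty, exp(1 − 6/5) ≈ 0.819. Sentence-level averaging, smoothing, different tokenization, or toolkits that report on a 0 to 100 scale would all give different numbers, so this sketch cannot reproduce the study's scores without knowing its exact settings.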

Author Biography

Mozhgan Ghassemiazghandi, Universiti Sains Malaysia

School of Languages, Literacies and Translation
