A Cross-Scenario Data Set Applying to Thai and Lao Language Processing

Authors

  • Gornkrit Meemongkol, Srinakharinwirot University

DOI:

https://doi.org/10.17507/tpls.1303.13

Keywords:

cross-scenario, data set, Thai and Lao language, language processing

Abstract

The purpose of this study was to create a cross-scenario data set for use in Thai and Lao language processing. Data were collected by questionnaire from 113 participants, each of whom was asked to infer the Thai meaning of ten Lao words. The responses were analyzed using the frameworks of Bloomfield (1933), Benson (1985), Johnson (1987), Sinclair (1991), Baker (1992) and Miller (1998), who hold that a schema is the concept of a word's meaning in a person's mind, derived from that person's individual experience. The results showed that a cross-scenario data set can be created from these Lao-to-Thai inferences. Each scenario consists of numerous lexical features that are consistent with words in both the Thai and Lao languages. This study will be beneficial for language-processing developers as well as linguists in the future.
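As an illustration only, the questionnaire responses described above could be organised into per-word records along the following lines. This is a minimal sketch and not the author's procedure: the function name `build_scenarios`, the record fields, and the placeholder rows (`lao_word_1`, `thai_gloss_a`, and so on) are all hypothetical, and the abstract does not specify how the data set is actually structured.

```python
from collections import Counter, defaultdict


def build_scenarios(responses):
    """Group (participant, lao_word, thai_inference) rows by Lao word
    and count how often each Thai inference was given.

    The row layout and the output fields are assumptions made for
    illustration; they are not taken from the study.
    """
    counts = defaultdict(Counter)
    for _participant, lao_word, thai_inference in responses:
        # Strip surrounding whitespace so identical answers are counted together.
        counts[lao_word][thai_inference.strip()] += 1
    return {
        word: {
            "n_responses": sum(counter.values()),
            # (inference, frequency) pairs, most frequent first.
            "inferences": counter.most_common(),
        }
        for word, counter in counts.items()
    }


# Placeholder rows only; real data would hold the Lao items and Thai answers.
responses = [
    ("p001", "lao_word_1", "thai_gloss_a"),
    ("p002", "lao_word_1", "thai_gloss_a"),
    ("p003", "lao_word_1", "thai_gloss_b"),
    ("p001", "lao_word_2", "thai_gloss_c"),
]
scenarios = build_scenarios(responses)
```

Keeping the full frequency list per word, rather than only the majority answer, preserves the variation between participants that a schema-based analysis would presumably draw on.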

Author Biography

Gornkrit Meemongkol, Srinakharinwirot University

International College for Sustainability Studies

References

Arun, N. (2011). Artificial Intelligence and the Application. Executive Journal, 30(4), 167-171.

Baker, M. (1992). In other words. London: Routledge.

Benson, M. (1985). Collocations and Idioms. In Ilson, R. (Ed.), Dictionaries, lexicography and language learning. Oxford: Pergamon Press.

Bloomfield, L. (1933). Language. University of Chicago Press.

Cambridge. (2021). Online Dictionary. Retrieved 27 March, 2022 from https://dictionary.cambridge.org/

Chamniyom, R. (2003). Kan sueksa kham bubphabod thi klaimachak khamkariya nai phasa Thai. [A Study of Preposition converted from Verbs in Thai]. M.A. Thesis, Department of Thai Language, Silpakorn University, Thailand.

Charoensuk, J. (2006). Kanbaeng Khobkhed arnupark pariched nai phasa Thai doy chai kham rabunai lae khorsonthed choeng wakayasumphan. [Thai elementary discourse unit segmentation by using discourse segmentation cues and syntactic information]. M.A. Thesis, Department of computer engineering, Kasetsart University.

Choomthong, D. (2017). Characteristics of Slang Terms in Online Discussion among Thai Stock Investors. NIDA Journal of Language and Communication, 22(30), 81-105.

Fillmore, C. (1968). The Case for Case. In Universals in Linguistic Theory.

Fillmore, C. (1971). Some Problems for Case Grammar. In O'Brien, RJ, ed.

Fillmore, C. and Atkins, B. T. (1992). Frames, Fields and Contrasts: New Essays in Semantics and Lexical Organization. Lawrence Erlbaum Associates.

Fillmore, C. and Baker, C. (2009). A Frames Approach to Semantic Analysis. In The Oxford Handbook of Linguistic Analysis. Oxford University Press, 313-340.

Gruber, T. R. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2), 199-220.

Gruber, T. R. (1995). Toward principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(4-5), 907-928.

Intasaw, N. (2013). Kanyak arnupark phasa Thai duay kanchai baebchamlong supportwektor machine. [Thai Clause Segmentation Using a Support Vector Machine Model]. M.A. Thesis, Department of Linguistics, Chulalongkorn University, Thailand.

Intratat, C. (1996). Krabuankan thi kumkariya kraipen kham bubphabot. [Grammaticalization of verbs into prepositions in Thai]. Doctoral Dissertation, Department of Linguistics, Chulalongkorn University, Thailand.

Jeon, N. et al. (2019). Kan priabthiab withikan niyam khamsub nai phodchananukrom chabub ratchabundittayasathan porsor 2554 kab watchananukrom phasa Laos sathabun witthayasarhangchad porsor 2555. [The Comparison Between the Lexical Definition Used in the Thai Dictionary of the Royal Institute 2011 and the Lao Dictionary of the National Science Council 2012]. Sripatum Review of Humanities and Social Sciences, 19(2), 7-20.

Johnson, M. (1987). The body in the mind: The bodily basis of meaning, imagination, and reason. University of Chicago Press.

Kanjanawasee, S. (1991). Kan sarup choeng sahed thi maichai kan todlong. [Causal Inferences in Nonexperimental Research]. Educational Research News. Educational Research Department, Office of the National Education Commission, 9-13.

Kawtrakul, A. et al. (2002). A State of the Art of Thai Language Resources and Thai Language Behavior Analysis and Modeling. In Proceedings of the International Conference on Natural Language Processing Post COLING 2002 Workshop, 1-8.

Khruahong, S. et al. (2015). Ontology Design for Thailand Travel Industry. International Journal of Knowledge Engineering, 1(3), 191-196.

Luo, Y. (2020). Morphology in Kra-Dai Languages. Oxford Research Encyclopedia of Linguistics. Oxford University Press. https://doi.org/10.1093/acrefore/9780199384655.013.529

Leenoi, T. (2008). Kansang kruea krai kham Thai khong manotad penthan ruam khong entity lamdap thi nueng duay withikan pae songthang lae kanchai phodchananukrom thi sangdauwithikan taektangkan. [The Construction of Thai Wordnet of 1st Order Entity Common Base Concepts Using a Bi-Directional Translation Method and with Dictionaries of Different Compilational Approaches]. M.A. Thesis, Department of Linguistics, Chulalongkorn University, Thailand.

León, M.A.L. (2015). Corpus design and compilation process for the preparation of a bilingual glossary (English-Spanish) in the logistics and maritime transport field: LogisTRANS. Proceedings of 32nd International Conference of the Spanish Association of Applied Linguistics (AESLA), 293-299.

Miller, G. (1998). An introduction to Wordnet. University of Princeton.

Odebunmi, A. and Unuabonah, F. (2013). Generic Structure of Nigerian and South Africa Quasi-Judicial Public Hearings. Language, Discourse and Society, RC 25 of the International Sociological Association, 2(2), 82-98.

Nakon, C. (2000). Rabobkham lae wakayasumphan nai phasa Lao krang mooban Wanglao tambon Non-Krot amphoe Muang changwat Nakhon Sawan. [Morphology and syntax of Lao Khrang at Wanglao village, Tambon Non-Krot, Muang district, Nakhon Sawan province]. M.A. Thesis, Department of Oriental Epigraphy, Silpakorn University, Thailand.

National Social Sciences Council. (2012). Watchananukrom phasa Lao. [Lao Dictionary]. Vientiane: National Publication.

Nooteed, J. and Potibal, P. (2019). Phumisart Phasathin khong kham thi mee payanchanaton kuabkram /mr/ lae /pl/ nai phasa Thai thin tai: lumnam thalaysab Songkhla. [Dialect Geography of lexemes with Initial clusters /mr/ and /ml/ in Southern Thai Dialect: Songkhla Lake Basin]. Journal of Humanities and Social Sciences Suratthani Rajabhat University, 11(1), 191-215.

Panmeta, N. (2011). Waiyakorn Thai. [Thai Grammar]. Faculty of Arts, Chulalongkorn University, Thailand.

Phetsiri, C. (2010). Kan sakad kwamroo kiawkab sabphakun tangya khong phued samunpai chak eakkasarn phasa Thai puea sanab sanun kan tob khamtham attanomat. [Knowledge extraction of medicinal properties of Thai herbs from Thai texts for supporting automatic question-answering system]. Research report. Dhurakij Pundit University.

Phosai, K. (2009). Kanwikhro ha kuam mai fang lae kan rianroo kreangchak sumrab rabob tam tob attanomat. [Latent semantic analysis and machine learning for Thai question answering system]. M.A. Thesis, Department of Computer Sciences, Thammasat University, Thailand.

Prapin, W. (1996). Kan suksa priabthiab kham sub phasa Lao song in changwat Nakhon Pathom, Ratchaburi lae Phetchaburi. [Comparative lexical study of Lao-Song in Nakhonpathom, Ratchaburi and Phetchaburi province]. M.A. Thesis, Department of Thai Epigraphy, Silpakorn University, Thailand.

Rattanaprasert, W. (1985). Kham and laksana kham nai phasa Laowiang nai changwat Chachoengsao. [Word classes and word types of Lao-Wiang language in Chachoeng Sao province]. M.A. Thesis, Department of Oriental Epigraphy, Silpakorn University, Thailand.

Rungruang, J. (2012). A Comparative Study of Thai and Chinese Internet Language. Journal of International Studies, 2(2), 33-46.

Saengsupawat, P. et al. (2014). Ontology-based knowledge acquisition for Thai ingredient substitution. ARPN Journal of Engineering and Applied Sciences, 9(9), 1461-1468.

Sila, S. (1975). Priabthiab kham taektang rawang phasa Krungthep kap phasa Kubua. [A comparative study of Krungthep and Kubua language]. B.A. Thesis, Faculty of Archaeology, Silpakorn University, Thailand.

Sinclair, J. (1991). Corpus concordance collocation. Hong Kong: Oxford University Press.

Srinarawat, D. (2007). Thai political slang: formation and attitudes towards usage. International Journal of the Sociology of Language, 186, 95-107. https://doi.org/10.1515/IJSL.2007.044

Suktharachan, M. (2017). Kansrang krobmanothad phasa Thai chak klangkhormun kankaset phuea pen than kwam roo sumrub kanpramuanpon duay computer. [The development of Thai concept frame in agricultural domain for knowledge based computational processing]. Doctoral Dissertation, Department of Linguistics, Kasetsart University, Thailand.

Tantisripreecha, T. and Soonthornphisaj, N. (2010). A Study of Thai Succession Law Ontology on Supreme Court Sentences Retrieval. In Proceedings of the International Multi Conference on Engineers and Computer Scientists, 146-151.

Thipsena, R. et al. (2014). Kanchamnak glum khamtham attanomat bon kradan sonthana dau chai technique mueang khor kwam. [Automatic Question Classification on Webboard Using Text Mining Techniques]. Science and Technology Journal Mahasarakham University, 33(5), 493-502.

Tungkwampian, W. et al. (2015). Development of Thai Herbal Medicine Knowledge Base Using Ontology Technique. The Thai Journal of Pharmaceutical Sciences, 39(3), 102-109.

Wijasika, A. and Srivihok, A. (2014). Thai Food Safety Document Searching System by Ontology. 2nd International Conference on Research in Science, Engineering and Technology (ICRSET’2014), 11-15.

Winch, C. and Gingell, J. (1994). Dialect interference and difficulties with writing: An investigation in St. Lucian primary schools. Language and Education, 8(3), 157-182. https://doi.org/10.1080/09500789409541388

Wutthikorn, Y. (2010). Kan khayai than kwamru baeb attanomat sumrab rabob tham-tob attanomat. [Automatic knowledge base expansion for dialog system]. M.A. Thesis, Department of Computer Sciences, Thammasat University, Thailand.

Yensamut, P. (1981). Kham lae Kwammai nai phasa Laosong. [Words and meaning in Lao Song]. M.A. Thesis, Department of Thai Epigraphy, Silpakorn University, Thailand.

Yuyen, V. (1997). Kansueksa Priabthiab kwam taektang rawang phasa Thai kub Phasa Lao lae khorbokphrong nai kanchai phasa Thai khong naksueksa Lao thi Mahawitthayalai Kasetsart. [A Contrastive Study of Thai and Lao Languages as well as The Errors in Thai Usage of Lao Students at Kasetsart University]. Manutsayasat Wichakan Journal, 5(1), 35-54.

Published

2023-03-02

Section

Articles