Development of a web service for creating tests based on text analysis using natural language processing technologies
Abstract
The purpose of this work is to analyze natural language processing models and methods, to select modern technologies for training such models, and to develop a web service that creates tests from text using natural language processing. The study considers methods and algorithms of intelligent data analysis for generating questions, together with correct and incorrect answers, from a text. The authors justify the choice of a neural network for generating tests from English and Ukrainian text and characterize the data sources used for training. The study also describes the operation of the proposed model, which serves as the basis for the web service. After a detailed review of the training datasets, the data needed for the experiment were extracted and converted into a format convenient for training. A training algorithm for six models was designed and implemented, and evaluation metrics were obtained after training. Additionally, a server-side component and a web interface that interact with each other were developed.
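The abstract does not specify the format the training data were converted into. As a minimal sketch only, the example below assumes a SQuAD-style record (a `context`, a `question`, and an `answers` field with a `text` list) and turns it into a source/target pair of the kind a text-to-text model could be trained on for question generation. The function name `to_text_to_text` and the prompt prefix `generate question:` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def to_text_to_text(record: Dict) -> Optional[Dict[str, str]]:
    """Turn one SQuAD-style record into a source/target pair.

    Returns None when the record has no answer (an unanswerable question),
    since there is nothing to condition the generated question on.
    """
    answers: List[str] = record.get("answers", {}).get("text", [])
    if not answers:
        return None
    answer = answers[0].strip()
    # Collapse runs of whitespace so the model input is a single clean line.
    context = " ".join(record["context"].split())
    return {
        # Hypothetical prompt layout: the answer first, then the passage.
        "source": f"generate question: answer: {answer} context: {context}",
        "target": record["question"].strip(),
    }


sample = {
    "context": "Kyiv is the capital  of Ukraine.",
    "question": "What is the capital of Ukraine?",
    "answers": {"text": ["Kyiv"], "answer_start": [0]},
}
pair = to_text_to_text(sample)
print(pair["source"])
print(pair["target"])
```

Records without an answer are skipped here rather than mapped to an empty target. This is one possible design choice, not necessarily the one made in the study.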
Keywords
text analysis; natural language; natural language processing technologies; NLP; model
Zhytomyr Polytechnic State University, Ukraine
https://orcid.org/0000-0001-6825-4697
Zhytomyr Polytechnic State University, Ukraine
https://orcid.org/0000-0002-5515-6550
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.