Multi-function engine: bilingual terminology search and academic writing assistant based on open scientific data
Keywords:
Search engine, academic literacy, repositories, open scientific data
Abstract
This paper gives an overview of the creation of a multi-function engine that is being developed within the PortLinguE research project (ref. PTDC/LLT-LIG/31113/2017) and that reuses scientific data available in open access. We describe the general architecture of the engine, which is built on the Django framework, and its logical model, which will use BERT machine learning models to enable searches that take context and semantic similarity into account. The engine has two main functions, both presented in detail: (1) a bilingual terminology search function, capable of identifying translation equivalents in comparable texts taken from scientific repositories (useful to translators and researchers working with specialized languages), and (2) an academic writing assistant function, which relies on the compilation of a phrase bank for European academic Portuguese through the collection, annotation and analysis of scientific articles taken from national repositories (useful to students seeking to improve their writing in academic contexts).
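As an illustration of what a search based on semantic similarity can look like, the sketch below ranks candidate terms by cosine similarity between vectors. It is a minimal example under stated assumptions, not the PortLinguE implementation: the abstract does not say which similarity measure the engine uses, the three-dimensional vectors are hypothetical stand-ins for BERT embeddings, and the names `cosine_similarity`, `rank_candidates`, `QUERY` and `CANDIDATES` are introduced here for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidates):
    """Return (term, score) pairs sorted from most to least similar to the query."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; a real system would obtain embeddings from a BERT model.
QUERY = [0.9, 0.1, 0.3]
CANDIDATES = {
    "open access": [0.8, 0.2, 0.4],
    "peer review": [0.1, 0.9, 0.2],
    "data set": [0.3, 0.3, 0.9],
}

ranking = rank_candidates(QUERY, CANDIDATES)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

Cosine similarity is chosen here only because it is a common way to compare embedding vectors; any other vector similarity measure could be substituted in `rank_candidates`.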
References
Cribb, J.; Sari, T. Open Science: Sharing Knowledge in the Global Century. Collingwood: Victoria, 2010. DOI: 10.1071/9780643097643
Devlin, J. et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs], 2019. https://doi.org/10.48550/arXiv.1810.04805
Estrela, A.; Sousa, O. C. Competência textual à entrada no Ensino Superior. Revista de Estudos da Linguagem, v. 19 (1), pp. 247-267, 2011.
Morley, J. Academic Phrasebank. 2004. https://www.phrasebank.manchester.ac.uk/about-academic-phrasebank/
Pogiatzis, A. NLP: Contextualized word embeddings from BERT. Towards Data Science, 2019. https://towardsdatascience.com/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b
Preto-Bay, A. M. The Social-Cultural Dimension of Academic Literacy Development and the Explicit Teaching of Genres as Community Heuristics. The Reading Matrix, vol. 4, no. 3, 2004. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.621.8717&rep=rep1&type=pdf
Varun. Calculating Document Similarities using BERT, word2vec, and other models. Towards Data Science, 2020. https://towardsdatascience.com/calculating-document-similarities-using-bert-and-other-models-b2c1a29c9630
License
Copyright (c) 2024 UEM Scientific Journal: Arts and Social Sciences Series
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.