Multi-function engine: bilingual terminology search and academic writing assistant based on open scientific data

Authors

  • Micaela Aguiar, Universidade do Minho
  • José Monteiro, Universidade do Minho
  • Sílvia Araújo, Universidade do Minho

Keywords:

Search engine, academic literacy, repositories, open scientific data

Abstract

This paper gives an overview of the creation of a multi-function engine that is being developed within the PortLinguE research project (ref. PTDC/LLT-LIG/31113/2017) and reuses scientific data available in open access. We describe the general architecture of the engine, which is built on the Django framework, and its logical model, which will rely on BERT machine learning models to enable searches that take context and semantic similarity into account. The engine has two main functions, both presented in detail: (1) a bilingual terminology search function, capable of identifying translation equivalents in comparable texts taken from scientific repositories (useful to translators and researchers working with specialized languages), and (2) an academic writing assistant function, which relies on building a phrase bank for European academic Portuguese through the collection, annotation and analysis of scientific articles taken from national repositories (useful to students seeking to improve their writing in academic contexts).
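
As a rough illustration of what a search that considers semantic similarity can look like, the sketch below ranks candidate terms by cosine similarity between embedding vectors. It is a minimal example under stated assumptions, not the PortLinguE implementation: the function names, the candidate terms, and the three-dimensional vectors are all hypothetical, and the hand-written vectors merely stand in for the much larger embeddings a BERT model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidates):
    """Return (term, score) pairs sorted from most to least similar to the query."""
    scored = [(term, cosine_similarity(query_vec, vec))
              for term, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors with invented values; real BERT embeddings have hundreds of dimensions.
query = [0.9, 0.1, 0.2]  # hypothetical embedding of a source-language term
candidates = {
    "search engine": [0.8, 0.2, 0.1],
    "writing assistant": [0.1, 0.9, 0.3],
    "repository": [0.2, 0.3, 0.9],
}

ranking = rank_candidates(query, candidates)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In a setting like the one described, the candidate whose vector points in nearly the same direction as the query would be proposed first as a likely equivalent; how the actual engine obtains and compares its embeddings is not specified in the abstract.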

References

Cribb, J.; Sari, T. Open Science: Sharing Knowledge in the Global Century. Collingwood, Victoria: CSIRO Publishing, 2010. DOI: 10.1071/9780643097643

Devlin, J. et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs], 2019. https://doi.org/10.48550/arXiv.1810.04805

Estrela, A.; Sousa, O. C. Competência textual à entrada no Ensino Superior. Revista de Estudos da Linguagem, v. 19 (1), pp. 247-267, 2011.

Morley, J. Academic Phrasebank, 2004. https://www.phrasebank.manchester.ac.uk/about-academic-phrasebank/

Pogiatzis, A. NLP: Contextualized word embeddings from BERT. Towards Data Science, 2019. https://towardsdatascience.com/nlp-extract-contextualized-word-embeddings-from-bert-keras-tf-67ef29f60a7b

Preto-Bay, A. M. The Social-Cultural Dimension of Academic Literacy Development and the Explicit Teaching of Genres as Community Heuristics. The Reading Matrix, vol. 4, no. 3, 2004. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.621.8717&rep=rep1&type=pdf

Varun. Calculating Document Similarities using BERT, word2vec, and other models. Towards Data Science, 2020. https://towardsdatascience.com/calculating-document-similarities-using-bert-and-other-models-b2c1a29c9630

Published

2024-05-29

How to Cite

Aguiar, M., Monteiro, J., & Araújo, S. (2024). Motor multifunções: pesquisa terminológica bilíngue e assistente de escrita académica com base em dados científicos abertos. UEM Scientific Journal: Arts and Social Sciences Series, 4(1). Retrieved from http://196.3.97.23/revista/index.php/lcs/article/view/240