
Eva María Domínguez Noya
Instituto da Lingua Galega, Universidade de Santiago de Compostela / Centro Ramón Piñeiro para a investigación en humanidades
Spain
https://orcid.org/0000-0001-5592-4065
Vítor Míguez
Universidad del País Vasco / Euskal Herriko Unibertsitatea
Spain
http://orcid.org/0000-0001-7138-373X
Vol 14 (2022): Estudos de Lingüística Galega, Pescuda (Research)
DOI: https://doi.org/10.15304/elg.14.8452
Submitted: 03-05-2022 Accepted: 05-10-2022 Published: 07-12-2022

Abstract

The treatment of multiword units is an unfinished task in natural language processing. In this context, we isolate binomial scientific nomenclature terms. Their main traits (they are Latin or Latinized multiword expressions with international recognition) distinguish them from the Galician ‘popular’ lexicon and make the proposed treatment transferable to other languages. After reviewing how these terms are characterized in CORGA and other Peninsular corpora, we propose analysing scientific names as a particular subtype of noun, namely scientific nomenclature, with no values specified for gender and number. We then describe the changes made to the kernel and the training corpus to incorporate the new tag into the XIADA system, and we assess two strategies for detecting candidates: a dedicated tool for extracting scientific names, and online inventories. Finally, the CORGA data confirm a significant presence of binomial scientific terms and show the relevance of the new tag for identifying them and describing their distribution.
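The inventory-based detection strategy can be illustrated with a minimal sketch. It is not the authors' extraction tool or the XIADA implementation. It assumes a hypothetical toy genus list (`KNOWN_GENERA`) standing in for an online inventory, a simple surface pattern (capitalized genus followed by a lowercase epithet), and a hypothetical tag label `SCI`. Each candidate is kept as a single multiword token carrying one tag and no gender or number value.

```python
import re

# Toy genus inventory (hypothetical, for illustration only); a real system
# would draw on a much larger online taxonomic inventory.
KNOWN_GENERA = {"Castanea", "Eucalyptus", "Homo", "Pinus", "Quercus"}

# Surface shape of a binomial such as "Quercus robur":
# a capitalized word (genus) followed by a lowercase word (specific epithet).
PAIR = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]+)\b")

# Hypothetical tag name; not the label actually used in XIADA.
SCI_TAG = "SCI"


def find_binomials(text, genera=KNOWN_GENERA):
    """Return (start, end, name) spans whose first word is a listed genus."""
    return [
        (m.start(), m.end(), m.group(0))
        for m in PAIR.finditer(text)
        if m.group(1) in genera
    ]


def tag_binomials(text, genera=KNOWN_GENERA):
    """Turn each candidate into one multiword token with a single tag
    and no gender or number value."""
    return [(name, SCI_TAG) for _, _, name in find_binomials(text, genera)]


if __name__ == "__main__":
    sample = "O carballo (Quercus robur) e o castiñeiro (Castanea sativa) abundan."
    print(tag_binomials(sample))
    # [('Quercus robur', 'SCI'), ('Castanea sativa', 'SCI')]
```

Filtering on the genus inventory avoids flagging ordinary capitalized Galician word pairs such as "Non hai", but this pattern still accepts a listed genus followed by any lowercase word (e.g. "Quercus son") and ignores abbreviated genera such as "Q. robur". A dedicated extraction tool would need to handle cases like these.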