Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations

Baez, Pablo; Campillos-Llanos, Leonardo; Nunez, Fredy; Dunstan, Jocelyn

dc.article.number	110076
dc.contributor.author	Baez, Pablo
dc.contributor.author	Campillos-Llanos, Leonardo
dc.contributor.author	Nunez, Fredy
dc.contributor.author	Dunstan, Jocelyn
dc.date.accessioned	2024-08-01T08:00:09Z
dc.date.available	2024-08-01T08:00:09Z
dc.date.issued	2024
dc.description.abstract	Entity normalization is a common strategy to resolve ambiguities by mapping all the synonym mentions to a single concept identifier in standard terminology. Normalizing medical entities is challenging, especially for languages other than English, where lexical variation is considerably under-represented. Here, we report a new linguistic resource for medical entity normalization in Spanish. We applied a UMLS-based medical lexicon (MedLexSp) to automatically normalize mentions from 2000 medical referrals of the Chilean Waiting List Corpus. Three medical students manually revised the automatic normalization. The inter-coder agreement was computed, and the distribution of concepts, errors, and linguistic sources of variation was analyzed. The automatic method normalized 52% of the mentions, compared to 91% after manual revision. The lowest agreement between automatic and automatic-manual normalization was observed for Finding, Disease, and Procedure entities. Errors in normalization were associated with ortho-typographic, semantic, and grammatical linguistic inadequacies, mainly of the hyponymy/hyperonymy, polysemy/metonymy, and acronym-abbreviation types. This new resource can enrich dictionaries and lexicons with new mentions to improve the functioning of modern entity normalization methods. The linguistic analysis offers insight into the sources of lexical variety in the Spanish clinical environment related to error generation using lexicon-based normalization methods. This article also introduces a workflow that can serve as a benchmark for comparison in studies replicating our analysis in Romance languages.
dc.description.funder	ANID FONDECYT
dc.description.funder	ANID
dc.fechaingreso.objetodigital	2024-09-03
dc.format.extent	14 pages
dc.fuente.origen	WOS
dc.identifier.doi	10.1007/s10579-024-09755-7
dc.identifier.eissn	1574-0218
dc.identifier.issn	1574-020X
dc.identifier.scopusid	SCOPUS_ID:85194494230
dc.identifier.uri	https://doi.org/10.1007/s10579-024-09755-7
dc.identifier.uri	https://repositorio.uc.cl/handle/11534/87237
dc.identifier.wosid	WOS:001260434000001
dc.information.autoruc	Facultad de Letras; Núñez Torres, Fredy Rodrigo; S/I; 157277
dc.issue.numero	3
dc.language.iso	en
dc.nota.acceso	partial content
dc.pagina.final	516
dc.pagina.inicio	489
dc.revista	LANGUAGE RESOURCES AND EVALUATION
dc.rights	restricted access
dc.subject	Clinical text
dc.subject	Entity linking
dc.subject	Lexical variation
dc.subject	Linguistic resources
dc.subject	Medical lexicon
dc.subject	Normalization
dc.subject.ddc	370
dc.subject.dewey	Educación	es_ES
dc.subject.ods	03 Good health and well-being
dc.subject.odspa	03 Salud y bienestar
dc.title	Entity normalization in a Spanish medical corpus using a UMLS-based lexicon: findings and limitations
dc.type	article
dc.volumen	27
sipa.codpersvinculados	157277
sipa.index	WOS
sipa.trazabilidad	Carga WOS-SCOPUS;01-08-2024

Files

Original bundle

Name: Entity normalization in a Spanish medical corpus using a UMLS-based lexicon - findings and limitations.pdf
Size: 3.12 KB
Format: Adobe Portable Document Format

Collections

Artículos de revistas