News Gathering: Leveraging Transformers to Rank News

dc.catalogador: jwg
dc.contributor.author: Munoz C.
dc.contributor.author: Apolo M.J.
dc.contributor.author: Ojeda M.
dc.contributor.author: Lobel H.
dc.contributor.author: Mendoza M.
dc.date.accessioned: 2024-05-30T17:18:04Z
dc.date.available: 2024-05-30T17:18:04Z
dc.date.issued: 2024
dc.description.abstract: News media outlets disseminate information across various platforms. Often, these posts present complementary content and perspectives on the same news story. However, to compile a set of related news articles, users must thoroughly scour multiple sources and platforms, manually identifying which publications pertain to the same story. This tedious process hinders the speed at which journalists can perform essential tasks, notably fact-checking. To tackle this problem, we created a dataset containing both related and unrelated news pairs. This dataset allows us to develop information retrieval models grounded in the principle of binary relevance. Recognizing that many Transformer-based models might be suited for this task but could overemphasize relationships based on lexical connections, we tailored a dataset to fine-tune these models to focus on semantically relevant connections in the news domain. To craft this dataset, we introduced a methodology to identify pairs of news stories that are lexically similar yet refer to different events and pairs that discuss the same event but have distinct lexical structures. This design compels Transformers to recognize semantic connections between stories, even when their lexical similarities might be absent. Following a human-annotation assessment, we reveal that BERT outperformed other techniques, excelling even in challenging test cases. To ensure the reproducibility of our approach, we have made the dataset and top-performing models publicly available.
dc.fuente.origen: SCOPUS
dc.identifier.doi: 10.1007/978-3-031-56063-7_41
dc.identifier.issn: 16113349 03029743
dc.identifier.scopusid: SCOPUS_ID:85189362886
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/86101
dc.information.autoruc: Facultad de Comunicaciones; Mendoza Prado Marcelo; S/I; 62895
dc.language.iso: en
dc.nota.acceso: partial content
dc.pagina.final: 493
dc.pagina.inicio: 486
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.revista: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
dc.rights: restricted access
dc.subject: Document ranking
dc.subject: News gathering
dc.subject: News IR
dc.subject.ddc: 70
dc.subject.dewey: Periodismo (es_ES)
dc.title: News Gathering: Leveraging Transformers to Rank News
dc.type: book
dc.volumen: 14610 LNCS
sipa.codpersvinculados: 62895
sipa.trazabilidad: ORCID;2024-05-30