Redes neuronales para extracción de información relevante de sentencias legales

Suárez Carbonell, Lucas Andrés

dc.catalogador	aba
dc.contributor.advisor	Barceló Baeza, Pablo
dc.contributor.author	Suárez Carbonell, Lucas Andrés
dc.contributor.other	Pontificia Universidad Católica de Chile. Escuela de Ingeniería
dc.date.accessioned	2023-11-17T19:56:01Z
dc.date.available	2023-11-17T19:56:01Z
dc.date.issued	2023
dc.description	Thesis (Master of Science in Engineering)--Pontificia Universidad Católica de Chile, 2023.
dc.description.abstract	En los últimos años el Procesamiento de Lenguaje Natural, desde ahora PLN, ha utilizado técnicas de Aprendizaje Automático para representar fragmentos de texto. La introducción de la arquitectura del Transformer (Vaswani et al., 2017) y posteriormente de BERT (Devlin et al., 2018) junto con su versión más pequeña ALBERT (Lan et al., 2019) revolucionaron el estado del arte en PLN, imponiéndose como estándar para resolver tareas que involucren el modelamiento computacional de lenguaje. Una de estas tareas corresponde a sumarización extractiva, donde el objetivo es crear un resumen de un texto dado seleccionando y extrayendo frases y oraciones clave del documento original. Una de las limitaciones que aparecen con el uso de BERT en este tipo de tareas corresponde al tamaño máximo que tienen los transformers para procesar el texto de entrada, lo que dificulta el trabajo con documentos largos. En este trabajo utilizamos BERT y otros modelos de lenguaje similares para construir un sistema que permita obtener la jurisprudencia de una sentencia legal de la Corte Suprema. Para ello, se propone una arquitectura capaz de encapsular la información en dos niveles: a nivel de bloque de texto y a nivel de documento, para luego realizar una clasificación binaria de cada uno de los bloques. Para validar que el modelo propuesto es capaz de resolver la tarea se realizaron pruebas sobre el dataset de documentos legales BillSum (Kornilova & Eidelman, 2019), alcanzando resultados comparables con modelos del estado del arte en términos de ROUGE.
dc.description.abstract	In recent years, Natural Language Processing (NLP) has relied on Machine Learning techniques to represent text fragments. The introduction of the Transformer architecture (Vaswani et al., 2017), and later of BERT (Devlin et al., 2018) and its smaller variant ALBERT (Lan et al., 2019), revolutionized the state of the art in NLP, and these models became the standard for tasks that involve computational modeling of language. One such task is extractive summarization, where the goal is to summarize a given text by selecting and extracting key phrases and sentences from the original document. A limitation of using BERT for this kind of task is the maximum input length that transformers can process, which makes it difficult to work with long documents. In this work we use BERT and similar language models to build a system that extracts the jurisprudence from a ruling of the Supreme Court. To this end, we propose an architecture that encapsulates information at two levels, the text-block level and the document level, and then performs a binary classification of each block. To validate that the proposed model can solve the task, we ran tests on BillSum (Kornilova & Eidelman, 2019), a dataset of legal documents, achieving results comparable to state-of-the-art models in terms of ROUGE.
dc.fechaingreso.objetodigital	2023-11-17
dc.format.extent	x, 35 pages
dc.fuente.origen	SRIA
dc.identifier.doi	10.7764/tesisUC/ING/75332
dc.identifier.uri	https://doi.org/10.7764/tesisUC/ING/75332
dc.identifier.uri	https://repositorio.uc.cl/handle/11534/75332
dc.information.autoruc	Instituto de Ingeniería Matemática y Computacional; Barceló Baeza, Pablo; 0000-0003-2293-2653; 13516
dc.information.autoruc	Escuela de Ingeniería; Suárez Carbonell, Lucas Andrés; S/I; 233086
dc.language.iso	es
dc.nota.acceso	Full content
dc.rights	open access
dc.subject	Procesamiento de Lenguaje Natural
dc.subject	Resumen extractivo
dc.subject	Transferencia de Aprendizaje
dc.subject	Transformer
dc.subject	Modelos Preentrenados
dc.subject	Documentos Largos
dc.subject	Sentencias Judiciales
dc.subject	Natural Language Processing
dc.subject	Extractive Summarization
dc.subject	Transfer Learning
dc.subject	Transformers
dc.subject	Pretrained Models
dc.subject	Long Documents
dc.subject	Legal Sentences
dc.subject.ddc	620
dc.subject.dewey	Ingeniería	es_ES
dc.title	Redes neuronales para extracción de información relevante de sentencias legales
dc.type	master's thesis
sipa.codpersvinculados	13516
sipa.codpersvinculados	233086

Files

Original bundle

Name: TESIS_LSuárez_Firma Final.pdf
Size: 687.84 KB
Format: Adobe Portable Document Format

Collections

3.01 Tesis magíster