Browsing by Author "Mendoza Rocha, Marcelo"
Now showing 1 - 8 of 8
- A Study on Information Disorders on Social Networks during the Chilean Social Outbreak and COVID-19 Pandemic (2023) Mendoza Rocha, Marcelo; Valenzuela Leighton, Sebastián Andrés; Núñez-Mussa, Enrique; Padilla Arenas, Fabián; Providel, Eliana; Campos, Sebastián; Bassi, Renato; Riquelme, Andrea; Aldana, Valeria; López, Claudia
  Information disorders on social media can have a significant impact on citizens’ participation in democratic processes. To better understand the spread of false and inaccurate information online, this research analyzed data from Twitter, Facebook, and Instagram. The data were collected and verified by professional fact-checkers in Chile between October 2019 and October 2021, a period marked by political and health crises. The study found that false information spreads faster and reaches more users than true information on Twitter and Facebook. Instagram, on the other hand, seemed to be less affected by this phenomenon. False information was also more likely to be shared by users with lower reading comprehension skills. In contrast, true information tended to be less verbose and to generate less interest among audiences. This research provides valuable insights into the characteristics of misinformation and how it spreads online. By recognizing the patterns of how false information diffuses and how users interact with it, we can identify the circumstances in which false and inaccurate messages are prone to becoming widespread. This knowledge can help us develop strategies to counter the spread of misinformation and protect the integrity of democratic processes.
- Detection and impact estimation of social bots in the Chilean Twitter network (2024) Mendoza Rocha, Marcelo; Providel, Eliana; Santos, Marcelo; Valenzuela, Sebastián
  The rise of bots that mimic human behavior represents one of the most pressing threats to healthy information environments on social media. Many bots are designed to increase the visibility of low-quality content, spread misinformation, and artificially boost the reach of brands and politicians. These bots can also disrupt civic action coordination, such as by flooding a hashtag with spam and undermining political mobilization. Social media platforms have recognized these malicious bots’ risks and implemented strict policies and protocols to block automated accounts. However, effective bot detection methods for Spanish are still in their early stages. Many studies and tools used for Spanish are based on English-language models and lack performance evaluations in Spanish. In response to this need, we have developed a method for detecting bots in Spanish called Botcheck. Botcheck was trained on a collection of Spanish-language accounts annotated in Twibot-20, a large-scale dataset featuring thousands of accounts annotated by humans in various languages. We evaluated Botcheck’s performance on a large set of labeled accounts and found that it outperforms other competitive methods, including deep learning-based methods. As a case study, we used Botcheck to analyze the 2021 Chilean Presidential elections and discovered evidence of bot account intervention during the electoral term. In addition, we conducted an external validation of the accounts detected by Botcheck in the case study and found our method to be highly effective. We have also observed differences in behavior among the bots that are following the social media accounts of official presidential candidates.
- Evaluating GPT-4o in high-stakes medical assessments: performance and error analysis on a Chilean anesthesiology exam (2025) Altermatt Couratier, Fernando René; Neyem, Andrés; Sumonte Fuenzalida, Nicolás Ignacio; Villagrán Gutiérrez, Ignacio Andrés; Mendoza Rocha, Marcelo; Lacassie Quiroga, Héctor; Delfino Yurin, Alejandro
  Background: Large language models (LLMs) such as GPT-4o have the potential to transform clinical decision-making, patient education, and medical research. Despite impressive performance in generating patient-friendly educational materials and assisting in clinical documentation, concerns remain regarding the reliability, subtle errors, and biases that can undermine their use in high-stakes medical settings.
  Methods: A multi-phase experimental design was employed to assess the performance of GPT-4o on the Chilean anesthesiology exam (CONACEM), which comprised 183 questions covering four cognitive domains based on Bloom’s taxonomy: Understanding, Recall, Application, and Analysis. Thirty independent simulation runs were conducted with systematic variation of the model’s temperature parameter to gauge the balance between deterministic and creative responses. The generated responses underwent qualitative error analysis using a refined taxonomy that categorized errors such as “Unsupported Medical Claim,” “Hallucination of Information,” “Sticking with Wrong Diagnosis,” “Non-medical Factual Error,” “Incorrect Understanding of Task,” “Reasonable Response,” “Ignore Missing Information,” and “Incorrect or Vague Conclusion.” Two board-certified anesthesiologists performed independent annotations, with disagreements resolved by a third expert. Statistical evaluations, including one-way ANOVA, non-parametric tests, chi-square, and linear mixed-effects modeling, were used to compare performance across domains and analyze error frequency.
  Results: GPT-4o achieved an overall accuracy of 83.69%. Performance varied significantly by cognitive domain, with the highest accuracy observed in the Understanding (90.10%) and Recall (84.38%) domains, and lower accuracy in Application (76.83%) and Analysis (76.54%). Among the 120 incorrect responses, unsupported medical claims were the most common error (40.69%), followed by vague or incorrect conclusions (22.07%). Co-occurrence analyses revealed that unsupported claims often appeared alongside imprecise conclusions, highlighting a trend of compounded errors, particularly in tasks requiring complex reasoning. Inter-rater reliability for error annotation was robust, with a mean Cohen’s kappa of 0.73.
  Conclusions: While GPT-4o exhibits strengths in factual recall and comprehension, its limitations in handling higher-order reasoning and diagnostic judgment are evident through frequent unsupported medical claims and vague conclusions. These findings underscore the need for improved domain-specific fine-tuning, enhanced error mitigation strategies, and integrated knowledge verification mechanisms prior to clinical deployment.
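The abstract above reports inter-rater reliability as a mean Cohen’s kappa of 0.73. As a reference point for that metric, Cohen’s kappa can be computed from two annotators’ labels as sketched below; the error-category labels in the example are illustrative, not data from the study:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's label marginals.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

a = ["unsupported", "vague", "unsupported", "hallucination"]
b = ["unsupported", "vague", "vague", "hallucination"]
kappa = cohens_kappa(a, b)  # agreement on 3 of 4 items, corrected for chance
```

A kappa near 0.73, as reported, is conventionally read as substantial agreement; values near 0 indicate agreement no better than chance.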
- Imitating Human Reasoning to Extract 5W1H in News (ACM Digital Library, 2025) Muñoz Castro, Carlos José; Mendoza Rocha, Marcelo; Löbel Díaz, Hans-Albert; Keith, Brian
  Extracting key information from news articles is crucial for advancing search systems. Historically, the 5W1H framework, which organises information based on ’Who’, ’What’, ’When’, ’Where’, ’Why’, and ’How’, has been a predominant method in digital journalism, empowering search tools. The rise of Large Language Models (LLMs) has sparked new research into their potential for performing such information extraction tasks effectively. Our study examines a novel approach to employing LLMs in the 5W1H extraction process, particularly focusing on their capacity to mimic human reasoning. We introduce two innovative Chain-of-Thought (COT) prompting techniques to extract 5W1H in news: extractive reasoning and question-level reasoning. The former directs the LLM to pinpoint and highlight essential details from texts, while the latter encourages the model to emulate human-like reasoning at the question-response level. Our research methodology includes experiments with leading LLMs using prompting strategies to ascertain the most effective approach. The results indicate that COT prompting significantly outperforms other methods. In addition, we show that the effectiveness of LLMs in such tasks depends greatly on the nature of the questions posed.
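The extractive-reasoning strategy described above directs the model to pinpoint supporting details before answering each of the six questions. A minimal sketch of how such a prompt might be assembled follows; the wording and function names are hypothetical illustrations, not the authors’ actual prompts:

```python
# Hypothetical sketch of an extractive-reasoning chain-of-thought prompt
# for 5W1H extraction. The instructions below are illustrative only.
QUESTIONS = ["Who", "What", "When", "Where", "Why", "How"]

def build_5w1h_prompt(article: str) -> str:
    # One numbered reasoning step per 5W1H question: quote evidence, then answer.
    steps = "\n".join(
        f"{i}. Quote the span of the article that answers '{q}?', "
        f"then state the answer in one short phrase."
        for i, q in enumerate(QUESTIONS, 1)
    )
    return (
        "Read the news article below. Reason step by step:\n"
        f"{steps}\n\n"
        f"Article:\n{article}\n"
    )

prompt = build_5w1h_prompt("The city council approved the new budget on Monday...")
```

The resulting prompt would then be sent to the LLM, whose quoted spans make the extracted answers auditable against the source text.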
- Learning to cluster urban areas: two competitive approaches and an empirical validation (2022) Vera Villa, Camila; Lucchini Wortzman, Francesca; Bro, Naim; Mendoza Rocha, Marcelo; Löbel Díaz, Hans-Albert; Gutiérrez, Felipe; Dimter, Jan; Cuchacovic, Gabriel; Reyes, Axel; Valdivieso López, Hernán Felipe; Alvarado Monardez, Nicolás; Toro, Sergio
  Urban clustering detects geographical units that are internally homogeneous and distinct from their surroundings. It has applications in urban planning, but few studies compare the effectiveness of different methods. We study two techniques that represent two families of urban clustering algorithms: Gaussian Mixture Models (GMMs), which operate on spatially distributed data, and Deep Modularity Networks (DMONs), which work on attributed graphs of proximal nodes. To explore the strengths and limitations of these techniques, we studied their parametric sensitivity under different conditions, considering the spatial resolution, granularity of representation, and the number of descriptive attributes, among other relevant factors. To validate the methods, we asked residents of Santiago, Chile, to respond to a survey comparing city clustering solutions produced using the different methods. Our study shows that DMON is slightly preferred over GMM and that social features seem to be the most important ones to cluster urban areas.
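As background on the first family of methods mentioned above, a Gaussian Mixture Model assigns each observation a soft responsibility for each component and re-estimates the component parameters until convergence (the EM algorithm). A toy one-dimensional, two-component fit, written with the standard library only and not drawn from the paper’s implementation, illustrates the idea:

```python
import math

def gmm_em_1d(xs, iters=50):
    """Toy two-component 1-D Gaussian Mixture fit via EM; a stand-in for
    clustering spatially aggregated urban attributes."""
    mu = [min(xs), max(xs)]   # initialize means at the data extremes
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            ps = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                  / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(ps)
            resp.append([p / s for p in ps])
        # M-step: re-estimate mixture weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    # Hard cluster labels: the component with the larger responsibility.
    return mu, [max(range(2), key=lambda k: r[k]) for r in resp]

data = [0.1, 0.3, 0.2, 5.0, 5.2, 4.9]
means, labels = gmm_em_1d(data)
```

The paper’s setting differs in operating on multivariate spatial data, but the E-step/M-step alternation is the same.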
- Performance of single-agent and multi-agent language models in Spanish language medical competency exams (Springer Nature, 2025) Altermatt Couratier, Fernando René; Neyem, Andrés; Sumonte Fuenzalida, Nicolás Ignacio; Mendoza Rocha, Marcelo; Villagrán Gutiérrez, Ignacio Andrés; Lacassie Quiroga, Héctor
  Background: Large language models (LLMs) like GPT-4o have shown promise in advancing medical decision-making and education. However, their performance in Spanish-language medical contexts remains underexplored. This study evaluates the effectiveness of single-agent and multi-agent strategies in answering questions from the EUNACOM, a standardized medical licensure exam in Chile, across 21 medical specialties.
  Methods: GPT-4o was tested on 1,062 multiple-choice questions from publicly available EUNACOM preparation materials. Single-agent strategies included Zero-Shot, Few-Shot, Chain-of-Thought (CoT), Self-Reflection, and MED-PROMPT, while multi-agent strategies involved Voting, Weighted Voting, Borda Count, MEDAGENTS, and MDAGENTS. Each strategy was tested under three temperature settings (0.3, 0.6, 1.2). Performance was assessed by accuracy, and statistical analyses, including Kruskal–Wallis and Mann–Whitney U tests, were performed. Computational resource utilization, such as API calls and execution time, was also analyzed.
  Results: MDAGENTS achieved the highest accuracy with a mean score of 89.97% (SD = 0.56%), outperforming all other strategies (p < 0.001). MEDAGENTS followed with a mean score of 87.99% (SD = 0.49%), and the CoT with Few-Shot strategy scored 87.67% (SD = 0.12%). Temperature settings did not significantly affect performance (F2,54 = 1.45, p = 0.24). Specialty-level analysis showed the highest accuracies in Psychiatry (95.51%), Neurology (95.49%), and Surgery (95.38%), while lower accuracies were observed in Neonatology (77.54%), Otolaryngology (76.64%), and Urology/Nephrology (76.59%). Notably, several exam questions were correctly answered using simpler single-agent strategies without employing complex reasoning or collaboration frameworks.
  Conclusions and relevance: Multi-agent strategies, particularly MDAGENTS, significantly enhance GPT-4o’s performance on Spanish-language medical exams, leveraging collaboration to improve diagnostic accuracy. However, simpler single-agent strategies are sufficient to address many questions, highlighting that only a fraction of standardized medical exam questions require sophisticated reasoning or multi-agent interaction. These findings suggest potential for LLMs as efficient and scalable tools in Spanish-speaking healthcare, though computational optimization remains a key area for future research.
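Of the multi-agent aggregation strategies listed above, the Borda Count is the easiest to make concrete: each agent ranks the answer options, and an option ranked r-th among m options receives m - 1 - r points. A small sketch with hypothetical agent rankings (the option letters and rankings are illustrative, not study data):

```python
from collections import defaultdict

def borda_count(rankings):
    """Aggregate agents' ranked answer options via the Borda count:
    an option in position r of an m-option ranking scores m - 1 - r points."""
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for r, option in enumerate(ranking):
            scores[option] += m - 1 - r
    # Deterministic tie-break: higher score first, then option name.
    winner = max(scores, key=lambda o: (scores[o], o))
    return winner, dict(scores)

# Three hypothetical agents rank options A-D for one exam question.
agent_rankings = [
    ["B", "A", "C", "D"],
    ["A", "B", "D", "C"],
    ["B", "C", "A", "D"],
]
winner, scores = borda_count(agent_rankings)
```

Unlike plain voting, the Borda count rewards options that are consistently ranked high even when they are not every agent’s first choice.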
- Reduciendo la ambigüedad interpretativa en un entorno educativo con ChatGPT (2025) García Varela, Francisco José Andrés; Nussbaum Voehl, Miguel; Mendoza Rocha, Marcelo; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
  This study argues that both concrete and abstract words are essential for effective communication, especially in educational contexts where the interplay between these types of language connects with linguistic, cognitive, and social stratification theories. A key challenge is balancing the efficiency of abstract language for conveying complex concepts with the accessibility of concrete language, which improves student comprehension. Generative language models, thanks to their capacity for symbol manipulation, offer a way to address this challenge by enabling the structured, systematic representation and exploration of abstract concepts in their contexts. The central research question was: “How can generative language models help educational actors articulate their ideas and actions more clearly by identifying and refining abstract terms?” To explore this, a protocol in English was developed for ChatGPT-4, with structured guidelines and prompts aimed at supporting users in achieving specific educational goals. In a pilot study with 13 participants, ChatGPT-4 provided feedback, suggested improvements, and guided written interactions. One of the authors observed the participants, took notes on their behavior, and held brief follow-up discussions to gather their impressions. Participants later submitted reflections by email. The process helped transform abstract responses into more concrete formulations, improving clarity and the connection to educational content. The protocol proved effective for linking abstract pedagogical theories with practical classroom application, training teachers in the use of vivid descriptions, relatable scenarios, and tangible examples. This study illustrates how artificial intelligence (AI) can successfully integrate teaching principles and learning theories to improve educational practices.
- Simulating conversations on social media with generative agent-based models (2025) Jeon, Min Soo; Mendoza Rocha, Marcelo; Fernández Pizarro, Miguel; Providel, Eliana; Rodríguez Bórquez, Felipe; Espina Quilodrán, Nicolás Gonzalo; Carvallo, Andrés; Abeliuk, Andrés
  Large Language Models (LLMs) can generate realistic text resembling human-produced content. However, the ability of these models to simulate conversations on social media is still less explored. To investigate the potential and limitations of simulated text in this domain, we introduce network-simulator, a system to simulate conversations on social media. First, we simulate the macro structure of a conversation using Agent-Based Modeling (ABM). The generated structure defines who interacts with whom, the type of interaction, and the agent’s stance on the topic of the conversation. Subsequently, using the simulated interaction structure, our system generates prompts conditioned on the simulation variables, producing texts that are conditioned on the parameters of the predefined structure, guiding a micro simulation process. We compare human conversations with those simulated by our system. Based on stylistic and model-based metrics, we found that our system can simulate conversations on social media in various dimensions. However, we detected differences in metrics related to the predictability of text production. Furthermore, we examine the effect of true and false framings within simulated conversations, revealing that simulated discussions surrounding false information exhibit a more negative collective sentiment bias than those based on true content.
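A minimal sketch of the macro step described above (simulating who interacts with whom, the interaction type, and each agent’s stance, before any text is generated) might look like the following; the structure and field names are illustrative assumptions, not the authors’ implementation:

```python
import random

def simulate_conversation(n_agents=5, n_turns=10, seed=0):
    """Toy macro-level conversation structure: each turn records which agent
    speaks, which earlier turn it replies to, the interaction type, and the
    agent's stance. A micro step would later prompt an LLM with these
    variables to generate the actual text."""
    rng = random.Random(seed)
    stances = {a: rng.choice(["support", "oppose", "neutral"])
               for a in range(n_agents)}
    # The opening post starts the thread and replies to nothing.
    turns = [{"turn": 0, "agent": 0, "reply_to": None,
              "type": "post", "stance": stances[0]}]
    for t in range(1, n_turns):
        agent = rng.randrange(n_agents)
        parent = rng.choice(turns)  # a uniformly chosen earlier turn
        turns.append({"turn": t, "agent": agent, "reply_to": parent["turn"],
                      "type": rng.choice(["reply", "quote"]),
                      "stance": stances[agent]})
    return turns

thread = simulate_conversation()
```

Each record in the returned structure would then condition one prompt in the micro simulation, so the generated texts inherit the simulated interaction graph and stances.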
