Event organised by the Computational Humanities research group
9 May 2023 3pm BST (remote)
Enrique Manjavacas (Leiden University, The Netherlands), Historical Language Models and their application to Word Sense Disambiguation
Large Language Models (LLMs) have become the cornerstone of current methods in Computational Linguistics. As the Humanities turn to computational methods to analyse large quantities of text, the question arises of how these models are best developed and applied to the specificities of their domains. In this talk, I will address the application of LLMs to Historical Languages, following up on the MacBERTh project. Within the development of LLMs for Historical Languages, I will discuss how such models can be fine-tuned efficiently to tackle the problem of Word Sense Disambiguation. In a series of experiments relying on data from the Oxford English Dictionary, I will highlight how non-parametric and metric learning approaches can be an interesting alternative to traditional fine-tuning methods, which rely on classifiers that learn to disambiguate specific lemmas.
Enrique Manjavacas Arevalo is currently a post-doc at Leiden University, working on the MacBERTh project, which develops Large Language Models for Historical Languages. He obtained a PhD at the University of Antwerp (2021) with a dissertation on computational approaches to text reuse detection.
The video of this seminar is available here: https://youtu.be/TnodLOKw-wY.