Event organised by the Computational Humanities research group
16 May 2023 3pm BST (remote)
Piroska Lendvai and Claudia Wick (Bavarian Academy of Sciences and Humanities, Germany), Finetuning Latin BERT for Word Sense Disambiguation on the Thesaurus Linguae Latinae
The Thesaurus Linguae Latinae (TLL) is a comprehensive monolingual dictionary that records contextualized meanings and usages of Latin words in ancient sources at an unprecedented scale. We created a new dataset from a subset of sense representations in the TLL and used it to finetune the Latin BERT neural language model (Bamman and Burns, 2020) on a supervised Word Sense Disambiguation task. We observe that the contextualized BERT representations finetuned on TLL data score better than static embeddings used in a bidirectional LSTM classifier on the same dataset, and that our per-lemma BERT models achieve higher and more robust performance than that reported by Bamman and Burns (2020), which was based on data from a bilingual Latin dictionary. We discuss the differences in the principles of sense organization between these two lexical resources, and report on our dataset construction and improved evaluation methodology.
Piroska Lendvai (PhD) works at the Digital Humanities R&D Department of the Bavarian Academy of Sciences and Humanities (Munich, Germany), where she supports research in Humanities and Social Sciences via tools and approaches from language technology.
Claudia Wick (PhD) works as a lexicographer in the Thesaurus linguae Latinae project at the Bavarian Academy of Sciences and Humanities (Munich, Germany), a project that aims to compile a comprehensive dictionary of ancient Latin. In her spare time she pursues programming.
To register for this seminar, please email Barbara McGillivray at Barbara.firstname.lastname@example.org