Welcome to Marta Arisi, visiting PhD at the Department of Digital Humanities ✨

We are delighted to announce that Marta Arisi will be joining us from Sciences Po as a visiting PhD researcher at the Department of Digital Humanities at King’s College London.

During her stay she will be working with Jonathan Gray and exploring collaborations with researchers in the department and the Centre for Digital Culture.

Marta comments:

“Through my visit, I wish to share and explore research collaborations around my doctoral research on genealogies of ‘openness’ in AI. This project aims to situate AI openness in relation to other histories and practices of openness in digital culture, and to interrogate the implied universality of openness by refocusing on the structural inequalities and power asymmetries that data-driven systems can create and reproduce. In doing so, I reflect upon the values and concepts which are invoked to guide us to more hopeful digital futures.”

More about her research can be found here.

Seminar | Part-of-Speech Tagging & Lemmatisation in Unedited Greek: Simple Tasks, Complex Challenges?

Event organised by the Computational Humanities research group.

To register for the seminar, please fill in this form by 1 December 2024.

10 December 2024 – 1.10pm GMT

Remote – Via Microsoft Teams.

Colin Swaelens (Ghent University), Part-of-Speech Tagging & Lemmatisation in Unedited Greek: Simple Tasks, Complex Challenges? 

Abstract

In today’s landscape of language technology, dominated by large language models, tasks like part-of-speech tagging and lemmatisation receive less attention in NLP research. However, these tasks still pose significant challenges, especially for under-resourced, morphologically rich languages like Ancient Greek. Our project focuses on the verbatim transcriptions of Byzantine marginal poetry stored in the Database of Byzantine Book Epigrams (DBBE). Due to the highly interconnected nature of the poems, we aim to eventually perform similarity detection across the corpus. As a first step, we sought to annotate the DBBE with part-of-speech tags, morphological analyses, and lemmas. Although research on these tasks dates back to more straightforward rule-based systems from the 1970s, current taggers struggle with these unedited texts. The inconsistent orthography, largely due to itacism, adds to this complexity. To mitigate these issues, we trained a transformer-based language model encompassing classical, medieval, and modern Greek. Our experiments, however, revealed that fine-tuning the model for each annotation task was not always fruitful. There is a growing tendency to address such challenges with a multi-task head, allowing the model to process multiple annotations concurrently, drawing inspiration from cognitive psychology. This raises the question: will this more intricate solution outshine the seemingly more transparent methods of the past?
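To give a sense of why itacism complicates the processing of unedited texts, here is a minimal Python sketch of a rule-based matching key that collapses spellings which had come to be pronounced /i/. It is an illustration only and not the method described in the abstract, which relies on a transformer-based language model. The function names, the list of spellings, and the rule for protecting the diphthongs αυ, ευ and ου are assumptions made for this example.

```python
import re
import unicodedata

# Digraphs that itacism merged into the sound /i/ (an illustrative list).
ITACISTIC_DIGRAPHS = ("ει", "οι", "υι")


def strip_diacritics(text: str) -> str:
    """Remove accents, breathings and iota subscripts via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def itacism_key(word: str) -> str:
    """Return a crude key under which itacistic spelling variants coincide."""
    key = strip_diacritics(word.lower())
    for digraph in ITACISTIC_DIGRAPHS:
        key = key.replace(digraph, "ι")
    key = key.replace("η", "ι")
    # Collapse a lone upsilon, but leave the diphthongs αυ, ευ and ου untouched.
    return re.sub(r"(?<![αεο])υ", "ι", key)


# Three spellings of the same word that a scribe might produce.
variants = ["τιμή", "τημή", "τειμή"]
keys = {itacism_key(v) for v in variants}
```

All three variants map to a single key, so they can be grouped before any further comparison. A key like this discards real distinctions as well as scribal variation, which is one reason rule-based shortcuts struggle on texts of this kind.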

Bio

Colin Swaelens is a PhD student at the Language & Translation Technology Team (LT3) and the Database of Byzantine Book Epigrams (DBBE) at Ghent University, under the supervision of Dr Ilse De Vos (Flanders AI Academy) and Prof. Els Lefever (LT3). His PhD project is embedded in the project Interconnected texts: a graph-based computational approach to Byzantine paratexts as nodes between textual transmission and cultural and linguistic developments. Within this project, he is developing an annotation pipeline to provide all texts in DBBE with a part-of-speech tag, morphological analysis and lemma. In the next stage, this linguistic information will be used to develop a tool for detecting similar verses in the corpus, serving the other subprojects on manuscript culture and formulaicity.

Seminar | Computational theatre research: leveraging large datasets and AI for the performing arts

Event organised by the Computational Humanities research group.

To register for the seminar, please fill in this form by 3 November 2024.

12 November 2024 – 1.10pm GMT

Remote – Via Microsoft Teams.

Miguel Escobar (NUS Singapore), Computational theatre research: leveraging large datasets and AI for the performing arts 

Abstract

Computational methods can help us better understand the history and current landscape of the performing arts. For example, we can use network analysis and simulations to study how collaborations within theatre companies develop over time, and how specific management decisions lead to different collaborative patterns. For this, we can take advantage of the records of theatre productions, which are increasingly available in digitized form. Recent advances in open-source AI models can also be used to extract detailed information from videos and texts. For example, we can fine-tune action segmentation models to identify culturally-specific performing conventions in video recordings of performances, and determine how their usage has changed over time.
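As a rough illustration of the network-analysis idea in the abstract, the following Python sketch turns production records into yearly collaboration networks by counting how often each pair of people is credited together. The records, names and the function `collaboration_edges` are invented for this example and do not come from the speaker's data or code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical production records: (year, people credited on the production).
productions = [
    (2019, ["Ana", "Budi", "Citra"]),
    (2019, ["Ana", "Budi"]),
    (2020, ["Budi", "Citra"]),
]


def collaboration_edges(records):
    """Count, for each year, how often every pair of people worked together."""
    edges_by_year = {}
    for year, people in records:
        counter = edges_by_year.setdefault(year, Counter())
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(people)), 2):
            counter[pair] += 1
    return edges_by_year


edges = collaboration_edges(productions)
```

Comparing the edge counts from one year to the next gives a simple view of how collaborative ties strengthen, weaken or disappear over time.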

Bio

Miguel Escobar Varela is Associate Professor at the Department of English, Linguistics, and Theatre Studies (ELTS) and deputy director of the Centre for Computational Social Science and Humanities (CSSH) at the National University of Singapore. In his research, he uses digital tools to document and study cultural heritage in Southeast Asia. He is the author of Theater as Data (University of Michigan Press, 2021) and has written several articles on digital humanities and Indonesian theatre. A full list of his publications and digital projects is available at https://miguelescobar.com. He is also Associate Editor of the newly established journal Computational Humanities Research (Cambridge University Press).

Seminar | Gender-Coded Sound: Analysing the Gendering of Music in Toy Commercials via Multi-Task Learning • 22 October 2024

Event organised by the Computational Humanities research group.

To register for the seminar, please fill in this form by 18 October 2024.

22 October 2024 – 3pm BST

In person – King’s College London, Bush House (SE)1.10 (FOR KCL STAFF AND STUDENTS ONLY)

Remote – Via Microsoft Teams

Luca Marinelli (Queen Mary University of London), Gender-Coded Sound: Analysing the Gendering of Music in Toy Commercials via Multi-Task Learning

Abstract

Music can convey ideological stances, and gender is just one of them. Evidence from musicology and psychology research shows that gender-loaded messages can be reliably encoded and decoded via musical sounds. However, much of this evidence comes from examining music in isolation, while studies of the gendering of music within multimodal communicative events are sparse. In this paper, we outline a method to automatically analyse how music in TV advertising aimed at children may be deliberately used to reinforce traditional gender roles. Our dataset of 606 commercials included music-focused mid-level perceptual features, multimodal aesthetic emotions, and content analytical items. Despite its limited size, and because of the extreme gender polarisation inherent in toy advertisements, we obtained noteworthy results by leveraging multi-task transfer learning on our densely annotated dataset. The models were trained to categorise commercials based on their intended target audience, specifically distinguishing between masculine, feminine, and mixed audiences. Additionally, to provide explainability for the classification in gender targets, the models were jointly trained to perform regressions on emotion ratings across six scales, and on mid-level musical perceptual attributes across twelve scales. Standing in the context of MIR, computational social studies and critical analysis, this study may benefit not only music scholars but also advertisers, policymakers, and broadcasters.
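The joint training described in the abstract can be pictured as a single loss that combines one classification term with two regression terms. The Python sketch below shows one common way to write such a multi-task loss. It is an assumption for illustration rather than the authors' implementation: the loss weights, the dictionary keys and the toy values are all hypothetical, and only the three audience classes and the six and twelve rating scales come from the abstract.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[target_index]


def mean_squared_error(predicted, target):
    """Average squared difference between predicted and annotated ratings."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def joint_loss(outputs, labels, weights=(1.0, 0.5, 0.5)):
    """Weighted sum of the audience, emotion and mid-level feature losses."""
    w_audience, w_emotion, w_mid = weights  # hypothetical weights
    return (
        w_audience * cross_entropy(outputs["audience_logits"], labels["audience"])
        + w_emotion * mean_squared_error(outputs["emotions"], labels["emotions"])
        + w_mid * mean_squared_error(outputs["mid_level"], labels["mid_level"])
    )


# Toy example: 3 audience classes, 6 emotion scales, 12 mid-level scales.
outputs = {"audience_logits": [0.0, 0.0, 0.0], "emotions": [0.5] * 6, "mid_level": [0.5] * 12}
labels = {"audience": 1, "emotions": [0.5] * 6, "mid_level": [0.0] * 12}
loss = joint_loss(outputs, labels)
```

Because all three terms depend on the same shared representation, the regression targets shape the features that the audience classifier relies on, which is how the auxiliary tasks can support explanations of the classification.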

Bio

Luca is a PhD student at the UKRI CDT in Artificial Intelligence and Music at the Centre for Digital Music (C4DM), Queen Mary University of London, under the co-supervision of Dr C. Saitis, G. Fazekas, and Prof. P. Lucht (Center for Interdisciplinary Women’s and Gender Studies, Technical University of Berlin). His PhD project sits at the intersection of music data science, gender and media studies, aiming at implementing machine learning techniques to aid the critical analysis of gendered markers in large corpora of television adverts. 

AI in the Street: report from the London Observatory

Starting with a simple question, “what does responsible AI look like from the street?”, AI-in-the-street teams, one of them hosted at the Department of Digital Humanities, are undertaking creative participatory research in five cities in the UK and Australia – London, Edinburgh, Coventry, Cambridge and Logan. These research-based interventions, funded by AHRC’s BRAID programme (Bridging Divides in Responsible AI), take the form of diagramming workshops, sensing walks and street-based activities, and will inform the scoping of a prototype for a “street-level observatory” for everyday AI: a digital showcase and protocol for rendering the presence, role and effects of AI-based technologies visible and/or tangible for everyday publics in the street.

This image illustrates AI in the street with a woman looking out on a wild imaginative map.

Image: Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Licensed under CC-BY 4.0

The research of the London Observatory was conducted in three city locations – Science Gallery (London Bridge), Martello Street Studios (London Fields) and Hermitage Community Moorings (Wapping) hosted by Ambient Information Systems with Yasmine Boudiaf. Our primarily discursive workshops, each up to three hours long, had 18 participants in total. The approach, which evolved over the three sessions, combined role-playing, experience-sharing, exploring counterfactuals and terminology through algorithmically-guided conversation, envisioning exercises, and collaborative drawing. Audio/video recordings, texts and images from the sessions form the basis of a manifestation as artwork (in progress).

The workshops were grounded in three key considerations:

• identities in the street, noting in particular that we may inhabit multiple and fluid identities (as parent accompanying child, as cyclist) 

• needs and desires that the street actually or potentially fulfils, or fails to; the extent to which technologies including AI meet these needs and desires, either as currently deployed or as imagined, and unintended effects (which may have uneven impact)  

• alternative, non-technological solutions to these needs and desires. 

While participants had limited awareness of the extent of AI deployment in the street, they were tech-literate and understood the deeper societal and legal implications of AI systems. Most participants noted the lack of users’ voices in design and implementation processes, and several pointed to the energy and environmental costs of AI. One participant had expert-level domain knowledge, but even they described AI systems as opaque in multiple ways, from problem specification and design, to terms of engagement and access, to the origin, processing and fate of data.

There was widespread agreement that, despite their potential agility and precision, AI technologies are entangled in an ossified economic model that centralises power away from citizens and treats environmental costs as externalities. Devolution of human agency to machine systems was seen as of mixed utility. Other concerns raised included poor problem specification leading to ‘solutionism’ and function creep, and the general vulnerability of complex technological systems.

The findings of the London Observatory will be combined with those of Edinburgh, Coventry, Cambridge and Logan (Australia), and presented at the Science Gallery on Thursday, 12th September 2024 at 6.30pm.

Text by AI in the Street, Mukul Patel and Mercedes Bunz.

AI in the street is funded under the AHRC BRAID programme (Bridging Divides in Responsible AI). BRAID is a 3-year national research programme funded by the UKRI Arts and Humanities Research Council (AHRC), led by the University of Edinburgh in partnership with the Ada Lovelace Institute and the BBC. 

Data Driven Classics: Exploring the Power of Shared Datasets

Workshop organised by Andrea Farina (Department of Digital Humanities) and George Oliver (Department of Classics).


The Department of Digital Humanities at King’s College London is excited to announce a unique opportunity for scholars interested in the intersection of Classics and digital methodologies. We invite you to participate in our upcoming event entitled Data Driven Classics: Exploring the Power of Shared Datasets on 5th July 2024.

Date: 5th July 2024

Time: 10:00 AM – 5:00 PM

Venue: King’s College London, Embankment Room MB-1.1.4 (Macadam Building, Strand Campus)

About the Workshop:

The study of the ancient world increasingly relies on curated datasets, emphasising the importance of data sharing and reproducibility for open research in today’s technologically interconnected world. In this context, the workshop aims to achieve two main objectives:

  1. Raise awareness of the significance of datasets, data papers, and data-sharing for Classics.
  2. Guide classicists in identifying, utilising, and sharing datasets within the scientific community.

The workshop will consist of a one-day programme featuring engaging presentations, hands-on sessions, and roundtable discussions led by experts in the field. In the morning session, our four invited speakers will explore the importance of data-sharing and present case studies of published datasets in Classics, covering linguistic and historical-geographical perspectives. This will be followed by a general discussion on data use and sharing.

Dr Mandy Wigdorowitz (University of Cambridge), Humanities has a place in the open research and data sharing ecosystem.

Paola Marongiu (University of Neuchâtel), Collecting, creating, sharing and reusing data in Classics: an overview of the best practices.

Mathilde Bru (University College London), Building and publishing a dataset as a Classicist.

Prof Claire Holleran (University of Exeter), Working with epigraphic datasets: mapping migration in Roman Hispania.

In the early afternoon, participants will engage in hands-on activities, working in groups to describe datasets and identify their potential for reuse. They are encouraged to bring their own datasets, if available, to receive feedback from both the workshop facilitators and fellow participants. Feedback will focus not only on the quality of the data itself but also on the best practices for sharing it (e.g., format, open repository, deposition process). For those who do not have their own datasets, we will provide sample datasets to familiarise themselves with various repository types and data formats. Participants will also have the opportunity to learn about different platforms for data sharing and essential elements such as creating a README file and understanding its purpose. Discussions will also cover vital aspects such as licensing options and the significance of obtaining a DOI for datasets.

Who can attend:

This workshop is open to postgraduate students, researchers, and staff members interested in Classics, regardless of their level of expertise in digital methodologies. We especially encourage participation from those with an interest in linguistics, archaeology, history, and related fields. Participants are sought from within and outside King’s College London. Preference will be given to applicants whose cover letters demonstrate that their research projects or professional pursuits would benefit from the event. We also aim to maintain a balanced representation across disciplinary backgrounds.

Registration and logistics:

Seats for this workshop are limited. To apply for participation, please email Andrea Farina and George Oliver at andrea.farina[at]kcl.ac.uk and george.oliver[at]kcl.ac.uk attaching a cover letter no longer than one page in .pdf format and writing “Data Driven Classics Registration” as the subject of your email. In your cover letter, please state your name, affiliation, position (student, PhD student, Lecturer etc.), email address, and your field in Classics (e.g., linguistics, history, etc.), and explain why you would like to attend the workshop and how it can benefit your research.

There is no registration fee for this event. However, participants are responsible for covering their travel expenses through their own institutions. The workshop will accommodate a maximum of 25 participants to ensure adequate assistance during the hands-on session.

Important dates:

Deadline to submit expression of interest with cover letter: 22nd May 2024.
Notification of acceptance: 31st May 2024.
Event: 5th July 2024.

Contact Information:

For any inquiries or further information, please contact Andrea Farina at andrea.farina[at]kcl.ac.uk or George Oliver at george.oliver[at]kcl.ac.uk.

Art x Public AI report launched

A new report investigating the potential of Public AI and its importance for artists and cultural institutions is now free to download and read online.

The AI stack

Studying Public AI, the report lays out the various layers of the AI stack, from data, to the AI models and other software components, to the natural resources that AI systems need to function.

It has been written by the Serpentine’s Arts Technologies team using research by the Creative AI Lab, a collaboration between the Serpentine and the Department of Digital Humanities, KCL, and is part of their annual Future Art Ecosystems publication series. The report is structured around the following research questions:

  • What is Public AI?
  • What technologies constitute the AI stack? 
  • How can cultural organisations exert agency within and around AI systems?
  • What are the different strategies for artists to experiment and intervene in AI systems?

Instead of approaching AI as a singular, monolithic technology driven by a small number of technology companies, the report considers how present-day AI capabilities, rooted in data, model and compute components, come into being by means of this technical stack that integrates natural resources, systems, and technologies to produce the necessary hardware and software. Viewing the technology through this composite lens makes pathways for intervention or the building of alternatives more imaginable.


The report explores a number of these divergent pathways for specific layers of the AI stack, structured around three chapters: Organisation, Artist, and Ecosystem. In chapter 1, public institutions are assessed as keepers and even stewards of important datasets. The transformation of their operational logic by the introduction of AI infrastructure is also considered.

Process for training pre-trained and fine-tuned models

The changing role of the artist becomes central in chapter 2, alongside how artists can make impactful interventions at specific layers of the AI stack. Following the suggestion from artists Holly Herndon and Mat Dryhurst that ‘all media is now training data’, the data used to train large AI models is termed ‘shadow labour’ because of the work that creating such data originally involved. Here, the role and limits of ‘intellectual property’ rights are considered. 


But artists are more empowered in the AI context than this might initially suggest. A typology is proposed that explores the different ways that artists work with tools: as ‘users’ of closed-off, ‘black boxed’ tools such as text-to-image generators like DALL-E; as ‘auditors’ who might be interested in tinkering with open source tools; and finally as ‘competitors’ who build alternatives for the open source community, or to bring to market. Finally, in chapter 3, a set of recommendations for co-ordinated action across the cultural sector is laid out.

Though not a definitive treatment of the intersection between AI, the public interest, and the cultural sector, the report Art x Public AI makes the case for the ecosystem of ‘art and advanced technologies’ as a laboratory for new approaches to building and intervening across the layers of the AI stack. 

The purpose of the report is to provide analyses, concepts and strategies for cultural organisations, artists and the broader art and advanced technologies ecosystem responding to the technosocial transformations of AI systems on culture. Based on the belief that culture is a public good, the report shows that public and non-public entities are deeply entangled in every layer of the AI stack and presents recommendations for reclaiming the public in AI and steering its development for the public good.

Launch of the Public AI report – photo by Sam Nightingale

Watch a presentation and Q&A from Reference Point, London with the briefing’s lead researchers and authors: Eva Jäger, Victoria Ivanova, and Alasdair Milne (KCL PhD student). You can also follow the work of Future Art Ecosystems by subscribing to their newsletter, here.

The Creative AI Lab & Serpentine Arts Technologies

Seminar: How does language change and variation affect our ML models? • 7 May 2024

Event organised by the Computational Humanities research group.

To register for the seminar, please fill in this form by 2 April 2024.

7 May 2024 – 3pm BST

In person – King’s College London

Remote – Via Microsoft Teams

Haim Dubossarsky (School of Electronic Engineering and Computer Science, Queen Mary University of London), How does language change and variation affect our ML models?

Abstract

Applied Machine Learning techniques, especially in the textual domain, have introduced us to the craft of transfer learning. Simply put, we can take an off-the-shelf model, fine-tune it on a curated training set specific to the task at hand, and expect performance improvements. This approach became even more promising with the emergence of multilingual models such as XLM-R or mBERT, which allow us to fine-tune on a task in language X and expect performance gains on the same task in language Y, at least in theory. However, languages exhibit diverse behaviours in different contexts, and fine-tuning a model for a specific task may degrade its performance in slightly different linguistic contexts that were not initially considered.

Furthermore, cross-lingual transfer learning relies heavily on assumptions about underlying linguistic factors shared between languages, many of which have not been thoroughly tested. In this talk, Dr Haim Dubossarsky will focus on two recent works that highlight the limitations of the common approach in modern ML applications. The first demonstrates how linguistic input perturbations, stemming from language change due to reclaimed language, significantly impede the performance of hate speech detection models. In the second work, Dr Haim Dubossarsky will illustrate how multilingual language models fail to transfer from English to Hindi in a polysemy detection task, despite the promise of multilingual support. He will then propose potential solutions to these challenges.

Bio

Dr Haim Dubossarsky is a Lecturer in the School of Electronic Engineering and Computer Science at Queen Mary University of London, an Affiliated Lecturer in the Language Technology Lab at the University of Cambridge, and a recently appointed Turing Fellow. His research focuses on Natural Language Processing (NLP) and Artificial Intelligence (AI), with a particular emphasis on the intersection of linguistics, cognition, and neuroscience.

His work has made significant contributions to the emerging field of computational semantic change and has delved into investigating the societal impact and biases of modern NLP tools. Haim employs advanced mathematical and computational methods across disciplines, enriching research and pushing the boundaries of knowledge in NLP and related fields. His interdisciplinary approach often uncovers novel research questions that were previously inaccessible through more traditional methods.

Seminar: Examining temporality in historical photographs • 9 April 2024

Event organised by the Computational Humanities research group.

To register for the seminar, please fill in this form by 2 April 2024.

9 April 2024 – 3pm BST

Remote – Via Microsoft Teams

Melvin Wevers (University of Amsterdam), Examining temporality in historical photographs

Abstract

In this talk, Dr Melvin Wevers explores the capacity of computer vision models to discern temporal information in visual content, focusing specifically on historical photographs. He investigates the dating of images using OpenCLIP, an open-source implementation of CLIP, a multi-modal language and vision model. The experiment consists of three steps: zero-shot classification, fine-tuning, and analysis of visual content. Dr Melvin Wevers is currently expanding this study with a deeper examination of the role that people and their fashion play in conveying temporal information. During the talk, he will showcase the most recent results and run through the codebase used for this study.
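For readers unfamiliar with zero-shot classification, the first step mentioned in the abstract, the Python sketch below shows the basic idea behind CLIP-style models: an image embedding is compared with the embeddings of several text prompts, and the closest prompt is taken as the prediction. The three-dimensional vectors and the prompt wording are made-up placeholders, not real OpenCLIP outputs, and the sketch does not reproduce the speaker's codebase.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_label(image_embedding, prompt_embeddings):
    """Return the prompt whose embedding is most similar to the image embedding."""
    return max(
        prompt_embeddings,
        key=lambda prompt: cosine(image_embedding, prompt_embeddings[prompt]),
    )


# Placeholder embeddings; a real model would produce vectors with hundreds of dimensions.
prompt_embeddings = {
    "a photo from the 1950s": [0.9, 0.1, 0.0],
    "a photo from the 1970s": [0.1, 0.9, 0.1],
    "a photo from the 1990s": [0.0, 0.2, 0.9],
}
image_embedding = [0.2, 0.8, 0.1]

predicted = zero_shot_label(image_embedding, prompt_embeddings)
```

No dating-specific training is needed for this step, since the model only ranks the candidate prompts. Fine-tuning, the second step in the abstract, would then adapt the embeddings themselves to the dating task.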

Bio

Dr Melvin Wevers is an Assistant Professor in Digital History at the University of Amsterdam. His research focuses on the application of computational methods to model historical processes. He does this by combining insights from the philosophy of history with the affordances of modelling techniques, such as time series analysis, Bayesian statistics, deep/machine learning, and information theory. Dr Melvin Wevers is also one of the founding members of the Computational Humanities Research Conference.