Daidalos: NER for Literary Studies on Latin and Ancient Greek Texts

  • Andrea Beyer (Author)

Abstract

Literary texts offer a wealth of unstructured data that can be harnessed for data-driven text analysis through Natural Language Processing (NLP). Named Entity Recognition and Classification (NER) is a crucial initial step in this process, enabling the automatic identification of entities such as persons, organizations, locations, and dates. However, NER faces significant challenges, particularly with historical texts in low-resource languages like Latin and Ancient Greek, due to limited annotated corpora and the dynamic nature of language. This paper explores the evolution of NER from simple extraction to semantics-aware entity disambiguation and linking, highlighting the importance of multi-layer annotation systems for improving data quality and model accuracy. The interdisciplinary Daidalos project aims to bridge the gap between Digital Humanities and Classical Studies by providing an NLP infrastructure that supports various data-driven research methods, including NER. One of the project’s case studies demonstrates the potential of NER in Classical literary studies; it is accompanied by proposals for other NER-related literary research questions, e.g. on authorship attribution and stereotyping. Additionally, the paper offers some thoughts on teaching NER and presents a framework for assessing the level of digital literacies required to work on a specific research question. Finally, it discusses the implications of generative AI and Large Language Models (LLMs) for NER and NLP in Classics, emphasizing the challenges that the high costs and limited transparency of LLMs pose for independent research.

Published
2026-05-08
Language
English
Keywords
Named Entity Recognition and Classification, Natural Language Processing, Large Language Models, AI