LiLa: Linking Latin

Building a Knowledge Base of Linguistic Resources for Latin


Three papers accepted at CLiC-it 2021

The LiLa team has three papers accepted at the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021):

  1. Linking the Lewis & Short Dictionary to the LiLa Knowledge Base of Interoperable Linguistic Resources for Latin. This paper describes the steps taken to include data from the Lewis & Short bilingual Latin-English dictionary in LiLa, the Knowledge Base of linguistic resources for Latin. First, data were extracted from the original XML and matched with entries in LiLa, resolving ambiguities and structural inconsistencies in the source. Subsequently, senses were modelled using the OntoLex-Lemon Lexicography module (lexicog), so that they can be included in the LiLa Knowledge Base and thus made interoperable with the (meta)data of the linguistic resources for Latin interlinked there.
  2. The Annotation of Liber Abaci, a Domain-Specific Latin Resource. The Liber Abaci (13th century) is a milestone in the history of mathematics and accounting. Owing to its late stage of Latin, its linguistic features and its highly specialized content, it also represents a unique resource for scholars working on Latin corpora. In this paper we present the annotation and linking work carried out within the project Fibonacci 1202-2021. A gold standard lemmatization and part-of-speech tagging allowed us to make initial observations on the linguistic and historical features of the text and to link the text to the LiLa Knowledge Base, which aims to make distributed linguistic resources for Latin interoperable by following the principles of the Linked Data paradigm. Starting from this specific case, we discuss the importance of annotating and linking scientific and technical texts in order to (a) compare and search them together with other (non-technical) Latin texts and (b) train, apply and evaluate NLP resources on a non-standard variety of Latin. The paper also describes the fruitful interaction and coordination between NLP experts and traditional Latin scholars on a project requiring a wide range of expertise.
  3. Sentiment Analysis of Latin Poetry: First Experiments on the Odes of Horace. In this paper we present a set of annotated data and the results of a number of unsupervised experiments on the analysis of sentiment in Latin poetry. More specifically, we describe a small gold standard made up of eight poems by Horace, in which each sentence is manually labeled for sentiment using a four-value classification (positive, negative, neutral and mixed). We then report on how this gold standard has been used to evaluate two automatic approaches to sentiment classification: one is lexicon-based and the other adopts a zero-shot transfer approach.
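
To give a rough idea of what a lexicon-based approach with the four labels mentioned in the third paper could look like, here is a minimal Python sketch. It is not the method used in the paper: the lexicon entries, their scores, the function name `classify_sentence` and the decision rule (a sentence is "mixed" when it contains both positive and negative lemmas) are all assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical toy lexicon: lemma -> polarity score in [-1, 1].
# Entries and scores are illustrative only; they are not taken from
# any published sentiment lexicon for Latin.
TOY_LEXICON: Dict[str, float] = {
    "amor": 1.0,
    "laetus": 1.0,
    "dulcis": 0.5,
    "mors": -1.0,
    "tristis": -1.0,
    "dolor": -0.5,
}


def classify_sentence(lemmas: List[str],
                      lexicon: Dict[str, float] = TOY_LEXICON) -> str:
    """Assign one of four labels to a lemmatized sentence.

    Assumed rule (not from the paper): lemmas missing from the lexicon
    are ignored; positive and negative lemmas together give "mixed".
    """
    scores = [lexicon[lemma] for lemma in lemmas if lemma in lexicon]
    has_positive = any(score > 0 for score in scores)
    has_negative = any(score < 0 for score in scores)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"
```

The function expects lemmas rather than inflected forms, so a call such as `classify_sentence(["amor", "et", "mors"])` presupposes that the sentence has already been lemmatized.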