CIRCSE Resources in CLARIN-IT
A set of linguistic resources for Latin, developed at the CIRCSE research center and within the LiLa project, is now available in a dedicated collection of the ILC4CLARIN repository of CLARIN-IT: https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/handle/000-c0-111/525
The resources currently available are:
- LiLa Lemma Bank: large collection of Latin lemmas, each described with grammatical and morphological information;
- Index Thomisticus Treebank: analytical and tectogrammatical annotation of a portion of the Index Thomisticus corpus;
- Latin Vallex v.1: valency lexicon;
- LatinAffectus: prior polarity lexicon;
- Index Graecorum Vocabulorum in Linguam Latinam: manually corrected OCR of G.A. Saalfeld’s 1874 list of Latin loans from Ancient Greek;
- Word Formation Latin: derivational morphology lexicon;
- EvaLatin 2020 Data: training and gold test data for lemmatizers and PoS taggers;
- The Etymological Dictionary of Latin and the other Italic Languages: collection of Proto-Italic and Proto-Indo-European reconstructed forms.