LiLa: Linking Latin

Building a Knowledge Base of Linguistic Resources for Latin

Screenshot of an ScG treebanked sentence

Summa contra Gentiles treebank now complete!

It’s with great pride that we announce that the treebank annotation of the Summa contra Gentiles is now complete! The result of almost ten years of manual work, the annotated Summa contra Gentiles brings the total number of annotated nodes in the Index Thomisticus Treebank (IT-TB) to 460,000, corresponding to over 26,000 sentences.

The latest CoNLL and PML files are available for download from the IT-TB website. Next steps include:

  1. The conversion of the fourth and final book of the Summa contra Gentiles to the Universal Dependencies (UD) format; the first three books can be downloaded from the UD website.
  2. The retraining of the IT-TB parsing pipeline, currently available via the Treex::Web platform. This Latin pipeline performs tokenisation, PoS tagging, morphological analysis, dependency parsing and basic semantic role labelling.

So watch this space for updates!
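For readers who want to check sentence and node counts on a downloaded file themselves, here is a minimal Python sketch. It assumes a generic CoNLL-style layout (one token per line, tab-separated columns, a blank line between sentences, optional `#` comment lines); the exact column inventory of the IT-TB files is not described here, so the function names and the three-word sample below are purely illustrative and not taken from the treebank.

```python
from typing import Iterable, Iterator, List, Tuple


def read_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of token rows (lists of column values)."""
    sentence: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment lines (used e.g. in CoNLL-U) carry no tokens.
            continue
        sentence.append(line.split("\t"))
    if sentence:
        # Handle a final sentence that is not followed by a blank line.
        yield sentence


def count_sentences_and_nodes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sentences, number of token rows)."""
    n_sentences = 0
    n_nodes = 0
    for sentence in read_sentences(lines):
        n_sentences += 1
        n_nodes += len(sentence)
    return n_sentences, n_nodes


# Invented sample: only ID, FORM and LEMMA are filled in; every other
# column is left as "_" because no real annotation is being reproduced.
def _row(idx: int, form: str, lemma: str) -> str:
    return "\t".join([str(idx), form, lemma] + ["_"] * 7)


SAMPLE = "\n".join([
    _row(1, "Deus", "deus"),
    _row(2, "est", "sum"),
    _row(3, "bonus", "bonus"),
    "",
    _row(1, "veritas", "veritas"),
    _row(2, "manet", "maneo"),
    "",
])

if __name__ == "__main__":
    print(count_sentences_and_nodes(SAMPLE.splitlines()))  # prints (2, 5)
```

To run it on a real file, pass an open file handle instead of the sample, e.g. `count_sentences_and_nodes(open(path, encoding="utf-8"))`. Every non-comment row counts as one node here, so the totals may differ from the published figures if a file contains multiword-token range rows or if the official count treats technical root nodes differently.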

As work commences on the annotation of the Summa Theologiae (!), we wish to extend our warmest congratulations to our colleagues Marinella Testori and Marco Passarotti for this incredible achievement!