Semantic Annotation for Digital Humanities (F-AG 7)

Project content

In the first phase of CLARIN-D, two curation projects were conducted: "Implementation of a web-based annotation platform (WebAnno)" and "Development of guidelines and best practices for annotation of non-standard varieties of German". The new curation project "Semantic Annotation for Digital Humanities" aims to consolidate the successful work of these projects and to extend it in new directions, with a focus on semantic annotation for Digital Humanities (DH). It is divided into three work packages:

A. Consolidation and further development of WebAnno for practical use in DH projects

To better support semantic annotation layers as well as user-defined annotations, WebAnno will be extended with the following functionality:

  • Template-based annotations – filling predefined elements (slots) in predicate-argument structure annotation, or in event annotation;

  • Constraints – context-based restrictions on the annotations that can be assigned to a target element.
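To make the two planned features more concrete, here is a minimal Python sketch of how a slot template and a context-based constraint might interact. It is not WebAnno's data model or its constraint language. The classes `Token` and `PredicateAnnotation`, the functions `annotate_predicate`, `fill_slot` and `open_slots`, and the toy inventories `TEMPLATES` and `SENSES_BY_LEMMA` are all hypothetical. The frame and role names are FrameNet-style labels used only for illustration. In this sketch the constraint restricts which sense label a predicate may receive, depending on its lemma, and the template for that sense then determines which slots the annotator fills.

```python
from dataclasses import dataclass, field

# Toy inventories (hypothetical, for illustration only).
# Template: the predefined slots (roles) offered for each predicate sense.
TEMPLATES = {
    "Commerce_buy": ("Buyer", "Goods", "Seller"),
    "Commerce_sell": ("Seller", "Goods", "Buyer"),
}

# Constraint: sense labels allowed for a target, depending on its context (here: its lemma).
SENSES_BY_LEMMA = {
    "kaufen": {"Commerce_buy"},
    "verkaufen": {"Commerce_sell"},
}


@dataclass
class Token:
    text: str
    lemma: str


@dataclass
class PredicateAnnotation:
    target: Token
    sense: str
    slots: dict = field(default_factory=dict)  # slot name -> filler Token


def annotate_predicate(target: Token, sense: str) -> PredicateAnnotation:
    """Create a predicate annotation, rejecting senses the constraint rules out."""
    allowed = SENSES_BY_LEMMA.get(target.lemma, set())
    if sense not in allowed:
        raise ValueError(f"sense {sense!r} is not allowed for lemma {target.lemma!r}")
    return PredicateAnnotation(target, sense)


def fill_slot(pred: PredicateAnnotation, slot: str, filler: Token) -> None:
    """Fill one of the predefined slots of the predicate's template."""
    if slot not in TEMPLATES[pred.sense]:
        raise ValueError(f"{slot!r} is not a slot of {pred.sense!r}")
    pred.slots[slot] = filler


def open_slots(pred: PredicateAnnotation) -> list:
    """Return the template slots that have not been filled yet."""
    return [s for s in TEMPLATES[pred.sense] if s not in pred.slots]


# "Anna kauft ein Buch." ("Anna buys a book.")
kauft = Token("kauft", "kaufen")
pred = annotate_predicate(kauft, "Commerce_buy")
fill_slot(pred, "Buyer", Token("Anna", "Anna"))
fill_slot(pred, "Goods", Token("Buch", "Buch"))
print(open_slots(pred))  # ['Seller']
```

Here the constraint is checked when a label is assigned rather than in a later validation pass. This is one plausible design choice: annotators are only ever offered labels that are valid in context, and the remaining open slots show them what is still missing.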

The new functionality will be developed in close interaction with the cooperation partners, who will act as active users.

To disseminate WebAnno within the community, it will be integrated into the CLARIN infrastructure and offered as a CLARIN service.

B. Curation of resources for semantic annotation and further annotation of the NoSta-D corpus

The aim of work package B is to develop a prototype of linked lexical semantic resources for German (including a representation as Linked Open Data, LOD) and a robust scheme for annotating concepts and predicate-argument structures, in order to capture concepts and events in DH projects. To this end, the curation project focuses on the following tasks:

  1. Linking existing (GermaNet, SALSA) and newly developed (UBY) lexical semantic resources for German following the model of the Unified Verb Index.

  2. Exploring guidelines and annotation formats for word sense disambiguation (WSD, similar to OntoNotes) and semantic role labeling (SRL, in the style of FrameNet and VerbNet). Selected non-standard corpora will be annotated according to these schemes.
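Linking in the style of the Unified Verb Index can be pictured as a per-lemma index over several resources, plus sense-level links between their entries. The Python sketch below assumes exactly that simplified picture. The entry IDs, the links, the `BASE` namespace and all function names are hypothetical placeholders. They are not actual GermaNet, SALSA or UBY identifiers. `skos:closeMatch` is used only as one illustrative link property for an LOD export, not as the vocabulary chosen by the project.

```python
from collections import defaultdict

# Hypothetical toy data: the resource names are real, but the entry IDs are
# invented placeholders, not identifiers from GermaNet, SALSA or UBY.
ENTRIES = [
    ("GermaNet", "kaufen", "gn_kaufen_1"),
    ("SALSA", "kaufen", "salsa_kaufen_Commerce_buy"),
    ("UBY", "kaufen", "uby_kaufen_1"),
    ("GermaNet", "verkaufen", "gn_verkaufen_1"),
]

# Hypothetical sense-level links between entries of different resources.
LINKS = [
    ("gn_kaufen_1", "salsa_kaufen_Commerce_buy"),
    ("salsa_kaufen_Commerce_buy", "uby_kaufen_1"),
]

BASE = "http://example.org/lexicon/"  # placeholder namespace for the LOD export
CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"


def build_lemma_index(entries):
    """Group entries by lemma, then by resource (one 'row' per verb, as in a unified index)."""
    index = defaultdict(lambda: defaultdict(list))
    for resource, lemma, entry_id in entries:
        index[lemma][resource].append(entry_id)
    return index


def linked_entries(start, links):
    """Return all entries reachable from `start` by following links in either direction."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for neighbour in graph[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def links_to_ntriples(links):
    """Serialise each link as one N-Triples line using skos:closeMatch."""
    return [f"<{BASE}{a}> <{CLOSE_MATCH}> <{BASE}{b}> ." for a, b in links]


index = build_lemma_index(ENTRIES)
print(dict(index["kaufen"]))
print(sorted(linked_entries("gn_kaufen_1", LINKS)))
# ['gn_kaufen_1', 'salsa_kaufen_Commerce_buy', 'uby_kaufen_1']
print(links_to_ntriples(LINKS)[0])
```

Following links transitively, as `linked_entries` does, lets a user start from a GermaNet entry and reach the corresponding SALSA and UBY entries. A fuller implementation would likely also record the provenance and confidence of each link, which this sketch omits.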

C. Supporting shared tasks for German on selected annotation types

Together with the national organizations GSCL and DGfS-CL, we will support shared-task initiatives for various annotation types. The first editions of the shared tasks on Named Entity Recognition (NER) and Sentiment Tagging were successfully conducted at KONVENS 2014. A further shared task on PoS tagging of language data from internet-based communication is being supported by GSCL. Shared tasks that the curation project might support include dependency parsing for non-standard language varieties (building on curation project 2) and the analysis of German compounds.

Project duration

  • 01.03.2015 – 29.02.2016

Applicants

Responsible institutions

  • Institut für Computerlinguistik, Universität Heidelberg

  • Fachbereich Informatik, Technische Universität Darmstadt

Project management

  • Silvana Hartmann

  • Eva Mujdricza-Maydt

  • Seid Muhie Yimam

Cooperation partners

  • Prof. Dr. Philipp Cimiano, Universität Bielefeld

  • Prof. Dr. Stefanie Dipper, Universität Bochum

  • Prof. Dr. Gerhard Heyer, Universität Leipzig

  • Prof. Dr. Anke Lüdeling, Universität Berlin

  • Prof. Bolette Sandford Pedersen, Universität Kopenhagen

  • Prof. Dr. Angelika Storrer, Universität Mannheim

  • CLARIN-D-Zentrum Tübingen (Prof. Dr. Erhard Hinrichs)

  • CLARIN-D-Zentrum Hamburg: CLARIN-D Helpdesk

Project website / references

  • http://www.cl.uni-heidelberg.de/projects/clarin-d/activities.mhtml

  • https://www.lt.informatik.tu-darmstadt.de/de/research/clarin-d-webanno-webbased-annotation-tool-for-linguistic-annotations/

  • Bonial, C., Stowe, K. & Palmer, M. (2013): Renewing and Revising SemLink. In: Proc. of LDL-2013: Representing and linking lexicons, terminologies and other language data, pp. 9-17.

  • Burchardt, A., Erk, K. & Frank, A. (2005): A WordNet Detour to FrameNet. In: Proc. of the GLDV 2005 GermaNet II Workshop, pp. 408-421.

  • Burchardt, A., Erk, K., Frank, A., Kowalski, A., Padó, S. & Pinkal, M. (2009): Using FrameNet for the Semantic Analysis of German: Annotation, Representation and Automation. In: Boas, H. C. (ed.): Multilingual FrameNets in Computational Lexicography: Methods and Applications, pp. 209-244. Mouton de Gruyter.

  • Burchardt, A., Padó, S., Spohr, D., Frank, A. & Heid, U. (2008): Constructing Integrated Corpus and Lexicon Models for Multi-Layer Annotations in OWL DL. Linguistic Issues in Language Technology, 1, pp. 1-33.

  • Cholakov, K., Eckle-Kohler, J. & Gurevych, I. (2014): Automated Verb Sense Labelling Based on Linked Lexical Resources. In: Proc. of EACL 2014, pp. 68-77.

  • Eckart de Castilho, R., Biemann, C., Gurevych, I. & Yimam, S. M. (2014): WebAnno: a flexible, web-based annotation tool for CLARIN. In: Proc. of the CLARIN Annual Conference (CAC) 2014, Soesterberg, Netherlands.

  • Fürstenau, H. & Lapata, M. (2012): Semi-supervised Semantic Role Labeling via Structural Alignment. Computational Linguistics, 38(1), pp. 135-171.

  • Gurevych, I., Eckle-Kohler, J., Hartmann, S., Matuschek, M., Meyer, C. M. & Wirth, C. (2012): UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. In: Proc. of EACL 2012, pp. 580-590.

  • Hartmann, S. & Gurevych, I. (2013): FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proc. of ACL 2013, pp. 1363-1373.

  • Palmer, M. (2009): Semlink: Linking PropBank, VerbNet and FrameNet. In: Proc. of the Generative Lexicon Conference (GenLex-09).

  • Yimam, S. M., Eckart de Castilho, R., Gurevych, I. & Biemann, C. (2014): Automatic Annotation Suggestions and Custom Annotation Layers in WebAnno. In: Proc. of ACL 2014 (demo session), Baltimore, MD, USA.

  • Yimam, S. M., Gurevych, I., Eckart de Castilho, R. & Biemann, C. (2013): WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations. In: Proc. of ACL 2013 (demo session), Sofia, Bulgaria.