
Data Collection and Management Workshop

Date & Time

December 17-19, 2018


Palacký University, Křižkovského 10, block C, auditorium 3.05 & Univerzitní 3, room 225

About the workshop

This workshop deals with data collection and management practices within the Sinophone Borderlands project. The first aim is to present the methodologies to other project members, so that parallel data can be collected at multiple field sites, leading to interdisciplinary research within the project. The second aim is to take stock of current and best data management practices and to finalize the data management policy of the Sinophone Borderlands project.


Poetry and song are inextricably interwoven in most indigenous Australian traditions. Yet the poetic masterpieces found across the continent are little known outside their immediate communities, tied up as they are with the intricacies of the languages they are sung in. As a result, Australia has little awareness of the many hundreds of Shakespeares, Keatses, and Bob Dylans whose poetic masterpieces are composed in First Nations languages. The same goes for the continent’s rich and varied indigenous musical traditions. In this talk I will seek to give a glimpse into the richness of the poetic language found across a number of north Australian communities I have worked in, focussing on allusive subtlety, inner feeling, multilingual characterisation, the deployment of vocabulary and grammar for expressive nuance, and the role of song in maintaining language knowledge through the powerful emotional charge it generates.



Nicholas (Nick) Evans, ARC Laureate Fellow and Distinguished Professor of Linguistics at the Australian National University, directs the Australian Research Council Centre of Excellence for the Dynamics of Language (CoEDL). He has carried out wide-ranging fieldwork on indigenous languages of Australia and Papua New Guinea. The driving interest of his work is the interplay between documenting and describing the far-reaching diversity contained in the world’s endangered languages and the many humanistic and scientific questions they can help us answer.

In addition to book-length grammars and dictionaries of several Aboriginal languages (Kayardild, Bininj Gun-wok, Dalabon) and edited collections on numerous linguistic topics, he has published over 180 scientific papers. His crossover book Dying Words: Endangered Languages and What They Have to Tell Us, which sets out a broad program for engaging with the world’s dwindling linguistic diversity, has been translated into French, Japanese, Korean and German, with a Chinese translation soon to appear.

He has also worked as a linguist, interpreter and anthropologist in two Native Title claims in northern Australia, and as a promoter of Aboriginal art.

Nick is a member of the Australian Academy of the Humanities and the Australian Social Sciences Academy, a corresponding member of the British Academy, and a recipient of the inaugural Anneliese Maier Forschungspreis from the Alexander von Humboldt Foundation / German Ministry of Science and Education and of the Ken Hale Award from the Linguistic Society of America.

Dr. Johannes Dellert (University of Tübingen, Germany) is a computational linguist who currently works as a lecturer and researcher at the Department of Linguistics in Tübingen. His current research focuses on the development of novel approaches to the interpretation of non-standard language (as part of Prof. Detmar Meurers’ ICALL research group), and on developing new interactive tools for historical linguistics as a continuation of his PhD work in Prof. Gerhard Jäger’s group. As part of the latter, Dr. Dellert is also the main contributor and coordinator for the NorthEuraLex database.

Dr. Robert Forkel (Max Planck Institute for the Science of Human History, Jena, Germany) leads the data management group of the Department for Linguistic and Cultural Evolution at MPI SHH.

Luís Morgado Da Costa (Nanyang Technological University, Singapore) is a cognitive scientist with a wide range of interests, currently focusing his work on computational linguistics. He is a PhD student at the Interdisciplinary Graduate School, Nanyang Technological University (NTU), in Singapore. Before that, he was a research associate in the Computational Linguistics Lab, Division of Linguistics and Multilingual Studies, also at NTU, working on several projects ranging from Natural Language Parsing and Generation, Computational Lexicography and Computer Assisted Language Learning to general Mandarin Chinese and Japanese Linguistics.

The main focus of his current research is to model diverse aspects of linguistic knowledge in ways that can be applied to different tasks (e.g. Machine Translation, Word Sense Disambiguation, Computer Assisted Language Learning). He works mainly with English and Mandarin Chinese, but has also done work with other languages such as Japanese, Kristang, Portuguese, Indonesian, Coptic and Abui.

He is a member of DELPH-IN, sharing the communal commitment to develop open source NLP tools and resources for deep linguistic processing of natural languages, a member of the Global Wordnet Association, and a member of NTU’s Digital Humanities Cluster.

Dr. David Moeljadi (Nanyang Technological University, Singapore and soon Sinophone Project, Olomouc) graduated from Nanyang Technological University in Singapore in 2018 and works as a data scientist at Traveloka Services Pte. Ltd. in Singapore, a leading Indonesian online travel company in Southeast Asia. His research covers computational linguistics, grammar engineering, treebanking, corpora, dictionaries and lexicography. He built an open-source computational grammar for Indonesian called INDRA (Indonesian Resource Grammar), which can parse and generate sentences, and a treebank called JATI. He develops Wordnet Bahasa, a large-scale open-source semantic dictionary of the Malay languages (Malaysian and Indonesian). He collaborates with the Language Development and Cultivation Agency, under the Ministry of Education and Culture of the Republic of Indonesia, and created and develops a database for the authoritative Indonesian dictionary KBBI (Kamus Besar Bahasa Indonesia) and a database for loanwords in Indonesian. He collaborates with The CJK Dictionary Institute, Inc. in Japan and works as the chief translator of The Kanji Learner’s Dictionary: Indonesian Edition. Together with researchers from Tokyo University of Foreign Studies, he worked on a project to build an open-source morphological dictionary and analyser for Malay/Indonesian (MALINDO Morph), and on a project to build an Asian language parallel corpus (TALPCo). David is a member of the Indonesian Association for Lexicography and a member of the Deep Linguistic Processing with HPSG (DELPH-IN) research consortium.


Day 1 – December 17, 2018
14:30, Křižkovského 10, block C, auditorium 3.05

Public lecture by Professor Nicholas Evans (ANU)

Lecture title: Waving to the other side: the language of poetry in indigenous Australian song

Day 2 – December 18, 2018: Data collection scope, techniques and manual
Univerzitní 3, room 225

Session 1: Data collection (language, culture, artifacts)
9.30 Coffee time
9.45 František Kratochvíl: Linguistic data collection techniques and beyond (visuals)
10.30 Volker Gast: Parallel texts in research
11.15 Olaf Günther and Tereza Hejzlarová: Anthropological data collection and artifacts
Session 2: Data collection (culture, knowledge systems)
13.30 Martin Soukup: Anthropological data collection techniques
14.15 Ondřej Kučera, Dan Faltýnek, Kateřina Šamajová, Renata Čižmárová: Ethnobotanical data collection
15.00 Alfred Gerstl: Narratives of political influence
Coffee & Discussion: Sinophone Borderlands Data Collection Manual contents

Evening: Dinner

Day 3 – December 19, 2018: Data management and mining
Univerzitní 3, room 225

Session 3: Data management (language databases)
9.30 Coffee time
9.45 Johannes Dellert (Jena): NorthEuraLex
10.30 Robert Forkel (MPI): CLLD, D-PLACE and Archeological Data
11.15 Luís Morgado Da Costa (NTU): Digital humanities tools for dummies
Session 4: Data management (culture, history)
13.30 Olaf Günther & Tereza Hejzlarová: Anthropological and cultural data management
14.15 Martin Soukup: Anthropological data management
15.00 Nicholas Evans: Ethical issues: property rights and repatriations – Australian perspective
Final Discussion, Coffee and Conclusion