Sinophone Borderlands Data Collection and Management Workshop

Petra Vaculíková, 10 Oct 2018


Dates:  17-19 December 2018
Location: Univerzitní 3, Olomouc, room 225 

This workshop deals with data collection and management practices within the Sinophone Borderlands project. The first aim is to present the methodologies to other project members so that parallel data can be collected at multiple field sites, enabling interdisciplinary research within the project. The second aim is to take stock of current and best data management practices and to finalize the data management policy of the Sinophone Borderlands project.

The workshop takes place over three days, starting with a public lecture on Monday 17 December, followed by a two-day workshop dedicated to data collection and management.

17 December 2018: Public Lecture by Professor Nicholas Evans (ANU)

Lecture title: Waving to the other side: the language of poetry in indigenous Australian song
Poetry and song are inextricably interwoven in most indigenous Australian traditions. Yet the poetic masterpieces found across the continent are little known outside their immediate communities, tied up as they are with the intricacies of the languages they are sung in. As a result, Australia has little awareness of the many hundreds of Shakespeares, Keatses, and Bob Dylans whose poetic masterpieces are composed in First Nations languages. The same goes for the continent’s rich and varied indigenous musical traditions. In this talk I will seek to give a glimpse into the richness of the poetic language found across a number of north Australian communities I have worked in, focussing on allusive subtlety, inner feeling, multilingual characterisation, the deployment of vocabulary and grammar for expressive nuance, and the role of song in maintaining language knowledge through the powerful emotional charge it generates.

I take the title of my talk from some lines of a Mayali song by the late and great Djorli Laywanga, a Dalabon songman: Kurebe ngadjowkke ngawayudwayudme, marrek berlnayiii, marrek nuk berlnayiii. ‘From the other side of the river I am waving, I couldn’t see your arm waving back, Maybe I missed your arm waving’. I hope that the close readings of several poetic masterpieces that I will undertake during the lecture will help span what we see and hear across the river.

Nicholas (Nick) Evans, ARC Laureate Fellow and Distinguished Professor of Linguistics at the Australian National University, directs the Australian Research Council Centre of Excellence for the Dynamics of Language (CoEDL). He has carried out wide-ranging fieldwork on indigenous languages of Australia and Papua New Guinea. The driving interest of his work is the interplay between documenting and describing the far-reaching diversity contained in the world’s endangered languages and the many humanistic and scientific questions they can help us answer.

In addition to book-length grammars and dictionaries of several Aboriginal languages (Kayardild, Bininj Gun-wok, Dalabon) and edited collections on numerous linguistic topics, he has published over 180 scientific papers. His crossover book Dying Words: Endangered Languages and What They Have to Tell Us, which sets out a broad program for engaging with the world’s dwindling linguistic diversity, has been translated into French, Japanese, Korean and German, with a Chinese translation soon to appear.

He has also worked as a linguist, interpreter and anthropologist in two Native Title claims in northern Australia, and as a promoter of Aboriginal art.

Nick is a member of the Australian Academy of the Humanities and the Australian Social Sciences Academy, a corresponding member of the British Academy, and a recipient of the inaugural Anneliese Maier Forschungspreis from the Alexander von Humboldt Foundation / German Ministry of Science and Education and of the Ken Hale Award from the Linguistic Society of America.

18 December 2018: Data collection techniques and scope

Location: Univerzitní 3, Olomouc, room 225 
Session 1: Data collection (language, culture, policy)
9.00am: Opening

Most of the world’s 7,000 languages are no longer being learned by children, and by the end of this century it is likely that three quarters of them will be lost. Every few weeks an old person is buried – in the book and volume of their brain was the last and often unsuspected repository of an entire language and the knowledge it enfolds.

This talk asks what we lose when we bury such a person, and what we can do to bring as much of their knowledge as possible into a durable form that can be passed on to future generations. Drawing on fragile minority languages from around the world, it examines some of the key areas of knowledge that will be lost with language death: knowledge of the natural world, of the possibilities of language and the human mind, of deep history, and of how to decipher ancient scripts. I conclude by asking what we can do to safeguard the world’s rich linguistic heritage.

In this talk I will discuss the inclusion of etymological information, especially for Sinitic loanwords, in the fifth edition of the Indonesian dictionary KBBI (Kamus Besar Bahasa Indonesia). The fifth edition is the latest edition of KBBI, the most comprehensive and authoritative monolingual Indonesian dictionary. Unlike previous editions, it is built on a database, is mainly online-based (https://kbbi.kemdikbud.go.id), is updated more regularly, and will be enriched with etymological information from 2019. This information is valuable for a language like Indonesian, which contains loanwords from languages belonging to different families, such as Austronesian (Old Javanese), Indo-European (Sanskrit, Persian, Portuguese, Dutch, English), Dravidian (Tamil), Semitic (Arabic), and Sinitic (Hokkien, Cantonese, Mandarin). The richness of loanwords in Indonesian reflects its speakers’ contact with speakers from other nations and has shaped the modern Indonesian language. The Sinitic sources in particular span various languages and dialects: Hokkien (Amoy/Xiamen dialect 廈門話, Chiangchiu/Zhangzhou dialect 漳州話, Foochow/Fuzhou dialect 福州話, Teochew/Chaozhou dialect 潮州話, Tsoanchiu/Quanzhou dialect 泉州話), Cantonese 廣東話, Hakka 客家話, Wu (Ningbo dialect 寧波話) and Mandarin. Data collection is based on the headwords in KBBI fifth edition: etymological information from reliable sources is added to the selected words. Beyond data collection, the technical work needed to include etymological information involves programming and restructuring the KBBI database. The existing database will be augmented with etymology-related tables containing the original scripts of the loanwords and the tree relationships among them, as well as between them and the entries (words) in KBBI, enabling the KBBI online page to present etymological trees of the loanwords accurately.
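The idea of etymology tables that store original scripts and tree relationships can be sketched roughly as follows. This is only an illustration under assumptions: the table names, column names and placeholder rows are hypothetical and are not taken from the actual KBBI database.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not KBBI's real tables.
SCHEMA = """
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    headword TEXT NOT NULL
);
CREATE TABLE etymon (
    id INTEGER PRIMARY KEY,
    language TEXT NOT NULL,
    form TEXT NOT NULL,
    script TEXT,                              -- form in its original script, if any
    parent_id INTEGER REFERENCES etymon(id)   -- tree relationship between etyma
);
CREATE TABLE entry_etymon (
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    etymon_id INTEGER NOT NULL REFERENCES etymon(id),
    PRIMARY KEY (entry_id, etymon_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entry VALUES (1, 'headword_a')")
    conn.executemany(
        "INSERT INTO etymon VALUES (?, ?, ?, ?, ?)",
        [
            (1, "source_language", "form_1", "script_1", None),
            (2, "intermediate_language", "form_2", None, 1),
        ],
    )
    conn.execute("INSERT INTO entry_etymon VALUES (1, 2)")
    conn.commit()
    return conn


def etymology_chain(conn, headword):
    """Return (language, form, script) rows from the direct source back to the root."""
    return conn.execute(
        """
        WITH RECURSIVE chain(id, language, form, script, parent_id, depth) AS (
            SELECT e.id, e.language, e.form, e.script, e.parent_id, 0
            FROM etymon e
            JOIN entry_etymon ee ON ee.etymon_id = e.id
            JOIN entry en ON en.id = ee.entry_id
            WHERE en.headword = ?
            UNION ALL
            SELECT p.id, p.language, p.form, p.script, p.parent_id, c.depth + 1
            FROM etymon p
            JOIN chain c ON p.id = c.parent_id
        )
        SELECT language, form, script FROM chain ORDER BY depth
        """,
        (headword,),
    ).fetchall()
```

Calling `etymology_chain(build_demo_db(), "headword_a")` walks from the entry's direct source up to the root of its tree, which is the kind of traversal an online page would need in order to display an etymological tree.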

10.15am: Coffee break

The lecture will examine the possibilities of field research as the principal methodological strategy of social and cultural anthropology. Special attention will be paid to innovative approaches to fieldwork, with an emphasis on what visual anthropology and the anthropology of space (spatial anthropology) offer for the study of cultural phenomena. In the first case, the focus will be on studying cultural mental representations through native drawing. In the second case, the lecture will present procedures for studying cultural phenomena on a large scale, using the research tools of geography within anthropology to examine socio-spatial patterns.

This presentation assesses the possibilities and constraints of accessing legal and political documents issued by the PRC’s agencies. It focuses predominantly on materials related to the Xinjiang Uyghur Autonomous Region. Apart from publicly available governmental sources, it discusses opportunities to access official sources that are not meant to leave China because of their problematic and sensitive nature. Furthermore, I will briefly explain the current state of affairs for those trying to obtain first-hand data and information that are not subject to governmental censorship.
In sociology and international relations, first-hand data collection often involves going to the field to conduct interviews and participant observation. The first-hand data are often stored and analyzed together with second-hand data in software such as Atlas.ti. In this talk, I will illustrate this process of data collection and analysis through a case from my research on Chinese NGOs’ interaction with the government. Afterward, I will explain additional methods my team members have been using, and how other units might assist us in the data collection process.
12.00pm: Lunch
The Sinophone Project collects different categories of material culture, and each category has its own function within the project. On the one hand, there are series of similar items (bags, amulets, knives, cups, etc.) that are collected to compare the shapes and forms of the same kind of object. On the other hand, representative examples of items constitute another category, collected to show and illustrate the work of every single project in an exhibition. The latter category is framed in terms of spatial metaphors (barn, temple, smithery). A third category consists of photos of architecture, which are also part of the material culture collection. The presentation will discuss the various needs to document these three categories.
The aim of this presentation is to map the ancient Silk Road territories using cabbage and, in a broader sense, Brassica genus crops as a marker for tracing language-related exchanges between Chinese, Central Asian (Turkic) and European languages. Using the historical example of the Silk Road and a diachronic analysis of it, we are attempting to recreate the ancient route from the former Chinese capital of Chang’an (modern Xi’an) to the Mediterranean. The special focus of this talk is the introduction of our data collection and management methods.

By comparing various aspects related to cabbage production, usage, processing, with special emphasis on cultural and linguistic aspects, we aim to identify the factors contributing to cultural plants’ dispersal and material culture artefact exchange between Europe and China.

As a part of this research proposal, our aim is to provide a methodological framework for effective linguistic as well as botanical data collection, to be carried out on a wider scale and potentially reused in other research, even research that goes beyond our original subject matter.

As a preliminary step, we plan to determine the appropriate fieldwork sites, ranging from Central China through Central Asia all the way to the Mediterranean Sea. After collecting relevant data from these diverse places in adherence to our previously devised methodological framework, we will carry out a primarily linguistic analysis to identify the relations in language and culture exchange. Secondarily, our focus is on identifying the cultural practices of these ethnically diverse areas. To conclude this summary, the ambition of our endeavour is to create a multifaceted and complex intersection between linguistics, anthropology, geography, history and botany.

2.30pm: Coffee break

19 December 2018: Data management and data mining

Location: Univerzitní 3, Olomouc, room 225 

The first part of my talk will be about the design principles behind NorthEuraLex, our lexicostatistical database of Northern Eurasia, with a special focus on our workflow for extracting and aggregating lexical information from published sources, as well as our infrastructure for generating and maintaining a unified IPA encoding across 107 languages from 21 families.
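One small ingredient of keeping an IPA encoding unified is making sure that visually identical transcriptions are also identical at the code-point level. The sketch below is a generic illustration, not NorthEuraLex's actual infrastructure; the `LOOKALIKES` map and the function name are assumptions made for this example.

```python
import unicodedata

# Hypothetical map of ASCII characters that are often typed in place of IPA symbols.
LOOKALIKES = {
    "g": "\u0261",  # LATIN SMALL LETTER SCRIPT G, the IPA voiced velar plosive
    ":": "\u02d0",  # MODIFIER LETTER TRIANGULAR COLON, the IPA length mark
    "'": "\u02c8",  # MODIFIER LETTER VERTICAL LINE, the IPA primary stress mark
}


def normalize_ipa(transcription: str) -> str:
    """Decompose to NFD, then replace ASCII lookalikes with IPA code points."""
    decomposed = unicodedata.normalize("NFD", transcription)
    return "".join(LOOKALIKES.get(ch, ch) for ch in decomposed)
```

Decomposing first means that a precomposed letter with a diacritic and the same letter followed by a combining diacritic end up as the same string, so transcriptions taken from different published sources can be compared directly.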

In the second part, I will describe our ongoing efforts to add a rich etymological annotation layer to the database. Here, I will focus on our steps towards a unified representation format that is general enough to represent data from any etymological source in a natural way, while allowing for automated aggregation across sources. I will also present initial ideas for (and possibly a prototype of) an automated conflict resolution procedure that generates elementary etymological annotations (e.g. cognacy judgments) according to a user-specified reliability ranking.
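Conflict resolution by a user-specified reliability ranking can be illustrated with a minimal sketch. It assumes that each source contributes (item, source, value) judgments and that the most reliable ranked source simply wins; the function name, the data layout and the placeholder source names are hypothetical, and the procedure presented in the talk may work quite differently.

```python
def resolve_conflicts(judgments, ranking):
    """Pick, for each item, the judgment from the most reliable source.

    judgments: iterable of (item, source, value) tuples
    ranking:   list of source names, most reliable first
    """
    rank = {source: position for position, source in enumerate(ranking)}
    best = {}
    for item, source, value in judgments:
        if source not in rank:
            continue  # sources missing from the ranking are ignored in this sketch
        current = best.get(item)
        if current is None or rank[source] < rank[current[0]]:
            best[item] = (source, value)
    return {item: value for item, (_source, value) in best.items()}


# Placeholder data: cognacy judgments for word pairs from three made-up sources.
demo_judgments = [
    ("pair_1", "source_A", True),
    ("pair_1", "source_B", False),
    ("pair_2", "source_B", True),
    ("pair_3", "source_C", True),
]
demo_result = resolve_conflicts(demo_judgments, ranking=["source_B", "source_A"])
```

With `source_B` ranked above `source_A`, the conflicting judgment on `pair_1` is settled in favour of `source_B`, and `pair_3` is dropped because its only source is not in the ranking. A different ranking passed by the user yields a different annotation layer from the same input.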

Robert Forkel (Max Planck Institute for the Science of Human History, Jena, Germany) leads the data management group of the Department for Linguistic and Cultural Evolution at MPI SHH. Having worked on the publication of cross-linguistic databases in the CLLD project, he now focuses on scaling the data management approaches pioneered in that project to the next level, integrating ethnographical, archeological and linguistic data in a shared framework. Key components of this framework are lightweight data specifications like CLDF and reference catalogs like Glottolog and Concepticon, as well as training programs like the Spring School on Quantitative Methods.
10.30am: Coffee break
The goal of this talk is to provide participants with a basic understanding of how computational tools and methods generally used in Natural Language Processing and Corpus Linguistics can be employed to broaden the scope of work done in the Humanities and Social Sciences. I will illustrate some of the possible contributions of such methods using case studies (drawn from projects conducted both at NTU and abroad), while trying to frame Digital Humanities as a newly available and sometimes necessary dimension of research in the Humanities and Social Sciences.
12.00pm: Lunch
2.30pm: Coffee
after 3.30pm: Olomouc city tour

Invited speakers

Dr. Johannes Dellert (University of Tübingen, Germany) is a computational linguist who currently works as a lecturer and researcher at the Department of Linguistics in Tübingen. His current research focuses on the development of novel approaches to the interpretation of non-standard language (as part of Prof. Detmar Meurers’ ICALL research group), and on developing new interactive tools for historical linguistics as a continuation of his PhD work in Prof. Gerhard Jäger’s group. As part of the latter, Dr. Dellert is also the main contributor and coordinator for the NorthEuraLex database.

Dr. Robert Forkel (Max Planck Institute for the Science of Human History, Jena, Germany) leads the data management group of the Department for Linguistic and Cultural Evolution at MPI SHH.

Luís Morgado Da Costa (Nanyang Technological University, Singapore) is a cognitive scientist with a wide range of interests, currently focusing his work on computational linguistics. He is a PhD student at the Interdisciplinary Graduate School, Nanyang Technological University (NTU), in Singapore. Before that, he was a research associate in the Computational Linguistics Lab, Division of Linguistics and Multilingual Studies, also at NTU, working on projects ranging from Natural Language Parsing and Generation, Computational Lexicography and Computer Assisted Language Learning to general Mandarin Chinese and Japanese Linguistics.

The main focus of his current research is to model diverse aspects of linguistic knowledge in ways that can be applied to different tasks (e.g. Machine Translation, Word Sense Disambiguation, Computer Assisted Language Learning, etc.). He works mainly with English and Mandarin Chinese, but has also worked with other languages such as Japanese, Kristang, Portuguese, Indonesian, Coptic and Abui.

He is a member of DELPH-IN, sharing the communal commitment to develop open source NLP tools and resources for deep linguistic processing of natural languages, a member of the Global Wordnet Association, and a member of NTU’s Digital Humanities Cluster.

Dr. David Moeljadi (Nanyang Technological University, Singapore and soon Sinophone Project, Olomouc) graduated from Nanyang Technological University in Singapore in 2018 and works as a data scientist at Traveloka Services Pte. Ltd. in Singapore, part of a leading Indonesian online travel company in Southeast Asia. His research covers computational linguistics, grammar engineering, treebanking, corpora, dictionaries and lexicography. He built an open-source computational grammar for Indonesian called INDRA (Indonesian Resource Grammar), which can parse and generate sentences, and a treebank called JATI. He develops Wordnet Bahasa, a large-scale open-source semantic dictionary of the Malay languages (Malaysian and Indonesian). In collaboration with the Language Development and Cultivation Agency, under the Ministry of Education and Culture of the Republic of Indonesia, he created and develops a database for the authoritative Indonesian dictionary KBBI (Kamus Besar Bahasa Indonesia) and a database for loanwords in Indonesian. He collaborates with The CJK Dictionary Institute, Inc. in Japan and works as the chief translator of The Kanji Learner’s Dictionary: Indonesian Edition. Together with researchers from Tokyo University of Foreign Studies, he worked on a project to build an open-source morphological dictionary and analyser for Malay/Indonesian (MALINDO Morph) and on a project to build an Asian language parallel corpus (TALPCo). David is a member of the Indonesian Association for Lexicography and a member of the Deep Linguistic Processing with HPSG (DELPH-IN) research consortium.
