Centre for Buddhist Studies

Database of Medieval Chinese Syntax

Collaborative project:

The Database of Medieval Chinese Syntax (DMCS)

**The first public version of the Database will be released in 2017**


Project director/contact: Christoph Anderl
Project co-director: Ann Heirman
Project co-director (external advisor for digitization and manuscript mark-up): Marcus Bingenheimer (Temple University)
Manuscript mark-up: Lín Jìnghuì 林靜慧 (PhD)
Programmers: Christian Bell and Jan Schrupp 

Internships (MA students): Suzuki Harada (Dec. 2015 - April 2016), Ruth Vervaet Faye (Feb. - April 2016)

Collaborating institution: Chung-hwa Institute of Buddhist Studies (Taiwan)

Funding agencies: FWO and BOF (Special Research Fund)

At the beginning of 2014, a new project on the syntax of Medieval Chinese was initiated within the framework of the Ghent Centre for Buddhist Studies and a Pegasus Marie-Curie Grant. The project focuses on manuscript texts from Dūnhuáng and also includes the production of high-quality digital editions in collaboration with the Chung-hwa Institute of Buddhist Studies (DILA, Taiwan). Since 2015, the project has received funding from the BOF.

The Database’s main focus is currently on the period of ca. 700-1100 CE, with an emphasis on the analysis of semi-vernacular (and other) texts from the Dūnhuáng corpus, in addition to a selection of texts dating from the Five Dynasties and early Sòng period.

The DB is designed as a flexible set of interconnected modules and XML data collections (in an eXist environment). The modules and sub-databases/collections are continuously adapted to specific research questions (including the design and adaptation of the input masks). The basic modules are “Syntax” (registering basic information on function words) and “Sentence Analysis” (featuring a Tree Generator and Sentence Parsing of example sentences). Connected to the DB are a growing number of TEI-compatible marked-up digital editions of Dūnhuáng manuscripts, as well as a large bibliography.
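As a rough illustration of what a tree generator for example sentences might do, the following Python sketch represents a parsed sentence as a small tree and prints it in two common notations. It is not the project's implementation: the `Node` class, the category labels (S, NP, VP, ADV, V), and the sample sentence 我不知 are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a (hypothetical) parse tree: a label plus a word or children."""
    label: str
    word: str = ""
    children: List["Node"] = field(default_factory=list)


def to_brackets(node: Node) -> str:
    """Render the tree in labelled bracket notation."""
    if not node.children:
        return f"[{node.label} {node.word}]"
    inner = " ".join(to_brackets(child) for child in node.children)
    return f"[{node.label} {inner}]"


def to_outline(node: Node, depth: int = 0) -> str:
    """Render the tree as an indented outline, one node per line."""
    line = "  " * depth + node.label + (f" {node.word}" if node.word else "")
    return "\n".join([line] + [to_outline(c, depth + 1) for c in node.children])


# Illustrative analysis of 我不知 ("I do not know"); labels are assumptions.
tree = Node("S", children=[
    Node("NP", "我"),
    Node("VP", children=[Node("ADV", "不"), Node("V", "知")]),
])

brackets = to_brackets(tree)
outline = to_outline(tree)
print(brackets)
print(outline)
```

Keeping the tree as plain data and the renderers as separate functions means the same analysis could feed several displays (brackets, outline, or a drawn tree) without being re-entered.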

Illustration: From manuscript to digital text - Example of the process of transcription and encoding of line 1 of ms. Pelliot 2634: (1) the red box indicates the first line of the manuscript facsimile; (2) shows the encoding of the first line in TEI-compatible XML in oXygen; (3) the red box shows the transformation of (2) into an HTML webpage that reflects the ms. features (“diplomatic version”); (4) shows an HTML transformation into “normalized” text (“regularized version”); this version will be the basis for further grammatical mark-up and linking to the database (illustration based on the mark-up of Buddhist texts conducted under the guidance of Marcus Bingenheimer).

For a thorough description of the encoding work and conventions, see here.
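The difference between the “diplomatic” and the “regularized” version described above can be sketched in a few lines of Python. This is a minimal sketch, not the project's actual transformation: it assumes that variant characters are encoded with the standard TEI elements `<choice>`, `<orig>`, and `<reg>`, and the sample line (including the 文/聞 substitution) is invented for illustration rather than taken from Pelliot 2634.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical one-line sample, not taken from the project's editions.
SAMPLE = (
    f'<ab xmlns="{TEI_NS}"><lb n="1"/>如是我'
    "<choice><orig>文</orig><reg>聞</reg></choice>一時</ab>"
)


def render(elem, keep):
    """Flatten elem to plain text, keeping only the `keep` reading of each <choice>."""
    drop = "reg" if keep == "orig" else "orig"
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.split("}")[-1]  # strip the namespace prefix
        if local_name != drop:
            parts.append(render(child, keep))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


root = ET.fromstring(SAMPLE)
diplomatic = render(root, "orig")   # text as written in the manuscript
regularized = render(root, "reg")   # text with normalized characters
print(diplomatic)
print(regularized)
```

Because both readings live in the same encoded line, either version can be regenerated at any time from a single source file.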

The flexible and expandable structure of the DB aims at accommodating different needs of various users:

  • Research tool: The registered data form the basis of original research on grammatical phenomena of Medieval Chinese (MC) and of a monograph on the grammar of MC (Anderl, Christoph: A Grammar of Late Medieval Chinese. Brill Handbook Series. Leiden, planned for publication in 2018). The DB is also adjusted to the research requirements of specific Ghent University PhD projects on Chinese historical linguistics and Buddhist studies (e.g., from 2016 to 2020, BOF will fully fund a PhD project on phonetic loan characters in Dūnhuáng texts).
  • Reference tool: The DB aims at developing into a useful tool for reading/translating/analyzing MC texts.
  • Learning/teaching: The “Syntax” collection is organized in the form of “chapters” and “subchapters”, so that grammatical phenomena can be studied systematically. In addition, the Tree Generator and Sentence Parsing tools can be used in the classroom.
  • Training of advanced students: Since autumn 2015, Ghent master's students have been able to do their obligatory internships (1-3 months at a company or institution) at the Ghent Centre for Buddhist Studies, where they are trained in and work on aspects of the DB. Currently, two master's students are doing their internships at the GCBS.


Illustration: DB output of the analysis of an example sentence (MCDB, accessed 01/10/2015)