**The first public version of the Database will be released in 2017**
Project director/contact: Christoph Anderl (Christoph.Anderl@UGent.be)
Project co-director: Ann Heirman
Project co-director (external advisor for digitization and manuscript mark-up): Marcus Bingenheimer (Temple University)
Manuscript mark-up: Lín Jìnghuì 林靜慧 (PhD)
Programmers: Christian Bell and Jan Schrupp
Internships (MA students): Suzuki Harada (Dec. 2015 - April 2016), Ruth Vervaet Faye (Feb. - April 2016)
Collaborating institution: Chung-hwa Institute of Buddhist Studies (Taiwan)
Funding agencies: FWO and BOF (Special Research Fund)
At the beginning of 2014, a new project on the analysis of the syntax of Medieval Chinese was initiated within the framework of the Ghent Centre for Buddhist Studies and a Pegasus Marie-Curie Grant. The project focuses on manuscript texts from Dūnhuáng and also includes the production of high-quality digital editions in collaboration with the Chung-hwa Institute of Buddhist Studies (DILA, Taiwan). Since 2015, the project has received funding from the BOF.
The Database’s main focus is currently on the period of ca. 700-1100 CE, with an emphasis on the analysis of semi-vernacular (and other) texts from the Dūnhuáng corpus, in addition to a selection of texts dating from the Five Dynasties and early Sòng period.
The DB is designed as a flexible set of interconnected modules and XML data collections (in an eXist environment). The modules and sub-databases/collections are continuously adapted to specific research questions (including the design and adaptation of the input masks). The basic modules are “Syntax” (registering basic information on function words) and “Sentence Analysis” (featuring a Tree Generator and Sentence Parsing of example sentences). Connected to the DB are a growing number of TEI-compatible marked-up digital editions of Dūnhuáng manuscripts, as well as a large bibliography.
Illustration: From manuscript to digital text - Example of the process of transcription and encoding of line 1 of ms. Pelliot 2634: (1) the first line of the manuscript facsimile is indicated by the red box; (2) shows the encoding of the first line in TEI-compatible XML in oXygen; (3) the red box shows the transformation of (2) into an HTML webpage that reflects the ms. features (“diplomatic version”); (4) shows an HTML transformation into “normalized” text (“regularized version”); this version will be the basis for further grammatical mark-up and linking to the database (illustration based on the mark-up of Buddhist texts conducted under the guidance of Marcus Bingenheimer).
For a thorough description of the encoding work and conventions, see here.
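The relation between the “diplomatic” and “regularized” versions described in the illustration can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the project's actual transformation: it assumes that variant characters are encoded with the standard TEI elements `choice`, `orig` and `reg`; the sample line is invented (it is not the text of Pelliot 2634), and the function name `render` is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented sample line (NOT taken from Pelliot 2634): the manuscript is assumed
# to write the variant form 无, which the editor regularizes to 無.
SAMPLE = (
    f'<l xmlns="{TEI_NS}" n="1">'
    "心<choice><orig>无</orig><reg>無</reg></choice>所住"
    "</l>"
)


def render(elem, keep):
    """Return the text of `elem`, keeping only one reading of each <choice>.

    keep="orig" gives the diplomatic text (manuscript forms),
    keep="reg" gives the regularized text (normalized forms).
    """
    skip = "reg" if keep == "orig" else "orig"
    parts = [elem.text or ""]
    for child in elem:
        local_name = child.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        if local_name != skip:
            parts.append(render(child, keep))
        # Text following a child element belongs to the parent, so always keep it.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(SAMPLE)
print("diplomatic: ", render(line, "orig"))
print("regularized:", render(line, "reg"))
```

Both readings stay in a single encoded source, so either output version can be regenerated whenever the transcription is corrected.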
The flexible and expandable structure of the DB aims at accommodating different needs of various users:
Illustration: DB output of the analysis of an example sentence (MCDB, accessed 01/10/2015)