Dataset for Xiang Subgrouping
This dataset is divided into two parts. The first represents an evaluation of the data in the Linguistic Atlas of Chinese Dialects (Cao 2008) for uniqueness and generality for 3 core Xiang varieties (Changsha, Shaoyang, Lianyuan) and 2 non-Xiang varieties (Changde, Chaling). 1 non-core Xiang variety (Hengyang) was also included for reference. This first part is divided into three sections based on the type of linguistic features covered, corresponding to the three volumes of the Atlas: Phonology, Lexicon, and Grammar. Each dataset contains 8-10 columns: (1) Variety (which variety of the six is being considered); (2) Feature (the feature from the Atlas); (3) Value (the value for that location given in the Atlas); (4) Middle Chinese Interpretation (ONLY Phonological Dataset; the value of the feature in the Qieyun, using Baxter's transcription); (5) Dialectal Rendition (the form the feature takes in the published description of the variety); (6) Implied Sound Change (ONLY Phonological Dataset; the assumed sound change from Middle Chinese); (7) Unique; (8) General; (9) Map Number (relevant page and map number in the Linguistic Atlas of Chinese Dialects); (10) Notes.
Uniqueness means that a feature does not occur in non-Xiang varieties, while Generality means that a feature occurs in all Xiang varieties; this is indicated in the dataset with 'Yes' (positive value), 'No' (negative value), and 'N/A' (irrelevant). Features deemed particularly relevant for subgrouping purposes, i.e. those that are particularly rare or unique, are indicated in yellow highlight. If a feature in the Atlas does not agree with the published description of a variety, this is indicated under Notes as 'Mismatch', with the relevant mismatched feature indicated in red lettering. In the lexical and grammatical datasets, the 'Dialectal Rendition' column (Column 4) focuses on features which are unique and deemed helpful for subgrouping; that is, rows that are highlighted in yellow. If a row is highlighted but lacks a value for this column, it means a relevant form could not be identified in the published description for that variety.
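As an illustration only, the definitions of Uniqueness and Generality above can be sketched in Python. The function names and the feature values below are invented for the example and are not taken from the dataset. The sketch also assumes that "all Xiang varieties" refers to the three core varieties; whether Hengyang should be included is not stated here, and the 'N/A' case is not modelled.

```python
# Variety groupings as listed in this description.
CORE_XIANG = ("Changsha", "Shaoyang", "Lianyuan")
NON_XIANG = ("Changde", "Chaling")


def is_unique(value, values_by_variety):
    """Return 'Yes' if no non-Xiang variety shows this value, else 'No'."""
    found_outside = any(values_by_variety.get(v) == value for v in NON_XIANG)
    return "No" if found_outside else "Yes"


def is_general(value, values_by_variety):
    """Return 'Yes' if every core Xiang variety shows this value, else 'No'.

    Assumption: generality is checked against the core varieties only.
    """
    found_everywhere = all(values_by_variety.get(v) == value for v in CORE_XIANG)
    return "Yes" if found_everywhere else "No"


# Invented values for one hypothetical feature (not real Atlas data).
example = {
    "Changsha": "A",
    "Shaoyang": "A",
    "Lianyuan": "C",
    "Changde": "B",
    "Chaling": "B",
}

print(is_unique("A", example), is_general("A", example))
```

In this invented case the value "A" would count as unique (it is absent from both non-Xiang varieties) but not general (Lianyuan has a different value).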
The second dataset ('Innovations') represents a list of 147 linguistic innovations evaluated for 13 Sinitic language varieties (9 Xiang, 4 non-Xiang), with '1' meaning 'possesses innovation' and '0' meaning 'does not possess innovation'. A value of 'NA' means that the value could not be determined for that variety for lack of data.
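A minimal sketch of how the '1' / '0' / 'NA' coding might be read in Python follows. The layout (one row per innovation, one column per variety), the column names, and the sample rows are assumptions made for the example; the actual file may be organised differently. 'NA' is kept distinct from '0' so that missing data is never counted as absence of an innovation.

```python
import csv
import io

# Hypothetical sample in an assumed layout; not real data from the dataset.
SAMPLE = """innovation,VarietyA,VarietyB,VarietyC
I001,1,1,0
I002,1,NA,0
I003,0,0,1
"""


def parse_cell(cell):
    """Map '1'/'0' to integers and 'NA' to None (undetermined)."""
    cell = cell.strip()
    return None if cell == "NA" else int(cell)


def load_matrix(text):
    """Return {innovation: {variety: 1, 0, or None}} from CSV text."""
    matrix = {}
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("innovation")
        matrix[name] = {variety: parse_cell(v) for variety, v in row.items()}
    return matrix


def shared_innovations(matrix, a, b):
    """List innovations that both varieties are known to possess."""
    return [n for n, vals in matrix.items() if vals[a] == 1 and vals[b] == 1]


matrix = load_matrix(SAMPLE)
print(shared_innovations(matrix, "VarietyA", "VarietyB"))  # ['I001']
```

Here I002 is not counted as shared, because the value for VarietyB is undetermined rather than known to be '1'.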