Comparable corpora

This website presents a corpus of aligned stories from the Kiranti mythological cycle. In the course of fieldwork on three Rai languages, Koyi, Thulung and Khaling, it became apparent that the traditional stories being recorded were in fact versions of the same stories. These make up the corpus on this website: stories that have been collected in at least two versions, with the versions differing in speaker, dialect or language.

Typological interest

The typological interest of this type of comparable corpus is that it makes it possible to compare how the same event is described in different languages, using native narrative content rather than foreign content, whether image-based stimuli (the Pear Story, Frog, Where Are You?, ...) or narrative (New Testament stories, ...).


Concept of comparable corpus

The basic concept of a comparable corpus is that the stories are collected independently for each language, and when the stories turn out to be the same, they are considered to derive from the same proto-myth. The different versions (speakers, dialects, languages) are lined up side by side in pairs, and the segments that share narrative content are marked as belonging to a "Similarity". This is done in pairs for all versions of a given story, with the Similarities recorded in an Excel spreadsheet and converted into tags. When looking through the stories, sentences participating in a Similarity pairing are signaled by a special label; clicking on that label makes it possible to see all versions of the story which have been identified as sharing that Similarity.

The idea for this type of alignment comes from parallel corpora: essentially, translation equivalents of texts which are aligned sentence by sentence for the purpose of training computer-assisted translation software. Because the need for translated material limits their size, parallel corpora have given way to comparable corpora, which are aligned corpora of similar but not identical texts (for example, two newspaper articles describing the same sports event could be the basis for a comparable corpus). The Kiranti myths that make up our corpus presumably have the same origin, but because they have evolved in the different languages they are told in, they can be considered a comparable rather than a parallel corpus.
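
As a rough illustration of how pairwise Similarity records could be turned into labels that link all versions of a story, here is a minimal Python sketch. It is not the procedure actually used for this corpus: the column names, the version names, the sentence numbers and the use of a CSV export of the spreadsheet are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export of the spreadsheet: one row per pairwise Similarity.
# Column names, version names and sentence numbers are invented for illustration.
SAMPLE_CSV = """similarity,version_a,sentences_a,version_b,sentences_b
S1,Thulung_1,3;4,Khaling_1,5
S1,Thulung_1,3;4,Koyi_1,2
S2,Khaling_1,9,Koyi_1,7;8
"""


def build_index(csv_text):
    """Map each Similarity label to {version: set of sentence numbers}."""
    index = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each row has two sides (a and b); both are added under the same label.
        for side in ("a", "b"):
            version = row[f"version_{side}"]
            for sentence in row[f"sentences_{side}"].split(";"):
                index[row["similarity"]][version].add(int(sentence))
    return index


def labels_for(index, version, sentence):
    """Return the Similarity labels attached to one sentence of one version."""
    return sorted(
        label
        for label, versions in index.items()
        if sentence in versions.get(version, ())
    )


def versions_sharing(index, label):
    """Return every version taking part in a Similarity, with its sentences."""
    return {v: sorted(ids) for v, ids in index.get(label, {}).items()}


if __name__ == "__main__":
    index = build_index(SAMPLE_CSV)
    print(labels_for(index, "Khaling_1", 5))
    print(versions_sharing(index, "S1"))
```

In this sketch, labels_for plays the role of the label shown next to a sentence, and versions_sharing plays the role of the click that lists every version identified as sharing that Similarity. Grouping the pairwise rows under a shared label is what lets a pairing made between two versions surface a third version as well.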