An open archive to contribute to the preservation of the world's linguistic heritage

The Pangloss Collection hosts recordings of little-documented languages, most of which are currently endangered. These documents are painstakingly produced by professional linguists working to safeguard the world's linguistic diversity, which is dwindling in parallel with the world's biodiversity.

The target languages are typically studied in the field, in their geographic and social context. Dialectologists like to say that each word has a history of its own (Jaberg 1908: 6); likewise, each linguistic document has a history of its own. Linguistic resources result from the collaboration between the author of the document (a native speaker) and the visiting linguist, a collaboration which often extends over many years. For example, Georges Dumézil referred to the last speaker of the Ubykh language as "my teacher and friend Tevfik Esenç".

The Pangloss Collection has been developed over more than twenty years of sustained work by researchers and specialized engineers at CNRS. It grows year after year through contributions from French research centres and their partners across the globe.

For how many languages does the Pangloss Collection host data sets?

As of 2024, the collection hosts over 1,180 hours of recordings in 252 languages and dialects. Close to half of the resources (2,567 out of 5,714) are transcribed, annotated and translated, allowing listeners to access the contents. The choice of translation languages is up to the depositors: for instance, someone working in Brazil may choose Portuguese rather than English as the main language of translation. If you would like to volunteer an additional translation (for instance, translating the Ubykh story "The goat and the sheep", which currently only has an English translation, into another language such as German, Turkish, Russian or Chinese), you are welcome to get in touch. All contributions are gratefully acknowledged in the document's catalogue entry (its metadata).

Updated on the 15th of January 2024

Integration in international networks

The Pangloss Collection is a member of DELAMAN, the Digital Endangered Languages and Musics Archives Network. It is hosted by the Cocoon platform (Collection de Corpus Oraux Numériques), which is one of the participating archives of OLAC (Open Language Archives Community).

An Open Archive within a free and decentralized Internet

The Pangloss website does not use cookies or track visitors' activity. In keeping with its Open Science policy, the Pangloss Collection follows basic principles of transparency, respect for users' privacy, and free orientation of attention.

  • Transparency and respect for privacy: major platforms collect data concerning the behaviour of their users, and use these data for various types of profiling. By contrast, on the Pangloss website, your activity is not recorded.
  • Free orientation of attention: apart from the foregrounding of a pair of resources on the Pangloss Collection home page, the interface aims to be neutral and to let you choose the area that you are interested in and the type of documents that you wish to consult. In the same spirit, "professional" mode, designed for linguists and computer scientists, is available to anyone (simply set the toggle in the upper right-hand corner), with no credentials or login required.