DIGIVOY: Discover TextGrid's Digital Library with Voyant
Responsible: Fotis Jannidis, Steffen Pielström
Contact Person: Steffen Pielström
DIGIVOY ("Discover TextGrid's Digital Library with Voyant") builds a bridge between the TextGridRepository (http://textgridrep.de) and the Voyant tools (http://voyant-tools.org). The application, developed by DARIAH-DE in collaboration with TextGrid and Voyant, allows users to directly discover, analyze and visualize the content of the TextGridRep, in particular the extensive digital library with works of more than 600 authors, using the various Voyant tools. Using the application involves three steps:
(1) First, you search the TextGridRep for the texts you want to analyze, using its existing search functions.
(2) After selecting a text, you can set some options. First, you can suppress certain parts of the selected texts, e.g. speaker names or editorial notes, which might interfere with the analysis. Second, you can choose which of the numerous Voyant tools to use.
(3) In the last step, the preprocessed texts are visualized in the selected Voyant tool. The demonstrator thus makes it easier to discover and process texts from the digital library, and the preprocessing allows for more precise queries.
The application is available in a beta version at the following link: https://dariah.zam.kfa-juelich.de/textgridrep-website/
The contact person for scientific service is Christof Schöch. We welcome any feedback; you can also use the corresponding user survey: https://docs.google.com/spreadsheet/viewform?formkey=dGdyN0tXaTRmZmVpZEVCN0F0U0cxdkE6MA#gid=0.
The Three Components of DIGIVOY
1. TextGrid's Digital Library: Extensive Text Collection
TextGrid's Digital Library is an extensive repository of German-language texts. It mainly contains literary but also philosophical texts, as well as dictionaries. In each case, it comprises the collected works of an author.
In addition to texts from the German-speaking world, it also includes works by other European authors in German translation. The collection currently comprises the works of about 700 literary authors and 250 philosophers. All texts are based on reliable, citable study editions and are available in a standardized TEI format. The TextGridRep(ository) provides users with free access to the texts without having to register for the TextGridLab(oratory). The texts in the repository can be searched via a catalog or browsed via a list of authors. In addition, they can be displayed as HTML in the browser, their metadata can be viewed, and the TEI source text can be downloaded.
The Digital Library is available at the following link: http://textgridrep.de/.
In addition, TextGrid provides some information about the digital library:
2. Voyant Tools: Flexible Exploration and Analysis Tools
The basic idea of the Voyant Tools is to provide diverse web-based exploratory and analytical approaches to any text or text collection. Voyant consists of a number of individual tools: all of them can be used separately, and some of them can also be combined in a toolset.
The Voyant tools make it very easy to load texts from different sources and to discover and visualize their specific properties and structures. Texts can be pasted into a text field, loaded from a URL, or uploaded from the hard drive, in various formats including HTML, TXT and RTF.
The Voyant tools are available at the following link: http://voyant-tools.org (currently in version 3.0 beta). Documentation for Voyant is also available: http://docs.voyant-tools.org/start/.
3. DIGIVOY: Connection of Digital Library and Voyant Tools
DIGIVOY provides a direct, convenient connection between TextGrid's Digital Library and the Voyant Tools. This allows the two services to be used together, making each of them even more useful.
DIGIVOY extends the TextGridRep with three functions: first, a function to collect several texts from the digital library in a "basket"; second, a function to prepare the selected texts for the analysis; and third, a function to open the selected and prepared texts in a particular Voyant tool.
DIGIVOY is accessed via an extended version of the TextGridRep, which is available at the following address: https://dariah.zam.kfa-juelich.de/textgridrep-website/.
DIGIVOY uses its own installation of the Voyant tools, which ensures that sufficient server resources are available: https://dariah.zam.kfa-juelich.de/voyant/ (Version 1.0).
This installation is also freely usable independently of TextGrid. DIGIVOY is currently also available in the beta version of the official TextGridRep: http://www.textgridrep.de/beta/.
Use of DIGIVOY
1. Selection of one or more texts in the digital library
Texts can be found in the TextGridRep either via the search function or via the list of authors. In either case, a single text can be selected directly for analysis with Voyant by clicking on the link "Send to ..." in the area under the metadata of that text (see Figure 1). The "Send to ..." link leads directly to the next step, the preparation of the texts.
Figure 1: Selection of texts.
If you want to select several texts and analyze them with Voyant, you can instead tick the "in basket" checkbox in the area under the metadata (also visible in Figure 1). The "Basket" tab lets you check the list of selected texts (see Figure 2). If you are satisfied with the contents of the basket, click on "Send" to proceed to the next step, the preparation. If you are not satisfied with the contents of the basket, you can delete individual texts (by clicking on the red cross) or empty the entire basket.
Figure 2: The "Basket".
2. Preparation of texts (Preprocessing)
Preparing the texts makes it possible to deliberately include or exclude certain parts of them. This matters for the subsequent analysis with the Voyant Tools: it keeps the analyses as meaningful as possible and prevents them from being distorted by text fragments that are irrelevant or even misleading for a particular research question.
The "teiHeader" section of each text, which contains the metadata, is removed automatically, and the German stopword list is activated by default (this removes frequent function words that carry little semantic content).
However, depending on the research question, it may also be important to delete other text passages before the analysis (see Figure 3; for details on the encoding of the texts, see also the note at the end of this document).
Figure 3: Preparing the texts and selecting the Voyant tool.
The following text components can be deleted:
- castList – List of characters (in dramas)
- desc – Description (usually contains a title element)
- figure – Images or image captions
- head – Headings (of chapters, or of acts and scenes)
- note – Editorial notes
- speaker – Speaker names (in dramas)
- stage – Stage directions (in dramas)
- title – Title of a book or other work, or of a chapter
If you tick the checkbox in front of a text component, that component is deleted from the text. When analyzing dramas, for example, it may be useful to remove "castList" and "speaker" so that the word frequencies are not distorted by the very frequent but, depending on the question, semantically less relevant speaker names.
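Deleting a component can be pictured as removing the corresponding TEI element, together with its content, before the remaining plain text is handed to Voyant. The following Python sketch illustrates this idea; it is not DIGIVOY's actual implementation. It assumes the texts use the standard TEI P5 namespace `http://www.tei-c.org/ns/1.0`, and the function name `preprocess` and the sample text are hypothetical. Text that follows a removed element inside its parent (its "tail") is kept, so that removing an inline `note` does not swallow the rest of the sentence.

```python
import xml.etree.ElementTree as ET

# Assumption: the texts are encoded in TEI P5 with the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def _remove_keep_tail(parent, child):
    """Detach child from parent but keep the text that follows it (its tail)."""
    tail = child.tail or ""
    if tail:
        siblings = list(parent)
        idx = siblings.index(child)
        if idx > 0:
            prev = siblings[idx - 1]
            prev.tail = (prev.tail or "") + tail
        else:
            parent.text = (parent.text or "") + tail
    parent.remove(child)


def preprocess(xml_text, remove=("speaker", "stage")):
    """Hypothetical helper: drop teiHeader plus the chosen elements, return plain text."""
    root = ET.fromstring(xml_text)
    # The metadata header is always removed; the other elements are optional.
    targets = {f"{{{TEI_NS}}}{name}" for name in ("teiHeader", *remove)}
    # Snapshot all elements first so that removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in targets:
                _remove_keep_tail(parent, child)
    # Join the remaining text nodes and normalize whitespace.
    return " ".join("".join(root.itertext()).split())


# Hypothetical miniature TEI document.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader><fileDesc><title>Metadaten</title></fileDesc></teiHeader>
<text><body>
<stage>Nacht.</stage>
<sp><speaker>FAUST.</speaker><l>Habe nun, ach! Philosophie</l></sp>
</body></text></TEI>"""

print(preprocess(SAMPLE))                       # Habe nun, ach! Philosophie
print(preprocess(SAMPLE, remove=("speaker",)))  # Nacht. Habe nun, ach! Philosophie
```

Removing whole elements, rather than filtering words afterwards, means that a speaker name is dropped only where it is encoded as `speaker`, while the same word in the spoken text is kept.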
3. Selection of a Voyant Tool
On the same page where the texts are prepared, the desired Voyant tool can now be selected (see Figure 3). If you make no selection, the default toolset is used. All other entries in the drop-down list are single tools; interesting ones include "Bubblelines" and "Cirrus". (Not all tools work equally reliably.)
After selecting the desired tool, click on "Send": the selected texts are opened in Voyant with the chosen preprocessing options. The actual exploration and analysis then takes place in the Voyant Tools.
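Conceptually, the "Send" step hands the prepared texts to a Voyant instance. The Python sketch below only builds such a request URL and is not how DIGIVOY is implemented. It assumes (i) that the prepared texts are reachable at URLs (the example address is hypothetical), (ii) that Voyant accepts text URLs via an `input` query parameter, and (iii) that single tools are addressed as `tool/<ToolName>/` below the installation's base URL. These assumptions should be checked against the Voyant documentation, especially for the Version 1.0 installation used by DIGIVOY; the helper name `voyant_url` is hypothetical.

```python
from urllib.parse import urlencode

# DIGIVOY's own Voyant installation, as given in the text above.
VOYANT_BASE = "https://dariah.zam.kfa-juelich.de/voyant/"


def voyant_url(text_urls, tool=None, base=VOYANT_BASE):
    """Hypothetical helper: build a Voyant URL that loads the given texts.

    Assumptions: each text is passed as a repeated 'input' query parameter,
    a single tool is addressed as <base>tool/<Name>/, and without a tool the
    default toolset at <base> is used.
    """
    path = base if tool is None else f"{base}tool/{tool}/"
    # urlencode percent-encodes each text URL so it can be nested in the query string.
    query = urlencode([("input", url) for url in text_urls])
    return f"{path}?{query}"


# Hypothetical address of a prepared plain-text file.
print(voyant_url(["https://example.org/faust.txt"], tool="Cirrus"))
# https://dariah.zam.kfa-juelich.de/voyant/tool/Cirrus/?input=https%3A%2F%2Fexample.org%2Ffaust.txt
```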
Benefits of DIGIVOY
Both resources and their users benefit from being connected through DIGIVOY: a completely new, exploratory and analytical access to the texts of the digital library is now only a few clicks away. In addition, the TEI markup of the TextGridRep is put to flexible and productive use for preparing the texts. Finally, users can "play" with a large selection of German-language texts directly in Voyant.
The practical and methodological gain for literary scholars, historians, philosophers and cultural historians is that DIGIVOY eliminates, or greatly simplifies, the sometimes complex preprocessing of texts from text repositories. Exploring texts on a quantitative basis raises questions both for (classical) reading and for more advanced analyses. However, one must be aware of one limitation: all word frequency analyses are carried out on non-lemmatized texts, so inflected forms of the same word are counted separately.
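The effect of the stopword list and of working with non-lemmatized text can be illustrated with a small word count. The following Python sketch assumes a simple regular-expression tokenizer, lowercasing, and a tiny hypothetical stopword list; Voyant's actual tokenization and its German stopword list differ. Because no lemmatization takes place, "Haus", "Häuser" and "Hauses" are counted as three separate words.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (hypothetical; not Voyant's German list).
STOPWORDS = {"der", "die", "das", "des", "dem", "den", "ein", "eine", "und", "im", "in"}


def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercased word forms, skipping stopwords; no lemmatization."""
    # In Python 3, \w matches Unicode letters, so umlauts stay inside words.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(token for token in tokens if token not in stopwords)


freq = word_frequencies("Ein Haus, zwei Häuser; das Dach des Hauses.")
print(sorted(freq.items()))
# [('dach', 1), ('haus', 1), ('hauses', 1), ('häuser', 1), ('zwei', 1)]
```

A lemmatizer would map all three forms to "Haus" and report a count of 3; without it, frequency lists spread one lexeme over several entries.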
Notes on the encoding of the elements (for preprocessing)
In the transformations, the original Zeno XML markup was converted to the corresponding TEI tags. In addition, we attempted to add further markup on the basis of certain structures occurring in the markup or in the text (e.g. lg line groups, speakers, etc.). Because such large amounts of data are difficult to survey, the heuristics used for this purpose are necessarily schematic and unspecific, which can lead to incorrect interpretations in some cases. To eliminate these and similar sources of error step by step, we continually revise the literature folder, which is therefore available in several versions.