DARIAH-DE research data federation architecture (Forschungsdaten-Föderationsarchitektur, DFA)

The DARIAH-DE "Forschungsdaten-Föderationsarchitektur" (DFA) is the collective term for the services and tools that make research data and collection descriptions from various sources, such as cultural institutions, libraries, archives, research facilities, and data centers, findable and usable for analysis.

Search queries in a scholarly context require search parameters to be specified precisely, and researchers should be able to restrict their digital searches to specific sources. To that end, the DFA allows XML structures in data sets of different provenance to be queried, supports interoperability between different data and metadata schemas, and correlates heterogeneous data and metadata sources through common references for places, names, dates, or other logical units.

Figure 1: Schematic structure of the DARIAH-DE Data Federation Architecture

The DARIAH-DE "Forschungsdaten-Föderationsarchitektur", shown in Figure 1, covers indexing and displaying research data, providing sustained access to it, offering technical tools for comparing the descriptions and content of digital collections, and a comprehensive search across heterogeneously structured data collections and archives.

The DARIAH-DE Forschungsdaten-Föderationsarchitektur is modular and can be extended with further components at any time. It currently includes the following tools and services:

In the Collection Registry, information about research data collections in DARIAH-DE can be looked up, and new collection descriptions can be registered.

The DARIAH-DE Repository lets researchers store research data, describe it with metadata, reference it persistently and in machine-readable form through Persistent Identifiers, and find it through the Generic Search. The repository can also be used to archive data collections sustainably and securely.

With the DARIAH-DE Publikator, research data can be conveniently imported into the DARIAH-DE Repository through a graphical interface and described with metadata. The data can then be entered in the Collection Registry as a collection, which makes it findable in the Generic Search.

The Schema Registry is where specific metadata standards are stored and where crosswalks between metadata schemas can be stored, managed over the long term, and combined if necessary. It thus provides a conceptual aid for mapping research data of different origin and nature.

The Crosswalk Registry is a graphical tool that lets researchers in the arts, humanities, and social sciences link the different metadata standards stored in the Schema Registry. These mappings allow automated translation from one data schema to another, which is what makes it possible to search across collections described with different schemas. The functionality of the Crosswalk Registry is illustrated in the following screenshot of the user interface:

Figure 2: Crosswalk Mapping in the Data Modeling Environment.
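The idea of translating records from one schema to another through a stored mapping can be sketched in a few lines of Python. This is only an illustration and not the Crosswalk Registry's actual data model or API: the source field names (`titel`, `verfasser`, `entstehungsort`, `signatur`), the Dublin-Core-style target names, and the function `apply_crosswalk` are all hypothetical.

```python
# Hypothetical crosswalk: keys are fields of an imagined source schema,
# values are target fields loosely modeled on Dublin Core element names.
CROSSWALK = {
    "titel": "dc:title",
    "verfasser": "dc:creator",
    "entstehungsort": "dc:coverage",
}


def apply_crosswalk(record, crosswalk):
    """Translate a flat record into the target schema.

    Returns two dicts: the translated fields, and the source fields
    for which the crosswalk defines no target.
    """
    translated = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            translated[target] = value
    return translated, unmapped


# Invented example record from the imagined source schema.
source_record = {"titel": "Faust", "verfasser": "Goethe", "signatur": "A 123"}
translated, unmapped = apply_crosswalk(source_record, CROSSWALK)
```

Once records from differently structured collections have been translated into one target schema, a single query on a target field such as `dc:creator` can cover all of them. Fields without a mapping are reported separately here so that gaps in a crosswalk stay visible rather than being silently dropped.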

The Generic Search provides a front end for the data stored in the Collection Registry and the DARIAH-DE Repository and can be used to search these distributed data records. It can also search the listed metadata, and a search can be saved in a personalized way and adapted or refined at a later date.

The Epic-PID Service is a basic service for the permanent referencing of research data via so-called "persistent identifiers" (PIDs), which provide a sustainable reference to data. References, for example in scientific publications, thus remain stable even if the location of the referenced data changes. DARIAH-DE uses PIDs from the European Persistent Identifier Consortium (EPIC).
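Why a reference stays stable when the data moves can be shown with a small in-memory sketch of the indirection behind persistent identifiers. It does not use the EPIC API. The class `PidResolver`, the identifier `hdl:00000/example-1`, and the `example.org` URLs are made up for illustration.

```python
class PidResolver:
    """Toy resolver: maps a persistent identifier to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # A PID is registered once; later changes go through relocate().
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = url

    def relocate(self, pid, new_url):
        # Only the stored location changes; the PID itself stays the same.
        if pid not in self._locations:
            raise KeyError(f"Unknown PID: {pid}")
        self._locations[pid] = new_url

    def resolve(self, pid):
        return self._locations[pid]


resolver = PidResolver()
pid = "hdl:00000/example-1"  # hypothetical identifier cited in a publication

resolver.register(pid, "https://old.example.org/data/1")
first = resolver.resolve(pid)

# The data moves to a new server; the citation does not need to change.
resolver.relocate(pid, "https://new.example.org/archive/1")
second = resolver.resolve(pid)
```

A publication cites only the PID. Because the resolver is the single place where the location is updated, every existing citation keeps working after the move.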

Together, these tools form a modular software architecture in which each service can access heterogeneous data sources of different origins. This opens up new methods for analyzing distributed data collections.