Text+ Potential Service Portfolio
The following overview contains selected services provided by several German Academies and by the CLARIN-D and DARIAH-DE research infrastructures, showcasing requirements that should potentially be covered in a Text+ service portfolio. Within CLARIAH-DE, dedicated convergences of these services are foreseen, which are planned to be further integrated into the NFDI. Note that this list is neither exhaustive nor does it determine which services will eventually be included in such a portfolio.
All services are listed within each category in alphabetical order.
Research infrastructures provide services to assist users in their work.
Tools for collaboration assist users in working together and sharing information. Available collaborative tools include wikis for exchanging information and facilitating teamwork.
A helpdesk provides the opportunity to ask questions about the infrastructure and guarantees well-structured and quick responses. The distribution of support tickets to professional teams can be organized jointly, e.g. a help portal may provide access to a helpdesk for legal and ethical questions as well as for usage and technical questions.
Learning and Teaching Support Platforms
The initiative provides general course registries for Digital Humanities studies, subject-specific support platforms for learning and teaching, and quality-assured publication series on research data and digital research methods.
Linked Data and Graph Technologies
Linked Data is a method of publishing structured data so that it can be interlinked and, through its semantics, become more findable and useful. Graph-based technologies help with highly connected data and can seamlessly include multiple annotation hierarchies within one dataset.
- XTriples is a generic web service that helps to create Linked Data out of any XML file or repository.
- SPEEDy is a standoff property editor with which researchers can build flexible and dynamic graph-based annotation layers on texts.
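To illustrate the general idea of deriving Linked Data from XML, here is a minimal sketch. It does not use or reproduce the XTriples service; the XML snippet, the base URI `https://example.org/person/`, and the function name `xml_to_ntriples` are assumptions made for this example. It turns each XML element into one statement in N-Triples syntax, using the FOAF `name` property.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a tiny XML register of persons (not a real dataset).
SAMPLE_XML = """
<persons>
  <person id="p1"><name>Ada Lovelace</name></person>
  <person id="p2"><name>Alexander von Humboldt</name></person>
</persons>
"""

BASE_URI = "https://example.org/person/"      # assumed namespace for subjects
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # FOAF property for a name


def xml_to_ntriples(xml_text: str) -> list[str]:
    """Turn each <person> element into one N-Triples statement."""
    root = ET.fromstring(xml_text)
    triples = []
    for person in root.findall("person"):
        subject = f"<{BASE_URI}{person.get('id')}>"
        name = person.findtext("name", default="").strip()
        # Escape backslashes and quotes so the literal stays valid N-Triples.
        literal = name.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{FOAF_NAME}> "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in xml_to_ntriples(SAMPLE_XML):
        print(line)
```

Once data is expressed as triples with stable URIs, statements from different sources about the same subject can be merged without prior agreement on a common XML schema.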
Virtual Research Environments
Virtual Research Environments (VREs) are tools and services to support the entire research process. For the humanities, this means supporting the research workflow from data creation and annotation through analysis and visualization to archiving. Some large tools provide complex workflows as part of the research process, such as manual or automatic annotation (e.g. in edition or corpus creation contexts), searches within available data, and searches for tools that fit the data. Available tools and services include:
- Editors for manual editing of resources
- Integrated platforms for editing and analysis
- ConedaKOR is used to administer and present the academic object collections from the image-based cultural sciences and the humanities.
- ediarum, an integrated digital edition environment
- DTAQ, the integrated annotation environment of the German Text Archive
- TextGrid Repository and TextGrid Laboratory form an integrated environment for creating and working with digital editions, including their publication.
- Systems for automatic analysis and/or annotation of resources
- Tool selection platform
- The Language Resource Switchboard is the prototype of a platform that allows users to select tools that may be used for the data they have at hand.
Available tools for analyzing and visualizing information include:
- Geo-Browser visualizes locations on a map according to a timeline, based on appropriately annotated data files.
- Tündra allows simple and complex searches of syntactic corpora (usually called treebanks) in a web-based application. The search results are visualized. The data may, for example, be based on automatic annotations created by WebLicht.
Advanced, data-oriented services and tools are components of the infrastructure that build on the basic infrastructure services and allow for storing, connecting, searching and accessing research data.
A data repository allows research data to be stored sustainably and safely. Repositories also provide an interface to share the description of the archived data, i.e. metadata, usually through OAI-PMH endpoints. Available repositories include:
- DARIAH-DE Repository as a central component of the DARIAH-DE Data Federation Architecture.
- A list of CLARIN data centres is available in the centre registry, which also provides basic information and OAI-PMH endpoints for metadata harvesting.
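As an illustration of metadata harvesting over OAI-PMH, the following minimal sketch builds a `ListRecords` request URL and reads titles from a response. The endpoint `https://repository.example.org/oai` and the shortened response are made up for this example, and the function names are assumptions; the verb, the `metadataPrefix` parameter, and the XML namespaces are those defined by OAI-PMH and Dublin Core. No network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; real endpoints are published by each repository.
ENDPOINT = "https://repository.example.org/oai"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Shortened, made-up response in the shape of a ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample letter edition</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(endpoint: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the URL of a ListRecords request for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{endpoint}?{query}"


def extract_titles(response_xml: str) -> list[tuple[str, str]]:
    """Return (record identifier, Dublin Core title) pairs from a response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    print(list_records_url(ENDPOINT))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would additionally follow the `resumptionToken` that OAI-PMH uses for paging through large result sets, which this sketch leaves out.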
Information Services and Related Tools
Seamless integration of research data and services from different data and service providers requires catalogs and registries. Subject-specific information systems and research tools provide researchers with a wealth of textual resources from antiquity to present time.
- The Collection Registry provides a catalog of collections, created in connection with or as a basis for research projects. Registered collections can, when published, automatically be integrated into a generic search functionality and thereby become findable.
- The Virtual Collection Registry (VCR) allows the combination of research data stored in different locations, repositories and formats. For example, researchers may create a virtual collection of annotated texts located at one institution, add further texts from another institution, and cite the collection as the basis of their research. This tool respects the privacy and licenses of the resources, as it does not require the resources to be copied into one location.
Search and Retrieval Systems
Due to the distributed and distinct nature of the resources, findability of and interoperability across different resources are major challenges. Available tools to connect, search and access distributed resources include:
- CorrespSearch is a federated search system for scholarly editions of letters.
- Generic Search is a search engine for searching metadata sets, e.g. those of the Collection Registry.
- Federated Content Search provides a content-based search function over the data available in enabled, distributed repositories. The search results are organized in a concordance-like manner and reference the original data sets.
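The idea of a federated, concordance-like search can be sketched as follows. This is a minimal illustration and not the Federated Content Search implementation or its protocol: the two in-memory "repositories", their contents, and the function names are assumptions. Each source is queried separately, and the hits are merged into one list of keyword-in-context lines that point back to the originating document.

```python
import re

# Hypothetical stand-ins for distributed repositories:
# repository name -> {document id: text}.
REPOSITORIES = {
    "repo-a": {"a1": "The archive holds letters from the nineteenth century."},
    "repo-b": {"b7": "Letters and diaries were digitized last year."},
}


def kwic(text: str, term: str, width: int = 20):
    """Yield (left context, match, right context) for each hit of term."""
    for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        yield left, m.group(0), right


def federated_search(term: str, repositories=None):
    """Query every repository and merge the hits into one result list."""
    if repositories is None:
        repositories = REPOSITORIES
    hits = []
    for repo_name, documents in repositories.items():
        for doc_id, text in documents.items():
            for left, match, right in kwic(text, term):
                hits.append({
                    "source": f"{repo_name}/{doc_id}",  # reference to the original
                    "left": left,
                    "match": match,
                    "right": right,
                })
    return hits


if __name__ == "__main__":
    for hit in federated_search("letters"):
        print(f'{hit["left"]:>20} [{hit["match"]}] {hit["right"]:<20} ({hit["source"]})')
```

Because only the matching snippets and a reference travel back to the caller, the full data stays in the repository that holds it.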
Basic infrastructure services are components that are required to work with other parts of the whole infrastructure.
Authentication and Authorization Infrastructure (AAI)
An infrastructure works with a common login procedure, also referred to as AAI (Authentication and Authorization Infrastructure), for providing access to services and resources. It is used especially where access is restricted to groups or individuals (e.g. to users from academic research institutions, students, licensed users, etc.). The consortia utilize the Shibboleth technology, which is also used by the German National Research and Education Network, DFN. Using this technology, users from academic institutions can use the credentials of their home institutions to gain access to restricted resources. For members of institutions that do not serve as an Identity Provider (IdP) or that have restricted services, the infrastructures provide an additional IdP, such as the AAI proxy. The proxy also brokers between services and eduGAIN, which makes it much easier to connect new services.
Integrated Infrastructure Services
Integrated infrastructure services such as server, storage, archiving, high-performance computing, certified long-term preservation solutions, and service monitoring are a prerequisite for operating a digital research infrastructure. Services which are particularly suited for the needs of the humanities are available from infrastructure partners participating in or related to the consortium.
Persistent Identifiers (PIDs) ensure the secure referencing of digital objects so that references remain stable even if the location of the data changes, and they allow resources to be identified for citation. In collaboration with the European Persistent Identifier Consortium (EPIC) and other services, the consortium uses implementations of ISO 24619, including DataCite DOIs.
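The following minimal sketch shows why a PID stays stable when data moves: a citation carries only the identifier, and a resolver redirects to the current location. The `hdl:` and `doi:` prefixes as input notation, the function name `resolver_url`, and the example identifiers are assumptions for illustration; `https://hdl.handle.net/` and `https://doi.org/` are the public resolvers of the Handle System and of DOI.

```python
from urllib.parse import quote

HANDLE_RESOLVER = "https://hdl.handle.net/"
DOI_RESOLVER = "https://doi.org/"


def resolver_url(pid: str) -> str:
    """Map a PID such as 'hdl:<prefix>/<suffix>' or 'doi:<prefix>/<suffix>'
    to the URL of a public resolver (the input notation is an assumption)."""
    scheme, sep, value = pid.partition(":")
    if not sep or "/" not in value:
        raise ValueError(f"not a recognizable PID: {pid!r}")
    # Percent-encode characters that are not safe in a URL path.
    path = quote(value, safe="/")
    scheme = scheme.lower()
    if scheme == "hdl":
        return HANDLE_RESOLVER + path
    if scheme == "doi":
        return DOI_RESOLVER + path
    raise ValueError(f"unsupported PID scheme: {scheme!r}")


if __name__ == "__main__":
    # Made-up identifiers, used only to show the URL shape.
    print(resolver_url("hdl:12345/abc-001"))
    print(resolver_url("doi:10.1234/example"))
```

If the object moves to a new server, only the record held by the resolver changes; the identifier used in publications remains the same.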
Standards and Processes
Authority Data Services
Authority data contains standardized forms of names for people, places, corporate bodies, titles, and subjects. Authority records support the control and quality of data and help researchers find information on a specific subject or entity in less time.
- ba[sic?] - Better Authorities is a tool for searching, identifying and connecting named entities with authority data. The main goal of this tool is to provide quality assurance to the process of connecting research data with authority data.
To describe research data, various metadata schemas have been in use, including ISO 24622-1 and ISO 24622-2, MARC 21, and Dublin Core. As metadata is distributed from all repository systems, its interpretation requires a high level of standardization of formats and interfaces. This standardization requires a way to provide the metadata schemas, potentially also to develop them, and tools for editing metadata according to the schemas or for mapping between them.
- Component Metadata Infrastructure provides an implementation of ISO 24622-1 and ISO 24622-2 for metadata schema provision, editing, and maintenance.
- Data Modeling Environment provides a tool for mapping data structures onto each other, for example to map metadata to the structures used in search applications. It allows for integration into generic search solutions.
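Mapping one data structure onto another can be sketched as a simple crosswalk table. This is a minimal illustration and not how the Data Modeling Environment works internally: the source field names (`main_title`, `author`, `year`, `lang`), the sample record, and the function name `map_record` are made up; the target names are Dublin Core elements.

```python
# Illustrative crosswalk from a hypothetical source schema to Dublin Core
# element names. The source field names are invented for this sketch.
CROSSWALK = {
    "main_title": "title",
    "author": "creator",
    "year": "date",
    "lang": "language",
}


def map_record(record: dict, crosswalk: dict = CROSSWALK):
    """Return (mapped record, list of source fields without a mapping)."""
    mapped, unmapped = {}, []
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            # Keep track of fields the crosswalk does not cover.
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so collect values in lists.
        mapped.setdefault(target, []).append(value)
    return mapped, unmapped


if __name__ == "__main__":
    source = {"main_title": "Sample edition", "author": "Jane Doe", "shelfmark": "X 12"}
    print(map_record(source))
```

Reporting unmapped fields separately makes gaps in a crosswalk visible instead of silently dropping information.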
Service Quality Management
In order to manage and secure service quality, a dedicated quality assurance framework needs to be set up.
- The Service Life Cycle Management Model defines a process for selecting new or existing services for the infrastructure and for managing their quality.