RODA

Preserve and provide access to digital material produced by large organisations

RODA (Repository of Authentic Digital Records) is a long-term digital repository solution that delivers functionality for all the main functional units of the OAIS reference model. RODA is capable of ingesting, managing and providing access to various types of digital content produced by large corporations and public bodies.

RODA was developed using open-source technologies and it is supported by standards such as the Open Archival Information System (OAIS), Metadata Encoding and Transmission Standard (METS), Encoded Archival Description (EAD), Dublin Core (DC), DILCIS Board Information Package specifications and PREMIS (Preservation Metadata).


RODA implements an ingest workflow that not only validates standardised Submission Information Packages (SIPs), but also scans their content for viruses, identifies file formats, extracts technical metadata, and migrates files to more “preservable” formats.
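The ingest workflow described above can be pictured as an ordered list of steps applied to each package. The following is a minimal sketch in Python, not RODA's implementation: `run_ingest`, the step functions and the dictionary layout of a package are all hypothetical, and the extension-based format lookup is only a toy stand-in for real format identification.

```python
from typing import Callable

# A step inspects or enriches the package record and raises on failure.
Step = Callable[[dict], None]


def validate_structure(package: dict) -> None:
    """Placeholder validation: a package must list at least one file."""
    if not package.get("files"):
        raise ValueError("package contains no files")


def identify_formats(package: dict) -> None:
    """Toy stand-in for format identification: uses the file extension only."""
    package["formats"] = {
        name: name.rsplit(".", 1)[-1].lower() for name in package["files"]
    }


def run_ingest(package: dict, steps: list[tuple[str, Step]]) -> list[tuple[str, str]]:
    """Run the steps in order and stop at the first failure.

    Returns a report of (step name, outcome) pairs.
    """
    report = []
    for name, step in steps:
        try:
            step(package)
        except Exception as error:
            report.append((name, f"failed: {error}"))
            break
        report.append((name, "passed"))
    return report


WORKFLOW = [("validate", validate_structure), ("identify", identify_formats)]
```

Further steps (virus scanning, metadata extraction, migration) would be appended to `WORKFLOW` in the same way; stopping at the first failure is a design assumption of this sketch.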

RODA ensures that ingested data remains authentic by recording PREMIS metadata every time an action is performed on a digital object. It records provenance information in archival metadata standards such as EAD or DC and ensures integrity and availability by frequently monitoring data and making sure that it has not been tampered with. All interactions between users and the repository (human and software) are logged for security and accountability reasons.
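Integrity monitoring of this kind is commonly done by recomputing checksums and comparing them with values recorded at ingest. The sketch below illustrates the idea in Python under stated assumptions: the manifest format (relative path mapped to a SHA-256 hex digest) and the function names are hypothetical and do not describe how RODA stores or checks fixity information.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare each file under root against its recorded checksum.

    Returns a mapping of relative path to "ok", "modified" or "missing".
    """
    report = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            report[relative_path] = "missing"
        elif sha256_of(target) == expected:
            report[relative_path] = "ok"
        else:
            report[relative_path] = "modified"
    return report
```

Running `verify_fixity` on a schedule and recording each run as a preservation event is one way to obtain the kind of audit trail the paragraph above describes.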

Preservation strategies supported by RODA

RODA was designed to be flexible enough to cope with every preservation strategy found in the literature. It natively supports format migration, format normalization and encapsulation, and it can support emulation (not included off the shelf).

01. Format migration

RODA natively supports the conversion of hundreds of file formats via its task execution engine and plugin system. The extensible nature of RODA enables it to be updated at any time to cope with new file formats and to support more advanced preservation tasks.

02. Encapsulation

Representation Information is natively supported by RODA. Digital objects can be linked to Representation Information records, which can in turn reference other Representation Information records, thus creating a network of Representation Information records. This is called a Representation network.
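A Representation network of this kind is a directed graph, so collecting everything needed to interpret a digital object is a graph traversal. The following Python sketch shows one way to do it; the identifiers, the `links` dictionary and the function name are invented for illustration and are not RODA's data model.

```python
from collections import deque


def reachable_representation_info(links: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk over the records reachable from start.

    links maps an identifier (digital object or Representation Information
    record) to the identifiers it references. Cycles are tolerated.
    """
    seen = {start}
    order = []
    queue = deque(links.get(start, []))
    while queue:
        record = queue.popleft()
        if record in seen:
            continue
        seen.add(record)
        order.append(record)
        queue.extend(links.get(record, []))
    return order


# Hypothetical network: a PDF object whose format record references two others.
network = {
    "object:report.pdf": ["ri:pdf-1.7"],
    "ri:pdf-1.7": ["ri:deflate", "ri:unicode"],
    "ri:deflate": [],
    "ri:unicode": ["ri:pdf-1.7"],  # back-reference, forming a cycle
}
```

Calling `reachable_representation_info(network, "object:report.pdf")` returns the three records in breadth-first order, visiting each only once despite the cycle.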

03. Emulation

RODA is able to retain the original versions of digital representations as they have been received via the ingest process. These original versions are stored inside Archival Information Packages (AIP) and are an intrinsic part of the overall preservation process.

RODA advantages

01. Conforms to open standards

RODA complies with several open descriptive metadata standards, such as EAD 2002, EAD 3 and DC, as well as PREMIS for preservation metadata and METS for structural metadata.

It also has the ability to support more standards using an advanced templating system (to support searching, viewing and editing of metadata).

SIP, AIP and DIP formats are also based on open specifications compliant with various repository implementations to avoid technology lock-in.

02. Authenticity

RODA uses preservation metadata (PREMIS) to create a trust chain between all generations of data.
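One way to picture this trust chain is as a series of recorded events, each linking a source object to the object it produced, so that any later generation can be traced back to what was originally ingested. The Python sketch below illustrates that idea only: the `PreservationEvent` fields are loosely inspired by PREMIS event concepts but are hypothetical, and are neither the PREMIS schema nor RODA's internal model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreservationEvent:
    event_type: str            # e.g. "ingestion" or "migration"
    date_time: str             # ISO 8601 timestamp of the action
    source_id: Optional[str]   # object the action was applied to, if any
    outcome_id: str            # object that resulted from the action


def trace_provenance(events: list[PreservationEvent], object_id: str) -> list[PreservationEvent]:
    """Walk back from object_id to the event that first created its lineage."""
    produced_by = {event.outcome_id: event for event in events}
    chain = []
    visited = set()
    current: Optional[str] = object_id
    while current is not None and current in produced_by:
        if current in visited:
            raise ValueError(f"cyclic provenance at {current}")
        visited.add(current)
        event = produced_by[current]
        chain.append(event)
        current = event.source_id
    return chain


# Hypothetical history: one ingest followed by two format migrations.
history = [
    PreservationEvent("ingestion", "2020-01-10T09:00:00Z", None, "rep-1"),
    PreservationEvent("migration", "2022-03-05T14:30:00Z", "rep-1", "rep-2"),
    PreservationEvent("migration", "2024-06-20T08:15:00Z", "rep-2", "rep-3"),
]
```

An unbroken chain from the newest generation back to an ingestion event is what lets a reader argue that the current copy derives from the original record.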

Preservation metadata, together with established trust in the surrounding environment (ISO 16363), ensures that the service is reliable and that the enclosed digital records are authentic.

RODA also comes with plugins that assess the validity of digital signatures during ingest and has the ability to re-sign archived documents when the lifetime of digital signatures is coming to an end.

03. Scalable

The service-oriented nature of RODA allows it to be highly scalable, enabling the distribution of processing load between several servers.

The use of advanced indexing and parallelisation frameworks enables RODA’s discovery services to be spread across multiple servers for greater performance, and lets RODA use all the CPUs available in each server to process thousands of objects at the same time.

04. Copes with the rapidly changing nature of technology

The pluggable architecture of RODA makes it easy to add more functionality to the system without affecting the core.

This includes adding new preservation tasks such as preservation actions, risk assessment tools, internal and external monitoring, etc.

The system also manages data in a well-documented open Archival Information Package (AIP) structure that can be easily inspected by users and ingested by other repository systems. This way your data is never imprisoned inside a single system.

05. Integration with 3rd party systems

RODA exposes all its functionality via a well-documented REST API. Convenient Java libraries are available on GitHub to allow developers to interact with RODA via its Core APIs. Several tools exist to create and manipulate SIPs and submit them to RODA’s ingest workflow. RODA can ingest data from other document management systems in three ways: 1) data typically available on the filesystem, via the RODA-in tool; 2) data stored in relational databases, via the Database Preservation Toolkit; and 3) data reached via direct connectors to the original systems’ APIs.

06. Vendor independent

RODA is 100% built on top of open-source technologies.

The entire infrastructure required to support RODA is vendor independent. This means that you may use the hardware and the Linux distribution that best fit your institutional needs.

Because the product itself is open source, you don’t have to rely on a single vendor for support.

07. Support for multiple formats

RODA is capable of ingesting all sorts of content. Migration action components are embedded in the system for coping with decaying text documents, raster images, relational databases, video, and audio by normalizing them to formats more adequate for long-term preservation.

A task execution engine and a plugin system enable RODA to easily support additional format migrations.

Additionally, representation information networks can be managed within the repository itself, letting you opt for the right preservation strategy at the right time.

08. Embedded preservation actions

Preservation actions can be executed right from the user interface over any selection of digital objects in the repository.

The task execution engine enables the repository to parallelise the task execution process in order to take full advantage of the existing processing power.
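The general pattern of running one task over a selection of objects in parallel can be sketched with Python's standard `concurrent.futures` module. This is an illustration of the pattern only, not RODA's task execution engine: `checksum_task`, `run_on_selection` and the in-memory `selection` dictionary are hypothetical names chosen for the example.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def checksum_task(item: tuple[str, bytes]) -> tuple[str, str]:
    """Example task: compute the SHA-256 digest of one object's content."""
    object_id, payload = item
    return object_id, hashlib.sha256(payload).hexdigest()


def run_on_selection(
    task: Callable[[tuple[str, bytes]], tuple[str, str]],
    selection: dict[str, bytes],
    max_workers: int = 4,
) -> dict[str, str]:
    """Apply task to every selected object using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises any task exception.
        return dict(pool.map(task, selection.items()))


# Hypothetical selection of three objects held in memory.
selection = {
    "obj-1": b"first object",
    "obj-2": b"second object",
    "obj-3": b"third object",
}
results = run_on_selection(checksum_task, selection)
```

A thread pool is used here for simplicity; CPU-heavy tasks such as format conversion would more plausibly run in separate processes or on separate servers.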

Preservation actions include format conversions, checksum verifications, virus checks, various maintenance tasks, risk assessment, etc.

09. Advanced access control

Users must be authenticated before accessing any functionality and objects in the repository. All user actions are logged for future accountability.

Permissions are fine-grained and can be defined at the top of the repository level, all the way down to individual data objects.
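To make the idea of permissions at different levels concrete, the sketch below resolves a user's permissions by looking for the most specific grant along an object's path. It rests on an assumption that is not stated in the text above, namely that a grant on a parent applies to its descendants unless a more specific grant overrides it; the path syntax, the `grants` structure and the function name are likewise hypothetical.

```python
def effective_permissions(path: str, grants: dict[str, dict[str, set[str]]], user: str) -> set[str]:
    """Return the permissions of user on the object at path.

    path is a slash-separated location such as "fonds-a/series-1/doc-2".
    grants maps a location to per-user permission sets; the empty string
    stands for the top of the repository. The nearest grant wins (assumed).
    """
    parts = path.split("/") if path else []
    for depth in range(len(parts), 0, -1):
        node = "/".join(parts[:depth])
        node_grants = grants.get(node, {})
        if user in node_grants:
            return set(node_grants[user])
    return set(grants.get("", {}).get(user, set()))


# Hypothetical grants: read access everywhere, edit rights on one series,
# and no access at all to one sensitive document.
grants = {
    "": {"alice": {"read"}},
    "fonds-a/series-1": {"alice": {"read", "update"}},
    "fonds-a/series-1/doc-9": {"alice": set()},
}
```

With these grants, `alice` can read `fonds-a`, can read and update `fonds-a/series-1/doc-2`, and has no permissions on `fonds-a/series-1/doc-9`.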

Authentication is supported by a Central Authentication Service (CAS) that is able to connect to various authentication services such as LDAP, Active Directory, OAuth 1.0/2.0, custom database, OpenID, RADIUS, SPNEGO (Windows), Trusted remote user, X.509 (client SSL certificate), etc.
