Research

KEEP SOLUTIONS strategically embraces research and scientific progress by promoting research and actively participating in national and international R&D projects.

KEEP SOLUTIONS carries out research in close collaboration with national and international organizations such as the Technical University of Vienna, the Austrian Institute of Technology, Microsoft Research, the Technical University of Berlin, The University of Manchester, University Pierre and Marie Curie, the British Library, the Austrian National Library, the Danish National Library and the Portuguese National Archives.

Here you may find a list of the most prominent research projects developed at KEEP SOLUTIONS:

 

European projects (FP7)

 

veraPDF – Definitive PDF/A Validation

Designed to meet the needs of digital preservationists, and supported by leading members of the PDF software developer community, veraPDF is a purpose-built, open source, permissively licensed file-format validator covering all PDF/A parts and conformance levels. Learn more about what veraPDF is doing, and meet the team.

Led by the Open Preservation Foundation (OPF) and the PDF Association, and assisted by the Digital Preservation Coalition, the consortium’s mission is to develop the definitive, open-source validator for PDF/A. The veraPDF consortium has retained two subcontractors to provide and quality-control software and test files. Lead developer Dual Lab specializes in technology-intensive application development, while KEEP Solutions focuses on open source solutions for archival institutions.

veraPDF is funded by the PREFORMA project. PREFORMA – PREservation FORMAts for culture information/e-archives, is a Pre-Commercial Procurement (PCP) project co-funded by the European Commission under its FP7-ICT Programme. The project’s main aim is to address the challenge of implementing standardised file formats for preserving digital objects in the long term, giving memory institutions full control over the acceptance and management of preservation files into digital repositories.

Type: European Union project – ICT-2013.11.2
Year: 2014-2017
URL: http://verapdf.org

 

E-ARK – European Archival Records and Knowledge Preservation

Archives provide an indispensable component of the digital ecosystem by safeguarding information and enabling access to it. Harmonisation of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation and re-use.

In co-operation with commercial systems providers, E-ARK will create and pilot a pan-European methodology for electronic document archiving, synthesising existing national and international best practices, that will keep records and databases authentic and usable over time.

The methodology will be implemented in an open pilot in various national contexts, using existing, near-to-market tools, and services developed by the partners. This will allow memory institutions and their clients (public- and private-sector) to assess, in an operational context, the suitability of those state-of-the-art technologies.

Our objective is to provide a single, scalable, robust approach capable of meeting the needs of diverse organisations, public and private, large and small, and able to support complex data types. E-ARK will demonstrate the potential benefits for public administrations, public agencies, public services, citizens and business by providing simple, efficient access to the workflows for the three main activities of an archive – acquiring, preserving and enabling re-use of information.

The practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving of records. The project will be public facing, providing a fully operational archival service, and access to information for its users. The project results will be generic and scalable in order to build an archival infrastructure across the EU and in environments where different legal systems and records management traditions apply. E-ARK will provide new types of access for business users.

E-ARK will pilot an end-to-end OAIS-compliant e-archival service covering ingest, vendor-neutral archiving, and reuse of structured and unstructured data, thus covering both databases and records, addressing the needs of data subjects, owners and users. The pilot and methodology will also focus on the essential pre-ingest phase of data export and normalisation in source systems. The pilot will integrate tools currently in use in partner organisations, and provide a framework for providers of these and similar tools ensuring compatibility and interoperability. A core component of the project is the integration platform which uses the existing ESSArch Preservation Platform (EPP) application as an Archival Information System, which is already in productive deployment at the National Archives of Norway and Sweden. In order to achieve scalability, E-ARK will adopt a data management and storage layer for this tool on top of the proven open-source Cloudera CDH4 distribution of Apache Hadoop, enabling storage and computational power to be seamlessly added to the system.

The pilot will run in several national archives, each of which will provide data to run in the pilot instance by agreement from an associated government data owner (e.g. national or regional / federal).

Project partner the DLM Forum, comprising 22 national archives and associated commercial and technical providers, is well placed to sustain the outputs of the project. Using the open Apache licensing model, commercial suppliers will be able to incorporate the project outputs (particularly the open interfaces for pre-ingest, ingest, archival, access and re-use) into their own systems, enhancing their longevity. National archives running E-ARK pilot instances will serve as exemplars for others wanting to adopt the new open e-archiving system.

In addition, project partner the Digital Preservation Coalition will promote best practices in this area, as will our dedicated government institution partners.

Type: European Union project – FP7 CIP-ICT-PSP-2013-7
Year: 2014-2017
URL: http://eark-project.eu

 

4C – Collaboration to Clarify the Costs of Curation

The Collaboration to Clarify the Costs of Curation (4C) project will help organisations across Europe to more effectively invest in digital curation and preservation. Making an investment inevitably involves a cost and existing research on cost modelling provides the starting point for the 4C work. But the point of an investment is to realise a benefit, so work on cost must also focus on benefit, which must then encompass related concepts such as ‘risk’, ‘value’, ‘quality’ and ‘sustainability’. Organisations that understand this will be more able to effectively control and manage their digital assets over time, but they may also be able to create new cost-effective solutions and services for others.

Existing research into cost modelling is far from complete; there has been little uptake of the tools and methods that have been developed and very little integration into other digital curation processes. The main objective of the 4C project is, therefore, to ensure that, where existing work is relevant, stakeholders realise and understand how to employ those resources. An additional aim of the work is to examine closely how these resources might be made more fit-for-purpose, relevant and useable by a wide range of organisations operating at different scales in both the public and the private sector.

These objectives will be achieved by a coordinated programme of outreach and engagement that will identify existing and emerging research and analyse user requirements. This will inform an assessment of where there are gaps in the current provision of tools, frameworks and models. The project will support stakeholders to better understand and articulate their requirements and will clarify some of the complexity of the relationships between cost and other factors. The outputs of this project will include various stakeholder engagement and dissemination events (focus groups, workshops, a conference), a series of reports, the creation of models and specifications, and the establishment of an international Curation Costs Exchange framework. All of this activity will enable the definition of a research and development agenda and a business engagement strategy which will be delivered to the European Commission in the form of a roadmap.

The consortium undertaking this project includes organisations with extensive domain expertise and experience with curation cost modelling issues. It includes national libraries and archives, specialist preservation and curation membership organisations, service providers, research departments and SMEs. It will be coordinated by a national funding organisation that specialises in supporting the innovative use of ICT methods and technologies.

Type: European Union project – FP7 ICT-2011.4.3
Year: 2013-2015
URL: http://www.4cproject.eu | Briefing paper

 

SCAPE – Scalable Preservation Environments

The SCAPE project will develop scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects. SCAPE will enhance the state of the art of digital preservation in three ways: by developing infrastructure and tools for scalable preservation actions; by providing a framework for automated, quality-assured preservation workflows and by integrating these components with a policy-based preservation planning and watch system. These concrete project results will be validated within three large-scale Testbeds from diverse application areas.

SCAPE approaches digital preservation through research and development sub-projects: Testbeds, Preservation Components, Platform, and Planning and Watch.

The SCAPE Testbeds are the primary driver for the rest of the project, in that they define use case scenarios, create preservation workflows, and assess the large scale applicability of the SCAPE Preservation Platform and the preservation components developed within the project. Using these software components, test environments are created for the different scenarios and the complex large scale preservation workflows.

SCAPE Preservation Components address known limitations of digital preservation systems on three levels: scalability, functional coverage and quality. This sub-project improves and extends existing tools, develops new ones where necessary, and applies proven approaches to the problem of ensuring quality in digital preservation.

Building on the state of the art and focusing on formats and tools that are considered most important by the Testbeds sub-project, SCAPE investigates methods to parallelise and embed components in robust and scalable workflows. A major focus is the ability to capture relevant provenance and contextual information and metadata, and to provide usable outputs for automated policy-driven preservation.

The SCAPE Platform will provide an extensible infrastructure for the execution of digital preservation processes on large volumes of data. It will include a flexible mechanism for the integration of existing digital repository systems and provide a reference implementation. The Preservation Platform will also provide the underlying environment for large-scale testing and evaluation performed by the Testbeds and the Preservation Component providers in the project. The computational layer of the Preservation Platform system will make use of Hadoop, with the underlying distributed storage layer being based on HBase, which provides high performance and scalable data storage on top of Hadoop’s Distributed File System (HDFS).

The Planning and Watch Components developed in SCAPE address the bottleneck of decision processes and processing information required for decision making. Work on these components started with a conceptual analysis, based on extensive real-world application experience. A set of essential policy elements is being defined and modelled. These elements will make use of the SCAPE Policy Catalogue. Building on SCAPE’s machine-understandable policy representation and the first release of the automated planning component, core watch services will be implemented. In the final phase the policy-aware planning component will be fully integrated with the platform and repository operations.

The Cross-project Activities in SCAPE include project management and coordination as well as the investigation of Open Research Challenges and a Research Roadmap. These activities provide administrative control and technical coordination for the project as well as focused research on innovative and emerging technologies having the potential to improve SCAPE’s capabilities.

The project’s Take-up Activities aim to coordinate the communication and dissemination of project results both within and beyond the project. A number of training activities, which will also incorporate Best Practice guidelines, are aimed at fostering the take-up of project outputs at technical, operational and strategic levels. Furthermore, they will ensure that SCAPE has a long-term and sustained impact beyond the runtime of the project.

Type: European Union project – FP7 ICT-2009.4.1
Year: 2011-2014
URL: http://www.scape-project.eu

 

 

PhD projects

 

Automated Watch for Digital Preservation

The current exponential growth in digitally created documents is an obvious effect of the global shift towards digital technology. Replacing paper with digital documents has become common in all kinds of institutions, and many have already eliminated the use of paper entirely. Even European policies, such as eGovernment, urge public administrations to stop using paper and to provide all services and documentation in digital form.

But documents in digital form are much more perishable than their paper counterparts, and it is not obvious to the ordinary user that keeping a digital document accessible for several decades is a very difficult task. Furthermore, some properties that a user takes for granted when keeping a paper document do not behave the same way when the information is in digital form. Authenticity is one of these properties, and it is crucial, as information has no value worth keeping if its power to serve as evidence is lost. The digital preservation field tries to tackle all these problems and is currently one of the main concerns of European research efforts, such as the Seventh Framework Programme (FP7).

The main difficulty of digital preservation lies in the ever-changing technological environment with which documents must remain compatible. Part of the solution involves detecting these changes by continuously monitoring the environment, the users and the documents in order to identify preservation risks. This PhD project focuses on creating automatic and systematic ways to monitor the environment and provide valuable input for risk detection and assessment.

Type: Ph.D. in Informatics
Author: Luís Faria
Year: 2011-today

 

Long-term preservation of digital information in the context of a historical archive

This project aimed at developing a Service Oriented Architecture (SOA) specially designed to assist cultural heritage institutions in the implementation of preservation interventions. The proposed SOA delivers a recommendation service and a method to carry out complex format migrations. The recommendation service is supported by three evaluation components that assess the quality of every migration intervention in terms of its performance, suitability of involved formats and data loss. The proposed system is also able to produce preservation metadata that can be used by any client institution to document preservation interventions and retain objects’ authenticity.

The system has been evaluated with respect to its ability to suggest migration services that best satisfy the preservation requirements of any given client institution. The evaluation also focused on the system’s ability to determine the level of degradation imposed on a digital object during a migration process, especially with regard to its subjective significant properties, i.e., pixel correctness and embedded metadata.
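As a rough illustration of how a recommendation service might combine the three evaluation criteria (performance, format suitability and data loss), the following minimal sketch ranks candidate migration services by a weighted score. The class name, field names, weights and sample values are all hypothetical and are not taken from the project itself; the weighted sum is simply one plausible way to combine the criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationScore:
    """Hypothetical evaluation result for one migration service."""
    service: str
    performance: float         # 0.0 (worst) to 1.0 (best)
    format_suitability: float  # 0.0 (worst) to 1.0 (best)
    data_preservation: float   # 1.0 means no data loss


def rank_services(scores, weights):
    """Return the services ordered from best to worst weighted score."""
    total = sum(weights.values())

    def weighted(score):
        return (
            weights["performance"] * score.performance
            + weights["format_suitability"] * score.format_suitability
            + weights["data_preservation"] * score.data_preservation
        ) / total

    return sorted(scores, key=weighted, reverse=True)


# Illustrative values only: an institution that values data preservation
# above speed gives that criterion the largest weight.
candidates = [
    MigrationScore("tiff-to-jpeg", 0.9, 0.4, 0.5),
    MigrationScore("tiff-to-png", 0.6, 0.8, 1.0),
]
weights = {"performance": 1.0, "format_suitability": 2.0, "data_preservation": 3.0}
best = rank_services(candidates, weights)[0]
print(best.service)
```

Changing the weights changes the recommendation, which is how such a scheme could reflect the differing preservation requirements of each client institution.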

Type: Ph.D. in Information systems
Author: Miguel Ferreira
Year: 2005-2008
Publications: Preservação de longa duração de informação digital no contexto de um arquivo histórico (2009) | RODA and CRiB – A Service-Oriented Digital Repository (2008) | Distributed Preservation Services: Integrating Planning and Actions (2008) | A Foundation for Automatic Digital Preservation (2006)

 

 

Masters projects

 

Database Preservation Toolkit

The preservation of information systems is one of the biggest challenges of digital preservation. Databases are among those systems: they support the majority of information management systems, which makes them a valuable resource to preserve.

On the one hand, databases need to be migrated to the newer systems that appear as technology evolves; on the other hand, the information they hold must be preserved for a long time, both for legal reasons and for archival purposes. That information must therefore remain available regardless of the database management system it came from.

In this area, the existing products for relational database preservation are still scarce – CHRONOS and SIARD are the main ones. The first is, in most cases, out of reach due to the associated costs. The second supports only basic features.

There is therefore a need to explore the main features and limitations of the existing products in order to improve ‘db-preservation-toolkit’ (http://keeps.github.io/db-preservation-toolkit/), a component extracted from the RODA project (http://www.roda-community.org).

This project aims to improve the performance of ‘db-preservation-toolkit’ and to add new features: support for more database management systems, features missing from the other products, and an interface for viewing and searching the information in the archived database.

Type: Masters in Informatics Engineering
Author: Miguel Coutada
Year: 2013-2014

 

Self-checkout module for Koha

Currently, digital information is accessible to everyone, so the value of our libraries is sometimes forgotten. In 2003 there were 1969 libraries in Portugal with a checkout volume of 6,568,368 transactions, according to INE (Instituto Nacional de Estatística).

Most libraries have reduced budgets, which are mostly spent on improving their collections. Once patrons are allowed to check out items, it becomes essential to maintain control over the books so that theft in these institutions is reduced.

The most common way to control the checkout of works is to assign a member of staff exclusively to this task, which prevents them from performing other duties.

To overcome this problem, it is necessary to find alternatives to staffed checkout, including the creation of a station where patrons can check out items autonomously. Self-checkout systems would free employees to perform other tasks, enriching the experience and support offered to those attending the library. However, to avoid unnecessary expenses, this project aims to use physical characteristics of books not exploited so far to ensure greater security at checkout time, and to reuse the security systems already used by libraries.

Type: Masters in Informatics Engineering
Author: Paulo Agostinho Fernandes Lima
Year: 2013-2014

 

Physical archival management

DigitArq is a software platform aimed at the integrated management of a historical archive. It is based upon several international standards to produce and export descriptive records, which allows for integration with national and international content aggregator portals. Currently, DigitArq allows users to save and retrieve information relating to objects (like books or documents) held in archives, as well as their logical hierarchy within fonds. The goal of this project is to extend DigitArq with similar features for managing physical archives, making it possible to map the physical space and all of the objects it contains.

Type: Masters in Informatics
Author: Nuno Antunes Marques
Year: 2013

 

DigitArq Mobile

In this project we will develop a new client for DigitArq based on the Android platform. The app will let users view metadata on an augmented reality screen while the phone is pointed at the real book.

Type: Masters in Communications Engineering
Author: Francisco José Ramos Pacheco Silva
Year: 2011-today

 

Private Cloud Manager

Today, the use of distributed systems to meet institutions’ processing needs is inevitable. Unfortunately, such systems are expensive, and the complexity of their orchestration limits their adoption. At the same time, an institution’s personal computers can be used to alleviate those processing needs.

This work studies the technologies associated with the development of distributed systems and discusses to what extent they can be reused to create a system that allows easy orchestration of a private cloud built from personal computers, while shielding users from the complexity of the underlying system.

Type: Masters in Informatics
Author: Rui Peixoto
Year: 2011-today

 

Weebox App for Android

The number of applications available to mobile users is increasing. Mobile computing has improved daily life in a technological society, in the sense that users can now enjoy services and platforms that were previously available only from a personal computer. This research project aims to clarify how Web applications can be adapted to the Android platform.

Type: Masters in Communications Engineering
Author: João Miguel de Carvalho Oliveira
Year: 2011-today

 

Centralized Usage Statistics from Digital Repositories

This dissertation presents SCEUR, a project that aims to build an architecture for collecting, processing and presenting, in an intuitive manner, statistics from institutional repository usage data. It also aims to assist in the sharing of statistical data by developing add-ons that facilitate that task in the software used by the institutions, and by publishing information and recommendations in this context. International projects are also presented, as well as existing standards and the technologies used to implement the underlying concepts.

Type: Masters in Informatics
Author: Hélder Silva
Year: 2010-2011
URL: http://sceur.rcaap.pt
Publications: Serviço Centralizado de Estatísticas de Utilização de Repositórios

 

Modeling ontological structures into hierarchical representations for dissemination over the Web

The file system is a concept well known to computer users, who are already very familiar with it; it may be said to form part of every computer user’s day-to-day work. Every operating system implements file systems, each with its own native format.

This work uses the WebDAV protocol, a network file system protocol, to allow connection to weebox. WebDAV provides a view of the documents in the repository in the form of a folder system, allowing any operating system to mount the repository as a share, so that it is presented to the user as a set of folders and files.
WebDAV also makes it possible to take information and store it hierarchically through the use of metadata. With this feature, documents are catalogued automatically at submission time.
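The idea of presenting repository documents as folders derived from their metadata can be sketched as follows. This is only an illustration under stated assumptions: the metadata field names (`collection`, `year`, `type`, `filename`) and the choice of hierarchy levels are hypothetical, and the sketch does not reflect how weebox or any WebDAV server actually implements the mapping.

```python
from pathlib import PurePosixPath


def virtual_path(metadata, levels=("collection", "year", "type")):
    """Build a folder path for a document from its metadata values.

    Each entry in `levels` becomes one folder level; a missing value
    is placed under a folder named "unknown".
    """
    parts = [str(metadata.get(level, "unknown")) for level in levels]
    return PurePosixPath("/", *parts, metadata["filename"])


# Hypothetical document record, for illustration only.
doc = {"collection": "reports", "year": 2010, "type": "pdf", "filename": "annual.pdf"}
print(virtual_path(doc))
```

In the other direction, a document saved into a given folder could have the corresponding metadata values assigned from the folder names, which is one way automatic cataloguing at submission time could work.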

Type: Integrated Masters in Communications Engineering
Author: David Marques Pires
Year: 2009-2011

 

Translation of desktop applications into mobile apps

Mobile devices are developing rapidly, offering their users more and better applications. Usually, these applications attempt to be adaptations of desktop or web analogues, implementing some or even all of their features. When creating this kind of application, a developer must bear in mind that a mobile platform is more limited in terms of processor, memory, screen size and battery, requiring more control than its desktop or web counterparts, and that those limitations affect the interface design of mobile applications. However, there are no guidelines for adapting desktop or web application interfaces to mobile platforms, making this process even more onerous and time-consuming.

Beyond the interface, there is the possibility of multimedia content being provided by the application. This content can be provided in various formats, and desktop platforms generally support them, but the range of formats supported by mobile platforms is much smaller, despite their growing development in recent years. This raises a problem of content dissemination that must be considered together with the interface adaptation.

It is in this context that this work is carried out, trying to derive guidelines that would help in the development of mobile applications adapted from the desktop and looking for a solution to the content dissemination problem on mobile platforms.

Type: Masters in Informatics
Author: Vitor Hugo Correia Fernandes
Year: 2009-2010
URL: iTunes AppStore
Publications: Transposição de aplicações Desktop para plataformas móveis: um caso de estudo (2010)

 

Extraction and concentration of metadata distributed across several repositories

This dissertation proposes an architecture to provide access to network-harvested information from a single access point, the Portuguese Archives Portal, an initiative of the Directorate-General of Portuguese Archives that seeks to create a search portal and a privileged access point to all archival resources available on national territory. The Portuguese Archives Portal will also be the national gateway for international projects such as Europeana, a web site that collects and provides access to the historical and cultural production of the European community. Joining the Portuguese Archives Portal, and consequently Europeana, will bring numerous benefits to the holding member entity: specifically, greater visibility for its collection, enhanced search and, eventually, an increase in revenue through the services the archive provides.

Type: Masters in data processing
Author: Luís Miguel Ferros
Year: 2008-2009
Publications: Extracção e concentração de metainformação distribuída por vários repositórios (2009)

 
