Research projects

KEEP was born from an R&D platform and remains active in the production of scientific knowledge. Evidence of this is the numerous publications and scientific events in which KEEP SOLUTIONS has taken part.

KEEP SOLUTIONS strategically embraces scientific research and development, promoting research work and actively participating in national and international R&D projects.

We have been collaborating on research projects with national and international institutions such as the Vienna University of Technology, the Austrian Institute of Technology, Microsoft Research, the Technical University of Berlin, the University of Manchester, the Pierre and Marie Curie University, the British Library, the Austrian National Library, the National Library of Denmark and the Portuguese National Archives, among others.

European projects

veraPDF

Designed to meet the needs of digital preservationists, and supported by leading members of the PDF software developer community, veraPDF is a purpose-built, open source, permissively licensed file-format validator covering all PDF/A parts and conformance levels. Learn more about what veraPDF is doing, and meet the team.

Led by the Open Preservation Foundation (OPF) and the PDF Association, and assisted by the Digital Preservation Coalition, the consortium’s mission is to develop the definitive, open-source validator for PDF/A. The veraPDF consortium has retained two subcontractors to provide and quality-control software and test files. Lead developer Dual Lab specializes in technology-intensive application development, while KEEP Solutions focuses on open source solutions for archival institutions.

veraPDF is funded by the PREFORMA project. PREFORMA – PREservation FORMAts for culture information/e-archives, is a Pre-Commercial Procurement (PCP) project co-funded by the European Commission under its FP7-ICT Programme. The project’s main aim is to address the challenge of implementing standardised file formats for preserving digital objects in the long term, giving memory institutions full control over the acceptance and management of preservation files into digital repositories.

Type: European Union project – ICT-2013.11.2

Year: 2014-2017

URL: http://verapdf.org

E-ARK

Archives provide an indispensable component of the digital ecosystem by safeguarding information and enabling access to it. Harmonisation of currently fragmented archival approaches is required to provide the economies of scale necessary for general adoption of end-to-end solutions. There is a critical need for an overarching methodology addressing business and operational issues, and technical solutions for ingest, preservation and re-use.

In co-operation with commercial systems providers, E-ARK will create and pilot a pan-European methodology for electronic document archiving, synthesising existing national and international best practices, that will keep records and databases authentic and usable over time.

The methodology will be implemented in an open pilot in various national contexts, using existing, near-to-market tools, and services developed by the partners. This will allow memory institutions and their clients (public- and private-sector) to assess, in an operational context, the suitability of those state-of-the-art technologies.

Our objective is to provide a single, scalable, robust approach capable of meeting the needs of diverse organisations, public and private, large and small, and able to support complex data types. E-ARK will demonstrate the potential benefits for public administrations, public agencies, public services, citizens and business by providing simple, efficient access to the workflows for the three main activities of an archive – acquiring, preserving and enabling re-use of information.

The practices developed within the project will reduce the risk of information loss due to unsuitable approaches to keeping and archiving of records. The project will be public facing, providing a fully operational archival service, and access to information for its users. The project results will be generic and scalable in order to build an archival infrastructure across the EU and in environments where different legal systems and records management traditions apply. E-ARK will provide new types of access for business users.

E-ARK will pilot an end-to-end OAIS-compliant e-archival service covering ingest, vendor-neutral archiving, and reuse of structured and unstructured data, thus covering both databases and records, addressing the needs of data subjects, owners and users. The pilot and methodology will also focus on the essential pre-ingest phase of data export and normalisation in source systems. The pilot will integrate tools currently in use in partner organisations, and provide a framework for providers of these and similar tools ensuring compatibility and interoperability. A core component of the project is the integration platform which uses the existing ESSArch Preservation Platform (EPP) application as an Archival Information System, which is already in productive deployment at the National Archives of Norway and Sweden. In order to achieve scalability, E-ARK will adopt a data management and storage layer for this tool on top of the proven open-source Cloudera CDH4 distribution of Apache Hadoop, enabling storage and computational power to be seamlessly added to the system.

The pilot will run in several national archives, each of which will provide data to run in the pilot instance by agreement from an associated government data owner (e.g. national or regional / federal).

Project partner the DLM Forum, comprising 22 national archives and associated commercial and technical providers, is well placed to sustain the outputs of our project. Using the open Apache licensing model, commercial suppliers will be able to incorporate the project outputs (particularly the open interfaces for pre-ingest, ingest, archival, access and re-use) into their own systems, enhancing their longevity. National archives running E-ARK pilot instances will serve as exemplars for others wanting to adopt the new open e-archiving system.

In addition, project partner the Digital Preservation Coalition will promote best practices in this area, as will our dedicated government institution partners.

Type: European Union project – FP7 CIP-ICT-PSP-2013-7

Year: 2014-2017

URL: http://eark-project.eu

The Collaboration to Clarify the Costs of Curation (4C) project will help organisations across Europe to more effectively invest in digital curation and preservation. Making an investment inevitably involves a cost and existing research on cost modelling provides the starting point for the 4C work. But the point of an investment is to realise a benefit, so work on cost must also focus on benefit, which must then encompass related concepts such as ‘risk’, ‘value’, ‘quality’ and ‘sustainability’. Organisations that understand this will be more able to effectively control and manage their digital assets over time, but they may also be able to create new cost-effective solutions and services for others.

Existing research into cost modelling is far from complete, there has been little uptake of the tools and methods that have been developed, and very little integration into other digital curation processes. The main objective of the 4C project is, therefore, to ensure that, where existing work is relevant, stakeholders realise and understand how to employ those resources. An additional aim of the work is to closely examine how those resources might be made more fit-for-purpose, relevant and usable by a wide range of organisations operating at different scales in both the public and the private sector.

These objectives will be achieved by a coordinated programme of outreach and engagement that will identify existing and emerging research and analyse user requirements. This will inform an assessment of where there are gaps in the current provision of tools, frameworks and models. The project will support stakeholders to better understand and articulate their requirements and will clarify some of the complexity of the relationships between cost and other factors. The outputs of this project will include various stakeholder engagement and dissemination events (focus groups, workshops, a conference), a series of reports, the creation of models and specifications, and the establishment of an international Curation Costs Exchange framework. All of this activity will enable the definition of a research and development agenda and a business engagement strategy which will be delivered to the European Commission in the form of a roadmap.

The consortium undertaking this project includes organisations with extensive domain expertise and experience with curation cost modelling issues. It includes national libraries and archives, specialist preservation and curation membership organisations, service providers, research departments and SMEs. It will be coordinated by a national funding organisation that specialises in supporting the innovative use of ICT methods and technologies.

Type: European Union project – FP7 ICT-2011.4.3

Year: 2013-2015

URL: http://www.4cproject.eu

The SCAPE project will develop scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects. SCAPE will enhance the state of the art of digital preservation in three ways: by developing infrastructure and tools for scalable preservation actions; by providing a framework for automated, quality-assured preservation workflows and by integrating these components with a policy-based preservation planning and watch system. These concrete project results will be validated within three large-scale Testbeds from diverse application areas.

SCAPE approaches digital preservation through research and development sub-projects: Testbeds, Preservation Components, Platform, and Planning and Watch.

The SCAPE Testbeds are the primary driver for the rest of the project, in that they define use case scenarios, create preservation workflows, and assess the large scale applicability of the SCAPE Preservation Platform and the preservation components developed within the project. Using these software components, test environments are created for the different scenarios and the complex large scale preservation workflows.

SCAPE Preservation Components address known limitations of digital preservation systems on three levels: scalability, functional coverage, and quality. This sub-project improves and extends existing tools, develops new ones where necessary, and applies proven approaches to the problem of ensuring quality in digital preservation.

Building on the state of the art and focusing on formats and tools that are considered most important by the Testbeds sub-project, SCAPE investigates methods to parallelise and embed components in robust and scalable workflows. A major focus is the ability to capture relevant provenance and contextual information and metadata, and to provide usable outputs for automated policy-driven preservation.

The SCAPE Platform will provide an extensible infrastructure for the execution of digital preservation processes on large volumes of data. It will include a flexible mechanism for the integration of existing digital repository systems and provide a reference implementation. The Preservation Platform will also provide the underlying environment for large-scale testing and evaluation performed by the Testbeds and the Preservation Component providers in the project. The computational layer of the Preservation Platform system will make use of Hadoop, with the underlying distributed storage layer being based on HBase, which provides high performance and scalable data storage on top of Hadoop’s Distributed File System (HDFS).

The Planning and Watch Components developed in SCAPE address the bottleneck of decision processes and processing information required for decision making. Work on these components started with a conceptual analysis, based on extensive real-world application experience. A set of essential policy elements is being defined and modelled. These elements will make use of the SCAPE Policy Catalogue. Building on SCAPE’s machine-understandable policy representation and the first release of the automated planning component, core watch services will be implemented. In the final phase the policy-aware planning component will be fully integrated with the platform and repository operations.

The Cross-project Activities in SCAPE include project management and coordination as well as the investigation of Open Research Challenges and a Research Roadmap. These activities provide administrative control and technical coordination for the project as well as focused research on innovative and emerging technologies having the potential to improve SCAPE’s capabilities.

The project’s Take-up Activities aim to coordinate the communication and dissemination of project results within and beyond the project. A number of training activities, which will also incorporate Best Practice guidelines, are aimed at fostering the take-up of project outputs at technical, operational and strategic levels. Furthermore, they will ensure that SCAPE has a long-term and sustained impact beyond the runtime of the project.

Type: European Union project – FP7 ICT-2009.4.1

Year: 2011-2014

URL: http://www.scape-project.eu

PhD projects

Automated Watch for Digital Preservation

The current exponential growth in digitally created documents is an obvious effect of the global shift towards digital technology. Replacing paper with digital documents has become common practice in all kinds of institutions, and many have already eliminated the use of paper completely. Even European policies, such as eGovernment, urge public administrations to cease the use of paper and to provide all services and documentation in digital form.

But documents in digital form are much more perishable than their paper counterparts, and it is not obvious to the ordinary user that keeping a digital document accessible for several decades is a very difficult task. Furthermore, some aspects that an ordinary user would consider preserved when keeping the physical paper do not behave the same way when the information is in digital form. Authenticity is one of these aspects, and it is crucial, as information has no value worth keeping if its power to serve as evidence is lost. The digital preservation field tries to tackle all these problems and is currently one of the main concerns of European research efforts, such as the Seventh Framework Programme (FP7).

The main difficulty of digital preservation lies in the ever-changing technological environment with which the documents must remain compatible. Part of the solution is to detect these changes by continuously monitoring the environment, the users and the documents in order to identify preservation risks. This PhD project focuses on creating automatic and systematic ways to monitor the environment and provide valuable input for risk detection and assessment.
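As a rough illustration of the monitoring idea described above, the following minimal sketch compares the formats present in a collection against a registry describing the current technological environment. It is not taken from the PhD project: the `FormatStatus` record, the `detect_risks` function, the `min_tools` threshold and the risk labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatStatus:
    """What a (hypothetical) environment registry knows about a file format."""
    name: str
    supported_tools: int  # number of tools known to render the format
    deprecated: bool      # whether the format is flagged as obsolete


def detect_risks(collection, registry, min_tools=2):
    """Return (format, reason) pairs for formats in the collection that look at risk.

    collection: iterable of format names found in the repository
    registry:   dict mapping a format name to its FormatStatus
    min_tools:  assumed threshold below which tool support is considered weak
    """
    risks = []
    for fmt in sorted(set(collection)):
        status = registry.get(fmt)
        if status is None:
            risks.append((fmt, "unknown format"))
        elif status.deprecated:
            risks.append((fmt, "format deprecated"))
        elif status.supported_tools < min_tools:
            risks.append((fmt, "few rendering tools"))
    return risks
```

Running such a check periodically, each time the registry or the collection changes, is one simple way to turn continuous monitoring into input for risk assessment.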

Type: PhD in Informatics

Author: Luís Faria

Year: 2011-2017

Long-term preservation of digital information in the context of a historical archive

This project aims to develop a Service-Oriented Architecture (SOA) capable of assisting organisations and/or individuals in implementing preservation interventions. The system consists of a set of physically distributed components that can carry out the following activities: execute preservation actions based on format migration (conversion); determine the amount of information, significant properties and functionality lost during a migration (quality control); produce reports that can be used as preservation metadata and that document the preservation intervention (authenticity); and suggest target formats and/or conversion services that maximise the satisfaction of the client entity (selection of migration alternatives). The resulting system was evaluated on its ability to produce recommendations of migration alternatives that satisfy the preservation requirements expressed by a client entity.
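One simple way to picture the selection of migration alternatives is a weighted-sum ranking, sketched below. This is an assumption made for illustration only: the text does not say how the system computes its recommendations, and the function name `rank_alternatives`, the criteria names and the scores are all hypothetical.

```python
def rank_alternatives(alternatives, weights):
    """Rank migration alternatives by weighted score, best first.

    alternatives: dict mapping an alternative name to {criterion: score in [0, 1]}
    weights:      dict mapping a criterion to the client's relative importance
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def score(criteria):
        # Criteria missing from an alternative count as 0.
        return sum(weights[c] * criteria.get(c, 0.0) for c in weights) / total

    return sorted(
        ((name, score(criteria)) for name, criteria in alternatives.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Hypothetical example: two target formats scored on two criteria.
alternatives = {
    "tiff": {"fidelity": 1.0, "size": 0.0},
    "jpeg": {"fidelity": 0.5, "size": 1.0},
}
ranking = rank_alternatives(alternatives, {"fidelity": 3, "size": 1})
```

A client that values fidelity over file size would see `tiff` ranked first here, while swapping the weights would favour `jpeg`. The weights stand in for the preservation requirements expressed by the client.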

Type: PhD in Information Systems and Technologies

Author: Miguel Ferreira

Year: 2005-2008