Practical experiences with archiving PDF files

Monday 20 July, 13:00 BST / 14:00 CEST

Registration for our next webinar is now open. Our session lead is Yvonne Friese from the Deutsche Zentralbibliothek für Wirtschaftswissenschaften:

This webinar deals with archiving PDF files. The PDF files in our repository come from a myriad of data producers, so their heterogeneity is overwhelming. Unfortunately, this variety also brings errors with it. The original data producers usually cannot be contacted any more, so we have to do the best we can with the PDF material we have.

We ingest plain PDF files and almost no PDF/A files, and we validate them with JHOVE. JHOVE flags issues in about 20% of our PDF files. We estimate that most of these issues are not serious, as the affected files can be repaired or even migrated to PDF/A without any problems.
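
For readers who want to try a similar check on their own collection, here is a minimal sketch of how a share like the 20% figure could be measured. It is not the workflow presented in the webinar. It assumes that a `jhove` launcher is on the PATH, that the PDF module is selected with `-m PDF-hul`, and that JHOVE's default text report contains a `Status:` line; the function names and the `flagged_share` helper are hypothetical and exist only for this illustration.

```python
import re
import subprocess
import sys
from collections import Counter
from pathlib import Path

# Assumption: JHOVE's text report contains a line such as
# "  Status: Well-Formed and valid".
STATUS_RE = re.compile(r"^\s*Status:\s*(.+?)\s*$", re.MULTILINE)
VALID = "Well-Formed and valid"


def parse_status(report: str) -> str:
    """Return the first Status value in a JHOVE text report, or 'unknown'."""
    match = STATUS_RE.search(report)
    return match.group(1) if match else "unknown"


def validate(pdf: Path, jhove_cmd: str = "jhove") -> str:
    """Run JHOVE's PDF module on one file (assumes `jhove` is on the PATH)."""
    result = subprocess.run(
        [jhove_cmd, "-m", "PDF-hul", str(pdf)],
        capture_output=True,
        text=True,
        check=False,  # a failed run simply yields the status 'unknown'
    )
    return parse_status(result.stdout)


def flagged_share(counts: Counter) -> float:
    """Fraction of files whose status is anything other than fully valid."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts.get(VALID, 0)) / total


if __name__ == "__main__":
    # Usage: python jhove_tally.py /path/to/pdf/directory
    folder = Path(sys.argv[1])
    counts = Counter(validate(p) for p in sorted(folder.glob("*.pdf")))
    for status, n in counts.most_common():
        print(f"{n:6d}  {status}")
    print(f"flagged: {flagged_share(counts):.0%}")
```

The tally only separates fully valid files from everything else. Deciding which flagged files are truly broken still requires looking at the individual JHOVE messages.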

For the webinar, we have picked some of the truly broken PDF files: those which cannot be repaired or converted, or which look strange after migration. As non-archiving is not an option, we have to think of creative ways to rescue what we can. Expect lots of screenshots of real-world PDF problems and some nice tools to automate as many steps as possible.

The webinar will last approximately one hour.