Use Case

A collection holder wants to reduce the storage costs for her/his collections, which are currently stored as TIFF master files. She/he has heard that JPEG2000 is a good candidate format for digital master files, and that its image compression is particularly efficient in lossy mode.

On the one hand, she/he knows that JPEG2000 compression can be "visually lossless": the compression is not reversible, but the changes to the image are not visible to the human eye. On the other hand, she/he is still concerned about the impact that JPEG2000 compression might have on the OCR result.

We suggest building a Taverna workflow that serves as an executable processing pipeline for studying this impact.

The workflow should take one TIFF image and a list of increasing compression parameters as input; each parameter is used in turn to encode the image as JPEG2000. Each compressed image should then be decompressed before OCR is applied. Finally, the impact of the compression on the OCR should be measured by comparing the OCR output of the original image with the OCR output of each compressed image.
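
The data flow described above can be sketched in plain Java before it is modelled in Taverna. This is only an illustration under assumptions: the class `CompressionStudy`, its `Result` type and the four step parameters (`encode`, `decode`, `ocr`, `compare`) are hypothetical placeholders, not part of Taverna or of any JPEG2000 or OCR tool. In the real workflow each step would be a service, for example one provided through the toolwrapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Sketch of the study loop; every processing step is supplied by the caller. */
final class CompressionStudy {

    /** One result row: the compression parameter and the comparison score. */
    static final class Result {
        public final int parameter;
        public final double score;

        Result(int parameter, double score) {
            this.parameter = parameter;
            this.score = score;
        }
    }

    static List<Result> run(
            byte[] tiff,
            List<Integer> parameters,
            BiFunction<byte[], Integer, byte[]> encode,    // TIFF + parameter -> JPEG2000
            Function<byte[], byte[]> decode,               // JPEG2000 -> decompressed image
            Function<byte[], String> ocr,                  // image -> recognised text
            BiFunction<String, String, Double> compare) {  // (reference, candidate) -> score

        // OCR of the uncompressed original is the reference for all comparisons.
        String reference = ocr.apply(tiff);

        List<Result> results = new ArrayList<>();
        for (Integer parameter : parameters) {
            byte[] compressed = encode.apply(tiff, parameter);
            byte[] decompressed = decode.apply(compressed);
            String text = ocr.apply(decompressed);
            results.add(new Result(parameter, compare.apply(reference, text)));
        }
        return results;
    }
}
```

Passing the steps in as functions mirrors how the three groups can work independently: each group implements one step, and the loop itself does not change.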

==> 3 Groups

Image: http://fue.onb.ac.at/scape/testdata/bsbbookpage.tif

Group 1) Use the toolwrapper to provide access to a JPEG2000 encoding/decoding tool:

Group 2) Use Taverna to create the workflow:

Group 3) Use a Taverna Beanshell to create the text comparison:
  • commons-lang-2.4.jar (/home/<youruser>/.taverna-home/lib/commons-lang-2.4.jar)
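
One possible way to compare the two OCR outputs is an edit distance normalised by text length. The following is a minimal plain-Java sketch under assumptions: the class name `OcrTextComparison` and the `similarity` measure are illustrative choices, not something prescribed by this exercise. The distance is implemented by hand so that the example has no dependencies; with commons-lang 2.4 on the classpath, `org.apache.commons.lang.StringUtils.getLevenshteinDistance(String, String)` could be used in place of the `levenshtein` method.

```java
/** Plain-Java sketch of an OCR text comparison; names are illustrative. */
final class OcrTextComparison {

    /** Levenshtein distance using two rolling rows of the dynamic-programming table. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1.0 means the two texts are identical. */
    static double similarity(String reference, String candidate) {
        int longest = Math.max(reference.length(), candidate.length());
        if (longest == 0) {
            return 1.0; // two empty texts are treated as identical
        }
        return 1.0 - (double) levenshtein(reference, candidate) / longest;
    }
}
```

In a Beanshell service, the two OCR texts would arrive on input ports and the similarity value would be written to an output port; the port names are left to the workflow designer.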