PDF Step pdfToTextFilter

Description

Extracts all text content from within the current PDF document.

PDF documents can place text on a page using a variety of mechanisms. A document may contain text as a stream of characters in the expected reading order. It may contain text in an unexpected order and rely on explicit positioning to put each piece in the correct place on the page. It may also contain graphical representations of the characters rather than text. For these reasons, this filter may not always produce the output you expect, and you may need to experiment to find what works for your documents.
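The difference between stream order and on-page position can be illustrated with a small sketch. It is only a model of the problem, not a description of what pdfToTextFilter does internally. The fragment tuple layout, the coordinates, and the function names are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical fragment layout: (x, y, text), with y measured from the top of the page.
Fragment = Tuple[float, float, str]


def stream_order(fragments: List[Fragment]) -> List[str]:
    """Return the text in the order it appears in the content stream."""
    return [text for _, _, text in fragments]


def visual_order(fragments: List[Fragment]) -> List[str]:
    """Return the text ordered top to bottom, then left to right."""
    ordered = sorted(fragments, key=lambda f: (f[1], f[0]))
    return [text for _, _, text in ordered]


# The subheading is written to the stream first, but positioned lower on the page.
fragments = [
    (50.0, 120.0, "Subheading"),
    (50.0, 80.0, "Heading One"),
]

print(stream_order(fragments))
print(visual_order(fragments))
```

An extractor that reads the stream as-is would see the first ordering, while a reader looking at the rendered page sees the second. Text drawn as graphics would not appear in either list.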

Parameters

description: Required? no; The description of this test step.
fragSep: Required? no, default is a single space; The fragment separator string to use, e.g. "" or " " or "," or " | ". Only used if mode is "groupByLines".
lineSep: Required? no, default is platform line separator; The line separator string to use, e.g. " " or "\n".
mode: Required? no, default is normal; Deprecated: no longer has any effect.
pageSep: Required? no, default is [+++ NEW PAGE +++]\n; The page separator string to use, e.g. "\n" or "------".

Details

Here is an example of using pdfToTextFilter:

pdfToTextFilter example

As a result of invoking the above steps, a file would be created containing something like the following:

pdfToTextFilter output

Heading One
Subheading
[+++ NEW PAGE +++]
Heading Two
[+++ NEW PAGE +++]
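
The way lineSep and pageSep combine to produce the sample output above can be modeled with a short sketch. The placement of the separators (one lineSep after every line and one pageSep after every page, including the last) is an assumption inferred from that sample, and the function name assemble and its input structure are hypothetical. This is not the filter's actual implementation.

```python
import os
from typing import List

# Default page separator as listed in the Parameters section.
DEFAULT_PAGE_SEP = "[+++ NEW PAGE +++]\n"


def assemble(pages: List[List[str]],
             line_sep: str = os.linesep,
             page_sep: str = DEFAULT_PAGE_SEP) -> str:
    """Join extracted lines into one string.

    Assumption: every line is followed by line_sep and every page
    is followed by page_sep, as the sample output suggests.
    """
    parts = []
    for lines in pages:
        for line in lines:
            parts.append(line + line_sep)
        parts.append(page_sep)
    return "".join(parts)


# Two pages matching the sample output: two lines on page one, one on page two.
pages = [["Heading One", "Subheading"], ["Heading Two"]]
print(assemble(pages, line_sep="\n"), end="")
```

Passing a different page_sep, such as "------", or a different line_sep, such as " ", changes only the joining strings and leaves the extracted text itself untouched.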