I would like to split the pages of a PDF document based on the text each page contains (all PDFs are already OCR'd when they are scanned in). Using a set of predefined rules, each page would then be sent to a particular folder.
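A minimal sketch of what the "predefined rules" could look like, assuming each supplier can be recognised by one or more regular expressions found in its OCR text. The supplier names, patterns and the `match_supplier` function are hypothetical placeholders, not part of any existing tool. A page that matches no supplier, or more than one, is treated as unrecognised so that it stays behind in the original PDF.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rules: supplier folder name -> regex patterns identifying that supplier.
# Patterns are matched case-insensitively against the page's OCR text.
SUPPLIER_RULES: Dict[str, List[str]] = {
    "Acme Ltd": [r"acme\s+(ltd|limited)"],
    "Globex": [r"globex\s+corporation", r"account\s+no\.?\s*GX-\d+"],
}


def match_supplier(page_text: str, rules: Dict[str, List[str]] = SUPPLIER_RULES) -> Optional[str]:
    """Return the supplier whose patterns match the page text, or None.

    A page that matches no supplier, or more than one, is treated as
    unrecognised so it can be left in the original PDF for a person to check.
    """
    hits = [
        name
        for name, patterns in rules.items()
        if any(re.search(pattern, page_text, re.IGNORECASE) for pattern in patterns)
    ]
    return hits[0] if len(hits) == 1 else None
```

Using `\s+` in the patterns lets a rule survive OCR output that splits a name across lines or inserts extra spaces.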
Step 1: 10 statements from 10 suppliers are scanned into a folder as a single PDF document.
Step 2: After the Auto Program is launched, it searches each page of the PDF to determine which supplier the page belongs to, extracts that page from the PDF and saves it as a separate PDF in the folder named after that supplier.
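Step 2 could be sketched in Python with the third-party `pypdf` library (`pip install pypdf`), assuming the OCR text layer is embedded in the PDF so that `extract_text()` can read it. The function names `group_pages` and `export_supplier_pages` and the folder layout are hypothetical; `classify` is any function that turns a page's text into a supplier name, or `None` when the page is not recognised. As a design choice, this sketch puts all pages from the same supplier into one output file rather than one file per page.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

# Any function that maps a page's text to a supplier name, or None if unknown.
Classifier = Callable[[str], Optional[str]]


def group_pages(page_texts: List[str], classify: Classifier) -> Tuple[Dict[str, List[int]], List[int]]:
    """Group 0-based page indices by supplier; unrecognised pages are returned separately."""
    by_supplier: Dict[str, List[int]] = {}
    unmatched: List[int] = []
    for index, text in enumerate(page_texts):
        supplier = classify(text)
        if supplier is None:
            unmatched.append(index)
        else:
            by_supplier.setdefault(supplier, []).append(index)
    return by_supplier, unmatched


def export_supplier_pages(pdf_path: Path, out_root: Path, classify: Classifier) -> List[int]:
    """Copy each supplier's pages into out_root/<supplier>/<scan name>.

    Returns the indices of the pages that matched no supplier.
    """
    # Imported here so group_pages can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    # extract_text() reads the text layer that OCR software embeds in the PDF.
    texts = [page.extract_text() or "" for page in reader.pages]
    by_supplier, unmatched = group_pages(texts, classify)

    for supplier, indices in by_supplier.items():
        folder = out_root / supplier
        folder.mkdir(parents=True, exist_ok=True)
        writer = PdfWriter()
        for index in indices:
            writer.add_page(reader.pages[index])
        # Assumption: the output file reuses the scan's name; an existing
        # file with that name in the supplier folder would be overwritten.
        with open(folder / pdf_path.name, "wb") as out_file:
            writer.write(out_file)
    return unmatched
```

The returned list of unmatched page indices is exactly what Step 3 needs in order to decide which pages stay in the original file.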
Step 3: It does the same for all 10 statements in the original PDF, BUT any page whose text is not recognized stays in the original PDF. In other words, the original PDF is shrunk to just the pages that were not moved to another folder.
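Shrinking the original file in Step 3 might look like the sketch below, again using `pypdf` and assuming the caller already knows which page indices were not filed. The name `shrink_to_pages` is hypothetical, and deleting the scan when every page was filed is an assumption about what should happen in that case. The new file is written alongside the original first and then swapped in, so a failure part-way through does not destroy the scan.

```python
import os
from pathlib import Path
from typing import List


def shrink_to_pages(pdf_path: Path, keep: List[int]) -> None:
    """Rewrite pdf_path so it contains only the pages in `keep` (0-based indices).

    The new file is written next to the original and then swapped in with
    os.replace, so a failure part-way through leaves the original untouched.
    """
    if not keep:
        # Assumption: if every page was filed, the scan itself is no longer needed.
        pdf_path.unlink()
        return

    # Third-party dependency: pip install pypdf
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for index in sorted(keep):
        writer.add_page(reader.pages[index])

    tmp_path = pdf_path.with_name(pdf_path.name + ".tmp")
    with open(tmp_path, "wb") as out_file:
        writer.write(out_file)
    os.replace(tmp_path, pdf_path)
```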
I hope this makes sense, but please feel free to ask if you need clarification on anything.