I've investigated several C# DLLs and have not found any that work especially well. My requirements are:
The downstream process that will consume the text is set up to use PDFBox, which seems to work well. But:
I'm surprised I cannot find a PDF converter recipe, since it seems like a common requirement. So, could anyone help me with either:
Thanks in advance.
I originally asked how to write binary data to a Process.StandardInput (StreamWriter) since it only handles character data: the answer is to use Process.StandardInput.BaseStream (Stream).
In addition, since both pipes might fill up (64KB buffers IIUC), I used the following pattern:
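What follows is only a minimal sketch of that kind of pattern, not necessarily the exact code used. It assumes a child executable that reads binary data from stdin and writes its result to stdout; the class and method names (`PipeRunner`, `RunWithPipes`) are made up for illustration. The write happens on a separate task so that a full stdin pipe cannot stall the loop that drains stdout.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

static class PipeRunner
{
    // Hypothetical helper: sends inputBytes to the child's stdin as raw bytes
    // and returns everything the child wrote to stdout.
    public static byte[] RunWithPipes(string fileName, string arguments, byte[] inputBytes)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(psi))
        {
            // Write on another task: if the child stops reading because its
            // stdout pipe is full, this write blocks without blocking the reader.
            Task writer = Task.Run(() =>
            {
                // BaseStream is the underlying binary Stream behind the StreamWriter.
                using (Stream stdin = process.StandardInput.BaseStream)
                {
                    stdin.Write(inputBytes, 0, inputBytes.Length);
                } // disposing closes stdin, so the child sees end-of-input
            });

            using (var output = new MemoryStream())
            {
                // Drain stdout until the child closes it.
                process.StandardOutput.BaseStream.CopyTo(output);
                writer.Wait();          // rethrows any write failure as AggregateException
                process.WaitForExit();
                return output.ToArray();
            }
        }
    }
}
```

If the child exits before consuming all of its input, the write will probably fail and `writer.Wait()` will throw, so real code would want error handling and a timeout around this.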
So, other than the hackish aspect of either putting an executable within the web app or requiring a separate install, this seems to work fine -- but I still need to do some abuse testing.
If you are able to run Process() on your server, you could use Xpdf from http://www.foolabs.com/xpdf/. One of its utilities is pdftotext, which is capable of extracting text from PDF files and can even maintain some of the layout.
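As a rough sketch of how that could look from C#, assuming the Xpdf `pdftotext` executable is installed and its path is known: the `-layout` option asks it to keep the physical layout, and passing `-` as the output file sends the text to stdout. The class and method names here (`PdfToTextRunner`, `BuildArguments`, `ExtractText`) are hypothetical.

```csharp
using System;
using System.Diagnostics;

static class PdfToTextRunner
{
    // Builds the command line: optional -layout, the quoted PDF path,
    // then "-" so that pdftotext writes the text to stdout.
    public static string BuildArguments(string pdfPath, bool keepLayout)
    {
        string layout = keepLayout ? "-layout " : "";
        return layout + "\"" + pdfPath + "\" -";
    }

    // Runs pdftotext (exePath is an assumption: wherever it is installed)
    // and returns the extracted text.
    public static string ExtractText(string exePath, string pdfPath, bool keepLayout)
    {
        var psi = new ProcessStartInfo(exePath, BuildArguments(pdfPath, keepLayout))
        {
            UseShellExecute = false,        // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (var process = Process.Start(psi))
        {
            // Read everything before waiting, so a full stdout pipe cannot stall the child.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "pdftotext exited with code " + process.ExitCode);
            }
            return text;
        }
    }
}
```

A call might look like `PdfToTextRunner.ExtractText(@"C:\tools\pdftotext.exe", @"C:\docs\report.pdf", true)`, where both paths are placeholders.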
I have also heard of examples, back in the old days, where Adobe search and the MS Index service combined could extract text from PDF files.