I have thousands of PDF documents that are 11-15 MB each. My program reports that my document contains more than 100k characters.
Exception in thread "main"
Your document contained more than 100000 characters, and so your
requested limit has been reached. To receive the full text of the
document, increase your limit.
How can I increase the limit to 10-15 MB?
I found a solution that uses the new Tika facade class, but I could not find a way to integrate it with mine:
Tika tika = new Tika();
Here is my code:
BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
String location = "C:\\Users\\Laptop\\Dropbox\\MainTextbookTrappe2ndEd.pdf";
FileInputStream inputstream = new FileInputStream(location);
ParseContext pcontext = new ParseContext();
PDFParser pdfparser = new PDFParser();
pdfparser.parse(inputstream, handler, metadata, pcontext);
// Print the extracted text held by the handler, not the ParseContext object
System.out.println("Content of the PDF: " + handler.toString());
Use
BodyContentHandler handler = new BodyContentHandler(-1);
to disable the limit. From the Javadoc:
The internal string buffer is bounded at the given number of characters. If this write limit is reached, then a SAXException is thrown.
writeLimit - maximum number of characters to include in the string, or -1 to disable the write limit
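To make the write-limit behaviour described in the Javadoc concrete, here is a minimal sketch that uses only the JDK's SAX classes. BoundedTextHandler and its feed helper are hypothetical names invented for illustration. This is a simplified stand-in, not Tika's actual implementation: it only mimics the documented contract that the buffer is bounded, that a SAXException is thrown once the limit is reached, and that -1 disables the limit.

```java
import java.util.Arrays;

import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a write-limited text handler (not Tika's code). */
class BoundedTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final int writeLimit; // maximum number of characters, or -1 for no limit

    BoundedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (writeLimit == -1 || buffer.length() + length <= writeLimit) {
            buffer.append(ch, start, length);
            return;
        }
        // Keep whatever still fits, then signal that the limit was hit.
        int remaining = Math.max(0, writeLimit - buffer.length());
        buffer.append(ch, start, remaining);
        throw new SAXException("Write limit of " + writeLimit + " characters reached");
    }

    @Override
    public String toString() {
        return buffer.toString();
    }

    /** Feeds `total` characters in 1000-character chunks; returns true if the limit was hit. */
    static boolean feed(BoundedTextHandler handler, int total) {
        char[] chunk = new char[1000];
        Arrays.fill(chunk, 'x');
        try {
            for (int written = 0; written < total; written += chunk.length) {
                handler.characters(chunk, 0, chunk.length);
            }
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BoundedTextHandler limited = new BoundedTextHandler(100_000);
        BoundedTextHandler unlimited = new BoundedTextHandler(-1);
        // The bounded handler stops at its limit; the unbounded one keeps everything.
        System.out.println(feed(limited, 150_000) + " " + limited.toString().length());
        System.out.println(feed(unlimited, 150_000) + " " + unlimited.toString().length());
    }
}
```

With a limit of 100000 the handler stops accepting text and throws, which is the situation the exception in the question reports. With -1 all 150000 characters are kept. For large PDFs, keep in mind that an unbounded handler holds the whole extracted text in memory.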