I've been looking at Java and ANTLR4, a very nice combination for building parsers. However, as I test them, I'm noticing that parsing doesn't start until I send an EOF (Ctrl-D on a Mac, for example) to the input. That's fine for parsing a file, but I can easily imagine building tools such as command-line shells/processors very quickly with ANTLR. That isn't doable unless I can make it parse as characters are typed (so that things happen after RETURN, or even after a TAB if one wanted to do command completion, say).
Anyone know how to do this?
The simplest way to use Antlr4 'interactively' is to recognize that the parsing operation is quite fast and that, in a warm VM, re-instantiating the parser is also quite fast. Indeed, both are more than fast enough to re-parse the entire input text between each keystroke.
The basic strategy is, from a key event, to grab the entire current input text and process it in a non-display thread. If the processing does not complete before the next key event, discard the processing thread and start a new one. When a processing iteration does complete, have the next key event buffer (as needed) while the results are applied to the input text.
A sustained stream of keystrokes is unlikely to be faster than 100ms per key event (about 80 wpm). On my system, repeated simple parsing of an editor's 'page' of code using the Java.g4 grammar averages around 5ms. Even with fairly significant processing, the background thread rarely requires more than about 25ms to complete. Of course, YMWV.
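The re-parse-per-keystroke strategy described above might be sketched roughly as follows. This is only an illustration under stated assumptions: `ReparseOnKeystroke`, the `parse` function, and the `onResult` callback are hypothetical names, and `parse` stands in for whatever code builds a fresh lexer and parser over the full text. Since a running parse generally will not react to thread interruption, the sketch also tags each run with an id and simply drops results that are no longer the newest.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Re-parses the whole input text on every key event and drops stale runs. */
final class ReparseOnKeystroke<R> {
    // Single background (non-display) thread; daemon so it never blocks JVM exit.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "reparse-worker");
        t.setDaemon(true);
        return t;
    });
    private final Function<String, R> parse;  // hypothetical hook: build lexer + parser, run entry rule
    private final Consumer<R> onResult;       // hypothetical hook: apply results to the input text
    private Future<?> pending;                // most recently submitted run, if any
    private long latest;                      // id of the newest key event

    ReparseOnKeystroke(Function<String, R> parse, Consumer<R> onResult) {
        this.parse = parse;
        this.onResult = onResult;
    }

    /** Call from the key handler with the entire current input text. */
    synchronized void onKeyEvent(String fullText) {
        final long id = ++latest;
        if (pending != null) {
            pending.cancel(true);             // discard a run that has not finished yet
        }
        pending = worker.submit(() -> {
            R result = parse.apply(fullText);
            synchronized (this) {
                if (id == latest) {           // publish only if no newer key event arrived
                    onResult.accept(result);
                }
            }
        });
    }

    void shutdown() {
        worker.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in "parser": counts characters instead of building a parse tree.
        ReparseOnKeystroke<Integer> reparser =
                new ReparseOnKeystroke<>(String::length, n -> System.out.println("parsed length " + n));
        reparser.onKeyEvent("l");
        reparser.onKeyEvent("ls");
        reparser.onKeyEvent("ls -l");
        Thread.sleep(200);                    // give the worker time to finish
        reparser.shutdown();
    }
}
```

In a GUI, `onResult` would normally hand the result back to the display thread rather than touch the editor directly; that part is left out here.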
If the need is for continuous stream processing -- not 'interactive' -- then Antlr can be adapted to that purpose. This will require a minimal custom lexer that meets the Lexer & TokenStream interfaces but waits for actual input data in response to the Parser's getCurrentToken() -- the parser's primary function to fetch the next token from the lexer.
    StreamLexer tokens = new StreamLexer(yourInputStream); // custom lexer
    YourParser parser = new YourParser(tokens);
    parser.removeErrorListeners(); // remove ConsoleErrorListener
    parser.addErrorListener(new YourErrorListener());
    parser.setErrorHandler(new YourParserErrorStrategy());
    parser.start();
There is no actual lexer grammar -- the custom lexer simply wraps every input character as a separate token and the parser rules are written accordingly.
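The character-per-token idea can be illustrated with a small sketch that runs without the ANTLR runtime. Everything here is an assumption made for illustration: `CharToken` is a stand-in for an ANTLR token, and `CharPerTokenSource` is a hypothetical class that, in a real setup, would implement the ANTLR interfaces the parser expects. The token type is simply the character's code, with -1 for end of input (the same value ANTLR uses for `Token.EOF`).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Stand-in for an ANTLR token: type is the character code, or EOF (-1). */
final class CharToken {
    static final int EOF = -1;
    final int type;
    final String text;
    final int index;  // position of the character in the stream

    CharToken(int type, String text, int index) {
        this.type = type;
        this.text = text;
        this.index = index;
    }
}

/** Hypothetical lexer substitute: waits for input and emits one token per character. */
final class CharPerTokenSource {
    private final Reader in;
    private int index;

    CharPerTokenSource(Reader in) {
        this.in = in;
    }

    /** Blocks until a character (or end of stream) is available, then wraps it as a token. */
    CharToken nextToken() {
        try {
            int c = in.read();  // blocks while the underlying stream has no data yet
            if (c < 0) {
                return new CharToken(CharToken.EOF, "<EOF>", index);
            }
            return new CharToken(c, String.valueOf((char) c), index++);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for a live input stream.
        CharPerTokenSource source = new CharPerTokenSource(new StringReader("ls\n"));
        for (CharToken t = source.nextToken(); t.type != CharToken.EOF; t = source.nextToken()) {
            System.out.println(t.index + ": type=" + t.type);
        }
    }
}
```

Parser rules would then match on individual character types. Note that a terminal in its default line-buffered mode hands input to the program only after RETURN, so reading per keystroke from a console needs the terminal to be switched out of that mode first.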
In effect, this turns the standard Antlr parser into a grammar-defined 'Push-Parser'. Speed will be limited to the run time of the matching functions of the parser or the data rate of the input stream, whichever is slower.
To achieve any significantly greater parsing speed, a purpose-built state machine will likely be necessary.