Fastest way to find a string in a text file with Java

Question:

What is the fastest way to check if a file contains a certain string or number?

Answer:

Have a look at the Scanner class, which ships with the JDK (see the official documentation). It lets you skip parts of the input (in this case, a text file) and match against a regular expression of your choice. I'm not sure it is the most efficient approach, but it is certainly simple. You might also take a look at this example, which will help you get started.
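A minimal sketch of the Scanner approach, assuming a UTF-8 file and a literal search key; `ScannerSearch` and `containsString` are hypothetical names. `Pattern.quote` turns the key into a literal pattern, and `findWithinHorizon` with a horizon of 0 keeps reading input until it finds a match or runs out.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerSearch {

    /**
     * Returns true if the file contains the literal text key.
     * Assumes the file is UTF-8; change the charset name to match your file.
     */
    public static boolean containsString(File file, String key) throws FileNotFoundException {
        // Pattern.quote makes characters such as '.' or '*' match literally
        Pattern pattern = Pattern.compile(Pattern.quote(key));
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            // Horizon 0 = no bound: Scanner reads more of the file as needed
            return scanner.findWithinHorizon(pattern, 0) != null;
        }
    }
}
```

Per the `Scanner` documentation, a horizon of 0 may cause it to buffer all of the remaining input while searching, so a miss on a large file can use memory proportional to the file size.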

Answer:

Untried, but probably the fastest mechanism is to first take your search key and encode it the same way the file is encoded.

For example, if you know the file is UTF-8, take your key and encode it from a String (which is UTF-16) into a UTF-8 byte array. This matters because by encoding down to the file's representation, you only have to encode the key. Using the standard Java Readers goes the other way: they convert the whole file to UTF-16.
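To illustrate the encoding step, here is a hypothetical helper that encodes the key with the file's charset, plus a check of how many bytes the same key takes in UTF-8 versus UTF-16. The key contains a non-ASCII character so the two encodings differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KeyEncoding {

    /** Encodes the search key in the file's charset so it can be compared byte for byte. */
    public static byte[] encodeKey(String key, Charset fileCharset) {
        return key.getBytes(fileCharset);
    }

    public static void main(String[] args) {
        String key = "h\u00e9llo"; // "héllo": 5 chars
        // Java stores these 5 chars as UTF-16 (10 bytes); a UTF-8 file stores them in 6 bytes
        byte[] utf8 = encodeKey(key, StandardCharsets.UTF_8);
        byte[] utf16 = encodeKey(key, StandardCharsets.UTF_16LE);
        System.out.println(utf8.length + " " + utf16.length); // prints "6 10"
    }
}
```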

Now that you have the key as bytes, use NIO to create a MappedByteBuffer for the file. This maps the file into the virtual memory space.

Finally, implement the Boyer-Moore string search algorithm, comparing the bytes of the key against the bytes of the file through the mapped region.

There may well be a faster way, but this solves most of the problems with searching a text file in Java. It leverages virtual memory to avoid copying large chunks of the file, and it skips converting the file from whatever encoding it is in to UTF-16, which Java uses internally.
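The three steps above can be sketched as follows, assuming a UTF-8 file no larger than `Integer.MAX_VALUE` bytes (the most a single `FileChannel.map` call can map). For brevity the search uses Boyer-Moore-Horspool, a simplified Boyer-Moore that keeps only the bad-character table, rather than full Boyer-Moore; `MappedSearch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedSearch {

    /**
     * Returns the byte offset of the first occurrence of key in the file, or -1.
     * Assumes a UTF-8 file no larger than Integer.MAX_VALUE bytes.
     */
    public static long indexOf(Path file, String key) throws IOException {
        // Step 1: encode the key like the file, instead of decoding the file
        byte[] pattern = key.getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (pattern.length == 0) return 0;
            if (size < pattern.length) return -1;
            if (size > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("file too large for a single mapping");
            }
            // Step 2: map the file into virtual memory rather than copying it onto the heap
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Step 3: search the key's bytes against the mapped bytes
            return horspool(buf, (int) size, pattern);
        }
    }

    /** Boyer-Moore-Horspool: Boyer-Moore reduced to the bad-character shift table. */
    static int horspool(MappedByteBuffer buf, int n, byte[] p) {
        int m = p.length;
        int[] shift = new int[256];
        Arrays.fill(shift, m); // a byte absent from the key lets the window jump its full length
        for (int i = 0; i < m - 1; i++) {
            shift[p[i] & 0xFF] = m - 1 - i; // distance from this byte's last occurrence to the key's end
        }
        int pos = 0;
        while (pos <= n - m) {
            int j = m - 1;
            // compare right to left; absolute get(int) leaves the buffer's position untouched
            while (j >= 0 && buf.get(pos + j) == p[j]) j--;
            if (j < 0) return pos;
            pos += shift[buf.get(pos + m - 1) & 0xFF]; // shift by the window's last byte
        }
        return -1;
    }
}
```

For larger files you would map successive regions, overlapping each by `key.length - 1` bytes so a match that straddles two regions is not missed. Note also that a `MappedByteBuffer` is only unmapped when it is garbage-collected.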

Answer:

Check out the following algorithms:

  • Boyer-Moore
  • Knuth-Morris-Pratt

Or, if you want to find any one of a set of strings:

  • Rabin-Karp
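Minimal sketches of two of these, working on byte arrays so they also fit the encode-the-key approach. Knuth-Morris-Pratt never moves backwards in the text; the Rabin-Karp variant stores the hashes of all keys in a map and checks each window's rolling hash against it. The class and method names are hypothetical, and the Rabin-Karp version assumes all keys share one length (keys of different lengths would need one rolling hash per length).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TextSearch {

    /** Knuth-Morris-Pratt: index of the first occurrence of p in t, or -1. */
    public static int kmp(byte[] t, byte[] p) {
        if (p.length == 0) return 0;
        // fail[i] = length of the longest proper prefix of p[0..i] that is also a suffix of it
        int[] fail = new int[p.length];
        for (int i = 1, k = 0; i < p.length; i++) {
            while (k > 0 && p[i] != p[k]) k = fail[k - 1];
            if (p[i] == p[k]) k++;
            fail[i] = k;
        }
        // one pass over the text; on a mismatch, fall back via fail[] instead of re-reading text
        for (int i = 0, k = 0; i < t.length; i++) {
            while (k > 0 && t[i] != p[k]) k = fail[k - 1];
            if (t[i] == p[k]) k++;
            if (k == p.length) return i - p.length + 1;
        }
        return -1;
    }

    private static final long MOD = 1_000_000_007L;
    private static final long BASE = 256;

    /**
     * Rabin-Karp for several keys at once. Assumes every key has the same non-zero length.
     * Returns the index of the first window of t equal to any key, or -1.
     */
    public static int rabinKarpAny(byte[] t, List<byte[]> keys) {
        if (keys.isEmpty()) return -1;
        int m = keys.get(0).length;
        Map<Long, List<byte[]>> byHash = new HashMap<>();
        for (byte[] k : keys) {
            if (m == 0 || k.length != m) {
                throw new IllegalArgumentException("keys must share one non-zero length");
            }
            byHash.computeIfAbsent(hash(k, 0, m), unused -> new ArrayList<>()).add(k);
        }
        if (t.length < m) return -1;
        long pow = 1; // BASE^(m-1) mod MOD: weight of the byte that leaves the window
        for (int i = 1; i < m; i++) pow = pow * BASE % MOD;
        long h = hash(t, 0, m);
        for (int i = 0; ; i++) {
            List<byte[]> candidates = byHash.get(h);
            if (candidates != null) {
                for (byte[] k : candidates) {
                    // equal hashes can collide, so confirm byte by byte (range form needs Java 9+)
                    if (Arrays.equals(t, i, i + m, k, 0, m)) return i;
                }
            }
            if (i + m >= t.length) return -1;
            // roll the hash one byte: remove t[i], append t[i + m]
            h = (h - (t[i] & 0xFF) * pow % MOD + MOD) % MOD;
            h = (h * BASE + (t[i + m] & 0xFF)) % MOD;
        }
    }

    private static long hash(byte[] a, int from, int len) {
        long h = 0;
        for (int i = from; i < from + len; i++) h = (h * BASE + (a[i] & 0xFF)) % MOD;
        return h;
    }
}
```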

Answer:

The best implementation I've found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java

/**
 * Finds the boundary in the given buffer using Boyer-Moore algo.
 * Copied from java.util.regex.Pattern.java
 *
 * @param mybuf boundary to be searched in this mybuf
 * @param off start index in mybuf
 * @param len number of bytes in mybuf
 *
 * @return -1 if there is no match or index where the match starts
 */

private int match(byte[] mybuf, int off, int len) {

You will also need:

private void compileBoundaryPattern();
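The bodies of `match` and `compileBoundaryPattern` are not reproduced here. As a rough, hypothetical stand-in (not MIMEParser's code), the sketch below implements full Boyer-Moore on bytes with both the bad-character and good-suffix tables. The constructor builds the tables once, which appears to be the role `compileBoundaryPattern()` plays, and `match` follows the contract in the Javadoc above, returning the index in `buf` where the match starts, or -1. The class name `BoyerMooreBytes` is made up for illustration, and the good-suffix preprocessing uses the common border-array formulation.

```java
import java.util.Arrays;

public class BoyerMooreBytes {
    private final byte[] pattern;
    private final int[] last;      // bad-character rule: last index of each byte value in the pattern, or -1
    private final int[] goodShift; // good-suffix rule: shift after a mismatch at pattern index j is goodShift[j + 1]

    /** Builds both shift tables once so every match() call can reuse them. */
    public BoyerMooreBytes(byte[] pattern) {
        this.pattern = pattern.clone();
        int m = pattern.length;
        last = new int[256];
        Arrays.fill(last, -1);
        for (int i = 0; i < m; i++) last[pattern[i] & 0xFF] = i;

        goodShift = new int[m + 1];
        int[] border = new int[m + 1]; // border[i]: start of the widest border of pattern[i..m)
        int i = m, j = m + 1;
        border[i] = j;
        while (i > 0) {
            // case 1: the matched suffix occurs elsewhere in the pattern, preceded by a different byte
            while (j <= m && pattern[i - 1] != pattern[j - 1]) {
                if (goodShift[j] == 0) goodShift[j] = j - i;
                j = border[j];
            }
            i--;
            j--;
            border[i] = j;
        }
        // case 2: only part of the matched suffix reappears, as a prefix of the pattern
        j = border[0];
        for (i = 0; i <= m; i++) {
            if (goodShift[i] == 0) goodShift[i] = j;
            if (i == j) j = border[j];
        }
    }

    /** Searches buf[off, off + len); returns the index in buf where the first match starts, or -1. */
    public int match(byte[] buf, int off, int len) {
        int m = pattern.length;
        int end = off + len;
        int s = off;
        while (s <= end - m) {
            int j = m - 1;
            while (j >= 0 && pattern[j] == buf[s + j]) j--; // compare right to left
            if (j < 0) return s;
            // take the larger of the good-suffix and bad-character shifts
            s += Math.max(goodShift[j + 1], j - last[buf[s + j] & 0xFF]);
        }
        return -1;
    }
}
```

Compiling the tables once pays off when the same boundary is searched for in many buffers, as in a streaming MIME parser.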