
java - Strange behavior in regexes

Question:

While trying to answer another question about regexes, I found some more strange behavior.

String x = "X";

System.out.println(x.replaceAll("X*", "Y"));

This prints YY. Why?

String x = "X";

System.out.println(x.replaceAll("X*?", "Y"));

And this prints YXY.

Why doesn't the reluctant regex match the 'X' character? The string is effectively "nothing", then X, then "nothing". So why does the first regex produce two matches instead of three, and why does the second regex match only the "nothing"s and not the X?

Answer:

Let's consider them in turn:

"X".replaceAll("X*", "Y")

There are two matches:

  1. At character position 0, X is matched, and is replaced with Y.
  2. At character position 1, the empty string is matched, and Y gets added to the output.

End result: YY.

"X".replaceAll("X*?", "Y")

There are also two matches:

  1. At character position 0, the empty string is matched, and Y gets added to the output. The character at this position, X, was not consumed by the match, and is therefore copied into the output verbatim.
  2. At character position 1, the empty string is matched, and Y gets added to the output.

End result: YXY.
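
To see these matches directly, one can loop over Matcher.find() and print where each match starts and ends. This is only a minimal sketch: the class name MatchSpans and the helper spans are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchSpans {
    // Hypothetical helper: lists every match as "start-end:'text'",
    // in the order Matcher.find() reports them.
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":'" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: one match covering X, then an empty match at the end.
        System.out.println(spans("X*", "X"));   // [0-1:'X', 1-1:'']
        // Reluctant: an empty match before X and another after it.
        System.out.println(spans("X*?", "X"));  // [0-0:'', 1-1:'']
    }
}
```

replaceAll substitutes Y for each of these matches and copies everything in between unchanged, which gives YY in the first case and YXY in the second.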

Answer:

The * is a tricky 'quantifier' since it means '0 or more'. Thus, it also matches '0 times X' (i.e. an empty string).

I would use

"X".replaceAll("X+", "Y")

which has the expected behaviour.
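
As a quick check of that suggestion, here is a small sketch (the class PlusQuantifierDemo and the method replaceRuns are hypothetical names used only for this example). Since X+ needs at least one X, it cannot match the empty string, so no extra Y appears.

```java
class PlusQuantifierDemo {
    // Hypothetical helper: replaces each run of one or more X with a single Y.
    static String replaceRuns(String input) {
        return input.replaceAll("X+", "Y");
    }

    public static void main(String[] args) {
        System.out.println(replaceRuns("X"));     // Y
        System.out.println(replaceRuns("aXXb"));  // aYb
        System.out.println(replaceRuns("ab"));    // ab (no X, so nothing is replaced)
    }
}
```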

Answer:

In your first example you are using a "greedy" quantifier. A greedy quantifier first consumes as much of the input as it can, so the first match attempted is the whole input, "X". Because that attempt succeeds, the matcher never backs off to the zero-length match before the X. It then continues from the end of the input and finds a zero-length match there, hence the two matches you see.

In your second example you are using a "reluctant" quantifier, which does the opposite. It starts by matching as little as possible and consumes one more character at a time only if it has to. So it first matches the zero-length string before the X, then the matcher moves forward by one character (which is why the X still appears in the output), and the next match is the zero-length string after the X.

There is a good tutorial here: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
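
The difference is easier to see on a longer input. The sketch below is an illustration only, with a made-up class name GreedyVsReluctant and a made-up helper replace. It also adds a case where the reluctant quantifier is forced to consume characters, because a Z must follow.

```java
class GreedyVsReluctant {
    // Hypothetical helper: thin wrapper around String.replaceAll with "Y" as the replacement.
    static String replace(String regex, String input) {
        return input.replaceAll(regex, "Y");
    }

    public static void main(String[] args) {
        // Greedy: one match for the whole run, then an empty match at the end.
        System.out.println(replace("X*", "XXX"));    // YY
        // Reluctant: an empty match at each of the four positions, every X is kept.
        System.out.println(replace("X*?", "XXX"));   // YXYXYXY
        // Reluctant, but the trailing Z forces it to grow until Z can match.
        System.out.println(replace("X*?Z", "XXZ"));  // Y
    }
}
```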
