regex - Regular expression in R behaves differently than in other languages

Question:

The regular expression pattern ^[A-Z]{2,4}$ matches only strings that consist entirely of two, three, or four uppercase letters: the anchors ^ and $ tie the match to the start and the end of the string. Anything else will not be considered valid:

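filter_symbols <- function(symbols) {
  # regexpr() returns 1 where the match starts at the first character
  # of an element, and -1 where that element does not match at all.
  valid <- regexpr("^[A-Z]{2,4}$", symbols)
  return(sort(symbols[valid == 1]))
}

filter_symbols(c("MOT", "CVX", "123", "GOG2", "XLE", "AAPL", "AAPLS", "A"))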

...and it works like a charm:

[1] "AAPL" "CVX" "MOT" "XLE"

Now when you test the same pattern here (there are many similar online regex testers out there):

^[A-Z]{2,4}$

Debuggex Demo

...you don't get any match, not even when you put each word on its own line. Why does it behave differently in the two cases?

Answer:

By default, ^ matches at the start of the string, and $ matches only at the end.

Debuggex and other related sites pass the whole input textarea as a single input string, so your regex was actually being matched against MOT\nCVX\n123...AAPL.
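The same effect can be reproduced in R itself. The sketch below is only an illustration (the variable names are made up here, and joining with "\n" is an assumption about what a tester textarea contains): a character vector is matched element by element, while the joined text is one string that the anchored pattern cannot match as a whole.

```r
symbols <- c("MOT", "CVX", "123", "GOG2", "XLE", "AAPL", "AAPLS", "A")

# Each vector element is a separate string, so ^ and $ anchor per symbol.
per_element <- grepl("^[A-Z]{2,4}$", symbols)
print(symbols[per_element])
# [1] "MOT"  "CVX"  "XLE"  "AAPL"

# Joining the symbols with newlines mimics the single string an online
# tester receives from its textarea.
joined <- paste(symbols, collapse = "\n")

# Now ^ and $ refer to the start and end of the whole text, so nothing matches.
whole_text <- grepl("^[A-Z]{2,4}$", joined)
print(whole_text)
# [1] FALSE
```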

Enable the m (multiline) flag - in this mode, ^ and $ will match the start/end of each line and it will enable you to test multiple inputs.

See the updated debuggex demo

Answer:

Debuggex returns no matches because you don't have the correct modifier turned on.

In almost all regular expression engines, the anchors ^ and $ by default match only at the beginning and the end of the string, respectively. If you want them to match at the beginning and end of each line (not only of the whole string), turn on the m (multi-line) modifier, which enables this behavior.

You can see the difference with this modifier turned on: Debuggex Demo
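For readers who want to try the modifier without an online tester, here is a minimal sketch in R. It assumes the PCRE engine (perl = TRUE) and the inline (?m) flag as the way to switch on multi-line mode; the input string and variable names are made up for illustration.

```r
# One string with a symbol on each line, like the content of a tester textarea.
text <- "MOT\nCVX\n123\nGOG2\nXLE\nAAPL\nAAPLS\nA"

# Default mode: ^ and $ anchor to the whole string, so there is no match.
default_hits <- gregexpr("^[A-Z]{2,4}$", text, perl = TRUE)
default_matches <- regmatches(text, default_hits)[[1]]
print(length(default_matches))
# [1] 0

# Multi-line mode via the inline (?m) flag: ^ and $ anchor to each line.
multiline_hits <- gregexpr("(?m)^[A-Z]{2,4}$", text, perl = TRUE)
multiline_matches <- regmatches(text, multiline_hits)[[1]]
print(multiline_matches)
# [1] "MOT"  "CVX"  "XLE"  "AAPL"
```

Sorting multiline_matches gives the same four symbols that filter_symbols() returns for the vector input.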
