
python - Get start location of capturing group within regex pattern

Question:

Basically, I want to find the index for the first occurrence of any of the substrings: "ABC", "DEF", or "GHI", so long as they occur in an interval of three. The regex that I wrote to match this pattern is:

regex = compile("(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")

The *? ensures that I get the first match, since it's non-greedy. I'm using a capturing group since I assume that that is the only way to actually get the index (of the substring) that I'm actually looking for. I don't care where the match itself starts, just where the capturing group starts. The ...{3}... mandates that the pattern occur in an interval of 3, ie:

example_1 = "BNDABCDJML"

example_2 = "JKMJABCKME"

example_1 would match since "ABC" occurs at position 3 but example_2 would not match since "ABC" occurs at position 4.

Ideally, given the string:

text = "STCABCFFC"

this matches, but if I simply get the start of the match, it will give me 0, since that's the index where the whole match begins, whereas what I want is 3, the index where the capturing group begins.

I'd like to do this:

print match(regex, text).group(1).start()

but, of course, this doesn't work, since start() is not a method of strings, and the string returned by group(1) is independent of text anyway. I can't simply search text for the substring in the capturing group, because that won't guarantee that the position follows the regex pattern (occurring only at intervals of 3). Perhaps I'm overlooking something; I don't write much Python, so forgive me if this is a trivial question.

Answer:

You were on the right track. start() is a method of the MatchObject itself, not of the string that group() returns. Here's the example they give in the docs:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'

Basically, instead of match(regex, text).group(1).start() you should do match(regex, text).start(1).
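To make that concrete, here is a minimal sketch run against the strings from the question. The helper name group_start is hypothetical (it is not part of re), and the snippet assumes Python 3 print() calls and a compiled pattern instead of the bare match(regex, text) form used above.

```python
import re

# Pattern copied from the question; group 1 is the substring whose index we want.
regex = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")


def group_start(text):
    """Hypothetical helper: start index of group 1, or None if there is no match."""
    m = regex.match(text)  # match() only tries index 0, so triplets are counted from the start
    return m.start(1) if m else None


print(group_start("STCABCFFC"))   # 3
print(group_start("BNDABCDJML"))  # 3    (example_1)
print(group_start("JKMJABCKME"))  # None (example_2, "ABC" is at index 4)
```

The None check matters because match() returns None when the pattern does not match, and calling start(1) on None would raise an AttributeError.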

Answer:

You can get the start and end index from the match object - re.MatchObject.start(group), re.MatchObject.end(group):

import re

regex = re.compile("(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")

for m in regex.finditer("STCABCFFC"):
    print(m.start(1), m.end(1))  # 3 6
    print(m.span(1))  # Prints 2-element tuple `(start, end)`: (3, 6)
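
One caveat that may be worth checking against the question's requirement: finditer() scans the string the way search() does, so a match is allowed to begin at any index, not only at 0. The sketch below (Python 3 print() calls, variable names chosen just for illustration) compares it with match() on example_2 from the question.

```python
import re

regex = re.compile(r"(?:[a-zA-Z]{3})*?(ABC|DEF|GHI)")
text = "JKMJABCKME"  # example_2 from the question: "ABC" sits at index 4

# finditer() behaves like search(): the overall match may begin at index 1,
# so "KMJ" + "ABC" is accepted even though 4 is not a multiple of 3.
unanchored = [m.span(1) for m in regex.finditer(text)]

# match() only tries index 0, so the interval of three is counted from the start.
anchored = regex.match(text)

print(unanchored)  # [(4, 7)]
print(anchored)    # None
```

If the interval has to be counted from the beginning of the string, match() (or a pattern anchored with ^) appears to be the safer choice; finditer() is fine when any starting offset is acceptable.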