
pcre - Regex to match recurring character groups

Question:

I'm trying to write a regex that matches groups of exactly three characters that recur in the text at least once.

What I came up with is this simple regex: (.{3}).*\g1, using the g (global) and s (dot also matches newline) flags. However, it is clearly faulty, as it only finds some of the groups I'm hoping to capture. Any idea how I can improve it? Here is a link to an example input: https://regex101.com/r/Cuiva1/2

Edit: Here's the full list of groups I was hoping to capture, as requested in the comment: GLT, VIW, IWK, KTL, GLT, LTK, LIS, KTX, TXK, XDL, KTL

Answer:

If your input is always multiple triplets of uppercase characters and you're only looking for ones that repeat, then you need something more complex to avoid backtracking into a previous triplet:

 /(?>[^A-Z]*+([A-Z]{3}))(?=(?:[^A-Z]*+[A-Z]{3})*?\1)|(?>[^A-Z]*+[A-Z]{3})/g

Capture group 1 of each match will hold what you want. If your strings are not that well formatted (i.e. they may contain a string of any length between repeating patterns), then you can use a simpler pattern, but you'll get totally inconsistent results and miss some matches.


I re-read your desired output: you're not going to achieve this with regex. VIW and IWK overlap, which won't work in a single preg_match_all() call. Just use string functions.
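
For illustration, here is a minimal sketch of the string-function approach, written in Python rather than PHP. It assumes that "recurring" means a three-character window that appears at least twice anywhere in the text, with overlapping windows allowed. The function name recurring_triplets and the sample string are hypothetical, not taken from the question's input.

```python
from collections import Counter


def recurring_triplets(text, size=3):
    """Return each `size`-character window that occurs at least twice in `text`.

    Windows may overlap one another; results are listed once each,
    in order of first appearance.
    """
    # Slide a window of `size` characters over the text, one step at a time.
    windows = [text[i:i + size] for i in range(len(text) - size + 1)]
    counts = Counter(windows)

    seen = set()
    result = []
    for window in windows:
        if counts[window] > 1 and window not in seen:
            seen.add(window)
            result.append(window)
    return result


# Hypothetical sample: "ABC" and "BCD" overlap at the start and both recur later.
sample = "ABCDxxBCDAyyABC"
print(recurring_triplets(sample))  # ['ABC', 'BCD']
```

Because every starting position is examined, overlapping groups such as VIW and IWK are both reported, which is what the consuming regex match cannot do. If the input contains separators or newlines that should not count as part of a group, they would need to be filtered out first.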
