The strings being examined resemble the following (notice the whitespace inside the brackets):
[name] [address ] [ zip] [ phone number ]
The expression I am presently using, \[([^\])]*)\], successfully captures the text within each pair of brackets, but it also grabs the leading and trailing spaces, so I end up with:
"name" "address " " zip" " phone number "
But what I seek is:
"name" "address" "zip" "phone number"
How can I get the regex not to capture the whitespace in these examples (except for embedded whitespace, such as the space between the words in "phone number")?
(Note: I know I could just trim the captured value after the match, but I'm trying to do it within the expression itself.)
Thanks for any ideas! Below is the exact code I'm using to test this:
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\[([^\\])]*)\\]" options:0 error:nil];
NSString *string = @" [name] [address ] [ zip] [ phone number ] ";
NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length])
withTemplate:@"\n\n[$1]"]; //note: adding brackets back here just to make it easy to see if the space has been trimmed properly from the captured value
I'm going to go through this step by step.
The group ([^\])]*) is incorrect. Its character class means "a sequence of zero or more characters, as long as possible, containing neither ] nor )."
For instance, given this input:
[name] [address ) ] [ zip] [ phone number ]
...the address part will be skipped over, because "address ) " cannot be matched by [^\])]* (which means "a sequence of zero or more characters, not including ] or )"), so no ] follows it.
Use ([^\]]*) instead, which excludes only ] and therefore does not skip ).
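To see the difference the character class makes, here is a minimal sketch that runs both patterns over the example input above and collects group 1 from every match. The helper BracketCaptures and the two constant names are hypothetical (not part of Foundation); the sketch only relies on NSRegularExpression's matchesInString:options:range: and NSTextCheckingResult's rangeAtIndex:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text captured by group 1 for every
// non-overlapping match of `pattern` in `input`, in order.
static NSArray<NSString *> *BracketCaptures(NSString *pattern, NSString *input) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    if (regex == nil) {
        NSLog(@"Bad pattern %@: %@", pattern, error);
        return @[];
    }
    NSMutableArray<NSString *> *captures = [NSMutableArray array];
    for (NSTextCheckingResult *match in
         [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)]) {
        [captures addObject:[input substringWithRange:[match rangeAtIndex:1]]];
    }
    return captures;
}

// The question's class [^\])] excludes both ']' and ')'.
static NSString *const kExcludesParen = @"\\[([^\\])]*)\\]";
// The corrected class [^\]] excludes only ']'.
static NSString *const kExcludesBracketOnly = @"\\[([^\\]]*)\\]";
```

On the input [name] [address ) ] [ zip] [ phone number ], kExcludesParen should yield three captures ("name", " zip", " phone number "), skipping the address entirely, while kExcludesBracketOnly yields four, including "address ) ". The padding spaces are still captured at this stage.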
Next, we want to consume the spaces around the capture. For that, we add " *" (a space followed by *) on each side of the group:
\[ *([^\]]*) *\]
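A quick check, as a sketch, of what this intermediate pattern does on its own: with the greedy [^\]]*, only the leading spaces are removed. FirstCapture and kGreedyTrim are hypothetical names; the helper is built on NSRegularExpression's firstMatchInString:options:range:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: group 1 of the first match of `pattern` in `input`,
// or nil if there is no match (or the pattern is invalid).
static NSString *FirstCapture(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:input options:0 range:NSMakeRange(0, input.length)];
    return match ? [input substringWithRange:[match rangeAtIndex:1]] : nil;
}

// \[ *([^\]]*) *\]   (note the literal space before each '*')
static NSString *const kGreedyTrim = @"\\[ *([^\\]]*) *\\]";
```

FirstCapture(kGreedyTrim, @"[ zip]") gives "zip", but FirstCapture(kGreedyTrim, @"[ phone number ]") gives "phone number " with the trailing space still attached: the leading " *" takes the leading spaces, then the greedy class runs all the way to the ], leaving nothing for the trailing " *".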
Now we need to get tricky! [^\]]* is greedy by default, meaning it matches as much as it can. The leading " *" runs first and takes the leading spaces, but the greedy class then runs all the way to the ], swallowing the trailing spaces before the trailing " *" gets a chance, so they end up in the capture. We want to use the non-greedy (lazy) version, [^\]]*?, instead. This means "a sequence of zero or more characters, not containing ], as short as possible while still letting the rest of the regular expression match."
\[ *([^\]]*?) *\]
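Dropping the lazy pattern into the question's own test harness is a reasonable way to confirm it. This sketch wraps the same stringByReplacingMatchesInString:options:range:withTemplate: call in a hypothetical function, TrimmedBrackets, and keeps the question's "\n\n[$1]" template so the re-added brackets show exactly where each capture starts and ends.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper around the question's harness, using the lazy pattern
// \[ *([^\]]*?) *\]   (literal spaces before each '*').
static NSString *TrimmedBrackets(NSString *string) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"\\[ *([^\\]]*?) *\\]"
                                                  options:0
                                                    error:NULL];
    // Re-add brackets around $1 so any leftover padding would be visible.
    return [regex stringByReplacingMatchesInString:string
                                           options:0
                                             range:NSMakeRange(0, string.length)
                                      withTemplate:@"\n\n[$1]"];
}
```

For the question's input, @" [name] [address ] [ zip] [ phone number ] ", this should return @" \n\n[name] \n\n[address] \n\n[zip] \n\n[phone number] ". The single spaces between the bracketed groups survive because they lie outside every match, while the padding inside the brackets is gone and the embedded space in "phone number" is kept.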
This will not capture the spaces:
@"\\[ *([^\\]]+?) *\\]"
Be careful to type the space before each * in the pattern above. The "?" makes the preceding quantifier (here +) non-greedy; greedy is the default.
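One side effect of using +? rather than *? seems worth checking: + requires at least one character inside the group. The sketch below runs both variants over brackets that are empty or contain only spaces; AllCaptures, kLazyStar and kLazyPlus are hypothetical names, and the helper is built on NSRegularExpression's enumerateMatchesInString:options:range:usingBlock:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects group 1 of every match, in order.
static NSArray<NSString *> *AllCaptures(NSString *pattern, NSString *input) {
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:NULL];
    NSMutableArray<NSString *> *captures = [NSMutableArray array];
    [regex enumerateMatchesInString:input
                            options:0
                              range:NSMakeRange(0, input.length)
                         usingBlock:^(NSTextCheckingResult *match, NSMatchingFlags flags, BOOL *stop) {
        [captures addObject:[input substringWithRange:[match rangeAtIndex:1]]];
    }];
    return captures;
}

static NSString *const kLazyStar = @"\\[ *([^\\]]*?) *\\]"; // group: 0 or more chars
static NSString *const kLazyPlus = @"\\[ *([^\\]]+?) *\\]"; // group: 1 or more chars
```

On @"[] [   ] [ zip ]", kLazyStar should capture "", "" and "zip". kLazyPlus does not match "[]" at all, and for "[   ]" it backtracks the leading " *" to hand the group one character, so it captures a single space, then "zip". If empty or blank fields can occur, the *? form behaves more predictably.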