My objective is to censor certain words. Currently I use message.replaceAll("(?i)word", "replacement"), but this only catches words that are not split up.
To bypass this, people simply insert a different character between the letters of the censored word.
So I want to have "Anyone else want to Y.O.L.O" turned into "Anyone else want to party" while just looking for 'yolo'. Keeping the '.' in there would be a bonus.
How about: (to replace "word" with "replacement")
msg = msg.replaceAll("(?i)([^A-Za-z])w[^A-Za-z]?o[^A-Za-z]?r[^A-Za-z]?d([^A-Za-z])", "$1replacement$2");
[^A-Za-z] matches one character that is not a letter
[^A-Za-z]? matches one character that is not a letter (optional)
$1 is the first thing in brackets (the first [^A-Za-z])
$2 is the second thing in brackets (the last [^A-Za-z])
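As a rough sketch of how the capture-group version behaves, here it is with "yolo" and "party" substituted for "word" and "replacement". The class name GroupCensorDemo and the method censor are made up for illustration only.

```java
final class GroupCensorDemo {
    // Hypothetical helper: the pattern above with "yolo" in place of "word".
    static String censor(String msg) {
        return msg.replaceAll(
            "(?i)([^A-Za-z])y[^A-Za-z]?o[^A-Za-z]?l[^A-Za-z]?o([^A-Za-z])",
            "$1party$2"); // $1 and $2 put the surrounding non-letters back
    }

    public static void main(String[] args) {
        // The word is surrounded by non-letters, so it is replaced.
        System.out.println(censor("Anyone else want to Y.O.L.O now?"));
        // The word ends the string, so the trailing [^A-Za-z] has nothing to match.
        System.out.println(censor("Anyone else want to Y.O.L.O"));
    }
}
```

Note that both groups must match a real character, so as written the pattern leaves a censored word alone when it sits at the very start or very end of the message.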
An alternative is look-around:
msg = msg.replaceAll("(?i)(?<=[^A-Za-z])w[^A-Za-z]?o[^A-Za-z]?r[^A-Za-z]?d(?=[^A-Za-z])", "replacement");
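A small sketch of the look-around form, again with "yolo" and "party" substituted in. Because look-arounds do not consume the neighbouring character, two occurrences separated by a single space can both be replaced. The NEGATIVE constant is not part of the answer: it is a possible variant, added here as an assumption, that uses negative look-arounds so a word at the start or end of the message also matches. All names are hypothetical.

```java
final class LookaroundCensorDemo {
    // The look-around pattern above with "yolo" in place of "word".
    static final String POSITIVE =
        "(?i)(?<=[^A-Za-z])y[^A-Za-z]?o[^A-Za-z]?l[^A-Za-z]?o(?=[^A-Za-z])";

    // Assumed variant: "no letter before/after" instead of "a non-letter before/after".
    static final String NEGATIVE =
        "(?i)(?<![A-Za-z])y[^A-Za-z]?o[^A-Za-z]?l[^A-Za-z]?o(?![A-Za-z])";

    static String censor(String msg, String regex) {
        return msg.replaceAll(regex, "party");
    }

    public static void main(String[] args) {
        // The shared space is only looked at, not consumed, so both words match.
        System.out.println(censor("say yolo yolo now", POSITIVE));
        // The negative variant also matches when the word ends the string.
        System.out.println(censor("Anyone else want to Y.O.L.O", NEGATIVE));
    }
}
```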
It would not be difficult to generate the above automatically given a word.
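One way such a generator might look, as a minimal sketch rather than a finished filter. It builds the look-around form for any word by placing an optional non-letter between each pair of characters. CensorPatternBuilder, forWord and censor are hypothetical names, and quoting each character with Pattern.quote is an extra precaution in case the word contains regex metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CensorPatternBuilder {
    private static final String NON_LETTER = "[^A-Za-z]";

    // Builds (?<=[^A-Za-z])w[^A-Za-z]?o...d(?=[^A-Za-z]) for the given word.
    static Pattern forWord(String word) {
        StringBuilder regex = new StringBuilder("(?<=" + NON_LETTER + ")");
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                regex.append(NON_LETTER).append('?'); // optional separator between letters
            }
            // Quote each character so it is always treated as a literal.
            regex.append(Pattern.quote(String.valueOf(word.charAt(i))));
        }
        regex.append("(?=" + NON_LETTER + ")");
        // The flag plays the role of the inline (?i).
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    static String censor(String msg, String word, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return forWord(word).matcher(msg).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(censor("Anyone else want to Y.O.L.O tonight?", "yolo", "party"));
    }
}
```

Compiling the Pattern once per censored word and reusing it would avoid rebuilding the regex for every message.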
Now that it's posted on the internet, everyone can see it and change their spamming to not get picked up by the above.
EDIT: I removed \\b (word boundary) since 1word2 would get skipped.