When I split a string "hello world \n" with
"hello world \n".scan(/\w+/)
I would like to count \n or \t as a string as well.
Do you want something like this?
"hello world \n".scan(/\w+|\n/)
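To cover \t as well, the alternation can be widened to a small character class. A minimal sketch, assuming an input string that contains both a tab and a newline (the variable names are only for illustration):

```ruby
# Sketch: keep newline and tab characters as tokens of their own.
text = "hello\tworld \n"

# \w+ matches a run of word characters; [\n\t] matches a single newline or tab.
# The plain space matches neither alternative, so it is skipped.
tokens = text.scan(/\w+|[\n\t]/)

p tokens          # => ["hello", "\t", "world", "\n"]
puts tokens.size  # => 4
```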
Do not use \w+ for counting words. It splits numbers, and words that contain Unicode characters, into pieces, like so:
"The floating point number is 13.5812".scan /\w+/ => ["The", "floating", "point", "number", "is", "13", "5812"]
The same is true for numbers with other delimiters like
In Ruby 1.8 the expression \w+ worked with Unicode; this has changed. If there are non-ASCII characters in your string, the word will be split at them, too.
"Die Apfelbäume".scan /\w+/ => ["Die", "Apfelb", "ume"]
There are two options here.
You want to skip numbers altogether. Fine, just use
You don't want to skip numbers, because you want to count them as words, too. Then use
\S+ matches runs of non-whitespace characters, much like /[^ \t\r\n\f]+/. The only disadvantage is that your words will have other characters attached to them, such as brackets, hyphens, dots, etc. For the sole purpose of counting, this should not be a problem.
If you want the words themselves, too, you will need to apply additional character stripping.
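The two options can be sketched roughly as follows. The letters-only pattern \p{L}+ is an assumption on my part (one Unicode-aware way to skip numbers), and the sample sentence and the stripping step with [[:punct:]] are only one possible approach:

```ruby
# Sketch of both options. \p{L}+ is an assumed pattern, not quoted from the answer.
text = "Die Apfelb\u00E4ume (alt) kosten 13.58 Euro."

# Option 1 (assumed pattern): runs of Unicode letters only, so numbers are skipped.
letters_only = text.scan(/\p{L}+/)

# Option 2: every run of non-whitespace characters counts as one word.
chunks = text.scan(/\S+/)

# Additional stripping: remove punctuation at the start and end of each chunk,
# leaving inner characters such as the dot in "13.58" untouched.
words = chunks.map { |chunk| chunk.gsub(/\A[[:punct:]]+|[[:punct:]]+\z/, "") }

puts letters_only.size  # => 5
puts chunks.size        # => 6
p words
```

For counting alone, chunks.size is enough; the stripping step only matters when the words are used afterwards.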
\n has a special meaning in a double-quoted string: it evaluates to a newline character, which counts as whitespace.
You should escape the backslash:
If you want to split your string by spaces only, you should use
"Hello world \n".split(/ /)
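A small sketch of the difference, assuming the goal is to keep a literal backslash followed by "n" as a token (the variable names are illustrative):

```ruby
# Escaped backslash: the string ends in two characters, "\" and "n".
escaped = "hello world \\n"
# Single quotes do not interpret \n, so this is the same string.
literal = 'hello world \n'

puts escaped == literal      # => true
p escaped.scan(/\S+/)        # => ["hello", "world", "\\n"]

# With a real newline, splitting on single spaces keeps it as the last element,
# while split without arguments treats it as whitespace and drops it.
with_newline = "Hello world \n"
p with_newline.split(/ /)    # => ["Hello", "world", "\n"]
p with_newline.split         # => ["Hello", "world"]
```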
"hello world \n".scan /[\w\n\t]+/
This is better if you don't want to split up words with apostrophes (isn't, 90's, etc.):
"hello world \n".split(/[^\w']+/)
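For comparison, here is a short sketch of how the two patterns treat apostrophes; the sample sentence is made up for illustration:

```ruby
# Sketch: splitting on anything that is neither a word character nor an apostrophe.
sentence = "It isn't the 90's anymore, is it?\n"

kept_together = sentence.split(/[^\w']+/)
broken_apart  = sentence.scan(/\w+/)

p kept_together  # => ["It", "isn't", "the", "90's", "anymore", "is", "it"]
p broken_apart   # => ["It", "isn", "t", "the", "90", "s", "anymore", "is", "it"]
```

One caveat: if the string starts with a delimiter, split returns an empty string as the first element, so it may need to be rejected before counting.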
You can use the named character class [:cntrl:], which matches control characters such as \n and \t.
irb(main):001:0> "hello world \n".scan(/\w+|[[:cntrl:]]/) => ["hello", "world", "\n"]