
regex - split word in Ruby for counting

Question:

When I split the string "hello world \n" with

"hello world \n".scan(/\w+/)

I get ["hello", "world"]

I would like \n and \t to be counted as strings as well.

Answer:

Do you want something like this?

"hello world \n".scan(/\w+|\n/)
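To cover \t as well, the alternation can use a small character class. This is only a sketch; the method name tokens_with_whitespace is made up for illustration.

```ruby
# Hypothetical helper: treat each word, newline, or tab as one token.
def tokens_with_whitespace(text)
  text.scan(/\w+|[\n\t]/)
end

p tokens_with_whitespace("hello world \n")          # => ["hello", "world", "\n"]
p tokens_with_whitespace("hello\tworld \n").length  # => 4
```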
Answer:

Do not use \w+ for counting words. It splits numbers that contain a delimiter, and words that contain non-ASCII characters. For example:

"The floating point number is 13.5812".scan /\w+/
=> ["The", "floating", "point", "number", "is", "13", "5812"]

The same is true for numbers with other delimiters like "12,000".

In Ruby 1.8 the expression \w+ worked with Unicode; this has changed. If your string contains non-ASCII characters, the word is split at them, too.

"Die Apfelbäume".scan /\w+/
=> ["Die", "Apfelb", "ume"]

There are two options here.

  1. You want to skip numbers altogether. Fine, just use

    /\p{Letter}+/
    
  2. You don't want to skip numbers, because you want to count them as words, too. Then use

    /\S+/
    

    The expression \S+ matches runs of non-whitespace characters, /[^ \t\r\n\f]/. The only disadvantage is that your words will have other characters attached to them, such as brackets, hyphens, and dots. For the sole purpose of counting, this should not be a problem.

    If you want the words themselves as well, you need to strip those extra characters afterwards.
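The two options, plus the extra stripping step, could look roughly like this. The method names and the sample sentence are invented for illustration, and the stripping rule (drop anything at the edges that is not a letter or a digit) is just one possible choice.

```ruby
# Option 1: letters only, so numbers are skipped.
def letter_words(text)
  text.scan(/\p{Letter}+/)
end

# Option 2: every run of non-whitespace counts as one word.
def chunks(text)
  text.scan(/\S+/)
end

# Option 2 plus stripping: remove leading and trailing characters
# that are neither letters (\p{L}) nor digits (\p{N}).
def stripped_chunks(text)
  chunks(text)
    .map { |w| w.gsub(/\A[^\p{L}\p{N}]+|[^\p{L}\p{N}]+\z/, "") }
    .reject(&:empty?)
end

sample = "The Apfelbäume (about 12,000) cost 13.58."

p letter_words(sample)    # => ["The", "Apfelbäume", "about", "cost"]
p chunks(sample).length   # => 6
p stripped_chunks(sample) # => ["The", "Apfelbäume", "about", "12,000", "cost", "13.58"]
```

Delimiters inside a number, like the comma in 12,000, are kept because only the edges of each chunk are stripped.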

Answer:

In double-quoted strings \n has a special meaning: it is converted to a newline character, which counts as whitespace. If you want a literal backslash followed by n, escape the backslash: \\n.

If you want to split your string by spaces only, you should use

"Hello world \n".split(/ /)
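A quick way to see the difference between the two spellings, as a sketch (split_on_spaces is a made-up name):

```ruby
# Hypothetical helper: split on single space characters only.
def split_on_spaces(text)
  text.split(/ /)
end

newline = "hello world \n"   # ends with one newline character
literal = "hello world \\n"  # ends with a backslash followed by the letter n

p newline.length            # => 13
p literal.length            # => 14
p split_on_spaces(newline)  # => ["hello", "world", "\n"]
p split_on_spaces(literal)  # => ["hello", "world", "\\n"]
```

Note that p shows the escaped form; the last element of the second array is two characters long.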
Answer:

"hello world \n".scan /[\w\n\t]+/
Answer:

This is better if you don't want to split up words with apostrophes (isn't, 90's, etc.):

"hello world \n".split(/[^\w']+/)
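One thing to watch with split: if the string starts with a separator, the result begins with an empty string, which would throw off a count. Scanning for the complementary class avoids that. A small sketch, with a made-up method name:

```ruby
# Hypothetical helper: collect runs of word characters and apostrophes.
def words_keeping_apostrophes(text)
  text.scan(/[\w']+/)
end

p words_keeping_apostrophes("It isn't the 90's anymore \n")
# => ["It", "isn't", "the", "90's", "anymore"]

# split keeps a leading empty string when the text starts with a separator.
p " hello world".split(/[^\w']+/)          # => ["", "hello", "world"]
p words_keeping_apostrophes(" hello world") # => ["hello", "world"]
```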
Answer:

You can use the named character class [:cntrl:], which matches control characters such as \n and \t.

irb(main):001:0> "hello world \n".scan(/\w+|[[:cntrl:]]/)
=> ["hello", "world", "\n"]
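
Since [[:cntrl:]] covers all control characters, it also picks up \t and \r without listing them. A sketch, with a made-up method name:

```ruby
# Hypothetical helper: split into words and single control characters.
def words_and_controls(text)
  text.scan(/\w+|[[:cntrl:]]/)
end

p words_and_controls("hello\tworld \r\n")
# => ["hello", "\t", "world", "\r", "\n"]

# Count only the control characters.
p "hello\tworld \r\n".scan(/[[:cntrl:]]/).length  # => 3
```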