当前位置: 动力学知识库 > 问答 > 编程问答 >

php - Fastest implementation to do multiple string substitutions in Python

问题描述:

Is there any recommended way to do multiple string substitutions other than doing 'replace' chaining on a string (i.e. text.replace(a, b).replace(c, d).replace(e, f) ...)?

How would you, for example, implement a fast function that behaves like PHP's htmlspecialchars in Python?

I compared (1) multiple 'replace' method, (2) the regular expression method, and (3) Matt Anderson's method.

With n=10 runs, the results came up as follows:

On 100 characters:

TIME: 0 ms [ replace_method(str) ]

TIME: 5 ms [ regular_expression_method(str, dict) ]

TIME: 1 ms [ matts_multi_replace_method(list, str) ]

On 1000 characters:

TIME: 0 ms [ replace_method(str) ]

TIME: 3 ms [ regular_expression_method(str, dict) ]

TIME: 2 ms [ matts_multi_replace_method(list, str) ]

On 10000 characters:

TIME: 3 ms [ replace_method(str) ]

TIME: 7 ms [ regular_expression_method(str, dict) ]

TIME: 5 ms [ matts_multi_replace_method(list, str) ]

On 100000 characters:

TIME: 36 ms [ replace_method(str) ]

TIME: 46 ms [ regular_expression_method(str, dict) ]

TIME: 39 ms [ matts_multi_replace_method(list, str) ]

On 1000000 characters:

TIME: 318 ms [ replace_method(str) ]

TIME: 360 ms [ regular_expression_method(str, dict) ]

TIME: 320 ms [ matts_multi_replace_method(list, str) ]

On 3687809 characters:

TIME: 1.277524 sec [ replace_method(str) ]

TIME: 1.290590 sec [ regular_expression_method(str, dict) ]

TIME: 1.116601 sec [ matts_multi_replace_method(list, str) ]

So kudos to Matt for beating the multi 'replace' method on a fairly large input string.

Anyone got ideas for beating it on a smaller string?

网友答案:

Something like the following maybe? Split the text into pieces with the first "from" item to be replaced, then recursively split each of those parts into sub-parts with the next "from" item to be replaced, and so on, until you've visited all your replacements. Then join with the "to" replacement item for each as recursive function completes.

A little hard to wrap your head around the following code perhaps (it was for me, and I wrote it), but it seems to function as intended. I didn't benchmark it, but I suspect it would be reasonably fast.

def multi_replace(pairs, text):
    stack = list(pairs)
    stack.reverse()
    def replace(stack, parts):
        if not stack:
            return parts
        # copy the stack so I don't disturb parallel recursions
        stack = list(stack) 
        from_, to = stack.pop()
        #print 'split (%r=>%r)' % (from_, to), parts
        split_parts = [replace(stack, part.split(from_)) for part in parts]
        parts = [to.join(split_subparts) for split_subparts in split_parts]
        #print 'join (%r=>%r)' % (from_, to), parts
        return parts
    return replace(stack, [text])[0]


print multi_replace(
    [('foo', 'bar'), ('baaz', 'foo'), ('quux', 'moop')], 
    'foobarbaazfooquuxquux')

for:

barbarfoobarmoopmoop
网友答案:

How fast? Also, how big are your strings?

There's a fairly simple recipe for building a regular expression to do the job on another site. It might need some tweaking to handle regex metacharacters; I didn't look too closely.

If that's not good enough, you probably need to write some C code, honestly. You can build a simple state machine to do all the replacements, and then process any string byte by byte with no backtracking along the machine to actually do the work. However, I doubt you will beat the regex engine without going to C and optimizing that.

网友答案:

Normally, .replace method beats all other methods. (See my benchmarks above.)

分享给朋友:
您可能感兴趣的文章:
随机阅读: