
regex - Escaping question mark character in sed bash script variable

Question:

I have a set of saved HTML files containing links of the form http://mywebsite.com/showfile.cgi?key=somenumber, and I want to remove the question mark. (Side story: Firefox hates ? and randomly converts it to %3F. I'm sure there's some magic fix, but that's for another question.)

However, I think my code is preventing the question-mark character from being read, saved, or handled properly when bash stores the options in a variable.

# Doesn't work (no pattern matched)
SED_OPTIONS='-i s/\.cgi\?key/\.cgikey/g'

# Works e.g. http://mywebsite.com/showfileblah?key=somenumber
SED_OPTIONS='-i s/\.cgi/blah/g'

# Leaves question mark in e.g. http://mywebsite.com/showfile.blah?key=somenumber
SED_OPTIONS='-i s/cgi\?/blah/g'

# Actual sed command run when using SED_OPTIONS (I define FILES earlier in
# the code)
sed $SED_OPTIONS $FILES

# Not using the SED_OPTIONS variable works
# e.g. http://mywebsite.com/showfile.cgikey=somenumber
sed -i s/\.cgi\?key/\.cgikey/g $FILES

How can I get the full command to work using the SED_OPTIONS variable?

Answer:

The safest way to store a list of options and arguments in a variable is to use an array, as in the commands below.

Also:

  • You're using a basic regular expression (no -r or -E option), so ? is not a special character and needs no escaping. In GNU sed, \? is in fact an extension meaning "zero or one of the preceding item", which is why the escaped form fails to match.
  • In the replacement string, which is not a regex, do not escape the period (.).
  • There is no need for option g if you're only replacing one occurrence per line.

# Create array with individual options/arguments.
SED_ARGS=( '-i' 's/\.cgi?key/.cgikey/' )

# Invoke `sed` with array - note the double-quoting.
sed "${SED_ARGS[@]}" $FILES

Similarly, it would be safer to use an array for the list of input files. $FILES will only work if the individual filenames contain no embedded whitespace or other elements subject to shell expansions.
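
As a rough sketch of the file-array suggestion (bash is assumed, and the scratch directory and the file names "page one.html" and "page two.html" are made up for illustration), both the options and the file list can be kept in arrays. The -i option is left out so the result goes to stdout and the sketch behaves the same with GNU and BSD sed:

```shell
# Hypothetical scratch directory and file names, for illustration only.
tmpdir=$(mktemp -d)
printf '%s\n' 'http://mywebsite.com/showfile.cgi?key=1' > "$tmpdir/page one.html"
printf '%s\n' 'http://mywebsite.com/showfile.cgi?key=2' > "$tmpdir/page two.html"

# One array element per file, so the embedded spaces survive intact.
FILES=( "$tmpdir"/*.html )
SED_ARGS=( -e 's/\.cgi?key/.cgikey/' )

# Without -i, sed prints the edited lines to stdout instead of rewriting the files.
result=$(sed "${SED_ARGS[@]}" "${FILES[@]}")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With an unquoted $FILES string, each of these names would have been split into two words and sed would have looked for files that do not exist.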

Generally:

  • Single-quote string literals, such as the sed script here - to prevent the shell from interpreting them.
  • Double-quote variable references, to prevent the shell from performing additional operations on them, such as pathname expansion (globbing) and word splitting (splitting into multiple tokens by whitespace).
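
To make the word-splitting point concrete, here is a small sketch. The helper count_args and the sample script 's/a b/c/' are hypothetical and exist only to count how many arguments the shell actually passes:

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

# A single-quoted literal that happens to contain a space.
script='s/a b/c/'

# Unquoted: the shell splits the value at the space into two words.
unquoted_count=$(count_args $script)

# Double-quoted: the value is passed through as a single argument.
quoted_count=$(count_args "$script")

printf 'unquoted: %s, quoted: %s\n' "$unquoted_count" "$quoted_count"
```

sed would see the unquoted form as a truncated script followed by a file name, which is why the double quotes matter.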

Answer:

I suggest storing the arguments for sed in an array:

SED_OPTIONS=( '-i' '-e' 's/\.cgi?key/\.cgikey/g' )

sed "${SED_OPTIONS[@]}" $FILES

However, that's only a part of the trouble.

First, note that when you type:

sed -i s/\.cgi\?key/\.cgikey/g $FILES

what sed sees as the script argument is actually:

s/.cgi?key/.cgikey/g

because you didn't use any quotes to preserve the backslashes. (To demonstrate, use printf "%s\n" s/\.cgi\?key/\.cgikey/g, thus avoiding any questions of whether echo is interpreting the backslashes.) One side effect of this is that a URL such as:

http://example.com/nodotcgi?key=value

will be mapped to:

http://example.com/nodo.cgikey=value
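
The effect can be reproduced with a short sketch (the variable names unquoted, quoted and mapped are chosen here for illustration). It captures what the shell hands over in each case, then runs the unquoted form against the URL above:

```shell
# Unquoted: the shell removes the backslashes before sed ever sees the script.
unquoted=$(printf '%s\n' s/\.cgi\?key/\.cgikey/g)

# Single-quoted: the backslashes are preserved verbatim.
quoted=$(printf '%s\n' 's/\.cgi\?key/\.cgikey/g')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

# With the backslash gone, the leading . matches any character, here the "t".
mapped=$(printf '%s\n' 'http://example.com/nodotcgi?key=value' | sed -e "$unquoted")
printf '%s\n' "$mapped"
```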

Using the single quotes when setting SED_OPTIONS ensures that the backslashes are preserved where required, and not putting a backslash before the ? works. I have both GNU sed and BSD sed on my Mac; I've aliased them as gnu-sed and bsd-sed for clarity. Note that BSD sed requires a suffix for -i and won't accept standard input with -i. So, I've dropped the -i from the commands.

$ URLS=(http://example.com/script.cgi?key=value http://example.com/nodotcgi?key=value)
$ SED_OPTIONS=( '-e' 's/\.cgi?key/\.cgikey/g' )
$ printf "%s\n" "${URLS[@]}" | bsd-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ printf "%s\n" "${URLS[@]}" | gnu-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ SED_OPTIONS=( '-e' 's/\.cgi\?key/\.cgikey/g' )
$ printf "%s\n" "${URLS[@]}" | bsd-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgikey=value
http://example.com/nodotcgi?key=value
$ printf "%s\n" "${URLS[@]}" | gnu-sed "${SED_OPTIONS[@]}"
http://example.com/script.cgi?key=value
http://example.com/nodotcgi?key=value
$

Note the difference in behaviour between the two versions of sed when there's a backslash before the question mark (second part of the example).
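
If the script has to behave the same under both implementations, one option (a suggestion added here, not part of the session above) is to put the question mark in a bracket expression. Inside [...] a ? is always literal, so the GNU meaning of \? never comes into play. A minimal sketch, with the variable names script and out chosen for illustration:

```shell
# [?] matches a literal question mark in basic regular expressions.
script='s/\.cgi[?]key/.cgikey/g'

out=$(printf '%s\n' \
    'http://example.com/script.cgi?key=value' \
    'http://example.com/nodotcgi?key=value' | sed -e "$script")

printf '%s\n' "$out"
```

Only the first URL should change, since the second has no literal ".cgi" before the question mark.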
