
python - Unicode error in `str.format()`

Question:

I am trying to run the following script, which scans for *.csproj files and checks for project dependencies in Visual Studio solutions, but I am getting the error below. I have already tried all sorts of codecs, encode/decode calls and u'' combinations, to no avail...

(the diacritics are intended and I plan to keep them).

Traceback (most recent call last):
  File "E:\00 GIT\SolutionDependencies.py", line 44, in <module>
    references = GetProjectReferences("MiotecGit")
  File "E:\00 GIT\SolutionDependencies.py", line 40, in GetProjectReferences
    outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xed in position 19: ordinal not in range(128)

import glob
import os
import fnmatch
import re
import subprocess
import codecs

gvtemplate = """
digraph g {
rankdir = "LR"
#####
}
""".strip()

def GetProjectFiles(rootFolder):
    result = []
    for root, dirnames, filenames in os.walk(rootFolder):
        for filename in fnmatch.filter(filenames, "*.csproj"):
            result.append(os.path.join(root, filename))
    return result

def GetProjectName(path):
    result = os.path.splitext(os.path.basename(path))[0]
    return result

def GetProjectReferences(rootFolder):
    result = []
    projectFiles = GetProjectFiles(rootFolder)
    for projectFile in projectFiles:
        projectName = GetProjectName(projectFile)
        with codecs.open(projectFile, 'r', "utf-8") as pfile:
            content = pfile.read()
        references = re.findall("<ProjectReference.*?</ProjectReference>", content, re.DOTALL)
        for reference in references:
            referenceProject = re.search('"([^"]*?)"', reference).group(1)
            referenceName = GetProjectName(referenceProject)
            outputline = u'"{}" -> "{}"'.format(projectName, referenceName)
            result.append(outputline)
    return result

references = GetProjectReferences("MiotecGit")

output = u"\n".join(references)

with codecs.open("output.gv", "w", 'utf-8') as outputfile:
    outputfile.write(gvtemplate.replace("#####", output))

graphvizpath = glob.glob(r"C:\Program Files*\Graphviz*\bin\dot.*")[0]
command = '{} -Gcharset=latin1 -T pdf -o "output.pdf" "output.gv"'.format(graphvizpath)
subprocess.call(command)

Answer:

When Python 2.x has to use a byte string in a Unicode context, it automatically tries to decode the byte string to a Unicode string using the ascii codec. The ascii codec is a safe default, but it fails on any byte above 0x7F, such as the 0xed in your traceback.
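The failure can be reproduced outside the script by doing explicitly what Python 2 does implicitly. This is only a minimal sketch; the byte string below is a hypothetical project name encoded in Windows-1252, not one taken from the question.

```python
# Hypothetical project name stored as bytes in Windows-1252 (0xed is "í").
name = b"Cl\xednica"

# Python 2 performs this decode implicitly when a byte string is passed
# to u"...".format(); doing it by hand raises the same kind of error.
try:
    name.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```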

On Windows, the mbcs codec uses the code page that Windows itself uses for 8-bit characters (the system ANSI code page). You can decode the string yourself explicitly.

outputline = u'"{}" -> "{}"'.format(projectName.decode('mbcs'), referenceName.decode('mbcs'))
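
One caveat: in the question's script, referenceName appears to be a Unicode string already, because it is extracted from a file read with codecs.open(..., "utf-8"). In Python 2, calling .decode() on a Unicode string first encodes it with ascii, which can fail in the same way. A hedged sketch of a helper that decodes only byte strings is shown here; to_text is a hypothetical name, the sample values are made up, and cp1252 stands in for mbcs so the example also runs outside Windows.

```python
def to_text(value, encoding="cp1252"):
    """Return value as Unicode text, decoding only when it is a byte string."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` in Python 2
        return value.decode(encoding)
    return value  # already Unicode: leave it untouched

project_name = b"Cl\xednica"  # hypothetical byte-string name, as os.walk might return
reference_name = u"Core"      # hypothetical Unicode name, as read from the .csproj

outputline = u'"{}" -> "{}"'.format(to_text(project_name), to_text(reference_name))
```

In the script itself the arguments would be to_text(projectName, 'mbcs') and to_text(referenceName, 'mbcs'). Another option that may avoid the decode entirely is to pass a Unicode root folder (u"MiotecGit") to os.walk, since Python 2 returns Unicode file names when it is given a Unicode path.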