I am getting a UnicodeDecodeError when reading a file that has non-ASCII characters. Here is the snippet of code:
fname = "c:\\testing\\nonascii.txt"
print type(file) #it's unicode
Judging by the filename, you're using Windows. Files on Windows will not be UTF-8 encoded unless you take special care to save them that way; by default they will use your code page.
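To make the mismatch concrete, here is a minimal sketch. The code page cp1252 is only an assumption for illustration (it is common on Western European Windows installs); yours may well be different.

```python
# cp1252 is an assumed code page for illustration only; yours may differ.
text = u"caf\u00e9"
raw = text.encode("cp1252")  # the bytes an editor using that code page would save

try:
    raw.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    # The byte 0xe9 on its own is not a complete UTF-8 sequence.
    decoded_as_utf8 = False

# Decoding with the encoding the bytes were actually written in works.
roundtrip = raw.decode("cp1252")
```

Decoding the same bytes with the code page that produced them gives back the original text, which is why picking the right encoding matters more than the reading code itself.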
If you don't know what code page Windows is using, you can use the special encoding mbcs to get what it uses for a default. If you want your program to work on other systems besides Windows, you can use sys.getfilesystemencoding() to get a value that should work on the current system; on Windows (under Python 2, as in your snippet) it will return mbcs.
import codecs
import sys
f = codecs.open(fname, "r", encoding=sys.getfilesystemencoding())
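A variation you might consider, offered as a sketch rather than the one right way: locale.getpreferredencoding(False) reports the encoding the system prefers for text data, whereas sys.getfilesystemencoding() describes file names and returns utf-8 on Windows from Python 3.6 onward. The helper name read_with_default_encoding is hypothetical, and io.open is used so the same code runs on Python 2 and 3.

```python
import io
import locale


def read_with_default_encoding(path):
    """Read a text file using the system's preferred text encoding.

    Hypothetical helper for illustration; on Windows the preferred
    encoding is normally the ANSI code page.
    """
    encoding = locale.getpreferredencoding(False)
    with io.open(path, "r", encoding=encoding) as fh:
        return fh.read()
```

You would call it as sdata = read_with_default_encoding(fname). If the file was saved on a different machine with a different code page, this guess can still be wrong, so treat it as a default rather than a guarantee.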
Your file is not really UTF-8.
One possibility is that it is UTF-16 with a Byte Order Mark. If this is the problem, your error will be one of:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfe in position 0: invalid start byte
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
depending on the endianness of the file.
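If you want to check for a Byte Order Mark before choosing an encoding, something along these lines may help. It is a rough sketch: sniff_bom_encoding is a hypothetical helper name, it only looks for UTF-16 and UTF-8 marks, and it falls back to a default you supply when no mark is found.

```python
import codecs
import io


def sniff_bom_encoding(path, default="utf-8"):
    """Guess an encoding from a leading byte order mark, else return default."""
    with io.open(path, "rb") as fh:
        head = fh.read(3)  # the longest mark checked here (UTF-8) is 3 bytes
    # 0xff 0xfe is the little-endian mark, 0xfe 0xff the big-endian one.
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the mark to pick the byte order
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec skips the mark while decoding
    return default
```

With that in place, f = codecs.open(fname, "r", encoding=sniff_bom_encoding(fname)) would pick utf-16 for a file that starts with 0xff 0xfe or 0xfe 0xff. A file without any mark still needs you to know, or guess, its real encoding.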
There are other possible encodings that might be in use. If you post the actual traceback we might be able to tell more definitively.