
Python - encoding difficulties with Hebrew

Question:

I am trying to concatenate a number of Hebrew .txt files from a single folder into a single file. The files are encoded in cp1255, the Windows Hebrew code page. Because I specify that encoding, opening the files succeeds, but writing the strings to the output file then fails.

If I don't specify the encoding in the open call, the open itself fails (on line 7).

dirLoc="source/folder"
import os
files=os.listdir(dirLoc)
for f in files:
    if f.endswith('.txt'):
        print(f)
        data=open(dirLoc+'/'+f, 'r', encoding="cp1255")
        out=open("outPut.txt", 'a')
        for line in data:
            out.write(line)
        data.close()
        out.close()

The error I get is the standard one:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Edit: After experimenting some more, the problem definitely seems to be with writing a Hebrew string to the .txt file. This happens even if I resave the file in a different format (such as ANSI or UTF-8) and change the encoding accordingly. The same code works fine with English .txt files.
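One plausible explanation for reading working while writing Hebrew fails is that the output file is opened without encoding=, so Python 3 falls back to the locale's preferred encoding. The sketch below assumes a Western-European Windows setup where that default is cp1252, which has no Hebrew letters; the sample word and the helper try_encode are illustrative only. On the write side such a failure shows up as a UnicodeEncodeError ("'charmap' codec can't encode ...") rather than a decode error.

```python
import locale

# Hebrew sample text ("shalom"); illustrative only
text = "שלום"

# open() without encoding= uses this in Python 3; on a Western-European
# Windows install it is typically cp1252 (an assumption about the asker's setup)
print("default encoding here:", locale.getpreferredencoding(False))


def try_encode(s, enc):
    """Hypothetical helper: True if s can be encoded with enc, else False."""
    try:
        s.encode(enc)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 has no Hebrew letters; cp1255 and UTF-8 do
for enc in ("cp1252", "cp1255", "utf-8"):
    print(enc, "->", "ok" if try_encode(text, enc) else "UnicodeEncodeError")
```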

Answer:

Okay, having played around with this for another day, I found a solution, as follows:

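import os
import codecs

dirLoc = 'source/folder'
files = os.listdir(dirLoc)
for f in files:
    # Only .txt files; skip the output file so a rerun doesn't append it to itself
    if not f.endswith('.txt') or f == 'outPut.txt':
        continue
    try:
        # codecs.open takes an encoding for reading...
        with codecs.open(dirLoc + '/' + f, 'r', encoding='utf8') as data:
            data1 = data.read()
    except UnicodeError:
        print('file ' + f + ' failed to read')
        continue
    try:
        # ...and for writing
        with codecs.open(dirLoc + '/outPut.txt', 'a', encoding='utf8') as out:
            out.write(data1)
    except UnicodeError:
        print('file ' + f + ' failed to write')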

codecs.open lets me specify the encoding for writing as well as for reading (note that you have to import codecs to use it). The try/except blocks are there because encoding is still a problem and the occasional file raises an exception; they let me skip any file that fails to read or write without the whole script failing.
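In Python 3 (which the question's use of open(..., encoding=...) implies), the built-in open() accepts encoding= for writing as well as reading, so codecs is not strictly required. A minimal sketch, assuming the inputs are cp1255 as the question states and choosing UTF-8 for the output (an assumption); the function name concat_hebrew_txt is hypothetical. It opens the output once, skips the output file itself, and reports undecodable files instead of stopping.

```python
import os


def concat_hebrew_txt(src_dir, out_path, in_encoding="cp1255", out_encoding="utf-8"):
    """Hypothetical helper: append every .txt file in src_dir to out_path.

    Each input is decoded with in_encoding and written with out_encoding.
    Returns the names of files that could not be decoded and were skipped.
    """
    skipped = []
    out_abs = os.path.abspath(out_path)
    # Open the output once, with an explicit encoding, instead of once per file
    with open(out_path, "a", encoding=out_encoding) as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            # Only .txt files, and never the output file itself
            if not name.endswith(".txt") or os.path.abspath(path) == out_abs:
                continue
            try:
                # Read the whole file first so a bad file writes nothing
                with open(path, "r", encoding=in_encoding) as data:
                    text = data.read()
            except UnicodeDecodeError:
                skipped.append(name)
                continue
            out.write(text)
    return skipped
```

For example, concat_hebrew_txt("source/folder", "source/folder/outPut.txt") appends the decodable files and returns the names of any it had to skip.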
