
python - Correct latin-1 encoded UTF-8 in Django-ORM

Question:

I'm setting up a django-admin on top of a legacy MySQL database.

The database declares that it is latin-1 encoded. Some of the data in it is indeed latin-1, but some is actually UTF-8. The UTF-8 data shows up as corrupt characters like: Ã© â‚¬ Ã¤ Ã¶

The legacy application does some black magic to hide these errors and I cannot modify the database.

I found a Python library, ftfy, that can convert latin-1 corrupted UTF-8 to real UTF-8; for example, the characters above get translated to "é € ä ö". I want to use it on all django.db.models.CharField and django.db.models.TextField data that is loaded from the database. How can I do this?

I tried to subclass django.db.models.CharField and django.db.models.TextField but couldn't figure out where to intercept the data coming from the database. The ideal solution would be something like an FTFYCharField that always corrects the data it gets from the database.

Answer:

Assuming read-only access, I think what you are looking for is Writing custom model fields. In particular, look at the section Converting database values to Python objects. In the .to_python() method (or, in Django 1.8 and later, .from_db_value(), which is the hook called when a value is loaded from the database) you can do whatever you want to any or all fields read from the DB.
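As a rough illustration of this approach, here is a minimal sketch of a mixin that repairs values as they are loaded. The names fix_mojibake and MojibakeFixMixin are hypothetical, and the cp1252 round trip is only a stand-in for what ftfy does more carefully; it assumes each value is either entirely fine or entirely mis-decoded UTF-8. In a real project ftfy.fix_text(value) could be called in place of fix_mojibake.

```python
def fix_mojibake(value):
    """Undo UTF-8 bytes that were decoded as cp1252/latin-1 (hypothetical helper)."""
    if not isinstance(value, str):
        return value  # None and non-text values pass through untouched
    # MySQL's "latin1" is effectively cp1252, so try that first.
    for codec in ("cp1252", "latin-1"):
        try:
            # Recover the raw bytes, then read them as the UTF-8 they really are.
            return value.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    # Genuine latin-1 text is not valid UTF-8, so it ends up here unchanged.
    return value


class MojibakeFixMixin:
    """Mixin for Django fields: repair text when it is read from the database."""

    def from_db_value(self, value, expression, connection):
        # Signature as used by Django 2.0 and later.
        return fix_mojibake(value)


# Assumed usage inside a Django project (not executed here):
#
#     from django.db import models
#
#     class FTFYCharField(MojibakeFixMixin, models.CharField):
#         pass
#
#     class FTFYTextField(MojibakeFixMixin, models.TextField):
#         pass
```

Keeping the repair in a mixin means the same logic serves both CharField and TextField. Note that a legitimate latin-1 string that happens to form valid UTF-8 when re-encoded would be altered too, which is one reason ftfy's heuristics may be the safer choice.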

If you also need to write (and maintain the weirdness), see the section on Preprocessing values before saving.

Answer:

I know this might be off-topic, but it may prevent some headaches.

Before you make any Unicode-related change, please make sure you understand what Unicode is, and note that the correspondence you describe, "Ã¶" == ö, only holds when the text was encoded as UTF-8.
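The dependence on UTF-8 can be checked directly in Python. The short sketch below misreads the bytes of ö as latin-1 under two different encodings; the variable names are purely illustrative.

```python
original = "\u00f6"  # the character ö

# UTF-8 stores ö as the two bytes C3 B6. Read as latin-1, they become
# the two characters Ã and ¶, which is the corruption seen in the question.
utf8_misread = original.encode("utf-8").decode("latin-1")

# UTF-16 (little endian) stores ö as the bytes F6 00. Read as latin-1,
# they become ö followed by a NUL character, a different kind of damage.
utf16_misread = original.encode("utf-16-le").decode("latin-1")

print(repr(utf8_misread))
print(repr(utf16_misread))
```

So a repair that re-encodes as latin-1 and decodes as UTF-8 is only valid for data that was written as UTF-8 in the first place.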
