I'm setting up a django-admin on top of a legacy MySQL database.
The database declares that it is latin-1 encoded. Some of the entered data in the database is indeed in latin-1 but some is actually UTF-8. This shows up as corrupt characters like: Ã© â‚¬ Ã¤ Ã¶
The legacy application does some black magic to hide these errors and I cannot modify the database.
I found a Python library, ftfy, that can convert Latin-1-corrupted UTF-8 back to real UTF-8; for example, the characters above get translated to "é € ä ö". I want to apply it to all django.db.models.TextField data that is loaded from the database. How can I do that?
I tried to subclass django.db.models.TextField but couldn't figure out where to intercept the data coming from the database. The optimal solution would be something like an FTFYCharField that always corrects the data it gets from the database.
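Assuming read-only access, I think what you are looking for is Writing custom model fields. In particular, look at the section Converting database values to Python objects. In the .to_python() method you can do whatever you want to any or all fields read from the DB. Note that in current Django versions the hook called when a value is loaded from the database is .from_db_value(); .to_python() is called during deserialization and form cleaning.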
If you also need to write (and maintain the weirdness), see the section on Preprocessing values before saving.
I know this might be off-topic, but it may save you some headaches.
Before you make any Unicode-related change, please make sure you understand what Unicode is, and note that the mapping you wrote, "Ã¶" == "ö", only holds when the original text was encoded as UTF-8.
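To illustrate that point using only the standard library: the hypothetical helpers garble() and ungarble() below reproduce the corruption and reverse it. The reversal only succeeds when the stored text really was UTF-8 decoded with the wrong codec; genuine Latin-1 text makes it fail.

```python
def garble(text, wrong_codec="latin-1"):
    """Encode as UTF-8, then decode with the wrong codec (how the mojibake arises)."""
    return text.encode("utf-8").decode(wrong_codec)


def ungarble(text, wrong_codec="latin-1"):
    """Reverse garble(); raises if the text is not mis-decoded UTF-8."""
    return text.encode(wrong_codec).decode("utf-8")


print(garble("ö"))      # two characters: U+00C3 followed by U+00B6
print(ungarble("Ã¶"))   # back to the single character U+00F6

# A genuine Latin-1 "ö" is a single byte 0xF6, which is not valid UTF-8,
# so the reverse mapping does not apply to it.
try:
    ungarble("ö")
except UnicodeDecodeError:
    print("not mis-decoded UTF-8; leave as-is")
```

Because the same table holds both kinds of data, a blind reverse conversion is unsafe. That is the case a heuristic library like ftfy is meant to handle. The "€" example in the question suggests that the wrong codec was in fact Windows-1252 rather than strict Latin-1; passing wrong_codec="cp1252" covers that variant.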