The Problem: MSSQL database with "LATIN1" (case-sensitive) encoding, SQLAlchemy and PyMSSQL. Fields with non-ascii characters were sometimes being returned double encoded.. The actual case is that SQLAlchemy passes parameters to the DBAPI as however you pass them to SQLAlchemy.. the convert_unicode option doesn't seem to change the encoding of unicode parameters when saving, only the SQL string.. which seems kind of inconsistent. I'm actually not 100% certain of the interactions here, and who is to blame - PyMSSQL or SQLAlchemy - but let's split the difference and blame Microsoft, they're always a good target.

The Solution: A custom type:

class EncodedString(types.TypeDecorator):
    impl = types.String
    def process_bind_param(self, value, dialect):
        if type(value) == unicode:
            return value.encode(dialect.encoding, 'replace')
        return value
    def process_result_value(self, value, dialect):
        if value and type(value) == str:
            return value.decode(dialect.encoding)
        return value

and instead of defining a Column with types.String or types.Unicode, use EncodedString.

It looks like types.Unicode should do exactly this, but for some reason it's not working for me.

— by Robert Thomson, created 16th Jul, 2009, last modified 16th Jul, 2009 | Tags: Tech