The Problem: MSSQL database with "LATIN1" (case-sensitive) encoding, SQLAlchemy and PyMSSQL. Fields with non-ascii characters were sometimes being returned double encoded.. The actual case is that SQLAlchemy passes parameters to the DBAPI as however you pass them to SQLAlchemy.. the convert_unicode option doesn't seem to change the encoding of unicode parameters when saving, only the SQL string.. which seems kind of inconsistent. I'm actually not 100% certain of the interactions here, and who is to blame - PyMSSQL or SQLAlchemy - but let's split the difference and blame Microsoft, they're always a good target.
The Solution: A custom type:
class EncodedString(types.TypeDecorator):
impl = types.String
def process_bind_param(self, value, dialect):
if type(value) == unicode:
return value.encode(dialect.encoding, 'replace')
return value
def process_result_value(self, value, dialect):
if value and type(value) == str:
return value.decode(dialect.encoding)
return value
and instead of defining a Column with types.String or types.Unicode, use EncodedString.
It looks like types.Unicode should do exactly this, but for some reason it's not working for me.