The good old evil CGI module

Jun 11

by Sebastian on 11.06.2011 at 21:41 in English, Perl

This post was imported from my old WordPress installation. If there are display problems, wrong links or missing images, please just leave a comment here. Thanks.

Currently I'm working on a new project, no, I was working on it and finished yesterday. Jörg, the other guy involved in this, is very good at testing new things - and he found a problem which hadn't been there before: UTF-8 chars were shown as two-byte crap, so "ä" was shown as "Ã¤". A common thing, a UTF-8 conversion problem, but I didn't understand why it hadn't shown up before (on my development system) and why it didn't show up on other projects. I spent more than three hours tonight hunting this, trying to debug, encode and convert, and still didn't get why it didn't show up on other projects using the same YAWF framework on the same server.
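The "ä" turning into "Ã¤" effect can be reproduced in a few lines. This is only a minimal sketch using the core Encode module; the helper name misread_as_latin1 is made up for illustration and is not part of YAWF or CGI.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: returns the text as it looks when its UTF-8 bytes
# are misread as ISO-8859-1.
sub misread_as_latin1 {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);     # "ä" becomes the two bytes C3 A4
    return decode('ISO-8859-1', $bytes);    # each byte becomes one character
}

my $garbled = misread_as_latin1("\x{e4}");  # U+00E4 is "ä"

binmode(STDOUT, ':encoding(UTF-8)');
print length($garbled), " characters: $garbled\n";   # 2 characters: Ã¤
```

One character goes in and two come out, which is exactly the kind of garbage Jörg saw.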

About half an hour ago, I got the final idea and suspected the CGI module. After logging into the project's website, the problematic text was displayed correctly. No problems any longer - until you save it!

Here are the two reasons:

The text was okay now, because it came from the database. All my other projects (on this framework and server) are using Postgres, but this new one is using MongoDB, and while the SQL framework module fetches a fresh copy of the row from the database after every INSERT or UPDATE, the MongoDB module doesn't do this. Bad behavior for the SQL module, I know, but no apologies for this now.
The main problem was... the CGI module. It uses ISO-8859-1 as the default charset, which is OK because it has been that way for a long time, but it doesn't decode incoming UTF-8 data, so a two-byte UTF-8 char arrives as two ISO-8859-1 chars. The database doesn't distinguish between two ISO-8859-1 bytes and one UTF-8 char, so Perl accepts the two bytes from the database as one UTF-8 char. The solution was simple: use CGI qw(-utf8); and everything was fine. I don't blame the CGI module for being backward compatible, but a UTF-8 section might be a good idea for the POD.
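To make the second reason a bit more concrete, here is a rough sketch of the difference between an undecoded and a decoded form value. It deliberately uses only the core Encode module instead of CGI itself; decode_param is a made-up helper that approximates what the -utf8 pragma does for the values returned by param(), not the actual CGI code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode one raw request value, roughly what
# "use CGI qw(-utf8);" does for the values returned by param().
sub decode_param {
    my ($raw) = @_;
    return decode('UTF-8', $raw);
}

my $raw  = "\xC3\xA4";          # the bytes a browser sends for "ä" in a UTF-8 form
my $text = decode_param($raw);

# Undecoded, Perl sees two characters; decoded, it sees one.
printf "raw: %d characters, decoded: %d character\n",
    length($raw), length($text);

# Encoding the undecoded value again for UTF-8 output double-encodes it.
printf "bytes written without decoding: %d, with decoding: %d\n",
    length(encode('UTF-8', $raw)), length(encode('UTF-8', $text));
```

Without decoding, the two input bytes grow to four on the way out, and that is the "Ã¤" on the page. With decoding, the value stays one character and is written as the original two bytes.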

I'm writing this down because I know that I'll run into this again sooner or later, and maybe others will too.

The UTF-8 fix has gone into the YAWF core, so it won't bother me again on my current default framework.

Jörg, you can go on testing now. (He's a developer's nightmare, as he always finds a bug in projects submitted as "done". :-) )