TOP

Don’t use exec and database connections

Typical Perl scripts (and others running as CGI-scripts) run and exit once they’re done, but this isn’t very efficient with medium or high request counts. Persistent solutions like FCGI and ModPerl avoid the additional interpreter loading and compiling phases, but start being challenging if any source file is changed. (weiterlesen…)

Weiterlesen
TOP

mySQL Date Bug: `date` >= ’2012-03-01 00:00:00′ is not the same as `date` >= ’2012-3-01 00:00:00′

I’m loving it! Really, mySQL is an interesting creature. It’s full of surprises and you never know what it’ll do. There seems to be a date bug which took me only about an hour to find. (weiterlesen…)

Weiterlesen
TOP

SQL-Problem: SELECT id; if ($id) { UPDATE } else { INSERT }

SQL hat viele Nachteile, aber einer der größten ist die Unsinnige Aufteilung zwischen der Erstellung und Veränderung eines Datensatzes. mySQL macht es dem genervten Programmiere mit MyISAM auch nicht leichter, denn Row-Locks oder Transaktionen werden nicht unterstützt. Aber ich schweife ab… (weiterlesen…)

Weiterlesen
TOP

Using a mySQL Temporary Table

Huston, we have a problem. Our company’s headquarter isn’t in Huston, but anyway… we got inconsistent data between some mySQL tables. There is a cron job fixing such problems but usually only very few records need assistance, much less than 10 a week. Updating the script to find the newly discovery differences was less than a hour, but running it turned out impossible.

The problem was memory and it was out of our control. I learned developing software on a Datapoint 6600 mainframe with 64kB memory. Not per variable or per process but 64kB overall for all running tasks. Switching to PC let me to a world with unlimited resources: 640kB memory! Single process operating system (MS-DOS) meaning: Nearly all of that memory was dedicated to my application! Pure profusion!

But back to these days. The Perl interpreter eats up about 2 MB of memory, loading database access and some other modules pushes each task to about 80 MB (way too much and I’m still working on this, but that’s another topic). This cronjob is limited to about 256 MB of memory (actually 512 MB VSZ which is much easier to limit than real RSS memory) and when starting to repair the newly discovered problems, his memory usage started to explode. I could simply  increase the limit – todays servers have more than enough memory – but it won’t help because it might really hit that hardware limitation.

The basic reason for the problem was mySQL: Unable to use cursors like real databases do, it pushes stuff to the client resulting in

Out of memory (Needed 2423520 bytes)
DBD::mysql::st execute failed: MySQL client ran out of memory at repair_cronjob.pl line 50.
DBD::mysql::st fetchrow_hashref failed: fetch() without execute() at repair_cronjob.pl line 51.
DBD::mysql::st execute failed: Lost connection to MySQL server during query at repair_cronjob.pl line 50.
Couldn't execute statement: Lost connection to MySQL server during query at repair_cronjob.pl line 50.

Yes: The script is using fetchrow_hashref to fetch the results row-by-row because the author (not me) didn’t want all results to be in memory at once – but mySQL failed to proceed as instructed and tried to download all the results at once. I totally agree with the author – I’m using the same idea to reduce resource usage which works really great on Sybase, Microsoft SQL and Postgres. Each row contains about 5 columns: One 64-byte-varchar and plain integer, one row alone could never get that big to require 2423520 bytes of memory.

Here is my solution: Collect all the records to be processed in one place and fetch them in chunks bypassing mySQL’s limitations. After thinking about a tied hash or a temporary local storage file, I ended up at a temporary table – something I never used before.

First, very important: If you’re using a mySQL cluster, you need to force every of the following queries to go to the same slave. If you can’t ensure this – don’t try to continue, you’ll end up having your temporary table on one slave while trying to insert the data on another one. Yes, we’re writing to a slave, unbelievable but supported by mySQL for temporary tables.

Simply create a table before collecting the data but add the “temporary” keyword. This table will only life on the server you’re connected to, won’t be accessible by other connections (even if your script is connecting twice to the same server) and will be dropped once your connection ends.

 CREATE TEMPORARY TABLE tt_repair_queue (
        `id` int(10) NOT NULL AUTO_INCREMENT,
        `user_id` int(10) unsigned NOT NULL,
        `user_email` char(200) NOT NULL,
        `reference_id` int(10) unsigned NOT NULL,
        `reference_email` char(80) NOT NULL,
        PRIMARY KEY (`id`)

Looks easy? It is! I added the “tt_” prefix to mark it as temporary table. Now let’s add some data:

INSERT INTO repair_queue(user_id, user_email, reference_id, reference_email)
        SELECT C.id AS user_id, C.email AS user_email, R.id AS reference_id, R.username AS reference_email
                FROM CUSTOMER_USERS C LEFT JOIN REFERENCE_LIST R ON C.id = R.user_id

Not that complicated, simple SQL you (should) already know. Here is how you get there: Build up your SELECT to get the required data and once you’re happy, add a regular INSERT INTO statement replacing the VALUES() by your SELECT – that’s all folks. These are samples, my real world temp. table holds more data and requires a dozen other queries to fill it up. You could use it like any regular static table.

The script is running much faster and using only about 12 MB over the minimum-including-modules now, great win but still nothing real – I still need to process those data in Perl doing things I can’t do within the database and – you might remember – the problem started trying to fetch rows one-by-one from the mySQL server. Here is my solution:

        my $sth_tt_read = $dbh->prepare_cached(
                "SELECT * FROM tt_repair_queue ORDER BY id ASC LIMIT ?,$MAX_ROWS_PER_LOOP" );
        my $start_row = 0;
        while (1) {
                $sth_tt_read->execute($start_row);
                while (my $row = $sth_tt_read->fetchrow_hashref) {
                        # Huge block of code doing the actual repair job
                }
                last if $sth_tt_read->rows < $MAX_ROWS_PER_LOOP;
                $start_row += $MAX_ROWS_PER_LOOP;
        }

What is happening? The statement handle reading 1000 rows at a time (I set the pseudo-constant $MAX_ROWS_PER_LOOP to 1000 at the beginning of the script) is prepared and the the first start row is set to 0. Each completed 1000-rows-fetch adds another 1000 to the $start_row starting value and tries again. If one fetch got less than 1000 rows, there is nothing more to find and the loop ends. In the unlikely case that the temp table row count is dividable by 1000, one additional fetch is being done returning zero rows. The inner while won’t run at all and the last does his job (as long as 0 is still less than 1000).

Don’t forget to finally DROP TABLE tt_repair_queue. I do this even if my script ends (including a auto-terminate of the mySQL connection) just after processing the last line, because I like to clean up things in my scripts. Maybe someone will change it later and add a long running loop at the end. The mySQL server could keep the temp. table much longer than required.

One thing left: My temp. table will never get additional data once the first SELECT call started, it’s by concept of the cronjob but I’m still unsure that mySQL will sort the data the same way on any subsequent SELECT … LIMIT. If sort order changes between requests, I might miss rows and get others twice. I addressed  this problem in a stackexchange question and the ORDER BY clause may be removed once the question is answered.

Weiterlesen
TOP

Unerklärliche mySQL “Unknown database” Fehlermeldungen

“Never change a running system” ist einer der Pfeiler, auf denen die IT-Welt erbaut ist. Jeder der schon einmal mit einem Computer zu tun hatte weiß, dass sich dieser Grundsatz nicht immer befolgen lässt, so auch heute Morgen. (weiterlesen…)

Weiterlesen
TOP

mySQL myISAM myPROBLEM

mySQL sorgt für viele interessante Effekte, einer davon ist das myISAM Table Locking, welches gerne auch Queries auf andere Tabellen blockiert.

(weiterlesen…)

Weiterlesen
TOP

mySQL Geschwindigkeit verdoppeln

Ein guter Datenbankexperte kann jedes Quentchen Geschwindigkeit aus einer Datenbank herausquetschen und ungeahnte Leistung freisetzen. Normalsterblichen bleibt häufig nur das Fluchen und der zweifelhafte Versuch die fehlende Optimierung durch Hardwareupgrades wett zu machen.
(weiterlesen…)

Weiterlesen
TOP

mySQL für Umsteiger

PHP und mySQL sind sind klassische Einstiegsdroge in die Web-Programmierung. Im Gegensatz zu den am Straßenrand verkauften Pülverchen & Co. kann es in der Web-Programmierung allerdings nur besser werden – womit allerdings auch das Suchtpotential zunimmt. Schwieriger wird es, wenn man von der anderen Seite kommt und sich auf mySQL umstellen muss. (weiterlesen…)

Weiterlesen