
Hunting down a memory leak

This post was imported from my old WordPress installation. If there are display problems, broken links or missing images, please just leave a comment here. Thanks.


Happy new year to all of you!

A recently finished refactoring project heavily uses Memcached to speed up access to calculated statistics data, but creating this data is very slow, so I decided to preheat the cache by filling it from a nightly cronjob.

The script was very simple:

#!/usr/bin/perl

# Calculate old data to pre-fill the cache
use MyProject::DB;
use MyProject::Table::User;

my $dbh = MyProject::DB->connect;

# Loop through all users
for my $user (MyProject::Table::User->list($dbh)) {

    # These methods use ->traffic_months and pre-cache the data down to _per_day
    $user->impressions_all;
    $user->clicks_all;
    $user->leads_all;

    # Loop through all pages
    for my $page ($user->get_pages) {
        $page->impressions_all;
        $page->clicks_all;
    }
}

Really complicated, isn't it?

The script loads the project's database access module and a class which is used for all database access to the users table. That class has some additional methods for getting the impressions, clicks and leads of a user for a given timeframe:

Ending                       Description
_all                         All-time counter
_month($month, $year)        Count for one month
_day($day, $month, $year)    Count for one day
Each user has zero or more pages and they have their own statistics, so the second part loops through all of them and calculates their statistics as well.
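
To illustrate the naming scheme from the table, a (hypothetical) call for a single user would look like this - the same endings exist for clicks and leads:

my ($user) = MyProject::Table::User->list($dbh);

print $user->impressions_all, "\n";              # all-time counter
print $user->impressions_month(1, 2012), "\n";   # January 2012
print $user->impressions_day(5, 1, 2012), "\n";  # 2012-01-05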

All methods return the counting result - but I don't need it here, as the methods themselves already contain the whole caching code. If they're called, they put their intermediate and final results into the cache as needed, and the second call will be much faster. The _all methods detect all months which had some kind of action (internally using the $user->traffic_months method) and walk through them - resulting in a completely filled cache for each relevant month. The months themselves are based on the _day values, and finally they're cached too.
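
I can't show the real project code, but a minimal sketch of how such an _all method works - assuming a cache() helper for the Memcached wrapper and the traffic_months() method returning month/year pairs - looks roughly like this:

sub impressions_all {
    my ($self) = @_;

    # cache() is a made-up stand-in for the project's Memcached wrapper;
    # with one argument it reads, with two it stores and returns the value
    my $cached = $self->cache('impressions_all');
    return $cached if defined $cached;

    # Walk through every month which saw any traffic and sum it up;
    # impressions_month() fills the per-month and per-day caches on the way
    my $sum = 0;
    for my $month ($self->traffic_months) {
        $sum += $self->impressions_month($month->{month}, $month->{year});
    }

    return $self->cache('impressions_all', $sum);
}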

I just ran it and it finally died while using 13 gigabytes of memory. I couldn't believe it, because every user is calculated and then goes out of scope, freeing its memory (at least back to Perl's internal pool).

I started it again and watched the memory usage grow by some megabytes per minute, sometimes even much more.

Calculating one day of statistics out of the database isn't very efficient, so the modules always fetch at least one full month. The result is not only cached in Memcached but also stored in $self for faster access. Getting January 2012 means one database access for 2012-01-01, data from $self until 2012-01-05 (today) and a quick return 0 at the beginning of each sub if the date is in the future (for 2012-01-06 up to 2012-01-31). But as each user object is destroyed at the end of the loop, this memory usage should never sum up. The memory required for the second user should come from the memory just freed by the destruction of the first user.
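
Again a rough sketch, not the real code - a _day method could look something like this, with the future-date shortcut and the $self cache mentioned above (fetch_impressions_for_month() is a made-up name for the database helper):

use Time::Piece;  # core module, used here only to get "today"

sub impressions_day {
    my ($self, $day, $month, $year) = @_;

    # Future dates can't have any traffic yet - return 0 without touching the database
    my $today = localtime;
    return 0 if $year > $today->year
             or ($year == $today->year and $month > $today->mon)
             or ($year == $today->year and $month == $today->mon and $day > $today->mday);

    # Fetch the whole month once and keep it in $self for later calls
    unless ($self->{_impressions}{$year}{$month}) {
        $self->{_impressions}{$year}{$month} = $self->fetch_impressions_for_month($month, $year);
    }

    return $self->{_impressions}{$year}{$month}{$day} || 0;
}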

Oh, I made a mistake - the user objects aren't destroyed at the end of the loop, they still live in the temporary list created by the for loop, so I switched to a while loop. The items returned by ->list are pretty small, they don't weigh much more than a basic table row (and this table doesn't have that many columns). Each loop run shifts one item off the array, shrinking it. As the loop body ends, $user goes out of scope and is destroyed. The user object has already been removed from the array - no reference left, the object is destroyed:

#!/usr/bin/perl

# Calculate old data to pre-fill the cache
use MyProject::DB;
use MyProject::Table::User;

my $dbh = MyProject::DB->connect;

# Loop through all users, don't convert to for-loop as users won't go out-of-scope there..
my @users = MyProject::Table::User->list($dbh);
while (my $user = shift @users) {

    # These methods use ->traffic_months and pre-cache the data down to _per_day
    $user->impressions_all;
    $user->clicks_all;
    $user->leads_all;

    # Loop through all pages
    my @pages = $user->get_pages;
    while (my $page = shift @pages) {
        $page->impressions_all;
        $page->clicks_all;
    }

}

Great - but the script was still leaking memory. How could that be?

Okay, don't assume things, prove them: A small destructor added to both the user and the page class should confirm that everything gets destroyed:

sub DESTROY {
    my ($self) = @_;
    print "DESTROY $self\n";
}
Simple, but ok for debugging.

The output was really strange: Some users were destroyed as they should be - and some were not. I added another print showing the number of items in the @pages array, and it turned out that all users without pages got destroyed, but users with pages weren't.
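
The extra debug print was nothing fancy - roughly this, right after fetching the pages inside the user loop:

print "User ".$user->id." has ".scalar(@pages)." pages\n";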

There is Devel::Cycle on CPAN - a small, simple, useful module which confirmed my fears: The user object downloads the list of pages from the database. As database operations usually are heavy and expensive, the list is also cached within the user object. Each page object created this way also gets the user object for the $page->user method. This saves resources because an already existing object is re-used - but the two objects now form a reference cycle.
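
Checking for such cycles is basically a one-liner - something like this, with $user being whatever object the loop currently holds:

use Devel::Cycle;

# Walks the object graph and, by default, prints the path of every
# reference cycle it finds
find_cycle($user);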

As long as the user object keeps references to its pages within itself, they won't go out of scope and be destroyed - and each page object also references the user object, so the user would stay around forever as well.

There is Scalar::Util, which has a weaken function. This function keeps the reference in the referring variable but no longer counts it as a reference.
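
A tiny standalone demonstration of what weaken does (not project code):

use Scalar::Util qw(weaken);

my $user = { name => 'demo' };
my $page = { title => 'demo page' };

# Build the cycle: the user knows its page, the page knows its user
$user->{pages} = [ $page ];
$page->{user}  = $user;

# Without this line, neither hash would ever be freed.
# After weaken(), $page->{user} still works, but it no longer
# keeps $user alive on its own.
weaken $page->{user};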

#!/usr/bin/perl

# Calculate old data to pre-fill the cache

use Scalar::Util qw(weaken);

use MyProject::DB;
use MyProject::Table::User;

my $dbh = MyProject::DB->connect;

# Loop through all users, don't convert to for-loop as users won't go out-of-scope there..
my @users = MyProject::Table::User->list($dbh);
while (my $user = shift @users) {

    # These methods use ->traffic_months and pre-cache the data down to _per_day
    $user->impressions_all;
    $user->clicks_all;
    $user->leads_all;

    # Loop through all pages
    my @pages = $user->get_pages;
    for my $i (0 .. $#{$user->{pages}}) {
        weaken $user->{pages}->[$i];
    }
    while (my $page = shift @pages) {
        $page->impressions_all;
        $page->clicks_all;
    }

}

The weaken call decreases Perl's internal reference counter and the references stored inside the user object aren't counted any longer. Each page is left with only one counted reference, living in the @pages array. This one is moved into $page and destroyed as soon as $page goes out of scope.

Another weaken call is done inside the user object after it has passed itself to the page object - the page no longer blocks the user from being destroyed if it is the only thing still referencing the user.
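
Inside the user class that could look roughly like this (get_pages is simplified, and MyProject::Table::Page->list_for_user as well as $self->{dbh} are made-up names standing in for the real project code):

use Scalar::Util qw(weaken);

sub get_pages {
    my ($self) = @_;

    # Fetch the page list only once and keep it in the object
    $self->{pages} ||= [ MyProject::Table::Page->list_for_user($self->{dbh}, $self->id) ];

    # Hand each page a back-reference for $page->user, weakened so the
    # page no longer keeps the user object alive
    for my $page (@{ $self->{pages} }) {
        $page->{user} = $self;
        weaken $page->{user};
    }

    return @{ $self->{pages} };
}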

I clearly don't like to manipulate other modules' or objects' internals from outside (except from test scripts), but I didn't find any better way to solve the problem.

A final series of "ps" calls now shows the memory usage at every step. There are other ways to get the value (like BSD::Resource), but none of them is as fast to write as system "ps u $$".
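
For the record, the BSD::Resource variant is only a few lines longer - a sketch:

use BSD::Resource;

# getrusage() returns a list; the third value is the maximum resident
# set size the process has reached so far (units depend on the platform)
my ($usertime, $systemtime, $maxrss) = getrusage();
print "maxrss: $maxrss\n";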

#!/usr/bin/perl
$|=1;

# Calculate old data to pre-fill the cache
use Scalar::Util qw(weaken);
use MyProject::DB;
use MyProject::Table::User;

system "ps u $$";

my $dbh = MyProject::DB->connect;

system "ps hu $$";

# Loop through all users, don't convert to for-loop as users won't go out-of-scope there..
my @users = MyProject::Table::User->list($dbh);
while (my $user = shift @users) {

print $user->id."\n";

    # These methods use ->traffic_months and pre-cache the data down to _per_day
print __LINE__."\t"; system "ps hu $$";
    $user->impressions_all;
print __LINE__."\t"; system "ps hu $$";
    $user->clicks_all;
print __LINE__."\t"; system "ps hu $$";
    $user->leads_all;

print __LINE__."\t"; system "ps hu $$";
    # Loop through all pages
    my @pages = $user->get_pages;
    for my $i (0 .. $#{$user->{pages}}) {
        weaken $user->{pages}->[$i];
    }
    while (my $page = shift @pages) {
        $page->impressions_all;
        $page->clicks_all;
    }

print __LINE__."\t"; system "ps hu $$";

}

Notice the additional $|=1 to see the line number before the ps output.

I often move all debug lines to the left to easily identify them once everything is done and they're no longer needed.

The script is still leaking a few bytes of memory per minute, but not megabytes. The ps calls will show the exact function where this happens, but that's stuff for another blog post...

 
