
Getting started with Gearman

This post was imported from my old WordPress installation. If you notice display problems, broken links or missing images, please leave a comment here. Thanks.


I wrote about Gearman some time ago and never got the round tuits to write the follow-up posts, but here we go: how to start working with Gearman.

I strongly suggest that you don't start by adding Gearman to your production systems: there are many things to consider and you probably don't want to rewrite everything once it's running.

Things you should know about Gearman's job handling

Connections

The Gearman dispatcher is the common place for everyone else to connect to: the clients issuing jobs and the workers both connect to the dispatcher. The dispatcher never opens connections to anyone else.

Redundancy

All parts of Gearman support pooling: multiple dispatcher servers may be used and multiple workers may run the jobs. You should really use this: set up (at least) two dispatchers on two different servers and spread your workers. It's very easy to create a Gearman cluster which is immune to any single hardware failure, much easier than for web or database servers (which have to share one frontend IP or common, synchronized data).
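As a sketch (the hostnames are placeholders), pooling is just a matter of listing several job servers; a worker connected to both dispatchers keeps serving jobs if one of them dies:

```perl
use Gearman::Worker;

# Point the worker at a pool of dispatchers. If one gearmand goes
# away, jobs still flow through the other one.
my $worker = Gearman::Worker->new(
    job_servers => ['gearman1.example.com:4730', 'gearman2.example.com:4730'],
);
```

The same job_servers list works for Gearman::Client, so clients survive a dispatcher failure as well.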

Timeouts

Gearman timeouts don't work the way most people expect them to! I keep explaining this to surprised people who didn't listen to the first three explanations and then ran into smaller or bigger trouble because Gearman timeouts didn't behave as they expected.

A running job is never ever stopped, canceled or killed! Once the timeout is reached, the job is restarted on another worker, and the result (if any) returned by the worker which ran into the timeout is silently discarded. A job running "sleep 900" with a timeout of 10 seconds may easily block 90 workers!
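For completeness, a minimal sketch of how a per-function timeout is declared in Gearman::Worker (the function name is made up). Keep in mind that the timeout only tells the dispatcher when to hand the job to someone else; it does not stop this worker:

```perl
use Gearman::Worker;

my $worker = Gearman::Worker->new(job_servers => ['127.0.0.1:4730']);

# The optional second argument is the timeout in seconds. When it is
# reached, the dispatcher re-queues the job on another worker - this
# worker keeps running and its eventual result is silently discarded.
$worker->register_function('resize_image', 30, sub {
    my $job = shift;
    # ... do the actual work on $job->arg here ...
});
```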

Think carefully before using timeouts: a job inserting database rows which is restarted three times may end up writing three copies of the same data if the writing job keeps running into timeouts.

Documentation doesn't fit the source

The Gearman::Client and Gearman::Worker documentation contains several items which don't match the source. If you're unsure whether a feature exists or how it behaves: look at the module source!

I tried to bring the Gearman::Worker POD up to the level of the current source, but the maintainers haven't uploaded it to CPAN yet; you need to get it from GitHub.

Argument transport

Gearman is able to pass really big chunks of data as arguments from the client to the worker, and return values from the worker back to the client, but you have to serialize your data for transport. A running job may report two integer status values while it's running, but no arguments may be changed or added once a job has been submitted, and nothing else can be returned before the job has finished.
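A minimal sketch of the serialization step using the core JSON::PP module (Storable or Sereal would work just as well): the client freezes a structure into one scalar argument, and the worker thaws it back:

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);

# Client side: collapse a complex structure into a single scalar -
# this string is what you would pass to add_task() as the argument.
my $args   = { width => 300, height => 200, urls => ['a.png', 'b.png'] };
my $frozen = encode_json($args);

# Worker side: rebuild the structure from $job->arg.
my $thawed = decode_json($frozen);
print $thawed->{width}, "\n";    # prints 300
```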

First steps

Start one dispatcher server (one is enough for testing) and create a quick-and-dirty test script to start your first worker:
use Gearman::Worker;

# Create a new worker object
my $worker = Gearman::Worker->new(job_servers => ['127.0.0.1:4730']);

# Register a job name which is provided by this worker
$worker->register_function('eval' => sub {
    my $job = shift;
    print $job->arg . "\n";
    return eval $job->arg;
});

# Start the main loop
$worker->work;

That's all: a worker object is created, the worker announces the job names it can run to the dispatcher and loops forever. The documentation shows a $worker->work while 1; but that's misleading: the endless loop is part of the work method itself.

Run the worker in one shell window (or ssh connection) and start a second one to run the client:

use Gearman::Client;

# Create a new client object
my $client = Gearman::Client->new(job_servers => $GLOBAL::Shared::GEARMAN_SERVERS);

# Create a taskset (a group of jobs running at the same time)
my $taskset = $client->new_task_set;

# Create a place for the return value of the job
my $return_val;

# Add a job to the taskset
$taskset->add_task(
    'eval' => "return 1 + 1",    # Job name and argument
    {                            # Callbacks
        # on_complete receives a reference to the return value
        on_complete => sub { $return_val = ${ shift() }; },
    },
);

# Run the jobs (they're not started until now!) and wait for them to complete
$taskset->wait;

print $return_val;

The taskset collects jobs and runs them once ->wait is called. The on_complete callback of every finished job will be called with a reference to the job's return value as its first (and only) argument. The ->wait call won't return until all jobs in this taskset are finished.
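If you only need a single blocking job, Gearman::Client also offers do_task, which skips the taskset bookkeeping. A sketch, reusing the 'eval' worker from above:

```perl
use Gearman::Client;

my $client = Gearman::Client->new(job_servers => ['127.0.0.1:4730']);

# do_task submits one job, waits for it and returns a scalar
# reference to the return value (or undef on failure).
my $ref = $client->do_task('eval' => 'return 2 + 3');
print $$ref, "\n" if $ref;
```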

Run the client and see the request arriving at the worker and the result shown by the client.

This is a very simple way of using Gearman. Use it as a start and play around with it; I'll discuss a production implementation in the next article(s).


6 comments. Leave a comment

  1. Great post :). I think one thing is missing though, it's the 'uniq' property you have to define for each task query. A first and safe approach is to set it to a different UUID each time.


    But what happens if you re-use such an ID across many clients (or if you don't define it)?


    The documentation states: "the task will be run just once, but all the listeners waiting on that job will get the response multiplexed back to them."


    Which will lead to a lot of hair loss in production (where you probably have many gearman clients) if you're not aware of this.


    On the other hand, this is a very powerful mechanism. Imagine you set your 'uniq' property to a signature of your arguments. That allows you to calculate a function only once even when hundreds of clients are querying the same result simultaneously. We can see the benefits on a high-load site.


    Let's say your homepage displays some information coming from a remote function that takes, say, 10ms to calculate. If you have only one CPU on the box that calculates your function, and if you have 1000 simultaneous clients, we can predict that the least lucky of them will get their result after at least 10 seconds (and half of them will have to wait more than 5 seconds). Not great.


    Now if you put this function behind Gearman, and if all your 1000 simultaneous queries use a shared 'uniq' (based on the arguments), then ALL of them will get the result after 10ms (plus the small Gearman overhead). That's a massive improvement. So Gearman can also be a good tool to reduce congestion risk on a high-load site.


    I haven't tried that in production since where I work, I'm not getting near the load required to put this to a real test. Do you have any experience with this? Are there some people around who can tell whether what I suggested is total crap or whether it makes sense?
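    In code, such an argument-based 'uniq' could look like this (a sketch; Digest::MD5 and the job name are just examples):

```perl
use Digest::MD5 qw(md5_hex);
use Gearman::Client;

my $client  = Gearman::Client->new(job_servers => ['127.0.0.1:4730']);
my $taskset = $client->new_task_set;

my $arg = '{"user":42}';    # identical arguments => identical uniq key
$taskset->add_task('expensive_lookup' => $arg, {
    # All concurrent jobs sharing this uniq coalesce into one run;
    # every waiting client gets the single result multiplexed back.
    uniq        => md5_hex($arg),
    on_complete => sub { my $ref = shift; print $$ref; },
});
$taskset->wait;
```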

  2. Sebastian

    Thank you for your comment.

    This post should give an introduction to common problems and show the first steps of accessing Gearman; a production approach will be the topic of the next one.

  3. [Imported from blog]

  4. [Imported from blog]

  5. [Imported from blog]

  6. [Imported from blog]

Leave a comment
