Setting up Gearman


by Sebastian on 15.05.2012 at 20:21 in English, Gearman, Howto & Tutorial, Perl

This post was imported from my old WordPress installation. If there are display problems, broken links or missing images, please just leave a comment here. Thanks.

I recently announced an article about Gearman; today I'd like to start with setting up a Gearman dispatcher server.

Although it is a great tool, Gearman still lacks complete documentation, and the server software is no exception. It's very easy to set up, but there are still some things to consider before starting.

Gearman is not a reliable tool. It's good, stable and fast, but there is no guarantee that a job isn't lost if something goes wrong, because everything is held in memory. It's capable of keeping some kind of transaction log in a database (actually it's not a log as you might think, just a copy of the current in-memory state; it's never read during runtime, just written). It's up to the developer: do you want to add this safety, possibly paying with speed? If you do, I strongly recommend SQLite instead of a classic SQL database server (which is also supported), because Gearman doesn't use the features where database servers are great: reading and filtering data, and parallel access (well, MyISAM isn't great for parallel access at all, but that's another story). Memcached might be an option, but I wouldn't call it more reliable in case of trouble.

All dispatcher server arguments are listed on the Gearman homepage, and I won't copy or repeat them here, because most defaults are pretty good for everyone and also for trying it out. You may want to set the listen address and port to custom values.

Gearman has a simple management interface which can be used with any telnet-style client: netcat (nc), obviously telnet itself, or a Perl IPC socket. After connecting to the IP address and Gearman port of the dispatcher, type "status" to get a list of current job names and tab-separated status counts:

Simple::echo	0	0	2
HTTP::post	2	1	1
HTTP::get	0	0	1

Three job names are supported by this dispatcher. I'm using double colons to group them into classes, but this is just a personal decision; job names may contain any letter or number and some special characters (like : ).

The first job, "Simple::echo", has no jobs queued, none currently running, and two workers registered with this dispatcher; one of them will get the next incoming Simple::echo job. My HTTP client worker registered two job names, one for each HTTP request method. Two jobs are currently lying in the dispatcher queue for HTTP::post and one of them is being processed. The first number (jobs in queue) minus the second number (jobs currently running) is the number of jobs waiting for a free worker.
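The column layout described above can be turned into a small parser. This is only a minimal sketch in Python, not the tooling used in this post: the names `JobStatus` and `parse_status` are made up for illustration, and the handling of a closing "." line is an assumption based on Gearman's text protocol (the sample above does not show it).

```python
from typing import List, NamedTuple


class JobStatus(NamedTuple):
    name: str      # job name, e.g. "HTTP::post"
    queued: int    # first number: jobs in the queue (running ones included)
    running: int   # second number: jobs currently being processed
    workers: int   # third number: workers registered for this job name

    @property
    def waiting(self) -> int:
        # Jobs still waiting for a free worker: queued minus running.
        return self.queued - self.running


def parse_status(text: str) -> List[JobStatus]:
    """Parse the tab-separated lines of a "status" reply."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == ".":   # skip blank lines and the assumed end marker
            continue
        name, queued, running, workers = line.split("\t")
        result.append(JobStatus(name, int(queued), int(running), int(workers)))
    return result


# The sample reply from above, one job per line.
sample = "Simple::echo\t0\t0\t2\nHTTP::post\t2\t1\t1\nHTTP::get\t0\t0\t1\n.\n"
for job in parse_status(sample):
    print(f"{job.name}: {job.waiting} waiting, {job.workers} worker(s)")
```

For HTTP::post this gives one waiting job: two in the queue minus the one being processed.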

Type "workers" to get a list of the currently connected workers:

29 192.168.128.161 curbsjiecarbvhiduzmrsxtfphwghq : HTTP::post Simple::echo HTTP::get

The first column is the internal file descriptor number of the dispatcher server, and I can hardly think of any use for this information.

The second column is the IP address of this worker. Many workers may run on one server, so the IP isn't unique, but the third value is: it's the unique worker ID for this worker process, which is just a random value generated when the worker connects to the server. A single colon separates the fixed columns from the space-separated list of jobs provided by this worker.
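Splitting such a line needs a little care, because the job names themselves contain colons. A minimal Python sketch, assuming the separator always appears as a standalone ":" token surrounded by whitespace (as in the sample line); `Worker` and `parse_worker_line` are hypothetical names chosen for this example:

```python
from typing import List, NamedTuple


class Worker(NamedTuple):
    fd: int           # file descriptor number inside the dispatcher
    ip: str           # IP address the worker connected from
    worker_id: str    # unique ID of the worker process
    jobs: List[str]   # job names provided by this worker


def parse_worker_line(line: str) -> Worker:
    """Split one line of the "workers" reply into its columns."""
    tokens = line.split()
    # Look for the standalone colon; "HTTP::post" is a different token
    # and is therefore never mistaken for the separator.
    sep = tokens.index(":")
    fd, ip, worker_id = tokens[:sep]
    return Worker(int(fd), ip, worker_id, tokens[sep + 1:])


line = "29 192.168.128.161 curbsjiecarbvhiduzmrsxtfphwghq : HTTP::post Simple::echo HTTP::get"
worker = parse_worker_line(line)
print(worker.worker_id, "provides", ", ".join(worker.jobs))
```

Working on whitespace-separated tokens instead of splitting the raw string at the first colon also keeps a worker without any registered jobs parseable; its job list is simply empty.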

There are a few more commands, which IMHO are much less useful than the ones explained above.

This information is easily accessible and I quickly wrote a simple script to fetch this status information just after deploying Gearman in a production environment. It shows the "status" report from all of our dispatchers in a merged list.

It's simple, but (currently) one of my most-often-used tools.

Usually each worker connects to all dispatchers, and whichever dispatcher wants to pass a job to it may do so, as long as the worker isn't busy for another dispatcher at the moment. The sample above shows 64 workers per dispatcher for issuing HTTP GET requests and 16 workers for HTTP POST requests, the latter being connected to all dispatchers. The difference doesn't really show up in the numbers above.

It's highly uncommon for the GET workers not to connect to all dispatchers; here it is a result of the server layout. You should know such things when looking at your own status report.
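A merged status report like the one described above could be sketched roughly as follows. This is not the script mentioned in this post, just an illustration in Python under a few assumptions: the dispatchers listen on port 4730 (Gearman's registered default), the "status" reply ends with a line holding a single dot, and "merged" means one entry per job name with the counts of each dispatcher side by side. The function names `fetch_status` and `merge_status` are invented for this example.

```python
import socket
from typing import Dict, Tuple

Counts = Tuple[int, int, int]   # (queued, running, workers)


def fetch_status(host: str, port: int = 4730, timeout: float = 5.0) -> str:
    """Send "status" to one dispatcher and return the raw reply text."""
    data = b""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"status\n")
        while True:
            chunk = conn.recv(4096)
            if not chunk:                 # dispatcher closed the connection
                break
            data += chunk
            lines = data.splitlines()
            # Assumption: a complete reply ends with a line containing only ".".
            if lines and lines[-1] == b"." and data.endswith(b"\n"):
                break
    return data.decode("utf-8", errors="replace")


def merge_status(replies: Dict[str, str]) -> Dict[str, Dict[str, Counts]]:
    """Map job name -> dispatcher -> (queued, running, workers)."""
    merged: Dict[str, Dict[str, Counts]] = {}
    for dispatcher, reply in replies.items():
        for line in reply.splitlines():
            fields = line.split("\t")
            if len(fields) != 4:          # skips the "." line and blank lines
                continue
            name, queued, running, workers = fields
            merged.setdefault(name, {})[dispatcher] = (
                int(queued), int(running), int(workers))
    return merged
```

To use it, build a dict such as `{host: fetch_status(host) for host in hosts}` for your own list of dispatcher addresses and pass it to `merge_status`. Keeping the counts per dispatcher instead of summing them makes it visible when a group of workers is registered with only some of the dispatchers.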

This is how to set up and manage Gearman. Pretty easy, isn't it? The next post will show the actual developer side.
