
First steps with MongoDB MapReduce

This post was imported from my old WordPress installation. If you notice display problems, broken links or missing images, please just leave a comment here. Thanks.


I recently decided to prefer MongoDB for new projects, and it turned out that I still have a lot to learn about it. One of those things is MapReduce, which is more powerful than (most) SQL SELECT options.

No need to do everything at once, so I started in the MongoDB shell, trying to get MapReduce up and running:

MongoDB shell version: 1.8.3
connecting to: test
> use MyTestDB;
switched to MyTestDB

I know, 1.8.3 is pretty old, but I don't want to upgrade all servers on every new release (even if that would bring some advantages and new features). 1.8.3 is the oldest version installed in my production environments, which is why it's my development version.

My project needs a list of all values of one specific column across all documents in one collection. That is very easy to do in MySQL:

SELECT DISTINCT colname FROM table

or, a bit more verbose, with GROUP BY:

SELECT colname FROM table GROUP BY colname

MongoDB uses MapReduce, which is way more powerful but not as easy as SQL. The command requires a map function and a reduce function (plus an optional finalize function, which isn't needed for this case), all of them written in JavaScript. My map function is pretty simple: it extracts the single key that should be aggregated (called keyname here, like the colname above) from one document (referenced as this) and passes it to the emit() function. A map function doesn't get any arguments; instead it calls emit(key, value), and it may do so any number of times per document. Here it emits exactly one key and one value per document (both of them could be documents themselves).

> mapFn = function() { emit(this.keyname, 1); };

That's all, folks: each document passed to the map function is boiled down to one key/value pair. A (temporary) intermediate result is built from the map function results (the emit calls): the first emit argument is the key and the second one is the value, and values emitted for the same key are grouped together.
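To make the map phase a bit more tangible, here is a minimal sketch in plain JavaScript that runs without MongoDB. The sample documents and the `emit` collector are assumptions made up for illustration; the real server handles grouping internally and this only mimics the idea.

```javascript
// Made-up sample documents standing in for a collection.
const docs = [
  { keyname: "Atomic", price: 10 },
  { keyname: "Big", price: 20 },
  { keyname: "Atomic", price: 30 },
];

// Hypothetical stand-in for the server side: collects every emit() call,
// grouping the emitted values by key.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Same body as mapFn in the post; `this` is the current document.
const mapFn = function () { emit(this.keyname, 1); };

// Run the map function once per document, binding `this` to the document.
for (const doc of docs) mapFn.call(doc);

console.log(JSON.stringify([...emitted]));
// prints [["Atomic",[1,1]],["Big",[1]]]
```

Three documents produce three emit calls but only two keys, which is the same effect visible in the "emit" and "output" counters of a real mapReduce run.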

There are many examples of reduce functions out there, but most of them are too complex for my requirements. Keys are unique in the result, so the intermediate result created by the map run is already exactly what I want, but a reduce function is still required. Here is my final reduce solution:

> reduceFn = function(key, values) { return 1; };

Really complex, isn't it?
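The reduce function receives a key and an array of all values emitted for that key. As a rough sketch of that step (plain JavaScript, no MongoDB needed), the following simulates it on made-up data. `runReduce` and `countFn` are hypothetical names that are not part of the original post; `countFn` is only included to show what a reduce function that actually uses its values might look like.

```javascript
// Intermediate result of a map phase: key -> list of emitted values (made-up data).
const grouped = new Map([
  ["Atomic", [1, 1]],
  ["Big", [1]],
]);

// Reduce function from the post: ignores the values and always yields 1.
const reduceFn = function (key, values) { return 1; };

// Hypothetical variant: sums the values, giving the number of documents per key.
const countFn = function (key, values) {
  return values.reduce((sum, v) => sum + v, 0);
};

// Simplified reduce step. MongoDB does not call reduce for a key that has
// only a single value, so such values are passed through unchanged here too.
function runReduce(groups, reduce) {
  const results = [];
  for (const [key, values] of groups) {
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value: value });
  }
  return results;
}

console.log(runReduce(grouped, reduceFn));
console.log(runReduce(grouped, countFn));
```

MongoDB may also call reduce several times for the same key, feeding earlier results back in as values, so a reduce function should return something of the same shape as the emitted values. Both functions above satisfy that.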

Finally, here is my command:

> db.collectionname.mapReduce(mapFn, reduceFn, { out: { inline: 1 }});
{
    "results" : [
        {
            "_id" : "Allyouneed",
            "value" : 1
        },
        {
            "_id" : "AllIneed",
            "value" : 1
        },
        {
            "_id" : "Atomic",
            "value" : 1
        },
        {
            "_id" : "Big",
            "value" : 1
        },
        [...]
    ],
    "timeMillis" : 3,
    "counts" : {
        "input" : 35,
        "emit" : 35,
        "output" : 28
    },
    "ok" : 1,
}

Looks good! And it is.

MongoDB has a nice distinct function which is much easier than using MapReduce, at least for simple queries like this, but I didn't know about it before. MapReduce works, but distinct is shorter, faster and much easier:

> db.collectionname.distinct('keyname');
[
    "Allyouneed",
    "AllIneed",
    "Atomic",
    "Big",
]

Nice, isn't it?
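For comparison, this is roughly what distinct does conceptually, written as a small plain-JavaScript helper over an in-memory array. The helper name and the sample data are made up for illustration, and the sketch only covers scalar top-level fields; it makes no claim about how the server implements the command.

```javascript
// Hypothetical helper: collect the unique values of one field across documents.
function distinctValues(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    // Skip documents that do not have the field at all.
    if (field in doc) seen.add(doc[field]);
  }
  // A Set keeps insertion order, so values appear in first-seen order.
  return [...seen];
}

// Made-up sample documents.
const sample = [
  { keyname: "Atomic" },
  { keyname: "Big" },
  { keyname: "Atomic" },
  { other: "no keyname here" },
];

console.log(distinctValues(sample, "keyname"));
// prints [ 'Atomic', 'Big' ]
```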

Here is the same command encapsulated in a Perl function (based on YAWF::Object::MongoDB):

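sub shoplist {
    my $class = shift;

    # Run the distinct command; the result document holds the list under "values".
    my $list = $class->_database->run_command([ "distinct" => "collectionname", "key" => "keyname", "query" => {} ])->{values};

    # List context gets a flat list, scalar context gets the array reference.
    return wantarray ? @{$list} : $list;
}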

Sure, it's uncommon to return an array reference in scalar context instead of the item count, but in most cases (for this project) I'll pass the list to Template::Toolkit, and there is really no need to unpack the list, return it and repack it into a fresh array, basically copying all the data.

 
