MongoDB, MapReduce and sorting
I might be in a bit over my head here, as I'm still learning the ins and outs of MongoDB, but here goes.
Right now I'm working on a tool to search/filter through a dataset, sort it by an arbitrary datapoint (e.g. popularity), and then group it by an id. The only way I can see to do this is through Mongo's MapReduce functionality.
I can't use .group() because I'm working with more than 10,000 keys, and I also need to be able to sort the dataset.
My MapReduce code is working fine, except for one thing: sorting. Sorting doesn't want to work at all.
db.runCommand({
    'mapreduce': 'products',
    'map': function() {
        // Emit one count per (product_id, popularity) pair.
        emit({ product_id: this.product_id, popularity: this.popularity }, 1);
    },
    'reduce': function(key, values) {
        // Add up the counts emitted for this key.
        var sum = 0;
        values.forEach(function(v) { sum += v; });
        return sum;
    },
    'query': {category_id: 20},
    'out': {inline: 1},
    'sort': {popularity: -1}
});
I already have a descending index on the popularity datapoint, so the problem is not a missing index:
{ "v" : 1, "key" : { "popularity" : -1 }, "ns" : "app.products", "name" : "popularity_-1" }
I just cannot figure out why it doesn't want to sort.
I can't output the result set to another collection instead of inlining it and then run .find().sort({popularity: -1}) on that, because of the way this feature is going to work.
First of all, Mongo map/reduce is not designed to be used as a query tool (as it is in CouchDB); it is designed to run background tasks. I use it at work to analyze traffic data.
What you are doing wrong is that you're applying sort() to the input, which is useless because, by the time the map() stage is done, the intermediate documents are sorted by their keys. Because your key is a document, the output is sorted by product_id, then popularity.
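To make that ordering concrete, here is a minimal plain-JavaScript sketch (it is not MongoDB code and needs no server). The compareKeys helper and the sample keys are hypothetical, and comparing the key documents field by field in emit order is only a simplified model of how the server orders keys. It should still show why popularity ends up acting as a tie-breaker when product_id is the first field.

```javascript
// Simplified model: compare two key documents field by field,
// in the order the fields were emitted. Assumes both keys have the
// same fields in the same order and that every value is a number.
function compareKeys(a, b) {
  for (const field of Object.keys(a)) {
    if (a[field] !== b[field]) {
      return a[field] < b[field] ? -1 : 1;
    }
  }
  return 0;
}

// Hypothetical keys, shaped like the ones emitted in the question.
const emittedKeys = [
  { product_id: 3, popularity: 10 },
  { product_id: 1, popularity: 5 },
  { product_id: 2, popularity: 40 },
];

// Sort a copy so the original emit order is left untouched.
const sortedKeys = emittedKeys.slice().sort(compareKeys);

console.log(sortedKeys.map(k => k.product_id)); // [ 1, 2, 3 ]
console.log(sortedKeys.map(k => k.popularity)); // [ 5, 40, 10 ]
```

The product ids come out in order, while the popularity values are in no useful order, which matches what the question describes.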
This is how I generated my dataset:
function generate_dummy_data() {
    // Insert documents with a random category and a random popularity.
    for (var i = 2; i < 1000000; i++) {
        db.foobar.save({
            _id: i,
            category_id: parseInt(Math.random() * 30),
            popularity: parseInt(Math.random() * 50)
        });
    }
}
And this is my map/reduce task:
var data = db.runCommand({
    'mapreduce': 'foobar',
    'map': function() {
        emit({
            // Negated popularity comes first, so ascending key order
            // means descending popularity.
            sorting: this.popularity * -1,
            product_id: this._id,
            popularity: this.popularity
        }, 1);
    },
    'reduce': function(key, values) {
        var sum = 0;
        values.forEach(function(v) { sum += v; });
        return sum;
    },
    'query': {category_id: 20},
    'out': {inline: 1}
});
And this is the end result (it is very long, so I pasted it here):
http://cesarodas.com/results.txt
This works because now we're sorting by sorting, product_id, popularity. You can play with the sorting however you like; just remember that the final sorting is by key, regardless of how your input is sorted.
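The same idea can be tried without a server. The sketch below is a plain-JavaScript simulation of the task above, under stated assumptions: the sample documents, the mapDoc function and the compareEmittedKeys helper are made up for illustration, key comparison is modelled as a simple field-by-field check in emit order, and the reduce step is left out because every key here is unique.

```javascript
// Made-up sample documents, shaped like the generated dataset.
const docs = [
  { _id: 1, category_id: 20, popularity: 7 },
  { _id: 2, category_id: 20, popularity: 42 },
  { _id: 3, category_id: 5, popularity: 99 },
  { _id: 4, category_id: 20, popularity: 13 },
];

// Same key shape as the map function above: negated popularity goes first.
function mapDoc(doc) {
  return {
    sorting: doc.popularity * -1,
    product_id: doc._id,
    popularity: doc.popularity,
  };
}

// Simplified key ordering (an assumption): field by field, in emit order.
function compareEmittedKeys(a, b) {
  for (const field of Object.keys(a)) {
    if (a[field] !== b[field]) {
      return a[field] < b[field] ? -1 : 1;
    }
  }
  return 0;
}

const results = docs
  .filter(doc => doc.category_id === 20)          // stands in for the 'query' option
  .map(doc => ({ _id: mapDoc(doc), value: 1 }))   // one emit per document
  .sort((a, b) => compareEmittedKeys(a._id, b._id)); // output ordered by key

console.log(results.map(r => r._id.popularity)); // [ 42, 13, 7 ]
```

Because the negated value is the first field of the key, ascending key order gives descending popularity, and product_id only breaks ties between equally popular products.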
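Anyway, as I said before, you should avoid doing queries with map/reduce; it was designed for background processing. If I were you, I would design the data in such a way that it can be accessed with simple queries. There is always a trade-off; in this case you accept more complex inserts/updates in order to have simple queries (that's how I see MongoDB).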