Published 2015-09-14 15:01:33 | Source: compiled from the web
The mapReduce command allows you to run map-reduce aggregation operations over a collection. The mapReduce command has the following prototype form:
db.runCommand(
    {
        mapReduce: <collection>,
        map: <function>,
        reduce: <function>,
        out: <output>,
        query: <document>,
        sort: <document>,
        limit: <number>,
        finalize: <function>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>
    }
)
Pass the name of the collection to the mapReduce command (i.e. <collection>) to use as the source of documents for the map-reduce operation. The command also accepts the parameters shown in the prototype above: map, reduce, out, query, sort, limit, finalize, scope, jsMode, and verbose.
The following is a prototype usage of the mapReduce command:
var mapFunction = function() { ... };
var reduceFunction = function(key, values) { ... };
db.runCommand(
    {
        mapReduce: 'orders',
        map: mapFunction,
        reduce: reduceFunction,
        out: { merge: 'map_reduce_results', db: 'test' },
        query: { ord_date: { $gt: new Date('01/01/2012') } }
    }
)
The map function has the following prototype:
function() {
    ...
    emit(key, value);
}
The map function exhibits the following behaviors:
The reduce function has the following prototype:
function(key, values) {
    ...
    return result;
}
The reduce function exhibits the following behaviors:
Because it is possible to invoke the reduce function more than once for the same key, the following properties need to be true:
the type of the return object must be identical to the type of the value emitted by the map function, so that the following statement holds:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
the reduce function must be idempotent. Ensure that the following statement is true:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
the order of the elements in the valuesArray should not affect the output of the reduce function, so that the following statement is true:
reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
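The three properties above can be checked by hand in plain JavaScript. The following is a minimal sketch, not MongoDB code: reduceSum and reduceAvg are hypothetical reduce functions written for illustration, and A, B, and C are arbitrary sample values. A summing reducer satisfies the re-reduce property, while a naive averaging reducer does not, which is why averages are usually computed from a running sum and count.

```javascript
// Hypothetical reduce function for illustration: sums numeric values.
var reduceSum = function(key, values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        total += values[i];
    }
    return total;
};

// Hypothetical counter-example: averaging directly is NOT safe to re-reduce.
var reduceAvg = function(key, values) {
    return reduceSum(key, values) / values.length;
};

var A = 10, B = 25, C = 7;

// Re-reducing a partial result must match reducing all values at once.
var nested = reduceSum("k", [ C, reduceSum("k", [ A, B ]) ]);
var flat = reduceSum("k", [ C, A, B ]);

// Idempotence: reducing an already reduced value must not change it.
var once = reduceSum("k", [ A, B, C ]);
var twice = reduceSum("k", [ reduceSum("k", [ A, B, C ]) ]);

// Order independence: swapping the values must not change the result.
var forward = reduceSum("k", [ A, B ]);
var backward = reduceSum("k", [ B, A ]);

console.log(nested === flat, once === twice, forward === backward); // true true true

// The averaging reducer breaks the first property.
var badNested = reduceAvg("k", [ C, reduceAvg("k", [ A, B ]) ]);
var badFlat = reduceAvg("k", [ C, A, B ]);
console.log(badNested, badFlat); // 12.25 14
```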
You can specify the following options for the out parameter:
out: <collectionName>
The following form, which writes to a collection with an action, is only available when the collection passed to out already exists. It is not available on secondary members of replica sets.
out: { <action>: <collectionName>
[, db: <dbName>]
[, sharded: <boolean> ]
[, nonAtomic: <boolean> ] }
When you output to a collection with an action, the out parameter has the following fields:
<action>: Specify one of the following actions:
replace
Replace the contents of the <collectionName> if the collection with the <collectionName> exists.
merge
Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, overwrite that existing document.
reduce
Merge the new result with the existing result if the output collection already exists. If an existing document has the same key as the new result, apply the reduce function to both the new and the existing documents and overwrite the existing document with the result.
db:
Optional. The name of the database to which the map-reduce operation writes its output. By default this is the same database as the input collection.
sharded:
Optional. If true and you have enabled sharding on the output database, the map-reduce operation will shard the output collection using the _id field as the shard key.
nonAtomic:
New in version 2.2.
Optional. Specifies the output operation as non-atomic. This is valid only for the merge and reduce output modes, which may take minutes to execute.
If nonAtomic is true, the post-processing step will prevent MongoDB from locking the database; however, other clients will be able to read intermediate states of the output collection. Otherwise the map reduce operation must lock the database during post-processing.
Perform the map-reduce operation in memory and return the result. This option is the only available option for out on secondary members of replica sets.
out: { inline: 1 }
The result must fit within the maximum size of a BSON document.
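The difference between the replace, merge, and reduce actions is easiest to see on a small example. The sketch below imitates the documented semantics on plain in-memory objects. It is an illustration only, not how MongoDB implements the output stage: applyOutAction and sumReducer are hypothetical names, and the "collections" are ordinary objects that map a key to a value.

```javascript
// Hypothetical helper: applies one out action to an in-memory "collection".
// existing and results are plain objects mapping key -> value.
var applyOutAction = function(action, existing, results, reduceFn) {
    var output = {};
    var key;
    if (action !== "replace") {
        // merge and reduce both start from the existing contents
        for (key in existing) {
            output[key] = existing[key];
        }
    }
    for (key in results) {
        var alreadyThere = Object.prototype.hasOwnProperty.call(output, key);
        if (action === "reduce" && alreadyThere) {
            // reduce: combine the existing and the new value
            output[key] = reduceFn(key, [ output[key], results[key] ]);
        } else {
            // replace and merge: the new value wins
            output[key] = results[key];
        }
    }
    return output;
};

// Hypothetical reduce function used only by the reduce action.
var sumReducer = function(key, values) {
    return values.reduce(function(a, b) { return a + b; }, 0);
};

var existingOutput = { a: 1, b: 2 };
var newResults = { b: 10, c: 20 };

var replaced = applyOutAction("replace", existingOutput, newResults, sumReducer);
var merged = applyOutAction("merge", existingOutput, newResults, sumReducer);
var reduced = applyOutAction("reduce", existingOutput, newResults, sumReducer);

console.log(JSON.stringify(replaced)); // {"b":10,"c":20}
console.log(JSON.stringify(merged));   // {"a":1,"b":10,"c":20}
console.log(JSON.stringify(reduced));  // {"a":1,"b":12,"c":20}
```

Only the key b exists on both sides, so it is the only key on which the three actions disagree.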
The finalize function has the following prototype:
function(key, reducedValue) { ... return modifiedObject; }
The finalize function receives as its arguments a key value and the reducedValue from the reduce function. Be aware that:
In the mongo shell, the db.collection.mapReduce() method is a wrapper around the mapReduce command. The following examples use the db.collection.mapReduce() method:
Consider the following map-reduce operations on a collection orders that contains documents of the following prototype:
{
    _id: ObjectId("50a8240b927d5d8b5891743c"),
    cust_id: "abc123",
    ord_date: new Date("Oct 04, 2012"),
    status: 'A',
    price: 250,
    items: [ { sku: "mmm", qty: 5, price: 2.5 },
             { sku: "nnn", qty: 5, price: 2.5 } ]
}
Perform a map-reduce operation on the orders collection to group by cust_id and calculate the sum of price for each cust_id:
Define the map function to process each input document:
var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments keyCustId and valuesPrices:
var reduceFunction1 = function(keyCustId, valuesPrices) {
    return Array.sum(valuesPrices);
};
Perform the map-reduce on all documents in the orders collection using the mapFunction1 map function and the reduceFunction1 reduce function.
db.orders.mapReduce(
    mapFunction1,
    reduceFunction1,
    { out: "map_reduce_example" }
)
This operation outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will replace the contents with the results of this map-reduce operation.
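To make the shape of the output concrete, the following plain JavaScript sketch imitates this operation in memory. It is a simulation under stated assumptions, not the server's implementation: the emit function and the emitted object are stand-ins for what MongoDB provides, sampleOrders1 is made-up data, and Array.sum (a mongo shell helper) is replaced by an explicit sum so the sketch runs outside the shell. Each result has the form { _id: <key>, value: <reduced value> }, and reduce is skipped for keys that have only a single value.

```javascript
// Stand-in for the emit() function that MongoDB supplies to map functions.
// It collects every emitted value under its key.
var emitted = {};
var emit = function(key, value) {
    if (!Object.prototype.hasOwnProperty.call(emitted, key)) {
        emitted[key] = [];
    }
    emitted[key].push(value);
};

var mapFunction1 = function() {
    emit(this.cust_id, this.price);
};

var reduceFunction1 = function(keyCustId, valuesPrices) {
    // Array.sum exists only in the mongo shell; sum explicitly here.
    return valuesPrices.reduce(function(a, b) { return a + b; }, 0);
};

// Made-up sample documents, reduced to the fields the map function reads.
var sampleOrders1 = [
    { cust_id: "abc123", price: 250 },
    { cust_id: "abc123", price: 100 },
    { cust_id: "xyz789", price: 75 }
];

// Map phase: call the map function with each document bound to "this".
sampleOrders1.forEach(function(doc) {
    mapFunction1.call(doc);
});

// Reduce phase: one output document per key.
var results1 = Object.keys(emitted).map(function(key) {
    var values = emitted[key];
    var value = values.length > 1 ? reduceFunction1(key, values) : values[0];
    return { _id: key, value: value };
});

console.log(JSON.stringify(results1));
// [{"_id":"abc123","value":350},{"_id":"xyz789","value":75}]
```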
In this example you will perform a map-reduce operation on the orders collection, for all documents that have an ord_date value greater than 01/01/2012. The operation groups by the items.sku field, and for each sku calculates the number of orders and the total quantity ordered. The operation concludes by calculating the average quantity per order for each sku value:
Define the map function to process each input document:
var mapFunction2 = function() {
    for (var idx = 0; idx < this.items.length; idx++) {
        var key = this.items[idx].sku;
        var value = {
            count: 1,
            qty: this.items[idx].qty
        };
        emit(key, value);
    }
};
Define the corresponding reduce function with two arguments keySKU and valuesCountObjects:
var reduceFunction2 = function(keySKU, valuesCountObjects) {
    // Declare with var so reducedValue does not leak into the global scope.
    var reducedValue = { count: 0, qty: 0 };
    for (var idx = 0; idx < valuesCountObjects.length; idx++) {
        reducedValue.count += valuesCountObjects[idx].count;
        reducedValue.qty += valuesCountObjects[idx].qty;
    }
    return reducedValue;
};
Define a finalize function with two arguments key and reducedValue. The function modifies the reducedValue object to add a computed field named average and returns the modified object:
var finalizeFunction2 = function (key, reducedValue) {
    reducedValue.average = reducedValue.qty/reducedValue.count;
    return reducedValue;
};
Perform the map-reduce operation on the orders collection using the mapFunction2, reduceFunction2, and finalizeFunction2 functions.
db.orders.mapReduce(
    mapFunction2,
    reduceFunction2,
    {
        out: { merge: "map_reduce_example" },
        query: { ord_date: { $gt: new Date('01/01/2012') } },
        finalize: finalizeFunction2
    }
)
This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). Then it outputs the results to a collection named map_reduce_example. If the map_reduce_example collection already exists, the operation will merge the existing contents with the results of this map-reduce operation.
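The query, map, reduce, and finalize stages of this example can also be traced in plain JavaScript. The sketch below is an in-memory imitation, not the server's implementation: emit and emittedBySku are stand-ins for what MongoDB provides, sampleOrders2 and cutoff are made-up illustration data, and the cutoff uses an ISO date string to avoid locale-dependent date parsing. It does not model the merge into an existing output collection.

```javascript
// Stand-in for the emit() function that MongoDB supplies to map functions.
var emittedBySku = {};
var emit = function(key, value) {
    if (!Object.prototype.hasOwnProperty.call(emittedBySku, key)) {
        emittedBySku[key] = [];
    }
    emittedBySku[key].push(value);
};

var mapFunction2 = function() {
    for (var idx = 0; idx < this.items.length; idx++) {
        emit(this.items[idx].sku, { count: 1, qty: this.items[idx].qty });
    }
};

var reduceFunction2 = function(keySKU, valuesCountObjects) {
    var reducedValue = { count: 0, qty: 0 };
    for (var idx = 0; idx < valuesCountObjects.length; idx++) {
        reducedValue.count += valuesCountObjects[idx].count;
        reducedValue.qty += valuesCountObjects[idx].qty;
    }
    return reducedValue;
};

var finalizeFunction2 = function(key, reducedValue) {
    reducedValue.average = reducedValue.qty / reducedValue.count;
    return reducedValue;
};

// Made-up sample documents; the last one is older than the cutoff.
var sampleOrders2 = [
    { ord_date: new Date("2012-10-04"), items: [ { sku: "mmm", qty: 5 }, { sku: "nnn", qty: 5 } ] },
    { ord_date: new Date("2012-11-10"), items: [ { sku: "mmm", qty: 3 } ] },
    { ord_date: new Date("2011-06-01"), items: [ { sku: "mmm", qty: 100 } ] }
];
var cutoff = new Date("2012-01-01");

// Query stage, then map stage with each document bound to "this".
sampleOrders2
    .filter(function(doc) { return doc.ord_date > cutoff; })
    .forEach(function(doc) { mapFunction2.call(doc); });

// Reduce stage (skipped for single-value keys), then finalize for every key.
var results2 = Object.keys(emittedBySku).map(function(key) {
    var values = emittedBySku[key];
    var reduced = values.length > 1 ? reduceFunction2(key, values) : values[0];
    return { _id: key, value: finalizeFunction2(key, reduced) };
});

console.log(JSON.stringify(results2));
// [{"_id":"mmm","value":{"count":2,"qty":8,"average":4}},{"_id":"nnn","value":{"count":1,"qty":5,"average":5}}]
```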
For more information and examples, see the Map-Reduce page.
See also