Published: 2015-09-14 14:53:31 | Source: compiled from the web
This document answers common questions about application development using MongoDB.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
Frequently Asked Questions:
A “namespace” is the concatenation of the database name and the collection name with a period character in between.
Collections are containers for documents that share one or more indexes. Databases are groups of collections stored on disk using a single set of data files.
For an example acme.users namespace, acme is the database name and users is the collection name. Period characters can occur in collection names, so acme.user.history is a valid namespace, with acme as the database name and user.history as the collection name.
While data models like this appear to support nested collections, the collection namespace is flat, and there is no difference from the perspective of MongoDB between acme, acme.users, and acme.records.
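The split described above can be sketched in plain JavaScript. The helper name splitNamespace is hypothetical and not part of any MongoDB API; it only assumes that the first period separates the database name from the collection name.

```javascript
// Hypothetical helper: split a namespace into database and collection parts.
// Only the first period is a separator; later periods belong to the
// collection name.
function splitNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error("not a <database>.<collection> namespace: " + ns);
  }
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

console.log(splitNamespace("acme.users"));        // db "acme", collection "users"
console.log(splitNamespace("acme.user.history")); // db "acme", collection "user.history"
```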
In the mongo shell, you can use the following operation to duplicate the entire collection:
db.people.find().forEach( function(x){db.user.insert(x)} );
Note
Because this process decodes BSON documents to JSON during the copy procedure, you may incur a loss of type fidelity.
Consider using mongodump and mongorestore to maintain type fidelity.
Also consider the cloneCollection command that may provide some of this functionality.
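As a rough illustration of what losing type fidelity means, the following plain JavaScript sketch round-trips a document through JSON text. It is an analogy only, not the shell's actual copy path, and the field names are made up for the example.

```javascript
// A document with a typed date value, as it might exist in a collection.
const original = { name: "Ada", joined: new Date(0) };

// Round-trip through JSON text, as a stand-in for any copy path that
// passes through JSON.
const copy = JSON.parse(JSON.stringify(original));

console.log(original.joined instanceof Date); // true
console.log(typeof copy.joined);              // "string"
console.log(copy.joined);                     // "1970-01-01T00:00:00.000Z"
```

The date survives only as a string, which is the kind of change that mongodump and mongorestore are meant to avoid.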
Yes.
When you use db.collection.remove(), the object will no longer exist in MongoDB’s on-disk data storage.
MongoDB flushes writes to disk on a regular interval. In the default configuration, MongoDB writes data to the main data files on disk every 60 seconds and commits the journal every 100 milliseconds. These values are configurable with the journalCommitInterval and syncdelay settings.
These values represent the maximum amount of time between the completion of a write operation and the point when the write is durable in the journal, if enabled, and when MongoDB flushes data to the disk. In many cases MongoDB and the operating system flush data to disk more frequently, so the above values represent a theoretical maximum.
However, by default, MongoDB uses a “lazy” strategy to write to disk. This is advantageous when, for example, the database receives a thousand increments to an object within one second: MongoDB only needs to flush this data to disk once. In addition to the aforementioned configuration options, you can also use fsync and getLastError to modify this strategy.
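A minimal sketch of how an application might ask for stronger durability on a given write is to build a getLastError command document with the j or fsync option. The helper name buildGetLastError and its option names are hypothetical; only the command fields getLastError, j, and fsync are taken from the server's command.

```javascript
// Hypothetical helper: build a getLastError command document.
// opts.journal -> wait for the journal commit (j: true)
// opts.fsync   -> wait for a flush to the data files (fsync: true)
function buildGetLastError(opts) {
  const cmd = { getLastError: 1 };
  if (opts && opts.journal) cmd.j = true;
  if (opts && opts.fsync) cmd.fsync = true;
  return cmd;
}

// In the mongo shell this document would be passed to db.runCommand().
console.log(buildGetLastError({ journal: true }));
```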
MongoDB does not have support for traditional locking or complex transactions with rollback. MongoDB aims to be lightweight, fast, and predictable in its performance. This is similar to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, MongoDB can provide greater performance especially for partitioned or replicated systems with a number of database server processes.
MongoDB does have support for atomic operations within a single document. Given the possibilities provided by nested documents, this feature provides support for a large number of use-cases.
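As a sketch of a single-document atomic operation, the helper below builds a filter and an update document that decrement a counter and record who took the item in one update. The collection name books and the fields available and checkout are assumptions made for this example.

```javascript
// Hypothetical helper: build a "check out one copy" operation that touches
// a single document, so the server applies it atomically.
function buildCheckout(bookId, user, when) {
  return {
    // Match only while at least one copy is available.
    filter: { _id: bookId, available: { $gt: 0 } },
    // Decrement the counter and append the checkout record together.
    update: {
      $inc: { available: -1 },
      $push: { checkout: { by: user, date: when } }
    }
  };
}

const op = buildCheckout(123456789, "abc", new Date(0));
// In the mongo shell: db.books.update(op.filter, op.update)
console.log(JSON.stringify(op));
```

Because the availability check is part of the filter, a concurrent writer that takes the last copy simply causes this update to match nothing.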
See also
The Isolate Sequence of Operations page.
In version 2.1 and later, you can use the new “aggregation framework,” with the aggregate command.
MongoDB also supports map-reduce with the mapReduce command, as well as basic aggregation with the group, count, and distinct commands.
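For orientation, an aggregation pipeline is an array of stage documents. The sketch below builds one that filters, groups, and sorts; the collection name orders and the fields status, cust_id, and amount are assumptions for the example.

```javascript
// Hypothetical helper: total order amounts per customer for one status.
function buildTotalsByCustomer(status) {
  return [
    { $match: { status: status } },                               // filter first
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },  // then group
    { $sort: { total: -1 } }                                      // largest first
  ];
}

const pipeline = buildTotalsByCustomer("A");
// In the mongo shell: db.orders.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```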
See also
The Aggregation page.
If you see a very large number of connection and re-connection messages in your MongoDB log, then clients are frequently connecting to and disconnecting from the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache Module, or some other kind of persistent application server to decrease the connection overhead.
If these connections do not impact your performance, you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.
Yes.
MongoDB users of all sizes have had a great deal of success using MongoDB on the EC2 platform using EBS disks.
MongoDB aggressively preallocates data files to reserve space and avoid file system fragmentation. You can use the smallfiles flag to modify the file preallocation strategy.
Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.
Consider the following suggestions and strategies for optimizing storage utilization for these collections:
Use the _id field explicitly.
MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would have occupied space in another portion of the document.
You can store any value in the _id field, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field’s value is not unique, then it cannot serve as a primary key as there would be collisions in the collection.
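A minimal sketch of this strategy moves an existing unique field into _id before inserting, so the value is stored once instead of alongside a generated ObjectId. The helper name useNaturalKey and the username field are hypothetical, and the sketch assumes the chosen field really is unique.

```javascript
// Hypothetical helper: promote a unique field to _id.
// Any _id already present on the input is replaced.
function useNaturalKey(doc, keyField) {
  if (!(keyField in doc)) throw new Error("missing key field: " + keyField);
  const out = { _id: doc[keyField] };
  for (const [k, v] of Object.entries(doc)) {
    if (k !== keyField && k !== "_id") out[k] = v;
  }
  return out;
}

console.log(useNaturalKey({ username: "jsmith", score: 3.9 }, "username"));
// { _id: 'jsmith', score: 3.9 }
```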
Use shorter field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of documents that resemble the following:
{ last_name : "Smith", best_score: 3.9 }
If you shorten the field name last_name to lname and the field name best_score to score, as follows, you could save 9 bytes per document.
{ lname : "Smith", score : 3.9 }
Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents or where document overhead is not a significant concern. Shorter field names do not reduce the size of indexes, because indexes have a predefined structure.
In general it is not necessary to use short field names.
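The 9-byte figure can be checked with a short calculation: renaming a field changes only the stored length of its name, so the saving is the difference in UTF-8 byte length. The helper name bytesSaved is hypothetical, and the sketch relies on Node's Buffer.byteLength.

```javascript
// Hypothetical helper: bytes saved per document by renaming fields.
// `renames` maps each old field name to its shorter replacement.
function bytesSaved(renames) {
  let saved = 0;
  for (const [oldName, newName] of Object.entries(renames)) {
    saved += Buffer.byteLength(oldName, "utf8") - Buffer.byteLength(newName, "utf8");
  }
  return saved;
}

// last_name (9) -> lname (5) saves 4; best_score (10) -> score (5) saves 5.
console.log(bytesSaved({ last_name: "lname", best_score: "score" })); // 9
```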
Embed documents.
In some cases you may want to embed documents in other documents and save on the per-document overhead.
For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing the file manually within a single document. You may use the BinData data type to store the binary data. See your driver’s documentation for details on using BinData.
For more information on GridFS, see GridFS.
As a client program assembles a query in MongoDB, it builds a BSON object, not a string. Thus traditional SQL injection attacks are not a problem. More details and some nuances are covered below.
MongoDB represents queries as BSON objects. Typically client libraries provide a convenient, injection-free process to build these objects. Consider the following C++ example:
BSONObj my_query = BSON( "name" << a_name );
auto_ptr<DBClientCursor> cursor = c.query("tutorial.persons", my_query);
Here, my_query will have a value such as { name : "Joe" }. If a_name contained special characters, for example ,, :, and {, the query simply wouldn’t match any documents. In particular, users cannot hijack a query and convert it to a delete.
Note
You can disable all server-side execution of JavaScript, by passing the --noscripting option on the command line or setting noscripting in a configuration file.
All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server: $where, db.eval(), mapReduce, and group.
You must exercise care in these cases to prevent users from submitting malicious JavaScript.
Fortunately, you can express most queries in MongoDB without JavaScript, and for queries that require JavaScript, you can mix JavaScript and non-JavaScript in a single query. Place all the user-supplied fields directly in a BSON field and pass JavaScript code to the $where field.
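A minimal sketch of this mixing keeps the JavaScript expression constant and places the user-supplied value in an ordinary field. The helper name buildQuery and the fields name, credits, and debits are assumptions for the example.

```javascript
// The server-side JavaScript is a fixed string written by the developer.
const BALANCE_CHECK = "this.credits > this.debits";

// Hypothetical helper: the user value only ever appears as field data.
// String() also flattens objects such as { $ne: null } into plain text.
function buildQuery(userName) {
  return { name: String(userName), $where: BALANCE_CHECK };
}

const hostile = '"; db.dropDatabase(); "';
const q = buildQuery(hostile);
console.log(q.name === hostile);        // true: kept verbatim as a string value
console.log(q.$where === BALANCE_CHECK); // true: the code is unchanged
```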
If you need to pass user-supplied values in a $where clause, you may escape these values with the CodeWScope mechanism. When you set user-submitted values as variables in the scope document, you can avoid evaluating them on the database server.
If you need to use db.eval() with user-supplied values, you can either use a CodeWScope or you can supply extra arguments to your function. For instance:
db.eval(function(userVal){...},
user_value);
This will ensure that your application sends user_value to the database server as data rather than code.
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e. $) is a reserved character used to represent operators (e.g. $inc). Thus, you should ensure that your application’s users cannot inject operators into their inputs.
In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full-width equivalents: U+FF04 (i.e. “＄”) and U+FF0E (i.e. “．”).
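The substitution can be sketched as a pair of small functions. The names escapeKey and unescapeKey are hypothetical; the sketch assumes the original keys never contain the full-width characters themselves, since otherwise the reverse mapping would not be exact.

```javascript
// Replace the reserved characters with their full-width equivalents.
function escapeKey(key) {
  return key.replace(/\$/g, "\uFF04").replace(/\./g, "\uFF0E");
}

// Reverse the substitution when reading keys back.
function unescapeKey(key) {
  return key.replace(/\uFF04/g, "$").replace(/\uFF0E/g, ".");
}

const safe = escapeKey("$where.total");
console.log(safe.includes("$") || safe.includes(".")); // false
console.log(unescapeKey(safe));                        // "$where.total"
```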
Consider the following example:
BSONObj my_object = BSON( a_key << a_name );
The user may have supplied a value beginning with $ for a_key, in which case my_object might be { $where : "things" }. Consider the following cases:
Insert. Inserting this into the database does no harm. The insert process does not evaluate the object as a query.
Note
MongoDB client drivers, if properly implemented, check for reserved characters in keys on inserts.
Update. The db.collection.update() operation permits $ operators in the update argument but does not support the $where operator. Still, some users may be able to inject operators that can manipulate a single document only. Therefore your application should escape keys, as mentioned above, if reserved characters are possible.
Query. Generally this is not a problem for queries that resemble { x : user_obj }: dollar signs are not top level and have no effect. Theoretically it may be possible for the user to build a query themselves. But checking the user-submitted content for $ characters in key names may help protect against this kind of injection.
See the “PHP MongoDB Driver Security Notes” page in the PHP driver documentation for more information.
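One way to perform such a check is a recursive walk over the user-submitted value that reports any key starting with $. This is a sketch only; the helper name hasDollarKey is hypothetical and it inspects plain objects and arrays, not driver-specific types.

```javascript
// Hypothetical helper: true if any key at any depth starts with "$".
function hasDollarKey(value) {
  if (Array.isArray(value)) return value.some(hasDollarKey);
  if (value === null || typeof value !== "object") return false;
  return Object.keys(value).some(
    (k) => k.startsWith("$") || hasDollarKey(value[k])
  );
}

console.log(hasDollarKey({ x: { $ne: null } })); // true
console.log(hasDollarKey({ x: "$5.00" }));       // false: $ in a value, not a key
```

An application could reject or escape input for which this returns true before placing it into a query or update document.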
MongoDB implements a readers-writer lock. This means that at any one time, only one client may be writing or any number of clients may be reading, but that reading and writing cannot occur simultaneously.
In standalone and replica sets the lock’s scope applies to a single mongod instance or primary instance. In a sharded cluster, locks apply to each individual shard, not to the whole cluster.
For more information, see FAQ: Concurrency.
MongoDB permits documents within a single collection to have fields with different BSON types. For instance, the following documents may exist within a single collection.
{ x: "string" }
{ x: 42 }
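When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:
1. MinKey (internal type)
2. Null
3. Numbers (ints, longs, doubles)
4. Symbol, String
5. Object
6. Array
7. BinData
8. ObjectId
9. Boolean
10. Date, Timestamp
11. Regular Expression
12. MaxKey (internal type)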
Note
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Consider the following mongo example:
db.test.insert( {x : 3 } );
db.test.insert( {x : 2.9 } );
db.test.insert( {x : new Date() } );
db.test.insert( {x : true } );
db.test.find().sort({x:1});
{ "_id" : ObjectId("4b03155dce8de6586fb002c7"), "x" : 2.9 }
{ "_id" : ObjectId("4b03154cce8de6586fb002c6"), "x" : 3 }
{ "_id" : ObjectId("4b031566ce8de6586fb002c9"), "x" : true }
{ "_id" : ObjectId("4b031563ce8de6586fb002c8"), "x" : "Tue Nov 17 2009 16:28:03 GMT-0500 (EST)" }
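The ordering in this output can be mimicked in plain JavaScript with a comparator that ranks the type first and the value second. This is a simplified sketch covering only null, numbers, strings, booleans, and dates; the rank numbers are arbitrary and only their relative order matters. String comparison here uses JavaScript's own ordering, which is an assumption and may differ from the server's.

```javascript
// Rank a value by type: null < number < string < boolean < date.
function typeRank(v) {
  if (v === null) return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (typeof v === "boolean") return 4;
  if (v instanceof Date) return 5;
  throw new Error("type not covered by this sketch");
}

// Compare by type rank first, then by value within the same type.
function compareValues(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (a instanceof Date) return a.getTime() - b.getTime();
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

const when = new Date(Date.UTC(2009, 10, 17));
const sorted = [3, 2.9, when, true].sort(compareValues);
console.log(sorted); // 2.9, 3, true, then the date
```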
The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation on BSON types and the $type operator for additional information.
Warning
Storing values of different types in the same field in a collection is strongly discouraged.
Fields in a document may store null values, as in a notional collection, test, with the following documents:
{ _id: 1, cancelDate: null }
{ _id: 2 }
Different query operators treat null values differently:
The { cancelDate : null } query matches documents that either contain the cancelDate field with a value of null or do not contain the cancelDate field:
db.test.find( { cancelDate: null } )
The query returns both documents:
{ "_id" : 1, "cancelDate" : null }
{ "_id" : 2 }
The { cancelDate : { $type: 10 } } query matches only documents that contain the cancelDate field with a value of null; i.e. the value of the cancelDate field is of BSON Type Null (i.e. 10):
db.test.find( { cancelDate : { $type: 10 } } )
The query returns only the document that contains the null value:
{ "_id" : 1, "cancelDate" : null }
The { cancelDate : { $exists: false } } query matches documents that do not contain the cancelDate field:
db.test.find( { cancelDate : { $exists: false } } )
The query returns only the document that does not contain the cancelDate field:
{ "_id" : 2 }
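The three behaviours can be restated as plain JavaScript predicates over the same two documents. This is only a model of the matching rules described above, not how the server evaluates queries.

```javascript
const docs = [{ _id: 1, cancelDate: null }, { _id: 2 }];

// { cancelDate: null }: the field is null OR the field is missing.
const eqNull = (d) => !("cancelDate" in d) || d.cancelDate === null;
// { cancelDate: { $type: 10 } }: the field exists and holds null.
const typeNull = (d) => "cancelDate" in d && d.cancelDate === null;
// { cancelDate: { $exists: false } }: the field is missing.
const notExists = (d) => !("cancelDate" in d);

// Return the _id of every document a predicate matches.
const ids = (pred) => docs.filter(pred).map((d) => d._id);

console.log(ids(eqNull));    // [ 1, 2 ]
console.log(ids(typeNull));  // [ 1 ]
console.log(ids(notExists)); // [ 2 ]
```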
Collection names can be any UTF-8 string with the following exceptions: