Published 2015-09-14 14:54:06 | 185 views | Comments: 0 | Source: compiled from the web
Data in MongoDB has a flexible schema. Collections do not enforce document structure. This means that:
Each document only needs to contain relevant fields to the entity or object that the document represents. In practice, most documents in a collection share a similar structure. Schema flexibility means that you can model your documents in MongoDB so that they can closely resemble and reflect application-level objects.
As in all data modeling, when developing data models (i.e. schema designs,) for MongoDB you must consider the inherent properties and requirements of the application objects and the relationships between application objects. MongoDB data models must also reflect:
These considerations and requirements force developers to make a number of multi-factored decisions when modeling data, including:
normalization and de-normalization.
These decisions determine the degree to which the data model should store related pieces of data in a single document, or instead describe relationships using references between documents.
representation of data in arrays in BSON.
A number of data models may be functionally equivalent for a given application; however, different data models can have significantly different impacts on MongoDB and application performance.
This document provides a high-level overview of these data modeling decisions and factors. In addition, consider the Data Modeling Patterns and Examples section, which provides more concrete examples of all the discussed patterns.
Data modeling decisions involve determining how to structure the documents to model the data effectively. The primary decision is whether to embed or to use references.
To de-normalize data, store two related pieces of data in a single document.
Operations within a document are less expensive for the server than operations that involve multiple documents.
In general, use embedded data models when:
Embedding provides the following benefits:
Embedding related data in documents can lead to situations where documents grow after creation. Document growth can impact write performance and lead to data fragmentation. Furthermore, documents in MongoDB must be smaller than the maximum BSON document size. For larger documents, consider using GridFS.
For examples of accessing embedded documents, see Subdocuments.
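As a sketch of the de-normalized (embedded) approach, consider the hypothetical patron document below (field names are illustrative): the address lives inside the same document, so a single read returns everything and both parts can be changed in one document-level operation.

```python
# Hypothetical example: a "patron" document with its address embedded.
# One read returns the patron and the address together, and both can
# be updated in a single (atomic) document-level operation.
patron = {
    "_id": "joe",
    "name": "Joe Bookreader",
    "address": {               # embedded sub-document
        "street": "123 Fake Street",
        "city": "Faketon",
        "zip": "12345",
    },
}

# Accessing embedded data requires no additional query:
city = patron["address"]["city"]
print(city)  # -> Faketon
```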
See also
To normalize data, store references between two documents to indicate a relationship between the data represented in each document.
In general, use normalized data models:
Referencing provides more flexibility than embedding; however, to resolve the references, client-side applications must issue follow-up queries. In other words, using references requires more roundtrips to the server.
See Model One-to-Many Relationships with References for an example of referencing.
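A minimal sketch of the normalized alternative, using hypothetical publisher/book data: each book stores only the publisher's _id, and the client must resolve the reference with a follow-up lookup, which in MongoDB means an extra round trip to the server.

```python
# Hypothetical normalized model: books reference a publisher by _id.
publishers = {
    "oreilly": {"_id": "oreilly", "name": "O'Reilly Media", "founded": 1980},
}
books = [
    {"_id": 1, "title": "MongoDB: The Definitive Guide", "publisher_id": "oreilly"},
    {"_id": 2, "title": "50 Tips and Tricks for MongoDB", "publisher_id": "oreilly"},
]

# Resolving the reference requires a second lookup (in MongoDB,
# a second query, and therefore an extra server round trip):
book = books[0]
publisher = publishers[book["publisher_id"]]
print(publisher["name"])  # -> O'Reilly Media
```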
MongoDB only provides atomic operations at the level of a single document. [1] As a result, the need for atomic operations influences the decision to use embedded or referenced relationships when modeling data for MongoDB.
Embed fields that need to be modified together atomically in the same document. See Model Data for Atomic Operations for an example of atomic updates within a single document.
[1] Document-level atomic operations include all operations within a single MongoDB document record: operations that affect multiple sub-documents within that single record are still atomic.
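For illustration, the update specification below is shaped like a MongoDB update document, with hypothetical field names: it decrements a counter and records a checkout in one operation. Because both fields belong to the same document, MongoDB applies the change atomically.

```python
# Hypothetical single-document update: decrement available copies and
# record the borrower together. Because "available" and "checkout"
# live in the same document, MongoDB applies both changes atomically.
query = {"_id": 123456789, "available": {"$gt": 0}}
update = {
    "$inc": {"available": -1},    # one fewer copy on the shelf
    "$push": {"checkout": {"by": "joe", "date": "2015-09-14"}},
}

# With a driver such as pymongo, this would be issued (against a
# running server) roughly as: db.books.update_one(query, update)
print(sorted(update))  # -> ['$inc', '$push']
```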
In addition to normalization and de-normalization concerns, a number of other operational factors help shape data modeling decisions in MongoDB. These factors include:
These factors have implications for database and application performance, as well as for future maintenance and development costs.
Data modeling decisions should also take data lifecycle management into consideration.
The Time to Live or TTL feature of collections expires documents after a period of time. Consider using the TTL feature if your application requires some data to persist in the database for a limited period of time.
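Conceptually, the TTL feature removes documents whose timestamp is older than the configured lifetime. A rough Python sketch of that behavior (in MongoDB itself the removal is done by a background thread, configured with a TTL index, e.g. in the mongo shell: `db.log_events.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 3600 })`):

```python
from datetime import datetime, timedelta

# Sketch of TTL semantics: keep only documents younger than the lifetime.
def expire(docs, now, expire_after_seconds):
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d for d in docs if d["createdAt"] >= cutoff]

now = datetime(2015, 9, 14, 12, 0, 0)
docs = [
    {"_id": 1, "createdAt": datetime(2015, 9, 14, 11, 59)},  # fresh
    {"_id": 2, "createdAt": datetime(2015, 9, 14, 10, 0)},   # expired
]
print([d["_id"] for d in expire(docs, now, 3600)])  # -> [1]
```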
Additionally, if your application only uses recently inserted documents, consider Capped Collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents and are optimized to support operations that insert and read documents based on insertion order.
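The FIFO behavior of a capped collection can be sketched with a fixed-size buffer: once the limit is reached, the oldest inserts are discarded to make room for new ones. This is a simplified model; real capped collections are bounded by bytes on disk, not document count.

```python
from collections import deque

# Simplified model of a capped collection holding at most 3 documents.
capped = deque(maxlen=3)
for i in range(5):
    capped.append({"_id": i, "msg": "event %d" % i})

# The two oldest documents (0 and 1) were evicted first-in-first-out:
print([d["_id"] for d in capped])  # -> [2, 3, 4]
```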
In certain situations, you might choose to store information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:
{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ...}
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.
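Splitting the sample logs collection by environment can be sketched as routing each document to a per-environment collection name:

```python
# Route each log document to a per-environment collection, e.g.
# logs.dev and logs.debug, instead of one mixed "logs" collection.
docs = [
    {"log": "dev",   "ts": 1, "info": "started"},
    {"log": "debug", "ts": 2, "info": "cache miss"},
    {"log": "dev",   "ts": 3, "info": "stopped"},
]

collections = {}
for doc in docs:
    name = "logs.%s" % doc["log"]       # e.g. "logs.dev"
    collections.setdefault(name, []).append(doc)

print(sorted(collections))              # -> ['logs.debug', 'logs.dev']
print(len(collections["logs.dev"]))     # -> 2
```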
Generally, having a large number of collections carries no significant performance penalty and can result in very good performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
A single <database>.ns file stores all meta-data for each database, and each index and collection has its own entry in this namespace file. MongoDB places limits on the size of namespace files.
Because of limits on namespaces, you may wish to know the current number of namespaces in order to determine how many additional namespaces the database can support, as in the following example:
db.system.namespaces.count()
The <database>.ns file defaults to 16 MB. To change the size of the <database>.ns file, pass the new size in megabytes to the --nssize option on server start.
The --nssize option sets the size only for new <database>.ns files. For existing databases, after starting up the server with --nssize, run the db.repairDatabase() command from the mongo shell.
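For example (the size and database name here are illustrative), to grow the namespace file to 32 MB for new databases and then rebuild an existing one:

```shell
# Start mongod with a larger namespace file size for NEW databases
# (size in MB; 32 is an arbitrary example value):
mongod --nssize 32

# For an EXISTING database, the new size takes effect only after a
# repair, run from the mongo shell against that database:
#   use mydb
#   db.repairDatabase()
```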
Create indexes to support common queries. Generally, indexes and index use in MongoDB correspond to indexes and index use in relational databases: build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
See Indexing Strategies for more information on determining indexes. Additionally, the MongoDB database profiler may help identify inefficient queries.
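The effect of an index can be sketched in Python: instead of scanning every document for a match, a lookup structure keyed on the indexed field finds matching documents directly. This is a toy model; MongoDB indexes are B-trees, which also support the sorted results mentioned above.

```python
# Toy model of an index on the "sku" field: map field value -> documents.
products = [
    {"_id": 1, "sku": "abc", "qty": 10},
    {"_id": 2, "sku": "xyz", "qty": 5},
    {"_id": 3, "sku": "abc", "qty": 7},
]

index = {}
for doc in products:
    index.setdefault(doc["sku"], []).append(doc)

# Query via the index: matching documents found without a full scan.
matches = index.get("abc", [])
print([d["_id"] for d in matches])  # -> [1, 3]
```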
Sharding allows users to partition a collection within a database to distribute the collection’s documents across a number of mongod instances or shards.
The shard key determines how MongoDB distributes data among shards in a sharded collection. Selecting the proper shard key has significant implications for performance.
See Sharded Cluster Overview for more information on sharding and the selection of the shard key.
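How a shard key partitions documents can be sketched with a toy hash-based router. MongoDB supports both range-based and hashed distribution; this illustrates only the hashed case, with a made-up hash function standing in for the real one.

```python
# Toy hashed-sharding router: the shard key value determines the shard.
NUM_SHARDS = 3

def shard_for(key_value):
    # Stand-in for MongoDB's hashed shard key; any stable hash works here.
    return sum(ord(c) for c in str(key_value)) % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in ["alice", "bob", "carol", "dave"]:
    shards[shard_for(user_id)].append(user_id)

# Documents with the same shard key value always land on the same shard:
print(shard_for("alice") == shard_for("alice"))  # -> True
```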
Certain updates to documents can increase the document size, such as pushing elements to an array and adding new fields. If the document size exceeds the allocated space for that document, MongoDB relocates the document on disk. This internal relocation can be both time and resource consuming.
Although MongoDB automatically provides padding to minimize the occurrence of relocations, you may still need to manually handle document growth. Refer to Pre-Aggregated Reports for an example of the Pre-allocation approach to handle document growth.
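One pre-allocation sketch, assuming a hypothetical per-day report document: create the document up front with every counter zeroed, so later increments update fields in place instead of growing the document and forcing a relocation.

```python
# Pre-allocate a daily report with a zeroed counter for every hour,
# so subsequent $inc-style updates never grow the document on disk.
def preallocate_daily_report(date):
    return {
        "_id": "site-stats-%s" % date,
        "date": date,
        "hourly": {str(h): 0 for h in range(24)},  # 24 zeroed counters
    }

report = preallocate_daily_report("2015-09-14")
report["hourly"]["13"] += 1          # in-place update: size unchanged

print(len(report["hourly"]))         # -> 24
print(report["hourly"]["13"])        # -> 1
```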
The following documents provide overviews of various data modeling patterns and common schema design considerations:
For more information and examples of real-world data modeling, consider the following external resources: