From the perspective of a client application, whether a MongoDB instance is running as a single server (i.e. “standalone”) or a replica set is transparent. However, replica sets offer some configuration options for write and read operations. [1] This document describes those options and their implications.
[1] Sharded clusters where the shards are also replica sets provide the same configuration options with regard to write and read operations.
MongoDB’s built-in write concern confirms the success of write operations to a replica set’s primary. Write concern uses the getLastError command after write operations to return an object with error information or confirmation that there are no errors.
After the driver write concern change, all officially supported MongoDB drivers enable write concern by default.
The default write concern confirms write operations only on the primary. You can configure write concern to confirm write operations to additional replica set members as well by issuing the getLastError command with the w option.
The w option confirms that write operations have replicated to the specified number of replica set members, including the primary. You can either specify a number or specify majority, which ensures the write propagates to a majority of set members. The following example ensures the operation has replicated to two members (the primary and one other member):
db.runCommand( { getLastError: 1, w: 2 } )
The following example ensures the write operation has replicated to a majority of the configured members of the set.
db.runCommand( { getLastError: 1, w: "majority" } )
If you specify a w value greater than the number of members that hold a copy of the data (i.e., greater than the number of non-arbiter members), the operation blocks until those members become available. This can cause the operation to block forever. To specify a timeout threshold for the getLastError operation, use the wtimeout argument. The following example sets the timeout to 5000 milliseconds:
db.runCommand( { getLastError: 1, w: 2, wtimeout:5000 } )
You can configure your own “default” getLastError behavior for a replica set. Use the getLastErrorDefaults setting in the replica set configuration. The following sequence of commands creates a configuration that waits for the write operation to complete on a majority of the set members before returning:
cfg = rs.conf()
cfg.settings = {}
cfg.settings.getLastErrorDefaults = {w: "majority"}
rs.reconfig(cfg)
The getLastErrorDefaults setting affects only those getLastError commands that have no other arguments.
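As a minimal illustration of this behavior, assuming the getLastErrorDefaults value of { w: "majority" } configured above, the first command below waits for a majority of members, while the second waits only for the primary because its explicit w argument overrides the default:

db.runCommand( { getLastError: 1 } )        // no arguments: uses the configured default, w: "majority"
db.runCommand( { getLastError: 1, w: 1 } )  // explicit arguments: the default does not apply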
Note
Use of insufficient write concern can lead to rollbacks in the case of replica set failover. Always ensure that your operations have specified the required write concern for your application.
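For example, a minimal sketch of applying a stronger write concern to a particular operation in the mongo shell (the products collection and its document are hypothetical) is to follow the write with a getLastError call that names the required w and wtimeout values:

db.products.insert( { sku: "abc123", qty: 100 } )
db.runCommand( { getLastError: 1, w: "majority", wtimeout: 5000 } )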
See also
You can use replica set tags to create custom write concerns using the getLastErrorDefaults and getLastErrorModes replica set settings.
Note
Custom write concern modes specify the field name and a number of distinct values for that field. By contrast, read preferences use the value of fields in the tag document to direct read operations.
In some cases, you may be able to use the same tags for read preferences and write concerns; however, you may need to create additional tags for write concerns depending on the requirements of your application.
Consider a five member replica set, where each member has one of the following tag sets:
{ "use": "reporting" }
{ "use": "backup" }
{ "use": "application" }
{ "use": "application" }
{ "use": "application" }
You could create a custom write concern mode that will ensure that applicable write operations will not return until members with two different values of the use tag have acknowledged the write operation. Create the mode with the following sequence of operations in the mongo shell:
cfg = rs.conf()
cfg.settings = { getLastErrorModes: { use2: { "use": 2 } } }
rs.reconfig(cfg)
To use this mode, pass the string use2 to the w option of getLastError as follows:
db.runCommand( { getLastError: 1, w: "use2" } )
If you have a three-member replica set with the following tag sets:
{ "disk": "ssd" }
{ "disk": "san" }
{ "disk": "spinning" }
You cannot specify a custom getLastErrorModes value that ensures the write propagates to the san member before returning. However, you may implement this write concern policy by adding an additional tag to that member, so that the set resembles the following:
{ "disk": "ssd" }
{ "disk": "san", "disk.san": "san" }
{ "disk": "spinning" }
Then, create a custom getLastErrorModes value, as follows:
cfg = rs.conf()
cfg.settings = { getLastErrorModes: { san: { "disk.san": 1 } } }
rs.reconfig(cfg)
To use this mode, pass the string san to the w option of getLastError as follows:
db.runCommand( { getLastError: 1, w: "san" } )
This operation will not return until a replica set member bearing the disk.san tag has acknowledged the write operation.
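If the member tagged for san is unavailable, such an operation blocks indefinitely. A common precaution, not part of the original example, is to combine the custom mode with a timeout:

db.runCommand( { getLastError: 1, w: "san", wtimeout: 5000 } )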
You may set a custom write concern mode as the default write concern mode using the getLastErrorDefaults replica set setting, as in the following:
cfg = rs.conf()
cfg.settings.getLastErrorDefaults = { w: "san" }
rs.reconfig(cfg)
See also
Tag Sets for further information about replica set reconfiguration and tag sets.
Read preference describes how MongoDB clients route read operations to members of a replica set.
By default, an application directs its read operations to the primary member in a replica set. Reading from the primary guarantees that read operations reflect the latest version of a document. However, for an application that does not require fully up-to-date data, you can improve read throughput, or reduce latency, by distributing some or all reads to secondary members of the replica set.
Common use cases for secondary reads include reporting and analytics workloads, local reads for geographically distributed applications, and maintaining read availability during a failover; these are described in more detail later in this document.
MongoDB drivers allow client applications to configure a read preference on a per-connection, per-collection, or per-operation basis. For more information about secondary read operations in the mongo shell, see the readPref() method. For more information about a driver's read preference configuration, see the appropriate driver and client API documentation.
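As an illustration of connection-level configuration in the mongo shell (assuming a shell version that provides the Mongo.setReadPref() helper), the following directs subsequent reads on the current connection to secondaries when one is available:

db.getMongo().setReadPref("secondaryPreferred")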
Note
Read preferences affect how an application selects which member to use for read operations. As a result, read preferences determine whether the application receives stale or current data from MongoDB. Use appropriate write concern policies to ensure proper data replication and consistency.
If read operations account for a large percentage of your application’s traffic, distributing reads to secondary members can improve read throughput. However, in most cases sharding provides better support for larger scale operations, as clusters can distribute read and write operations across a group of machines.
New in version 2.2.
MongoDB drivers support five read preference modes: primary, primaryPreferred, secondary, secondaryPreferred, and nearest.
You can specify a read preference mode on connection objects, database objects, and collection objects, or per-operation. The syntax for specifying the read preference mode is specific to the driver and to the idioms of the host language.
Read preference modes are also available to clients connecting to a sharded cluster through a mongos. The mongos instance obeys specified read preferences when connecting to the replica set that provides each shard in the cluster.
In the mongo shell, the readPref() cursor method provides access to read preferences.
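For example, a minimal per-operation sketch using the readPref() cursor method (the records collection and its query are hypothetical):

db.records.find( { status: "active" } ).readPref( "secondary" )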
Warning
All read preference modes except primary may return stale data as secondaries replicate operations from the primary with some delay.
Ensure that your application can tolerate stale data if you choose to use a non-primary mode.
For more information, see read preference background and read preference behavior. See also the documentation for your driver.
All read operations use only the current replica set primary. This is the default. If the primary is unavailable, read operations produce an error or throw an exception.
The primary read preference mode is not compatible with read preference modes that use tag sets. If you specify a tag set with primary, the driver will produce an error.
In most situations, operations read from the primary member of the set. However, if the primary is unavailable, as is the case during failover situations, operations read from secondary members.
When the read preference includes a tag set, the client reads first from the primary, if available, and then from secondaries that match the specified tags. If no secondaries have matching tags, the read operation produces an error.
Since the application may receive data from a secondary, read operations using the primaryPreferred mode may return stale data in some situations.
Warning
Changed in version 2.2: mongos added full support for read preferences.
When connecting to a mongos instance older than 2.2 with a client that supports read preference modes, primaryPreferred will send queries to secondaries.
Operations read only from the secondary members of the set. If no secondaries are available, then this read operation produces an error or exception.
Most sets have at least one secondary, but there are situations where there may be no available secondary. For example, a set with a primary, a secondary, and an arbiter may not have any secondaries if a member is in recovering state or unavailable.
When the read preference includes a tag set, the client attempts to find secondary members that match the specified tag set and directs reads to a random secondary from among the nearest group. If no secondaries have matching tags, the read operation produces an error. [2]
Read operations using the secondary mode may return stale data.
In most situations, operations read from secondary members, but in situations where the set consists of a single primary (and no other members), the read operation will use the set's primary.
When the read preference includes a tag set, the client attempts to find a secondary member that matches the specified tag set and directs reads to a random secondary from among the nearest group. If no secondaries have matching tags, the read operation produces an error.
Read operations using the secondaryPreferred mode may return stale data.
The driver reads from the nearest member of the set according to the member selection process. Reads in the nearest mode do not consider the member’s type. Reads in nearest mode may read from both primaries and secondaries.
Set this mode to minimize the effect of network latency on read operations without preference for current or stale data.
If you specify a tag set, the client attempts to find a replica set member that matches the specified tag set and directs reads to an arbitrary member from among the nearest group.
Read operations using the nearest mode may return stale data.
Note
All operations read from a member of the nearest group of the replica set that matches the specified read preference mode. The nearest mode prefers low latency reads over a member’s primary or secondary status.
For nearest, the client assembles a list of acceptable hosts based on tag set and then narrows that list to the host with the shortest ping time and all other members of the set that are within the “local threshold,” or acceptable latency. See Member Selection for more information.
[2] If your set has more than one secondary, and you use the secondary read preference mode, consider the following effect. If you have a three member replica set with a primary and two secondaries, and if one secondary becomes unavailable, all secondary queries must target the remaining secondary. This will double the load on this secondary. Plan and provide capacity to support this as needed.
Tag sets allow you to specify custom read preferences and write concerns so that your application can target operations to specific members, based on custom parameters.
Note
Consider the following properties of read preferences:
A tag set for a read operation may resemble the following document:
{ "disk": "ssd", "use": "reporting" }
To fulfill the request, a member would need to have both of these tags. Therefore, the following tag sets would satisfy this requirement:
{ "disk": "ssd", "use": "reporting" }
{ "disk": "ssd", "use": "reporting", "rack": 1 }
{ "disk": "ssd", "use": "reporting", "rack": 4 }
{ "disk": "ssd", "use": "reporting", "mem": "64"}
However, the following tag sets would not be able to fulfill this query:
{ "disk": "ssd" }
{ "use": "reporting" }
{ "disk": "ssd", "use": "production" }
{ "disk": "ssd", "use": "production", "rack": 3 }
{ "disk": "spinning", "use": "reporting", "mem": "32" }
Therefore, tag sets make it possible to ensure that read operations target specific members in a particular data center or mongod instances designated for a particular class of operations, such as reporting or analytics. For information on configuring tag sets, see Tag Sets in the Replica Set Configuration document. You can specify tag sets with the following read preference modes: primaryPreferred, secondary, secondaryPreferred, and nearest.
You cannot specify tag sets with the primary read preference mode.
Tags are not compatible with primary and only apply when selecting a secondary member of a set for a read operation. However, the nearest read mode, when combined with a tag set, will select the nearest member that matches the specified tag set, which may be a primary or secondary.
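A sketch of combining nearest with a tag set in the mongo shell (again using a hypothetical records collection) follows; the matching member it selects may be either a primary or a secondary:

db.records.find().readPref( "nearest", [ { "disk": "ssd" } ] )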
All interfaces use the same member selection logic to choose the member to which to direct read operations, basing the choice on read preference mode and tag sets.
For more information on how read preference modes interact with tag sets, see the documentation for each read preference mode.
Changed in version 2.2.
Connections between MongoDB drivers and the mongod instances in a replica set must balance two concerns: reading from the same member as consistently as possible, and recovering quickly when a connection is lost or the set fails over.
As a result, MongoDB drivers and mongos:
Reuse a connection to specific mongod for as long as possible after establishing a connection to that instance. This connection is pinned to this mongod.
Attempt to reconnect to a new member, obeying existing read preference modes, if the connection to mongod is lost.
Reconnections are transparent to the application itself. If the connection permits reads from secondary members, after reconnecting, the application can receive two sequential reads returning from different secondaries. Depending on the state of the individual secondary member’s replication, the documents can reflect the state of your database at different moments.
Return an error only after attempting to connect to three members of the set that match the read preference mode and tag set. If there are fewer than three members of the set, the client will error after connecting to all existing members of the set.
After this error, the driver selects a new member using the specified read preference mode. In the absence of a specified read preference, the driver uses primary.
After detecting a failover situation, [3] the driver attempts to refresh the state of the replica set as quickly as possible.
[3] When a failover occurs, all members of the set close all client connections, which produces a socket error in the driver. This behavior prevents or minimizes rollback.
Reads from secondaries may reflect the state of the data set at different points in time because secondary members of a replica set may lag behind the current state of the primary by different amounts. To prevent subsequent reads from jumping around in time, the driver can associate application threads to a specific member of the set after the first read. The thread will continue to read from the same member until the application performs a read with a different read preference, the thread terminates, or the client receives a socket exception (for example, because of a network error or because the mongod closes connections during a failover).
If an application thread issues a query with the primaryPreferred mode while the primary is inaccessible, the thread will carry the association with that secondary for the lifetime of the thread. The thread will associate with the primary, if available, only after issuing a query with a different read preference, even if a primary becomes available. By extension, if a thread issues a read with the secondaryPreferred when all secondaries are down, it will carry an association with the primary. This application thread will continue to read from the primary even if a secondary becomes available later in the thread’s lifetime.
Clients, by way of their drivers, and mongos instances for sharded clusters periodically update their view of the replica set’s state: which members are up or down, which member is primary, and the latency to each mongod instance.
For any operation that targets a member other than the primary, the driver first assembles the list of members that are eligible under the read preference mode and any tag sets, then determines which eligible member is nearest (has the lowest ping time), builds a list of members whose ping times fall within the acceptable latency window of that nearest member, [4] and finally selects one member from that list at random to receive the read operation.
Once the application selects a member of the set to use for read operations, the driver continues to use this connection for read preference until the application specifies a new read preference or something interrupts the connection. See Request Association for more information.
[4] Applications can configure the threshold used in this stage. The default "acceptable latency" is 15 milliseconds, which you can override in the drivers with their own secondaryAcceptableLatencyMS option. For mongos you can use the --localThreshold or localThreshold runtime options to set this value.
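For instance, a hedged sketch of raising this threshold to 25 milliseconds when starting mongos (the config server hostname here is hypothetical):

mongos --configdb cfg0.example.net:27019 --localThreshold 25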
Changed in version 2.2: Before version 2.2, mongos did not support the read preference mode semantics.
In most sharded clusters, each shard is provided by a replica set, so read preferences are also applicable. Read operations in a sharded cluster, with regard to read preference, are identical to read operations in an unsharded replica set.
Unlike simple replica sets, in sharded clusters, all interactions with the shards pass from the clients to the mongos instances that are actually connected to the set members. mongos is responsible for the application of the read preferences, which is transparent to applications.
There are no configuration changes required for full support of read preference modes in sharded environments, as long as the mongos is at least version 2.2. All mongos maintain their own connection pool to the replica set members. As a result:
A request without a specified read preference uses primary, the default, unless the mongos reuses an existing connection that has a different mode set.
Always explicitly set your read preference mode to prevent confusion.
All nearest and latency calculations reflect the connection between the mongos and the mongod instances, not the client and the mongod instances.
This produces the desired result, because all results must pass through the mongos before returning to the client.
Because some database commands read and return data from the database, all of the official drivers support full read preference mode semantics for commands that only read data, including count, distinct, group, aggregate, and the inline form of mapReduce. [5]
[5] Only "inline" mapReduce operations that do not write data support read preference; otherwise these operations must run on the primary members.
mongos currently does not route commands using read preferences; clients send all commands to shards’ primaries. See SERVER-7423.
You must exercise care when specifying read preferences: modes other than primary can and will return stale data, because secondary queries may not reflect the most recent write operations to the replica set's primary. Nevertheless, there are several common use cases for non-primary read preference modes:
Reporting and analytics workloads.
Having these queries target a secondary helps distribute load and prevent these operations from affecting the main workload of the primary.
Also consider using secondary in conjunction with a direct connection to a hidden member of the set.
Providing local reads for geographically distributed applications.
If you have application servers in multiple data centers, you may consider having a geographically distributed replica set and using a non-primary read preference or the nearest mode to avoid network latency.
Maintaining availability during a failover.
Use primaryPreferred if you want your application to do consistent reads from the primary under normal circumstances, but to allow stale reads from secondaries in an emergency. This provides a “read-only mode” for your application during a failover.
Warning
In some situations using secondaryPreferred to distribute read load to replica sets may carry significant operational risk: if all secondaries are unavailable and your set has enough arbiters to prevent the primary from stepping down, then the primary will receive all traffic from clients.
For this reason, use secondary to distribute read load to replica sets, not secondaryPreferred.
In many cases, providing extra read capacity is not, by itself, justification for using read modes other than primary and primaryPreferred. Furthermore, sharding increases read and write capacity by distributing read and write operations across a group of machines.