Published 2015-09-14 14:48:33 | Source: compiled from the web
This document describes the design and pattern of a content management system using MongoDB modeled on the popular Drupal CMS.
You are designing a content management system (CMS) and you want to use MongoDB to store the content of your sites.
To build this system you will use MongoDB’s flexible schema to store all content “nodes” in a single collection regardless of type. This guide will provide prototype schema and describe common operations for the following primary node types: basic pages, blog entries, and photos.
This solution does not describe schema or process for storing or using navigational and organizational information.
Although documents in the nodes collection contain content of different types, all documents have a similar structure and a set of common fields. Consider the following prototype document for a “basic page” node type:
{
  _id: ObjectId(…),
  nonce: ObjectId(…),
  metadata: {
    type: 'basic-page',
    section: 'my-photos',
    slug: 'about',
    title: 'About Us',
    created: ISODate(...),
    author: { _id: ObjectId(…), name: 'Rick' },
    tags: [ ... ],
    detail: { text: '# About Us\n…' }
  }
}
Most fields are descriptively titled. The section field identifies groupings of items, such as a photo gallery or a particular blog. The slug field holds a URL-friendly representation of the node, usually unique within its section, that is used to generate URLs.
All documents also have a detail field that varies with the document type. For the basic page above, the detail field might hold the text of the page. For a blog entry, the detail field might hold a sub-document. Consider the following prototype:
{
  …
  metadata: {
    …
    type: 'blog-entry',
    section: 'my-blog',
    slug: '2012-03-noticed-the-news',
    …
    detail: {
      publish_on: ISODate(…),
      text: 'I noticed the news from Washington today…'
    }
  }
}
Photos require a different approach. Because photos can be much larger than these documents, it’s important to separate the binary photo storage from the node’s metadata.
GridFS provides the ability to store larger files in MongoDB. GridFS stores data in two collections, in this case, cms.assets.files, which stores metadata, and cms.assets.chunks which stores the data itself. Consider the following prototype document from the cms.assets.files collection:
{
  _id: ObjectId(…),
  length: 123...,
  chunkSize: 262144,
  uploadDate: ISODate(…),
  contentType: 'image/jpeg',
  md5: 'ba49a...',
  metadata: {
    nonce: ObjectId(…),
    slug: '2012-03-invisible-bicycle',
    type: 'photo',
    section: 'my-album',
    title: 'Kitteh',
    created: ISODate(…),
    author: { _id: ObjectId(…), name: 'Jared' },
    tags: [ … ],
    detail: {
      filename: 'kitteh_invisible_bike.jpg',
      resolution: [ 1600, 1600 ], … }
  }
}
Note
This document embeds the basic node document fields, which allows you to use the same code to manipulate nodes, regardless of type.
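To make the split between cms.assets.files and cms.assets.chunks more concrete, the following sketch estimates how many chunk documents a file of a given length occupies. It assumes the 262144-byte chunkSize shown in the prototype above; the function name chunk_count is hypothetical and is not part of PyMongo or GridFS.

```python
def chunk_count(length, chunk_size=262144):
    """Estimate how many GridFS chunk documents a file of `length` bytes needs.

    Hypothetical helper for illustration: every chunk holds `chunk_size`
    bytes except possibly the last one, so this is a ceiling division.
    """
    if length < 0 or chunk_size <= 0:
        raise ValueError('length must be >= 0 and chunk_size must be > 0')
    return (length + chunk_size - 1) // chunk_size


if __name__ == '__main__':
    # A 1,000,000-byte photo spans several chunk documents.
    print(chunk_count(1000000))
```

A larger photo therefore adds documents to cms.assets.chunks while the node metadata in cms.assets.files stays a single small document.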
This section outlines a number of common operations for building and interacting with the metadata and asset layer of the CMS for all node types. All examples in this document use the Python programming language and the PyMongo driver for MongoDB, but you can implement this system using any language you choose.
The most common operations inside of a CMS center on creating and editing content. Consider the following insert_one() operation:
db.cms.nodes.insert_one({
    'nonce': ObjectId(),
    'metadata': {
        'section': 'my-blog',
        'slug': '2012-03-noticed-the-news',
        'type': 'blog-entry',
        'title': 'Noticed in the News',
        'created': datetime.utcnow(),
        'author': { '_id': user_id, 'name': 'Rick' },
        'tags': [ 'news', 'musings' ],
        'detail': {
            'publish_on': datetime.utcnow(),
            'text': 'I noticed the news from Washington today…' }
    }
})
Once inserted, your application must have some way of preventing multiple concurrent updates. The schema uses the special nonce field to help detect concurrent edits. Because the nonce is part of the query portion of the update operation, the update matches no document if another editor has already changed the node, and the application can report the editing collision. Consider the following update:
def update_text(section, slug, nonce, text):
    # Match on the nonce so the update only applies to the version the editor read
    result = db.cms.nodes.update_one(
        { 'metadata.section': section,
          'metadata.slug': slug,
          'nonce': nonce },
        { '$set': { 'metadata.detail.text': text, 'nonce': ObjectId() } })
    # No match means the node is missing or its nonce was changed by another editor
    if result.matched_count == 0:
        raise ConflictError()
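The nonce check is a form of optimistic concurrency control. The following sketch is a minimal in-memory illustration of that pattern, not PyMongo code: the store dictionary, the (section, slug) key tuples, and the uuid-based nonces are assumptions that stand in for the cms.nodes collection and its ObjectId values.

```python
import uuid


class ConflictError(Exception):
    """Raised when the stored nonce no longer matches the editor's copy."""


def update_text(store, key, nonce, text):
    """Compare-and-swap on an in-memory stand-in for cms.nodes.

    `store` maps a (section, slug) tuple to a node dict. The edit is
    applied only if `nonce` still matches the stored value.
    """
    node = store.get(key)
    if node is None or node['nonce'] != nonce:
        raise ConflictError(key)
    node['metadata']['detail']['text'] = text
    # A fresh nonce invalidates every copy that was read before this edit.
    node['nonce'] = uuid.uuid4().hex
    return node['nonce']


if __name__ == '__main__':
    key = ('my-blog', '2012-03-noticed-the-news')
    store = {key: {'nonce': 'n0', 'metadata': {'detail': {'text': 'draft'}}}}
    update_text(store, key, 'n0', 'first editor wins')
    try:
        update_text(store, key, 'n0', 'second editor is stale')
    except ConflictError:
        print('conflict detected')
```

The second editor still holds the old nonce, so the edit is rejected rather than silently overwriting the first editor's text.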
You may also want to perform metadata edits to the item such as adding tags:
db.cms.nodes.update_one(
    { 'metadata.section': section, 'metadata.slug': slug },
    { '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } })
In this example the $addToSet operator will only add values to the metadata.tags field if they do not already exist in the tags array. Because the operation gives the same result no matter how often or in what order editors apply it, there’s no need to supply or update the nonce.
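As a rough illustration of why this update is safe without the nonce, the following sketch mimics the effect of $addToSet with $each on a plain Python list. The function add_to_set is a hypothetical stand-in for the server-side behavior, not a PyMongo call.

```python
def add_to_set(existing, new_values):
    """Return `existing` plus each of `new_values` that is not already present.

    Hypothetical emulation of $addToSet with $each: existing order is
    kept and duplicates are skipped.
    """
    result = list(existing)
    for value in new_values:
        if value not in result:
            result.append(value)
    return result


if __name__ == '__main__':
    tags = ['news', 'funny']
    tags = add_to_set(tags, ['interesting', 'funny'])
    # Applying the same update again leaves the list unchanged.
    assert add_to_set(tags, ['interesting', 'funny']) == tags
    print(tags)
```

Repeating the operation has no further effect, which is why two editors adding the same tags cannot corrupt the array.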
To support updates and queries on the metadata.section and metadata.slug fields, and to ensure that two editors don’t create two documents with the same section and slug, create a unique compound index. Use the following operation at the Python/PyMongo console:
>>> db.cms.nodes.create_index([
...     ('metadata.section', 1), ('metadata.slug', 1)], unique=True)
The unique=True option prevents two documents from colliding. If you want an index to support queries on the above fields and the nonce field, create the following index:
>>> db.cms.nodes.create_index([
...     ('metadata.section', 1), ('metadata.slug', 1), ('nonce', 1) ])
However, in most cases, the first index will be sufficient to support these operations.
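The effect of a unique compound index can be pictured with a small in-memory sketch. The class and exception names below are hypothetical and only imitate the constraint; the real enforcement happens inside MongoDB.

```python
class DuplicateKey(Exception):
    """Raised when a (section, slug) pair is already taken."""


class UniqueSectionSlug:
    """Toy stand-in for a unique index on (metadata.section, metadata.slug)."""

    def __init__(self):
        self.keys = set()

    def insert(self, section, slug):
        key = (section, slug)
        # Only the combination has to be unique, not each field on its own.
        if key in self.keys:
            raise DuplicateKey(key)
        self.keys.add(key)


if __name__ == '__main__':
    index = UniqueSectionSlug()
    index.insert('my-blog', 'about')
    index.insert('my-album', 'about')      # same slug, different section
    try:
        index.insert('my-blog', 'about')   # same section and slug
    except DuplicateKey:
        print('duplicate rejected')
```

The same slug may appear in different sections, which matches the idea that a slug is unique within its section.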
To upload a new photo, use the following operation, which builds upon the basic update procedure:
def upload_new_photo(
    input_file, section, slug, title, author, tags, detail):
    fs = GridFS(db, 'cms.assets')
    # Mark the file as locked while its chunks are being written
    with fs.new_file(
        content_type='image/jpeg',
        metadata=dict(
            type='photo',
            locked=datetime.utcnow(),
            section=section,
            slug=slug,
            title=title,
            created=datetime.utcnow(),
            author=author,
            tags=tags,
            detail=detail)) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk: break
            upload_file.write(chunk)
    # unlock the file once it has been closed and fully written
    db.cms.assets.files.update_one(
        {'_id': upload_file._id},
        {'$set': { 'metadata.locked': None } } )
Because uploading the photo spans multiple documents and is a non-atomic operation, you must “lock” the file during upload by writing datetime.utcnow() in the locked field of its metadata. This helps when there are multiple concurrent editors and lets the application detect stalled file uploads. The following operation assumes that, for photo upload, the last update will succeed:
def update_photo_content(input_file, section, slug):
    # Requires: from datetime import datetime, timedelta
    fs = GridFS(db, 'cms.assets')
    # Delete the old version if it's unlocked or was locked more than 5
    # minutes ago
    file_obj = db.cms.assets.files.find_one(
        { 'metadata.section': section,
          'metadata.slug': slug,
          'metadata.locked': None })
    if file_obj is None:
        threshold = datetime.utcnow() - timedelta(seconds=300)
        file_obj = db.cms.assets.files.find_one(
            { 'metadata.section': section,
              'metadata.slug': slug,
              'metadata.locked': { '$lt': threshold } })
    if file_obj is None: raise FileDoesNotExist()
    fs.delete(file_obj['_id'])
    # update content, keep metadata unchanged
    file_obj['metadata']['locked'] = datetime.utcnow()
    with fs.new_file(**file_obj) as upload_file:
        while True:
            chunk = input_file.read(upload_file.chunk_size)
            if not chunk: break
            upload_file.write(chunk)
    # unlock the file once it has been closed and fully written
    db.cms.assets.files.update_one(
        {'_id': upload_file._id},
        {'$set': { 'metadata.locked': None } } )
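The lock test used above (unlocked, or locked more than five minutes ago) can be isolated in a small helper. This is a sketch under the assumption that the lock is stored as a naive UTC datetime or None; the names is_replaceable and LOCK_TIMEOUT are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timeout, matching the 300-second threshold used for stalled uploads.
LOCK_TIMEOUT = timedelta(seconds=300)


def is_replaceable(locked, now, timeout=LOCK_TIMEOUT):
    """Return True if a file may be replaced.

    A file is replaceable when it is unlocked (`locked` is None) or when
    its lock is older than `timeout`, which suggests a stalled upload.
    """
    return locked is None or locked < now - timeout


if __name__ == '__main__':
    now = datetime(2012, 3, 1, 12, 0, 0)
    print(is_replaceable(None, now))                        # unlocked
    print(is_replaceable(now - timedelta(minutes=6), now))  # stale lock
    print(is_replaceable(now - timedelta(minutes=1), now))  # upload in progress
```

Passing the current time in explicitly keeps the check easy to test and mirrors the two find_one queries in the function above.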
As with the basic operations, you can use a much simpler operation to edit the tags:
db.cms.assets.files.update_one(
    { 'metadata.section': section, 'metadata.slug': slug },
    { '$addToSet': { 'metadata.tags': { '$each': [ 'interesting', 'funny' ] } } })
Create a unique index on { metadata.section: 1, metadata.slug: 1 } to support the above operations and prevent users from creating or updating the same file concurrently. Use the following operation in the Python/PyMongo console:
>>> db.cms.assets.files.create_index([
...     ('metadata.section', 1), ('metadata.slug', 1)], unique=True)
To locate a node based on the value of metadata.section and metadata.slug, use the following find_one operation:
node = db.cms.nodes.find_one({'metadata.section': section, 'metadata.slug': slug })
Note
The unique index on (section, slug), created to support the update operation, is sufficient to support this operation as well.
To locate an image based on the value of metadata.section and metadata.slug, use the following get_version operation:
fs = GridFS(db, 'cms.assets')
# Dotted field names are not valid keyword names, so pass them as an unpacked dict
with fs.get_version(**{'metadata.section': section, 'metadata.slug': slug }) as img_fpo:
    # do something with the image file
    pass
Note
The unique index on (section, slug) in the cms.assets.files collection, created to support the update operation, is sufficient to support this operation as well.
To retrieve a list of nodes based on their tags, use the following query:
nodes = db.cms.nodes.find({'metadata.tags': tag })
Create an index on the metadata.tags field in the cms.nodes collection to support this query:
>>> db.cms.nodes.create_index('metadata.tags')
To retrieve a list of images based on their tags, use the following operation:
fs = GridFS(db, 'cms.assets')
image_file_objects = db.cms.assets.files.find({'metadata.tags': tag })
for image_file_object in image_file_objects:
    image_file = fs.get(image_file_object['_id'])
    # do something with the image file
Create an index on the metadata.tags field in the cms.assets.files collection to support this query:
>>> db.cms.assets.files.create_index('metadata.tags')
Use the following operation to generate a list of recent blog posts sorted in descending order by date, for use on the index page of your site, or in an .rss or .atom feed.
articles = db.cms.nodes.find({
    'metadata.section': 'my-blog',
    'metadata.detail.publish_on': { '$lt': datetime.utcnow() } })
articles = articles.sort('metadata.detail.publish_on', -1)
Note
In many cases you will want to limit the number of nodes returned by this query.
Create a compound index on the { metadata.section: 1, metadata.detail.publish_on: -1 } fields to support this query and sort operation.
>>> db.cms.nodes.create_index(
...     [ ('metadata.section', 1), ('metadata.detail.publish_on', -1) ])
Note
For all sort or range queries, ensure that the field used for the sort or range operation is the final field in the index.
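The reason for putting the equality field first and the sort or range field last can be seen in a small in-memory sketch. It treats a compound index as a sorted list of (section, published) tuples, with integers standing in for timestamps; this is a simplified model rather than a description of how MongoDB stores indexes, and the function name is hypothetical.

```python
from bisect import bisect_left


def recent_in_section(index_keys, section, before):
    """Return (section, published) keys with published < before, newest first.

    `index_keys` must be sorted ascending, the way a compound index on
    (section, published) would order its entries.
    """
    # All entries for one section are contiguous, so two binary searches
    # find the boundaries of the matching range.
    lo = bisect_left(index_keys, (section,))
    hi = bisect_left(index_keys, (section, before))
    # Walking the slice backwards yields descending order without a sort.
    return index_keys[lo:hi][::-1]


if __name__ == '__main__':
    keys = sorted([
        ('my-blog', 3), ('my-album', 2), ('my-blog', 1),
        ('my-blog', 7), ('news', 5),
    ])
    print(recent_in_section(keys, 'my-blog', before=5))
```

Because the matching entries form one contiguous run that is already ordered by the second field, the query can be answered by reading a single range of the index.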
In a CMS, read performance is more critical than write performance. To achieve the best read performance in a sharded cluster, ensure that the mongos can route queries to specific shards.
Also remember that MongoDB cannot enforce a unique index across shards unless the shard key is a prefix of that index. Using a compound shard key that consists of metadata.section and metadata.slug preserves the uniqueness semantics described above.
Warning
Consider the actual use and workload of your cluster before configuring sharding.
Use the following operations at the Python/PyMongo shell:
>>> db.command('shardCollection', 'cms.nodes',
...     key={ 'metadata.section': 1, 'metadata.slug': 1 })
{ "collectionsharded": "cms.nodes", "ok": 1}
>>> db.command('shardCollection', 'cms.assets.files',
...     key={ 'metadata.section': 1, 'metadata.slug': 1 })
{ "collectionsharded": "cms.assets.files", "ok": 1}
To shard the cms.assets.chunks collection, you must use the files_id field, which holds the _id of the parent document in cms.assets.files, as the shard key. The following operation will shard the collection:
>>> db.command('shardCollection', 'cms.assets.chunks',
...     key={ 'files_id': 1 })
{ "collectionsharded": "cms.assets.chunks", "ok": 1}
Sharding on the files_id field ensures routable queries because all reads from GridFS must first look up the document in cms.assets.files and then look up the chunks separately.
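As a simplified picture of routing by shard key, the following sketch assigns chunk documents to shards by comparing files_id against a sorted list of split points. The integer identifiers, the split points, and the function shard_for are assumptions for illustration; real clusters use ObjectId values and ranges managed by MongoDB itself.

```python
from bisect import bisect_right


def shard_for(files_id, split_points):
    """Return the index of the shard whose key range contains `files_id`.

    `split_points` is a sorted list of range boundaries. Each boundary
    is the inclusive lower bound of the next shard's range.
    """
    return bisect_right(split_points, files_id)


if __name__ == '__main__':
    split_points = [100, 200]  # three hypothetical shards
    chunks = [{'files_id': 150, 'n': n} for n in range(4)]
    shards = {shard_for(chunk['files_id'], split_points) for chunk in chunks}
    # Every chunk of the same file resolves to the same shard.
    print(shards)
```

In this model every chunk of one file shares a files_id and so resolves to the same range, which is why a read for a single file's chunks can be routed to one shard.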