Published 2015-09-14 14:50:48 | Source: compiled from the web

GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB.

Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, [1] and stores each of those chunks as a separate document. By default, GridFS limits chunk size to 256k. GridFS uses two collections to store files: one collection stores the file chunks, and the other stores file metadata.
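The chunking described above can be sketched in a few lines of plain JavaScript. This is an illustration only, not driver code: the helper name splitIntoChunks and the string "demo-file" used as a file identifier are hypothetical, and the 256k default is the value stated in this document.

```javascript
// Illustrative sketch: splitIntoChunks is a hypothetical helper, not a driver API.
const DEFAULT_CHUNK_SIZE = 256 * 1024; // 256k default, as stated in this document

// Split a byte array into chunk-like documents numbered from 0.
function splitIntoChunks(filesId, bytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunkDocs = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunkDocs.push({
      files_id: filesId, // id of the parent document in the files collection
      n: n,              // sequence number of this chunk
      data: bytes.subarray(offset, offset + chunkSize), // last chunk may be shorter
    });
  }
  return chunkDocs;
}

// A 600k payload yields two full chunks and one partial chunk.
const demoChunks = splitIntoChunks("demo-file", new Uint8Array(600 * 1024));
console.log(demoChunks.map((c) => c.data.length)); // [ 262144, 262144, 90112 ]
```

Only the final chunk is shorter than the chunk size; every other chunk is exactly chunkSize bytes long.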

When you query a GridFS store for a file, the driver or client reassembles the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.
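To see why skipping into the middle of a file is cheap, consider how a byte range maps onto chunk sequence numbers. The following sketch assumes a fixed chunk size; the helper name chunksForRange and its return shape are hypothetical and do not correspond to any driver API.

```javascript
// Hypothetical helper: map the byte range [start, end) onto chunk numbers.
function chunksForRange(start, end, chunkSize) {
  if (start < 0 || end <= start) {
    throw new RangeError("empty or invalid range");
  }
  return {
    firstN: Math.floor(start / chunkSize),      // first chunk that must be read
    lastN: Math.floor((end - 1) / chunkSize),   // last chunk that must be read
    skipInFirst: start % chunkSize,             // bytes to skip inside the first chunk
    takeFromLast: ((end - 1) % chunkSize) + 1,  // bytes to keep from the last chunk
  };
}

// Bytes 300000 up to (not including) 700000 of a file stored with 256k chunks.
console.log(chunksForRange(300000, 700000, 256 * 1024));
```

Only chunks firstN through lastN need to be fetched, so a reader can serve the range without loading the rest of the file.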

GridFS is useful not only for storing files that exceed 16MB but also for storing any files that you want to access without having to load the entire file into memory. For more information on when GridFS is appropriate, see “When should I use GridFS?”.

[1] The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.

Implement GridFS

To store and retrieve files using GridFS, use either of the following:

  • A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
  • The mongofiles command-line tool, run from the system command line. See mongofiles.

GridFS Collections

GridFS stores files in two collections:

  • chunks stores the binary chunks. For details, see the chunks section.
  • files stores the file’s metadata. For details, see the files section.

GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections whose names are prefixed by the bucket name fs:

  • fs.files
  • fs.chunks

You can choose a different bucket name than fs, and create multiple buckets in a single database.
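The naming rule is simple enough to state as code. The function below is a hypothetical illustration of the prefixing convention, not part of any driver; "contracts" is just an example bucket name.

```javascript
// Hypothetical helper: derive the two collection names for a bucket.
function bucketCollections(bucketName = "fs") {
  return {
    files: `${bucketName}.files`,   // file metadata
    chunks: `${bucketName}.chunks`, // binary chunks
  };
}

console.log(bucketCollections());            // default bucket: fs.files and fs.chunks
console.log(bucketCollections("contracts")); // contracts.files and contracts.chunks
```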

chunks

Each document in the chunks collection represents a distinct chunk of a file stored in GridFS. The following is a prototype document from the chunks collection:

{
  "_id" : <ObjectID>,
  "files_id" : <ObjectID>,
  "n" : <num>,
  "data" : <binary>
}

A document from the chunks collection contains the following fields:

chunks._id

The unique ObjectID of the chunk.

chunks.files_id

The _id of the “parent” document, as specified in the files collection.

chunks.n

The sequence number of the chunk. GridFS numbers all chunks, starting with 0.

chunks.data

The chunk’s payload as a BSON binary type.

The chunks collection uses a compound index on files_id and n, as described in GridFS Indexes.

files

Each document in the files collection represents a file in the GridFS store. Consider the following prototype of a document in the files collection:

{
  "_id" : <ObjectID>,
  "length" : <num>,
  "chunkSize" : <num>,
  "uploadDate" : <timestamp>,
  "md5" : <hash>,
  "filename" : <string>,
  "contentType" : <string>,
  "aliases" : <string array>,
  "metadata" : <dataObject>
}

Documents in the files collection contain some or all of the following fields. Applications may create additional arbitrary fields:

files._id

The unique ID for this document. The _id is of the data type you chose for the original document. The default type for MongoDB documents is BSON ObjectID.

files.length

The size of the document in bytes.

files.chunkSize

The size of each chunk. GridFS divides the document into chunks of the size specified here. The default size is 256 kilobytes.

files.uploadDate

The date the document was first stored by GridFS. This value has the Date type.

files.md5

An MD5 hash returned from the filemd5 API. This value has the String type.

files.filename

Optional. A human-readable name for the document.

files.contentType

Optional. A valid MIME type for the document.

files.aliases

Optional. An array of alias strings.

files.metadata

Optional. Any additional information you want to store.
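The length and chunkSize fields together determine how many chunk documents a file should have. The sketch below works that out for a made-up files document; the helper names and every field value in sampleFileDoc are hypothetical.

```javascript
// Hypothetical helpers: derive chunk layout from a files document.
function expectedChunkCount(fileDoc) {
  return Math.ceil(fileDoc.length / fileDoc.chunkSize);
}

function lastChunkLength(fileDoc) {
  const count = expectedChunkCount(fileDoc);
  if (count === 0) return 0; // an empty file has no chunks
  return fileDoc.length - (count - 1) * fileDoc.chunkSize;
}

// Made-up document for illustration; real documents are written by the driver.
const sampleFileDoc = {
  _id: "demo-file",
  length: 600 * 1024,     // 614400 bytes
  chunkSize: 256 * 1024,  // 262144 bytes
  uploadDate: new Date(),
  filename: "largething.mpg",
};

console.log(expectedChunkCount(sampleFileDoc)); // 3
console.log(lastChunkLength(sampleFileDoc));    // 90112
```

A check like this can be used to spot a file whose chunks collection entries are incomplete: the highest n should be one less than the expected count.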

GridFS Indexes

GridFS uses a unique, compound index on the chunks collection for files_id and n. The index allows efficient retrieval of chunks using the files_id and n values, as shown in the following example:

cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});

See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo shell:

db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
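Once the chunks for one files_id have been fetched, reassembly amounts to ordering them by n and concatenating the data fields. The following plain-JavaScript sketch works on an in-memory array of chunk-like objects rather than a live cursor; the helper name reassemble and the sample data are hypothetical.

```javascript
// Hypothetical helper: rebuild a file's bytes from its chunk documents.
function reassemble(chunkDocs) {
  // Order by sequence number, as sort({n: 1}) would on the server.
  const ordered = [...chunkDocs].sort((a, b) => a.n - b.n);

  // Sequence numbers must run 0, 1, 2, ... with no gaps or repeats.
  ordered.forEach((chunk, i) => {
    if (chunk.n !== i) {
      throw new Error(`missing or duplicate chunk at n=${i}`);
    }
  });

  const total = ordered.reduce((sum, chunk) => sum + chunk.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of ordered) {
    out.set(chunk.data, offset); // copy this chunk's bytes into place
    offset += chunk.data.length;
  }
  return out;
}

// Chunks may arrive in any order; the result is the same.
const rebuilt = reassemble([
  { n: 1, data: Uint8Array.of(3, 4) },
  { n: 0, data: Uint8Array.of(1, 2) },
  { n: 2, data: Uint8Array.of(5) },
]);
console.log(Array.from(rebuilt)); // [ 1, 2, 3, 4, 5 ]
```

The unique index on files_id and n is what guarantees that, for a given file, the same sequence number never appears twice, so the gap check above only needs to catch missing chunks in practice.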

Example Interface

The following is an example of the GridFS interface in Java. The example is for demonstration purposes only. For API specifics, see the relevant driver documentation.

By default, the interface must support the default GridFS bucket, named fs, as in the following:

GridFS myFS = new GridFS(myDatabase); // returns the default GridFS bucket (i.e. the "fs" bucket)
myFS.storeFile(new File("/tmp/largething.mpg")); // saves the file to the "fs" GridFS bucket

Optionally, interfaces may support other additional GridFS buckets as in the following example:

GridFS myContracts = new GridFS(myDatabase, "contracts"); // returns GridFS bucket named "contracts"
myContracts.retrieveFile("smithco", new File("/tmp/smithco.pdf")); // retrieve GridFS object "smithco" from the "contracts" bucket