Thundergallery – Mongo Schema Design
by Bioshox on 05/19/2012
Before I get started, I’d like to reiterate that I’m a student (18 years old) currently studying Computer Systems & Software Development at university, and I’m learning MongoDB from scratch! Following the new blogging contest from MongoDB, I’m going to talk about the schema design I used in Thundergallery: http://blog.10gen.com/post/23237721457/blogging-contest-mongodb-schema-design
When developing the schema, I had to consider what the database was actually going to be used for, what functionality it would need to support and how it would power Thundergallery. I knew from the start that I was going to build GridFS into the system to store the binary data for images, which leads me on to the Data Life Cycle and Performance and Workload.
Data Life Cycle
The Data Life Cycle of Thundergallery is a simple one: mainly inserts, updates and deletes. A single action in Thundergallery often needs several of these operations across a series of collections. How much data is being stored within Mongo at any one time depends on the size of the images being uploaded.
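To make "several operations across a series of collections" a bit more concrete, here is a minimal Python sketch of removing one image. It uses plain in-memory lists as stand-ins for the collections so it runs without a database. fs.files and fs.chunks are the standard GridFS collections, but the layout of the images collection and the delete_image helper are assumptions for illustration, not Thundergallery's actual code.

```python
def delete_image(db, unique_id):
    """Remove an image and its GridFS data from every collection holding part of it.

    `db` is a dict of lists standing in for MongoDB collections (an assumption
    made so the sketch runs without a server).
    """
    # Find the GridFS file documents tagged with this unique_id.
    file_ids = {f["_id"] for f in db["fs.files"] if f.get("unique_id") == unique_id}

    # 1. Drop the binary chunks that belong to those files.
    db["fs.chunks"] = [c for c in db["fs.chunks"] if c["files_id"] not in file_ids]
    # 2. Drop the file metadata.
    db["fs.files"] = [f for f in db["fs.files"] if f["_id"] not in file_ids]
    # 3. Drop the gallery entry that pointed at the file.
    db["images"] = [i for i in db["images"] if i.get("unique_id") != unique_id]
    return len(file_ids)


# Sample data: simplified integer ids, placeholder chunk bytes.
db = {
    "images": [{"_id": 3, "unique_id": "content1656190032"}],
    "fs.files": [{"_id": 1, "unique_id": "content1656190032", "length": 11138}],
    "fs.chunks": [{"files_id": 1, "n": 0, "data": b"..."}],
}
removed = delete_image(db, "content1656190032")
```

One delete from the user's point of view touches three collections here, which is why the order matters: chunks first, then metadata, then the gallery entry, so a failure part-way through never leaves a gallery entry pointing at missing data.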
Performance and Workload
The database needs to keep up with multiple large files being stored as binary data at once. It needs to split files into chunks and stream them back out in a speedy manner.
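As a rough illustration of what splitting into chunks and streaming back out involves, here is a small Python sketch. It is not how GridFS is implemented internally, only the same idea in miniature: the 262144-byte chunk size is the chunkSize value that appears in the fs.files document, while split_into_chunks, stream_out and the sample payload are hypothetical names and data made up for this example.

```python
import io

CHUNK_SIZE = 262144  # 256 KiB, the chunkSize value stored in fs.files


def split_into_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (n, data) pairs, reading at most chunk_size bytes at a time."""
    n = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield n, data
        n += 1


def stream_out(chunks, out):
    """Write chunks to `out` in order of their sequence number n."""
    for _, data in sorted(chunks, key=lambda c: c[0]):
        out.write(data)


# Stand-in for an uploaded image: 600000 bytes, a little over two full chunks.
payload = bytes(i % 251 for i in range(600000))
chunks = list(split_into_chunks(io.BytesIO(payload)))

restored = io.BytesIO()
stream_out(reversed(chunks), restored)  # order is recovered from n
```

Because only one chunk is read or written at a time, memory use stays bounded by the chunk size no matter how large the image is.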
Because storing and streaming large files could put serious load on the database and server, various caching elements are in the pipeline for Thundergallery, to protect it from locking up if the system’s usage goes beyond anything I ever expected.
I also needed to look into how the database could be extended in the future. I knew the features I wanted to build into the system, and I also knew that a document database would make these features easier to implement later down the line.
Types of Queries
The queries Thundergallery uses are not that complex. At most, they find a value that exactly matches one posted from a form, with PHP behind the queries to help push the system along.
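An exact-match query in MongoDB is just a filter document of the form {field: value}. The sketch below shows that shape in Python, with an in-memory list standing in for a collection so it runs on its own. The helper names, the posted form data and the second sample document are all made up for illustration; Thundergallery itself does this from PHP.

```python
def exact_match_filter(field, posted_value):
    """Build a filter document matching one field against a posted form value."""
    # Form input is treated as plain text, so the filter can only ever
    # mean "equal to this string".
    return {field: str(posted_value)}


def find(collection, query):
    """In-memory stand-in for an equality find: return the documents where
    every key in the query equals the document's value."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]


images = [
    {"_id": 1, "unique_id": "content1656190032"},
    {"_id": 2, "unique_id": "content0000000000"},  # made-up second entry
]
posted = {"unique_id": "content1656190032"}  # hypothetical form submission

query = exact_match_filter("unique_id", posted["unique_id"])
matches = find(images, query)
```

Keeping the query to a single equality check like this also means a single-field index on the matched key would be enough to serve it, should the collection grow.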
Documents
GridFS creates its own document structure within MongoDB, but we also add a unique key (unique_id). This is matched against the entries in the images collection so the script can pull the actual images.
fs.files collection:
{
  "_id": ObjectId("4fb7fcebee3c47f326000001"),
  "unique_id": "content1656190032",
  "filename": "thundergallery_logo.png",
  "uploadDate": ISODate("2012-05-19T20:04:59.989Z"),
  "length": 11138,
  "chunkSize": 262144,
  "md5": "731420c6ee1a59689218ec4cca46b081"
}
images collection:
{
  "_id": ObjectId("4fb7fcebee3c47f326000003"),
  "unique_id": "content1656190032"
}
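To show how the shared unique_id ties the two documents together, here is a short Python sketch using the values from the documents above. It assumes the second, shorter document is the matching entry in the images collection, represents the ObjectIds as plain strings, and uses two hypothetical helpers (file_for_image and expected_chunks); it runs against in-memory data rather than a live database.

```python
import math

# The fs.files document from above (ObjectId shown as a plain string).
fs_files = [{
    "_id": "4fb7fcebee3c47f326000001",
    "unique_id": "content1656190032",
    "filename": "thundergallery_logo.png",
    "length": 11138,
    "chunkSize": 262144,
}]

# Assumed to be the matching entry in the images collection.
image_entry = {"_id": "4fb7fcebee3c47f326000003", "unique_id": "content1656190032"}


def file_for_image(image, files):
    """Return the fs.files document sharing the image's unique_id, or None."""
    return next((f for f in files if f["unique_id"] == image["unique_id"]), None)


def expected_chunks(file_doc):
    """Number of chunks needed to hold a file of this length."""
    return math.ceil(file_doc["length"] / file_doc["chunkSize"])


file_doc = file_for_image(image_entry, fs_files)
chunk_count = expected_chunks(file_doc)
```

For the logo above, 11138 bytes fits comfortably inside one 262144-byte chunk, so chunk_count comes out as 1. Larger uploads simply produce more chunks for the same single fs.files document.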
MongoDB was chosen for Thundergallery mainly because of its awesome GridFS feature, its scalability, its ability to drive web applications and the fact that it’s great at handling high volumes of load!
I hope this post is helpful and I urge you to check out Thundergallery over on GitHub: https://github.com/Bioshox/Thundergallery
You’ll also be able to find me at the next MongoDB User Group in London next month (June)!
I’d love some comments on how I can improve and further my learning, find me on Twitter: @imjacobclark
