Why I Migrated Away From MongoDB
I recently concluded a migration away from MongoDB to PostgreSQL for one of my apps - digiDoc. I’d like to tell you why I did so.
To be honest, the decision to use MongoDb was an ill-thought out one. Lesson learned - thoroughly research any new technology you introduce into your stack, know well the strengths and weaknesses thereof and evaluate honestly whether it fits your needs or not - no matter how much hype there is surrounding said technology. What follows is also not a litany of the usual compaints against MongoDB such as data corruption, global write lock, shard configuration and so on. For our use case, mongodb failed at a much more basic level.
digiDoc is all about converting paper documents like receipts and business cards into searchable database, and so a document database seemed like a logical fit(!). Alas, not being aware of the mathematics behind relational algebra, I could not see clearly the trap I was falling into - document databases are remarkably hard to run aggregations on and aggregating the data and presenting meaningrful statistics on your receipts is one of the core features of digiDoc. Without the powerful aggregation features that we take for granted in RDBMSs, I would constantly be fighting with unweildy map-reduce constructs when all I want is SUM(amount) FROM receipts WHERE <foo> GROUP BY <bar>
. I even contributed some patches to mongoid-map-reduce but the whole experience of aggregating data with mongodb was so ugh that I couldn’t bring myself to work on the app beyond a point. That is of course, a bad place to be in.
Secondly, with a document database, you lose the independence of your data access paths. People keep complaining that JOIN
s make your data hard to scale. Well, the converse is also true - Not having JOIN
s makes your data an intractable lump of mud. Consider a simple thing like an audit trail. In a document database, normally, this would be a set of embedded documents in the item being audited. Now, lets say you want to see a list of all the actions performed by a particular user. Boing! Trapped. You have to load every document in the database and extract the udit trail from it, then filter it in your app for the user you’re looking for. Just the thought of what that would do to my hardware was enough to turn me off the whole idea. JOIN
s are cool! And guess which problem you are more likely to have - needing joins, or scaling beyond facebook?
Thirdly, mongodb is quite feature poor. Perhaps this has changed in the last few months, but when I last looked something as simple as case-insensitive search did not exist. The recommended solution was to have a field in the model with all your search data in it in lower case. Now my model, which I have carefully constructed so as to adhere to the Single Responsibility Principal needs to have callback hooks to save this search string everytime it is updated. And if I add a new field to the model? Time to regenerate all the search strings. I can only come to the conclusion that mongodb is a well-funded and elaborate troll.
Fourthly, and this one completely blew my mind - somewhere along the stack of mongodb, mongoid and mongoid-map-reduce, somewhere there, type information was being lost. I thought we were scaling hard when one of our customers suddenly had 1111 documents overnight. Imagine my disappointment when I realised it was actually four 1s, added together. They’d become strings along the way. Now in this case, perhaos the fault was mine but somehow, I can’t see this happening with Postgres. And when you add up all the hours it takes to deal with these niggling problems and the surprising lack of features, it all leads to some pretty obvious conclusions.
Fifthly, what was I getting in return for dealing with all this? Web scale doesn’t interest me so much. digiDoc is tiny and RDBMS have proven themselves to work at whatever scale we’re likely to achieve. Then it might have been the lask of an enforced schema? Thinking about it though, schemas are wonderful. They take all the constraints about your data and put it in one place. Without a schema, this constraint checking would be spread all over my application. A document added a month ago and a document added yesterday could look completely different and I’d have no way of knowing. Such fuzzy schemaless data models encourage loose thinking and undisciplined object orientation.
I completed a migration to Postgres yesterday. Very happy. Aggregation is a breeze, search is a breeze and we’ve built some pretty powerful tag search and management features that do not even bear thinking about in mongodb. Postgres turns your data store into what it was always meant to be - a mere detail in the scheme of your app, not an overarching presence forcing you to adapt to its requirements.
Lesson learned - Be very circumspect when turning your back on 40 years of computer science.