The NoSQL Madness – Part 1

This article has been in draft for quite some time, and since one strike plus one storm led to a change in a flight schedule… what better moment for completing it? 😉

History

A few years ago, many developers somehow reached the conclusion that SQL databases are bad and the right solution would be no SQL databases. All this madness started when developers behind several highly popular services discovered that MySQL was not as scalable as they had hoped for. Shortly after that, they became pretty aggressive, telling the web that new solutions must be created. Naturally, SQL (programming language) is not a synonym to MySQL (database server). But as MySQL is so widely deployed, we will use it for exemplification in this article.

What “Not Scalable” Actually Meant

As a matter of fact, the problem was that the database server was not able to handle queries as fast/as many as expected. Therefore, they thought about adding multiple MySQL servers and discovered it was impractical. Having a master server and several slaves to update from it was a possible scenario, but it was neither ideal nor easy to handle. All queries that made changes to the database (update, delete, insert) should have been sent to the master and all queries that got data to one of the slaves. This added extra complexity to the code. But even this setup was limited because when the software made many updates to the database the master could no longer handle all the change requests.

But Were Limits Truly Reached?

With few exceptions, NO. In fact, there are some things that can be done to address such limitations. First, it’s data sharding. Since applications are usually modular, you can split data on multiple fully independent MySQL servers. Of course, each infrastructure shard can have a master and multiple slaves. Actually, you should start sharding after fixing all issues with queries and table definitions. In many cases, by optimizing the code you can double the overall performance, if not triple it.

How NoSQL Got In

The basic idea about NoSQL was to make a database that does not have the MySQL limitations. Unfortunately, most of the people who actually started to develop them had a poor theoretical background or database design experience. And, as it happens with developers in general, everyone thought that they will create something better. This is how we actually got to several hundred (not joking!) databases that, according to their creators, could replace MySQL to some extent.

Of course, they all seemed to lack the usability perspective, but to be completely fair, the SQL language is extremely flexible and you can do a lot with it. Most importantly, they also had design and implementation flaws. Even if all these developments advertised the power of the cloud and distribution, most of them did not have the capabilities to operate in a distributed environment. As a matter of fact, they used techniques that were taken over from MySQL.

Let’s focus on the information distribution across computing nodes and replication. In most implementations, either it did not exist (read: user had the liberty to implement in their code), or they were rather basic. Many “highly scalable” distributed databases still relied on master-slave replication (basically quite similar to MySQL) or used centralized controllers.

The Present

NoSQL Status

Most NoSQL replacements still deal with the following problems:

  • Limited flexibility – Each developer wanted to build something useful to him, so the transport protocol is non-standard, internals are ad-hoc, and so on.
  • Poor replication – Any data replication mechanism that cannot be used in infrastructures with thousands of computing nodes cannot be called scalable.
  • Faulty distribution – If data distribution is not transparent to the database user or if it relies on centralized entity, then we have a problem.
  • Exposure to disaster – It’s actually funny that a lot of people are willing to use databases that store information in a way that is not even documented.

And of course there are other issues as well, but these will do for now.

Choose The Right Database For The Job

Our recommendation before choosing between a MySQL and a NoSQL database is that you try to answer the following questions:

Do you need ACID?

If so, MySQL is your only choice. If you need ACID and you estimate that MySQL is not enough for you, today you can go for one of the following solutions:

  • Use the right hardware – Technically, MySQL is severely limited by I/O and the new SSD can do miracles. Going even further, we have memory based PCI cards such as Fussion-IO that provide an even better bandwidth and latency. MySQL performance in multi-processor environments has improved to a great extent. Also, the typical memory size, memory bandwidth, and CPU power dramatically improved as well in the last years. So, with the right hardware, MySQL can deliver a lot.
  • Use sharding – Split your data in such a way that you can put it on independent MySQL computing nodes.
  • Use the right queries – This is actually related to the following paragraph.

Of course, there are plenty of other techniques and usually you must use all of them. Should you feel that more details would be welcome, please insert your opinion in the comments field and we will post a dedicated article for these scenarios.

Do you understand the data structures that are optimal for storing your data?

This is often the most important thing developers miss. Internally, any database is organized using standard data structures. If you do not understand what is optimum for the data you want to store (actually, this is not only about data, but also about the operations you want to perform), then you will not be able to choose the right database.

Just think that you are going to store all that information in the memory – how are you going to do it? It is common sense that developers who worked their entire life with high-level languages might slightly be put to disadvantage, but that’s no excuse. Unfortunately, relational databases made developers ignore how data is actually stored because the abstraction was too good. Due to this, few people understand what happens behind the scenes. However, if you want scalability with any database, this is a mandatory requirement. Even with MySQL, if you want to optimize, you must understand how MySQL actually stores data.

After having discovered these data structures, you must accept that no database can perform better than the optimum algorithm. For example, if the insert into the data structure is theoretically O(n), no database using that structure can do it better. So, supposing you need a lot of inserts, data structure is hardly the best choice you can make.

A sequel to this article will offer advice on how to determine the right database engine and will include theory on what is behind distributed databases. You don’t want to miss our next posts! 🙂

5 Comments

You can post comments in this post.


  • What about PostgreSQL? ACID compliance, more robust and built in replication makes it a stronger choice than MySQL in a lot of ways. “MySQL is your only choice”?

    http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#ACID_Compliance

    Steve 12 years ago Reply


  • The article is from the perspective of someone using MySQL. That’s why the disclaimer at the start:

    Naturally, SQL (programming language) is not a synonym to MySQL (database server). But as MySQL is so widely deployed, we will use it for exemplification in this article.

    Obviously you can use Pgsql or any other SQL database. Technically you can achieve with MySQL the same you can do with Pgsql and viceversa, it’s a matter of preference.

    Blog wizard 12 years ago Reply


    • Personally I think it’s very bad practice to confound SQL with MySQL and is also probably a dilution of some company’s MySQL-related trademarks. It’s also very bad writing for a technical article. Be accurate! Don’t just say “we’re substituting the name of a programming language for the name of a software package, just pretend we’re talking about the language” because that’s a terrible excuse for above.

      I’ll be getting my SQL (haha) information elsewhere from now on, I think.

      Steve 12 years ago Reply


  • Steve, it’s not about confusing a language specification (SQL) with a software that implements it (MySQL).
    For these articles we cannot compare NoSQL databases with a language specification. We simply took a popular relational database as an example, a relational database that implements the SQL standard and has ACID properties. This is MySQL.

    In fact, the generic name NoSQL is wrong. Technically a database can fit into the NoSQL model, but it still implements the SQL language. SQL itself is not the problem in achieving extreme scalability and distribution, but ACID properties.

    It’s a matter of choice if you read or not the following articles.

    Blog wizard 12 years ago Reply


  • Check out at all our baby room remodel. I think it
    turned out great for purely 5 times of work.

    Vincent 9 years ago Reply


Post A Reply