Comparing MySQL to Postgres

Introduction

We are comparing MySQL to Postgres on a particular dataset and backend code.

Why?

We as a software service company do a crucial part of our clients workprocess automation by maintaining and improving their central application. The application is now over ten years old and as was common in that time, was written against a MySQL database. After the sale of MySQL we moved to MariaDB as so many companies have done, because MySQL is now no longer open source, and has moved to a subscription-based service for which must be paid.

In the course of our work, we frequently advise our clients on improvements to their systems. One of these was to move their MySQL database from a locally hosting MariaDB database to a managed cluster with failover nodes and and offsite backup. At the same time, we proposed to move to Postgres. Why we would want to do the latter is the object of this piece.

Backed by numbers

To convince our client (and ourselves) better we decided to set up a system where we could create statistics. After copying the MySQL/MariaDB data to postgres (that is the subject of another post), we created a management command that would query both databases with exactly the same set of data and measure the results.

The Set-up

The management command is set up as follows:

  1. The database model to query
  2. The operation to test with, read or write
  3. The number of cycles to run
  4. The time interval between queries

The Process

Because we want to deliver a coherent document, the management command will run through all test cases we have defined. This will be different for every project. The conditions for this project are the following


        

Results

The results below are the outcome of a number of runs. The details of each run are split out into the following:

  • Output: the output of the command, with:
    • Database name
    • Average query speed across the number of cycles with the given interval
  • Notes: any other conditions that may not have been mentioned
  • Exceptions: a list of exceptions that occurred during the running of the queries, per database.

Calculations

We give the relative results in percentages after each run through the databases. The calculation looks like this:

    
        def calc(self, default: float, newcomer: float)-> str:
            if default > newcomer:
                c = round(((default-newcomer)/newcomer)*100)
                return f"The Newcomer is faster by {c}%"
            elif newcomer > default:
                c = round(((newcomer-default)/default)*100)
                return f"The Default is faster by {c}%"
            else:
                return "Default and Newcomer are exactly the same."
        

Results per Run

{% for d in data %}

Model: {{ d.model.name }}, Operation: {{ d.operation }}

{% for k,v in d.output.items %}

Database: {{ k }}

    {% for k,v in v.items %}
  • {{ k }}: {{ v }}
{% endfor %} {% for k,v in v.items %} {% if k == "graph" %} {% endif %} {% endfor %}

{% endfor %}
{{ d.comparison }}
Exceptions
{% for k,v in d.exceptions.items %}
Database {{ k }}
{% if not v %}

No exceptions

{% else %}
    {% for i in v %}
                        Model: {{ i.model }}
                        PK: {{ i.pk }}
                        Error: {{ i.error }}
                    
    {% endfor %}
{% endif %} {% endfor %}
Notes

{{ d.notes }}

Full list of handled ids
{{ d.pks }}
{% endfor %}