Here are three SQL queries and one simple challenge: Order them by speed assuming that city has an index.
1. SELECT city, SUM(inhabitants) FROM population GROUP BY city
2. SELECT city, SUM(inhabitants) FROM population GROUP BY city ORDER BY city
3. SELECT city, SUM(inhabitants) FROM population GROUP BY city ORDER BY city DESC
Obviously, number one wins: It has no sorting and the others should be nearly the same, because both require sorting but it may be done using an index.
But... did I tell you that I'm running on MySQL? Thats important, because it stops the logic from working. Actually, number one and two have exactly the same speed while number three is significant slower (given enough data to show any difference).
MySQL always adds an implicit ORDER BY to every GROUP BY statement as stated on a secret documentation page:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well.
But MySQL is nice when expicit specifying the same columns as GROUP BY and ORDER BY:
If you include an ORDER BY clause explicitly that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs.
I was surprised when I recently discovered that GROUP BY some_col ORDER BY NULL part within a query - written by a guy who has been very deep into MySQL for years and within a query which handles a lot of data makeing that little extension really importent to speed up things.
SELECT city, SUM(inhabitants) FROM population GROUP BY city ORDER BY NULL
Always using ORDER BY NULL unless the application requires the data to be sorted will decrease the processing time, but why is number three the slowest? It should be as fast as the others. Well, it should, but there is another MySQL rule which - I think - is unknown to most users:
MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause [...] (if) you have different ORDER BY and GROUP BY expressions.
Number three's sorting can't use any index and thus will be slower than the others.
I thought about adding an explicit ORDER BY to every statement using GROUP BY from now on to clearly show that this query is sorted (or not, if it's a ORDER BY NULL), but doing so would also introduce some drawbacks. Every developer walking through this code would see, that the query is sorted. What if some change request requires other or additional sorting? One might accidently change the ORDER BY arguments and introduce an additional result sort - slowing down everything. Using a ORM might help, but I wonder if DBIx::Class and others insert ORDER BY NULL unless sorting is requested.