I’ve always assumed GROUP BY was designed specifically for aggregate functions, and that in all other circumstances you should use ORDER BY. For example, we have three tables: orders, shippers, and employees:
Orders:
OrderID  CustomerID  EmployeeID  OrderDate   ShipperID
10248    90          5           1996-07-04  3
10249    81          6           1996-07-05  1
10250    34          4           1996-07-08  2

Shippers:
ShipperID  ShipperName       Phone
1          Speedy Express    (503) 555-9831
2          United Package    (503) 555-3199
3          Federal Shipping  (503) 555-9931

Employees:
EmployeeID  LastName   FirstName  BirthDate   Photo       Notes
1           Davolio    Nancy      1968-12-08  EmpID1.pic  Education includes a BA....
2           Fuller     Andrew     1952-02-19  EmpID2.pic  Andrew received his BTS....
3           Leverling  Janet      1963-08-30  EmpID3.pic  Janet has a BS degree...
We can then determine the number of orders sent by a shipper:
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders
LEFT JOIN Shippers
ON Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName;
GROUP BY helps here because it prevents duplicate shipper names in the result set. That is, as I understand it, without GROUP BY the joined rows would list a shipper's name once for every order that shipper handled, so a shipper with two orders would show up twice. GROUP BY gives us one row per shipper, with the aggregate computed for each.
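To make that concrete, here is a minimal sketch that runs the query above against an in-memory SQLite database using Python's standard sqlite3 module. The SQLite setup, the column types, and the extra order 99999 are my own assumptions for illustration; that extra row is hypothetical and exists only so that one shipper has more than one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT, Phone TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         EmployeeID INTEGER, OrderDate TEXT, ShipperID INTEGER);
    INSERT INTO Shippers VALUES
        (1, 'Speedy Express', '(503) 555-9831'),
        (2, 'United Package', '(503) 555-3199'),
        (3, 'Federal Shipping', '(503) 555-9931');
    -- The last order (99999) is hypothetical: it gives Speedy Express a second order.
    INSERT INTO Orders VALUES
        (10248, 90, 5, '1996-07-04', 3),
        (10249, 81, 6, '1996-07-05', 1),
        (10250, 34, 4, '1996-07-08', 2),
        (99999, 1, 1, '1996-07-09', 1);
""")

# Plain join, no aggregate: one row per order, so shipper names can repeat.
joined_names = sorted(
    name for (name,) in conn.execute("""
        SELECT Shippers.ShipperName FROM Orders
        LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
    """)
)

# Same join with COUNT and GROUP BY: one row per shipper name.
order_counts = dict(conn.execute("""
    SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
    FROM Orders
    LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
    GROUP BY ShipperName
"""))

print(joined_names)
print(order_counts)
```

With this data, the plain join lists Speedy Express twice, while the grouped query returns it once with a count of 2.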
Makes sense. But then I came across this query generated by an ORM (ActiveRecord in Rails, in this case):
SELECT users.* FROM users
INNER JOIN timesheets ON timesheets.user_id = users.id
WHERE (timesheets.submitted_at <= '2010-07-06 15:27:05.117700')
GROUP BY users.id
There are no aggregate functions in that SQL statement. Shouldn’t it be using ORDER BY instead?
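For anyone who wants to poke at this, here is a small sketch that runs the ORM's query with and without its GROUP BY clause, again in an in-memory SQLite database via Python's sqlite3 module. The table layouts, user names, and timesheet rows are all hypothetical; the real schema isn't shown above. SQLite accepts selecting non-grouped columns like this, but other databases may be stricter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows, for illustration only.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE timesheets (id INTEGER PRIMARY KEY, user_id INTEGER, submitted_at TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO timesheets VALUES
        (1, 1, '2010-07-01 09:00:00'),
        (2, 1, '2010-07-02 09:00:00'),
        (3, 2, '2010-07-03 09:00:00'),
        (4, 3, '2010-08-01 09:00:00');
""")

base_query = """
    SELECT users.* FROM users
    INNER JOIN timesheets ON timesheets.user_id = users.id
    WHERE (timesheets.submitted_at <= '2010-07-06 15:27:05.117700')
"""

# Without GROUP BY: one row per matching timesheet, so user 1 appears twice.
ungrouped_ids = sorted(row[0] for row in conn.execute(base_query))

# With GROUP BY users.id: the rows collapse to one per user.
grouped_ids = sorted(row[0] for row in conn.execute(base_query + " GROUP BY users.id"))

print(ungrouped_ids)
print(grouped_ids)
```

On this data the only difference between the two result sets is how many rows come back per user; the sorting is done in Python, since neither query specifies an order.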