Friday, 4 March 2011

SQL Join Optimisation

Java developers are often required to write their own SQL after all, it’s a necessary evil which means that often doesn’t get the thought it requires, which, in turn, can lead to inefficient SQL and slow systems.

Take, for example, a hypothetical scenario whereby you need to select a list of all the employees in you company and the department they’re in. Given that this information is held in two tables: employee and department, you could write some simple SQL that goes:

Select e.name, d.dept_name from employee e, department d where e.dep_id = d.id;

You could then call it from your Java program and carry on coding: job done. Or is it? You could also have written:

Select e.name, d.dept_name from department d, employee e where e.dep_id = d.id;

You may think does it make any difference? Well yes it does.

Suppose that you have 10 departments in your company and 1000 employees... the first statement when executed will scan the employee table once and get 1000 rows. It will then scan the department table 1000 times matching each employee with a department.

The second statement will scan the department table once and get 10 rows. It will then scan the employee table 10 times matching the department with the employee.

It doesn’t take much to figure out that 1000 table scans is more costly than 10 table scans and hence the second SQL statement performs a lot better than the first.

So to end, when developing this kind of SQL I guess that it’s advisable to spend a minute or two figuring out which table in your join will return the least number of rows, and then use that as the first, or inner, table in your SQL statement. It might take you an extra bit of effort by the results will be worth it.

No comments: