HowTo: Comparing two tables and returning the # of rows that match FAST

The most intuitive way to write a comparison of two tables and spit out the # of rows that are the same involves a left join:

select 'Investment' as TableName, count(*) as RowCount
from Investment_A a, Investment_B b
  a.col1 = b.col1
  AND a.col2 = b.col2
  AND a.col3 = b.col3
  AND a.col4 = b.col4

This returns the correct answer but is very slow. Is there a better way? Of course!

select 'Investment' as TableName, count(*) as RowCount
   select 1 as num
   FROM (
      select * from Investment_A
      select * from Investment_B
   ) tmp
   GROUP BY col1, col2, col3, col4
   HAVING COUNT(*) > 1
) tmp2

By pushing the comparison off into the GROUP BY, we leverage the DBMS engine far more efficiently. There are two drawbacks:

  1. Readability of the SQL code
  2. Far more temporary storage is used for the GROUP BY. There is a real risk of running out of temporary storage if the tables are large.
Share Button