Tuesday, July 19, 2011

mysql compare two large tables - find records only in 1 table - for sitemap last modified date

I needed to compare two large tables each having over 2 million records and find the records that are in table 1 but not in table 2.

I only wanted the records from table 1 and not all records that were different from both tables, so on each select query I added a temporary column that stated which table the result was from, so that after the insert select, those with table2 in the temp column could be deleted, leaving just the table 1 records...which are new records that have been added to a database since the last update.

The UNION results are inserted into a new table, so that after being inserted, all records from table2 can be deleted...


INSERT INTO products_new (name,merchantid,tab)
SELECT MIN( name ) AS name, merchantid, tab
FROM (
SELECT name, merchantid, 'table1' AS tab
FROM products_import AS alias1
UNION ALL SELECT name, merchantid, 'table2' AS tab
FROM products_bak AS alias2
)AS alias_table
GROUP BY name
HAVING COUNT( name ) =1
ORDER BY name

So after populating the new products table, the records that were only in table 2 can be deleted, leaving a table with the unique values from table 1.


DELETE FROM products_new WHERE tab='table2'

I have used this in order to be able to make my sitemaps last modified date more accurate, as currently at each product update, all dynamically created product pages have a recent last modified date, even if they were previously present in the DB. Using this method, the last modified date of new products can be accurately shown as new pages, by accessing this new table.

No comments: