Let's say I have two tables:
ProdID | PartNumber | Data...
1 | ABC-a | "Data A"
2 | (null) | "Data B"
3 | ABC-c | "Data C"
ProdID | PartNumber | DataB
(null) | ABC-a | "Data D"
2 | (null) | "Data E"
3 | (null) | "Data F"
(null) | ABC-z | "Data G"
Not ideal, but anyway. I want
ProdID | PartNumber | Data | DataB...
1 | ABC-a | "Data A" | "Data D"
2 | (null) | "Data B" | "Data E"
3 | ABC-c | "Data C" | "Data F"
(null) | ABC-z | (null) | "Data G"
So I use
FROM Table1 T1
RIGHT JOIN Table2 T2 ON
T1.ProdID = T2.ProdID OR T1.PartNumber = T2.PartNumber
Which does exactly what I want, but is seems to take about 100 times as long as either side of the or individually. As part of a more complex query it takes 2 minutes for the
OR compared to <1 second for just the
int and 1 second for just the
nvarchar(50). Table "A" has ~13k rows, table "b" has ~35k and the whole query returns ~40k.
I think this "Table Spool" may be the problem.
SQL Server 2008 R2 Express. Thoughts?
Join each way separately, then combine the results:
SELECT T1.ProdID, T1.PartNumber, T1.Data, ISNULL(tprodid.DataB, tpartno.DataB) as DataB FROM Table1 T1 LEFT JOIN Table2 tprodid ON T1.ProdID = tprodid.ProdID LEFT JOIN Table2 tpartno ON T1.PartNumber = tpartno.PartNumber;
This will use both indexes and will perform well. You may want to tweak the
ISNULL logic to your liking.
Change the query to a union and you should get much better performance:
Select * from Table1 Left Join Table2 On Table1.ProdID = Table2.ProdID where Table1.PartNumber is null union Select * from Table1 Left Join Table2 On Table1.PartNumber = Table2.PartNumber where Table1.ProdId is null
The union operator will eliminate duplicate rows. That is, rows returned by both queries will only be returned once. So this should return the same data as your main query.
You still need the OR, but you might do a little better with a FULL JOIN:
SELECT COALESCE(t1.ProdID,t2.ProdID) ProdID, COALESCE(t1.PartNumber,t2.PartNumber) PartNumber, t1.Data, t2.DataB FROM TableA t1 FULL JOIN TableB t2 ON t1.ProdID = t2.ProdID OR t1.PartNumber = t2.PartNumber
The reason you have slow performance is because the OR forces it to not match up well with index, forcing a manual compare of one entire table with the other entire table. If you still have performance issues with the FULL JOIN, you can fix it either by adding an index for part number or by using an index hint to tell the optimizer your ProdID index will still be helpful.
While not knowing MSSQL very well I'll attempt to at least provide a solution to your problem!
You should be getting better results with using a
LEFT JOIN for each column you potentially want to join on and then merge the results something like the following:
SELECT COALESCE(TA.ProdID, TB2.ProdID) AS ProdID, COALESCE(TA.PartNumber, TB.PartNumber) AS PartNumber, TA.Data, COALESCE(TB.Data2, TB2.Data2) AS Data2 FROM TableA TA LEFT JOIN TableB TB On TA.ProdID = TB.ProdID LEFT JOIN TableB TB2 On TA.PartNumber = TB2.PartNumber GROUP BY ProdId
While completely guessing, I'd say that it may be limited to using only one index per join though, forcing it to do one of the columns with a full table scan instead. You could try putting both columns into one index and using that index as an index hint on the join and see how it performs.
I like Jeff Siver's suggestion of using
UNION, though his suggested query is wrong. Here's a possible fix:
SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.ProdID = T2.ProdID UNION SELECT * FROM Table1 T1 JOIN Table2 T2 ON T1.PartNumber = T2.PartNumber UNION SELECT NULL, NULL, NULL, * FROM Table2 T2 WHERE NOT EXISTS ( SELECT * FROM Table1 T1 WHERE T1.ProdID = T2.ProdID ) AND NOT EXISTS ( SELECT * FROM Table1 T1 WHERE T1.PartNumber = T2.PartNumber );