Using the T-SQL MERGE Statement


In SQL Server 2008, Microsoft added a new SQL query type: the MERGE statement. This flexible query provides the ability to perform INSERTs, UPDATEs and even DELETEs all within a single statement. Used in combination with Common Table Expressions (CTEs), this can be a powerful tool to replace multiple SQL queries under the right circumstances.

One important rule to keep in mind when using MERGE is that the statement must be terminated by a semicolon (;).

Case Study 1: A Simple Upsert

The most common usage of the MERGE statement is to perform what is colloquially called an “upsert,” a portmanteau of UPDATE and INSERT. Without further preamble, let’s set up some test data and see how the MERGE statement can simplify your life.

CREATE TABLE #Master
(
    [key]       INT IDENTITY PRIMARY KEY
    ,name       VARCHAR(10)
);

INSERT INTO #Master
VALUES ('Dwain'),('Jeff'),('Paul')
    ,('Anon'),('Ralph'),('Tom'),('Sally');

CREATE TABLE #Staging
(
    [key]       INT PRIMARY KEY
    ,[NewName]  VARCHAR(10)
);

INSERT INTO #Staging
VALUES (2, 'Bob'),(4, 'Jim'),(6, 'Marvin'), (10, 'Buddy');

SELECT * FROM #Master;
SELECT * FROM #Staging;

The results in the two tables as displayed by the SELECT are:

key name
1   Dwain
2   Jeff
3   Paul
4   Anon
5   Ralph
6   Tom
7   Sally

key NewName
2   Bob
4   Jim
6   Marvin
10  Buddy

Our intention is to update each row in #Master whose [key] matches a [key] in #Staging. If a [key] in #Staging matches none of the [key] values in #Master, we insert a new row instead. The new row’s [key] does not need to match the value in the staging table, because it is generated by the IDENTITY column. We can easily do this with a MERGE statement as follows:

MERGE #Master t
USING #Staging s
ON s.[key] = t.[key]
WHEN MATCHED THEN 
    UPDATE SET name = s.[NewName]
WHEN NOT MATCHED THEN 
    INSERT (name) VALUES (s.[NewName]); 

SELECT * FROM #Master;

The final SELECT result is as follows:

key name
1   Dwain
2   Bob
3   Paul
4   Jim
5   Ralph
6   Marvin
7   Sally
8   Buddy

You can see that Bob replaced Jeff, Jim replaced Anon and Marvin replaced Tom, and also that Buddy was added at the end.
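
For comparison, here is a minimal sketch of what the same upsert might look like without MERGE, as two separate statements. The table names #MasterCopy and #StagingCopy are hypothetical stand-ins, created here only so the sketch can run on its own without disturbing the tables above.

```sql
-- Hypothetical copies of the sample tables (names are assumptions for this sketch)
CREATE TABLE #MasterCopy
(
    [key]       INT IDENTITY PRIMARY KEY
    ,name       VARCHAR(10)
);

INSERT INTO #MasterCopy (name)
VALUES ('Dwain'),('Jeff'),('Paul')
    ,('Anon'),('Ralph'),('Tom'),('Sally');

CREATE TABLE #StagingCopy
(
    [key]       INT PRIMARY KEY
    ,[NewName]  VARCHAR(10)
);

INSERT INTO #StagingCopy ([key], [NewName])
VALUES (2, 'Bob'),(4, 'Jim'),(6, 'Marvin'),(10, 'Buddy');

BEGIN TRANSACTION;

-- Step 1: update the rows whose [key] exists in both tables
UPDATE t
SET name = s.[NewName]
FROM #MasterCopy t
JOIN #StagingCopy s ON s.[key] = t.[key];

-- Step 2: insert the staging rows that have no matching [key] in the target
INSERT INTO #MasterCopy (name)
SELECT s.[NewName]
FROM #StagingCopy s
WHERE NOT EXISTS (SELECT 1 FROM #MasterCopy t WHERE t.[key] = s.[key]);

COMMIT TRANSACTION;

SELECT * FROM #MasterCopy;

DROP TABLE #MasterCopy, #StagingCopy;
```

This should leave the same eight rows as the MERGE, but it takes two statements and two passes over the target table, which is the overhead MERGE lets you avoid.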

The way the statement works is as follows:

  • The table name immediately after the MERGE keyword is the target table, in this case #Master, which we have aliased as t for easy understanding.
  • The USING table is the source, so #Staging will be merged into #Master.
  • The ON keyword represents the matching criteria between the records in the two tables. You should not think of this in the same way as the ON that appears after a JOIN, as it operates quite differently.
  • Following those parts of the statement are one or more WHEN clauses. The MATCHED criterion indicates a match based on the ON criteria. It can be combined with additional matching criteria (using AND) if required.
  • NOT MATCHED (implied as BY TARGET) means that when a source row does not exist in the target table, we’re going to do something.
  • Following MATCHED or NOT MATCHED is the keyword THEN, followed by the action to take: an UPDATE (or a DELETE) for MATCHED, or an INSERT for NOT MATCHED.
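
To make the parts listed above a little more concrete, the following sketch adds an extra criterion to the MATCHED clause and uses the OUTPUT clause with $action to report what the MERGE did to each affected row. The tables #MasterDemo and #StagingDemo and the output column aliases are assumptions made up for this illustration.

```sql
-- Hypothetical demo tables (names are assumptions for this sketch)
CREATE TABLE #MasterDemo
(
    [key]       INT IDENTITY PRIMARY KEY
    ,name       VARCHAR(10)
);

INSERT INTO #MasterDemo (name)
VALUES ('Dwain'),('Jeff'),('Paul');

CREATE TABLE #StagingDemo
(
    [key]       INT PRIMARY KEY
    ,[NewName]  VARCHAR(10)
);

INSERT INTO #StagingDemo ([key], [NewName])
VALUES (1, 'Dwain'),(2, 'Bob'),(10, 'Buddy');

MERGE #MasterDemo t                     -- target
USING #StagingDemo s                    -- source
ON s.[key] = t.[key]                    -- matching criteria
-- Additional criterion: only update when the name actually differs
WHEN MATCHED AND t.name <> s.[NewName] THEN
    UPDATE SET name = s.[NewName]
-- BY TARGET is the default for NOT MATCHED; it is spelled out here for clarity
WHEN NOT MATCHED BY TARGET THEN
    INSERT (name) VALUES (s.[NewName])
-- $action returns 'INSERT', 'UPDATE' or 'DELETE' for each affected row
OUTPUT $action AS MergeAction
    ,deleted.name AS OldName
    ,inserted.name AS NewName;

DROP TABLE #MasterDemo, #StagingDemo;
```

With this data the OUTPUT should report one UPDATE (Jeff becoming Bob) and one INSERT (Buddy). The row with [key]=1 is matched but left alone, because its name is unchanged.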

You can also use DELETE (instead of UPDATE or INSERT). If you’d like to learn how to DELETE rows from the target table, and about the potential dangers of doing so, I suggest you read this article: A Hazard of Using the SQL Merge Statement.
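
As a rough illustration of the kind of danger involved (not necessarily the scenario or the remedy described in that article), consider a MERGE that deletes target rows missing from the source. The tables #Lines and #NewLines and the variable @ID are hypothetical names invented for this sketch.

```sql
-- Hypothetical target table holding lines for two IDs
CREATE TABLE #Lines
(
    ID          INT NOT NULL
    ,LineID     INT NOT NULL
    ,ProductID  INT NULL
    ,PRIMARY KEY (ID, LineID)
);

INSERT INTO #Lines (ID, LineID, ProductID)
VALUES (100, 1, 5),(100, 2, 15),(100, 3, 8)
    ,(200, 1, 11),(200, 2, 100);

-- Hypothetical source table holding the new set of lines for ID=100 only
CREATE TABLE #NewLines
(
    ID          INT NOT NULL
    ,LineID     INT NOT NULL
    ,ProductID  INT NULL
    ,PRIMARY KEY (ID, LineID)
);

INSERT INTO #NewLines (ID, LineID, ProductID)
VALUES (100, 1, 5),(100, 2, 16);

DECLARE @ID INT = 100;

MERGE #Lines t
USING #NewLines s
ON s.ID = t.ID AND s.LineID = t.LineID
WHEN MATCHED THEN
    UPDATE SET ProductID = s.ProductID
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, LineID, ProductID) VALUES (s.ID, s.LineID, s.ProductID)
-- Without "AND t.ID = @ID", every target row absent from the source
-- would be deleted, including all of the rows for ID=200
WHEN NOT MATCHED BY SOURCE AND t.ID = @ID THEN
    DELETE;

SELECT * FROM #Lines;

DROP TABLE #Lines, #NewLines;
```

Here only the row (100, 3) is deleted and the rows for ID=200 survive. Dropping the extra predicate on the last WHEN clause would remove them as well, since nothing in the source matches them.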

Case Study 2: A More Complicated MERGE

Suppose we have the following sample table and data:

CREATE TABLE #ItemTest
(
    ID INT NOT NULL
    ,LineID INT NOT NULL
    ,ProductID INT NULL
    ,PRIMARY KEY(ID, LineID)
); 

INSERT INTO #ItemTest (ID, LineID, ProductID)
SELECT 100, 1, 5 
UNION ALL SELECT 100, 2, 15 
UNION ALL SELECT 100, 3, 8 
UNION ALL SELECT 100, 4, 25 
UNION ALL SELECT 200, 1, 11 
UNION ALL SELECT 200, 2, 100 
UNION ALL SELECT 200, 3, 41 
UNION ALL SELECT 200, 4, 10 
UNION ALL SELECT 200, 5, 5 
UNION ALL SELECT 200, 6, 30
UNION ALL SELECT 300, 1, 20;

SELECT *
FROM #ItemTest;

From the final SELECT, we see that our data appears as follows:

ID   LineID ProductID
100  1      5
100  2      15
100  3      8
100  4      25
200  1      11
200  2      100
200  3      41
200  4      10
200  5      5
200  6      30
300  1      20

Notice how the entries for each ID contain a sequentially numbered LineID (1 to 4 for ID=100 and 1 to 6 for ID=200). Our business requirement is that we need to delete some rows and at the same time preserve the row numbering for LineID without introducing any gaps. So for example, if we need to delete LineID=3 from ID=100, we need to renumber LineID=4 for that ID to be LineID=3.

Ignoring for the moment that it’s probably poor application design to have this row renumbering requirement, this can be accomplished with a MERGE. Since it is a bit more complicated, we’ll develop it in a couple of steps to help you understand. First, let’s say we want to delete three rows. We’ll put those into a table variable (a feature introduced in SQL Server 2000).

DECLARE @RowsToDelete TABLE
(
    ID      INT
    ,LineID INT
    ,PRIMARY KEY (ID, LineID)
);

INSERT INTO @RowsToDelete (ID, LineID)
SELECT 100, 3 UNION ALL SELECT 200, 2 UNION ALL SELECT 200, 4;

Note how we can create a PRIMARY KEY on a table variable. While not needed in this case, if you had lots of rows it could improve the performance of what we’re about to do.

Now we’ll construct the following query which will require some explanation:

SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID
    ,rn=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))
    ,XX=NULLIF(a.LineID, b.LineID)
FROM #ItemTest a
LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID;

XX is included only to illustrate what NULLIF is doing for us. This produces the following results:

ID   LineID ProductID LineID2  rn   XX
100  3      8         3        1    NULL
100  1      5         NULL     2    1
100  2      15        NULL     3    2
100  4      25        NULL     4    4
200  2      100       2        1    NULL
200  4      10        4        2    NULL
200  1      11        NULL     3    1
200  3      41        NULL     4    3
200  5      5         NULL     5    5
200  6      30        NULL     6    6
300  1      20        NULL     1    1

Each row from #ItemTest is returned because it is the left table of the LEFT JOIN. Matching rows from our @RowsToDelete table variable have a value in LineID2, while rows not matched have a value of NULL (exactly how you’d expect the LEFT JOIN to work). The result in XX shows us that when the LineID of #ItemTest matches the LineID of @RowsToDelete, we get a NULL, and SQL Server sorts NULL values first in an ascending ORDER BY. So in each case, the rows we want to delete are sorted to the top of the grouping (on ID).
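
If the interplay of NULLIF and the sort order is not obvious, this small standalone query may help. It uses a made-up VALUES list (the column names LineID and DeleteLineID are assumptions for the sketch) that mimics the four rows for ID=100 after the LEFT JOIN.

```sql
-- NULLIF(x, y) returns NULL when x = y, otherwise it returns x.
-- A NULL second argument never equals x, so x comes back unchanged.
SELECT v.LineID
    ,v.DeleteLineID
    ,XX = NULLIF(v.LineID, v.DeleteLineID)
FROM (VALUES (1, NULL),(2, NULL),(3, 3),(4, NULL)) v (LineID, DeleteLineID)
-- In an ascending sort, the row where XX is NULL comes first
ORDER BY NULLIF(v.LineID, v.DeleteLineID);
```

The row with LineID=3 should come back first (its XX is NULL), followed by LineID 1, 2 and 4 in that order.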

For the 3 rows in our @RowsToDelete table, we have 1 for ID=100 and 2 for ID=200 (these counts are easy enough to obtain in SQL). So what happens if we subtract that count from rn?

WITH CountItemsToDelete (ID, c) AS
(
    SELECT ID, COUNT(*)
    FROM @RowsToDelete
    GROUP BY ID
)
SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID
    ,[rn-c]=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))-c
FROM #ItemTest a
LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID
JOIN CountItemsToDelete c ON a.ID = c.ID;

The results now appear as:

ID   LineID   ProductID LineID2  rn-c
100  3        8         3        0
100  1        5         NULL     1
100  2        15        NULL     2
100  4        25        NULL     3
200  2        100       2        -1
200  4        10        4        0
200  1        11        NULL     1
200  3        41        NULL     2
200  5        5         NULL     3
200  6        30        NULL     4

Note how the row for ID=300 has been eliminated by the INNER JOIN to our Common Table Expression (CTE) CountItemsToDelete. Looking at the [rn-c] column, we see that for rows where LineID2 is not NULL, the value is meaningless. But for rows where LineID2 is NULL, [rn-c] is precisely the final row number we’ll need to assign to LineID after deleting the rows we want to delete! Now we have enough information to write this into a MERGE statement:

WITH CountItemsToDelete (ID, c) AS
(
    SELECT ID, COUNT(*)
    FROM @RowsToDelete
    GROUP BY ID
), 
    SourceItems AS
(
    SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID, c
        ,rn=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))
    FROM #ItemTest a
    LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID
    JOIN CountItemsToDelete c ON a.ID = c.ID
)
-- The target table
MERGE #ItemTest t
-- The source table
USING SourceItems s
-- Matching criteria: lines up rows from SourceItems exactly with rows
-- from our target table (except for ID=300 which is not in the source) 
ON t.ID = s.ID AND s.LineID = t.LineID
-- LineID2 is not NULL for rows we need to delete
WHEN MATCHED AND s.LineID2 IS NOT NULL THEN
    DELETE
-- LineID2 is NULL for rows where we've calculated the new line number
WHEN MATCHED AND s.LineID2 IS NULL THEN
    UPDATE SET LineID = rn-c;

SELECT *
FROM #ItemTest;

The results shown in the final SELECT clearly indicate that this MERGE query has satisfied our business requirement.

ID   LineID   ProductID
100  1        5
100  2        15
100  3        25
200  1        11
200  2        41
200  3        5
200  4        30
300  1        20

To further improve the performance of the query, you can change the second MATCHED clause as follows, which avoids updating rows whose LineID isn’t changing:

WHEN MATCHED AND s.LineID2 IS NULL AND t.LineID <> rn-c THEN

To do this without MERGE, you would first need to DELETE the rows you want to delete, and then run a separate UPDATE to correct the row numbers that need correcting.
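
A minimal sketch of that two-statement alternative might look like the following. It is only one possible way to write it; the names #ItemCopy, @Del and Renumbered are hypothetical, and the data is a copy of the sample data used above.

```sql
-- Hypothetical copy of the sample table
CREATE TABLE #ItemCopy
(
    ID INT NOT NULL
    ,LineID INT NOT NULL
    ,ProductID INT NULL
    ,PRIMARY KEY(ID, LineID)
);

INSERT INTO #ItemCopy (ID, LineID, ProductID)
VALUES (100, 1, 5),(100, 2, 15),(100, 3, 8),(100, 4, 25)
    ,(200, 1, 11),(200, 2, 100),(200, 3, 41),(200, 4, 10)
    ,(200, 5, 5),(200, 6, 30),(300, 1, 20);

-- Hypothetical table variable listing the rows to delete
DECLARE @Del TABLE
(
    ID      INT
    ,LineID INT
    ,PRIMARY KEY (ID, LineID)
);

INSERT INTO @Del (ID, LineID)
VALUES (100, 3),(200, 2),(200, 4);

BEGIN TRANSACTION;

-- Statement 1: remove the unwanted rows
DELETE t
FROM #ItemCopy t
JOIN @Del d ON d.ID = t.ID AND d.LineID = t.LineID;

-- Statement 2: renumber the remaining rows within each ID, closing the gaps
WITH Renumbered AS
(
    SELECT LineID
        ,rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY LineID)
    FROM #ItemCopy
)
UPDATE Renumbered
SET LineID = rn
WHERE LineID <> rn;

COMMIT TRANSACTION;

SELECT *
FROM #ItemCopy;

DROP TABLE #ItemCopy;
```

The final SELECT should return the same eight rows as the MERGE version. The trade-off is an explicit transaction wrapped around two statements, where the MERGE does the work in one.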

Summary of our Learning

Today we have learned about the MERGE query and how it can be used to replace multiple queries in the case of:

  • UPDATE/INSERT (the “Upsert”)
  • DELETE/UPDATE

I encourage you to also read the linked article about the hazards present if you utilize the full capability of MERGE (WHEN NOT MATCHED BY SOURCE THEN DELETE). The article shows a simple way of limiting the scope of the DELETE to avoid the hazard.

We also learned about SQL table variables, which are very handy under certain circumstances. They do have their drawbacks though, and some day we may blog on the differences between them and temporary tables.

Follow me on Twitter: @DwainCSQL

Copyright © Dwain Camps 2014 All Rights Reserved
