T-SQL MERGE

Stupid T-SQL Tricks – Part 1: Logarithms

Posted on Updated on

Not just for math geeks, logarithms and their inverse functions (the exponentials) can have useful and often unexpected capabilities in T-SQL.

    “Heeeeeere’s Johnny!” – Ed McMahon introducing Johnny Carson for the Tonight Show

If you’re old enough to remember the Tonight Show hosted by Johnny Carson, how can you not remember the iconic Stupid Pet Tricks skits, and the later tributes to those done by David Letterman?

While Carson’s animal antics were meant to entertain, my Stupid T-SQL Tricks are meant to be thought provoking and also add a tool or two to your T-SQL tools chest in the event that the need arises. So maybe they aren’t so stupid after all (and besides it’s my blog and I like the title). This will be the first of a series of blogs offering two to four specialty tips that are all somehow related. Today’s tricks will all utilize logarithms to do their magic.

Calculate a Factorial

Statisticians will recognize the factorial, a number that is represented as N!, where n=0, 1, 2, 3, … and for example 4! = 1*2*3*4 = 24. In the somewhat distant past I had the chutzpah to suggest that it could be done with a recursive Common Table Expression (CTE). Although in my article Exploring Recursive CTEs by Example, I wasn’t really advocating that approach. It was more like I was trying to help folks better understand how recursive CTEs work. Nowadays, I would never do it that way because there is something better.

Let’s start with a Tally table that generates numbers from 1 to 10. In our query below, we’ll SUM those numbers.

WITH Tally(n) AS
(
    SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT SUM(n)
FROM Tally;

-- Results:
55

Too bad there isn’t a PROD() built-in function in T-SQL. Or is there? Consider logarithms and how they can convert a SUM() into a PROD() using a few simple calls to the right built-in functions in the proper sequence. Let’s try these two queries:

WITH Tally(n) AS
(
    SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT EXP(SUM(LOG(n)))
FROM Tally;

SELECT 1*2*3*4*5*6*7*8*9*10;

-- Results:
3628800.00000001
3628800

Those two results are remarkably close, the first one of course being slightly imprecise due to the fact that LOG and EXP are both returning a FLOAT data type. Since we know a factorial number must be an integer, we can convert it as follows, so now it returns the exact integer value of interest (3628800).

WITH Tally(n) AS
(
    SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT CAST(ROUND(EXP(SUM(LOG(n))), 0) AS BIGINT)
FROM Tally;

While it unlikely that you’ll often have the need to calculate factorials in T-SQL, you may occasionally have the need to calculate a product across multiple rows for a column. So this technique will work for that also.

The Running Product

Using SQL 2012, we can use this same technique to calculate a running product (similar to a running total) across multiple rows in a table. To do this, we’ll use a window frame.

WITH Tally(n) AS
(
    SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT n
    ,CAST(ROUND(EXP(
        SUM(LOG(n)) OVER
            (
                ORDER BY n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
            )
        ), 0) AS BIGINT)
FROM Tally;

-- Results:
1      1
2      2
3      6
4      24
5      120
6      720
7      5040
8      40320
9      362880
10     3628800

Be careful though! These numbers are going to grow really quickly, so hopefully you won’t be doing this over many, many rows.

Coded, Numeric Fields with and without Leading Zeroes

In many applications you’ll find columns that are coded as numbers, but are stored as characters. This is, of course, a best practice when you don’t expect to be doing arithmetic with the contents. Let’s use a bit of a contrived example in the guise of a Sales Order Details table:

CREATE TABLE #SO_Details
(
    SO_Number           VARCHAR(11)
    ,SO_Detail          VARCHAR(4)
    ,Quantity           INT
    ,Price              MONEY
    ,Extended_Price     AS (Quantity*Price)
);

INSERT INTO #SO_Details
    (
        SO_Number, SO_Detail, Quantity, Price
    )
VALUES ('2013SO00001', '1', 3, 15.20),('2013SO00001', '2', 3, 9.40)
    ,('2013SO00001', '3', 1, 11.50),('2013SO00001', '4', 2, 11.55)
    ,('2013SO00001', '5', 2, 14.30),('2013SO00001', '6', 10, 13.32)
    ,('2013SO00001', '7', 5, 19.42),('2013SO00001', '8', 6, 10.15)
    ,('2013SO00001', '9', 4, 12.15);

SELECT *
FROM #SO_Details
ORDER BY SO_Number, SO_Detail;

-- Results:
SO_Number     SO_Detail Quantity   Price   Extended_Price
2013SO00001   1         3          15.20   45.60
2013SO00001   2         3          9.40    28.20
2013SO00001   3         1          11.50   11.50
2013SO00001   4         2          11.55   23.10
2013SO00001   5         2          14.30   28.60
2013SO00001   6         10         13.32   133.20
2013SO00001   7         5          19.42   97.10
2013SO00001   8         6          10.15   60.90
2013SO00001   9         4          12.15   48.60

We’ve used a computed column to calculate Extended_Price as Quantity * Price. Let’s suppose that we now need to insert an additional SO_Detail row.

BEGIN TRANSACTION T1;

INSERT INTO #SO_Details
    (
        SO_Number, SO_Detail, Quantity, Price
    )
VALUES ('2013SO00001', '10', 2, 18.88);

SELECT *
FROM #SO_Details
ORDER BY SO_Number, SO_Detail;

ROLLBACK TRANSACTION T1;

-- Results:
SO_Number     SO_Detail Quantity   Price   Extended_Price
2013SO00001   1         3          15.20   45.60
2013SO00001   10        2          18.88   37.76
2013SO00001   2         3          9.40    28.20
2013SO00001   3         1          11.50   11.50
2013SO00001   4         2          11.55   23.10
2013SO00001   5         2          14.30   28.60
2013SO00001   6         10         13.32   133.20
2013SO00001   7         5          19.42   97.10
2013SO00001   8         6          10.15   60.90
2013SO00001   9         4          12.15   48.60

Oh my! Those weren’t the results we wanted because our new record (SO_Detail=10) is sorted into the wrong spot! Of course, this can easily be addressed by changing the ORDER BY so that it CASTs the SO_Detail column to INT, but then that’s why I said this example is a bit contrived. Note that we rolled back the transaction so that our new record 10 was not really saved in the table.

We could use our newfound knowledge of logarithms to address this. In our prior example, we used a natural log, but in this example we’ll use a base 10 logarithm. Let’s take a look at a property of base 10 logarithms that we can use to add a leading zero to each of the entries from 1 to 9. To show this property, we’ll fall back to our Tally table.

WITH Tally(n) AS
(
    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT n, LOG10(n), FLOOR(LOG10(n))
    ,POWER(10, 1+FLOOR(LOG10(n)))
    ,MAX(n) OVER (PARTITION BY NULL)
FROM Tally
WHERE n IN(1, 9, 10, 11, 99, 100);

-- Results:
1     0                   0    10     100
9     0.954242509439325   0    10     100
10    1                   1    100    100
11    1.04139268515823    1    100    100
99    1.99563519459755    1    100    100
100   2                   2    1000   100

Let’s say that we know that our range of SO_Detail is 1 through 9. In that case, we can use a formula like the following to prepend exactly the right number of leading zeroes (each is shown for the number 9).

SELECT RIGHT(10+9, 1), RIGHT(100+9, 2), RIGHT(1000+9, 3);

-- Results:
9     09   009

The numbers 10, 100 and 1000, as well as the 1, 2 and 3 can be derived directly from the results returned by the former query. Let’s put this all together.

WITH Tally(n) AS
(
    SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(b)
    CROSS JOIN(VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(c)
)
SELECT n, RIGHT(POWER(10, 1+FLOOR(LOG10(m))) + n, 1+FLOOR(LOG10(m)))
FROM
(
    SELECT n, m=MAX(n) OVER (PARTITION BY NULL)
    FROM Tally
) a
WHERE n IN (1, 9, 10, 11, 99, 100);

-- Results:
1      001
9      009
10     010
11     011
99     099
100    100

You see, our new formula using RIGHT not only gets us character strings. It also prepends exactly the correct number of zeroes so that the new result is sortable directly on the resulting character string.

Let’s return to our SO Details table and see how we can apply this new formula to normalize our detail line numbers.

BEGIN TRANSACTION T1;

INSERT INTO #SO_Details
    (
        SO_Number, SO_Detail, Quantity, Price
    )
VALUES ('2013SO00001', '10', 2, 18.88);

WITH SO_Details AS
(
    SELECT SO_Number, SO_Detail, Quantity, Price
        ,m=MAX(CAST(SO_Detail AS INT)) OVER (PARTITION BY NULL)
    FROM #SO_Details
    WHERE SO_Number = '2013SO00001'
)
UPDATE SO_Details
SET SO_Detail = RIGHT(POWER(10, 1+FLOOR(LOG10(m))) + SO_Detail, 1+FLOOR(LOG10(m)));

SELECT *
FROM #SO_Details
ORDER BY SO_Number, SO_Detail;

ROLLBACK TRANSACTION T1;

-- Results:
SO_Number     SO_Detail  Quantity   Price   Extended_Price
2013SO00001   01         3          15.20   45.60
2013SO00001   02         3          9.40    28.20
2013SO00001   03         1          11.50   11.50
2013SO00001   04         2          11.55   23.10
2013SO00001   05         2          14.30   28.60
2013SO00001   06         10         13.32   133.20
2013SO00001   07         5          19.42   97.10
2013SO00001   08         6          10.15   60.90
2013SO00001   09         4          12.15   48.60
2013SO00001   10         2          18.88   37.76

Now our results are sorted as expected. Once again we have rolled back the transaction so we are left with only 9 rows in our #SO_Details table and single digit SO_Detail values.

Let’s complicate our example just a bit and add a table of transactions we’d like to process, consisting of inserts, updates and deletes and see if we can do the whole thing with a MERGE. First, the transactions table and some sample transactions.

CREATE TABLE #SO_Trans
(
    [action]            CHAR(1)
    ,SO_Number          VARCHAR(11)
    ,SO_Detail          VARCHAR(4)
    ,Quantity           INT
    ,Price              MONEY
);

INSERT INTO #SO_Trans
    (
        [action], SO_Number, SO_Detail, Quantity, Price
    )
VALUES ('U', '2013SO00001', '5', 15, 12.23)
    ,('I', '2013SO00001', NULL, 4, 16.15)
    ,('I', '2013SO00001', NULL, 3, 12.22)
    ,('D', '2013SO00001', '3', NULL, NULL);

SELECT *
FROM #SO_Trans;

-- Results:
action SO_Number     SO_Detail   Quantity   Price
U      2013SO00001   5           15         12.23
I      2013SO00001   NULL        4          16.15
I      2013SO00001   NULL        3          12.22
D      2013SO00001   3           NULL       NULL

For INSERTs represented in the transactions table, we do not need an SO_Detail number. For DELETEs we do not need Quantity or Price. For UPDATEs we need all three. In order to do a MERGE, we need to consider the following:

  • What is the target table?
  • What is the source table?
  • What are the matching criteria and what operations are performed on which kinds of matches?

The target table is easy, although not so easy as just the #SO_Details table because we need to only include SOs that are in the #SO_Trans table, so we’ll start with that:

-- Target table
SELECT SO_Number, SO_Detail, Quantity, Price
FROM #SO_Details
WHERE SO_Number IN(SELECT SO_Number FROM #SO_Trans);

We’ll put the target table into a CTE for use in the MERGE. Coincidentally the results are the 9 original rows we put into the table, because there is only one SO_Number in the #SO_Trans table. We do need to include all rows for that SO because any of them may be subject to an UPDATE to change the SO_Detail number.

MERGE throws an exception if you try to update any row in the target table more than once, so we need to carefully construct the source table so that there is one and only one matching row for each row in the target.

WITH Target4MERGE AS
(
    SELECT SO_Number, SO_Detail, Quantity, Price
    FROM #SO_Details
    WHERE SO_Number IN(SELECT SO_Number FROM #SO_Trans)
)
SELECT [action]=ISNULL(b.[action], 'U')
    ,SO_Number=ISNULL(a.SO_Number, b.SO_Number)
    ,a.SO_Detail
    ,Quantity=ISNULL(b.Quantity, a.Quantity)
    ,Price=ISNULL(b.Price, a.Price)
FROM Target4MERGE a
FULL JOIN #SO_Trans b
ON a.SO_Number = b.SO_Number AND a.SO_Detail = b.SO_Detail;

-- Results:
action  SO_Number    SO_Detail  Quantity  Price
I       2013SO00001  NULL       4         16.15
I       2013SO00001  NULL       3         12.22
U       2013SO00001  1          3         15.20
U       2013SO00001  2          3         9.40
D       2013SO00001  3          1         11.50
U       2013SO00001  4          2         11.55
U       2013SO00001  5          15        12.23
U       2013SO00001  6          10        13.32
U       2013SO00001  7          5         19.42
U       2013SO00001  8          6         10.15
U       2013SO00001  9          4         12.15

Notes about these results:

  • The two detail lines to be INSERTed have a NULL SO_Detail number because we have not yet determined what it is to be. We’ll be trying to add them at the end.
  • Note the Quantity and Price (15 and 12.23) for SO_Detail=5, which are the values we want to UPDATE.

The key to the matching will be to construct a ROW_NUMBER() that we can use as the new SO_Detail number. So let’s add that:

WITH Target4MERGE AS
(
    SELECT SO_Number, SO_Detail, Quantity, Price
    FROM #SO_Details
    WHERE SO_Number IN(SELECT SO_Number FROM #SO_Trans)
),
    PreSource AS
(
    SELECT [action], SO_Number, SO_Detail, Quantity, Price
        ,r=ROW_NUMBER() OVER
            (
                PARTITION BY SO_Number
                    ,CASE [action] WHEN 'D' THEN 0 ELSE 1 END
                ORDER BY CASE
                         WHEN SO_Detail IS NULL
                         THEN 10000
                         ELSE SO_Detail
                         END
            )
    FROM
    (
        SELECT [action]=ISNULL(b.[action], 'U')
            ,SO_Number=ISNULL(a.SO_Number, b.SO_Number)
            ,a.SO_Detail
            ,Quantity=ISNULL(b.Quantity, a.Quantity)
            ,Price=ISNULL(b.Price, a.Price)
        FROM Target4MERGE a
        FULL JOIN #SO_Trans b
        ON a.SO_Number = b.SO_Number AND a.SO_Detail = b.SO_Detail
    ) a
)
SELECT [action], SO_Number, SO_Detail, Quantity, Price, r
    ,m=MAX(r) OVER
        (
            PARTITION BY SO_Number, CASE [action] WHEN 'D' THEN 0 ELSE 1 END
        )
FROM PreSource;

-- Results:

action  SO_Number    SO_Detail  Quantity  Price  r   m
D       2013SO00001  3          1         11.50  1   1
U       2013SO00001  1          3         15.20  1   10
U       2013SO00001  2          3         9.40   2   10
U       2013SO00001  4          2         11.55  3   10
U       2013SO00001  5          15        12.23  4   10
U       2013SO00001  6          10        13.32  5   10
U       2013SO00001  7          5         19.42  6   10
U       2013SO00001  8          6         10.15  7   10
U       2013SO00001  9          4         12.15  8   10
I       2013SO00001  NULL       4         16.15  9   10
I       2013SO00001  NULL       3         12.22  10  10

Additional comments on the latest query and results:

  • The PARTITION we created for our ROW_NUMBER() separates the record to be deleted from the remainder.
  • In the ORDER BY for our ROW_NUMBER() we substituted 10000 (one more than the maximum number of rows we can have per SO detail line) to make the INSERTed records move to the bottom.
  • We enclosed this query in a CTE, because we also want to use the MAX window aggregate function.
  • For that, the PARTITION is identical to what we used for ROW_NUMBER().
  • The result in the r column (except for the [action]=’D’ row) is the new SO_Detail number.
  • The result in m will be used when we convert our integers to characters with leading numbers.

All that’s left is to construct the final MERGE statement, so here it is.

WITH Target4MERGE AS
(
    SELECT SO_Number, SO_Detail, Quantity, Price
    FROM #SO_Details
    WHERE SO_Number IN(SELECT SO_Number FROM #SO_Trans)
),
    PreSource AS
(
    SELECT [action], SO_Number, SO_Detail, Quantity, Price
        ,r=ROW_NUMBER() OVER
            (
                PARTITION BY SO_Number
                    ,CASE [action] WHEN 'D' THEN 0 ELSE 1 END
                ORDER BY CASE
                         WHEN SO_Detail IS NULL
                         THEN 10000
                         ELSE SO_Detail
                         END
            )
    FROM
    (
        SELECT [action]=ISNULL(b.[action], 'U')
            ,SO_Number=ISNULL(a.SO_Number, b.SO_Number)
            ,a.SO_Detail
            ,Quantity=ISNULL(b.Quantity, a.Quantity)
            ,Price=ISNULL(b.Price, a.Price)
        FROM Target4MERGE a
        FULL JOIN #SO_Trans b
        ON a.SO_Number = b.SO_Number AND a.SO_Detail = b.SO_Detail
    ) a
),
    Source4MERGE AS
(
    SELECT [action], SO_Number, SO_Detail, Quantity, Price, r
        ,m=1+FLOOR(LOG10(
            MAX(r) OVER
                (
                    PARTITION BY SO_Number
                        ,CASE [action] WHEN 'D' THEN 0 ELSE 1 END
                )))
    FROM PreSource
)
MERGE Target4MERGE t
USING Source4MERGE s
ON s.SO_Number = t.SO_Number AND s.SO_Detail = t.SO_Detail
WHEN MATCHED AND [action] = 'D'
THEN DELETE
WHEN MATCHED AND [action] = 'U'
THEN UPDATE
SET SO_Detail = RIGHT(POWER(10, m) + s.r, m)
    ,Quantity = s.Quantity
    ,Price    = s.Price
WHEN NOT MATCHED AND [action] = 'I'
THEN INSERT 
    (
        SO_Number, SO_Detail, Quantity, Price
    )
VALUES 
    (
        s.SO_Number
        ,RIGHT(POWER(10, m) + s.r, m)
        ,s.Quantity
        ,s.Price
    );

SELECT *
FROM #SO_Details;

-- Results:
SO_Number    SO_Detail  Quantity  Price   Extended_Price
2013SO00001  01         3         15.20   45.60
2013SO00001  02         3         9.40    28.20
2013SO00001  03         2         11.55   23.10
2013SO00001  04         15        12.23   183.45
2013SO00001  05         10        13.32   133.20
2013SO00001  06         5         19.42   97.10
2013SO00001  07         6         10.15   60.90
2013SO00001  08         4         12.15   48.60
2013SO00001  09         4         16.15   64.60
2013SO00001  10         3         12.22   36.66

Now that’s one heck of a complicated looking query and it could probably do with some comments, but you can get those as needed from the descriptions I’ve provided. A few more additional notations may be warranted.

  • We applied our 1+FLOOR+LOG10 formula around the MAX() window aggregate function to save a bit of complexity when assigning SO_Detail in the MERGE.
  • The MERGE uses MATCHED twice, segregating DELETEs from UPDATEs based on [action].
  • The NOT MATCHED clause INSERTs the new rows specified by the #SO_Trans table.

Conclusions

You may be asking why not save all of this computational convolution and just always use a 4 digit SO_Detail value with as many leading zeroes as necessary because that’s pretty easy with something like:

SELECT RIGHT('0000'+CAST(SO_Detail AS VARCHAR(4)), 4);

The answer is, yes we could have done that if our business requirements allowed for always having 4 digits in the SO_Detail column. But frankly, this was much more fun and we got to learn a little about logarithms along the way. And I did warn you by calling this article Stupid T-SQL Tricks, now didn’t I?

Still we may have learned a few things, like how to calculate factorials, creating a pseudo-PROD() aggregate, how base 10 logarithms can be used to find the number of digits in a number and how to methodically construct a complex MERGE query one step at a time. All in all, that alone probably made it worthwhile for me to post this blog.

Perhaps it wasn’t quite as entertaining as Johnny Carson’s Stupid Pet Tricks skits, but we hope you enjoyed it anyway.

Follow me on Twitter: @DwainCSQL

Copyright © Dwain Camps 2014 All Rights Reserved

Advertisements

Using the T-SQL MERGE Statement

Posted on Updated on

In SQL Server 2008, Microsoft added a new SQL query type: the MERGE statement. This flexible query provides the ability to perform INSERTs, UPDATEs and even DELETEs all within a single statement. Used in combination with Common Table Expressions (CTEs), this can be a powerful tool to replace multiple SQL queries under the right circumstances.

One important rule to keep in mind when using MERGE, is that the statement must be terminated by a semicolon (;).

Case Study 1: A Simple Upsert

The most common usage of the MERGE statement is to perform what is colloquially called an “upsert,” which is really a diminutive form of UPDATE/INSERT. Without further preamble, let’s set up some test data and get to seeing how the MERGE statement can simplify your life.

CREATE TABLE #Master
(
    [key]       INT IDENTITY PRIMARY KEY
    ,name       VARCHAR(10)
);

INSERT INTO #Master
VALUES ('Dwain'),('Jeff'),('Paul')
    ,('Anon'),('Ralph'),('Tom'),('Sally');

CREATE TABLE #Staging
(
    [key]       INT PRIMARY KEY
    ,[NewName]  VARCHAR(10)
);

INSERT INTO #Staging
VALUES (2, 'Bob'),(4, 'Jim'),(6, 'Marvin'), (10, 'Buddy');

SELECT * FROM #Master;
SELECT * FROM #Staging;

The results in the two tables as displayed by the SELECT are:

key name
1   Dwain
2   Jeff
3   Paul
4   Anon
5   Ralph
6   Tom
7   Sally

key NewName
2   Bob
4   Jim
6   Marvin
10  Buddy

Our intention is to update (based on [key] in #Staging) the corresponding row by [key] in #Master. If the [key] in #Staging matches none of our [key] values in #Master, then insert a new row. The new row’s [key] does not need to match the value in the staging table. We can easily do this with a MERGE statement as follows:

MERGE #Master t
USING #Staging s
ON s.[key] = t.[key]
WHEN MATCHED THEN 
    UPDATE SET name = s.[NewName]
WHEN NOT MATCHED THEN 
    INSERT (name) VALUES (s.[NewName]); 

SELECT * FROM #Master;

The final SELECT result is as follows:

key name
1   Dwain
2   Bob
3   Paul
4   Jim
5   Ralph
6   Marvin
7   Sally
8   Buddy

You can see that Bob replaced Jeff, Jim replaced Anon and Marvin replaced Tom, and also that Buddy was added at the end.

The way the statement works is as follow:

  • The table name immediately after the MERGE keyword is the target table, in this case #Master, which we have aliased as t for easy understanding.
  • The USING table is the source, so #Staging will be merged into #Master.
  • The ON keyword represents the matching criteria between the records in the two tables. You should not think of this in the same way that ON appears after a JOIN as it operates quite differently.
  • Following those parts of the statement, are any number of WHEN clauses. The MATCHED criterion indicates a match based on the ON criteria. It can be combined with additional matching criteria if required.
  • NOT MATCHED (implied as BY TARGET), means that when a source row does not exist in the target table, we’re going to do something.
  • Following MATCHED or NOT MATCHED is the keyword THEN followed by either an INSERT or an UPDATE.

You can also use DELETE (instead of UPDATE or INSERT) and if you’d like to learn about how to DELETE rows from the target table, I suggest you read this article to understand exactly how it works: A Hazard of Using the SQL Merge Statement and the potential dangers when using it.

Case Study 2: A More Complicated MERGE

Suppose we have the following sample table and data:

CREATE TABLE #ItemTest
(
    ID INT NOT NULL
    ,LineID INT NOT NULL
    ,ProductID INT NULL
    ,PRIMARY KEY(ID, LineID)
); 

INSERT INTO #ItemTest (ID, LineID, ProductID)
SELECT 100, 1, 5 
UNION ALL SELECT 100, 2, 15 
UNION ALL SELECT 100, 3, 8 
UNION ALL SELECT 100, 4, 25 
UNION ALL SELECT 200, 1, 11 
UNION ALL SELECT 200, 2, 100 
UNION ALL SELECT 200, 3, 41 
UNION ALL SELECT 200, 4, 10 
UNION ALL SELECT 200, 5, 5 
UNION ALL SELECT 200, 6, 30
UNION ALL SELECT 300, 1, 20;

SELECT *
FROM #ItemTest;

From the final SELECT, we see that our data appears as follows:

ID   LineID ProductID
100  1      5
100  2      15
100  3      8
100  4      25
200  1      11
200  2      100
200  3      41
200  4      10
200  5      5
200  6      30
300  1      20

Notice how the entries for each ID contain a sequentially numbered LineID (1 to 4 for ID=100 and 1 to 6 for ID=200). Our business requirement is that we need to delete some rows and at the same time preserve the row numbering for LineID without introducing any gaps. So for example, if we need to delete LineID=3 from ID=100, we need to renumber LineID=4 for that ID to be LineID=3.

Ignoring for the moment that it’s probably poor application design to have this row renumbering requirement, this can be accomplished with a MERGE. Since it is a bit more complicated we’ll develop it in a couple of steps to help you understand. First, let’s say we want to delete three rows. We’ll put those into a table variable (a feature introduced in SQL Server 2005).

DECLARE @RowsToDelete TABLE
(
    ID      INT
    ,LineID INT
    ,PRIMARY KEY (ID, LineID)
);

INSERT INTO @RowsToDelete (ID, LineID)
SELECT 100, 3 UNION ALL SELECT 200, 2 UNION ALL SELECT 200, 4;

Note how we can create a PRIMARY KEY on a table variable. While not needed in this case, if you had lots of rows it will improve the performance of what we’re about to do.

Now we’ll construct the following query which will require some explanation:

SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID
    ,rn=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))
    ,XX=NULLIF(a.LineID, b.LineID)
FROM #ItemTest a
LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID;

XX is included only to illustrate what NULLIF is doing for us. This produces the following results:

ID   LineID ProductID LineID2  rn   XX
100  3      8         3        1    NULL
100  1      5         NULL     2    1
100  2      15        NULL     3    2
100  4      25        NULL     4    4
200  2      100       2        1    NULL
200  4      10        4        2    NULL
200  1      11        NULL     3    1
200  3      41        NULL     4    3
200  5      5         NULL     5    5
200  6      30        NULL     6    6
300  1      20        NULL     1    1

Each row from #ItemTest is returned because it is the left table of the LEFT JOIN. Matching rows from our @RowsToDelete temporary table have a value in LineID2, while rows not matched have a value of NULL (exactly how you’d expect the LEFT JOIN to work). The result in XX shows us that when the LineID of #ItemTest matches the LineID of @RowsToDelete, we get a NULL and NULL values usually sort first (there is a SQL Server setting that controls this). So in each case, the rows we want to delete are sorted to the top of the grouping (on ID).

For the 3 rows in our @RowsToDelete table, we have 1 for ID=100 and 2 for ID=200 (these counts are easy enough to obtain in SQL). So what happens if we subtract that count from rn?

WITH CountItemsToDelete (ID, c) AS
(
    SELECT ID, COUNT(*)
    FROM @RowsToDelete
    GROUP BY ID
)
SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID
    ,[rn-c]=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))-c
FROM #ItemTest a
LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID
JOIN CountItemsToDelete c ON a.ID = c.ID;

The results now appear as:

ID   LineID   ProductID LineID2  rn-c
100  3        8         3        0
100  1        5         NULL     1
100  2        15        NULL     2
100  4        25        NULL     3
200  2        100       2        -1
200  4        10        4        0
200  1        11        NULL     1
200  3        41        NULL     2
200  5        5         NULL     3
200  6        30        NULL     4

Note how the row for ID=300 has been eliminated by the INNER JOIN to our Common Table Expression (CTE) CountItemsToDelete. Looking at the [rn-c] column, we see that for rows where LineID2 is not NULL, the value is meaningless. But for rows where LineID2 is NULL, [rn-c] is precisely the final row number we’ll need to assign to LineID after deleting the rows we want to delete! Now we have enough information to write this into a MERGE statement:

WITH CountItemsToDelete (ID, c) AS
(
    SELECT ID, COUNT(*)
    FROM @RowsToDelete
    GROUP BY ID
), 
    SourceItems AS
(
    SELECT a.ID, a.LineID, ProductID, LineID2=b.LineID, c
        ,rn=ROW_NUMBER() OVER (PARTITION BY a.ID ORDER BY NULLIF(a.LineID, b.LineID))
    FROM #ItemTest a
    LEFT JOIN @RowsToDelete b ON a.ID = b.ID AND a.LineID = b.LineID
    JOIN CountItemsToDelete c ON a.ID = c.ID
)
-- The target table
MERGE #ItemTest t
-- The source table
USING SourceItems s
-- Matching criteria: lines up rows from SourceItems exactly with rows
-- from our target table (except for ID=300 which is not in the source) 
ON t.ID = s.ID AND s.LineID = t.LineID
-- LineID2 is not NULL for rows we need to delete
WHEN MATCHED AND s.LineID2 IS NOT NULL THEN
    DELETE
-- LineID2 is NULL for rows where we've calculated the new line number
WHEN MATCHED AND s.LineID2 IS NULL THEN
    UPDATE SET LineID = rn-c;

SELECT *
FROM #ItemTest;

The results shown in the final SELECT clearly indicate that this MERGE query has satisfied our business requirement.

ID   LineID   ProductID
100  1        5
100  2        15
100  3        25
200  1        11
200  2        41
200  3        5
200  4        30
300  1        20

To further improve the performance of the query, you can change the second MATCHED criteria to this, to avoid updating rows where the LineID isn’t changing.

WHEN MATCHED AND s.LineID2 IS NULL AND t.LineID <> rn-c THEN

To do this otherwise in SQL you would first need to DELETE the rows you want to delete, and then run a separate UPDATE to correct the row numbers that need correcting.

Summary of our Learning

Today we have learned about the MERGE query and how it can be used to replace multiple queries in the case of:

  • UPDATE/INSERT (the “Upsert”)
  • DELETE/UPDATE

I encourage you to also read the linked article about hazards present if you utilize the full capability of the MERGE (WHEN NOT MATCHED SOURCE THEN DELETE). The article shows a simple way of limiting the scope of the DELETE to avoid the hazard.

We also learned about SQL table variables, which are very handy under certain circumstances. They do have their drawbacks though, and some day we may blog on the differences between them and temporary tables.

Follow me on Twitter: @DwainCSQL

Copyright © Dwain Camps 2014 All Rights Reserved