There are two basic techniques for updating one table with information from another, in this example, we are updating an OLAP Dimension table with the results from an OLTP table.
 
Flush and Fill Technique
The first and easiest method is called the “flush and fill” technique. Basically, you remove all of the data from the Dimension table and fill it with data from an OLTP table. This is the simplest method by far, but can be extremely inefficient if we’re talking about tables with a lot of rows. The only thing you need to decide here is whether you are going to use the DELETE command or the TRUNCATE command.
 
DELETE COMMAND
Did you know that if you run the following command…
 
DELETE FROM [dbo].[DimDeliverable]
 
…and the CUSTOMER table contains 5,000, that the command will be executed 5,000 times, once for each row, because it deletes them one-by-one? Also, if you use the DELETE command, your IDENTITY key does not automatically RESEED itself. You would also need to explicitly reseed the identity field…
DBCCCHECKIDENT(‘DimDeliverable’,RESEED)
…or use this statement to reseed the identity field to start at a specific value.
DBCC CHECKIDENT (‘DimDeliverable’, 1)
 
TRUNCATE COMMAND
An alternative to the delete command is the TRUNCATE command. The nice thing about the truncate command is that it is executed only once, so it is much more efficient and it also automatically reseeds your identity key. How does it work? Well, the truncate command deallocates data pages which internally store the data in SQL Server. One thing to note, however, is that while the truncate command is the quickest way to clear out a SQL table, it cannot be used on a table with foreign key constraints. You would first need to remove the constraints as shown in the following example:
 
— temporarily remove constraint
ALTERTABLE[dbo].[DimDeliverable]DROPCONSTRAINT[PK_DimDeliverable]
 
— truncate table
TRUNCATETABLE[dbo].[DimDeliverable]
 
— re-add constraint
ALTER TABLE [dbo].[DimDeliverable] ADD CONSTRAINT [PK_DimDeliverable] PRIMARY KEY ([DeliverableKey])
 
Incremental Loading Technique
This technique is much more efficient than the “flush and fill” technique in that it only updates the items which have changed. While it is more efficient, it is definitely not simpler, so you will probably use the “flush and fill” technique on your smaller dimension tables. As of SQL 2008, a new command was introduced to support the “incremental loading” technique called MERGE.
Type 1:  Updates are overwritten and history of changes is not kept. 
This statement does the following: 1) if a record does not exist, create it; 2) if a record exists but is different, update it; 3) if a record exists in the target but not in the source, then deactivate it.
 
use CatapultDatawarehouse
Go
— Declare Variables
DECLARE @BC int = 570
— TYPE 1 (Overwrites history) – DimDeliverable:  CATINDB06.CAT_DB.project_deliverable
BEGIN TRY — perform in transaction so if it fails values are rolled back
      MERGE DimDeliverable AS [Target] — begin merge statements (merge statements end with a semi-colon)
      USING CATINDB06.CAT_DB.project_deliverable AS [Source]
      ON[Target].[DeliverableId]=[Source].[deliverable_id]
      WHEN MATCHED AND — record exists but values are different
      (
            [Target].[Name] <> [Source].[deliverable_name]
            OR [Target].[EstimatedHours] <> [Source].[est_hours]
            OR [Target].[EstimatedRate] <> [Source].[est_billing_rate]
            OR [Target].[EstimatedRevenue] <> [Source].[total_amount]
      )
      THEN UPDATE SET — update records (Type 1 means record values are overwritten)
            [Target].[Name] = [Source].[deliverable_name]
            , [Target].[EstimatedHours] = [Source].[est_hours]
            , [Target].[EstimatedRate] = [Source].[est_billing_rate]
            , [Target].[EstimatedRevenue] = [Source].[total_amount]
      WHEN NOT MATCHED BY TARGET — record does not exist
      THEN INSERT — insert record
      (
            [Name]
            ,[EstimatedHours]
            ,[EstimatedRate]
            ,[EstimatedRevenue]
      )
      VALUES
      (
            [Source].[deliverable_name]
            , [Source].[est_hours]
            , [Source].[est_billing_rate]
            , [Source].[total_amount]
      )
      WHEN NOT MATCHED BY SOURCE — record exists in target but not source
      THEN DELETE — delete from target
      OUTPUT $action, inserted.*, deleted.*; — output results
      COMMIT TRAN
END TRY
BEGIN CATCH
      ROLLBACK TRAN
ENDCATCH
Go
 
Type 2:  Slowly Changing Dimensions track historical changes. 
In this case the merge statement is simply checking to see if a row exists but contains changes.  If this is the case then the merge statement simply deactivates that row but keeps its data intact for historical purposes.  A new row then is created and marked as the current (most up-to-date) information using an INSERT statement that is wrapped around the SQL MERGE.
 
— TYPE 2 (Tracks history) – DimDeliverable:  CATINDB06.CAT_DB.project_deliverable
BEGIN TRY — perform in transaction so if it fails values are rolled back
      INSERT INTO DimDeliverable
      (
            [Name]
            , [EstimatedHours]
            , [EstimatedRate]
            , [EstimatedRevenue]
            , [IsCurrentRow]
      )
      SELECT
            [deliverable_name]
            , [est_hours]
            , [est_billing_rate]
            , [total_amount]
            , 1
      FROM
      (
            MERGE DimDeliverable AS [Target] — begin merge statements (merge statements end with a semi-colon)
            USING CATINDB06.CAT_DB.project_deliverable AS [Source]
            ON [Target].[DeliverableId] = [Source].[deliverable_id] AND [Target].[IsCurrentRow] = 1
            WHEN MATCHED AND — record exists but values are different
            (
                  [Target].[Name] <> [Source].[deliverable_name]
                  OR [Target].[EstimatedHours] <> [Source].[est_hours]
                  OR [Target].[EstimatedRate] <> [Source].[est_billing_rate]
                  OR [Target].[EstimatedRevenue] <> [Source].[total_amount]
            )
            THEN UPDATE SET — update records (Type 1 means record values are overwritten)
                  [Target].[IsCurrentRow] = 0
                  , [Target].[ValidTo] = GETDATE()
            WHEN NOT MATCHED BY TARGET — record does not exist
            THEN INSERT — insert record
            (
                  [Name]
                  ,[EstimatedHours]
                  ,[EstimatedRate]
                  ,[EstimatedRevenue]
                  , [IsCurrentRow]
            )
            VALUES
            (
                  [Source].[deliverable_name]
                  , [Source].[est_hours]
                  , [Source].[est_billing_rate]
                  , [Source].[total_amount]
                  , 1
            )
            WHEN NOT MATCHED BY SOURCE — record exists in target but not source
            THEN DELETE — delete from target
            OUTPUT $action AS Action, [Source].* — output results
      ) AS MergeOutput
      WHERE
            MergeOutput.Action = ‘UPDATE’
      ;
      COMMIT TRAN
END TRY
BEGIN CATCH
      ROLLBACK TRAN
ENDCATCH
Go
 
Advertisements