Monday, July 25, 2016

Date Dimension - DDL/DML to create and maintain

For any data warehouse or business intelligence work, you are going to need a stock process to create and then periodically maintain a date dimension table. The emphasis here is on maintenance.

Periodically, you'll need to add to the date dimension.  Over the years, I've worked out a fairly simple, but sound date dimension create and update process.   Below are the details:

DateKey


Dates never change, so the datekey can have intelligence. It is one of the few dimensions that can break the rule where a dimension key should never have intelligence. The datakey uses the ISO format:  YYYYMMDD. This allows you to assign a datekey to business data using only the date, without having to reference the date dimension.

The scripts


create schema rds;
go

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[rds].[Date_dim]') AND type in (N'U'))
DROP TABLE [rds].[Date_dim]
GO

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

CREATE TABLE [rds].[Date_dim](
 [DateKey] [int] NOT NULL,
 [FullDateAltKey] [date] NULL,
 [DayOfWeekNum] [tinyint] NULL,
 [DayOfWeekName] [nvarchar](10) NULL,
 [DayOfMonthNum] [tinyint] NULL,
 [DayOfYearNum] [smallint] NULL,
 [WeekOfYearNum] [tinyint] NULL,
 [MonthOfYearName] [nvarchar](10) NULL,
 [MonthOfYearNum] [tinyint] NULL,
 [CalendarQuarter] [tinyint] NULL,
 [CalendarYear] [smallint] NULL,
 [CalendarSemester] [tinyint] NULL,
 [FiscalPeriod] [tinyint] NULL,
 [FiscalQuarter] [tinyint] NULL,
 [FiscalYear] [smallint] NULL,
 [FiscalSemester] [tinyint] NULL,
 [FiscalDayOfPeriodNum] [tinyint] NULL,
 [FiscalDayOfYearNum] [smallint] NULL,
 [FiscalWeekOfYearNum] [tinyint] NULL,
 [FiscalPeriodOfYearName] [nvarchar](10) NULL,
 [SimpleCalendarDate] [varchar](20) NULL,
 [SimpleFiscalDate] [varchar](20) NULL,
 [SimpleCalendarMonth] [varchar](20) NULL,
 [SimpleCalendarQtr] [char](5) NULL,
 [SimpleFiscalQtr] [char](5) NULL,
 [SimpleFiscalYear] [char](7) NULL,
 [SimpleCalendarYear] [char](7) NULL,
 [CalendarQuarterDesc] [char](10) NULL,
 [FiscalQuarterDesc] [char](10) NULL,
 [CalendarSemesterDesc] [char](10) NULL,
 [FiscalSemesterDesc] [char](10) NULL,
 [SimpleFiscalPeriod] [varchar](20) NULL,
 [SimpleFiscalPeriodNo] [int] NULL,
 [FiscalPeriodLabel] [nchar](2) NULL,

) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[rds].[Date_dim]') AND name = N'PK_rds_Date_dim_DateKey')
DROP INDEX [PK_rds_Date_dim_DateKey] ON [rds].[Date_dim] WITH ( ONLINE = OFF )
GO

CREATE UNIQUE CLUSTERED INDEX [PK_rds_Date_dim_DateKey] ON [rds].[Date_dim] 
(
 [DateKey] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

The following section, adds the core date data, Typically, this is added one year at a time.
But, it can be extended to insert as many years as needed.


/* -----------------------------------------------------
   -----------------------------------------------------
   
 Process to add records to the rds.Date_dim
 In production, dates are added one year at a time. 
    Alternatively, you could front load the table with 
    20-30 years into future.

 @intLOOPend= 370     -- added as simple infinite loop break, 
             
 @dtStartDate = 'jan 1, 2014'  -- calendar year start
 @dtEndDate = 'december 31, 2014'; -- calendar year end

 Fiscal dates vary, so they are not included
   -----------------------------------------------------  */

set nocount on;
SET DATEFIRST 7;  -- 7=default=sunday, monday=1
go


declare @intLOOPcnt int, @intLOOPend int , @dtStartDate datetime, @dtEndDate datetime, @dtProcessDate datetime;

-- KEY SETTINGS / INPUTS
Select @intLOOPcnt = 0,@intLOOPend= 4000
, @dtStartDate = 'Jan 1, 2000', @dtEndDate = 'December 31, 2004';

-- init date to be processed
select @dtProcessDate = @dtStartDate;

while @intLOOPcnt < @intLOOPend AND @dtProcessDate <= @dtEndDate
  begin

   /*debug*/ --print 'test' + cast(@dtProcessDate as varchar)
  BEGIN TRY
    -- insert 
        
  insert into rds.Date_dim
   ( dateKey, fullDateAltKey, dayOfWeekNum, DayOfWeekName, dayOfMonthNum, dayOfYearNum
   , weekOfYearNum
   , MonthOfYearName, monthOfYearNum
   , calendarQuarter, calendarYear, calendarSemester
   --, fiscalYear, FiscalDayOfYearNum,FiscalPeriodLabel
   --, fiscalQuarter, fiscalSemester
   )

  select
    dateKey   = cast(CONVERT( varchar(8),  @dtProcessDate , 112) as int)
   , fullDateAltKey = CONVERT( varchar(8),  @dtProcessDate , 112)
   , dayOfWeekNum  = datepart(weekday,@dtProcessDate) 
   , DayOfWeekName  = datename(weekday,@dtProcessDate)
   , dayOfMonthNum  = datepart(day,@dtProcessDate)
   , dayOfYearNum  = datepart(dayofyear,@dtProcessDate)
   , weekOfYearNum  = datepart(week,@dtProcessDate)
   , MonthOfYearName = datename(month,@dtProcessDate)
   , monthOfYearNum = datepart(month,@dtProcessDate)
   , calendarQuarter = datepart(quarter,@dtProcessDate)
   , calendarYear  = datepart(year,@dtProcessDate)
   , calendarSemester  = case when datepart(quarter,@dtProcessDate) in (1,2) then 1 else 2 end 
   --, fiscalYear        = datepart(year,@dtProcessDate)
   --, FiscalDayOfYearNum = datepart(dayofyear,@dtProcessDate) 
   --, FiscalPeriodLabel  = '  '
   ;

  END TRY
  BEGIN CATCH

   print 'CATCH - error: ' + cast(@@error as varchar) + ',  ProcessDate:  '+ cast(@dtProcessDate as varchar);
  END CATCH
      
  -- next item
  select @intLOOPcnt = @intLOOPcnt + 1;
  select @dtProcessDate = dateadd(day,1,@dtProcessDate );

  end
go

This final section simply updates the few remaining values. To keep the build script simple, I separated this out of the main insert. You could include it, but this is run infrequently, and simplicity is more important than having an impressive, but complex insert statement.

-------------------------------------------------------------------------------------------------------------------------
-- OTHER CALENDAR VALUES FOR UPDATE 
-------------------------------------------------------------------------------------------------------------------------
     
update d set 
    -- select top 50 
  SimpleCalendarDate = MonthOfYearName + ' ' + CAST(DayOfMonthNum as varchar) + ', ' + cast(CalendarYear as varchar)
 ,SimpleCalendarMonth = MonthOfYearName + ' ' + cast(CalendarYear as varchar)
 ,SimpleCalendarQtr = 'CY Q' + CAST(CalendarQuarter as CHAR(1))
 ,SimpleCalendarYear = 'CY ' +  cast(CalendarYear as varchar)
 ,CalendarQuarterDesc = 'Q' + CAST(CalendarQuarter as CHAR(1)) + ' CY ' + cast(CalendarYear as varchar)
 ,CalendarSemesterDesc = 'H' + cast(CalendarSemester as CHAR(1)) + ' CY ' + cast(CalendarYear as varchar)

    from rds.date_dim as d 
 WHERE d.DateKey between 20050101 and 20051231;


No comments: