Adventures Into DB2: July 2014

Sunday 20 July 2014

Temporal's Infinity

Here is some interesting trivia, lets say you have created a system temporal table, and added in some data, you will see your start and end date automatically updated.
Now clearly the start time, will be the instant the data was added, but the end date, well that is conceptually debatable, since we don't have an end date yet, we will only have that value at some point in the future... or perhaps never. So we need something to represent infinity.

However, in reality, we have little choice, since our schema is set in place, we have to pick something to represent infinity, I guess the two options would have been the largest possible date you can represent, or NULL.

Some what surprisingly, DB2 chose neither of these.. check it out :

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 9999-12-30-00.00.00.000000000000

I know what you are about to say, "yes they did choose one, they chose the largest representable date", but no, look more carefully.
No I am not being pedantic about the hours mins and seconds, although that would also count.. there is something more blatant.

How many days are there in December? Yes, 31 hmmm interesting they chose 30 - but why, an oversight they just never bother to fix?

Thankfully not, it appears to have more intelligence than that, and it also explains why the date is also at the beginning of the day, not midnight.

Some languages can only represent years up to 9999, and then add into that the issues of timezones, it would theoretically be possible for the value that represents INFINITY to be read into a structure, adjusted for time zones, and then wrap beyond the end... in effect breaking the temporal table.
By giving yourself a full day, you eliminate this issue.

When will this information be useful to you ?... umm don't know, perhaps a trivia game, but hey nice to know anyway :)

Saturday 19 July 2014

System Time Temporal Tables

Following on from the temporal tables overview lets first look at the concept of System Time Temporal tables.

So we remember that temporal tables are all about keeping the history of changes to data, and what is nice about System Time Temporal tables, is that this is entirely a DB stack concept, it is ( or at least can be ) entirely transparent to the application stack.

But I am getting ahead of myself, lets go a bit slower.

If data is going to have a lifetime, then by definition we need a start time and end time, but what values we choose for those times is still up for debate. The first and most obvious choice is to use the time at which we insert data as the start time, and the time at which we change that data to be its end point. That choice of using the times at which the data was touched is what we mean when we talk about System Temporal Tables.

At this point, it does not take a rocket scientist to spot that DB2 can and should be able to help us out a lot here, since it will have very intimate knowledge of when data is changed. And so it is. At an application level, we can do all our CRUD operations as usual oblivious to the temporal nature of this data, and DB2 will automatically make copies of our data when it has changed, and set the start timestamp and end timestamp accordingly.

But to get this right, we will first have to give DB2 the hook points it needs to do the heavy lifting for us, we will need four things

A column for the start timestamp
A column for the end timestamp
A placeholder Timestamp variable for DB2 to use when updating rows.
A table to record history in Which we are be in a position to supply to DB2 when we create our table.

CREATE TABLE tb_YourTable
(
  -- your data
  my_data INT NOT NULL,  
  -- A place for DB2 to add a start date
  sys_start TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN,
  -- A place for DB2 to add an end date
  sys_end TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END,
  -- A place for DB2 to save a transaction time when a row is updated, hidden 'cause we dont care about this
  trans_start TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID IMPLICITLY HIDDEN,
  -- Bind our timestams to DB2 concept of Sytem_Time
  PERIOD SYSTEM_TIME(sys_start,sys_end)
);
-- make a duplicate table for our update history
CREATE TABLE tb_YourTableHistory LIKE tb_YourTable;
-- tell DB2 to bind our table to the history tables
ALTER TABLE tb_YourTable ADD VERSIONING USE HISTORY TABLE tb_YourTableHistory;

So what are the interesting thing that we can pick up from this syntax above :

First there is no magic behind-the-scenes schema being created, we are in control of all of this, for both the table we are interested in and and the table used to keep history - nice!

There is just some extra syntax that gives DB2 the knowledge to bind the concept of temporal tables to the schema we have created. This is how we give DB2 permission to use our hook points ie the columns that we have created that it can use for the SYSTEM_TIME and the history table we have created which it can use to populate our history.

Clearly we do have to play by some rules regarding the schema, first our TIMESTAMP columns must be TIMESTAMP(12) and our history table must have exactly the same schema as our temporal table ( I like how easy LIKE make this, very DRY )

So now we have a temporal table set up, what happens when we add content to it ?

Well lets insert some data, just like we would if we did not know about this whole temporal thing..

insert into tb_YourTable(my_data) values(1)

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 9999-12-30-00.00.00.000000000000

Not too surprisingly, we have data ... but equally unsurprising nothing in our history table....

tb_YourTableHistory
-------------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------

right now lets do an update to this table

update tb_YourTable set my_data=2

 tb_YourTable
 ------------
 MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          2 2014-07-19-12.17.21.539897000000 9999-12-30-00.00.00.000000000000

We now see that our update took place as expected, well almost... notice that the start time was also updated.

Also if we look at our history table :

 tb_YourTableHisory
 ------------------
  MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 2014-07-19-12.17.21.539897000000

We can see that it now has some data in it, where the histories start time is exactly the same as our tables original start value and the histories end time is the same as our updates new start value. ie the full time that the old data was valid for.

okay, well that worked as expected.... what about a delete ?

delete tb_YourTable where my_data=2

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------

tb_YourTableHistory
-------------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 2014-07-19-12.17.21.539897000000
          2 2014-07-19-12.17.21.539897000000 2014-07-19-12.34.11.079309000000

Okay, so "magically" getting our history into our history table is all well and good, but if we don't read it, then who cares... so lets try and query some data from our history, by running a query on our now empty table.

select * from tb_YourTable for system_time as of '2014-07-19-12.14.17'

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 2014-07-19-12.17.21.539897000000

Notice how we use the "for system_time as of" expression to pull data out of the history table while referencing our data table ( which right now is empty ).

Also notice how the SYS_END time is not some value trying to identify infinity, so we know that the value we have pulled was only true for this time period. There are a few more ways in which we can pull data from our history ( or current table for that matter )

select ... FOR SYSTEM_TIME FROM ... TO ...
select ... FOR SYSTEM_TIME BETWEEN ... AND ...

eg:

select * from tb_YourTable for system_time between '2014-07-19-12.14.15' and '2014-07-20-00.00.00'

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 2014-07-19-12.17.21.539897000000
          2 2014-07-19-12.17.21.539897000000 2014-07-19-12.34.11.079309000000

sweet...

hmmm but did I not lead this article off with the promise that the application stack would be able to take advantage of all this temporal goodness without changes.. and so far, to get any real value.. we have to learn how to query in a new way ... hmmm, did I lie ?

No wait there is more :) With the use of a nice register "Current Temporal System_Time"

set current temporal system_Time = '2014-07-19-12.14.16.100732000000'

Then run a temporal agnostic query

select * from tb_YourTable

tb_YourTable
------------
MY_DATA     SYS_START                        SYS_END
----------- -------------------------------- --------------------------------
          1 2014-07-19-12.14.16.100732000000 2014-07-19-12.17.21.539897000000

ohhh did we just do a time shift from under our application nose .. nice To turn it off, we simply set the time to NULL

set current temporal system_time = NULL

All well and good, but if you cast you mind back to my overview, I spoke about the salary scenario, and if you are astute, you would know that this current solution would not actually solve that problem.. for that we need to look at the concept of Business Time Temporal Tables... but that is another post for the future.

Friday 11 July 2014

Temporal Tables - Overview

The first feature I have looked at in DB2 is the concept of temporal tables, which as its name implies ( albeit with one of those words that makes something sound more complex that it needs to be ) is related to time.

So, what are temporal tables all about, simply the addition of a lifetime concept to data. Each piece of data ( at a row level ) is given a lifetime, i.e. a fixed start date and end date.

While time is clearly the key data behind this feature, I think the core concept is actually better described as tracking changes, and the offering of first order principle support in DB2 to do this.

In some respects you could argue this is like adding version control to your data.

Why do I want to add more time information to my data ?

Yup, sounds like a fair question, since if you poke around I am sure you will find there is already time information for your data, however that is always just the time the data was added / updated etc. What we are talking about here is lifetime of data, not just a single point in time.

Imagine that your boss agrees to give you a raise next month (+1 Boss), further assume that the payroll for this month has already been run ( so only a few days left in the month ). HR might reasonably assume they can just go and update your salary information now with the new value, since "clearly" the next time the value is going to be used is for the next payroll run.

Meanwhile, the auditors come in unannounced and validates the last payroll... needless to say ( but I will anyway ) things will not balance, best they can tell is that you were underpaid -- you of course take the moral high ground and explain to them that it is just a failing of the DB as it does not have temporal concept of lifetime added to your new salary, as that update should only take affect in next months payroll run, if they are trying to evaluate this months payroll run, they needs to use your previous salary information.

What is the solution ?

Clearly there are two immediate solutions to this category of problem

Teach HR to only update the data at exactly the right time.
Don't make the salary update destructive, keep a history of all changes for given timeframes.

Option 1, is just littered with issues

Some poor HR person needs to be at work midnight on the last day of each the month
Even if you do find a way to update at exactly the right time ( with a scheduled task so HR can continue to have a family life ), you have lost data. The previous salary information is lost.

Option 2, is clearly far superior, but comes with the potential challenges of having to adjust your entire application stack to take into account the concept of life time, so that queries into the DB know which version data to look at.

This is exactly where temporal tables come in, they offer you built in support creating this change history, with the added benefit of being able to do this with between zero to minimal changes to your existing product application stack above the DB.

What support does DB2 give me ?

There are 3 things that DB2 gives you ( I would argue only 2, but we can get into that later ), each of these I will touch on in future posts ( linked below as and when )

Thursday 10 July 2014

Dipping my toes into DB2

With a new job on the horizon, I will soon be stepping out of the Microsoft bubble, as my current job is a very strong Microsoft shop, and the new one.. not so much :)

One of the new DB technologies that I am going to have to look into is IBM's DB2.

This will be my bread crumb trail of what I learn, and with any luck, what I learn and write will be close enough to correct to help you too ( where "you" might just be my future self )

SELECT *
FROM tb_unknown
WHERE knowledge = 'new';