Sunday, August 20, 2017

Backing up DB2 database(s) to HADOOP


At my work, I was sick for TSM and DDBoost running out of space and not getting right support at the time. In the process I learn that those infrastructure are quiet expensive (software and hardware). So I started thinking in the days of "Big Data" why pay premium price for these things of past. I started thinking of Hadoop which can accept file as an input unlike Splunk/Cassandra. So started testing the possibilities and these are just possibilities nothing implemented in real use.

There are 2 options to backup to Hadoop that I looked at, here are those
1 - backup database to disk first, then push the files to Hadoop
2 - directly stream the backups to Hadoop using unix named pipes

In general run Hadoop commands in Hadoop user profile and DB2 command in db2 user profile

Backup database to disk first, then push the files to Hadoop:
This is very simple here are the commands and steps

-- run under db2 instance profile
db2 backup db sample to /db2backup/

-- run under Hadoop user profile
hadoop fs -put SAMPLE.0.db2inst1.DBPART000.20170819060433.001

This requires local diskspace, if your database is large then you might be needing lot of space and that could be a constraint too. So look at next option

Restore works the same, way  - get the file back from Hadoop on to a local disk and use db2 restore command.


Directly stream the backups to Hadoop using unix named pipes:
I like this option the most as I don't need much space on the local disk, as backup files are streamed directly to Hadoop for storage.

-- run under db2 instance profile
mkfifo /tmp/db2fifo && chmod 744 /tmp/db2fifo && db2 backup db sample to /tmp/db2fifo; sleep 10 ;rm /tmp/db2fifo

Create named pipe, change it to read only by public, start the backup to that pipe, start reading the pipe to Hadoop

-- run under Hadoop user profile
hadoop fs -put /tmp/db2fifo SAMPLE.0.db2inst1.DBPART000.`date +%Y%m%d%H%M%S`.001

Here i am renaming the file to "look like" db2 backup set file with instance name/timestamp etc. This generated timestamp is no exactly the same like db2 returned timestamp, but is close.

Restore procedure using named pipes,

First find the filename to restore
hadoop fs -ls

-- run under Hadoop user profile
mkfifo /tmp/db2fifo && chmod 744 /tmp/db2fifo && hadoop fs -cat SAMPLE.0.db2inst1.DBPART000.20170819060433.001 > /tmp/db2fifo ; rm /tmp/db2fifo


-- run under db2 instance profile
db2 restore db sample from /tmp/db2fifo into mynewdb

Of course there are other issues like data security, other backup and recovery scenarioes which I have not discussed, but this is an idea away from traditional solutions.


No comments:

About Me

By profession I am a Database Administrator (DBA) with total 13 yrs. of experience in the field of Information Technology, out of that 9 yrs as SQL DBA and last 3 years in IBM System i/iSeries and DB2 LUW 9. I have handled Developer, plus production support roles, and I like both the roles. I love and live information technology hence the name "Techonologyyogi" Apart from that I am a small, retail investor, with small investments in India and United States in the form of Equity holdings via common stocks. Don't ask me if I have made money, I have been loosing money in stocks.