Tuesday, March 20, 2012

Backup Recovery Philosophy... pull up a chair

(Be forewarned... this is long, but I think you'll find the information interesting)

When it comes to SQL Server backup/restore/recovery, my approach is based on a two-part philosophy. I'd like to share this with you and get you opinion and feedback...

First, I approach recovery from the instance level, not the individual databases. While a SQL Server instance is made up of system- and user-databases, I make no distinctions between them in terms of recovery - the instance is the whole "database". I'll later explain why.

Second, I approach it so that it will ultimately allow me to do a "displace" recovery... not just "in-place" restore. In other words, be able to restore the database on the current host ("in-place"), such in a case of a media or other failures where the host is intact - which is usually the common, easy case. But above that, be able to recover the database onto another host, such in a case of a catastrophic failure of the original host.

At any rate, the two part approach equates to what I consider provisions for doing "displace SQL Server instance recovery" - the ability to restore the whole SQL Server Instance, including the original system-databases, onto another host.

I come from an Oracle background... in the world where there's only one "database instance"... no such thing as "individual" databases. And really, the term "database" in Microsoft equates to what Oracle calls a "tablespace". Oracle's "system tablespace" equates to Microsoft's "system databases"... and so forth. It's a common case of one vendor calling it, "TOE-MAY-TOE", and the other calling it, "TOE-MAH-TOE". And in Oracle... I'm not the habit of making distinction between these tablespaces - system or non-system - when it comes to backing up... not many are (I would think). Our backup appoach views backing up the WHOLE database instance... therefore if needed... we can restore the WHOLE database instance. I can't find any other logical way to approach database backup.

But as I've found... in the world of SQL Server, there's seems to be a disconnect between how user-databases should be backed up (and restored) from the the way system-databases should be backed up (and restored). Most people make distinction between these database when it comes to backup and recovery.

Let me it explain it this way...

In my shop, my backup strategy provides the capability to restore both system- and user-databases. Some people might be surprised, but all this can actually be done using only the built-in SQL Server backup and restore functions. What I've also found is SQL DBAs find it almost "un-natural" to restore system-databases. I've heard and read some say is this... "just restore the user-databases on the new server and "re-create" the objects (that were in the previous system database) into the new system databases on the new host". Some also suggests using scripts to make "re-creating" these objects easier.

I think that people approach it this way because restoring system database onto another server (a different host name, a different file directory structure, etc) is not as simple as restoring user-databases.

But it is doable... and doable using only the built-in SQL Server functions.
My previous post on this matter can be found here (http://sqlforums.windowsitpro...22923&highlight_key=y)

I long time ago, I found a way to do this. And since then, my backup strategy has remained fairly the same... backup the whole sql server instance... and have the ability to restore the whole instance (if/when needed). And in the end, we not only have a great backup strategy, but a strategy for disaster recovery.

I agree that the approach to [simply] restore user-database, and then "re-creating" system-database objects is viable, but it's not "absolute". And here's why...

You'll always have to make provisions around when system-database objects are created/changed... for the life of the instance... such as logins, for example.
I've seen some application create extended procedures (in the master database) and create jobs and plans (in msdb).
You'll have to consistently ensure you update your "re-create scripts" to ensure you capture these objects and changes... through the life of the instance.
You'll have to be that much more intimate with the applicatons in order to understand all the components it may be adding outside it's user-database.

There a huge potential risk that you'll miss something.

Bottom line... restoring the whole SQL Server instance avoids any of the above caveats, guaranteeing a "physical restore" [of the original instance] versus one where parts of it had to be "logically" recreated and/or potentially be missing.

I admit, I also use scripts to complete a full SQL Server instance recovery. For me, it is a set of batch and sql scripts, but I only use it to expedite the process. And more importantly... the scripts are created once for an instance and doesn't have to be updated...
regardless of what occurs within the instance...
regardless if new logins or extended procedures are created/updated (in the master)...
regardless if new/updatess to jobs/maintenance plans/etc occur (in msdb)...
and I don't even really know what goes on in model... but again... it doesn't matter...
all of it gets restored.

The process... scripts or no scripts... is the same for all SQL Server instance I support... regardless of the idiosyncrasies of the application running on them. I have to make NO special provisions for any of them.

It's a solution where at the end of the recovery... I can tell the customer that whatever they originally lost has been restored (from the last backup)... without any doubt. Again it's a "physical restore", not one where parts of it had to be "logically" recreated.

Lastly... If you understand my philosophy... then you could probably understand why I don't consider Log Shipping and Replication to be a thorough disaster recovery solution. The reason being is that these solutions only provision for the recovery of user-databases, right? You cannot use log shipping or replication for system databases, right? And as with the other... you'll have to provision for re-creating system-database objects. Again, I admit log shipping and replication are viable solution, but again, with the idea of re-creating system database objects... it's not absolute... not for system recovery.

I'm looking for good arguments and feedbacks to really put this approach to the test. Thanks.... this is long, but I think you'll find the information interestingOne out of two ain't bad...|||Thanks alot!
I was looking for "constructive" feedback... I guess that's best you can do.|||I'll expand on what I think Blindman was driving at with his comment, and add a few thoughts of my own.

Your system is good, and a notch better than database backups for some problems, but it is not foolproof for many reasons.

What you are looking to create is something like an Oracle dump combined with a full system backup, with the intent of being able to reload a full image of everything that either affects the databse or is affected by the database. The first part of this is easy to do with either Oracle or Microsoft SQL, the second part is out of scope for both products.

If an absolute, byte-for-byte one hundred percent backup is what you want, you can get there fairly easily but not using strictly Oracle or Microsoft SQL tools. The gist of the idea would go something like:

1) Spin up two or more virtual machines
2) Cluster them to form a single logical machine
3) Periodically halt each virtual machine (one at a time)
4) Backup the data files that represent the virtual machine
5) Restart the virtual machine
6) Loop back to step 3 for any additional virtual machines

This will give you the 100% control and full image backup that you've envisioned. It will include all of the registry entries, all of the DLL files, and all of the necessary system resource settings to do an "idiot level" reinstall of either type that you've described.

It seems relatively painless to me, although I think that it is overkill for anyone not banking on at least five nines of reliability.

-PatP|||No, what I was driving at was that his post was verbose, specific to his situation, offered little or no practical advice, and asked for no definable assistance.
This is a forum. It is not a blog. It is not a soapbox.|||I'll offer my .02 (and further violate blindman's sensibilities)

I try to create my backup/recovery plans based on the premise that tomorrow, I won't be at the office (won the lottery, got hit by a truck on my m/c, etc).

Keep it simple. Keep it consistent. From one server to the next and to the next beyond that.

Pick a plan. Document it. Test it. Tweak it if you have to (and document it again). Then make each instance the same. Dev. Test. Prod. If you're using third party tools in Prod, then use them in Dev. If Prod is Full Recovery Model, Dev and Test are too. If Prod is dumping to a remote server, test is too. If you're using SAN-based backups in Prod, use them in Test and Dev (okay, that may be a hard sell, but at least Test).

Then validate your plan. Have the junior DBA validate it. Go over it with the system admins and the network admins, too. Everyone should know what the plan is and have confidence that it will work.

By the way, I have a pretty strong Oracle background as well. I understand when you say that an Oracle tablespace is like an MS database, but I disagree with you. MS and Oracle diverged here and that's just all there is to it.

Regards,

hmscott|||Pat/hms... thanks for the feedback (despite the current company... which I address at the very end).

hms... interesting philosophy. There's definitely aspect of that I think I'll keep in mind and borrow.

Pat... It's not that I'm looking for byte-for-byte... but I'm looking for more a "physical" recovery approach than that of a "logical" one.

Let me try to clear my point.

At this stage... after discussing with others the same subject... I think I know how better clarify the point I'm trying to get at in my original post.

In terms of sql server instance recovery... such as in a disaster... or even if you just want to move the sql server instance to another server... and using only sql server tools... there seems to two prominent way to do this.

As I tried to explain in my original post, the one way I've been acquainted with is this:

1.) Restore the user-databases and then "re-create" the system databases objects. These system database objects could potentially include administration objects like backup jobs and devices. But I'm more concerned about the objects that belong to an application... because as I've witnessed it... some application create objects outside there user-database.

The other... that I've come to realize is this. And I know I may be "20 years late" (as implied by others elsewhere) but nonetheless... I realized it on my own:
2.) Restore the sql instance, including the system- and user-database from backup. Restoring the system databases from backup would mean you wouldn't have to "re-create" anything.

If you approach this situation via #1, here's my view on that. You have to know exactly what objects in the system database you need to re-create. And when it comes to potential objects that belong to the application... how can you ever be 100% sure you know all of them... and if, over time, they change, how would know about the changes if it is triggered/made by the application.

However, if you approach it via #2, then it doesn't matter what and where any of these objects are... restoring system databases ensures your recovery is complete of these objects.

So I guess all I'm trying to get is that for those you still prefer to do it via #1 (in the context I've described... restoring the whole instance to another server... whether for disaster recovery or simply moving the instance), I'd like to understand your rational... because there could be some things I am not realizing.

For me... doing via #2 means I can have a single procedure [in recovering an instance to another server] that can apply to all my sql server instance. I would view this as having a very consistent and standard way of performing instance recovery, which would help simplify my administration.

#1 would almost always have the need for me to fully understand an application to determine the objects that would need to be "re-created". This isn't necessarily impossible, but what make it even more unfavorable (for me) is that not only do I need to know what and where these objects are... but I'll also have to know when these objects gets updated by the application. Now, I admit, this circumstance may not be common enough... but I think it's probable enough to warrant the argument. Therefore any recovery of the instance would always have to be open the any potential idiosyncrasies of an application. This would complicate my administration... having me stray more into the realm of the application that I'd prefer.

#2 on the other hand... would simply restore the system databases and thus all/any application objects that are there will be restored "automatically" and be current up to version since the last backup of the system databases. This method removes any burden related to identifying what/where/when this objects are and the need to identify and manage an application potential idiosyncrasies.

I hope that's made my point a little clearer that you can address.
Thanks again for the feedbacks.

To Blindman:
I mean only to pose a discussion topic... something we can dialog about.
I honestly don't have any agenda other than trying to find some truth about how we do our jobs. I'm interested in what you think and also in how you think (your philosophy) on the subject. Is that so horrible??
Why can't that be enough?? Why does it gotta meet some set of criteria (all the time)?

And what BUGS me is that it wouldn't kill you to leave it alone. If it doesn't interest you... then don't reply! I even "warn" potential readers right from the start... "This is long."

"... this is a forum, not a blog" What the hell does that mean??
No one's forced to read anything... I'm not shoving my post down anyone's throat...and whatever I post, doesn't hinder anyone else for using the system... so what's the big deal.

Give me a break, man...
You're like angry... or something, man. It's just post.

No comments:

Post a Comment