There is work to be done! There's a war to be won!

Friday, 5 July 2013

NetBackup Exchange 2010 DAG

Exchange 2010 DAG with NetBackup integration.

So I first looked at the doco's from our friends SYMC surrounding all this and did my usual summary. I diligently started drawing up pictures to separate the drinkers from the dancers, as I truly couldn't find anything that good to illustrate what was going on. Rummaging through the other 10 billion experiences out there of confused folk wondering what, where and how, I figured more than a little preparation was necessary to expose errors when they arrived. I also stumbled upon the following links from MS folk and um ... well you read it and judge for yourself:

The gist of the NBU Exchange stuff is:
You're using nbfsd (which behaves like a network share) to mount the Exchange DB to a drive with credentials. You can validate these credentials manually between media server and client, if you like, by issuing nbfsd syntax, you know you want to:

I think I pulled/summarised the following from an NBU forum, unsure where; however, it's a good find and I madly scribbled it down in my quest to completely follow the flow of things.


1) NBU initiates backup and signals VSS subsystem on client to request a VSS snapshot with log truncation
2) Client quiesces Exchange Server using Exchange VSS writer, performs the snapshot and signals NBU that snapshot is ready
3) NetBackup backs up files associated with the Exchange DB
4) NetBackup performs additional tasks associated with GRT process:
a) Plays log files against the backup image
b) Mounts backup image as an Exchange DB on the Exchange Server via NFS
c) Uses the mounted image to build a partial index of the database, down to the top of each mailbox
5) When backup is complete NetBackup signals VSS subsystem on the client, which deletes the snapshot. 
6) If backup is successful the VSS subsystem also signals the Exchange server to perform a log truncation
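
The bit worth remembering from the flow above is the ending: the snapshot is deleted once the backup completes, but logs are only truncated if the backup succeeded. A tiny Python sketch of that ordering, purely as an illustration. The function and event names are labels I made up for the example; they are not NetBackup or VSS calls.

```python
def grt_backup_flow(backup_ok: bool) -> list[str]:
    """Return the ordered events of a GRT backup, following steps 1-6 above.

    Hypothetical model only: the event names are invented labels.
    """
    events = [
        "request_vss_snapshot",   # step 1: NBU asks VSS for a snapshot
        "quiesce_and_snapshot",   # step 2: Exchange VSS writer quiesces, snapshot taken
        "backup_db_files",        # step 3: files behind the Exchange DB are backed up
        "play_logs",              # step 4a
        "mount_image_via_nfs",    # step 4b
        "build_partial_index",    # step 4c
        "delete_snapshot",        # step 5: happens when the backup completes
    ]
    if backup_ok:
        # step 6: truncation is only signalled after a successful backup
        events.append("truncate_logs")
    return events


if __name__ == "__main__":
    for outcome in (True, False):
        print(outcome, grt_backup_flow(outcome)[-2:])
```

So if you find your logs piling up on the Exchange side, the first question is whether the backup actually ended successfully.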

Good things about GRT:

1) No need to do 2 backups (i.e. 1 Full backup for DB restore and 1 backup for mailbox/messages)
2) Exchange mailbox objects are not catalogued at backup time * (see bad things below)
3) No more 2 stage recovery to RSG needed.

Bad things I've found so far:

1) * Great, but the cataloguing to tape takes ages in an environment where I'm using lifecycles to successfully make 3 copies of 52TB in just over 50 hours (2 on disk, 1 on tape). This excludes the new Exchange data, so I'm left wondering about the issues that installing and using this stuff is going to cause others with much larger installations.

We made a decision not to send our Daily backups to tape, therefore no indexing will occur, therefore ... what do you mean you're not creating a copy on tape? Um, well ... there's a problem going to tape ... oh dear. Ok ok ... we'll keep it online for 35 days, covering our agreement with the business. Weeklies/Monthlies/Yearlies will be an issue. So I suppose there are a few options here:

a) Create a specific SLP for Exchange using a group of dedicated drives for the weekend run?
b) Buy more disk and keep it online longer?
c) Script up a manual duplication of your DAG images to tape (one or two at a time) with the not-so-well-known undocumented/unsupported bpduplicate flag that will duplicate an SLP?

The 'Exchange SLPs' grab all my drives (ok, I don't have many, but that's not the point), with our friend nbrb reserving everything, including drives it doesn't actually use, so nothing else can use them. Indexing of the Exchange images occurs with the drives simply waiting around reserved. When they finally get their turn to write data it's relatively quick. This needs to change. Enhancement request please!
Dude, where's my car?

2) Browsing: Timeouts using the Remote Admin Console and Java GUI when browsing images. This tends to affect larger Exchange DB's and Mailboxes with a large number of files. I've raised a case about this and will update asap, as all I see is bplist/nbwin in the logs issuing a request to bpdbm on the master, interrogating the catalog, followed by the usual restore job in the GUI that has allocated resources via nbrb. My remote admin logging (nbfs logging) stutters, with the GUI reporting 'database error'. Smaller DB's come back within a 10 second delay.

I expect to see a delay, but more than 5 minutes has left me wondering whether I really need a 3rd cup of coffee, considering it's just gone 09:30am and a recovery is needed for a user who had a rough weekend and deleted some mails he didn't remember receiving. We need speed peeps, we can't deal with these delays, particularly not on a Monday morning.

3) I'm all for load-balancing, but choose a media server that is geographically closer to the piece of disk you've written your primary copy to (if that's the one you're getting the data back from). I just don't see the point in requesting an image cross-site when your environment is running a few duplications and a few backup jobs, with very little load on one of the core media servers you should be using to browse/restore the data from in the 1st instance. This is just a general swipe at the whole thing because I've waited a long time to get this to tape (indexing, as mentioned, takes ages), and I suppose you could've installed Networker and gone to tape/disk/vtl, followed best practices from the MS Exchange team and stayed supported(?) and recovered large RSG's and messages without having to duplicate images from tape to disk in order to restore images from disk in the 1st place! No, I'm not irritated.
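
For a sense of scale, the lifecycle figures quoted in item 1 above (3 copies of 52TB in just over 50 hours) can be turned into an aggregate write rate. A quick back-of-envelope sketch in Python; it assumes decimal terabytes and exactly 50 hours, neither of which I actually stated, so treat the result as a rough upper bound.

```python
TB = 10**12  # decimal terabyte in bytes; assumption, could just as well be TiB


def aggregate_rate(data_tb: float, copies: int, hours: float) -> tuple[float, float]:
    """Return (TB per hour, MB per second) summed across all copies written."""
    total_tb = data_tb * copies
    tb_per_hour = total_tb / hours
    mb_per_sec = (total_tb * TB) / (hours * 3600) / 10**6
    return tb_per_hour, mb_per_sec


tb_h, mb_s = aggregate_rate(data_tb=52, copies=3, hours=50)
print(f"{tb_h:.2f} TB/h, about {mb_s:.0f} MB/s aggregate")
# prints: 3.12 TB/h, about 867 MB/s aggregate
```

That is the whole estate moving at a decent clip, which is why drives sitting reserved while Exchange images are indexed is so noticeable.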

So - how does this stuff install and work:

Originally, the intention was to use NetBackup, but there are restrictions on having to run the required NetBackup Windows service with full Domain rights; clearly, not all environments are comfortable with allowing NetBackup this amount of control. However, much to my delight, lifts these restrictions.

The release note is your friend:

Pg.37 goes into great detail about 'creating a minimal NetBackup account for Exchange operations'.

* The user account you give to the NetBackup service is given the local GPO permission to 'replace a process level token'. At the Exchange level, a new role group is added with perms: database copies, exchange servers, monitoring, mail recipient creation, mail throttling policy. The user account is also assigned to this role and a mailbox for the user account is created. To perform restores, a new management role is added with an associated throttling policy, and the user account is assigned to this role, with a mailbox created.

Check the release notes.

After this is done, there are some additional tasks to perform in NetBackup:

1) Understand your environment - if you're running Windows Media Servers then configure NFS services. Briefly, the Exchange client (DAG/CAS server, etc.) sets up an NFS client connection to a Media Server running NBFSD (the Media Server actually sets the credentials first, and the client connects with those credentials over NFS). If you're running a Unix OS, then NFS is configured. So, just think of all this as creating a network share and mounting the DB over NFS. Jet (I think this is related to JetBlue or Joint Engine Technology), now called ESE (Extensible Storage Engine), which Exchange runs on, also uses the NFS mount to roll the DB forward by applying the logs that have been backed up. Anyway, you'll see ESE references in beds logging I think (back end database server/storage?).

2) Distributed Application Restore Mapping: Simply allows you to run restores on Exchange hosts that you authorise. You add these components to bp.conf (or use the GUI: Master Server -> Host Properties) to emulate something similar to the following, which contains ALL your Exchange components (cas, dag, mbx hosts; don't worry about edge servers)

yourdag cas-1
yourdag cas-2
yourdag cas-x
yourdag mbx-1
yourdag mbx-2
yourdag mbx-x

Now, you know your environment, so you might need to specify fully qualified names, short names, or possibly a combination. Tail the bprd logs to understand what's going on and ensure it's working correctly. Name resolution, peeps.
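
Since most of the pain in this step is typos and name resolution, here is a small Python sketch that builds the 'dag host' pairs in the layout shown above and reports which names fail a forward lookup. It only uses the standard library. The function names and the example domain are made up for illustration, and it does not write anything to bp.conf (the exact entry syntax is whatever the GUI or the docs give you).

```python
import socket
from typing import Callable, Iterable


def mapping_lines(dag: str, hosts: Iterable[str]) -> list[str]:
    """Build 'dag host' pairs, one per Exchange component host."""
    return [f"{dag} {host}" for host in hosts]


def unresolvable(
    names: Iterable[str],
    resolve: Callable[[str], object] = socket.gethostbyname,
) -> list[str]:
    """Return the names that fail a forward lookup with the given resolver."""
    failed = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            failed.append(name)
    return failed


# Example usage (hypothetical host names and domain):
#   hosts = ["cas-1", "cas-2", "mbx-1", "mbx-2"]
#   print("\n".join(mapping_lines("yourdag", hosts)))
#   candidates = ["yourdag"] + hosts + [f"{h}.example.local" for h in hosts]
#   for name in unresolvable(candidates):
#       print("does not resolve:", name)
```

Run it on the master and on the media servers; a name that resolves on one and not the other is exactly the sort of thing that shows up later as a confusing bprd entry. The resolver is a parameter so the check can be exercised without touching DNS.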

3) If you choose to do your restore from client boxes (like the DAG itself, for instance) or somewhere else, ensure db/altnames is configured. No.Restrictions is really just a lazy excuse; the only argument I think would be valid is when you're handing things over to non-NetBackup techies. Even then I would argue that opening a case and setting up specific entries is better than allowing a free-for-all. Anyway, get this working depending on who's doing the browsing and restoring.
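
As I understand the db/altnames mechanism, you create one file per requesting client in the altnames directory on the master, and that file lists the clients whose backups the requester may browse and restore. A hedged Python sketch of writing such a file; the function name and the example hosts are hypothetical, the directory is passed in rather than hard-coded, and you should check the layout against the admin guide for your version before trusting it.

```python
from pathlib import Path
from typing import Iterable


def write_altnames(
    altnames_dir: Path, requesting_client: str, source_clients: Iterable[str]
) -> Path:
    """Create <altnames_dir>/<requesting_client> with one source client per line.

    Assumption: this mirrors the db/altnames layout on the master server.
    """
    if requesting_client == "No.Restrictions":
        # Deliberately refuse the free-for-all entry argued against above.
        raise ValueError("refusing to create a No.Restrictions entry")
    altnames_dir.mkdir(parents=True, exist_ok=True)
    target = altnames_dir / requesting_client
    target.write_text("".join(f"{client}\n" for client in source_clients))
    return target


# Example usage (hypothetical names; point altnames_dir at a scratch
# directory first and copy the result over once you are happy with it):
#   write_altnames(Path("/tmp/altnames"), "mbx-1", ["yourdag", "mbx-2"])
```

Keeping one explicit file per requester means you can see at a glance who is allowed to restore what, which is the whole argument against No.Restrictions.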

4) Configure your policy. Tick the GRT box. Use the DAG server you specified in <yourdag> above. Lastly, add your include. If you use the usual MS Info Storage Group:\\ and a stream fails, you'll need to wade through the job overview to determine what the hell just happened. Would be great to just re-run it, wouldn't it? Wouldn't it?!

Happy days? All working? A few bugs/issues?
You love it.

