Hi, Chris Butcher here. It's been a while, but I'm back. Since installing Data Protection Manager (DPM) 2010, I haven't needed to tend to the DPM server nearly as much, so I've been spending a lot of free time on other projects (like my SharePoint farm). Well, I now know just enough about SharePoint to be dangerous. That theory played itself out recently when I deleted a site I had been working on, so it was time to put my "Backup Administrator" hat back on and fix things before anyone realized the site was gone.
This seemed like a pretty easy task, but I ran into a small hurdle. The restore looked very straightforward: this was SharePoint 2010, so I didn't need to create a recovery farm (whew!).
I stepped through the wizard, indicating that I did not want to use a recovery farm, and supplied the temporary server where DPM would place the content database and mount it to pull out the data I needed.
I completed the wizard with no errors and then started the waiting game while the restore ran.
After several minutes, it failed with the error: DPM failed to communicate with the protection agent on <TemporaryServerName> because the agent is not responding. (ID 43)
I decided to investigate this myself, so I took a closer look at the failure on the Monitoring tab to see what information it had for me.
We get a little more information there, but it really tells me the same thing: the communication failed or the agent is not responding. It just adds ID 3111 and Internal error code: 0x8099090E (which appears to map to a response timeout error). So, at this point I walked through the recommended actions.
1 - Looking at the Application and System event logs on the temporary server (SQL01), I saw some events showing that DPMRA had faulted, but no indication of why, as there were no other events pointing to a problem.
2 - I checked from the temporary server (SQL01) and confirmed full connectivity to the DPM server.
3 - I have fully eliminated firewalls in my environment for this, and these servers are all on the same segment, so there are no switches or other devices they must communicate through.
4 - I checked the DPM Protection Agent service (DPMRA) on my temporary server and found it was already up and running.
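If you want something a bit more specific than "full connectivity" for step 2, a quick TCP reachability check is one option. This is a minimal sketch, not a DPM tool: the port list is an assumption (DCOM's endpoint mapper on 135, plus 5718 and 5719, which are commonly documented for the DPM agent coordinator and protection agent), so confirm the ports for your DPM version before relying on the results. The function name `check_ports` is hypothetical.

```python
import socket

# Assumed DPM 2010 agent ports; verify against the DPM documentation for your version.
DPM_PORTS = [135, 5718, 5719]


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and opens a TCP socket;
            # the with-block closes it again immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            results[port] = False
    return results
```

For example, running `print(check_ports("SQL01", DPM_PORTS))` on the DPM server would show which of those ports on the temporary server accept a connection. A plain TCP connect only proves something is listening; it says nothing about whether DPMRA itself is healthy, which is exactly what turned out to matter here.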
Just to be sure about all of the above, I tested some generic file backups and restores to the temporary server and found they worked fine, so now I had to dig a bit deeper.
I decided to watch the logging on the temporary server (SQL01) to see what was going on. I navigated to the temp directory (program files\microsoft data protection manager\dpm\temp), where DPM stores all of the logs it creates, and sorted by date/time so the newest files were on top. I then went back to the DPM server and tried my restore again. When the failure finally happened on the DPM server, I noticed that the DPMRA*.errlog had produced a .crash file containing the information it was logging at the time.
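If you end up repeating this "sort by newest and read the tail" routine, it can be scripted. The sketch below is a minimal illustration, not part of DPM: the full `C:\...` path, the `DPMRA*.crash` file pattern, and the helper names `newest_logs` and `tail` are assumptions for illustration. Because I'm not certain which text encoding DPM uses for these logs, the helper checks for a UTF-16 byte-order mark and otherwise falls back to UTF-8.

```python
import glob
import os
from collections import deque

# Assumed default install path for the DPM temp/log directory; adjust as needed.
DPM_TEMP_DIR = r"C:\Program Files\Microsoft Data Protection Manager\DPM\Temp"


def newest_logs(directory, pattern="*", limit=10):
    """Return up to `limit` paths matching `pattern`, newest (by mtime) first."""
    paths = glob.glob(os.path.join(directory, pattern))
    paths.sort(key=os.path.getmtime, reverse=True)
    return paths[:limit]


def _detect_encoding(path):
    """Guess UTF-16 if the file starts with a UTF-16 BOM, else UTF-8."""
    with open(path, "rb") as f:
        head = f.read(2)
    return "utf-16" if head in (b"\xff\xfe", b"\xfe\xff") else "utf-8"


def tail(path, lines=50):
    """Return the last `lines` lines of a text file (like scrolling to the bottom)."""
    with open(path, encoding=_detect_encoding(path), errors="replace") as f:
        # A bounded deque keeps only the final `lines` entries while iterating.
        return list(deque(f, maxlen=lines))
```

A typical use on the temporary server would be `newest_logs(DPM_TEMP_DIR, "DPMRA*.crash", limit=1)` right after the restore fails, then printing `tail()` of the returned path to see the last errors logged.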
This looked promising, so I opened it in Notepad and scrolled to the bottom to see what errors it was logging. At the bottom of the file, I saw a lot of similar errors:
As I hear so often when my son is watching his favorite show… "A CLUE! A CLUE!"
It appeared there was a problem using the share I created, even though it grants Everyone full control, so I went back and ran the restore using a drive instead.
Sure enough, data began to flow, and the restore got my site back up and running. I guess I have learned my lesson and will let the SharePoint guys do their thing from now on, but if or when they need a restore, I now know to use a drive location, not a share, for the temporary server and staging locations.
Chris Butcher | Senior Support Escalation Engineer