
Exchange Team Blog

Public Folder Replication Troubleshooting 3: Troubleshooting Replica Deletion and Common Problems

The_Exchange_Team
Jan 24, 2006

 

This is the third blog post about troubleshooting public folder replication issues. The first post covered Troubleshooting the Replication of New Changes, and the second covered Troubleshooting the Replication of Existing Data. This post covers Troubleshooting Replica Deletion and Common Problems. To get the full picture, please read all the referenced material!

 

Troubleshooting Replica Deletion

 

You removed an old server from the replica list on all your folders, but when you open Public Folder Instances for the old store in ESM, you still see a bunch of folders there. This indicates a problem with the replica deletion process. In the Exchange 2003 Sp2 version of ESM, if you try to delete a public store in this state, ESM displays a dialog stating:

 

“You cannot delete this public folder store because it contains folder replicas. To avoid data loss, right click the public folder store and use Move Replicas to move the replicas to a different server. It may take several hours until the content is replicated to the new server and the local replicas are removed.”

 

When you remove a store from the replica list on a folder, that store does not immediately delete its data. Instead, it sends a special 0x20 status request to all the other replicas. This is called a Replica Delete Pending Status Request (RDPSR), and it cannot be distinguished from a normal status request in the application log. An RDPSR contains a flag indicating that the replica is pending deletion. When the other stores receive this 0x20, they respond with a special 0x10 called a Replica Delete Pending Ack (RDPA). The RDPA indicates that it's OK to delete the data, but another store only sends this 0x10 if it already has all the change numbers (CNs) that the pending deletion replica has. The replica is deleted only after the store has received a 0x10 indicating that another replica has the data.

 

This means that if you delete the store before Public Folder Instances is empty, you are probably losing data. Only 2003 Sp2 ESM will stop you from doing this; in older versions you must manually check Public Folder Instances to see whether it's safe to delete the store. Always check Public Folder Instances before deleting a public store, and when 2003 Sp2 ESM gives you this warning, don't try to ignore it or work around it. Instead, troubleshoot the replica deletion process.
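The RDPSR/RDPA exchange described above comes down to one rule: a peer acknowledges only if its set of CNs covers the pending deletion replica's set. The toy model below illustrates that rule. It is a sketch under simplifying assumptions: CNs are plain integers (real change numbers are tracked per source store), and the class and function names (`Replica`, `receive`, `may_delete`) are hypothetical, not Exchange APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

STATUS_REQUEST = 0x20  # an RDPSR is a 0x20 carrying the "pending deletion" flag
STATUS = 0x10          # an RDPA is the 0x10 sent back in reply


@dataclass
class Replica:
    name: str
    cns: Set[int] = field(default_factory=set)  # change numbers this replica holds

    def receive(self, msg_type: int, sender_cns: Set[int]) -> Optional[int]:
        """Answer an RDPSR with an RDPA (0x10) only if this replica already
        holds every CN the pending deletion replica holds."""
        if msg_type != STATUS_REQUEST:
            return None  # this toy model only handles RDPSRs
        return STATUS if sender_cns <= self.cns else None


def may_delete(pending: Replica, others: List[Replica]) -> bool:
    """Send an RDPSR to every other replica; the pending deletion replica
    may drop its data once at least one of them replies with a 0x10."""
    return any(peer.receive(STATUS_REQUEST, pending.cns) == STATUS for peer in others)


old = Replica("OldServer", {1, 2, 3})
behind = Replica("ServerB", {1, 2})
current = Replica("ServerC", {1, 2, 3})
print(may_delete(old, [behind]))           # False: ServerB lacks CN 3
print(may_delete(old, [behind, current]))  # True: ServerC has every CN
```

The point of the model is that a single up-to-date peer is enough; a peer that is behind simply stays silent rather than sending a negative reply, which is why a missing 0x10 is the symptom to look for.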

 

Note that prior to Exchange 2003 Sp2, the server that was removed from the replica list sends the RDPSR only once. If no one responds, the folder stays in Public Folder Instances indefinitely, unless you add the store back to the replica list and then remove it again, which causes a new RDPSR to be sent. 2003 Sp2 changed this behavior so that the store retries every hour until it gets an RDPA from another replica.
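The difference between the pre-Sp2 and Sp2 behavior can be sketched as a small simulation. This is a simplified model, not the store's actual scheduler: it assumes a peer answers with an RDPA only when an RDPSR reaches it after it already holds all the data, and the function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def simulate_rdpsr(peer_ready_at: int, retries_hourly: bool,
                   horizon_hours: int = 72) -> Tuple[List[int], Optional[int]]:
    """Return the hours (after removal) at which RDPSRs are sent, and the
    hour an RDPA comes back, or None if it never does within the horizon.

    peer_ready_at: hour at which some other replica first holds all the data.
    retries_hourly: False models pre-Sp2 (one RDPSR), True models 2003 Sp2.
    """
    sent: List[int] = []
    for hour in range(horizon_hours + 1):
        if hour > 0 and not retries_hourly:
            break  # pre-Sp2: the single RDPSR has already gone out
        sent.append(hour)
        if hour >= peer_ready_at:
            return sent, hour  # a peer has the data and replies with an RDPA
    return sent, None  # no RDPA: the folder stays in Public Folder Instances


print(simulate_rdpsr(5, retries_hourly=False))  # ([0], None)
print(simulate_rdpsr(5, retries_hourly=True))   # ([0, 1, 2, 3, 4, 5], 5)
```

Under these assumptions, a peer that catches up five hours after the removal never acknowledges a pre-Sp2 store, while an Sp2 store picks up the RDPA on its sixth attempt.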

 

Troubleshooting

 

This is almost the same as troubleshooting the backfill process.

 

1. Did the pending deletion replica send a 0x20?

 

Unless diagnostic logging was already turned up when you removed the replica, you won't know. Fortunately, you can add the replica back and then remove it again, which causes a new 0x20 to be sent. Then watch the application log for the 0x20.

 

2. Did the 0x20 reach the other replicas?

 

You should know the drill by now. Check the application logs on the other replicas to see if they received the 0x20.

 

3. Did any other replica respond with a 0x10?

 

This is the part you'll probably end up focusing on. If a replica received the 0x20 from the pending deletion replica but did not respond with a 0x10, the pending deletion replica has data that the other replica doesn't. Since that replica just received a 0x20 from the pending deletion replica, it already knows what data it's missing. Therefore, you'd expect to see a backfill request for that folder every 24-48 hours. Watch the application log, and troubleshoot it exactly like the normal backfill process described in the previous post.

 

4. Did the pending deletion replica receive the 0x10?

 

Once any other replica has all the data, that replica should respond with a 0x10. When the pending deletion replica receives that 0x10, it will finally be willing to delete the data. That doesn't mean the deletion happens immediately, though. If clients are using that replica, it won't be deleted until later, during online maintenance. You can speed this up by dismounting and remounting the store, which disconnects the clients.
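The four checks above form a simple decision procedure: walk them in order and stop at the first one that fails. The sketch below encodes that order. The `Observation` record is hypothetical; you would fill it in by hand from the 0x20, 0x10, and 0x8 entries you find in the application logs, and the hint strings paraphrase the steps in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """What the application logs showed for one folder (filled in by hand)."""
    pending_sent_0x20: bool        # step 1: on the pending deletion replica
    peer_received_0x20: bool       # step 2: on another replica
    peer_sent_0x10: bool           # step 3: on that replica
    pending_received_0x10: bool    # step 4: on the pending deletion replica
    peer_sent_backfill_0x8: Optional[bool] = None  # only relevant if step 3 fails


def next_step(obs: Observation) -> str:
    """Return the first check that failed, with a hint from the steps above."""
    if not obs.pending_sent_0x20:
        return "1: add the replica back, remove it again, and watch for the 0x20"
    if not obs.peer_received_0x20:
        return "2: the 0x20 never arrived; check message tracking between servers"
    if not obs.peer_sent_0x10:
        if obs.peer_sent_backfill_0x8:
            return "3: peer is backfilling; troubleshoot it like normal backfill"
        return "3: peer is missing data; expect a backfill (0x8) within 24-48 hours"
    if not obs.pending_received_0x10:
        return "4: the 0x10 never arrived at the pending deletion replica"
    return "done: data will be deleted, possibly not until online maintenance"


print(next_step(Observation(True, True, False, False)))
```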

 

Common Problems

 

You may find that a server sent some type of replication message to another server, but the receiving server never logged the incoming message in the application log. However, message tracking says it was delivered locally to the store on that server. This behavior usually indicates either a problem with the Replication State table or a permissions problem on the SMTP virtual server.

 

Let's cover the easy one first. One problem that causes an incoming replication message to be ignored by the receiving server is a problem with the Replication State, or ReplState, table. Note that a problem with the ReplState table may also cause the server to fail to issue backfill requests (0x8) for some folders, so this information also applies to that situation. Each public store uses its ReplState table to track the state of replication for any replicated folders. The table contains multiple rows for each folder: one row per replica. The rows in the ReplState table can get out of sync with the replica list, leaving extra or missing rows. Sometimes you can get them back in sync just by making a change such as removing a server from the replica list, applying the change, and then immediately adding it back, but this doesn't always work. Fortunately, a ReplState test was added to isinteg; see KB889331 for Exchange 2003 or KB892485 for Exchange 2000. As long as you have the updated isinteg.exe and store.exe, you can use isinteg to correct the problem with the ReplState table. If you run only the ReplState test, it is typically very fast: less than a minute even on a large public store. After isinteg has been run, you may still need to go back and make a change to the folder to get the ReplState table to sync up with the replica list. Once they're in sync, the server should be able to process the incoming replication messages, or should begin issuing backfill requests normally.
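To make "out of sync" concrete, the sketch below compares each folder's replica list with the set of replicas that have ReplState rows and reports the missing and extra rows. This is purely illustrative: the real ReplState table is internal to the store and is checked and repaired with isinteg as described above, and the dictionary layout and function name here are hypothetical.

```python
from typing import Dict, List, Set


def replstate_drift(replica_lists: Dict[str, List[str]],
                    replstate_rows: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """For each folder, compare the replica list with the replicas that have
    a ReplState row (one row per replica). Return only out-of-sync folders."""
    drift: Dict[str, Dict[str, Set[str]]] = {}
    for folder in sorted(set(replica_lists) | set(replstate_rows)):
        expected = set(replica_lists.get(folder, []))
        actual = set(replstate_rows.get(folder, set()))
        missing = expected - actual  # replica on the list but no row
        extra = actual - expected    # row left over for a replica not on the list
        if missing or extra:
            drift[folder] = {"missing": missing, "extra": extra}
    return drift


lists = {"/Sales": ["ServerA", "ServerB"], "/HR": ["ServerA"], "/IT": ["ServerA"]}
rows = {"/Sales": {"ServerA"}, "/HR": {"ServerA", "OldServer"}, "/IT": {"ServerA"}}
for folder, diff in replstate_drift(lists, rows).items():
    print(folder, diff)
```

In this made-up example, /Sales is missing a row for ServerB and /HR still has a row for OldServer, while /IT is in sync and is not reported.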

 

The other common problem that causes an incoming replication message to be ignored is specific to Exchange 2003. An Exchange 2003 server requires that the sending server have the Send As right on the receiving SMTP virtual server. That is, if ServerA is running Exchange 2003 and ServerB is sending a PF replication message to ServerA, ServerB must have Send As on ServerA's SMTP virtual server. Otherwise, ServerA does not process the incoming replication message. This permission is normally granted through the Exchange Domain Servers group. If the Send As right is the problem, all incoming replication messages from a particular server will fail. I find it easiest to identify this problem with a network trace taken while a replication message is being transferred from one server to another. The conversation should go like this:

 

ServerA: 220 ServerA.microsoft.com Microsoft ESMTP MAIL Service...
ServerB: EHLO ServerB.microsoft.com
ServerA: 250-ServerA.microsoft.com Hello
         250-TURN
         250-SIZE
         250-ETRN
         250-PIPELINING
         250-DSN
         250-ENHANCEDSTATUSCODES
         250-8bitmime
         250-BINARYMIME
         250-CHUNKING
         250-VRFY
         250-X-EXPS GSSAPI NTLM LOGIN
         250-X-EXPS=LOGIN
         250-AUTH GSSAPI NTLM LOGIN
         250-AUTH=LOGIN
         250-X-LINK2STATE
         250-XEXCH50
         250 OK

 

The important part here is that ServerA must be advertising the GSSAPI NTLM LOGIN options. If you don't see these in ServerA's response to the EHLO, it's usually because Integrated Windows Authentication has been unchecked on the SMTP virtual server. This is mentioned in step 1 of KB843106 and step 3 of KB842273. As long as these authentication verbs appear, you should see ServerB try to use them:

 

ServerB: X-EXPS GSSAPI
ServerA: 334 GSSAPI supported
ServerB: <a bunch of base64 encoded data>
ServerA: 334 <more base64 encoded stuff>
ServerB: CRLF
ServerA: 235 2.7.0 Authentication successful.

 

If authentication does not succeed, you may have a Kerberos problem or an issue with the computer account for ServerB. Next, the servers transmit link state information. After that, they finally get around to the business of transferring email:

 

ServerB: MAIL FROM:<ServerB-IS@microsoft.com>
ServerA: 250 2.1.0 ServerB-IS@microsoft.com....Sender OK
ServerB: RCPT TO:<ServerA-IS@microsoft.com> NOTIFY=NEVER
ServerA: 250 2.1.5 ServerA-IS@microsoft.com
ServerB: XEXCH50 2404 2
ServerA: 354 Send binary data

 

It's this last response to the XEXCH50 verb that's important. If the response is "354 Send binary data", then everything is fine, at least as far as permissions to the SMTP virtual server are concerned. If the GSSAPI NTLM login options were not advertised, or the authentication attempt failed, then it's expected that ServerA will instead respond with "504 Need to authenticate". If those steps succeeded, but ServerA still says "504 Need to authenticate" instead of "354 Send binary data", then ServerB does not have the Send As right on ServerA's SMTP virtual server. There are several ways this could happen. For one, when you delegate rights such as Exchange Full Administrator in ESM, that user or group inherits a deny on the Send As right. Therefore, using ESM to delegate admin rights to the computer account, the Exchange Domain Servers group, or some other group that contains the Exchange servers will break public folder replication. Another possibility is that the computer account is not in the Exchange Domain Servers group, which is how it normally has the Send As right. You'll need to evaluate the permissions on the SMTP virtual server and determine why the computer account for the sending server does not have the proper rights. See KB843106 and KB842273 for more details about the "504 Need to authenticate" problem.
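The reasoning above (check the EHLO advertisement, then the authentication result, then the reply to XEXCH50) can be sketched as a small interpreter for what a network trace shows. It is an offline helper under stated assumptions: it takes the EHLO lines and XEXCH50 reply as text you copy out of the trace, requires both GSSAPI and NTLM to treat Integrated Windows Authentication as enabled, and the function names are hypothetical.

```python
from typing import Iterable, Set

# Mechanisms that Integrated Windows Authentication adds to the EHLO reply.
WINDOWS_AUTH = {"GSSAPI", "NTLM"}


def advertised_auth(ehlo_lines: Iterable[str]) -> Set[str]:
    """Collect mechanisms from the AUTH and X-EXPS lines of an EHLO reply.

    Lines look like "250-AUTH GSSAPI NTLM LOGIN"; the older "AUTH=LOGIN"
    style lines have no space after the keyword, so they are skipped.
    """
    mechanisms: Set[str] = set()
    for line in ehlo_lines:
        text = line.strip()[4:]  # drop the "250-" or "250 " prefix
        keyword, _, params = text.partition(" ")
        if keyword.upper() in ("AUTH", "X-EXPS"):
            mechanisms.update(p.upper() for p in params.split())
    return mechanisms


def diagnose(ehlo_lines: Iterable[str], auth_succeeded: bool,
             xexch50_reply: str) -> str:
    """Map the trace to the likely cause described in this post."""
    reply = xexch50_reply.strip()
    if reply.startswith("354"):
        return "ok: permissions on the SMTP virtual server are fine"
    if not reply.startswith("504"):
        return "unexpected reply: " + reply
    if not WINDOWS_AUTH <= advertised_auth(ehlo_lines):
        return "504 expected: Integrated Windows Authentication looks disabled"
    if not auth_succeeded:
        return "504 expected: authentication failed (Kerberos or computer account)"
    return "504 after successful auth: sender lacks Send As on the virtual server"


ehlo = [
    "250-ServerA.microsoft.com Hello",
    "250-X-EXPS GSSAPI NTLM LOGIN",
    "250-X-EXPS=LOGIN",
    "250-AUTH GSSAPI NTLM LOGIN",
    "250-AUTH=LOGIN",
    "250 OK",
]
print(diagnose(ehlo, True, "504 Need to authenticate"))
```

Only the last branch points at the Send As right; the two earlier 504 branches are the "expected" failures, and it is worth ruling them out first before evaluating the permissions on the SMTP virtual server.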

 

Conclusion

 

You may have noticed as you read through this document that Sp2 for Exchange 2003 contains several important enhancements under the hood to prevent replication issues and assist in troubleshooting them. Environments with multiple public stores can really see a huge benefit from Sp2, especially when it comes to moving replicas between servers, and adding and removing public stores.

 

I hope this was helpful. Thanks to Dave Whitney for reviewing all this!

 

- Bill Long 

 

 

Updated Jun 05, 2020
Version 3.0
  • Although it's a nice safety feature that in SP2 you can't delete a Public Folder Store without replicating it first... what do you do if there's NO data in the store at all and you only have one server? You're not worried about losing data, as there isn't any to lose.

    Is there any way at all to forcibly remove public folder instances or stores in SP2 without first being forced to replicate them?
  • If there's really no unique data in the store, you won't have anything in Public Folder Instances. :-)

    But, assuming you want to ignore Public Folder Instances and you don't care about losing data and such, then you can certainly use any number of tools to delete the directory object out of the AD and bypass ESM's sanity checks. LDP or ADSI Edit will do the trick.
  • Maybe I didn't mean instances, but there is data automatically entered into a new public folder store: hidden system folders, etc.

    Where in LDP or ADSI Edit would the Public Folder Store or Storage Group information be stored? I had a look and didn't see anything of relevance.


  • Well, anything that's not in Instances won't stop you from deleting the store.

    In ADSI Edit, your Exchange org will be under Configuration->Services->Microsoft Exchange. Under here you'll see basically the same structure as you have in ESM. You can expand the org, admin group, servers object, and server name, then expand Information Store, and you'll see your storage groups. Highlight the SG in the left-hand pane and you'll see the stores within it in the right-hand pane.


  • Excellent, thank you very much.

    My ESM looks a lot cleaner now that I've removed that useless storage group.
  • Hi Bill,

    I have a quick question regarding backfills.

    1) I create a new PF store, and it is trying to get the hierarchy backfilled from another PF store.

    2) The entries are stored in the backfill array on the store that has the hierarchy data to be sent to the new store.

    3) In the meantime, I delete the new store from the Exchange organization.

    4) I then create a new PF store on the same server with the exact same name.

    The question I have is: what happens to the original backfill entries for the first PF store in the array? Do those entries get deleted when the store is deleted, or do they stay in the array? If they are not deleted, do they get sent to the new store since the name is the same?
  • Hey Kevin,

    #2 is incorrect. The backfill array stores a list of data that needs to be requested. As such, the backfill array is only populated on a store that is missing data from another store. There would be nothing in the backfill array on the store that has data to be sent to the new store (unless that store was missing data from some other store). The new store would be the one with entries in the backfill array.

    Read through part 2 of this troubleshooting guide one more time. :-) It's a lot of information to absorb!
  • ManjunBN

    Hello Bill,

     

    This question is to get confirmation for customers who still have 3-4 TB of PF databases on Exchange 2010. Does changing the path of a PF DB to a new drive, via ADSI Edit or the management console, trigger public folder replication in any way?

    The DB is in a clean shutdown state. We need to move it to a new, bigger drive.

    Does moving the DB to the new drive with an offline method (ADSI Edit / Move Database) trigger full or partial replication in any way?

     

    Thank you