Wednesday, August 17, 2005

Exchange clustering lesson learned

About two weeks ago, a newly installed Exchange 2003 cluster (2-node, Active-Passive) started misbehaving. We did not really notice until we tried to run backups and move mailboxes. The Exchange Move Mailbox Wizard failed whether we launched it from Active Directory Users and Computers or from Exchange System Manager.

We found this event in the event log:
Event Type: Error
Event Source: MSExchangeIS
Event Category: General
Event ID: 1182
Description:
Thank you for participating in the Microsoft Exchange Server beta program. Your license to use this beta version of the Microsoft Exchange Server software has expired. Contact Microsoft Corporation.

We knew for a fact that the original CD was not a beta CD. We had installed at least 30 Exchange 2003 servers that were all running in production using the same source CD.

This caused problems because 60 minutes after that message was logged, the Information Store service stopped. We then saw these messages:

Event Type: Error
Event Source: MSExchangeCluster
Event Category: Services
Event ID: 1005
Description:
Exchange Information Store Instance (SERVERNAME): The IsAlive check for this resource failed.

Event Type: Error
Event Source: MSExchangeCluster
Event Category: Services
Event ID: 1012
Description:
Exchange Information Store Instance (SERVERNAME): The RPC call to the service to take the resource offline failed.

Event Type: Error
Event Source: Service Control Manager
Event Category: None
Event ID: 7034
Description:
The Microsoft Exchange Information Store service terminated unexpectedly. It has done this 14 time(s).

Event Type: Error
Event Source: ClusSvc
Event Category: Failover Mgr
Event ID: 1069
Description:
Cluster resource 'Exchange Information Store Instance (SERVERNAME)' in Resource Group 'RESGROUPNAME' failed.
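
If you want to check a pile of pasted event text for the same pattern, here is a minimal sketch. It assumes the events are laid out exactly as shown above (one "Field: value" per line, with the message on the line after "Description:"). The function names parse_events and looks_like_beta_expiry are made up for this example and are not part of any Microsoft tool.

```python
import re

# Fields as they appear in the pasted event text above.
FIELD_RE = re.compile(r"^(Event Type|Event Source|Event Category|Event ID):\s*(.*)$")


def parse_events(text):
    """Split pasted event-log text into a list of dicts, one per event."""
    events = []
    current = None
    expecting_description = False
    for raw in text.splitlines():
        line = raw.strip()
        if expecting_description:
            # The line right after "Description:" holds the message text.
            current["Description"] = line
            expecting_description = False
            continue
        if line == "Description:":
            if current is not None:
                expecting_description = True
            continue
        match = FIELD_RE.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Event Type":
            # In this layout, "Event Type" starts a new record.
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value
    return events


def looks_like_beta_expiry(events):
    """True if an MSExchangeIS 1182 and a Service Control Manager 7034 both appear."""
    seen = {(e.get("Event Source"), e.get("Event ID")) for e in events}
    return (("MSExchangeIS", "1182") in seen
            and ("Service Control Manager", "7034") in seen)
```

Pass the pasted text to parse_events and hand the result to looks_like_beta_expiry. It only checks that both events are present, not their order or timing.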

The information store was stopping because it thought it was a beta copy. The reason? Some information was missing from the HKLM\Cluster hive of the registry.

The valuable lesson? When an Exchange clustered node is evicted from the cluster and then re-joined to the cluster, make absolutely sure that you re-install Exchange 2003 on it, and then re-apply the service packs and hotfixes.

Shortly before we noticed this problem, we had been troubleshooting problems with the cluster and our (overly tightened) security templates. Each of the nodes was evicted and re-joined to the cluster (one at a time). Exchange 2003 was not re-installed afterward. Some important subkeys and values were removed from the HKLM\Cluster hive when each node was evicted. The loss was not enough to keep the Exchange server from running when it was put back in the cluster, but it was enough to generate that "Thank you for participating in the Microsoft Exchange Server beta program." message.
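
One way to see what an eviction removed is to compare two text exports of HKLM\Cluster, a known-good one (for example, taken before the eviction) and one from the suspect node. The sketch below assumes both are ordinary regedit .reg exports already read into strings. It only compares key paths, not values. The post does not list which subkeys were missing, so the helper names cluster_keys and missing_keys are invented for illustration.

```python
# Key-section prefix in a .reg export, lowercased because registry
# key names are case-insensitive.
PREFIX = "[hkey_local_machine\\cluster"


def cluster_keys(reg_text):
    """Return the set of HKLM\\Cluster key paths (lowercased) found in .reg text."""
    keys = set()
    for raw in reg_text.splitlines():
        line = raw.strip().lower()
        if not (line.startswith(PREFIX) and line.endswith("]")):
            continue
        # Accept the hive root or a subkey, but not e.g. HKLM\ClusterOther.
        if line[len(PREFIX)] in "\\]":
            keys.add(line[1:-1])
    return keys


def missing_keys(healthy_text, suspect_text):
    """Key paths present in the healthy export but absent from the suspect one."""
    return sorted(cluster_keys(healthy_text) - cluster_keys(suspect_text))
```

An empty result only means the key paths match. Values under those keys could still differ, so this is a first pass, not a substitute for the re-install.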

Many thanks to Dave M. from Microsoft, who stuck with me and went through a number of steps checking other things, all the while knowing that a re-install was probably imminent.

3 Comments:

At 6:18 AM, Blogger Doug Welch said...

Jim, we are experiencing the same problems on some of our E2K3 cluster servers. Can you email me those Cluster Service sub-keys?

Thanks

Doug Welch (IBM)

 
At 6:36 AM, Blogger KathyBK said...

Hi,
We are seeing this on our E2K3 cluster too. Could I get the keys as well?

Thanks,
Kathy Kirchberg

 
At 11:35 PM, Blogger M-IT said...

Hi,
I have the same problem on Exchange Server Standard. It stops every hour.

Thanks,
Deepak

 
