{"id":9238,"date":"2008-11-04T13:22:00","date_gmt":"1999-11-29T20:00:00","guid":{"rendered":"https:\/\/aidanfinn.com\/?p=9238"},"modified":"2008-11-04T13:22:00","modified_gmt":"1999-11-29T20:00:00","slug":"day-2-windows-server-2008-failover-cluster-troubleshooting-tips","status":"publish","type":"post","link":"https:\/\/aidanfinn.com\/?p=9238","title":{"rendered":"Day 2: Windows Server 2008 Failover Cluster Troubleshooting &#038; Tips"},"content":{"rendered":"<p>The speaker is David Dion from MS.<\/p>\n<p>Windows Server 2008 is the last x86 release.\u00a0 Nodes do not all need to be exactly identical in W2008 Clustering.<\/p>\n<p><strong><u>Cluster Validation<\/u><\/strong><\/p>\n<p>Lots of problems in deployments of previous editions of Windows clustering (MSCS) were caused by configuration issues.\u00a0 The Cluster Validation tool addresses this.\u00a0 It is built into W2008.\u00a0 It tests servers, OS and storage to check if the configuration is valid.\u00a0 It should be run before building the cluster and after adding a node, adding drivers or patches, updating firmware or BIOS (server or device), etc.\u00a0 You can also run the validate tool as a troubleshooting tool &#8211; it is the primary course of action.<\/p>\n<p>Very easy to use; it&#8217;s just a wizard.\u00a0 Best to run <em>all<\/em> of the tests.\u00a0 However, doing all of the storage tests can take hours with hundreds of disks, e.g. a 16 node Hyper-V cluster.\u00a0 A report is generated as an MHT file that opens in IE.\u00a0 You get pass, pass with warning or fail.\u00a0 This is stored in the Windows\\Cluster\\Reports folder.\u00a0 <\/p>\n<p>Do not assume the hardware configuration will be fine; run the validation utility to test it.<\/p>\n<p>Concerns:<\/p>\n<ul>\n<li>Validation of storage requires that the storage be offline. 
Beware of this with Hyper-V.\u00a0 Schedule a full cluster maintenance window.<\/li>\n<li>Running validate with a single node is pointless.<\/li>\n<\/ul>\n<p>W2003 clustering required that the H\/W be on a clustering HCL.\u00a0 Niche H\/W, therefore expensive.\u00a0 Everyone hated it.\u00a0 Not used in W2008.\u00a0 The validation tool is your cluster certification.\u00a0 Purchase gear with the W2008 logo.\u00a0 Run the tool and if you get a pass then you&#8217;re certified.\u00a0 Keep a copy of the report for PSS.<\/p>\n<p>MS recommends you purchase &quot;Failover Cluster Configuration Program&quot; solutions from vendors, i.e. the pricey niche solutions, e.g. a cluster kit.\u00a0 Interestingly, HP is <em>not <\/em>one of the 9 partners in the program.\u00a0 Dell and IBM are.<\/p>\n<p><strong><u>Event Viewer<\/u><\/strong><\/p>\n<p>Check the Microsoft\\Windows\\FailoverClustering log.\u00a0 Event logs are no longer replicated across all nodes in the cluster.\u00a0 You should use the MMC to view events from all nodes.\u00a0 You can also build event queries there.\u00a0 You can filter events for applications and resources.\u00a0 Because of this pooling of events, beware of using the MMC remotely from the cluster and killing the WAN.\u00a0 Normally we only see critical and warning events.\u00a0 By enabling the operational &quot;log&quot; you can see information events.<\/p>\n<p>Start with events if looking at non-configuration issues on the cluster.<\/p>\n<p><strong><u>Cluster Debug Logging<\/u><\/strong><\/p>\n<p>Lots of information and not user friendly.\u00a0 The legacy cluster log file no longer exists.\u00a0 Logging goes to an event trace session: &quot;Microsoft-Windows-FailoverClustering&quot;.\u00a0 The log is enabled by default.\u00a0 You can produce a human readable log using the &quot;Cluster.exe log&quot; command.<\/p>\n<p>Tracerpt.exe can be used to dump the trace session.\u00a0 Dump to .EVTX and view the file in Event Viewer.\u00a0 Dump to .XML for you scripting freaks or to open in IE.\u00a0 Cluster.exe can 
raise or reduce the level of logging; 3 is the default.\u00a0 1 is low, 5 is high.\u00a0 Running this command on one node configures all the nodes.\u00a0 Changing the size of the file causes historical logs to be lost.\u00a0 Copy them safely before doing this.\u00a0 It&#8217;s quite verbose at level 5.\u00a0 Running at level 3 (default) is recommended.\u00a0 <\/p>\n<p>This is the last logging solution you should pick.\u00a0 Retaining 72 hours of data as a minimum is recommended.\u00a0 What size of log is 72 hours?\u00a0 How long is a piece of string?\u00a0 File shares are quiet.\u00a0 Exchange is noisy.\u00a0 Hyper-V probably could be as well if VMs are moving about.\u00a0 Change the log size first, then set the required verbosity.\u00a0 Cluster logs <em>are always<\/em> in the GMT time zone.\u00a0 You&#8217;ll have to mentally map this when comparing with Windows Event Viewer if you are in a different time zone from GMT.<\/p>\n<p><strong><u>Windows Server 2008 R2<\/u><\/strong><\/p>\n<ul>\n<li>Validation Tool includes best practices tests.\u00a0 Quorum configuration, status of cluster resources, network name settings in multi-site cluster.<\/li>\n<li>Performance Counters are added into perfmon for clustering.<\/li>\n<li>There will be PowerShell support.<\/li>\n<li>There is a read only mode for the console.<\/li>\n<\/ul>\n<p><strong><u>Best Practices For Now<\/u><\/strong><\/p>\n<ul>\n<li>Try to use identical hardware on all nodes. 
Especially storage: HBA, firmware, driver, cables, etc.<\/li>\n<li>Run the validation tool.<\/li>\n<li>Don&#8217;t add resources to the Cluster Group or the Available Storage Group.<\/li>\n<li>Keep regular system state backups.\u00a0 This includes the cluster database automatically.<\/li>\n<li>Use &quot;preferred owners&quot; and &quot;possible owners&quot; to balance the cluster.<\/li>\n<li>Multi-site clusters are more complex so check out the <a href=\"http:\/\/go.microsoft.com\/fwlink\/?LinkID=129120\" target=\"_blank\">MS site<\/a> for a whitepaper.<\/li>\n<\/ul>\n<p>Quorum:<\/p>\n<ul>\n<li>Node and Disk Majority where there is shared storage.\u00a0 Small disk &#8211; 512MB at least.\u00a0 Only use it for the quorum. Use it as a GUI drive to discourage alternate usage.\u00a0 No need to back up the quorum disk.<\/li>\n<li>Node and File Share Majority: use one file server for many clusters but dedicate 1 share to each cluster.\u00a0 OK to use a clustered file server but keep it in a different cluster (chicken and egg).\u00a0 File server should be in the same forest as the cluster.\u00a0 Avoid DFS namespaces.<\/li>\n<li>More <a href=\"http:\/\/go.microsoft.com\/fwlink\/?LinkID=129345\" target=\"_blank\">information available<\/a>.<\/li>\n<\/ul>\n<p>Old 2003 best practices that are gone:<\/p>\n<ul>\n<li>You can add nodes as you want &#8211; nodes do not need to be powered off.<\/li>\n<li>No NIC teaming restrictions any more.<\/li>\n<li>No need to stagger boot times, e.g. W2003 required 30-60 second gaps.<\/li>\n<li>Clustering runs as local system now.\u00a0 No password to change for the service.<\/li>\n<li>Keep an eye on the hotfixes page for clustering.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The speaker is David Dion from MS. Windows Server 2008 is the last x86 release.\u00a0 All nodes do not need to be exactly identical in W2008 Clustering. 
Cluster Validation Lots of problems in deployments of previous editions of Windows clustering (MSCS) were caused by configuration issues.\u00a0 Cluster Validation tool resolves this.\u00a0 Built into W2008.\u00a0 Tests &hellip; <a href=\"https:\/\/aidanfinn.com\/?p=9238\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Day 2: Windows Server 2008 Failover Cluster Troubleshooting &#038; Tips&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"footnotes":""},"categories":[42],"tags":[],"class_list":["post-9238","post","type-post","status-publish","format-standard","hentry","category-teched-emea-it-pro"],"aioseo_notices":[],"jetpack_featured_media_url":"","amp_enabled":true,"_links":{"self":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/9238","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=9238"}],"version-history":[{"count":0,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=\/wp\/v2\/posts\/9238\/revisions"}],"wp:attachment":[{"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=9238"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=9238"},{"taxonomy":"post_
tag","embeddable":true,"href":"https:\/\/aidanfinn.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=9238"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}