The importance of cleaning up WSUS – even if you are only using it for SCCM

In the distant past, we would generally say to clients “Leave WSUS admin alone – the SCCM settings will take precedence, so there is no point in using it”

As the years have passed and the number of available updates has grown considerably, this is no longer the case. The SCCM settings still take precedence, however the sheer number of updates has become so large that it can cause performance issues for the SCCM server – and can even cause the IIS timeout to expire when SCCM is syncing updates. This generally results in an endless loop and 100% CPU for w3wp.exe.

Unfortunately, trying to list the updates in the WSUS console will often lead to the console crashing and the dreaded prompt to “reset server node”.

The best way to address this isn’t really one of the many articles you will find by googling “sccm high CPU w3wp.exe” (or similar). Generally these will suggest modifying a number of entries in your IIS config to increase timeouts, etc. These can assist, but they don’t address the root cause of the issue, which is simply the huge number of updates.

The best way to resolve this is to simply reduce the number of updates shown in WSUS. This will reduce your memory usage, reduce the number of updates that SCCM has to scan each time, and generally put less load on your server.

There are two ways you can go about this:

 

The manual method

If you have the resources available, I’ve found that temporarily increasing the RAM and CPU count on the SCCM server can help alleviate the “reset server node” issue.

Once you get in (it may take a few attempts), go to ‘updates’ > ‘all updates’, set the search criteria to “Any except declined” and “any” and hit refresh. Once loaded, add the “supersedence” column to the view and sort by that.

Decline all updates that are superseded. If you don’t clean up regularly, this number could be very high.

After this, you can create views to decline updates for products you no longer use (e.g. Windows 7 and Office 2010) or search for things including “beta”, “preview” and “itanium” and decline those updates as well.
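The decline criteria above can be sketched as a small filter. This is only an illustration in Python, assuming you have exported update metadata into simple records: the `Update` record, its fields and the default product list are hypothetical and are not part of any WSUS API.

```python
from dataclasses import dataclass


# Hypothetical record: WSUS does not expose this type. It stands in for
# whatever update metadata you have exported (title plus a superseded flag).
@dataclass
class Update:
    title: str
    is_superseded: bool


# Keywords from the text above, matched case-insensitively against the title.
DECLINE_KEYWORDS = ("beta", "preview", "itanium")


def should_decline(update, retired_products=("Windows 7", "Office 2010")):
    """Return True if the update is superseded, matches a keyword,
    or belongs to a product that is no longer in use."""
    title = update.title.lower()
    if update.is_superseded:
        return True
    if any(keyword in title for keyword in DECLINE_KEYWORDS):
        return True
    return any(product.lower() in title for product in retired_products)


def decline_candidates(updates):
    """Filter a list of updates down to the ones worth declining."""
    return [u for u in updates if should_decline(u)]
```

Plain substring matching is crude, so treat the result as a list to review before declining anything rather than as a final answer.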

After all that is done, run the server cleanup wizard. You will likely need to do this a number of times: if your server is already struggling, the wizard will also struggle to complete (it seems to cope quite poorly with large numbers of updates on low-end servers).

 

The scripted method

A guy called “AdamJ” has written a very useful script which you can get at https://community.spiceworks.com/scripts/show/2998-wsus-automated-maintenance-formerly-adamj-clean-wsus . I know, I can see some of you recoiling at the suggestion of using a user-submitted Spiceworks script… they do have a whole bunch of people (just like the MS forums) suggesting “sfc /scannow” for anything and everything – which is a sign of a non-enterprise tech that has NFI… however, this script is really very good. It’s something I’ve been using for approximately 2 years, with nothing but good things to say about it.

You can run it with the “-firstrun” parameter and it will, by default, clean out superseded updates, which are the main cause of huge update numbers. It will also grab the ever-annoying Itanium, preview and expired updates. At approximately line 629 of the script, you can also configure it to remove IE 7, 8, 9 and 10 updates, beta updates, etc. (or, if you are one of the few people in the world with Itanium, keep the Itanium updates!).

This script, unlike the console, will keep plugging away… and if it should happen to get stopped for whatever reason, will resume where it left off.

When removing obsolete updates, I have seen some clients (with lower spec servers) where this process can take a long time, so long that you may have to leave it overnight (or over the weekend), and sometimes restart the process.

This process will get you a fair chunk of the way, and allow you to then open the WSUS console and decline further updates, such as products you no longer use (Windows 7 and Office 2010 are reasonably common ones these days), x86 updates if you have an all-x64 environment, and in the case of Win 10, updates that no longer apply to you (e.g. if your entire fleet is on Win 10 1607 and 1703, you don’t need the 1511 updates).
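The Windows 10 version check can be sketched in the same spirit. This minimal Python example assumes that update titles carry the feature release as “Version NNNN” (for example “... for Windows 10 Version 1511 for x64-based Systems”). The function name and the input list are hypothetical, and titles that do not match are deliberately left alone.

```python
import re

# Assumption: titles contain the feature release as "Version NNNN".
VERSION_PATTERN = re.compile(r"Windows 10 Version (\d{4})")


def unused_win10_updates(titles, versions_in_use):
    """Return titles that target a Windows 10 release not present in the fleet."""
    in_use = set(versions_in_use)
    unused = []
    for title in titles:
        match = VERSION_PATTERN.search(title)
        # Titles without a recognisable version are never flagged.
        if match and match.group(1) not in in_use:
            unused.append(title)
    return unused
```

Pass in the list of versions your fleet actually runs. Anything the function returns is a candidate for declining, and everything else is left for you to judge by hand.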

After all this is complete, you do need to run the server cleanup wizard again – which does frequently crash and end up with the “reset server node” error. So you can re-run the WSUS cleanup script, or simply run the server cleanup wizard multiple times.

 

My experiences using these methods

I’ve found that environments that were previously at 100% CPU start working again, and environments that were massively over-specc’d and didn’t have the high CPU issue went from using 6 GB of RAM for the w3wp.exe process down to 500 MB. This will obviously vary from environment to environment.

After this process is completed, you should be able to get into the WSUS console and run the server cleanup wizard without crashes.

If you’re interested, you can also sync SCCM software updates and look at the wsyncmgr.log and see the far smaller list of updates it will sync against now.

Longer term, the AdamJ script does have monthly options that you can schedule in, or, for clients that are uncomfortable with that, simply get in and clean up once every 3 months or so, so your list of updates doesn’t get out of hand.

The first cleanup is the biggest. After that, performing the same operations once every 3 months is plenty, and if you forget about it and it ends up being once every 6 months instead, you’ll still be fine.

 

Taking it a step further – shrinking the SUSDB

One of the things which the AdamJ cleanup script does is truncate the SQL table “tbEventInstance” which uses up the majority of space in most WSUS databases that have been in use for a while.

If you are not comfortable with a script doing this, you can connect to the database and execute the following query against the “SUSDB” database – “truncate table tbEventInstance”.

If the DB is on a full version of SQL (and if you’re running SCCM, I would argue the SUSDB should be on the same SQL instance rather than on an additional Windows Internal Database), you can then create a maintenance plan to reindex, shrink, etc. the database.

If you are using Windows Internal Database, you can still install SQL Server Management Studio, then connect to “\\.\pipe\MICROSOFT##WID\tsql\query”. From there you can execute the truncate, shrink the database, etc. Keep in mind that you cannot use maintenance plans with Windows Internal Database.
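If you would rather script the truncate than run it from Management Studio, one option is the sqlcmd utility. The Python sketch below only builds the argument list. It assumes sqlcmd is installed on the WSUS server, and the helper name `build_truncate_command` is made up for illustration.

```python
# Named pipe for the Windows Internal Database, as given in the text above.
WID_PIPE = r"\\.\pipe\MICROSOFT##WID\tsql\query"


def build_truncate_command(server=WID_PIPE, database="SUSDB"):
    """Build a sqlcmd argument list that truncates tbEventInstance.

    -S is the server, -E uses Windows authentication, -d selects the
    database and -Q runs a single query and then exits.
    """
    return [
        "sqlcmd",
        "-S", server,
        "-E",
        "-d", database,
        "-Q", "TRUNCATE TABLE tbEventInstance",
    ]
```

To run it, pass the list to `subprocess.run(..., check=True)` from an elevated session on the server. For a full SQL instance, supply the instance name as `server` instead of the pipe. Depending on the sqlcmd version, the pipe may need an `np:` prefix, so verify that in your own environment first.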

 

What about large environments where you do require a wide range of updates?

In large environments, you may not be able to decline entire product sets for extended periods (e.g. it’s relatively easy to move everyone onto Windows 10 and get rid of all Win 7 for 2,000 PCs, but not so easy for 50,000 PCs). However, many of the points in this article still hold true.

  • The largest reduction in updates will still come from superseded updates
  • Language packs are another area where there’s lots of opportunity for reduction (e.g. if you require English and French, there are many other languages that can be declined)
  • Ensure your SUSDB is on your full SQL instance. That way you are running one less database instance (and therefore utilising fewer resources) and also have maintenance plans at your disposal
  • Use a maintenance plan to keep your SUSDB database optimal

 

SCCM Update Cleanup

It’s also worth noting that once the SUSDB has been cleaned up, SCCM will execute its own cleanup after the next sync. This cleanup removes obsolete Update CIs (Configuration Items) that corresponded to the items removed from the SUSDB. In most environments this isn’t noticeable, however on severely under-resourced SCCM servers it can cause its own set of problems (though there’s not a huge amount you can do about it other than wait). This will generally present as the SCCM console locking up while it’s doing back-end SQL processes – and if you look at the SQL threads, you’ll see a WSUS-related one blocking all other threads. Realistically, your best option to resolve this is to increase the resources available to the server – and if that isn’t a possibility, settle in for a long wait!

Exchange hybrid – mailboxes missing on-premise

While hybrid Exchange environments are awesome for stretching your on-premise Exchange topology to Office 365, they do introduce a bunch of complexity – primarily around user creation, licensing, and mail flow.

I recently had an issue at a client where they had email bounce-backs from an on-premise service destined for a few Exchange Online mailboxes. For some reason, these few mailboxes didn’t appear in the on-premise Exchange environment (as remote Office 365 mailboxes), so Exchange was unable to route the emails destined for those particular mailboxes.

In general, you should be creating your mailboxes on premise (Enable-RemoteMailbox), then synchronising via AADConnect – that way the on premise environment knows about the mailbox and it can be managed properly. This client was actually doing this, but obviously the process broke somewhere along the way for a few mailboxes.

There’s a bunch of different options on Google about how to get the mailbox to show up on premise – with a lot of them recommending to remove the mailbox and start again (er… how about no!).

I came across this Microsoft article on a very similar issue, but for Shared Mailboxes created purely in Exchange Online. Looking at the process, it looked like a modified version may work for user mailboxes – and it does. Below is a quick and dirty PowerShell script that can be used to fix a single mailbox:

 

Windows 10 1709 and installing Hyper-V

It’s not often that I actually install Hyper-V on a client OS, so it was only by chance that I came across a bit of a weird issue when installing it on Windows 10 1709. I performed the usual process: confirmed virtualization was enabled in the BIOS, enabled Hyper-V in Windows Features, rebooted, and it all appeared to install/enable successfully.

Launched the Hyper-V console, and the local PC wasn’t automatically selected. Odd. Added ‘Localhost’ to the view, and received an error that indicated the services may not be running. Sure enough, Hyper-V Virtual Machine Management was running, but Hyper-V Host Compute Service (vmcompute.exe) wasn’t. When trying to launch it, I received “The service did not respond to the start or control request in a timely fashion”. Event viewer detailed the exact same error – nothing more. Awesome!

Tried it on another machine in the same environment and experienced the exact same issue. Apparently, another Adexian (Hayes) also installed Hyper-V on one of his 1709 PCs recently – and his worked fine – so what the trigger is, I’ve yet to determine. On a related note, Hayes’s machine won’t shut down since the Hyper-V install – it reboots instead (and he’s yet to find a fix for this).

Obviously it’s time for Google – and it seems to be quite a common issue with 1709. Apparently Microsoft added some additional security policies that prevent Hyper-V running in certain scenarios (usually when there are some non-Microsoft DLLs loaded in vmcompute.exe). There’s even a Microsoft support article detailing a similar issue where the vmcompute.exe process is crashing (rather than in my case, where it wasn’t even launching in the first place).

In the end, the recommended solutions I could find were pretty varied:

  • Roll back to 1703 (no thanks – plus it wasn’t an upgrade)
  • Uninstall Sophos (wasn’t installed)
  • Uninstall any other Antivirus (McAfee installed in this instance, though anecdotal evidence suggests uninstalling it doesn’t work – didn’t try)
  • Configure ‘Control Flow Guard’ in the Exploit settings of Defender to be ‘On’ (which it was)

Going with the easiest option first (configure Control Flow Guard), I figured I’d set that to ‘On’. You can find this setting under:

Windows Defender Security Center > App and Browser Control > Exploit Protection Settings > Control flow guard

For me, it was already set to ‘Use Default (On)’. Damn. Ok, so what happens if we turn it off (and reboot)? Unsurprisingly, it didn’t fix the issue. What it did do, though, was cause vmcompute.exe to start launching and generating a crash error (as detailed in the Microsoft support article).

Given the setting is meant to be ‘On’, I decided to turn it back on and see what happens. And it works. Why? No idea!

Either way, the solution for me (on two computers) was to disable CFG, reboot, re-enable CFG and reboot again.