Cloud should stay up forever, right? Well, no.

Last month there was an outage in the Azure – South Central US region, which, by reports, seemed to have some knock-on effects for other regions.

This was reported at:

In the discussions that followed with our customers, particularly those currently considering digital transformation strategies that include moves to Office 365 and/or Azure, some expressed varying levels of concern. This prompted some very valuable debate around Adexis about what we feel are some important viewpoints when it comes to digital transformation. Here are some of our thoughts:

Outages happen

Even with the enterprise-grade resources of Microsoft (or Amazon), 100% uptime of any service over a long period of time is not realistic. Between hardware issues, software bugs, scheduled downtime and human error, something, at some point, will go wrong – just like in your on-premise environment. With all the buzz around cloud, it can be easy to forget that this is essentially just an IT environment somewhere else, maintained by someone else. Like any IT environment, it is still reliant on humans and physical hardware, which will inevitably experience failures of service from time to time.

Control and visibility

When an outage happens on-premise, the local IT team are able to remediate and have as much information as it’s possible to have – and can provide their users with detailed information regarding the restoration of service. Everything is in the hands of the local IT team (or the company to which it has been outsourced).
When an outage happens with Azure, the local IT team has minimal information in comparison. Microsoft’s communication during O365/Azure outages varies; however, ETAs and other information are generally vague at best. All control is with Microsoft, and all the local IT team can say to staff is “Microsoft are working on it”. While Microsoft may be able to resolve the situation faster than you could on site (or not), the lack of visibility and control can be daunting. It’s not all doom and gloom though. In situations where the issue would need to be escalated to Microsoft anyway (i.e. through Premier Support), the criticality of an international user-base can often mean a greater focus from Microsoft and therefore a faster resolution than would be achieved for your single company.

Site resilience

Azure has many features which enable site resilience to protect against a single data centre failure – but sometimes these are not used. This could be down to flawed design of services or simple cost saving. When architecting your environment (or engaging the experts at Adexis to provide these specialist services), it’s important that you carefully consider your DR and BCP plans and ensure the redundancy built into your environment matches those requirements. This is not unique to either cloud or on-premise and must always be carefully considered.

Root cause

It’s not uncommon for on-premise service outages to be “fixed” by a reboot. Root cause analysis and effective problem management are nice to have, but not many IT teams have the time to complete them.
Microsoft have the resources to perform these functions in great depth, and in fact their brand depends on it. A complete root cause analysis feeds back into improvement of their overall operations, which leads to greater consumer confidence and therefore greater penetration into the market. They also literally have access to the source code for the operating systems and many apps, in addition to strong relationships with hardware vendors, so they can get patches/fixes in timeframes the rest of us can only dream of.
While Microsoft has been known to hold their cards close to their chest at times in terms of releasing the real root cause of outages, they are definitely invested in resolving those root causes behind the scenes and preventing further outages. This means that the environment remains far more up to date and, typically, far more robust than an on-premise environment.

SLA

While Microsoft might suffer reputational damage as the result of an outage, do not expect any form of meaningful compensation.
The financially backed SLA that salespeople spruik is a joke – http://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=37
This is the table for many services (though it does vary depending on the specific service):

Monthly Uptime %    Service Credit
<99.9%              25%
<99.0%              50%
<95.0%              100%

A 31-day month has 44,640 minutes, and 5% of that is 2,232 minutes. So the service would have to be down for a whopping 37.2 hours or more to get back 100% of your fees, for that month only, and the compensation is in the form of a service credit off next month’s bill.
How to claim this service credit is detailed on page 5 of the document and basically, the onus is on you to prove that there was an outage and submit the paperwork within 2 months. A separate claim must be created for each service. What this essentially means is it’s usually more effort than it’s worth to log the claim for the service credits.
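
To make the arithmetic above easy to repeat for other tiers or month lengths, here is a minimal Python sketch. The tier values are simply copied from the table above, and the function name is our own invention for illustration; check the SLA document for the figures that apply to your specific service.

```python
# Downtime in a month at which uptime drops to each SLA tier boundary.
MINUTES_PER_DAY = 24 * 60

# (uptime threshold %, service credit %) copied from the table in this post.
TIERS = [(99.9, 25), (99.0, 50), (95.0, 100)]


def downtime_minutes(uptime_percent: float, days_in_month: int = 31) -> float:
    """Return the downtime (in minutes) at which uptime equals uptime_percent."""
    total_minutes = days_in_month * MINUTES_PER_DAY
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for uptime, credit in TIERS:
        minutes = downtime_minutes(uptime)
        print(
            f"Uptime below {uptime}%: over {minutes:.1f} minutes "
            f"({minutes / 60:.1f} hours) of downtime for a {credit}% credit"
        )
```

For a 31-day month the last tier works out to the same 2,232 minutes (37.2 hours) quoted above.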

In Summary

Outages for cloud services must be anticipated, just like outages to on-premise services. The attitude of “It’s in the cloud so it’s not our problem” is simply not realistic and likely to catch you out, unprepared.
If you have vital services that you are considering moving to Azure (or AWS, or anywhere else), rest assured it can be safe to do so, but make sure you allow for site resiliency in your design and costing.

Adexis is neither pro- nor anti-cloud. Unlike many other vendors, we have no skin in the game and no incentive to push you in one direction or the other. We are completely independent and can provide you with unbiased specialist advice on what is best for your environment and your business, including the pros and cons of staying on-premise or moving to the cloud for each service.

Every environment is different when it comes to security requirements, IT skillset, hardware availability, CapEx vs OpEx spend and a range of other factors – and these all feed into what is the best solution for your business.

If you’d like to explore your IT strategy further, please be sure to give us a call.

Avoiding a Microsoft Teams Nightmare

Have you ever provided users with a document management system or SharePoint site, only to find that everyone uses it differently, creates folders all over the place in different ways, stores documents differently, and after six months’ time it’s so hard to find anything that it defeats the purpose for which it was implemented in the first place? What a nightmare! You’re not alone.

With Microsoft Teams quickly becoming a preferred collaboration tool, you’d be forgiven for fearing that this nightmare will become a reality all over again. The primary reason is that there’s no technical ‘silver bullet’ to prevent this from happening; it’s more of a governance discussion. Notwithstanding, there are some things you can do on a technical level that can help.

There are basically four levels of administration to be considered:

  • Global Settings – There are a number of features and functionality for Teams that can be turned on or off at a global level and these should be risk assessed for each environment. Ideally this should be done before the first Team site is even created.
  • Team creation – Microsoft Teams, while based on Office 365 Groups, will also provision a SharePoint site for each Team. Therefore the decision as to who should be creating Teams is the same as for who should be creating Groups and Sites. One approach that we’ve found works well is to have these functions centrally managed, with Teams created on request. There is, of course, an admin overhead to be considered, which is where Team Owners come in.
  • Team Owners – These are the users that really run the individual Teams and will have the best insight as to the value of the Team and how it should be used. Trying to run this centrally is likely to lead to frustration all round so once created, administration should really be handed over to the Team owners. They can then add Team members, assign roles, create Channels and enable Apps etc as they see fit.
  • Team Users – Obvious statement but these are the ones who should be seeing value in Teams collaboration. Paradoxically one way to dilute that is by being in too many Teams. Users shouldn’t be confused about what spaces they should be collaborating in or where to store documents etc. To prevent this, ideally Teams should have clearly defined functions, whether that be organisational, operational or project based collaboration. Confusion arises where these functions overlap between Teams so clear delineation is important. This is another reason centrally managing Team creation can work well. In larger environments implementing practices like naming standards for Teams will also be of value.

Some of the central administration technical considerations are outlined here: https://docs.microsoft.com/en-us/microsoftteams/enable-features-office-365

Melissa Hubbard also provides some useful considerations in her blog post on the topic and while it’s a little while ago now, it’s still a great starter for some of the governance considerations:  https://melihubb.com/2017/07/25/microsoft-teams-governance-planning-guide

If Microsoft Teams is on your agenda for implementation, be sure to reach out to the Adexis team who can assist with design and implementation and help you to provide this wonderful platform to your users to enable communication and efficient collaboration, without the admin headaches.

Importing updates into WSUS on Server 2016 fails

I ran into a situation recently where I needed to import a specific update from the Windows Update Catalog into WSUS (and in turn into SCCM).

I opened WSUS, clicked on “Import updates”, selected my update and was presented with

“This update cannot be imported into Windows Server Update Services, because it is not compatible with your version of WSUS”

Strange… WSUS on 2016 is extremely similar to WSUS on 2012 R2, so what’s going on here?

Long story short… there seems to be an issue with the URL that the WSUS console passes to the browser when you click “Import updates”.

When you first click on “Import updates”, IE will open (or you will choose to use IE, because it makes importing updates into WSUS easier) at

http://catalog.update.microsoft.com/v7/site/Home.aspx?SKU=WSUS&Version=10.0.14393.2248&ServerName=<servername>&PortNumber=8530&Ssl=False&Protocol=1.20

Simply change the last part from “1.20” to “1.80” and importing updates will now work

i.e.

http://catalog.update.microsoft.com/v7/site/Home.aspx?SKU=WSUS&Version=10.0.14393.2248&ServerName=<servername>&PortNumber=8530&Ssl=False&Protocol=1.80
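
If you find yourself doing this often, the same edit can be scripted. Below is a minimal Python sketch that rewrites only the Protocol query parameter and leaves everything else alone. The function name and the “wsus01” server name are placeholders made up for this example, and the “1.80” value is simply the one that worked for me above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fix_wsus_import_url(url: str, new_protocol: str = "1.80") -> str:
    """Return url with its Protocol query parameter replaced by new_protocol."""
    parts = urlsplit(url)
    query = [
        (key, new_protocol if key == "Protocol" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # "wsus01" is a placeholder server name used only for this example.
    original = (
        "http://catalog.update.microsoft.com/v7/site/Home.aspx"
        "?SKU=WSUS&Version=10.0.14393.2248&ServerName=wsus01"
        "&PortNumber=8530&Ssl=False&Protocol=1.20"
    )
    print(fix_wsus_import_url(original))
```

Paste the URL that IE opens into the script and open the printed URL in its place.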

Microsoft products – consolidated table of end of life dates

Microsoft product end of support dates are sometimes not easy to find, and it’s not getting any better with the “current branch” releases and cloud solutions being governed by the Modern lifecycle policy.

The Modern lifecycle policy page further links to 3 product categories: O365, Cloud platform and Dynamics. Unfortunately, it’s not clear (at least to me) how this helps with products such as SCCM current branch (be it 1606, 1702, 1706, 1710 or 1802) – however this information is available at another location

Likewise with the “traditional” products, most end of life information is available here – but to say that the information is difficult to search through is an understatement.

It also sometimes lacks detail; for example, there is no mention of the differing support for Windows 8.1 without update 1 and with update 1.

We have a number of clients that take the approach that while a server is running, they leave it there – and while I may personally not like this approach (I prefer to roll through the OS upgrades as they come out), it is a valid approach and end of life information is important for them.

Keep in mind that everything listed below is end of extended support, not mainstream support – and I have taken some liberties (e.g. assumed that Windows 8.1 is 8.1 with update 1).

Windows 10 dates have been sourced from the product lifecycle page; however, this blog entry states that an additional 6 months has been granted to the Windows 10 versions displayed.

If you find the below useful – cool. If I’ve got something wrong, or missed something that is key (in your opinion), please leave a comment.

 

Product                     End of life date (end of extended support)
Windows 2003 SP2            July 14, 2015
Windows 2008                July 12, 2011
Windows 2008 R2 SP1         Jan 14, 2020
Windows 2012                Oct 10, 2023
Windows 2012 R2             Oct 10, 2023
Windows 2016                Jan 11, 2027
Windows XP SP3              Jan 11, 2011
Windows Vista SP2           April 4, 2017
Windows 7 SP1               Jan 14, 2020
Windows 8                   Jan 12, 2016
Windows 8.1 with update 1   Jan 10, 2023
Windows 10 RTM (1507)       May 9, 2017
Windows 10 1511             Oct 10, 2017 (end of support)
                            April 10, 2018 (additional servicing for enterprise and education)
Windows 10 1607             April 10, 2018 (end of support)
                            Oct 9, 2018 (additional servicing for enterprise and education)
Windows 10 1703             Oct 9, 2018 (end of support)
                            April 9, 2019 (additional servicing for enterprise and education)
Windows 10 1709             April 9, 2019 (end of support)
                            Oct 8, 2019 (additional servicing for enterprise and education)
Office 2007                 Oct 10, 2017
Office 2010 SP2             Oct 13, 2020
Office 2013                 April 11, 2023
Office 2016                 Oct 14, 2025
Lync 2010                   April 13, 2021
Lync 2013                   April 11, 2023
Skype for Business 2015     April 11, 2023
Exchange 2010 SP3           Jan 14, 2020
Exchange 2013 SP1           April 11, 2023
Exchange 2016               Oct 14, 2025
Forefront TMG               July 12, 2011
SharePoint 2010             July 10, 2012
SharePoint 2013 SP1         April 11, 2023
SharePoint 2016             July 14, 2026
SCCM 2012 SP2               July 12, 2022
SCCM 2012 R2 SP1            July 12, 2022
SCCM 1606                   July 22, 2017
SCCM 1610                   Nov 18, 2017
SCCM 1702                   March 27, 2018
SCCM 1706                   July 31, 2018
SCCM 1710                   May 20, 2019
SCCM 1802                   Sept 22, 2019
SCVMM 2012 SP1              July 12, 2022
SCVMM 2016                  Jan 11, 2027
SCOM 2012 SP1               July 12, 2022
SCOM 2012 R2                July 12, 2022
SCOM 2016                   Jan 11, 2027
SCORCH 2012 SP1             July 12, 2022
SCORCH 2012 R2              July 12, 2022
SCORCH 2016                 Jan 11, 2027
SCSM 2016                   Jan 11, 2027
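
For the clients mentioned earlier who leave servers in place while they are running, a list like this is most useful when it flags what is about to expire. Here is a rough Python sketch of that idea. It holds only a handful of rows copied from the table above, and the function names are made up for illustration; extend the dictionary with whatever products you run and verify the dates against Microsoft’s lifecycle pages.

```python
from datetime import date
from typing import List

# A few rows copied from the table in this post (end of extended support).
END_OF_EXTENDED_SUPPORT = {
    "Windows 2008 R2 SP1": date(2020, 1, 14),
    "Windows 2012 R2": date(2023, 10, 10),
    "Windows 2016": date(2027, 1, 11),
    "Exchange 2016": date(2025, 10, 14),
}


def days_until_end_of_support(product: str, today: date) -> int:
    """Return days left until end of extended support (negative once passed)."""
    return (END_OF_EXTENDED_SUPPORT[product] - today).days


def expiring_within(days: int, today: date) -> List[str]:
    """Return products still in support whose end date is within `days` days."""
    return sorted(
        product
        for product, end in END_OF_EXTENDED_SUPPORT.items()
        if 0 <= (end - today).days <= days
    )


if __name__ == "__main__":
    today = date.today()
    for product in END_OF_EXTENDED_SUPPORT:
        print(product, days_until_end_of_support(product, today))
```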

KB4038777 fails on some Windows 2008 R2 servers

Recently, we had an issue where KB4038777 was failing to install on some Windows 2008 R2 servers, but was fine on others.

Sometimes this indicates that the “maximum run time” on a specific patch has been set ludicrously low (generally 10 minutes), and the servers it is failing on are those that don’t perform so well and therefore time out.

In this case, the patch was failing with the following line in the CBS.log

Failed to find file: x86_microsoft-windows-directwrite_31bf3856ad364e35_7.1.7601.23688_none_c657164201eacd8d\DWrite.dll [HRESULT = 0x80070002 – ERROR_FILE_NOT_FOUND]

We tried a number of things to “fix” this, including comparing file versions of DWrite.dll, cleaning out the SoftwareDistribution cache, disabling AV etc. – to no avail.

After a few hours, we found that installing the “Desktop Experience” feature (which requires a reboot) and then running a disk cleanup (including Windows updates) on the server allowed us to install this update.

It’s not an ideal “solution” – and quite frankly, all Windows 2008 R2 servers should be in the process of being decommissioned – but aside from that, it seems that admins have the option of

a) installing Desktop Experience, rebooting, then running a disk cleanup

b) waiting for next month’s rollup, which may not have the same issue.
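
If you need to check which servers are hitting this same failure, scanning CBS.log for the “Failed to find file” line quoted earlier is a quick test. The following is a minimal Python sketch of that scan. The regular expression is based only on the single log line shown in this post, so treat the pattern as an assumption that may need adjusting, and the function name is ours.

```python
import re

# Pattern modelled on the one CBS.log line quoted in this post (an assumption).
MISSING_FILE = re.compile(
    r"Failed to find file: (?P<path>.+?)\s+\[HRESULT = (?P<hresult>0x[0-9A-Fa-f]{8})"
)


def find_missing_files(lines):
    """Yield (path, hresult) for each 'Failed to find file' line."""
    for line in lines:
        match = MISSING_FILE.search(line)
        if match:
            yield match.group("path"), match.group("hresult")


if __name__ == "__main__":
    # Sample line from this post; raw string keeps the backslash literal.
    sample = (
        r"Failed to find file: x86_microsoft-windows-directwrite_"
        r"31bf3856ad364e35_7.1.7601.23688_none_c657164201eacd8d\DWrite.dll "
        r"[HRESULT = 0x80070002 - ERROR_FILE_NOT_FOUND]"
    )
    for path, hresult in find_missing_files([sample]):
        print(hresult, path)
```

To run it against a real server, pass the lines of a copy of that server’s CBS.log instead of the sample.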

 

SMB 1 no longer installed by default in Win 10 1710/Server 2016 (next release)

https://support.microsoft.com/en-us/help/4034314/smbv1-is-not-installed-by-default-in-windows-10-rs3-and-windows-server

As per the link above, SMB 1 will no longer be installed by default in Win 10 1710 (which, given the release date, I’m guessing is what it will be called) or the next version of Server 2016 (whatever that ends up being called).

Considering the recent-ish SMB1 targeted attacks, this isn’t surprising – and is a good move in my opinion. The issue is, of course, that the companies likely to be hit by SMB1 (or other old-school) attacks are likely to not be up to date with their patching and even less likely to be up to date with OS versions – so it won’t help secure the more vulnerable networks out there…

Welcome aboard Jamie Brooks

We are proud to introduce the newest addition to the Adexis team of senior consultants, Jamie Brooks.

We’ve known for some years of Jamie’s outstanding reputation for quality and technical skill and we are honoured to have him onboard.

Jamie brings to Adexis a new set of talents including expert skills in Microsoft Azure. This is an exciting addition which expands the cloud and hybrid services and solutions we can bring to our clients.

Jamie also brings with him extensive skills in our existing engagements with our customers such as SCCM, AD, Exchange and more.

I would like to thank our customers who continue to support us, making this increase in our team possible.

Welcome aboard Jamie, we are proud to have you onboard and look forward to achieving great things together.

SCCM – BADMIF error 4

It is very common to get errors in your SCCM component status window for the component “SMS_Inventory_Data_loader” – the most common of which goes something along the lines of

Inventory Data Loader failed to process the file D:\Program Files\Microsoft Configuration Manager\inboxes\auth\dataldr.box\Process\H38H6C71.MIF because it is larger than the defined maximum allowable size of 5000000.

The size of the MIFs can be checked by navigating to D:\Program Files\Microsoft Configuration Manager\inboxes\auth\dataldr.box\BADMIFS\ExceedSizeLimit and taking note of the largest MIF, then adding a bit of headroom, modifying the registry as per https://thedesktopteam.com/heinrich/event-id-2719-sms_inventory_data_loader-error-sccm-2012-r2/
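
The “largest MIF plus a bit of headroom” step is easy to script. Here is a minimal Python sketch that assumes the folder path from this post and a 20% headroom, a figure picked purely for illustration. The function name is ours, and the script only reports a suggested size; the registry change itself is still made as per the linked article.

```python
from pathlib import Path
from typing import Optional


def suggested_max_mif_size(folder: str, headroom_percent: int = 20) -> Optional[int]:
    """Return the largest .MIF size in folder plus headroom, in bytes.

    Returns None when the folder contains no MIF files.
    """
    sizes = [
        entry.stat().st_size
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() == ".mif"
    ]
    if not sizes:
        return None
    largest = max(sizes)
    # Integer arithmetic keeps the result an exact byte count.
    return largest + largest * headroom_percent // 100


if __name__ == "__main__":
    # Path taken from this post; adjust the drive and install folder to suit.
    exceed_dir = (
        r"D:\Program Files\Microsoft Configuration Manager"
        r"\inboxes\auth\dataldr.box\BADMIFS\ExceedSizeLimit"
    )
    if Path(exceed_dir).is_dir():
        print(suggested_max_mif_size(exceed_dir))
    else:
        print("Folder not found:", exceed_dir)
```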

For one client recently, once that was done, the larger MIFs started processing; however, they then got many entries in D:\Program Files\Microsoft Configuration Manager\inboxes\auth\dataldr.box\BADMIFS\ErrorCode_4

This article – https://blogs.technet.microsoft.com/umairkhan/2014/10/01/configmgr-2012-hardware-inventory-resync-and-badmif-internals/ nicely documents some of the errors you may get, but not specifically what error code 4 relates to. This TechNet forum post seems to nail the issue, but not necessarily how to solve it.

In my case, I navigated to the SCCM logs directory, opened dataldr.log and searched for errors to find the specific line of SQL which was not being imported nicely – it was pretty easy to find thanks to CMTrace’s habit of highlighting lines containing “error” in red.

With this, it’s fairly easy to see that the troublesome statement is

*** [23000][547][Microsoft][SQL Server Native Client 11.0][SQL Server]The INSERT statement conflicted with the FOREIGN KEY constraint “WINDOWS8_APPLICATION_USER_INFO_DATA_FK”. The conflict occurred in database “CM_xxx”, table “dbo.System_DATA”, column ‘MachineID’. : pWINDOWS8_APPLICATION_USER_INFO_DATA

 

Armed with this information, you can then choose if you care about this hardware inventory information – and if not, you can exclude it from inventory.
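
When there are a lot of these errors, it helps to tally which constraints are involved before deciding what to exclude. The following is a small Python sketch that counts FOREIGN KEY conflicts by constraint name. The pattern is modelled only on the single dataldr.log line quoted above, so consider it an assumption, and the function name is made up for this example.

```python
import re
from collections import Counter

# Accept straight or curly double quotes, since web formatting may have
# changed the quote characters used in the real log.
QUOTES = "\"\u201c\u201d"
FK_CONFLICT = re.compile(
    f"FOREIGN KEY constraint [{QUOTES}]([^{QUOTES}]+)[{QUOTES}]"
)


def count_fk_conflicts(lines) -> Counter:
    """Count FOREIGN KEY conflict messages per constraint name."""
    counts = Counter()
    for line in lines:
        match = FK_CONFLICT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "The INSERT statement conflicted with the FOREIGN KEY constraint "
        '"WINDOWS8_APPLICATION_USER_INFO_DATA_FK". The conflict occurred in '
        'database "CM_xxx", table "dbo.System_DATA", column \'MachineID\'.'
    )
    for name, count in count_fk_conflicts([sample]).most_common():
        print(count, name)
```

The constraint names that come up most often point to the inventory classes worth reviewing first.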