As we approach the end of the year, things continue to be a flurry with clients who didn’t plan well screaming for services. I’ve reached the limit of what I can deliver with my small team since this is the last week my partner and I are spending on site with customers. We’re willing to let some business go because we’re not willing to run ourselves ragged trying to be everything to everyone. That’s the perk of owning your own business, although it’s sometimes challenging when you have to agree to disagree with clients.
For those clients that we would like to be able to serve but just can’t, we have larger consulting firms that we can refer them to when it’s crunch time. You would expect that some of them might elect to stay with the group that met their needs when we couldn’t, but a good number of them come back to us because they appreciate the fact that we knew our limits and steered them into capable hands.
One of the prospective clients that I steered to a colleague was one who wanted to hire an external help desk because they felt that their vendor’s help desk wasn’t meeting their needs. They feel the vendor’s Tier 1 support is passive-aggressive, doing things like intentionally calling the office after hours so that they can say they called back and didn’t reach anyone. The vendor offers a discount on maintenance if clients provide their own Tier 1 support, so they did the math and decided to outsource to a third party if the price was right. My colleague happens to be a former reseller for the vendor in question and was happy to take their business, so it was a win for everyone.
Since this is my last week on the road, I plugged in a post-upgrade go-live for myself so I could work Monday through Thursday and start my holiday travel a bit earlier than last year. It meant that I had to fly on the weekend, which is always interesting given the change in mix from business travelers to family travelers. I was pleased to see Chicago’s Midway Airport decked out for the holidays, with lots of twinkle lights and giant ornaments. There were “take a sweet treat” stands with bowls of Skittles. As I made my way down the B gates, there was even a man on stilts dressed as a toy soldier handing out boxes of candy. It was unexpected and made me smile, so kudos to the folks who put it together.
The mood didn’t last long once I reached my destination and found frantic voice mails from my customer saying that their upgrade wasn’t going as planned. I had encouraged them to start the upgrade on Friday night so that if they had issues, they would have time to resolve them. Instead, they insisted on starting it Saturday afternoon, citing staffing issues. This is the challenge of scheduling major projects around the holidays. People want time off to be with their families, and weekends are hard to staff if they’re not scheduled well in advance or if your teams don’t have a lot of backup. They had done a dry run of the upgrade and theoretically should have had enough time, but ran into some issues.
Whenever I give training on an upgrade, I reinforce (and reinforce, and reinforce) how important it is to follow the upgrade playbook line by line. There is zero room for the kind of errors that result when steps are performed out of sequence or missed. Certain applications are finicky, and their pre-upgrade scripts are looking for specific criteria to be met in the client environment before they proceed. Depending on where a missed step occurs, it can cost hours to get the timeline back on track. Although I provided some high-level project management for the client, they were running the upgrade process themselves and I wasn’t supervising them as closely as I do when I am personally responsible for the upgrade event.
There is a step in their upgrade plan that requires them to disable their disaster recovery solution a certain way, and an enterprising DBA decided to do it a different way than what was documented. The result was the failure of the upgrade package, which wasn’t finding the conditions it needed. Instead of rechecking the plan and following it, the DBA restarted the upgrade two additional times expecting a different outcome. By the time I landed they were significantly off the timeline, and it took a couple of calls to figure out what had gone wrong and how to fix it.
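For the technically inclined, here is a minimal sketch of the "follow the playbook line by line" idea: steps run strictly in order, each step checks its precondition first, and a failed check halts the run rather than inviting a blind retry. Everything in it is hypothetical and for illustration only. The names `Step`, `run_playbook`, and `PlaybookHalted`, along with the two sample steps, are my own assumptions and not the client's actual playbook or any vendor's tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One line of a hypothetical upgrade playbook."""
    name: str
    precheck: Callable[[], bool]  # must return True before the step may run
    action: Callable[[], None]    # the work the step performs


class PlaybookHalted(Exception):
    """Raised when a precondition fails; the run stops instead of retrying."""


def run_playbook(steps: List[Step]) -> List[str]:
    """Run steps strictly in order and halt on the first failed precheck."""
    completed: List[str] = []
    for step in steps:
        if not step.precheck():
            raise PlaybookHalted(
                f"Precondition failed before '{step.name}'. "
                f"Completed so far: {completed}. Recheck the plan or ask for help."
            )
        step.action()
        completed.append(step.name)
    return completed


# Illustrative state only: a flag standing in for "disaster recovery is disabled
# the documented way".
state = {"dr_disabled": False}


def disable_dr() -> None:
    state["dr_disabled"] = True


steps = [
    Step("disable disaster recovery", lambda: True, disable_dr),
    Step("run upgrade package", lambda: state["dr_disabled"], lambda: None),
]
```

Calling `run_playbook(steps)` completes both steps in order. Calling `run_playbook(steps[1:])` while the flag is still `False` raises `PlaybookHalted` immediately, which is the point: the failure sends you back to the plan instead of letting you restart the same step two more times.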
The relative comedy of errors pushed on through most of Sunday evening, when they still hadn’t brought the upgraded system back up because data integrity checks were failing. We spent several hours on the phone with the vendor’s team trying to figure out what went wrong and weren’t able to isolate a cause. At that point, we had some decisions to make. We could either keep working on it and prepare to open the offices on Monday using downtime procedures, or we could restore the system from a backup and move forward. As we were weighing the choices, a question came up about whether users had been accessing the system during the backup. That took some investigation and stalled things further.
We needed to make a decision as we approached midnight, and ultimately my client opted to restore from the backup and try the upgrade again at a later date. I was crossing my fingers that their backup process was solid since we all know clients who never test their backups or go to restore from one and find out it’s corrupted, or even worse, blank. Fate was smiling on us because the backup restored not only without a hitch but in less time than anticipated, which allowed us to get the users back on the system without too much of a delay.
Of course the end users were disappointed at their inability to use the new features, and the organization has to reschedule. We spent several hours today in a post mortem discussion of the event and what went wrong, and they appear to have learned some important lessons about following the playbook exactly and about asking for help when you run into a problem rather than just repeating the same steps over and over.
There wasn’t much go-live for me to support, so I am headed back to the airport. Although the upgrade failed, they made a smart decision and can try it again after the first of the year. These are the hard lessons that most organizations learn at one time or another, and now they can join the club with the rest of us who have been there and done that.
What’s your worst upgrade story? Email me.