Friday, May 14, 2010

Apply That Patch...

What should you do when system performance starts to deteriorate?

More specifically, if password changes are taking upwards of 12 minutes to complete, what should you do?

Even more specifically, what if password changes invoked from IBM Tivoli Identity Manager through the TAM Combo Adapter to Tivoli Access Manager are taking upwards of 12 minutes to complete, yet manually changing the passwords via TAM's pdadmin command-line tool completes in under a second?

This was the dilemma I faced yesterday and it seemed to happen "all of a sudden".

The password change via pdadmin confirmed that we weren't talking about a DB2 or LDAP issue. Both had recently been tuned and everything was performing as expected. So I did what any decent IT professional would do - the on/off approach.
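
The comparison above is really just timing the same operation along two paths. A minimal sketch of that in Python follows. The names time_command and is_slow are made up for illustration, the one-second threshold is an assumption, and the stand-in command merely starts a Python interpreter; substitute whatever pdadmin invocation you would normally run by hand.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, completed.returncode


def is_slow(elapsed_seconds, threshold_seconds=1.0):
    """True when an operation took longer than the (assumed) threshold."""
    return elapsed_seconds > threshold_seconds


if __name__ == "__main__":
    # Stand-in command: replace with the real command-line password change.
    stand_in = [sys.executable, "-c", "pass"]
    elapsed, exit_code = time_command(stand_in)
    print(f"exit={exit_code} elapsed={elapsed:.3f}s slow={is_slow(elapsed)}")
```

If the direct path comes back quickly while the same change through the adapter does not, the back-end stores are probably not where the time is going, which is the reasoning used here.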

I stopped and restarted the TAM Policy Server - for no real reason, to be honest. I then stopped and restarted the TDI RMI Dispatcher that was "hosting" the TAM Combo Adapter.

The next password change that came through the system took approximately 6 minutes to complete. A 50% improvement but nowhere near good enough in an environment hosting 300,000 users eager to change their passwords (with the result being a bottlenecked system).
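
To put a rough number on that bottleneck, here is a back-of-the-envelope sketch. It assumes password changes are processed one at a time by default (the real degree of parallelism in the dispatcher isn't stated here), and the 1,000 pending changes is a hypothetical figure, not a measurement from this environment. The helper name hours_to_drain is made up for illustration.

```python
def hours_to_drain(pending_changes, seconds_per_change, workers=1):
    """Hours needed to clear a queue of password changes.

    Assumes every change takes the same time and that `workers`
    changes run in parallel (workers=1 means strictly serial).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return pending_changes * seconds_per_change / workers / 3600


if __name__ == "__main__":
    pending = 1000  # hypothetical queue size, not from the original environment
    for label, seconds in (("12 minutes", 720), ("6 minutes", 360)):
        hours = hours_to_drain(pending, seconds)
        print(f"{label} per change: {hours:.0f} hours to clear {pending} changes")
```

Under those assumptions, 1,000 queued changes at 6 minutes each would still take 100 hours to clear, which is why halving the time was nowhere near enough.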

The TAM Combo Adapter was v5.0.5 and the Assembly Line for the password change seemed to be as straightforward as an Assembly Line can be. At no point could I find:
if (systemAge > 12Months) {
   sleep(600);
}

Google was no help either. Nobody on the planet had experienced this sudden slowness and documented it!

Running out of ideas, I decided to upgrade the adapter to v5.0.9. A quick download, an install, a copy of the TAMComboUtils.jar file to the TDI directory structure and a restart of the TDI RMI Dispatcher were all it took.

Moments later, a password change came into the system. How long would it take? I guessed it would be around 6 minutes again but I was wrong. A handful of milliseconds!

I quickly looked through the supporting documentation for the adapter to see what fixes were incorporated. No mention of password change "slowness". And no mention in the v5.0.6, v5.0.7 and v5.0.8 release notes!

So what's the moral of the story?

Developers don't just fix bugs when a new release of code is coming out. They frequently "tidy" things in a non-functional way, which may have positive impacts!

For those of you still running your ITIM v5.0 environments with an old TAM Combo Adapter - upgrade now. You won't regret it. Maybe. Unless one of those "tidy" code changes has a negative impact on your environment. In that case - forget you ever read this post!

Of course, the approach adopted for this particular scenario can also be applied to all software environments. Keeping up to date with the latest patches is a good thing to do even though it can be time-consuming. But how do you do that in a manner which meets all your Change Management processes? Can you really patch WebSphere without having to perform a full regression test of all your J2EE applications? Can you upgrade your LDAP without a full regression test of all applications that make use of its services?

The answer is that you can, but you have to be convincing to get the authorities to take that particular leap of faith. And therein lies the problem. Far too often, sensible environment management is put into the "too hard bucket" not for technical reasons, but for political reasons.

When will we learn?

NOTE: As my experience yesterday can attest, sometimes the best way to get patches applied is in a Sev 1, emergency scenario. Don't go creating those scenarios though!
