Tuesday, March 30, 2010

ITIM Messaging & Synchronisation

WebSphere gurus the world over will probably understand what a messaging cluster is, what tranlogs are, and how all this "stuff" works.

For systems integrators who don't necessarily specialise in any one technology, these things may seem more like a dark art. Certainly, the information made available in the myriad documents produced by the various authors of WebSphere technical books doesn't seem to shed enough light on what is going on.

Yesterday, I had a misbehaving IBM Tivoli Identity Manager (ITIM) instance: every transaction sat in a pending state, with not a single one being flushed through to completion.

The setup was: ITIM v5.0 running in a WebSphere v6.1.0.9 clustered environment with two physical servers. Everything about the deployment was fairly vanilla (though the amount of data going through the system is quite large).

The problem seems to have started on Sunday night during the automatic regeneration of LTPA keys. The Deployment Manager lost the ability to control the Node Agents and clusters, which forced the following sequence of events:
  • Forced shutdown of the Application Servers and Node Agents
  • Removal of Global Security
  • Manual synchronisation of the nodes on the physical servers
  • Startup of the Node Agents
  • Reconfiguration of Global Security
  • Resynchronisation of the nodes (via the Deployment Manager)
  • Startup of the Messaging Cluster
  • Startup of the Application Cluster
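For the record, the sequence above looked roughly like the following commands. This is a sketch only - the profile names, hosts and ports are placeholders for my environment (and `securityoff` is the local-mode wsadmin command as I remember it from WAS 6.1), so verify everything against your own topology:

```shell
# Forced shutdown of the application servers and node agents (run per physical server)
stopServer.bat server1 -profileName AppSrv01
stopNode.bat -profileName AppSrv01

# Remove global security using wsadmin in local mode on the Deployment Manager
wsadmin.bat -conntype NONE -profileName Dmgr01
#   wsadmin> securityoff
#   wsadmin> exit

# Manually synchronise each node against the Deployment Manager's SOAP port
syncNode.bat dmgr_host 8879 -profileName AppSrv01

# Restart the node agents; re-enable security via the admin console;
# resynchronise the nodes from the Deployment Manager; then start the
# Messaging Cluster followed by the Application Cluster
startNode.bat -profileName AppSrv01
```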

Everything looked OK, until transactions started to appear in ITIM and they still sat pending!

The logs showed that the Messaging Cluster would start, then stop, then start, then stop... over and over.
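The cycling shows up in each application server's SystemOut.log as the messaging engine flipping between states. A quick way to watch it (the message ID is from my memory of WAS 6.1 SIB logging, so treat it as illustrative rather than gospel):

```shell
# CWSID0016I reports the messaging engine's state transitions (Starting/Started/Stopping/Stopped)
grep "CWSID0016I" SystemOut.log | tail -20
```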

Communications channels between the local queues and the shared queues weren't as they ought to have been. The root cause seems to have been an inconsistency between the transactions ITIM expected to be pending, the transactions the Messaging Cluster thought it had, the transactions held in the messaging cluster's "physical" DB2 data store, and the tranlog.

NOTE: The recovery procedure for this is probably not something anyone should undertake lightly, but it was the final straw after 12 hours of trying various tactics that might have saved my pending transactions (all of which proved futile).

  • Stop the clusters
  • Stop the Deployment Manager
  • Kill any rogue WebSphere processes (of which there were a few)
  • Stop the database manager supporting ITIMDB
  • Delete the various tranlog files for the two application server instances
  • Reboot the two servers (which probably wasn't required, but these were Windows 2003 server instances which seemed to be hanging on to various ports needlessly and I wanted a clean environment to start everything up again)
  • DANGER: delete from itiml000.sib000 in the ITIM database (and likewise from the itiml001 and itims000 schemas, and from the sib001 tables in each)
  • Start up ITIM
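The destructive steps boiled down to something like this. Again, a sketch only: the tranlog path and the SIB schema/table names follow the defaults in my environment, so confirm yours (and take backups) before deleting anything:

```shell
# Delete the transaction logs for each application server instance
# (path elements in angle brackets are placeholders for your cell/node)
del /q "C:\WebSphere\AppServer\profiles\AppSrv01\tranlog\<cell>\<node>\server1\transaction\tranlog\*"

# Purge the SIB message store tables in the ITIM database
db2 connect to ITIMDB
db2 "DELETE FROM ITIML000.SIB000"
db2 "DELETE FROM ITIML000.SIB001"
db2 "DELETE FROM ITIML001.SIB000"
db2 "DELETE FROM ITIML001.SIB001"
db2 "DELETE FROM ITIMS000.SIB000"
db2 "DELETE FROM ITIMS000.SIB001"
db2 connect reset
```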

At this stage, all was well. Apart from my physical well being.

CAUTION: The above approach was a very brutal way of clearing out everything to a point where the system was operational once again. I never want to have to repeat this process: having a clean ITIM instance which is well tuned and well looked after is far preferable to having to perform this kind of recovery.

Saturday, March 20, 2010

Reputation On The Net

I've commented on more than one occasion on the power of the internet and, in particular, the power of the various social networking sites that have sprung up in recent years. It's not enough to have an account on Facebook any more. We have to have a Twitter account feeding Facebook. We also want our random thoughts automatically updating our professional networking pages on LinkedIn. We use Plaxo to amalgamate the email addresses stored in various repositories. We have our .TEL domains, we're Buzzed and we blog.

We must spend a lot of time generating unstructured information for the consumption of the masses.

But what about the information that is generated ABOUT us BY others? It's one thing to look after one's online reputation by ensuring that we portray ourselves as professional in our tweets and blogs. But what if the baddies out there are destroying our reputation in forums that we have no control over?

I've spent a number of years applying my brain cells to the task of working out how to control access to systems and how to manage the credentials within systems. We in the industry call it Identity & Access Management as these tasks are frequently joined "at the hip". Maybe my brain has had enough of IdAM activity as most of my recent conversations seem to be related to Cloud based services, reputation management and how social networking can be an invaluable business tool.

Here's a question for you: how frequently do you actively search for information about YOU on the net? What dangerous comments could be lurking out there? What do people really think of you?

Sometimes it may be best to not know. But, sometimes you can be presented with a very nice surprise.

Take, for example, Vintage1951's recent blog entry! In it, I've been called experienced and thoughtful! In an industry where reputation is everything, I can think of almost no higher accolade (at least, no higher accolade that won't sound over-the-top and smug!)

It's nice to see that Vintage1951 takes a similarly thoughtful approach to his reputation by connecting with only those people he has the utmost respect for. I feel privileged to be amongst them.

On a final note - have a browse around Vintage1951's blog. There are some excellent articles in there!

Sunday, March 14, 2010

Enterprise Compliance For Business Managers

I spent a number of years acting as a Line Manager for a team of Identity & Access Management specialists at a time when there was increasing focus on compliance controls within the organisation. Certifying that my employees had only the system access rights they required for their jobs was a quarterly event to look forward to.

Being told that Joe Bloggs can access System X with a role of Developer might seem like sufficient information for me to testify that this is appropriate, but I would have had to assume that developer access on System X had been correctly configured by whoever set up the system. What would the impact be if the administrator of System X had inadvertently given superuser access to the users in the developer group?

Traditionally, we have merely accepted that assigning users to groups is enough to satisfy our legislative requirements. But this is not the case! We also need to attest that the lower level permissions have been configured appropriately.

So what if we could find a way to deliver the lower level permissions to the Line Manager during his quarterly certification? What if I could see that superuser access had been inadvertently granted to my set of developers on System X?

This raises another problem, though. Lower level permissions on various systems take the form of data values which may be meaningless in their own right and therefore not in a format that would allow a Line Manager to understand them. As an example, consider informing a Line Manager that their employee's access to a system amounts to being assigned the value S for the attribute ACCESS_RIGHTS. What does S mean? Standard? Super? Something-Else?

So what if we could provide the details in a manner which is more meaningful for the Line Manager?
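As a trivial illustration of the kind of translation layer I mean - the codes and their meanings here are entirely hypothetical and would have to come from whoever owns System X:

```shell
# Hypothetical decode of System X's ACCESS_RIGHTS codes into Line Manager language
decode_access_rights() {
  case "$1" in
    S) echo "Standard user" ;;
    P) echo "Power user" ;;
    A) echo "Superuser / administrator" ;;
    *) echo "Unknown code: $1 - check with the system owner" ;;
  esac
}

decode_access_rights S   # prints "Standard user"
```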

I've spent some time in the recent past working on a mechanism to augment provisioning data with these lower-level permissions in a meaningful format, enhancing the attestation experience for access certifiers and also providing a means of validating that systems have been configured properly (and, as a consequence, the ability to perform lower-level Segregation of Duties checking).

The result of this work has been documented by Alan Harrison and is now available on the Pirean website. The work shows how low-level attributes held in RACF can be presented in a meaningful way through the IBM Tivoli Identity Manager interface. The principles, however, can be applied to any credential storage mechanism.

How I wish I had had that ability when I was a Line Manager!

Monday, March 08, 2010

ITIM Provisioning with SPML

I had an interesting day last week playing with IBM Tivoli Identity Manager and IBM Tivoli Directory Integrator. The brief was to send provisioning requests from ITIM via ITDI as SPML requests to a target credential store. Sounds easy, doesn't it?

SPML parsing support has been available within ITDI since v6.1.1 (Fix Pack 1), and dropping openspml2-toolkit.jar into ITDI's JAR directory should've been enough for me to get motoring with the requirement. There is a certain amount of truth in that: it was easy to put together a Proof Of Concept showing an ITDI Assembly Line sending SPML-formatted provisioning requests to an SPML listener.
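For illustration, the kind of payload the Assembly Line was producing looked something like this. It's a hand-written sketch of an SPML v2 add request - the attribute names are hypothetical and the exact shape of the data element depends on the profile your listener expects, so don't treat it as canonical:

```xml
<addRequest xmlns="urn:oasis:names:tc:SPML:2:0"
            requestID="req-001" targetID="CredentialStore">
  <data>
    <!-- DSML-profile attribute; the name "uid" is illustrative only -->
    <attr xmlns="urn:oasis:names:tc:DSML:2:0:core" name="uid">
      <value>jbloggs</value>
    </attr>
  </data>
</addRequest>
```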

That was fine until I elevated the Assembly Line into ITIM. Sending the AL to an ITDI RMI Dispatcher as part of an ITIM provisioning request has been a feature of the ITIM setup since its v4.6 days, and there hasn't been any reason to suspect that ALs invoked via the dispatcher mechanism would behave any differently from ALs invoked from the command line or through the ITDI GUI.

But there is...

Unfortunately, I had created an Assembly Line that was made up of the following:
  • Parser Function (to parse my work entry into SPML format)
  • HTTP Client Connector (to send my SPML formatted string to a SPML listener)
  • Parser Function (to parse the response from the connector above)

There's nothing wrong with what I did. I could've attached the SPML parser to the HTTP Client Connector, but I didn't: I wanted to build up my AL slowly and show what was happening during the SPML creation process.

The AL worked fine through the GUI. But it did not work when "dispatched": it threw a NullPointerException during the attempt to call the parser. My RMI Dispatcher had all the required JARs at its disposal, and the environment variables were identical to my GUI environment. I even tried the CSV parser rather than SPML (thinking that a parser introduced as part of a Fix Pack might not have full support) but got the same result.

If only I had attached the parser to my HTTP Client Connector in the first place - because guess what? That worked! It seems that (in my environment) Parser Functions cannot be instantiated when the AL is invoked by the RMI Dispatcher. NOTE: I did upgrade my RMI Dispatcher code to the very latest version during my investigations.

So a note to all - Parser Functions may not behave as you expect when dispatched!