Thursday, June 05, 2014

LDAP Load Balancing And Timeouts

It's been ages since I posted anything despite the fact I have lots to say. The last few months have been extraordinarily busy and challenging (and not necessarily in a technical way).

Some colleagues have pointed out that I have aged significantly in recent times. It is true that the hair on my chin has gone from grey to white and that the hair around my temples is severely lacking in any kind of hue. It seems that my work-life balance has gone slightly askew.

Balancing brings me rather neatly on to the topic of LDAP Load Balancing. I say Load Balancing though what I really mean to say is providing a mechanism for assuring availability of Directory Servers for IBM Security Identity Manager. My preference is for load to go to one directory server which replicates to a second directory server which can be used should something go awry on the primary.

So, what's the best way to ensure traffic routes to the correct directory server and stays stuck to that directory server until something happens? Well, that's the domain for the F5 BIG-IP beast. Or is it.

There is plenty of documentation around the web that states that one should tread carefully when attempting to use these kind of tools to front a directory server (and IBM Tivoli Directory Server in particular). In recent dealings, I've observed the following which is worth sharing:

Point 1
Be careful of the BIG-IP idle timeout default of 300 seconds. Any connection that BIG-IP thinks is stale after 300 seconds will be torn down. (You should see how a TDI Assembly Line in iterator mode behaves with that without auto-reconnect and skip-forward disabled!)

Point 2
Be careful of the TCP connection setup and seriously consider using Performance Layer 4 as the connection profile on the BIG-IP. A 50% increase in throughput was not atypical in some of my recent tests.

Point 3
Ensure that the default settings for LDAP pool handling are updated in ISIM's enRole.properties file. In particular, pay attention to the following:

enrole.connectionpool.timeout
enrole.connectionpool.retryCountForSUException

The timeout should be set just a little below the BIG-IP's idle timeout. The retryCountForSUException should be set to ensure that ISIM reconnects should the BIG-IP device tear-down the connection regardless of the timeout.

And with those tips, you should have a highly available infrastructure with a level of tolerance.