I was reviewing the wait signature of an IBM i batch application the other day using IBM iDoctor, and noticed that the job spent a significant portion of its time in Journal Wait status.
Journal waits happen when an application is delayed waiting for journal bundles to get written out to disk. Sometimes that is a symptom of a very busy disk subsystem. That wasn’t the case here.
This particular batch job was running on a lightly loaded test system, so I immediately suspected commitment control.
The associated journal was set up for caching (CHGJRN command), so in a job that doesn’t use commitment control, journal management will calculate an efficient journal bundle size, and cache journal entries until that bundle size is reached. Journal management then pushes that bundle out to disk in a single operation.
It is much faster and more efficient to do a single large write to disk than to do a lot of small writes. Journal caching is “record blocking for journal entries”.
Commitment control breaks journal caching
Commitment control, by default, and intentionally, breaks journal caching.
COMMIT operations, by default, cause journal management to flush journal entries to disk, regardless of caching or bundle size. As a result, it is not uncommon, especially in high-volume batch programs that use commitment control, to see a lot of cumulative journal waits in iDoctor, unless the developer understood the impact of COMMITing too frequently and was careful to ensure an adequate journal bundle size.
The traditional fix for this problem is to have the developer change the application to COMMIT less frequently. Developers sometimes don’t like to hear that, since it can cut into their long lunches and involve tedious things like change control forms, specifications documents, approvals, funds authorizations (oh, and some small amount of actual coding and testing).
It can also require some repetitive testing to determine the optimal bundle size (though I have a trick I’ve been using and it works pretty well; ask me nicely and I’ll tell you about it in a future post).
I know from experience that the optimal bundle size has grown larger over the years, so hard-coding the commit interval is out. If you only COMMIT every 20th transaction today, after your next upgrade it may take 40 transactions to fill an optimal bundle. Then you’ll start to see journal waits again, and you’ll have to repeat the testing to find the new optimal bundle size.
Soft Commit – journal caching for batch programs
Wouldn’t it be nice if you could choose to turn on journal caching for batch commitment control applications? Maybe not on a system-wide basis. You might have some critical applications that just must be allowed to flush every journal entry to disk for replication or audit purposes, but for the majority of applications, caching would be just the ticket.
And you can. Starting in V5R4, IBM offers “soft commit” support using the environment variable QIBM_TN_COMMIT_DURABLE. By turning OFF Durable commits, you ENABLE soft commits. You can set this environment variable at either the system level so it applies to all jobs on the system, or at the job level.
Just a note: if you set an environment variable at the *JOB level, it doesn’t get copied down to child jobs submitted by the parent job by default. If your parent job submits child jobs, you’ll want to specify CPYENVVAR(*YES) on each SBMJOB or add the ADDENVVAR command to each job in the job stream.
ADDENVVAR ENVVAR(QIBM_TN_COMMIT_DURABLE) VALUE(*NO) LEVEL(*JOB|*SYS)
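To make the parent/child note above concrete, here is a minimal sketch of a parent job that turns on soft commit for itself and then submits a child job that inherits the setting. The program MYLIB/NIGHTLY and the job name NIGHTLY are hypothetical placeholders, and REPLACE(*YES) is my own addition so the command doesn't fail if the variable already exists in the job. Treat it as a starting point, not a tested job stream.

```cl
/* Parent job: turn off durable commits (enable soft commit)    */
/* for this job only. REPLACE(*YES) overwrites an existing      */
/* value rather than failing if the variable is already set.    */
ADDENVVAR ENVVAR(QIBM_TN_COMMIT_DURABLE) VALUE(*NO) +
          LEVEL(*JOB) REPLACE(*YES)

/* Child job: CPYENVVAR(*YES) copies the parent's job-level     */
/* environment variables to the submitted job.                  */
/* MYLIB/NIGHTLY and JOB(NIGHTLY) are hypothetical names.       */
SBMJOB CMD(CALL PGM(MYLIB/NIGHTLY)) JOB(NIGHTLY) CPYENVVAR(*YES)
```

Running WRKENVVAR LEVEL(*JOB) in the parent job is a quick way to confirm the variable is set before you submit anything.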
In a nutshell, “soft commit” allows batch jobs to use journal caching. That’s it. Turn on soft commit, and then COMMIT as early and often as you like. Journal management will no longer flush on every CM journal entry. Instead it waits until a healthy bundle size has been reached, and then flushes the entire bundle to disk in one large, efficient operation. Check out the IBM article below.
Journal caching is also required for soft commit support. This means installing option 42 of the IBM i operating system (“HA Journal Performance”) in V6R1, and using CHGJRN JRNCACHE(*YES) for each journal that you want to allow to use caching.
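As a rough sketch of that last step, assuming option 42 is already installed and using a hypothetical journal named MYLIB/MYJRN, turning on caching for one journal and then checking the result might look like this:

```cl
/* Enable journal caching for one journal.                      */
/* MYLIB/MYJRN is a hypothetical journal name.                  */
CHGJRN JRN(MYLIB/MYJRN) JRNCACHE(*YES)

/* Display the journal's attributes to confirm that the         */
/* journal cache setting now shows *YES.                        */
WRKJRNA JRN(MYLIB/MYJRN)
```

Repeat the CHGJRN for each journal that should be allowed to cache. Leave it off for any journal whose entries must reach disk immediately for replication or audit purposes.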