BatchLoader Tuning

Recently I have been tuning BatchLoader. Using sample PDFs of around 100 KB each, throughput started out fairly good at 16 files per second but degraded progressively as more batches were loaded, eventually dropping to a low of 2 files per second. Looking at the database activity while BatchLoader was running, it was apparent that the following query was consuming 90% of the overall processing time:

SELECT DocMeta.dID FROM Revisions, DocMeta
WHERE
upper(dDocName)=upper(:"SYS_B_0") AND
NOT(DocMeta.xCollectionID=:"SYS_B_1") AND
Revisions.dID=DocMeta.dID

There was already an index on dDocName for the Revisions table, but because this query filters on upper(dDocName) rather than on the column itself, the existing index is not used. To solve this problem, a new function-based index was created on the Revisions table for upper(dDocName). The result was a clear performance improvement, with throughput in excess of 27 files per second after the change.
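For reference, here is a minimal sketch of what such an index could look like in Oracle, along with one way to check that the optimizer picks it up. The index name and the bind variable names are made up for illustration; the exact statement I ran is not reproduced here, and your schema owner, tablespace, and storage options may call for extra clauses.

```sql
-- Function-based index on the uppercased document name.
-- The index name is hypothetical; choose one that fits your naming scheme.
CREATE INDEX revisions_upper_ddocname_idx
    ON Revisions (UPPER(dDocName));

-- Ask the optimizer how it would run the problem query.
-- :doc_name and :collection_id are placeholder binds standing in for
-- the system-generated :"SYS_B_0" and :"SYS_B_1".
EXPLAIN PLAN FOR
SELECT DocMeta.dID
  FROM Revisions, DocMeta
 WHERE UPPER(dDocName) = UPPER(:doc_name)
   AND NOT (DocMeta.xCollectionID = :collection_id)
   AND Revisions.dID = DocMeta.dID;

-- Show the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the index is being used, the plan should show an index range scan on the new index instead of a full scan of Revisions. If it does not, refreshing the optimizer statistics on the table is a reasonable next thing to try.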
