BatchLoader Tuning

Recently I have been tuning BatchLoader. Using sample PDFs of around 100 KB each, throughput started out fairly good at 16 files per second but got progressively worse as more batches were loaded, at one point dropping to a low of 2 files per second. Looking at the database activity while BatchLoader was running, it was apparent that the following query was consuming 90% of the overall processing time:

SELECT DocMeta.dID FROM Revisions, DocMeta
WHERE
upper(dDocName)=upper(:"SYS_B_0") AND
NOT(DocMeta.xCollectionID=:"SYS_B_1") AND
Revisions.dID=DocMeta.dID

There was already an index on dDocName for the Revisions table, but because the query wraps dDocName in upper(), that index cannot be used. To solve this, a new function-based index was created on the Revisions table for upper(dDocName). The result was a clear performance improvement, with throughput exceeding 27 files per second after the change.
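
For reference, an index on upper(dDocName) could be created roughly as follows. This is a minimal sketch assuming an Oracle database: the index name is made up for illustration, and the literal values in the plan check are placeholders standing in for the bind variables, not values from the actual system.

```sql
-- Hypothetical index name; the indexed expression has to match the
-- expression used in the WHERE clause of the slow query.
CREATE INDEX idx_revisions_upper_ddocname
    ON Revisions (UPPER(dDocName));

-- Optional check: ask the optimizer for the plan of the same query.
-- 'SAMPLE_DOC' and 0 are placeholder values for illustration only.
EXPLAIN PLAN FOR
SELECT DocMeta.dID
  FROM Revisions, DocMeta
 WHERE UPPER(dDocName) = UPPER('SAMPLE_DOC')
   AND NOT (DocMeta.xCollectionID = 0)
   AND Revisions.dID = DocMeta.dID;

-- Display the plan that was just generated.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows the new index being used for the dDocName lookup rather than a full scan of Revisions, the index is doing its job.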
