
Processing stuff

Whilst perusing Rocky’s latest weblog entry, Large-scale doc processing in LotusScript - design considerations…, I noted a link to an older complementary post on Andrew Pollack’s site, A faster way to update data! It’s awesome. Faced with comparing millions of rows of Oracle-sourced data with a relative handful (60,000 documents) of Notes data, Andrew hit upon comparing hashed values (32 bytes, no more, no less) rather than the complete data set. This turned a Notes agent with a run-time of around twelve hours into something that whizzed through the data in less than four!
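To make the idea concrete, here is a minimal sketch in Python rather than LotusScript. It assumes that each record's fields are joined and run through MD5, whose hex digest happens to be 32 characters long; the post linked above does not say which hash Andrew used, so that choice is an assumption, as are the names row_hash and changed_keys and the sample data.

```python
import hashlib


def row_hash(fields):
    """Return a 32-character hex digest for one record's field values."""
    # Join with a separator unlikely to occur in the data, so that
    # ("ab", "c") and ("a", "bc") do not hash to the same value.
    joined = "\x1f".join(str(f) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def changed_keys(source_rows, stored_hashes):
    """Return the keys whose source data is new or differs from the stored hash.

    source_rows:   dict of key -> tuple of field values (the incoming data)
    stored_hashes: dict of key -> hash saved the last time the record was written
    """
    return sorted(
        key
        for key, fields in source_rows.items()
        if stored_hashes.get(key) != row_hash(fields)
    )


if __name__ == "__main__":
    # Hypothetical sample data: Bob's department changed since the last run.
    source = {"A1": ("Ann", "Sales"), "B2": ("Bob", "IT")}
    stored = {"A1": row_hash(("Ann", "Sales")), "B2": row_hash(("Bob", "HR"))}
    print(changed_keys(source, stored))
```

Only the keys that come back need a full update, so the expensive field-by-field work is skipped for every record whose hash still matches.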

Hats off to Mr. P. I would never have thought of such a thing. I am such a lowly vessel in the sea of programmer brilliance… ;-)

One other performance-enhancing tip that Andrew briefly mentions is that of the NotesViewEntry class. This has been mentioned before on this very site, but I cannot emphasise its use enough: skimming a view’s index in LotusScript is invariably quicker than delving into an actual NotesDocument object. Do it whenever you can (this index-skimming explains why @DbLookup and @DbColumn are speedier than things like GetDocumentByKey).

Comments

  1. Ben, thanks for the link and the credit - and the compliment.

    If you were indeed lost in the sea of other people's skills, I for one wouldn't subscribe to RSS to your site.

    ;-) Andrew Pollack
  2. nice post ben. i have to agree with andrew, personally i've learned a lot from this site!

    :-) jonvon
  3. Wow, this is the second time you've cross-blogged me recently Ben - we have to stop meeting like this ;) Glad to know you're a regular reader, and that I motivate you enough to respond :) Seriously, I do appreciate your contributions, and I have learned a ton reading your site (and others').

    Thanks, Ben!! Rock
  4. Ben,

    I've been musing for some time about publishing an entry around this subject as I seem to be using the technique more and more (and because I originally started the blog for technical issues and haven't - as yet - actually published any such entries).

    My eyes were opened to the incredible performance achievable some years ago by Ian Tree. Walking through "indexes" means in one pass you can find additions, deletions and modifications. Using Andrew's data hash is a GREAT "trick". Like you, I'd never have thought of that one.

    What I've done over the past few years is to use a basic class to control and manage the index handling. What it uses as the "source" (i.e. view, viewEntryCollection, array or list) is hidden from the calling code but it always provides the same methods i.e. Document, GetNext, GetPrevious, GetFirst, GetLast and IndexEntry. GetIndexEntry returns the current key being used and if used in a comparison with the other index (or indexes) you can work out if an entry is missing from either index and then add or remove entries accordingly.

    The other thing from a performance perspective is that as soon as you have a handle to the document in the index you also have the SummaryBuffer information, so using ColumnValues is done in memory. I believe that getting the value of a field on the document actually retrieves the entire document, so it is quite an overhead! Roy Holder
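The one-pass index walk described in comment 4 can be sketched roughly as follows, again in Python rather than LotusScript. It assumes both indexes are already sorted by key with no duplicate keys, and that each entry carries a value to compare (a data hash would do). The function name compare_indexes and the sample data are purely illustrative and are not taken from Roy's class.

```python
def compare_indexes(left, right):
    """Walk two key-sorted indexes in a single pass.

    left, right: lists of (key, value) tuples, sorted by key, keys unique.
    Returns (only_left, only_right, modified) as three lists of keys.
    """
    only_left, only_right, modified = [], [], []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, left_value = left[i]
        right_key, right_value = right[j]
        if left_key == right_key:
            # Same entry on both sides: check whether its data changed.
            if left_value != right_value:
                modified.append(left_key)
            i += 1
            j += 1
        elif left_key < right_key:
            # Key is missing from the right-hand index.
            only_left.append(left_key)
            i += 1
        else:
            # Key is missing from the left-hand index.
            only_right.append(right_key)
            j += 1
    # Whatever remains on either side has no counterpart on the other.
    only_left.extend(key for key, _ in left[i:])
    only_right.extend(key for key, _ in right[j:])
    return only_left, only_right, modified


if __name__ == "__main__":
    notes = [("a", 1), ("b", 2), ("d", 4)]
    oracle = [("b", 2), ("c", 3), ("d", 5), ("e", 6)]
    print(compare_indexes(notes, oracle))
```

Because each index is only ever moved forwards, the cost grows with the combined length of the two indexes rather than with their product.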

Comments on this post are now closed.

About

I’m a software architect / developer / general IT wrangler specialising in web, mobile web and middleware using things like node.js, Java, C#, PHP, HTML5 and more.

Best described as a simpleton, but kindly. You can read more here.
