
StringBuffer re-visited

Some time back, I mentioned a tip I picked up about using StringBuffer rather than endlessly concatenating String objects thus:

    String newString = "widgets" + 4 + "cola" + "general mincery";

All very well, and given that String is an object, the tip makes perfect sense. But actually reading the API documentation for StringBuffer the other day yielded this wee nugget of information:

String buffers are used by the compiler to implement the binary string concatenation operator +. For example, the code:
     x = "a" + 4 + "c"

is compiled to the equivalent of:

    x = new StringBuffer().append("a").append(4).append("c").toString()
which creates a new string buffer (initially empty), appends the string representation of each operand to the string buffer in turn, and then converts the contents of the string buffer to a string. Overall, this avoids creating many temporary strings.

So now I'm not so sure about the validity of the original tip (bar saving the compiler some work). Anyone else care to chip in?
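
For the curious, the one case the quoted optimisation can't reach is concatenation inside a loop: each pass through the loop compiles to a brand-new StringBuffer and a brand-new String (at least with the compilers of the day). A wee sketch of the two approaches; the loop bound and variable names are mine, purely for illustration:

    // Concatenating in a loop: every iteration builds a new StringBuffer
    // and a new String behind the scenes, so the copying cost snowballs.
    String slow = "";
    for (int i = 0; i < 1000; i++) {
        slow += i;
    }

    // Reusing a single StringBuffer does the same job with one buffer
    // and a single String at the end.
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < 1000; i++) {
        sb.append(i);
    }
    String fast = sb.toString();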

Comments

  1. The StringBuffer Myth (could be better formatted; it suffered from conversion between blogging tools): http://fishbowl.pastiche.org/archives/000463.html

    You should use StringBuffer explicitly in two circumstances:

    (1) You are looping over the string concatenation, passing the buffer between methods, or using some other structure the compiler isn't smart enough to optimise around.

    (2) You are creating a very large string in a part of the code where performance is critical, and you want to avoid the StringBuffer having to resize its internal buffer by explicitly setting the buffer size in its constructor (see the sketch after the comments).
    Charles Miller
  2. Just the sort of thing I was after, Charles: many thanks!
    Ben Poole
  3. Well, to be more explicit: do not write code like

        String res = "";
        for( …. ) {
            res += …
        }

    This is easily 100 to 1000 times slower than the StringBuffer equivalent, even with the JDK 1.4 generational GC.
    Frank Nestel
  4. Also,

    don't forget a StringBuffer is mutable (changeable), so you can pass it into a method and its contents can be changed; the content change will be reflected outside the method also. You cannot do this with Strings, as they are immutable (unchangeable). This is not passing a parameter by reference, which apparently Java does not do; it's passing a copy of the reference.

    So if you have a very large amount of text to pass between methods, use a StringBuffer and there will be only one copy of the text in memory. I think… (see the demo after the comments)
    john marshall
  5. There is a caveat here; for more detailed info check out Jack Shirazi's Java Performance Tuning.

    I was just reading up on this subject the other night (which means, I think, I'm becoming a completely hopeless raging nerd).

    Here is the caveat:

    Let's say you create a StringBuffer "sb". You loop around, mutate your chars however you want. Cool. It is very efficient for this. As Mr. Shirazi says, there are no "intermediate objects" being created. Then you pop the results into a String -> sb.toString();

    After this your program does some more stuff, and the StringBuffer gets mutated again. Now the char array in the StringBuffer is copied, and the StringBuffer points to the new char array; this happens because the String object keeps a reference to the previous char array. So the more this happens, the higher the overhead. In other words, do all the mutating you can before calling toString(), as each extra round adds overhead (see the sketch after the comments). Of course, you might need several different Strings; there may be reasons to do this. But it is something to be aware of at least.

    I'm still sorting all this out myself. See Shirazi's book… it's really fantastic. There is an example in the book where he compares methods that perform a word count on a text file, one using straight-up char arrays, and one using StringTokenizer. It turns out that the method using StringTokenizer ends up creating 1.2 million objects (in his example). The code using the char array takes less than 1% of the time to do the same task because it isn't creating objects (that have to be garbage collected and take up memory, etc.) along the way.

    What underlies all of this is a good lesson for Java: objects are expensive, and programs that create lots of objects are really expensive.
    jonvon
  6. The best guide to this that I've come across has been "Building up Strings" at http://jinx.swiki.net/78, which ends with the summary points below…

    • Within a single string assignment, using String concatenation is fine.
    • If you're looping to build up a large block of character data, go for StringBuffer.
    • Using += on a String is always going to be less efficient than using a StringBuffer, so it should ring warning bells; but in certain cases the optimisation gained will be negligible compared with the readability issues, so use your common sense.
    Gwyn Evans
  7. Use StringBuffer to reduce the number of objects created. Each string concatenation operation (+) creates a StringBuffer and a new String object.
    Chuck Simpson
  8. Donald Knuth said: We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

    I think this thread is instructive; in most cases this kind of string concatenation isn't going to cripple your application. If it does, you can always whip out your handy dandy profiler and see which portion of your code is causing the problem.

    I've done Java code in Domino where I worried about stuff like StringBuffer optimization, but the profiler showed that repeated calls to Database.getView were causing the performance problems.

    To paraphrase Knuth: Don't sweat the small stuff.

    Nik Shenoy
  9. Good calls, everyone, and many thanks for your involvement in this discussion: all useful stuff to file away… ;-)
    Ben Poole
  10. And yes, I need to edit Gwyn's comment with its unruly tags :-D
    Ben Poole
  11. Ah yes, the 80/20 rule… 80% of the work takes place in 20% of the code. Figure out where the 20% is that is kicking your a55 and optimize that…

    Learned that at one of those fancy conventions they send us to. I'm not sure where they get the numbers from, but it seems to make sense anyway.

    :-)
    jonvon
  12. Nik [8], you make a fair point. William Wulf also said "more computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other reason - including blind stupidity." (I prefer that to Knuth's quote :-)

    But having said that, I still take slight exception to the notion that these things don't matter. Sure, you see some travesties, like overly-complicated algorithms, throwing assembler in the wrong places, and generally really fscking the cat trying to save a few processor cycles.

    In contrast, I've occasionally tried to get answers to very simple 'which is a more efficient way of doing it' questions, and had the 'don't worry about optimisation' response. I suspect that answer is occasionally given because the person answering doesn't know the real answer and doesn't want to admit it. (I'm not saying that's the case with you, so please don't read too much into it.) I've also been on the other side and expressed my frustration with colleagues by saying "who cares, just write the damned code cleanly and stop fluffing on about a few microseconds!"

    The REAL issue, and the reason for asking, is that there is bound to be a Right Way and a Wrong Way. Or at the least, a Better Way and a Slightly Less Better Way. The difference in performance might be negligible, and equally, the difference in effort or readability might be negligible too. And yes, if that's the case, why bother? My response: in such cases, why not learn the Right/Better Way, and use it? Firstly, there is satisfaction to be gained in Doing The Right Thing, and secondly, knowing which is the better way and why often leads to a better understanding of the nature and guts of the platform/language you're using. There's no evil in that, and often a deepened understanding of your platform *will* make a difference to your code somewhere down the line.
    Colin Pretorius
  13. In reality, we make dumb mistakes. Well, I do. I know heaps about performance improvements, testing, and the like (at least with regard to LotusScript). But I still make schoolboy errors.

    Knuth's opinion re performance and the like is also espoused in that old benpoole.com favourite, The Pragmatic Programmer, and makes perfect sense. Whilst logic dictates we should abide by the 80/20 rule, at the same time keeping an eye on the small stuff can't hurt, which is what the original post was all about: a tiny detail in a huge system can make all the difference, I think.

    But what do I know?
    Ben Poole

  14. PS jonvon: you can say "arse" here. So long as you say "arse" and not that silly Americanism ;-)
    Ben Poole
  15. The problem with the 80/20 rule and "make a clean design, optimize later" is this: if you have already optimized the 20% hotspots and that's not enough, you are likely to have to start from scratch. That said, there is nothing wrong with keeping some performance fundamentals generally in mind, like "don't allocate objects unnecessarily".
    Robert Rudolph
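
To make Charles's second case [1] concrete, here's the sort of thing he means; the capacity figure and the loop contents are mine, purely for illustration:

    // Presizing the buffer: if you know roughly how much text is coming,
    // passing a capacity to the constructor saves the StringBuffer from
    // repeatedly resizing its internal char array as it fills up.
    StringBuffer big = new StringBuffer(64 * 1024); // capacity in chars
    for (int i = 0; i < 1000; i++) {
        big.append("another chunk of text ");
    }
    String report = big.toString();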
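
And a quick demo of John's point [4] about mutability; the class and method names are mine:

    public class MutabilityDemo {
        // The method receives a copy of the reference, so it can change
        // the caller's StringBuffer contents...
        static void appendWorld(StringBuffer sb) {
            sb.append(" world");     // visible to the caller
            sb = new StringBuffer(); // ...but reassigning is local only
        }

        public static void main(String[] args) {
            StringBuffer sb = new StringBuffer("hello");
            appendWorld(sb);
            System.out.println(sb);  // prints "hello world"
        }
    }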
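
Finally, jonvon's caveat [5] narrated in code. The array sharing happens inside the JDK 1.3/1.4 StringBuffer, so the comments can only describe it:

    StringBuffer sb = new StringBuffer();
    sb.append("one");
    String s1 = sb.toString(); // s1 shares sb's internal char array

    sb.append(", two");        // sb must copy its array before appending,
                               // because s1 still points at the old one
    String s2 = sb.toString(); // s2 shares the new array, and so on

    // Cheaper: do all the appending first, then call toString() once.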

Comments on this post are now closed.
