Measuring TWiki Performance
The speed of TWiki has been an issue for some years now, yet there is no cookbook on how to actually
measure how fast (or slow) it is, or how to reasonably compare different implementations of a particular function.
And really, it ain't that easy.
All TWiki installations are different.
Some heavily rely on
TWikiVariables, others don't. Some use very fancy incantations of
%SEARCH{}%, others don't. Some have, over time, accumulated almost every plugin; others are still using just the core set. Every now and then we will end up with a situation where a change in the code is reported as an acceleration by some TWiki users, whereas others complain that the same change is slowing down their TWiki installations.
The Tools
Let's first list some contenders: AthensMarks, ab, LWP, and browser tests.
| | AthensMarks | ab | LWP | browser tests |
| automated? | + | + | + | - |
| HTTP based? | - | + | + | + |
| multiple topics? | - | - | + | + |
| client effects? | - | - | - | + |
| programming required? | - | - | + | - |
- Automated tests are cool if you want others to run your test suite easily.
- HTTP based tests are required if you are interested in, e.g., the effect of persistency (ModPerl, PersistentPerl) on performance. The drawback is that for reliable figures you need control over what's running on the server.
- Measuring with a topic mix is important if scaling is important: server caches or swap spaces are not really challenged by a sequence of identical requests.
- Client measurements with a real browser include the performance effects of layout, stylesheets, javascripts, images, and the other things which make up the visual impression. Client measurements are helpful for an overall statement comparing different releases, but don't provide any clue where a problem might come from.
- For Firefox, there's an extension called fasterfox which allows you to measure loading time.
- For LWP based tests, there is no readily available test environment yet.
So, depending on what you're trying to achieve, you'll need more than one of these tools.
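Since no readily available LWP-based test environment exists yet, here is a minimal sketch of what a multi-topic HTTP benchmark could look like. It is written in Python with urllib for brevity rather than Perl/LWP; the topic URLs, the number of runs, and the injectable `get` parameter are all assumptions of this sketch, not part of any existing TWiki tool.

```python
import statistics
import time
from urllib.request import urlopen

# Hypothetical topic mix -- substitute topics from your own installation.
TOPICS = [
    "http://localhost/twiki/bin/view/Main/WebHome",
    "http://localhost/twiki/bin/view/TWiki/WhatIsWikiWiki",
]

def fetch(url):
    """Fetch one topic over HTTP, much like an LWP GET would."""
    with urlopen(url) as response:
        return response.read()

def benchmark(urls, runs=5, get=fetch):
    """Time `runs` passes over the whole topic mix.

    Returns (mean, stdev) of the wall-clock seconds per pass, so that
    server caches are exercised by a mix rather than one repeated request.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for url in urls:
            get(url)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)
```

The `get` parameter lets you swap the fetch function for a stub during testing, or for a persistent-connection client when measuring the effect of ModPerl.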
The Objects
Both AthensMarks and ab suffer from the fact that they measure just one single topic. This means that for useful results, you need to carefully select (or even
create) a topic which demonstrates the difference between two implementations.
For an overall comparison of different TWiki releases (e.g.
CairoRelease and
DakarRelease), measuring just this one topic
WhatIsWikiWiki (and an extremely boring one, performance-wise) is insufficient.
AthensMarks were specifically designed to provide a level, consistent playing field for measuring
core code performance - that's why
AthensMarks describes them as "Core code benchmarks" and advises using different pages for performance comparison of different features.
AthensMarks, on the other hand, has one big asset left: it is well documented how to run them.
We know many things which affect TWiki performance - see
SpeedAndCompatibility for a recent list. Only half of the points could be efficiently measured with
WhatIsWikiWiki. And the environment is important, too: How many users and groups do you have? How many topics in one web? How many webs? How many (and which) plugins?
A precise description of
what you measure is at least as important as
how.
Suggestions
If you try to do a benchmark, record at least the following information:
- The code base (SVN release, or release plus diff file)
- The tool you are using
- The topic(s) you are using - maybe the Testcases web could be used to collect topics which provide valuable benchmark information?
- Your TWiki scale (number of users/groups/topics) - may be irrelevant in many cases
- The platform (OS, webserver) - not always crucial, but in some instances (slow fork implementation in Cygwin, for example) it might matter
I hope that over time we'll develop a list of benchmark-relevant topics, so that for every idea to improve performance there may be
one topic designed to demonstrate the value of the idea, and
a list of topics against which the change can be tested, much in the way of a regression test.
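The regression-test idea above could be sketched roughly as follows: time each topic under the old and the new code base, and flag any topic that got noticeably slower. This is an illustrative sketch only; the function names, the timing callables, and the 10% tolerance are assumptions, not an existing TWiki script.

```python
def compare(topics, time_old, time_new, tolerance=0.10):
    """Flag topics where the new code base is more than `tolerance` slower.

    `time_old` and `time_new` map a topic name to a render time in seconds,
    e.g. as measured by repeated HTTP requests against each code base.
    Returns {topic: (old_seconds, new_seconds)} for every regression found.
    """
    regressions = {}
    for topic in topics:
        old, new = time_old(topic), time_new(topic)
        if new > old * (1.0 + tolerance):
            regressions[topic] = (old, new)
    return regressions
```

Run against a whole list of benchmark topics, this would act much like a performance regression suite: an empty result means no topic slowed down beyond the tolerance.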
--
Contributors: HaraldJoerg,
CrawfordCurrie,
AntonAylward
Discussion
Feedback is welcome, of course. Maybe we need a page about
ProfilingTWiki as well.
--
HaraldJoerg - 24 Apr 2006
Maybe, so that everyone can test on the same data, we need a set of webs (or even entire TWiki sites) with topics in them carefully selected or designed to stretch the various parts of TWiki.
We could put them in
SVN so that anyone can check them out and run automated or manual tests against them.
--
SamHasler - 24 Apr 2006
Much like the
ViewDEVELOP:TestCases web in
SVN: for developers' use only. +1 to that proposal.
--
RafaelAlvarez - 25 Apr 2006
Regarding "measuring just this one topic WhatIsWikiWiki (and an extremely boring one, performance-wise) is - sorry - nonsense": No, it isn't.
AthensMarks were specifically designed to provide a level, consistent playing field for measuring
core code performance - that's why
AthensMarks describes them as "Core code benchmarks" and advises using different pages for performance comparison of different features.
Having defended them, I confess I am ready to abandon them for something better; but I haven't seen anything better yet.
A data set which can be fully rendered multiple times to build up an average performance sounds very sensible. A minor mod to the
benchmark.pl
script would support that. +1 from me as well. Though remember that the formatting has to be to the lowest common denominator (i.e. Cairo features only).
--
CrawfordCurrie - 25 Apr 2006
Users don't see AthensMarks or
ab. I'm not putting these down as useful metrics, but do bear in mind that what counts is not the analytics but what gets delivered to the end user. That's what counts.
Firefox has an extension called 'fasterfox' which, as a side effect, measures the time from when you click on a link until the rendering stops. You may think that unfair, but this method accommodates things in the templates like 'layout before content', which affect how the user perceives the delivery of the page. You may think 'perception' is not analytic, but that's beside the point: it's how the user is evaluating the utility of the tool. Users aren't naive - they know that more complex things take longer. The sad thing is that some operations that use
SEARCH
are internally complex but don't appear complex to the user. The use of
SEARCH
for
Category
to build the
WebLeftBar is one example - replacing it by a static link improved both performance and perceived performance, since the user could not tell the former method was complex.
The user views performance via a browser, not via
wget,
ab
or the like. Yes, browsers upset the balance by caching things like javascript, css and graphics
after the first access. Users may visit other pages that might cause TWiki-related material to be flushed from the cache.
Yes, it's complex, but the bottom line is always the user's perception of performance, not the analytic metrics. Obsessing about the analytic metrics may produce a system that is faster for
ab
but not for the user.
--
AntonAylward - 25 Apr 2006
There are three main truths about TWiki performance:
- If the Core is slower, the perceived speed is also slower (the opposite is not true).
- Given the same Core and Skin, the speed is only affected by Plugins.
- Given the same Core and Plugins, the only thing that affects the perceived speed is the Skin.
So, each kind of developer needs a different kind of benchmark.
- Core developers need a fast benchmark of the raw core speed (that's AthensMarks so far) to ensure that a change doesn't slow down the Core.
- Plugin developers need a fast benchmark that shows whether the Plugin has degraded performance (AthensMarks can be used for that) and a benchmark that shows whether the Plugin slows down the page rendering by using additional css and js files (we don't have an automated test for that, yet).
- Skin developers need a benchmark that shows whether changes to the Skin degrade or improve the rendering time.
Let's not say one benchmark is better than another. They are measuring different things, and serve different purposes. As long as the results are consistent they should be OK.
One of the main problems with using
AthensMarks as the only benchmark is that there is a big possibility that the Core is faster but, given a specific skin, the rendering is slower. This is because there are many, many things that affect the rendering in the browser. I have seen improvements of 100% in speed just by recoding an html page and its css, or a 1000% gain just by changing a badly-coded javascript-based menu to a well-behaved css-based one.
So, please stop beating up on the core developers because they're not taking into account perceived speed. I speak for myself, but I really don't care about perceived speed at this moment because we have a severe case of "speed degradation" at the core. Once TWiki4 with the print skin on a totally blank topic with default plugins is as fast as Cairo under the same conditions, we can start talking about perceived performance. Until then, there is no point, as the difference is too big to be solved only with "rendering hacks".
--
RafaelAlvarez - 25 Apr 2006
Your first main truth is not true (at least the bit in the brackets). A slower core may deliver increased perceived performance because - for example - it delivers a cached page. Managing the cache slows it down for the first access only.
All this talk about benchmarks is fine; but when is someone going to deliver a test data set and scripts? Is analysis paralysis setting in?
BTW note that "simple" benchmarks are no longer enough; the configuration has to be recorded. There are many optional switches that can affect performance. I would suggest a fragment of
LocalSite.cfg that can be 'do'ed into
LocalSite.cfg for running the benchmarks. Or alternatively, and probably better, a unit testcase that initialises all that stuff.
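A minimal sketch of what such a do()'able fragment might look like, assuming the $TWiki::cfg hash of TWiki 4; the specific keys shown are illustrative examples only, not a canonical benchmark configuration:

```perl
# benchmark.cfg -- hypothetical fragment, do()'ed from LocalSite.cfg
# when running benchmarks. Every key below is just an example: pick the
# switches that matter for the measurement you are recording.
$TWiki::cfg{Plugins}{SpreadSheetPlugin}{Enabled} = 0;  # default plugins only
$TWiki::cfg{Log}{view} = 0;  # no logging overhead while timing
1;  # a do()'ed file must return a true value
```

Keeping the fragment under version control alongside the benchmark results would make the recorded configuration reproducible.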
--
CrawfordCurrie - 25 Apr 2006
No paralysis, just laziness (which is a virtue for a Perl programmer). But
the story continues. Stay tuned.
--
HaraldJoerg - 02 May 2006