Dakar Performance Issues
Performance Numbers
| Skin | Plugins | i18n | Additional | Page load starts | Page fully loaded | Athensmarks |
| Athens Benchmark |||||||
| Classic | None | None | Some rendering errors | 0.31 | 0.54 | 100 |
| Cairo Benchmark |||||||
| Pattern | 1 * | None | - | 0.67 | 0.90 | 46.2686567164179 |
| Pattern | 18 + 1 | None | - | 0.99 | 1.20 | 31.3131313131313 |
| TWiki 4.X |||||||
| Pattern | 18 | All | - | 2.61 | 3 | 11.8773946360153 |
| Pattern | None | All | - | 2.38 | 2.7 | 13.0252100840336 |
| Classic | None | All | - | 1.08 | 1.33 | 28.7037037037037 |
| Pattern | None | No | - | 1.06 | 1.40 | 29.2452830188679 |
| Classic | None | No | - | 0.85 | 1.10 | 36.4705882352941 |
| Pattern | None | All | Removed Language Selector | 1.31 | 1.65 | 23.6641221374046 |
| Pattern | None | All | Disabled %LANGUAGES% in LanguageSelector | 1.31 | 1.55 | 23.6641221374046 |
| Pattern | None | All | I18N::available_languages Commented out foreach my $file ( @all ) block | 1.03 | 1.31 | 30.0970873786408 |
| Pattern | None | All | I18N::available_languages Commented out all except return | 1.03 | 1.28 | 30.0970873786408 |
| Pattern | 18 | All | After SVN 9526 | 1.64 | 1.91 | 18.9024390243902 |
| Pattern | None | All | After SVN 9526 | 1.30 | 1.66 | 23.8461538461538 |
| Classic | None | All | After SVN 9526 | 1.08 | 1.36 | 28.7037037037037 |
| Pattern | 18 | All | After SVN 9883 | 1.90 | 2.16 | 16.3157894736842 |
| Pattern | 18 | All | SVN 9877 | 1.72 | 2.02 | 18.0232558139535 |
| Pattern | 18 | All | SVN 9878 | 1.91 | 2.21 | 16.2303664921466 |
| Pattern | 18 | All | SVN 9526 | 1.72 | 2.05 | 18.0232558139535 |
| Pattern | 18 | All | SVN 9883 565 users | 1.97 | 2.28 | 15.7360406091371 |
| Pattern | 18 | All | SVN 9877 565 users | 1.75 | 2.07 | 17.7142857142857 |
| Pattern | 18 | All | SVN 9885 565 users and ALLOWTOPICVIEW on topic | 1.97 | 2.32 | 15.7360406091371 |
| Pattern | 18 | All | SVN 9877 565 users and ALLOWTOPICVIEW on topic | 1.75 | 2.04 | 17.7142857142857 |
| Pattern | 18 | All | SVN 10114 565 users and ALLOWTOPICVIEW on topic | 1.77 | 2.09 | 17.5141242937853 |
| Pattern | 18 | None | SVN 10114 565 users and ALLOWTOPICVIEW on topic | 1.48 | 1.82 | 20.9459459459459 |
| Pattern | 18 | All | SVN 10257 565 users | 1.73 | 2.08 | 17.9190751445087 |
| Pattern | 18 | All | SVN 10270 565 users | 1.73 | 2.08 | 17.9190751445087 |
| Pattern | 18 | All | SVN 10270 modperl 565 users - note 1 | 1.53 | 2.51 | 20.2614379084967 |
| Pattern | 18 | All | SVN 10270 modperl 565 users - note 2 | 0.75 | 1.07 | 41.3333333333333 |
| Pattern | 18 | All | SVN 10368 565 users + Item2327 patch | 1.74 | 2.08 | 17.816091954023 |
| Pattern | 18 | All | SVN 10684 565 users | 1.72 | 2.01 | 18.0232558139535 |
| Pattern | 18 | None | Cairo - recalibrating July 2006 - no change | 0.97 | 1.21 | 31.9587628865979 |
| Pattern | 18 | None | SVN 11018 565 users | 1.38 | 1.71 | 22.463768115942 |
| Pattern | 18 | All | SVN 11018 565 users | 1.65 | 1.94 | 18.7878787878788 |
| Pattern | 18 | None | TWikiFn SVN 11074 565 users | 1.38 | 1.72 | 22.463768115942 |
| Pattern | 18 | None | SVN 11140 565 users | 1.35 | 1.67 | 22.962962962963 |
| Pattern | 18 | None | SVN 11898 565 users | 1.37 | 1.67 | 22.6277372262774 |
| Pattern | 18 | All-2 | SVN 11898 565 users | 1.74 | 2.04 | 17.816091954023 |
| Pattern | 18 | None | Patch04x01 SVN 14020 565 users | 1.40 | 1.91 | 22.1428571428571 |
| Pattern | 18 | All-2 | Patch04x01 SVN 14020 565 users | 1.77 | 2.24 | 17.5141242937853 |
| Pattern | 18 | None | MAIN SVN 14020 565 users | 1.42 | 1.90 | 21.830985915493 |
| Pattern | 18 | All-2 | MAIN SVN 14020 565 users | 1.74 | 2.23 | 17.816091954023 |
Discussion
(for previous discussion, see DakarPerformanceIssuesArchive2005)
From Bug item 1572 I copied this description of how to measure timing using a browser (IE) and Ethereal because it is of general interest beyond the scope of the bug report.
- Measurements are done from a computer other than the one running the TWiki server so that the browser does not take CPU time from the server.
- Measurements are done using a text topic that can be seen on the screen without scrolling.
- Measurements are done with the same topic in Cairo and Dakar.
- The page has been loaded several times so that images and style sheets are cached since this is how 99% of all pages are viewed.
- Browser is set up to "Check for newer versions of the pages" -> "Every time you visit the page" as this is the way users have to set up their browser to use CMS sites and Wikis. Otherwise they keep on missing the update they just did.
- When the browser is checking versions it does not download the entire images and style sheets. It starts by doing an HTTP GET of the page, followed by a sequence of GETs for each element with "If-Modified-Since" in the HTTP request. The Apache server returns a "304 Not Modified" instead of retransmitting the entire element.
- The measurements are done by capturing the Ethernet traffic on the client machine using Ethereal.
- The definition of page loaded is when the browser acknowledges the last received data, as this is the same moment that it displays the resulting page.
- Internet Explorer is always used because I have noticed that the browser does not send the final ack until it actually shows the page.
- The reason for this method is that the user could not care less which part of the delay comes from TWiki code, pattern skin downloading, or browser rendering. He sees the sum. And the many speed improvements my investigations have triggered have happened in code compiling as well as in skin-related delays such as unnecessary searches and JavaScript execution. An extra advantage is that the Ethereal traces give you all the timing. You get the time from the first GET to the first TCP segment of the HTML returned, which in practice is the Perl compilation/execution time. You get the time it takes to check that all page elements are unchanged, and finally you get the time from when the last element is checked until IE decides to acknowledge the page received, which in practice is the time it takes the browser to execute the JavaScript and calculate image positions etc.
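The three intervals described in the last point can be read off a capture mechanically. The following Python sketch is only an illustration under stated assumptions: the `Packet` record, the `split_page_load` helper and the sample timestamps are hypothetical, not Ethereal output, and it assumes the capture has already been reduced to client GETs, server payload segments and client ACKs for a single page view.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    t: float   # seconds since the start of the capture
    kind: str  # "GET" (client request), "DATA" (server payload) or "ACK" (client ack)


def split_page_load(packets):
    """Split one page view into server, validation and browser time."""
    gets = [p.t for p in packets if p.kind == "GET"]
    data = [p.t for p in packets if p.kind == "DATA"]
    acks = [p.t for p in packets if p.kind == "ACK"]

    first_get = min(gets)
    first_html = min(t for t in data if t > first_get)  # first segment of the HTML
    last_reply = max(data)                              # last element checked (e.g. a 304)
    final_ack = max(acks)                               # browser acknowledges the page

    return {
        "server": first_html - first_get,        # Perl compilation/execution
        "validation": last_reply - first_html,   # conditional GETs for page elements
        "browser": final_ack - last_reply,       # JavaScript and layout in the browser
        "total": final_ack - first_get,
    }


# Hypothetical trace: the timestamps are invented for illustration only.
trace = [
    Packet(0.00, "GET"),   # GET of the topic
    Packet(1.72, "DATA"),  # first TCP segment of the HTML
    Packet(1.73, "ACK"),
    Packet(1.80, "GET"),   # conditional GET for a style sheet
    Packet(1.82, "DATA"),  # 304 Not Modified
    Packet(1.85, "GET"),   # conditional GET for an image
    Packet(1.87, "DATA"),  # 304 Not Modified
    Packet(2.02, "ACK"),   # final ACK: page is displayed
]

phases = split_page_load(trace)
for name, seconds in phases.items():
    print(f"{name:10s} {seconds:.2f} s")
```

In this made-up trace the server phase accounts for 1.72 s of the 2.02 s total, which is the kind of split that "Page load starts" versus "Page fully loaded" is meant to expose.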
--
KennethLavrsen - 03 Feb 2006
Interesting methodology, but I disagree with the cache settings 'check for newer version of pages every time you visit the page'. If used in normal TWiki usage, this might well re-create the BackFromPreviewLosesText bug as discussed in BrowserIssues. It's fine for benchmarking though.
--
RichardDonkin - 03 Feb 2006
IMHO this is a difficult methodology to keep consistent; there are too many variable factors. Just to name a few:
- Graphic card rendering times (yes, they can be surprisingly slow in some setups, disturbing results)
- Bogomips available at the client and server
- Network factors
  - Delay (can be varied, delays are "timed up" on number of connections done)
  - Bandwidth (some browsers do not respect if-modified-since on reload, skewing timing results on slower connections)
  - HTTP connections reused or not
- Browser issues (we all know browser rendering speeds vary tremendously)
- Don't forget to add local desktop setup variations, (known/not known) firewall/proxy implications, (known/not known) local software / OS / cache setting implications, etc.
Important to me is that the small but important deviations in timings of the single components (i.e. core performance) are easily lost in the "big view".
But Kenneth, why don't you create a topic describing this methodology (KjlMarks / TWikiRelease4UserExperienceMarks), and try to decide on something that is "index 100" in it (e.g. default 4.0.1 installation, IE version x.x.x, other needed (server/client) versions, network setup etc.), allowing others to redo your findings? Then you can easily link to it whenever you need to post results (and we will be able to see and understand exactly what you are trying to do).
AthensMarks is an example of a measurement methodology that has an index 100 strategy and is therefore very useful, in that it is easily used to re-prove findings in different environments.
--
SteffenPoulsen - 03 Feb 2006
Nothing you say is wrong Steffen. With measurement and analysis it is rare that there is one method which is the right one. Ethereal is one of many methods that you need to analyse performance.
Let me address a few.
- Graphics cards vary but in 2D you hardly notice the difference. In 3D things change a lot.
- Bogomips - when I do an AB test I normally use a server that does nothing other than being a web server. When I measure on a production machine results vary up to 300% so you have to repeat the measurement 10-20 times and pick the best.
- Network factors - yes - and this is important. When I measure at home the Network is idle and I am the only one really using it. In the real Internet you need to think network performance and design your applications so that you avoid too many connections to be established and closed.
- Browser differences. YES. Exactly. When I found the Javascript performance issue it was using Ethereal and testing with different browsers and it turned out that by changing Javascript we reduced the time it took to view large topics from 30 seconds to 5 seconds. For those of you hacking Perl code this may seem irrelevant but for the end user 30 seconds is 30 seconds. It is a performance issue no matter where the delay occurs.
The purpose of my numbers is to give some tools to identify where design can be improved to make the total experience faster. Comparing TWiki 4.0 with Pattern Skin with Athens with the ugly old classic skin does not always make sense. It is useful to do Athensmarks on the perl code but when measuring the total performance the comparison makes no sense.
An Ethereal trace now and then to check performance is something I encourage all of you to do as a supplement and mainly compare with your own previous measurements and especially look at the details because it gives a much more detailed view than an Athensmark score. If you want to analyse the Perl code only - it is probably some more advanced profiling tools that should be used or maybe even implemented in special versions of the code to find where the bottlenecks are or to experiment with better ways to only compile the code that is really needed for each topic view or experiment with copying functions from CPAN libs instead of including the whole lib etc etc. I have seen many ideas from different people.
Different tools for different purposes.
--
KennethLavrsen - 03 Feb 2006
Time for new benchmark before 4.0.2 release.
Things are worse than ever. Again measurements are done using the most true way - using a browser - tracing network with Ethereal. Server is unloaded. Network connection is 100 Mbits. Tests repeated many times. BEST results uploaded.
URLs:
Conclusion: Code needs to be optimized again. Without localization Dakar is much slower than Cairo. With localization feature Dakar 4.0.2 is in my view useless. I would recommend anyone not to activate it.
--
KennethLavrsen - 16 Mar 2006
One of the later changes that ArthurClemens made was to split the Pattern templates into more fragments (more files included). I wanted to see what impact this has on performance. So I "handcompiled" view.pattern.tmpl into one big file.
Conclusion: I may see a few milliseconds difference. Nothing that matters at all. So this is not the root cause of the slowdown.
--
KennethLavrsen - 16 Mar 2006
Can you instrument the view script to leave a log of the total time spent in the server? It would be useful to actually know how much time is spent processing the topic and how much is spent in communication and rendering.
--
RafaelAlvarez - 16 Mar 2006
It's good to know that the pattern skin refactoring is not a problem, because that makes it much easier to customise things. It's not good that things are so slow.
Can you identify any plugins that might be a problem? They're getting in whether or not they're being used, after all.
--
MeredithLesly - 16 Mar 2006
Kenneth, thanks for bringing this to our attention. It would be helpful to compare 4.0.1 and 4.0.2 to see the performance difference.
--
PeterThoeny - 16 Mar 2006
Hmm, quick finding: r9192 (Bugs:Item1828) adds ~30% overhead to 4.0.2 in my setup.
--
SteffenPoulsen - 16 Mar 2006
I was shocked to see yet another report about Dakar being faster than Cairo when you can in fact "feel" that it is slower just by browsing.
In connection with Bugs:Item1961 I did a lot of benchmarking, measuring on the client side using a semi-long topic with no searches and no fancy features. Just normal TML and a TOC.
Here are all the numbers.
Table updated and moved down -- KennethLavrsen - 22 Apr 2006
--
KennethLavrsen - 26 Mar 2006
Rafael later clarified that the performance numbers for Cairo were WITH many plugins and the numbers for Dakar were without. That is why the comparison looked like Dakar was much faster than Cairo. Thanks for clarifying this. It gives more hope for the validity of the CLI benchmark numbers.
--
KennethLavrsen - 26 Mar 2006
I tested my theory that ACL was slowing things down. I did it in the crudest possible way: bypassing the checks altogether. Alas, it seemed to make no difference. (This was a pretty uninteresting page, so the numbers weren't swamped by searches or other expensive operations.)
It's going to be a lot harder to figure out how to speed this up. One person said that it's all the regexes. I don't know enough about perl to know, but something is seriously wrong.
--
MeredithLesly - 30 Mar 2006
ML, I'm curious, in what way did you bypass the ACLs - by bypassing the loading of the preference hierarchy altogether or just the code working out the effective ACLs?
Did you by any chance save the patch you created somewhere? I'd like to take a look.
--
SteffenPoulsen - 30 Mar 2006
Updated my numbers and again moved them to the end to keep this topic from becoming too repetitive.
--
KennethLavrsen - 22 Apr 2006
AthensMarks measure only the server side time. What it's measuring is "raw core" performance. It could be the case that between versions the core is faster but on the client side the rendering is slower. But what couldn't happen is that the core is slower but the rendering is faster (not if measuring using the same skin).
I have to say that in every benchmark I performed with exactly the same conditions (same number of plugins installed and enabled, no sessions enabled in Dakar, using classic skin), Dakar has consistently been slower since July 2005. Before that I can testify that Dakar was at least 20% faster, and that is consistent with what my users reported (and that's why I'm still running that old revision pre-July 2005).
As an example, my last benchmark (the one you pointed out before to show a flaw in the tool and later retracted) shows that Dakar without any plugin installed is just barely faster than Cairo with all the default plugins installed and enabled. Given that each plugin will add a considerable overhead, I would say that it shows that Dakar is slower than Cairo. Yours also show that the new PatternSkin is twice as slow as the ClassicSkin.
I'll try to get some time this weekend to run the benchmark against TWiki4, and post the results here. I bet that they will be consistent (but not the same) with yours.
--
RafaelAlvarez - 22 Apr 2006
I'm currently running a slightly modified develop. Here are some results on the same page
So, at first glance, it's between 1 and 2% cost per plugin.
--
MeredithLesly - 22 Apr 2006
Kenneth's Speed Measurements on TWiki
- Server
  - Pentium 4 1.8 GHz
  - RAM 512 MB
  - IDE harddisk 7200 rpm 160 GB
  - Network is 100 Mbit standard ethernet
  - Distribution: Fedora Core 4
  - Cron disabled during measurements
- Client
  - Pentium 4 2.4 GHz
  - RAM 512 MB
  - IDE harddisk 7200 rpm 160 GB
  - Network is 100 Mbit standard ethernet
  - Client browser is Internet Explorer
  - OS is Windows XP
  - Ethereal 0.10.13 with filter for the server IP address and tcp only
Similar results have been measured on my production machine which is a 2.8 GHz machine with 1 GB RAM.
And at work on a DUAL XEON 2.8 GHz with 2 GB RAM and two SCSI disks running at 10000 rpm I measure the same relative difference between Cairo and Dakar (measured late at night at no load). Load time per page during daytime is currently around 1.5 seconds for pages with no searches. At times with load the time increases to 2-5 seconds. But the average is around 1.5 seconds at daytime.
Results
I have renamed the last column to Athens Index, just to ensure that no one thinks I am using the benchmark tool to generate those numbers. That column is a mathematical calculation of the "page load starts" numbers relative to Athens. It is there so the maintainer of the Perl benchmark tool can compare the calculated athensmarks with my index. The numbers should not be the same but they should be similar.
Table moved down the topic -- Kenneth Lavrsen
- In Cairo SessionPlugin is enabled to compare with TWiki4 which also runs with sessions (to compare apples with apples)
- Athensmarks are calculated based on 'Page load starts' to measure mainly the code execution. But the true benchmark would be to look at the page fully loaded numbers.
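The index column can be reproduced from the "Page load starts" column alone. A minimal Python sketch, assuming the Athens row (Classic skin, 0.31 s) is the index-100 baseline as the table suggests; the function name `athens_index` is made up for this example:

```python
ATHENS_PAGE_LOAD_STARTS = 0.31  # seconds, Athens/Classic row of the results table


def athens_index(page_load_starts, baseline=ATHENS_PAGE_LOAD_STARTS):
    """Index relative to Athens: 100 means as fast as the Athens baseline."""
    return baseline / page_load_starts * 100.0


# A few "Page load starts" values taken from the results table.
for label, seconds in [
    ("Cairo, Pattern, 1 plugin", 0.67),
    ("Cairo, Pattern, 18 + 1 plugins", 0.99),
    ("TWiki 4.X, Pattern, 18 plugins, i18n All", 2.61),
]:
    print(f"{label:45s} {athens_index(seconds):6.2f}")
```

Because the index is inversely proportional to the time, halving the index means the time until the first byte of the page has doubled.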
Conclusions.
- SVN 9526 really works. Great job.
- The benchmark tool and its athensmarks are dead wrong. There is no connection between what it says and what you measure on the client side.
- SVN 9883 is very bad. We are again going in the wrong direction.
- SVN 9878 is the one that added 200 ms of the delay
- Increased number of users from 5 to 565.
  - Before SVN 9878. 1.75 -> 1.75 s = No real change
  - After SVN 9878. 1.91 -> 1.97 s = Small increase
  - Conclusion. Users cost more with SVN 9878
- ALLOWTOPICVIEW does not add time
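To put the conclusions above into relative terms, the following Python sketch turns two timings into an absolute and a percentage difference. The helper `slowdown` is hypothetical; the input values are "Page load starts" numbers quoted in this topic.

```python
def slowdown(before_s, after_s):
    """Return (extra milliseconds, percent increase) between two timings."""
    extra_ms = (after_s - before_s) * 1000.0
    percent = (after_s / before_s - 1.0) * 100.0
    return extra_ms, percent


# SVN 9877 -> SVN 9878, same setup, one checkin as the only variable.
checkin_ms, checkin_pct = slowdown(1.72, 1.91)

# After SVN 9878: 5 users -> 565 users.
users_ms, users_pct = slowdown(1.91, 1.97)

print(f"checkin: +{checkin_ms:.0f} ms ({checkin_pct:.1f} %)")
print(f"users:   +{users_ms:.0f} ms ({users_pct:.1f} %)")
```

The checkin accounts for roughly 190 ms, or about 11%, while the larger user base adds about 60 ms on top of that.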
We really need a better tool than what I use because it is too slow to perform in practical use. But it is probably the best tool we have to measure real performance as seen from the customer. And I will be happy to perform the tests on a regular basis. But we need to find out how a Perl-based benchmark tool can measure something completely different from what is measured from the client. And when we find the reason we have found a major bottleneck.
I have been heavily attacked on IRC about my latest numbers and accused that they were not to be believed and that hardware was probably to blame. I fail to see the scientific argument how a relative measurement on the same machine, the same setup, the same method, with only one SVN checkin as a variable, can trigger a major hardware difference. The absolute numbers are for sure very dependent on hardware. Look at the relative difference. And focus on the time from GET to first byte returned because this is what varies with the code changes. The time between first byte and page fully loaded changes when people change the skin and add plugins that add more CSS and JS.
I hope more people will verify my numbers so I do not stand alone against the unscientific arguments.
--
KennethLavrsen - 22 Apr 2006
Sadly I don't have the complete output here, but I ran a benchmark (AthensMarks) this weekend against Cairo, TWiki 4.0.2 (release package) and an old revision (rev 4664, IIRC). The numbers were:
- Cairo (Classic): Around 50
- Cairo (Pattern): Around 48
- Dakar (Classic): Around 42
- Dakar (Pattern): Around 38
- 4662 (Classic): Around 15
- 4662 (Pattern): Around 13
All those measurements were performed using only InterWikiPlugin and DefaultPlugin.
As I predicted, the numbers are quite consistent with yours. There is something very significant in the fact that PatternSkin doesn't add that much overhead to the core, but according to your benchmark it really adds a lot in the "transference" part.
I don't believe that your numbers are "wrong". But I don't believe that the AthensMarks benchmark is wrong either.
--
RafaelAlvarez - 24 Apr 2006
Yes, there are different aspects to performance. An important aspect is measuring the server end only, since total time to full client view has to be larger than that (ignoring any client-side caching, of course).
--
MeredithLesly - 24 Apr 2006
CategoryPerformance
--
CrawfordCurrie - 25 Apr 2006
OK. Now I have the numbers. These tests were run using AthensMarks. The benchmark was run under Cygwin, on a PII machine with 256 MB of RAM. Session support and internationalization were turned off. The subwebs setting has the default value. All languages were disabled (just in case, I don't know that code).
TWiki core code benchmarks (WhatIsWikiWiki)
Having a sudden rush of inspiration, I commented out the template handling (ie, make readTemplateFile always return '%TEXT'). Afterwards, I optimized the common case where the skin is totally defined in the .tmpl files. Here are the numbers (only TWiki4 numbers shown):
In my excitement, I forgot to copy the patch... I'll upload it later today.
--
RafaelAlvarez - 25 Apr 2006
With or without sessions? With session cleaning in core enabled, or backed off to cron job? With hierarchical subwebs enabled?
--
CrawfordCurrie - 25 Apr 2006
I added those bits of information. Also, attached a zip with the LocalSite.cfg for DEVELOP and TWiki4.
--
RafaelAlvarez - 25 Apr 2006
I would like to see your Athensmark numbers for configurations that are more like a real live installation. Very few will run a TWiki with only plugins and Session support is so essential and smart that it should be enabled also.
In Cairo I install the Session Plugin. And in Dakar I simply enable sessions.
Also try and measure with and without i18n.
When I have said I did not believe in the Athensmarks, it is because a few developers have reported that Dakar was faster than Cairo, which is not anywhere near what you experience when you use it. But maybe they simply did not compare apples to apples. You did and that is good. I am just very curious to see if you get similar Athensmarks to mine when you have Sessions enabled (use a negative expiry value in Dakar) and enable all the default plugins plus a few extra so you have 18 like I used. The choice of 18 is based on the assumption that a typical TWiki has all the default plugins plus a few extras.
--
KennethLavrsen - 06 May 2006
when I get the chance, I'll do a benchmark with those conditions.
--
RafaelAlvarez - 06 May 2006
I did some performance tests with a large number of users. I created 60K users and one large group containing all 60K users. I also created a test web, which is view access restricted to that group. TWiki version is SVN 10760 (TWiki 4 branch), which is almost 4.0.3. This is on an entry level server (Celeron 2.0 GHz, 1 GB RAM, RedHat Enterprise), with a very light load.
Here are the benchmark numbers for accessing WebHome in the view protected web as a non-admin user:
2 sec: Before creating 60K users, accessing protected web with two users in group, total 3 registered users, total 3 TWiki groups
3 sec: After creating 60K users, accessing protected web with two users in group, total 60K registered users, total 4 TWiki groups, one with 60K members
18 sec: Accessing protected web with 60K users in group, total 60K registered users, total 4 TWiki groups, one with 60K members
As you can see, there is a big performance impact if there are 60K users in a group.
--
PeterThoeny - 30 Jun 2006
Time to see where 4.0.1, 4.0.2, 4.0.3, 4.0.4 and everything up to date has brought the benchmark numbers.
Again I post the entire table here. And remove it from above. Test method is still Ethereal to separate code time from skin/browser time.
numbers moved to topic start
So in practice we have gotten nowhere. Looking at what people have added to WhatIsIn04x01 it is clear that performance has no priority with developers. That is very very sad because this is the biggest challenge TWiki has if it wants to survive. I have added it now. And I will insist that it stays. The "feature" that needs to be implemented is profiling. It is clear that those who created Dakar have no clue where the performance has disappeared since Cairo. We need to know where the time is spent to fix things. We need to be able to build in measuring points in the code so we can isolate the time consumers.
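As an illustration of what "measuring points" could look like, here is a minimal sketch. It is written in Python purely to show the idea; TWiki itself is Perl, and none of the names below (`measuring_point`, `handle_view_request`, the phase labels) exist in TWiki. The `time.sleep` calls stand in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # label -> accumulated seconds


@contextmanager
def measuring_point(label):
    """Accumulate the wall-clock time spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


def report():
    """Return (label, seconds, percent of measured total), slowest first."""
    total = sum(timings.values())
    rows = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    return [(label, secs, 100.0 * secs / total) for label, secs in rows]


def handle_view_request():
    # Hypothetical phases of a topic view; the sleeps are placeholders.
    with measuring_point("load_preferences"):
        time.sleep(0.01)
    with measuring_point("render_topic"):
        time.sleep(0.03)
    with measuring_point("apply_skin"):
        time.sleep(0.02)


handle_view_request()
for label, secs, share in report():
    print(f"{label:18s} {secs * 1000:7.1f} ms {share:5.1f} %")
```

The points are kept non-overlapping here so that the percentages add up to 100; nested points would need to subtract the time of their children.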
Also - As you can see. I18N as feature is still far too expensive in performance to be something I would ever consider enabling. There is a proposal to do more i18n including plugins. You must have patient users. Not with the current I18N implementation.
--
KennethLavrsen - 15 Jul 2006
But this test would give a clue where to spend our efforts. %LANGUAGES% seems to be responsible for 6 points for example.
--
ArthurClemens - 15 Jul 2006
My users are getting 19 seconds-45 seconds or more to log in. I'm hoping this won't kill the project but it might. Why would logging in take so long?! I'm even using fcgi, for whatever that's worth.
And, yeah, I'm with Kenneth. Have been for a while. (Surprise!) Features are useless unless they're at best performance-neutral and ideally performance improvements. I don't know how to figure out timing on a finer grain though.
--
MeredithLesly - 15 Jul 2006
Time to get involved in BenchmarkFramework, isn't it?
It is on my agenda, but as I wrote occasionally, I am making progress very slowly.... I'm as far as http://munich.pm.org/cgi-bin/view/Benchmarks and http://munich.pm.org/cgi-bin/view/Benchmarks/Tools_MiniAHAH_6 (Sorry, you can't run benchmarks on that TWiki - it is not yet public) but have postponed it....
--
HaraldJoerg - 15 Jul 2006
Re I18N - I'm not sure if you mean the basic WikiWord charset support, which is virtually performance neutral and has been around since Feb 2003, but you probably mean the UserInterfaceInternationalisation features added in Dakar. The latter seems to be responsible for about 0.20 to 0.30 seconds of 2.09 seconds or so (comparing a few pairs of tests with and without UserInterfaceInternationalisation).
Given that you can turn off I18N if you have an English-only site, paying an 'I18N tax' of about 10-15% of performance doesn't seem too bad, compared to the alternative of much more expensive or less well supported translations of TWiki. I18N gives access to a large number of new users, particularly in Asia-Pacific countries. Of course, it would be great to improve I18N performance through profiling, which I'm sure can be done.
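The 'I18N tax' can be expressed in two ways, which give somewhat different percentages. The Python sketch below uses the "Page fully loaded" values of the two SVN 10114 rows in the table (i18n All versus None, 565 users, ALLOWTOPICVIEW on topic); the function names are made up for this example.

```python
def share_of_total(with_s, without_s):
    """Overhead as a percentage of the time measured with the feature on."""
    return 100.0 * (with_s - without_s) / with_s


def slowdown_vs_baseline(with_s, without_s):
    """Overhead as a percentage of the time measured with the feature off."""
    return 100.0 * (with_s - without_s) / without_s


# "Page fully loaded" for SVN 10114: i18n All = 2.09 s, i18n None = 1.82 s.
with_i18n, without_i18n = 2.09, 1.82

tax = share_of_total(with_i18n, without_i18n)
slower = slowdown_vs_baseline(with_i18n, without_i18n)

print(f"I18N share of total load time: {tax:.1f} %")
print(f"Slowdown relative to no I18N:  {slower:.1f} %")
```

For this pair the overhead is 0.27 s, which is about 13% of the total load time, or a slowdown of about 15% relative to the same setup without I18N.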
--
RichardDonkin - 19 Jul 2006
This may already be the case but I am not sure....
I would suggest that
- in the out-of-the-box installation all features that eat substantial performance are turned off, or at least
- there is mention in the installation documentation of which features eat substantial performance and how to turn them off.
--
ThomasWeigert - 01 Aug 2006
Like HierarchicalWebs, which are now considerably improved thanks to Peter, but still slow things down.
--
MeredithLesly - 03 Aug 2006
I18N is already turned off in Dakar (and all other releases) as shipped - you have to explicitly turn it on. This applies to both the 'WikiWord I18N' added in 2003 and 'Message I18N' added in Dakar.
--
RichardDonkin - 06 Aug 2006
Even without I18N, TWiki4 (Dakar) is around 30% slower than Cairo.
What went wrong? When you turn off I18N and subwebs there are hardly any new features in TWiki4. And yet it runs 30% slower than Cairo.
Why? This is a question that must be answered. And fixed.
--
KennethLavrsen - 06 Aug 2006
I will check back into the code, but as far as I remember, hierarchical webs doesn't add much of anything besides a few extra terms to some regular expressions, and if you're actually in a hierarchical web, it loads the preferences for all the webs above it in the hierarchy (top-to-bottom) overriding as it goes. One of the biggest problems with TWiki is that it doesn't (or can't) use one-time compiles for some of the bigger regexes, which forces the interpreter to re-compile an expression every time it's hit. I think this has always been a problem, so it wouldn't explain the sudden slow-down.
How about using -d:DProf on a non-mod_perl installation? It'd tell you where the majority of the time is being spent. I have a TWiki4 installation I could try it on.
--
PeterNixon - 07 Aug 2006
Kenneth, I really, really wish you would stop making sweeping statements like "yet it runs 30% slower than Cairo". Where is your evidence? It doesn't run 30% slower on my system, when I normalise the configurations so that I am comparing apples with apples (AthensMarks). Show me benchmarks that I can reproduce!
To compare Dakar with Cairo, you have to switch all the new features off, otherwise you are just screaming in space.
--
CrawfordCurrie - 16 Aug 2006
I tried turning everything off (in the hope that the monster can't breathe vacuum) and what do the AthensMarks show? A 30% slowdown w.r.t Cairo, exactly as reported by Kenneth. Sorry Kenneth. Now, I wonder what has changed since I last ran the AthensMarks?
--
CrawfordCurrie - 16 Aug 2006
Actually, from the user's point of view, the performance of a standard "out of the box" TWiki installation should be measured, because this is what an admin and the users experience.
--
PeterThoeny - 16 Aug 2006
I have taken a standard new TWiki. Added my Motion web from a certain date and my users, approx 550. And added a few extra plugins adding up to a total of 18. I chose 18 because that is approx what I run with at Motorola and at home.
And I use different test topics to test different features. But for my published benchmark I have chosen a semi-long topic where the only feature used besides headers and text is the TOC. I have also tried without TOC.
And when I measured Cairo I used an installation with more or less the same plugins, and I installed SessionPlugin so that Cairo would be slowed down by handling sessions the same way that TWiki4 must be doing. Again to compare apples with apples as much as possible.
I have tried to add more and less plugins. And this has never really made any significant difference between the two. I have tried totally without plugins also. The difference is always very similar, but the plugins do add about 2% of the naked core time for each plugin.
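The per-plugin figure can be cross-checked against two rows of the results table. The sketch below assumes a simple linear model in which every plugin adds the same fraction of the bare core time; that model and the function names are assumptions made for illustration, not something measured per plugin.

```python
def per_plugin_percent(bare_s, with_plugins_s, plugin_count):
    """Average overhead per plugin as a percentage of the time without plugins."""
    return 100.0 * (with_plugins_s / bare_s - 1.0) / plugin_count


def predicted_time(bare_s, plugin_count, percent_per_plugin):
    """Linear model: each plugin adds a fixed percentage of the bare time."""
    return bare_s * (1.0 + plugin_count * percent_per_plugin / 100.0)


# "Page load starts", Pattern skin, i18n All, after SVN 9526:
# 1.30 s with no plugins, 1.64 s with 18 plugins.
rate = per_plugin_percent(bare_s=1.30, with_plugins_s=1.64, plugin_count=18)
print(f"measured overhead per plugin: {rate:.2f} %")

# What a flat 2 % per plugin would predict for the same 18 plugins.
print(f"predicted at 2 % per plugin:  {predicted_time(1.30, 18, 2.0):.2f} s")
```

For this pair of rows the average comes out at roughly 1.5% per plugin, so 2% is on the high side of what the table shows but of the same order.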
All my numbers are above or in attachments. And I have confirmed the approx 30-40% slowdown on 3 machines with 3 different generations of Fedora/Redhat. Rafael also published the same kind of numbers (measured with Athensmarks). And others have followed with either numbers or confirmations about the experienced slowdown. I must admit that I got very upset when I read Crawford's first posting from 16th Aug above. I have provided so much evidence, with others confirming the numbers, that I could not understand the claim that I had no evidence. I have documented my setup and attached many many files to this topic. And I have measured so many times over a long period with consistent results.
But let us forget that now. I am happy that Crawford now also has seen the same 30% and this is even when using Athensmarks.
Some experience I have gained: it is very important that the test machine is well under control.
- Cron must be killed.
- Web access from users must be blocked.
- GUI "stuff" must not run. A screensaver kicking in on a GUI that is normally only accessed using VNC can change results dramatically. So always make sure Xvnc does not run. Turn off services that are not needed. You never know when they run.
- Running ab from a remote machine vs locally should in principle make a difference but my experience is that the difference drowns in noise. ab seems to use very little CPU time to do its job.
- Make sure that the URL is in your hosts file or use the raw IP address. You do not want DNS lookup to vary results.
It is valid to run fractions of TWiki to try and benchmark changes to a certain feature. But it is always important to come back to a normal use case.
I found that running
ab many times and throw away numbers that are worse than average is a good approach. And I still use Ethereal to sanity check how long the skin takes to load. This is important because some code can suddenly take 30 seconds to execute in the browser because of browser bugs. But Ethereal is a slow measurement and if people work in the perl code the ab results seems very consistant and very easy to run.
But ab must be run with at least -n 10 and this must be done at least 10 times. Check that the numbers do not consistently rise. Make sure the sessions are not expired by TWiki (negative value in configure). Throw away abnormally long durations. In fact the fastest of the measurements is what I normally consider the most correct, because this means the code ran undisturbed. If the numbers vary too much you need to find the process that is still running and disturbing the result.
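The procedure above can be sketched in Python. This is only an illustration, not the script used for the numbers in this topic: the function names are made up for the example, and the regular expression assumes ab prints its usual "Time per request: ... [ms] (mean)" report line.

```python
import re
import statistics
import subprocess

# Matches the first "Time per request" line of ab's report (the per-request mean).
# Assumption: ab prints this line in its usual text format.
TIME_PER_REQUEST = re.compile(r"Time per request:\s+([\d.]+) \[ms\] \(mean\)")


def parse_ab_mean_ms(ab_output):
    """Extract the mean time per request (ms) from ab's text report."""
    match = TIME_PER_REQUEST.search(ab_output)
    if match is None:
        raise ValueError("no 'Time per request' line found in ab output")
    return float(match.group(1))


def run_ab_once(url, requests=10):
    """Run `ab -n <requests> <url>` and return the mean time per request (ms)."""
    result = subprocess.run(
        ["ab", "-n", str(requests), url],
        capture_output=True, text=True, check=True,
    )
    return parse_ab_mean_ms(result.stdout)


def rises_consistently(samples):
    """True if every run was slower than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


def summarize(samples):
    """Drop runs slower than the average and report the fastest run."""
    average = statistics.mean(samples)
    kept = [s for s in samples if s <= average]
    return {
        "fastest": min(samples),        # the run that was least disturbed
        "kept": kept,                   # runs no worse than the average
        "kept_mean": statistics.mean(kept),
        "rising": rises_consistently(samples),
    }


def benchmark(url, runs=10, requests=10):
    """Repeat the ab run `runs` times and summarize the results."""
    return summarize([run_ab_once(url, requests) for _ in range(runs)])
```

Calling `benchmark()` with the raw IP address of the test server and the path of the topic under test gives the fastest run plus a warning flag if the numbers keep rising from run to run.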
For a comparison between Cairo and TWiki4 to be valid, the plugin environment and the number of webs/topics and users must be the same or very nearly the same. Cairo must run with SessionPlugin because otherwise the Cairo install gets an unfair advantage.
The Apache server must be the same. The authentication method must be the same. There must be the same number of users in .htpasswd.
All this is good for benchmarking towards Cairo, and for before and after checking of code changes. But to see where the time is spent we need the profiling and here I have my hopes that Harald will help with guidance in getting a good profiling framework working.
--
KennethLavrsen - 16 Aug 2006
Updated the numbers and moved the table to the top of this topic. And it is going the wrong way again. Without i18n the numbers are almost unchanged since the last measurement. With i18n the performance has gone from bad to worse. It is still my impression that performance gets near zero attention. WhatIsIn04x01 shows this. All proposals except mine are about adding more features. And no one is trying to use the new profiling framework for anything.
When I test i18n I now select (All - 2) languages, because more languages have been added and for each language added TWiki becomes slower. Which is actually very strange. Does TWiki read in all the language files and not just the current one? The two I do not select are the Chinese ones.
--
KennethLavrsen - 06 Nov 2006
Ken, agree 100%.
On the table above, can you explain what was done to get some of the results? E.g., "Commented out foreach my $file ( @all ) block"... how do I get to that point on my current release, as I like the performance numbers in that version?
Finally, can you add some pointers to the Performance benchmark and how to use it? I was not even aware that it existed. So sorry
--
ThomasWeigert - 06 Nov 2006
The actual setup I use is explained in the text above (Ethereal). But I have later found that a simple ab test gives the "Page Load Starts" numbers pretty well. The "Page fully loaded" number is a measure of how long it takes to load all the "skin stuff", including the time it takes IE to render it. It is normally dominated by the time taken to check all the extra files for updates. Normally this is a fairly constant addition, but sometimes some unfortunate JavaScript triggers a bug in IE, which is one of the reasons I measure this as well.
Benchmarking is easy. Turn off anything that can vary the results, i.e. turn off crond. Stop all services that you do not need. Make sure nothing else loads your test server. And start using ab.
What Harald has provided is a profiling framework. See BenchmarkFramework. But it is not giving any answers easily. It is hard work to find out where the time is consumed. I personally believe the key to cutting down the execution time is to find out exactly where TWiki is spending the time. Not just guess but actually measure it and then work on coding things smarter. But nothing is going to come easy, because then it would have been found a long time ago. And it is going to be 100+ small improvements that make the numbers. But Harald's good initiative deserves more attention - even if cutting off 10 ms is less sexy than making a new feature.
--
KennethLavrsen - 06 Nov 2006
I believe we can win some performance by changing po files to compressed mo files as described in MAKETEXT.
--
ArthurClemens - 06 Nov 2006
Sorry Ken, what I meant was "how did you create the version that was described as 'Commented out foreach my $file ( @all ) block'", as that version seemed to have much better performance than similar releases.
--
ThomasWeigert - 07 Nov 2006
Thomas. The "foreach" was related to the language list where I turned off the feature this way. Not relevant as the slow code was later eliminated.
I have done a fresh benchmark here on 03 Jun 2007 on SVN 14020. I benchmarked both the Patch04x01 branch and the MAIN branch.
I recalibrated with Cairo and got same numbers as earlier.
Last benchmark was run shortly after release 4.0.5 and before we released 4.1.0.
Without languages enabled the slowdown is 1.37 -> 1.42 seconds for the Perl code to execute. This is a very small increase which may be noise. With languages the performance was the same as in the last benchmark. But when I additionally measure how long the skin takes to load, things are going in the wrong direction. Without languages the time has gone from 1.67 -> 1.90. And with languages the increase is 2.04 -> 2.23.
So a simple FAQ page with languages now takes 2.23 seconds to show on a server with no load at all. It is going the wrong way.
And compared with Cairo, using only the numbers without languages since Cairo was English only:
- Perl code: 0.99 -> 1.74.
- Perl code + Pattern Skin load: 1.20 -> 1.90
So for the customer TWiki 4.2.0 takes 58% longer to show simple pages than Cairo.
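The 58% figure can be reproduced with a small calculation. The sketch below is illustrative only: the function names are invented for the example, and the Athensmarks formula is inferred from the table at the top of this topic (the Athens baseline of 0.31 seconds scores 100), not taken from a written definition.

```python
def slowdown_percent(before, after):
    """Percentage by which `after` is slower than `before` (both in seconds)."""
    return (after - before) / before * 100.0


def athensmarks(page_load_start, athens_baseline=0.31):
    """Score relative to the Athens baseline (assumed formula, see lead-in)."""
    return athens_baseline / page_load_start * 100.0


# Perl code + Pattern Skin load, Cairo vs the current measurement.
print(round(slowdown_percent(1.20, 1.90)))   # 58

# Cairo with Pattern skin and 1 plugin, as listed in the table.
print(round(athensmarks(0.67), 1))           # 46.3
```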
The numbers also say that Patch04x01 and MAIN run at the same speed. This means that since 4.1.2 the speed is unchanged. The slowdown happened around the development of 4.1.0.
I am a bit surprised that we do not see even a small improvement, because I had gotten the impression from various discussions that the new user code would be more efficient. There were some optimisations discussed related to creating user objects. On a mid-size TWiki with around 600 registered users I cannot see a positive trend. Maybe if someone runs with 10000 users they will see a difference?
We know that the speed of the new query type search and GROUPS if you have a large group definition is a problem. But this benchmark does not take this into consideration at all. This benchmark runs a very plain text topic with a TOC and no searching.
I hope the next weeks of bug fixing and optimizations will give us at least a small improvement. If we are going to look at the latest numbers with positive eyes then at least the last large code refactorings have not added huge delays except for the tql search and GROUPS issues which need to be addressed.
--
KennethLavrsen - 03 Jun 2007
For anyone interested in delving deeper into performance issues, I worked out a way to get reproducible performance information for the startup (compile) and runtime phases. See InvestigatingTWikiPerformance.
--
CrawfordCurrie - 04 Jun 2007