Tags:
archive_me1Add my vote for this tag extract_stuff1Add my vote for this tag performance1Add my vote for this tag create new tag
, view all tags

Introduction

This describes a series of performance experiments performed on TWiki 1612, with the code change in Prefs described before. The experiments were performed using "ab" running on the server machine, a 1GHz Athlon running Suse 9.0, with no accelerator. Experiment indicates that the first number (maximum speed) goes to almost zero when using an accelerator, and the speed gain is a constant for all releases.

Each run consists of an ab -n 50 -c 3 preceded by an apache server restart, so each experiment is an average of 200 page views. The page viewed consisted of a single "-" character. All experiments are run on the same server.

Terminology

RPS
Requests per second
SPR
Seconds per request
Delta
average time to step from last step to this step
Percent
Delta expressed as a percentage of (Full view SPR) - (No initialize SPR)

Previous releases

Previous releases; exactly the same experiment. Core RPS is RPS for the core code alone (discounting the constant initialisation overhead), and more accurately reflects the relative performance of the core code in the three releases. Accelerated TWiki should deliver approx. 50% of this number of RPS.
Release RPS Core RPS
Athens 2.17 26.7158
Beijing 1.94 11.1195
Cairo 1612 1.7 6.1790

First level analysis

Experiment indicates that the first number (maximum speed) goes to almost zero when using an accelerator, so the interesting times are the deltas.
  Run 1 Run 2 Run 3 Run 4 RPS SPR Delta Percent
No initialize - max possible performance 2.28 2.34 2.38 2.4 2.35 0.43  
Initialise with immediate return 2.33 2.27 2.29 2.29 2.3 0.44 0.0102 6.30
Initialise with return before access controls 2.17 2.24 2.19 2.18 2.2 0.46 0.0199 12.27
Initialise with return after access controls 2.07 2.1 2.11 2.21 2.12 0.47 0.0156 9.62
Initialise with return before user prefs 2.16 2.15 2.12 2.08 2.13 0.47 -0.0011 0.00
Initialise with return after user prefs 1.92 1.98 2.03 1.98 1.98 0.51 0.0357 22.03
Full initialise 1.97 2.05 1.91 2 1.98 0.5 -0.0013 0
View::view with immediate return 1.94 2.03 1.99 2.01 1.99 0.5 -0.0025 0.00
View with return before handleCommonTags 1.92 1.97 1.98 1.92 1.95 0.51 0.0116 7.17
View with return after handleCommonTags text 1.91 1.92 1.93 1.94 1.93 0.52 0.0060 3.71
Return before hCT $tmpl 1.94 1.91 1.98 1.94 1.94 0.51 -0.0047 0.00
Return after hCT $tmpl 1.74 1.82 1.88 1.83 1.82 0.55 0.0354 21.88
Return before getRenderedVersion 1.52 1.72 1.79 1.82 1.71 0.58 0.0337 20.84
Return before checkAccessPermissions 1.67 1.7 1.75 1.69 1.7 0.59 0.0034 2.12
Full view 1.68 1.69 1.73 1.71 1.7 0.59 0.0000 0

Notes and observations:

  • The "No initialise" time is reduced to almost nothing with SpeedyCGI or ModPerl.
  • The 6.3% taken to initialise TWikiDotPm is interesting. This is not accelerated by SpeedyCGI or ModPerl.
  • StoreDotPm and PrefsDotPm both take an inordinately long time to initialise.
  • Because the page has only text content, the first call to handleCommonTags is relatively fast. The second call processes the template and is much slower. But this shows that view is dominated by three things; the time taken to initialise user preferences, the time taken to handleCommonTags and the time taken to getRenderedVersion.

Additional experiments

The same experiment performed on build 1617 with some different variants:
Description RPS
Initialise with return after user prefs 1.95
Initialise with return after user prefs with prefs from forms commented out 1.97
Initialise with return after user prefs with prefs from topics commented out 1.98
Initialise with return after user prefs with body of readPrefs commented out 2.02
i.e. it doesn't make a vast difference wether prefs are taken from forms or topics. The last experiment comments out the topic reading, and indicates that that is where the time is being spent.

-- CrawfordCurrie - 26 Jul 2004

Some more numbers. Rev 1665, rendering 2 pages, one with no content, a second with 13 searches returning 41 results:

  No content 13 searches
open or opendir calls 19 212
unique open or opendir calls 17 61
calls to handleCommonTags 3 65
calls to getRenderedVersion 3 3

"unique open or opendir calls" is the number of calls to open or opendir less the number of re-opened and re-read files.

-- CrawfordCurrie - 02 Aug 2004

Same machine, same environment, same experiment, sorted in order of increasing awfulness.

Release Skin Plugins RPS
athens   DefaultPlugin 2.2675
beijing   DefaultPlugin 2.1250
beta20040816   DefaultPlugin 1.2350
beta20040816 dragon DefaultPlugin 1.1675
beta20040816 pattern DefaultPlugin 1.1300
Note that the skinless pefrformance has slipped from 1.7 with build 1612 to the current 1.235.

-- CrawfordCurrie - 18 Aug 2004

PatternSkin imports a rather big css file (25K), that is cached by the browser after the first view. I don't know about ab, but I guess it does not deal with caching...

Problem area for PatternSkin is probably the dynamic inclusion of WebTopBar, WebLeftBar and WebBottomBar. A quick win could be to replace WebTopBar and WebBottomBar by a Web variable. I would like to keep WebLeftBar, even when this slows down.

For this I would need to know how much performance improves when WebTopBar and WebBottomBar are not included by twiki.pattern.tmpl.

I am also curious what happens when the lines

   /* Custom overriding layout per web or per topic */
   @import url("%USERLAYOUTURL%");
   /* Custom overriding style per web or per topic */
   @import url("%USERSTYLEURL%");
are not in the template. Although these are not used yet, I don't know if they are sending server requests and thus slowing things down.

-- ArthurClemens - 19 Aug 2004

OK, I've done some tests with ab myself, to get some answers. First I used the same setup as Crawford, with a topic with just one '-' character (test page), with the command ab -n 50 -c 3. I could not do the Apache restart. However, I found very inconsistent results, so I cranked up the requests to 500. This time I got constistent results with subsequent runs.

The difference between default skin and pattern skin is indeed big. But to find out where the problem is proves difficult. It seems that each dynamic element adds a bit to the overall performance loss. %TABLE% (2 times) and %REVINFO% add quite a bit. But when I remove all dynamic parts from the templates, there remains a performance gap that I cannot explain.

Skin Template adjustments RPS
default   3.04
pattern   2.15
pattern no css 2.25
pattern with css, no USERLAYOUTURL, USERSTYLEURL import 2.22
pattern with css, no top bar, no bottom bar 2.30
pattern no css, no top bar, no bottom bar, no left bar 2.46
pattern no css, no top bar, no bottom bar, no left bar, no TABLE 2.55
pattern no css, no top bar, no bottom bar, no left bar, no TABLE, no REVINFO 2.77
pattern stripped everything, just htmldoctype, head, body open and close 3.44
pattern no css and removed top, left, bottom once again, now from view tmpl 3.04

-- ArthurClemens - 19 Aug 2004

Wow, Arthur's server is a lot faster than mine! wink

For clarification, the experiment I use is not just ab -n 50; that's just one run. Each experiment comprises four runs with a server reset between each. The idea was to clear any server caches (such as cached httpd processes) to try to prevent repeated evaluations of the same page being trivially cached. So my results are actually based on ab -n 200. Yes, agreed, there can be considerable variation in the results, even with ab -n 500.

TBH I think we need a benchmark that eliminates the server completely i.e. calls TWiki to render from the command-line without an HTTP transaction.

BTW your assessment that "each dynamic element adds a bit" is IMHO bang on the mark, and is the case for the TWiki core generally - each tag, feature, bell and whistle "adds a bit" to the performance problem. Cairo is suffering the death of a thousand cuts (death of a thousand features?)

-- CrawfordCurrie - 19 Aug 2004

Have you tried to profile a TWiki run using DProf? To do it, just change the shebang in the view script to add the -d:DProf option. It'll create the tmon.out file in the bin directory.

-- RafaelAlvarez - 19 Aug 2004

OK, these are good numbers to look at. Now we need to decide how to interpret them and how to proceed. First, some questions:

  1. Difference between recent builds. Assuming a 100% base line for Beijing (at 2 RPS), what is the reason for the 25% difference between Cairo 1612 (1.7 RPS, 85%) and Cairo 1725 (1.2 RPS, 60%) for classic templates? This period (from 26 Jul and 16 Aug) had mainly bug fixes and pattern skin enhancements. Different TWiki.cfg and preferences settings?
  2. Difference between classic templates and pattern skin. Crawford measures about a 47% drop between classic templates and pattern skin; Arthur measures around 30%. Why this difference? Different TWiki.cfg and preferences settings?

Are there any quick fixes for one, for both?

-- PeterThoeny - 20 Aug 2004

I must say these statistics are not precise at all. On my server the speeds differ from minute to minute. So 47% should instead read "a lot slower". Also the differences between builds could be due to network traffic.

-- ArthurClemens - 20 Aug 2004

Agreed. It has to be said that benchmarking using ab is not a good way to do it. Apache is indeterministic. I have tried various ways of establishing consistent performance benchmarks, but even on a server isolated from the outside network, with cron and all other services shut down, I still get up to 15% variation between two subsequent runs of the same code on the same data, when run from Apache. So you have to read all the numbers above as "it is about this much slower". I persisted with ab only because I was mainly interested in the effect of CGI accelerators when I first generated the experiments.

The only way to benchmark deterministically is (I suspect) to instrument the code using Benchmark.

-- CrawfordCurrie - 20 Aug 2004

Have you tried DProf? Is this useful?

I really would like to have an instrument that helps decide what features to omit from PatternSkin.

-- ArthurClemens - 20 Aug 2004

Yes, I've DProf'ed TWiki quite extensively in the past. Generally profiling tools are useful for isolating "patches" of performance-soaking code, but with TWiki it's "a death of a thousand cuts" i.e. the performance problems tend to be a result of repeated application of rather large blocks of code, making this kind of profiling of limited use. I have found I learnt more using the kinds of techniques we both employed above; cutting out whole blocks of functionality, and observing the impact.

Peter has highlighted this topic (in RestructureCodevWorkFlow) as being one of the areas where people can help get Cairo out of the door. He has also highlighted previously that performance optimisations will have to wait until after Dakar. Which is it to be? What is the performance target for Cairo?

OK, to address the vagaries of startup times I redesigned the experiment. I added a script called "benchmark" to each bin dir. This is an instrumented copy of the 'view' script that works in the browser and from the command line. From the browser, it simply generates a copy of the query in a file. From the command line, it reads the previously saved query and times the view. The timing is repeated 7 times and the first result thrown away, to try to eliminate effects from paging in.

In the case of Cairo this is trivial, but requires a wee bit more footwork in athens and beijing. The Cairo benchmark script is attached, as is the script used to analyse the results.

The results are far more reliable than the ab results above. Especially interesting are the results for different skins on the beta. They highlight that the performance of skins is affected heavily by the CSS load times. Time is CPU time in seconds.

Release Skin Plugins Absolute Relative to athens
athens   DefaultPlugin .1416 100.00%
beijing   DefaultPlugin .6316 22.95%
beta20040816   DefaultPlugin .4683 30.96%
beta20040816 pattern DefaultPlugin .4700 30.85%
beta20040816 dragon DefaultPlugin .4683 30.96%
beta20040816   DefaultPlugin .4683 30.96%
beta20040816   add SmiliesPlugin .5166 28.0600%
beta20040816   add InterwikiPlugin .5300 27.6600%
beta20040816   add SpreadSheetPlugin .5666 24.7000%
beta20040816   add RenderListPlugin .6133 24.4500%
beta20040816   add SlideShowPlugin .6166 22.7000%
beta20040816   add CommentPlugin .6350 22.2900%
beta20040816   add EditTablePlugin .6833 21.7000%

Arthur, I have noticed that a lot of CSSes are stripped down by removing extraneeous spaces and comments. I doubt this will make a vast difference but it might make some difference.

-- CrawfordCurrie - 21 Aug 2004

My feeling is that also these stats don't take browser caching into account. So they measure the load for the first page view on the site. Each following page view won't load the css as it is cached. So the numbers are in reality not that bad as they seem.

I have tried earlier to compress the css file by removing spaces, but this gave me only 0.5K on a total of 25K, at the cost of unreadability.

-- ArthurClemens - 21 Aug 2004

Many eyeball even out benchmarks, so I did my own. I used the geturl utility, which simply grabs a web page and ignores CSS files (which gets cached by the browser anyway).

  • Time is measured 5 times, the average is taken (only a few percentage variation; server is almost unused by other tasks)
  • Command used:
    time -p ./geturl myserver.com /users/pthoeny/cgi-bin/20040816/view/TWiki/WikiName > /dev/null
  • I disabled/enabled all Plugins in TWiki.cfg to see how the core code performs
  • Time is sum of real, user and sys; unit is seconds
  • Percentage is normalized to Athens release

Release Disabled Plugins With Preinstalled Plugins
  sec vs.
Athens
sec vs.
Athens
vs.
Classic
20011201 Athens 0.50 100% 0.51 100%  
20030201 Beijing 0.53 94% 0.57 89%  
20040119 0.56 89% 0.78 65%  
20040729 0.64 78% 0.90 57%  
20040816 classic 0.65 78% 0.91 56% 100%
20040816 dragon 0.66 76% 0.94 54% 97%
20040816 pattern 0.72 69% 0.99 52% 92%

Based on this we can read the following:

  • There is a big difference in performance with/without preinstalled Plugins
    • Interpretation: This is the price for added functionality
    • Action: Longer term we can find ways to improve the performance of Plugins
  • Without Plugins, performance droped by 11% for 20040119; between 20040119 and 20040816 there is another drop of 9%
    • Interpretation: The relative large 9% drop for recent feature enhancements means that there is (at this point unknown) feature that is a performance hog
    • Action: Investigate in DakarRelease
  • With pattern skin, performance drops by 8% compared to classic skin; Dragon skin drops by 3%
    • Interpretation: As Arthur discovered, the includes seem to have an impact on performance
    • Action: Find ways to improve performance. Possibly with simplified skin that has hardcoded headers and footers. Realistically this should be done in DakarRelease

Cairo should be released with Pattern skin enabled by default. The question is if the 8% drop can be justified. Taking this into perspective, half the performance in 3.5 years is not that bad considering that the performance of servers increased n-fold in the same time period. I'd vote for releasing Cairo with the current Pattern skin enabled.

Opinions?

-- PeterThoeny - 22 Aug 2004

I vote, for releasing, and then focussing on performance for a few weeks, and then releasing Dakar. (and pushing most of the features to Endinbouroughoughghgh (nope, still can't spell smile ))

-- SvenDowideit - 22 Aug 2004

Release, focus on performance for Dakar (which should release in weeks not years) and push features to Edinburgh (Sven,think of "Ed in burg" and add a "h". And pronounce it "Ed-in-burra". And thank your lucky stars we didn't call it eilean na coire choinneachan)

-- CrawfordCurrie - 22 Aug 2004

I don't know if I have a vote, but it would be for releasing now. Doing so will provide a fixed target for me to make some changes to my skin and plugin, and provide a larger pool of testers for Cairo. And, yeah, includes do affect skin performance (this is one reason the default DragonSkin configuration doesn't use includes).

-- ClaussStrauch - 22 Aug 2004

After more experimentation using the core benchmarks, there is no appreciable difference in core performance between 1612 and 1729; in fact, 1729 is fractionally faster.

Read the core numbers in the table above again. Cairo core performance is faster than Beijing. What we gained on the swings, we lost on the roundabouts (more complex shipped topics, skins).

Each plugin imposes a 1-3% performance hit, which seems to be largely down to the module load + BEGIN time. I'd vote for shipping without preinstalled plugins (I personally find it a nuisance anyway, as I de-install several of them as a matter of course in most installations). It would be much better to have a menu-driven optional enable.

-- CrawfordCurrie - 22 Aug 2004

OK, lets release Cairo with Pattern skin enabled by default.

Related, I just enabled the Pattern skin here on TWiki.org.

-- PeterThoeny - 22 Aug 2004

Crawford, I took the idea in your benchmark script, but I limited it to the command line and added the ability to specify a "fixture" (i.e: which page to test). Attached you'll find the script, some fixtures and a mock of the CGI module. If you find it useful I'll release it as a contrib.

-- RafaelAlvarez - 26 Aug 2004

I changed the way I do benchmarks to eliminate the random server element that makes it so hard to understand what is going on. I created a modified view script called benchmark that works in two modes; when invoked from the browser it writes the $query to a file, and when run from the command line it reads the $query from that file. This allows me to run benchmarks on a range of pages from the command line and eliminate the server altogether from the timings. I've been using this approach with DProf to analyse what is going on in the core code.

Looking at a blank page, around 30% of the core runtime is taken up with expanding preferences ( not loading preferences, which is only 3%, but expanding the %% vars ). A further 30% is taken up by expanding internal variables (handleCommonTags). The bulk of the remainder is time taken in BEGIN blocks (which should be eliminated by ModPerl and SpeedyCGI). Note that it's really hard to pin down specific code blocks, as the variation in the profile between runs is enormous. Basically, the twiki core is so fast that a sampling approach to profiling just doesn't work very well!

I rewrote the preference var expansion in several different ways (e.g. precompiling the REs), and reduced the impact considerably, but on a simple text-only page, expanding %% vars is still the biggest single time consumer.

I found it almost impossible to accelerate handleCommonTags without affecting the semantics of TWikiML.

Each initPlugin incurs a 1% to 3% overhead in core runtime. Most of this time is spent in extractNameValuePair. However on the whole, plugins are pretty efficient.

I tried using AutoLoader to accelerate the BEGIN blocks. However I just couldn't get it to work; perhaps a perl expert could try (WillNorris?)

Rafael, you are right, we could do with a place to hang benchmark scripts. However it's pretty specialist stuff, running benchmarks. We're probably OK just attaching the scripts to topics like this one. I think it would be good to have a benchmark topic that is updated for each and every release (alpha, beta and main), by running a set of standard scripts.

-- CrawfordCurrie - 27 Aug 2004

I failed to mention that the script I attached is supposed to be run from the command line. I use it mostly to have an idea of global performance impact in plugins and in the plugin handling mechanism.

About the topic to hang benchmarks... In PluginBenchmarks Peter suggest some "standard" pages to perform plugin benchmarks. We could define a set of BenchmarkTestPages (that could be distributed with twili) and a standard benchmark script so we can get consistent results across releases.

-- RafaelAlvarez - 27 Aug 2004

With the new benchmark method, how can I evaluate the speed difference between classic skin and pattern skin?

-- ArthurClemens - 28 Aug 2004

You can still make that comparison, but you have to take into account that there is a constant additional overhead for the browser to load the CSS. Since CSS's are cached in the browser this should be a one-time load. But at the end of the day, if your principal interest is user-perceived performance in browsers, then this discussion doesn't help you. Rafael and I are both focused on analysing the core code, rather than skins.

Note that IMHO ab is misleaing for benchmarking skin performance as well, as it doesn't model a browser's cache strategy. To really assess end user performance, you have to use a browser and a stopwatch. Sorry. Perhaps someone could write a Mozilla plugin that does it?

Rafael, I see what you have done. That's cool, I think it pretty much achieves the same thing as what I did. I created a topic AthensMarks that reports my results and publishes the scripts.

-- CrawfordCurrie - 29 Aug 2004

Crawford, I don't understand your comment "Cairo core performance is faster than Beijing." Certainly your own numbers above show the opposite. What am I missing?

-- ThomasWeigert - 29 Aug 2004

The benchmarks on this page are end-user benchmarks i.e. they measure the performance as viewed by a browser. Cairo performance is slower for an end user, I guess because of the amount of extra stuff they have to download and the browser has to parse. Core performance is the time taken for the TWiki core to generate the HTML for the page only, and is not shown on this page. For core performance, visit AthensMarks. In the core only, Cairo is faster than Beijing, mainly I think down to optimisations in the way plugins are discovered.

-- CrawfordCurrie - 30 Aug 2004

Topic attachments
I Attachment History Action Size Date Who Comment
Unknown file formatEXT benchmark r2 r1 manage 1.2 K 2004-08-21 - 08:30 CrawfordCurrie  
Compressed Zip archivezip benchmarkview.zip r1 manage 3.2 K 2004-08-26 - 15:34 RafaelAlvarez a view benchmark script and CGI mock
Unknown file formatEXT experiment r1 manage 1.3 K 2004-08-21 - 08:28 CrawfordCurrie  
Edit | Attach | Watch | Print version | History: r40 < r39 < r38 < r37 < r36 | Backlinks | Raw View | Raw edit | More topic actions
Topic revision: r40 - 2008-09-04 - TWikiJanitor
 
  • Learn about TWiki  
  • Download TWiki
This site is powered by the TWiki collaboration platform Powered by Perl Hosted by OICcam.com Ideas, requests, problems regarding TWiki? Send feedback. Ask community in the support forum.
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.