use versus require
TWiki has always suffered from the problem of compilation time. Because perl is an interpreted language, the code has to be compiled at run-time. Since many TWiki servers don't use a CGI accelerator (such as mod_perl or PersistentPerl), this compilation cost has to be paid for every invocation of a perl CGI script. This problem has been a subject for my own research for some time now.
I recently went through the TWiki codebase changing a lot of use pragmas to require instead. The reason I did this is that I believe I finally understand how to use them in a CGI environment. Rather than throwing this over the wall as black magic, I thought it would be helpful to perl developers in general to understand the rationale. Any perl monks out there may want to correct me; I'm still a perl newbie, even after 3 years of fairly intensive perl experience.
First, how do use and require differ? The CamelBook is somewhat vague:
"Because the use declaration (in any form) implies a BEGIN block, the module is loaded (and any executable initialization code in it run) as soon as the use declaration is compiled, before the rest of the file is compiled. ..... If, on the other hand, you invoke require instead of use, you must explicitly qualify any invocation of routines within the required package. ..... In general, use is recommended over require because you get your error messages sooner."
In fact use does quite a bit more work than this section implies. It is mostly equivalent to doing a BEGIN { require Module; import Module; }. The BEGIN ensures it is done at compile time. The require is always needed if you call symbols from Module, but the import is useless in TWiki, and is in fact a bad idea. The function of import is to drag any exported symbols from the module into the namespace of the calling module.
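As a rough illustration of that equivalence, the sketch below loads a module both ways. The core module List::Util and its sum and max functions are stand-ins chosen for the example; they have nothing to do with TWiki.

```perl
use strict;
use warnings;

# Form 1: the familiar declaration.
use List::Util qw(sum);

# Form 2: roughly what perl does for the line above. The BEGIN block
# forces the require and the import to happen at compile time.
BEGIN {
    require List::Util;
    List::Util->import(qw(max));
}

# Both imported names are usable without the List::Util:: prefix.
print sum(1, 2, 3), "\n";
print max(1, 2, 3), "\n";
```

Without the BEGIN wrapper the require and import would only happen when execution reaches them, which is the behaviour the rest of this topic exploits.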
The approach recommended by the CamelBook is to use Exporter to export symbols to the other package.
Let's say you have a package:
package Fred;
use Exporter 'import';
@EXPORT_OK = qw(bloggs);
sub bloggs {
}
If you do this, you can write bloggs() (without the Fred::).
package John;
use strict; # *always* use strict
use Fred 'bloggs';
sub smith {
bloggs();
}
But in TWiki we take the view that such namespace pollution is A Very Bad Thing, so we always explicitly qualify references to symbols in other modules. So the import part of the use is just wasted effort. On realising this, the first change I made was to eliminate Exporter wherever possible, and change all use calls at the top of TWiki modules to require calls.
package Fred;
sub bloggs {
}
package John;
use strict; # *always* use strict
require Fred;
sub smith {
Fred::bloggs();
}
Amazingly, this change alone cut the runtime of the view script on a simple page by about 4%!
So the rule is:
- Explicitly qualify all references to symbols in external modules, and use require and not use wherever possible.
The only exception to this rule is when we use a module which doesn't have the same "green credentials" as the TWiki core. Examples are Assert and Error, both of which effectively extend the syntax of perl, and would be very clumsy to use without Exporter.
The second argument for require over use is lazy compilation. Any time you execute a require, the package gets compiled even if you don't actually call any of the symbols in it. If you think about how TWiki works, you can see that for any given invocation of a TWiki script, only about a third of the codebase is actually called. If you use a package, this package is compiled immediately at compile time, regardless of whether the use statement is within a conditional or subroutine definition. So, the implicit require in the use is always executed, potentially wasting compilation time on code that is never called.
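The difference can be seen by placing both statements inside a branch that never runs. This is a minimal sketch: Text::Wrap and Time::Local are arbitrary core modules picked for illustration, and the result for Time::Local assumes nothing else in the process has already loaded it.

```perl
use strict;
use warnings;

if (0) {
    # Never executed, but use is processed at compile time,
    # so Text::Wrap is loaded anyway.
    use Text::Wrap ();
}

if (0) {
    # Never executed, and require only acts at run time,
    # so this statement does not load Time::Local.
    require Time::Local;
}

# %INC records every file that has been loaded so far.
my $wrap_loaded  = exists $INC{'Text/Wrap.pm'}  ? 'yes' : 'no';
my $local_loaded = exists $INC{'Time/Local.pm'} ? 'yes' : 'no';
print "Text::Wrap loaded:  $wrap_loaded\n";
print "Time::Local loaded: $local_loaded\n";
```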
This problem is addressed by using require to perform just in time compilation. This technique ensures that the code isn't compiled until you actually know you are going to use it.
package John;
use strict; # *always* use strict
sub smith {
require Fred; # just-in-time
Fred::bloggs();
}
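A self-contained variant of the same pattern, with the core module Text::ParseWords standing in for Fred, shows that the module is only brought in by the first call. The split_words helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: Text::ParseWords is only compiled on the
# first call, not when this file itself is compiled.
sub split_words {
    my ($line) = @_;
    require Text::ParseWords;    # just-in-time
    return Text::ParseWords::shellwords($line);
}

my $before = exists $INC{'Text/ParseWords.pm'} ? 'yes' : 'no';
my @words  = split_words('view "simple page" raw');
my $after  = exists $INC{'Text/ParseWords.pm'} ? 'yes' : 'no';

print "loaded before call: $before\n";
print "loaded after call:  $after\n";
print scalar(@words), " words\n";
```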
Once a package has been brought in by a require, then the incremental cost of another require of the same package is unmeasurably small, so you can afford to scatter require around quite liberally.
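The reason repeated calls are cheap is that require consults %INC and returns immediately if the file is already listed there. The sketch below counts how often the source is really fetched. It serves a made-up module, Hypothetical::Fred, from memory through an @INC hook so that no files are needed on disk; the hook is only a device for this example.

```perl
use strict;
use warnings;

my $loads = 0;

# Hypothetical in-memory module, served through an @INC hook. The hook
# counts how often perl actually has to fetch and compile the source.
unshift @INC, sub {
    my ($self, $file) = @_;
    return unless $file eq 'Hypothetical/Fred.pm';
    $loads++;
    my $source = "package Hypothetical::Fred; sub bloggs { 'bloggs' } 1;\n";
    open my $fh, '<', \$source or die "cannot open in-memory file: $!";
    return $fh;
};

sub smith {
    require Hypothetical::Fred;    # cheap after the first call
    return Hypothetical::Fred::bloggs();
}

smith() for 1 .. 1000;
print "source compiled $loads time(s)\n";
```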
A simple require Fred; cannot be used if the package name is not known in advance. In this case, runtime compilation is mandatory, and best achieved by eval "require $module". Check the error condition $@ if you want a graceful reaction if the module either isn't found or has compilation problems.
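A minimal sketch of that pattern follows. The try_require helper and the missing module name are hypothetical, and the name check is an extra precaution added here because the string is interpolated into code.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module whose name is only known at run
# time. Returns an empty string on success, or the error text.
sub try_require {
    my ($module) = @_;

    # The name is interpolated into code, so refuse anything that
    # does not look like a plain package name.
    return "invalid module name '$module'"
        unless $module =~ /\A\w+(?:::\w+)*\z/;

    # The trailing "1" makes the eval return true on success, which is
    # more robust than testing $@ on its own.
    return '' if eval "require $module; 1";
    return $@ || "unknown error loading $module";
}

for my $module ('File::Spec', 'Hypothetical::Missing::Module') {
    my $error = try_require($module);
    print $module, ': ', ($error ? 'FAILED' : 'ok'), "\n";
}
```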
The rules are:
- Only use packages such as Assert and Error which must export symbols into the caller's namespace.
- If you know a package is always needed, then require it unconditionally as early as possible, and again in every package that uses it.
- Don't require a package until you are sure you need it, and never assume a package has already been required by a calling package.
- If you need to check compilation status of a package then use eval "require Package" (or eval "use Package") and check $@.
There is a script to help you with this: tools/check_requires.pl analyses use and require, and tries to identify problems.
By moving the require calls into the code body this way I was able to reduce the runtime of the view script over a simple page by a further 6%.
On the flip side (there is always a flip side) this has made testing harder. Because compilation is lazy, there is a risk that a module never gets compiled during testing because the code path that would require the module is never exercised. Without the unit test suite I wouldn't even have considered the lazy compilation changes, and even with the test suite the risk is high, as evidenced by errors like Bugs:Item4380. However IMHO the appropriate response to this is to increase the coverage of the unit tests, something which everyone involved in TWiki should be constantly striving to do.
--
Contributors: CrawfordCurrie - 19 Jul 2007
Discussion
In the paragraph above that begins "The second argument for require over use...", I believe the second sentence should say "Any packages you use get compiled...." I'll leave it to Crawford to verify that; in any case, please feel free to delete this comment after you do so. Thanks.
--
DavidBright - 19 Jul 2007
No, it's correct. use Fred is actually equivalent to BEGIN{ require Fred; import Fred; }. I reworded it to try and make it clearer.
--
CrawfordCurrie - 19 Jul 2007
So, the argument is not only about "require over use" but also "require at the very last moment"?
--
RafaelAlvarez - 19 Jul 2007
I am surprised about the difference in run time between use and require. I guess this is not related to different run times of use vs. require but due to the fact that with require, some modules are not compiled at all during a view? Did you check which of TWiki's features (attachments, forms, ...) pull in particularly expensive modules?
Generally, I'm not a fan of using require instead of use because it is sacrificing clarity. Per convention, use statements are written at the beginning of a module, so that a casual reader can easily spot module dependencies. A better organisation of modules and packages would seem a better idea. On the other hand, this takes quite some effort without bringing user benefit, and so the performance gain outweighs such considerations.
I agree that there's no use within TWiki to import symbols. Importing is hardly ever necessary in object oriented Perl. Its main use is for convenience when calling utility routines which are not associated with any object, and even then it is often better to write the package name explicitly with every call.
A minor note about a profiling artefact triggered by eval "use $module" vs. eval "require $module": There is no difference between the two regarding when compilation takes place, since the eval will postpone it until runtime anyway. There is, however, a difference in where the compilation time will be recorded if profiling with Devel::DProf. Any time spent during execution of use, whether within an eval or not, will be added to the cumulative time of the BEGIN block of the module. Execution of require, on the other hand, will be correctly added to the cumulative time of the subroutine where the require actually happens.
--
HaraldJoerg - 19 Jul 2007
Yes, with require embedded in the code, approximately 1/2 of the codebase is never compiled for a simple, non-SEARCH, view. Also, by late-requiring, there is an opportunity for earlier feedback, as modules can start generating output even before all the code is compiled. Yes, I did try to see what the expensive features are; as I think I have said before, it's a "death of a thousand cuts". The most expensive "single non-essential feature" is SEARCH, and it's essential in all but a few pages. Note that use of SEARCH in skin construction can have a considerable impact on runtime.
"Generally, I'm not a fan of using require instead of use because it is sacrificing clarity." Absolutely, me too, I totally agree. But perl isn't smart enough to do it otherwise. One possibility would be to keep the use statements at the top of the module, but comment them out, e.g.
#use TWiki::Blah; # imported using require
Good point about the profiling. Personally I haven't had any success whatsoever using profilers on perl; all my numbers come from explicit instrumentation of the code using the Monitor module.
One tool which would be immensely useful for performance analysis would be a coverage analyser. I tried to use CPAN:Devel::Cover, but just couldn't get it to work on TWiki.
--
CrawfordCurrie - 21 Jul 2007
Later: I just ran Devel::Cover with the rewritten unit test framework, and it now works! So I guess I accidentally fixed whatever was foxing it before. Now thinking about a way to publish coverage stats.
--
CrawfordCurrie - 21 Jul 2007