Question
Hi,
I am having trouble running TWiki behind a proxy web server. I have one webserver that is directly connected to the Internet, does firewall/NAT, and also runs an Apache webserver with the following 2 lines in httpd.conf:
ProxyPass /twiki/ http://tsit.grigorians.org/twiki/
ProxyPassReverse /twiki/ http://tsit.grigorians.org/twiki/
The firewall machine is: grigorians.org, while the TWiki server is tsit.grigorians.org and is not accessible from the Internet.
What happens is that when I click on the "Get started" link off of the starting page, the resulting page has the following <base> value:
<base href="http://tsit.grigorians.org/twiki/bin/view/Main/WebHome" />
And even though all the links on the page are relative, clicking them results in a request to tsit.grigorians.org instead of just grigorians.org. All the images are also broken for the same reason. Also, the name of the internal host changes if I use a host name that is not fully qualified in the ProxyPass configuration. For example, if I configure ProxyPass as:
ProxyPass /twiki/ http://tsit/twiki/
the resulting <base> tag will just have the host name "tsit" without the domain "grigorians.org".
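To see why relative links still end up at the internal host, here is a minimal sketch of how a browser resolves a relative link once a <base> value is present. It is written in Python rather than TWiki's Perl so that it can be run standalone; the relative link used is a hypothetical example, not one taken from the actual page.

```python
from urllib.parse import urljoin, urlsplit

# <base> value from the page that TWiki returned through the proxy.
BASE_HREF = "http://tsit.grigorians.org/twiki/bin/view/Main/WebHome"


def resolve(relative_link, base_href=BASE_HREF):
    """Resolve a relative link the way a browser does when <base> is set."""
    return urljoin(base_href, relative_link)


# Hypothetical relative link, used only for illustration.
resolved = resolve("../TWiki/WelcomeGuest")
print(resolved)
print(urlsplit(resolved).hostname)
```

The resolved host comes from the <base> value, not from the address the visitor typed, which is why the request goes to the internal machine.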
At this point, I have tried a lot of different configurations, including setting the ServerName on the internal box to that of the firewall box, changing UseCanonicalName, etc.
I have used mod_proxy before and it has always worked like a charm. I can't figure out what the issue is now.
Thanks in advance for any help/pointers.
Arshavir
Environment
--
ArshavirGrigorian - 14 Aug 2003
Answer
You should change your ProxyPass directives so that they are inside a LocationMatch block, like this:
<LocationMatch "/twiki">
ProxyPass http://tsit.grigorians.org/twiki
ProxyPassReverse http://tsit.grigorians.org/twiki
</LocationMatch>
Or something like that...
--
FrancisLiu - 18 Aug 2003
Thanks for trying, Francis, but what you suggest is not a valid httpd.conf construct, and I am not quite sure why it would make a difference.
--
ArshavirGrigorian - 18 Aug 2003
If you refer to http://httpd.apache.org/docs-2.0/mod/mod_proxy.html#proxypass, you'll see that it is valid syntax. But yes, you're correct: I've recompared the config I thought I had with the config that I really do have, and it won't do what you need...
Anyway, from the inside, do you need to see it as tsit.grigorians.org? What if your inside TWiki thought it was running on grigorians.org? I.e., use grigorians.org as the defaultUrlHost?
--
FrancisLiu - 19 Aug 2003
Well, I am actually using only Apache 1.3.28, not 2.0. The document you referenced is 2.0 specific.
Maybe it's time to switch ...
As for the defaultUrlHost variable, I have always had it set to "http://www.grigorians.org" but that doesn't seem to make one bit of difference ... the links are still generated using whatever I use in the ProxyPass/ProxyPassReverse directives.
--
ArshavirGrigorian - 19 Aug 2003
I have exactly the same problem and was about to ask it when I saw your question from... yesterday!
From what I have seen, I guess that vanilla TWiki is simply unable to handle this.
Tomorrow I'm going to try a little tweak in &TWiki::initialize. I want to do something like this to set $urlHost:
if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
    if( $doGetScriptUrlFromCgi ) {
        $scriptUrlPath = $2;
    }
    $urlHost = $1;
    if( $urlHost =~ $proxyPrefix ) {
        $urlHost = $defaultUrlHost;
    }
    if( $doRemovePortNumber ) {
        $urlHost =~ s/\:[0-9]+$//;
    }
} else {
    $urlHost = $defaultUrlHost;
}
My intention is that $proxyPrefix is the prefix used in the last argument of the ProxyPass directive.
Well, I'm at home now and can't test it from here. I'll let you know if it works tomorrow.
--
GustavoChaves - 20 Aug 2003
OK, I just did it and it seems to work out fine. I made two small modifications to the files lib/TWiki.cfg and lib/TWiki.pm.
In TWiki.cfg I inserted the following lines just below the one defining the $defaultUrlHost variable (but they could be anywhere):
# URL conversion for TWiki when accessed by a reverse proxy.
$proxiedUrlMap = [ "http://java.cpqd.com.br" => "http://www.cpqd.com.br" ];
The default value for this variable should be undef. In my case, I put the following line in the httpd.conf of my Apache:
RewriteRule ^/twiki(.*) http://java.cpqd.com.br/twiki$1 [P,L]
Since my site is at http://www.cpqd.com.br/, this way I configured it to make a reverse proxy from http://www.cpqd.com.br/twiki... to http://java.cpqd.com.br/twiki.... java.cpqd.com.br is a machine in our internal network which is normally accessed with the defaultUrlHost of http://wiki.cpqd.com.br/twiki.... What I want is to tell TWiki that IF it's accessed with another name (in this case, java.cpqd.com.br) it should use a different base URL for the generated links.
OK, the modification in TWiki.pm is this:
--- TWiki.pm.orig.20030201 2003-08-21 15:58:34.000000000 -0300
+++ TWiki.pm 2003-08-21 16:18:38.000000000 -0300
@@ -51,7 +51,7 @@
use vars qw(
$webName $topicName $includingWebName $includingTopicName
$defaultUserName $userName $wikiName $wikiUserName
- $wikiHomeUrl $defaultUrlHost $urlHost
+ $wikiHomeUrl $defaultUrlHost $proxiedUrlMap $urlHost
$scriptUrlPath $pubUrlPath $viewScript
$pubDir $templateDir $dataDir $logDir $twikiLibDir
$siteWebTopicName $wikiToolName $securityFilter $uploadFilter
@@ -354,6 +354,14 @@
$scriptUrlPath = $2;
}
$urlHost = $1;
+
+ if( $proxiedUrlMap ) {
+ my $len = length $proxiedUrlMap->[0];
+ if( $len && substr($urlHost, 0, $len) eq $proxiedUrlMap->[0] ) {
+ substr($urlHost, 0, $len) = $proxiedUrlMap->[1];
+ }
+ }
+
if( $doRemovePortNumber ) {
$urlHost =~ s/\:[0-9]+$//;
}
And it worked. (Right now I have a RewriteCond just above the RewriteRule directive telling it to disallow external references because I still have to block some content. So don't worry if you can't access it.)
What do you think? Does it work for you?
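For readers who want to try the mapping logic outside TWiki, here is a minimal sketch of the same prefix substitution, written in Python rather than Perl so it runs standalone. The function name `map_proxied_host` is hypothetical; the two URLs are the ones from the TWiki.cfg line above.

```python
# Mirrors $proxiedUrlMap: (internal prefix, external replacement).
PROXIED_URL_MAP = ("http://java.cpqd.com.br", "http://www.cpqd.com.br")


def map_proxied_host(url_host, proxied_url_map=PROXIED_URL_MAP):
    """Replace the internal prefix of url_host with the external one.

    Returns url_host unchanged when no map is configured or when the
    prefix does not match, as in the patch to TWiki.pm.
    """
    if not proxied_url_map:
        return url_host
    internal, external = proxied_url_map
    if internal and url_host.startswith(internal):
        # Keep whatever follows the prefix (for example a port number).
        return external + url_host[len(internal):]
    return url_host


print(map_proxied_host("http://java.cpqd.com.br"))
print(map_proxied_host("http://wiki.cpqd.com.br"))
```

Only requests that arrive under the proxied name are rewritten; requests under any other name keep the host they came in with.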
--
GustavoChaves - 21 Aug 2003
That certainly works and is a very flexible solution, in case one would want to proxy the TWiki server through different names.
Proxy A has:
ProxyPass /twiki/ http://java.cpqd.com.br/twiki/
Proxy B has:
ProxyPass /twiki/ http://perl.cpqd.com.br/twiki/
While both http://java.cpqd.com.br/ and http://perl.cpqd.com.br/ point to the same machine.
I am not sure when you would use something like this, though.
A simpler (but not as flexible) version of your solution could be the following:
--- /usr/local/src/twiki/lib/TWiki.pm 2003-02-01 19:55:21.000000000 -0500
+++ /usr/local/src/twiki_1/lib/TWiki.pm 2003-08-21 22:44:27.000000000 -0400
@@ -51,7 +51,7 @@
use vars qw(
$webName $topicName $includingWebName $includingTopicName
$defaultUserName $userName $wikiName $wikiUserName
- $wikiHomeUrl $defaultUrlHost $urlHost
+ $wikiHomeUrl $defaultUrlHost $proxiedUrl $urlHost
$scriptUrlPath $pubUrlPath $viewScript
$pubDir $templateDir $dataDir $logDir $twikiLibDir
$siteWebTopicName $wikiToolName $securityFilter $uploadFilter
@@ -354,6 +354,14 @@
$scriptUrlPath = $2;
}
$urlHost = $1;
+
+ if( $proxiedUrl ) {
+ my $len = length $proxiedUrl;
+ if( $len && substr($urlHost, 0, $len) eq $proxiedUrl ) {
+ substr($urlHost, 0, $len) = $defaultUrlHost;
+ }
+ }
+
if( $doRemovePortNumber ) {
$urlHost =~ s/\:[0-9]+$//;
}
while TWiki.cfg would have:
$proxiedUrl = "http://tsit.grigorians.org";
What do you think? Any thoughts from TWiki developers?
--
ArshavirGrigorian - 21 Aug 2003
I tried something like that first but stumbled on a problem. Suppose the remote client accesses the TWiki with URLs beginning with http://external.com/. My external Apache has to sit on a machine accessible externally via the name external.com and has to have a ProxyPass directive like this:
ProxyPass /twiki/ http://internal.com/twiki/
The internal Apache has to sit on a machine accessible via the name internal.com. Moreover, the $proxiedUrl variable has to be equal to http://internal.com/. Then, the pages generated by TWiki will have internal links pointing to http://internal.com/twiki/ which would travel all the way back to the remote client unchanged.
The problem is that the remote client doesn't know about internal.com, all it knows is external.com.
This is why I had to make it a mapping from the internal name to the external one. $proxiedUrlMap tells TWiki that when it receives a request for something beginning with http://internal.com/ it should generate pages with internal links pointing to http://external.com/ so that the remote client can follow them.
Thinking about it, I guess one could make your simplified solution work if one modifies the way the proxy machine resolves names. My external proxy resolves names from the cpqd.com.br zone by talking to our external DNS nameservers, which don't know about the names of internal machines. If I made it use some of our internal nameservers, it would resolve names from the cpqd.com.br zone by talking to a server which knows the internal machines' names. This would solve the problem, but at the expense of yet another external configuration dependency. I don't think it's worth it.
Does all of this make sense to you?
--
GustavoChaves - 25 Aug 2003
Well, not quite.
As I mentioned in my previous post, $proxiedUrl should be set to http://internal.com/, not http://external.com/, because when the connection is forwarded from the proxy/external host (http://external.com/) to the internal host (http://internal.com/), the external/proxy server will rewrite the URL from http://external.com/twiki/bin/view/etc to http://internal.com/twiki/bin/view/etc. And this is what http://internal.com/ will see. When the internal server sees that the URL matches $proxiedUrl, it will know that the connection is being proxied, so it had better use $defaultUrlHost to build the links.
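The two access paths can be checked with a small sketch, written in Python rather than Perl so it runs standalone. The function name `choose_url_host` is hypothetical. One detail is an observation about the patch, not something stated in the thread: the prefix is written without a trailing slash, as in the TWiki.cfg line, because $urlHost is captured without one and a trailing slash would never match.

```python
PROXIED_URL = "http://internal.com"       # what the proxy forwards to
DEFAULT_URL_HOST = "http://external.com"  # what remote clients know


def choose_url_host(url_host, proxied_url=PROXIED_URL,
                    default_url_host=DEFAULT_URL_HOST):
    """Use the default host when the request came in under the proxied name."""
    if proxied_url and url_host.startswith(proxied_url):
        return default_url_host + url_host[len(proxied_url):]
    return url_host


# Request forwarded by the proxy: links must point at the external name.
print(choose_url_host("http://internal.com"))
# Request made directly under another internal name: left untouched.
print(choose_url_host("http://wiki.cpqd.com.br"))
```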
As for accessing internally via a different internal URL (http://wiki.cpqd.com.br/twiki...), it should still work, because that (http://wiki.cpqd.com.br/) is what the internal server will use to build the links, since http://wiki.cpqd.com.br/ will not match the value of $proxiedUrl.
I am not sure I understand your explanation of DNS servers. It seems to me that the proxy server will have to know about the internal server (http://internal.com/), otherwise it cannot proxy connections to it. After all, proxying a connection means that your proxy has to be able to connect to the internal machine.
Am I missing something?
--
ArshavirGrigorian - 28 Aug 2003
I see. By setting $defaultUrlHost to http://external.com/ you don't need the explicit mapping. Fine.
Which leads me to another question: what is the purpose of $defaultUrlHost in the first place?
Regarding the DNS discussion, the problem is that internal.com has a private IP and is not registered in our external DNS zone. There are a few different ways to solve this problem. In our case, the /etc/hosts file of the proxy contains a record for internal.com. But this has little to do with the problem of this topic.
--
GustavoChaves - 02 Sep 2003
Well, from what I've been able to see through a quick grep of the code, this is the only place that $defaultUrlHost is used:
if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
    if( $doGetScriptUrlFromCgi ) {
        $scriptUrlPath = $2;
    }
    $urlHost = $1;
    if( $doRemovePortNumber ) {
        $urlHost =~ s/\:[0-9]+$//;
    }
} else {
    $urlHost = $defaultUrlHost;
}
which leaves me puzzled, too, as to what $defaultUrlHost is for (since there is always going to be a $theUrl).
Am I missing something?
--
ArshavirGrigorian - 02 Sep 2003
The TWiki libs are not just called from CGI scripts, they are also used by shell scripts like mailnotify. If used by a CGI script, the URL host is taken from the CGI environment; this allows you to access the same installation, say, under http and https. The $defaultUrlHost setting is reserved for shell scripts.
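The split between the CGI case and the shell-script case can be reproduced with a small port of the snippet quoted earlier in this thread. This is a sketch in Python rather than Perl so it runs standalone; the function name `resolve_url_host` and the sample URLs are illustrative assumptions, while the regular expressions are taken from the TWiki.pm code above.

```python
import re

# Same pattern as in TWiki.pm: group 1 is scheme plus host (and port),
# group 2 is the script URL path.
URL_RE = re.compile(r"^([^:]*://[^/]*)(.*)/.*$")


def resolve_url_host(the_url, default_url_host, remove_port=False):
    """Return the URL host from the request URL, or the default without one."""
    match = URL_RE.match(the_url) if the_url else None
    if match and match.group(2):
        url_host = match.group(1)
        if remove_port:
            url_host = re.sub(r":[0-9]+$", "", url_host)
        return url_host
    # No usable request URL, for example when called from a shell script.
    return default_url_host


DEFAULT = "http://www.grigorians.org"
print(resolve_url_host("http://tsit.grigorians.org/twiki/bin/view", DEFAULT))
print(resolve_url_host("", DEFAULT))
```

Because a CGI request always supplies a URL, the default is never consulted there, which matches the behaviour reported above.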
--
PeterThoeny - 03 Sep 2003
OK. Getting back to the original question, what is the purpose of having a base tag in the topic's header and some absolute references in its body? These things make it difficult to put TWiki behind a reverse proxy. If all links in the topic were relative, the problem wouldn't exist.
--
GustavoChaves - 09 Sep 2003
TWiki allows you to omit parts of the URL (e.g. the topic name, giving you WebHome, or the web name as well, giving you Main.WebHome) - the result is that purely relative URLs would not always work (depending on how you got to the page, or at least to WebHome). The BASE tag helps work around this, but TWiki doesn't implement this consistently. And in fact, if the WebHome page is written or rendered carefully (particularly Main.WebHome) it might be possible to avoid using the BASE tag completely.
See Google:twiki+base+tag for quite a lot of earlier discussion on this.
--
RichardDonkin - 09 Sep 2003
The topic I found most interesting is WhyBaseTag. However, despite the fact that it is tagged as BugRejected, it seems to be inconclusive. I guess the last anonymous remarks summarize the problem quite well, i.e., if all links were purely relative (without even an absolute path) it should work.
Another interesting topic is RelativeURLs, in which PeterThoeny explains the need for absolute URLs citing PageRedirectionNotWorking, in which this need is justified on the grounds that "some Perl environments require a complete URL (including host name)". I'm definitely not an HTTP master, so don't take too seriously what I'm going to say, but I think his rationale is mistaken. The book "CGI Programming with Perl" (2nd edition) says on page 53:
[redirects to] an absolute URL or to a relative URL with a relative path is sent back to the browser, which then creates another request for the new URL. A relative URL with a full path produces an internal redirect. An internal redirect is handled by the web server without talking to the browser. It gets the content of the new resource as if it had received a new request, but it then returns the content for the new resource as if it is the output of your CGI script. This avoids a network response and request; the only difference to users is a faster response. The URL displayed by their browser does not change for internal redirects; it continues to show the URL of the original CGI script.
I think the problem cited on PageRedirectionNotWorking was caused by the redirection to relative URLs with full paths. The solution taken was to substitute absolute URLs for them. I guess they could be solved by using relative URLs with relative paths instead. And then, there probably would not be any other need for absolute URLs, solving the reverse proxy problem as a side effect.
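The three cases from the quoted passage can be told apart mechanically. The sketch below, in Python so it runs standalone, only encodes the rule as the book states it; the function name `classify_location` and the sample values are hypothetical, and actual web servers may differ in details.

```python
from urllib.parse import urlsplit


def classify_location(location):
    """Classify a redirect target following the rule in the quoted passage.

    Returns "client" when the browser is asked to make a new request and
    "internal" when the web server handles the redirect itself.
    """
    parts = urlsplit(location)
    if parts.scheme or parts.netloc:
        return "client"    # absolute URL
    if location.startswith("/"):
        return "internal"  # relative URL with a full path
    return "client"        # relative URL with a relative path


for target in ("http://external.com/twiki/bin/view/Main/WebHome",
               "/twiki/bin/view/Main/WebHome",
               "../Main/WebHome"):
    print(target, "->", classify_location(target))
```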
A final interesting topic (external) is TwikiEnhancements, where Eli Mantel says he made some "enhancements" to TWiki, one of them being the removal of the "base href" tag. It's not clear if this led to any problems though.
--
GustavoChaves - 10 Sep 2003
I am having this same exact problem: our "content" server is behind the firewall. It is only accessible via its "external" name, but TWiki is setting the URL for "pictures" (stuff in /twiki/pub) to the internal hostname. I only recently realized this when getting certificate errors for the internal server while trying to access the external server.
I "worked around" the problem by "forcing" urlHost like this:
# initialize $urlHost and $scriptUrlPath
if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
    if( $doGetScriptUrlFromCgi ) {
        $scriptUrlPath = $2;
    }
    $urlHost = $1;
    if( $doRemovePortNumber ) {
        $urlHost =~ s/\:[0-9]+$//;
    }
    # Added by TommyMcNeely to force urlHost because of rProxy
    $urlHost = $defaultUrlHost;
} else {
    $urlHost = $defaultUrlHost;
}
However, after reading this ProxyConfiguration topic I have noticed that $defaultUrlHost was for shell scripts, which would explain why they don't work anymore :). I am running the latest beta (19 Jan 2004), and can see no mention of the word "proxy" anywhere in the config or TWiki.pm. Is something like this available in alpha code, or are we doomed to hack in our "external" host like I did above (perhaps using a different variable)? I kind of like the idea of having proxyHost: then if ($proxyHost), see if proxyHost is in urlHost, and if it's not, set urlHost to proxyHost. Right?
--
TommyMcNeely - 08 Feb 2004
Back to the problem with mod_proxy. mod_proxy receives the external HTTP request and forwards it to the configured internal server, thereby changing the Host header of the request to match that of the URL of the ProxyPass directive. This is exactly what's causing the problem. If the Host header remained unchanged, TWiki would see the original (external) host and port of the request (i.e., $theUrl would be the external URL) and would generate the links correctly with the external URL and port. For Apache 1.x, there doesn't seem to be a way to prevent mod_proxy from doing this. For Apache 2.x however, mod_proxy supports a new directive, ProxyPreserveHost, which does what it says: the Host header is passed to the target server unmodified. This would solve the problem.
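As a sketch of what that could look like on an Apache 2.x proxy, here is a config fragment that reuses the host names from the original question. It is an untested assumption that this fits that particular setup, which was running Apache 1.3.28 at the time and would need an upgrade first.

```apache
# On the external (proxy) Apache 2.x server.
# Pass the client's original Host header through to the internal server.
ProxyPreserveHost On

ProxyPass        /twiki/ http://tsit.grigorians.org/twiki/
ProxyPassReverse /twiki/ http://tsit.grigorians.org/twiki/
```

With the Host header preserved, the internal server may also need to accept requests for the external name, for example through its ServerName or ServerAlias settings.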
If you really want to fiddle with name resolution, the easiest way for an Apache 1.x proxy would be to tweak its /etc/hosts to map the external hostname of the proxy (including the domain) to the internal server's IP address. This way, the ProxyPass directive could use the external URL (which would be converted to the internal IP address by /etc/hosts), and the internal server would receive the external URL, as desired. But this will not work in environments where such hostname mapping is not desired or not possible.
-- Thomas Schürger (thomas@schuergerPLEASENOSPAM.com) - 12 May 2005
A very flexible solution
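HTTP_X_FORWARDED_HOST is set by the Apache proxy modules (mod_proxy on Apache 2 adds an X-Forwarded-Host request header, which CGI scripts see as HTTP_X_FORWARDED_HOST).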
/usr/local/src/twiki_1/lib/TWiki.pm
if( ( $theUrl ) && ( $theUrl =~ m!^([^:]*://[^/]*)(.*)/.*$! ) && ( $2 ) ) {
if( $doGetScriptUrlFromCgi ) {
$scriptUrlPath = $2;
}
$urlHost = $1;
+ # URL conversion for TWiki when accessed by a reverse proxy.
+ # Tested with Apache2 (ProxyPass and RewriteRule)
+ if ( $ENV{'HTTP_X_FORWARDED_HOST'} ) {
+ $urlHost = "https://".$ENV{'HTTP_X_FORWARDED_HOST'};
+ }
if( $doRemovePortNumber ) {
Maybe someone can find a solution for the hard-coded "https".
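One possible direction for the scheme, sketched in Python rather than Perl so it runs standalone: read it from a second forwarded header. This rests on an assumption that is not part of the patch above, namely that the proxy is configured to send an X-Forwarded-Proto header (for instance with mod_headers' RequestHeader directive), since mod_proxy does not add it by default. The function name `forwarded_url_host` is hypothetical.

```python
def forwarded_url_host(environ, url_host):
    """Build the URL host from forwarded headers, falling back to url_host.

    environ is a CGI-style environment mapping. HTTP_X_FORWARDED_PROTO is
    an ASSUMED header that the proxy must be configured to send.
    """
    # With several proxies in a chain the header can hold a comma-separated
    # list; the first entry is the host the client originally asked for.
    host = environ.get("HTTP_X_FORWARDED_HOST", "").split(",")[0].strip()
    if not host:
        return url_host
    scheme = environ.get("HTTP_X_FORWARDED_PROTO", "").split(",")[0].strip().lower()
    if scheme not in ("http", "https"):
        # No usable hint from the proxy: keep the scheme of the request
        # that reached TWiki.
        scheme = url_host.split("://", 1)[0]
    return scheme + "://" + host


env = {"HTTP_X_FORWARDED_HOST": "www.cpqd.com.br",
       "HTTP_X_FORWARDED_PROTO": "https"}
print(forwarded_url_host(env, "http://java.cpqd.com.br"))
```

Forwarded headers can be supplied by any client that reaches the internal server directly, so this approach is only reasonable when the internal server is reachable solely through the proxy.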
--
DirkHeitzmann - 06 Oct 2005