Question

Hi,

I am having trouble running TWiki behind a proxy web server. I have one webserver that is directly connected to the Internet, does firewall/NAT, and also runs an Apache webserver with the following 2 lines in httpd.conf:

ProxyPass /twiki/ http://tsit.grigorians.org/twiki/
ProxyPassReverse /twiki/ http://tsit.grigorians.org/twiki/
The firewall machine is: grigorians.org, while the TWiki server is tsit.grigorians.org and is not accessible from the Internet.

What happens is that when I click on the "Get started" link off of the starting page, the resulting page has the following <base> value:

<base href="http://tsit.grigorians.org/twiki/bin/view/Main/WebHome" />
And even though all the links on the page are relative, clicking them results in a request to tsit.grigorians.org instead of just grigorians.org. All the images are also broken for the same reason. Also, the name of the internal host changes if I use a non-fully-qualified host name in the ProxyPass configuration. For example, if I configure ProxyPass as:
ProxyPass /twiki/ http://tsit/twiki/
The resulting tag will just have the host name "tsit" without the domain "grigorians.org"

At this point, I have tried a lot of different configurations, including setting the ServerName on the internal box to that of the firewall box, changing the UseCanonicalName, etc. I have used mod_proxy before and it has always worked like a charm. I can't figure out what the issue is now.

Thanks in advance for any help/pointers.

Arshavir

Environment

TWiki version: TWikiRelease01Feb2003
TWiki plugins: DefaultPlugin, EmptyPlugin, InterwikiPlugin
Server OS: RedHat Linux 9.0, kernel 2.4.18-14
Web server: Apache 1.3.28
Perl version: 5.8.0
Client OS: RedHat Linux 7.2, kernel 2.4.18
Web Browser: Mozilla 1.4

-- ArshavirGrigorian - 14 Aug 2003

Answer

You should change your ProxyPass directives so that they are inside a LocationMatch block, like this:

   <LocationMatch "/twiki">
      ProxyPass http://tsit.grigorians.org/twiki
      ProxyPassReverse http://tsit.grigorians.org/twiki
   </LocationMatch>
Or something like that...

-- FrancisLiu - 18 Aug 2003

Thanks for trying, Francis, but what you suggest is not a valid httpd.conf construct, and I am not quite sure why it would make a difference.

-- ArshavirGrigorian - 18 Aug 2003

If you will refer to http://httpd.apache.org/docs-2.0/mod/mod_proxy.html#proxypass, you'll see that it is valid syntax. But, yes you're correct, I've recompared the config I thought I had with the config that I really do have, and it won't do what you need...

Anyway, from the inside, do you need to see it as tsit.grigorians.org? What if your inside twiki thought it was running on grigorians.org? ie, use grigorians.org as the defaultUrlHost?

-- FrancisLiu - 19 Aug 2003

Well, I am actually using only Apache 1.3.28, not 2.0. The document you referenced is 2.0 specific. Maybe it's time to switch ... As for the defaultUrlHost variable, I have always had it set to "http://www.grigorians.org" but that doesn't seem to make one bit of difference ... the links are still generated using whatever I use in the ProxyPass/ProxyPassReverse directives.

-- ArshavirGrigorian - 19 Aug 2003

I have exactly the same problem and was about to ask about it when I saw your question from... yesterday! From what I have seen, I guess that vanilla TWiki is simply unable to handle this. Tomorrow I'm going to try a little tweak in &TWiki::initialize. I want to do something like this to set $urlHost:

    if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
        if( $doGetScriptUrlFromCgi ) {
            $scriptUrlPath = $2;
        }
        $urlHost = $1;
        if( $urlHost =~ $proxyPrefix ) {
            $urlHost = $defaultUrlHost;
        }
        if( $doRemovePortNumber ) {
            $urlHost =~ s/\:[0-9]+$//;
        }
    } else {
        $urlHost = $defaultUrlHost;
    }

My intention is that $proxyPrefix is the prefix used in the last argument of the ProxyPass directive.

Well, I'm at home now and can't test it from here. I'll let you know if it works tomorrow.

-- GustavoChaves - 20 Aug 2003

OK, I just did it and it seems to work out fine. I made two small modifications on the files lib/TWiki.cfg and lib/TWiki.pm.

On TWiki.cfg I inserted the following lines just below the one defining the $defaultUrlHost variable (but it could be anywhere):

  #                   URL conversion for TWiki when accessed by a reverse proxy.
  $proxiedUrlMap    = [ "http://java.cpqd.com.br" => "http://www.cpqd.com.br" ];

The default value for this variable should be undef. In my case, I put the following line on the httpd.conf of my Apache:

  RewriteRule ^/twiki(.*) http://java.cpqd.com.br/twiki$1 [P,L]

Since my site is at http://www.cpqd.com.br/ this way I configured it to make a reverse proxy from http://www.cpqd.com.br/twiki... to http://java.cpqd.com.br/twiki.... java.cpqd.com.br is a machine in our internal network which is normally accessed with the defaultUrlHost of http://wiki.cpqd.com.br/twiki.... What I want is to tell TWiki that IF it's accessed with another name (in this case, java.cpqd.com.br) it should use a different base URL for the generated links.

OK, the modification in TWiki.pm is this:

--- TWiki.pm.orig.20030201      2003-08-21 15:58:34.000000000 -0300
+++ TWiki.pm    2003-08-21 16:18:38.000000000 -0300
@@ -51,7 +51,7 @@
 use vars qw(
         $webName $topicName $includingWebName $includingTopicName
         $defaultUserName $userName $wikiName $wikiUserName
-        $wikiHomeUrl $defaultUrlHost $urlHost
+        $wikiHomeUrl $defaultUrlHost $proxiedUrlMap $urlHost
         $scriptUrlPath $pubUrlPath $viewScript
         $pubDir $templateDir $dataDir $logDir $twikiLibDir
         $siteWebTopicName $wikiToolName $securityFilter $uploadFilter
@@ -354,6 +354,14 @@
             $scriptUrlPath = $2;
         }
         $urlHost = $1;
+
+       if( $proxiedUrlMap ) {
+           my $len = length $proxiedUrlMap->[0];
+           if( $len && substr($urlHost, 0, $len) eq $proxiedUrlMap->[0] ) {
+               substr($urlHost, 0, $len) = $proxiedUrlMap->[1];
+           }
+       }
+
         if( $doRemovePortNumber ) {
             $urlHost =~ s/\:[0-9]+$//;
         }

And it worked. (Right now I have a RewriteCond just above the RewriteRule directive telling it to disallow external references because I still have to block some content. So don't worry if you can't access it.)
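
For anyone who wants to try the prefix substitution outside TWiki first, here is a minimal stand-alone sketch of the same idea. The function name map_proxied_host is hypothetical (it is not part of TWiki); the map values are the ones from my TWiki.cfg above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patch above; not part of TWiki.
# $map is a two-element array ref: [ internal URL prefix, external URL prefix ].
sub map_proxied_host {
    my ( $urlHost, $map ) = @_;
    return $urlHost unless $map && @$map == 2;
    my ( $from, $to ) = @$map;
    my $len = length $from;
    if ( $len && substr( $urlHost, 0, $len ) eq $from ) {
        substr( $urlHost, 0, $len ) = $to;    # replace only the leading prefix
    }
    return $urlHost;
}

my $proxiedUrlMap = [ "http://java.cpqd.com.br" => "http://www.cpqd.com.br" ];

# Request that arrived through the reverse proxy: host is rewritten.
print map_proxied_host( "http://java.cpqd.com.br", $proxiedUrlMap ), "\n";
# Request under the normal internal name: host is left alone.
print map_proxied_host( "http://wiki.cpqd.com.br", $proxiedUrlMap ), "\n";
```

The first call prints http://www.cpqd.com.br and the second prints http://wiki.cpqd.com.br unchanged. An undefined map leaves the host untouched, which matches the suggested default of undef.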

What do you think? Does it work for you?

-- GustavoChaves - 21 Aug 2003

That certainly works and is a very flexible solution, in case one would want to proxy the TWiki server through different names.

Proxy A has:

ProxyPass /twiki/ http://java.cpqd.com.br/twiki/

Proxy B has:

ProxyPass /twiki/ http://perl.cpqd.com.br/twiki/

While both http://java.cpqd.com.br/ and http://perl.cpqd.com.br/ point to the same machine. I am not sure when you would use something like this, though.

A simpler (but not as flexible) version of your solution could be the following:

--- /usr/local/src/twiki/lib/TWiki.pm   2003-02-01 19:55:21.000000000 -0500
+++ /usr/local/src/twiki_1/lib/TWiki.pm 2003-08-21 22:44:27.000000000 -0400
@@ -51,7 +51,7 @@
 use vars qw(
         $webName $topicName $includingWebName $includingTopicName
         $defaultUserName $userName $wikiName $wikiUserName
-        $wikiHomeUrl $defaultUrlHost $urlHost
+        $wikiHomeUrl $defaultUrlHost $proxiedUrl $urlHost
         $scriptUrlPath $pubUrlPath $viewScript
         $pubDir $templateDir $dataDir $logDir $twikiLibDir
         $siteWebTopicName $wikiToolName $securityFilter $uploadFilter
@@ -354,6 +354,14 @@
             $scriptUrlPath = $2;
         }
         $urlHost = $1;
+
+       if( $proxiedUrl ) {
+           my $len = length $proxiedUrl;
+           if( $len && substr($urlHost, 0, $len) eq $proxiedUrl ) {
+                substr($urlHost, 0, $len) = $defaultUrlHost;
+            }
+       }
+
         if( $doRemovePortNumber ) {
             $urlHost =~ s/\:[0-9]+$//;
         }

while TWiki.cfg would have:

$proxiedUrl = "http://tsit.grigorians.org";

What do you think? Any thoughts from TWiki developers?

-- ArshavirGrigorian - 21 Aug 2003

I tried something like that first but stumbled on a problem. Suppose the remote client accesses the twiki with URLs beginning with http://external.com/. My external apache has to sit on a machine accessible externally via the name external.com and has to have a ProxyPass directive like this:

   ProxyPass /twiki/ http://internal.com/twiki/

The internal apache has to sit on a machine accessible via the name internal.com. Moreover, the $proxiedUrl variable has to be equal to http://internal.com/. Then, the pages generated by TWiki will have internal links pointing to http://internal.com/twiki/ which would travel all the way back to the remote client unchanged.

The problem is that the remote client doesn't know about internal.com, all it knows is external.com.

This is why I had to make it a mapping from the internal name to the external one. $proxiedUrlMap tells TWiki that when it receives a request for something beginning with http://internal.com/ it should generate pages with internal links pointing to http://external.com/ so that the remote client can follow them.

Thinking about it, I guess one could make your simplified solution work if one modifies the way the proxy machine resolves names. My external proxy resolves names from the cpqd.com.br zone by talking to our external DNS nameservers, which don't know about the names of internal machines. If I made it use some of our internal nameservers, it would resolve names from the cpqd.com.br zone by talking to a server which knows the internal machine names. This would solve the problem, but at the expense of yet another external configuration dependency. I don't think it's worth it.

Does all of this make sense to you?

-- GustavoChaves - 25 Aug 2003

Well, not quite.

As I mentioned in my previous post, $proxiedUrl should be set to http://internal.com/, not http://external.com/, because when the connection is forwarded from the proxy/external host (http://external.com/) to the internal host (http://internal.com/), the external/proxy server will rewrite the URL from http://external.com/twiki/bin/view/etc to http://internal.com/twiki/bin/view/etc. And this is what http://internal.com/ will see. When the internal server sees that the URL matches $proxiedUrl, it will know that the connection is being proxied, so it had better use $defaultUrlHost to build the links.

As for accessing internally via a different internal URL (http://wiki.cpqd.com.br/twiki...), it should still work, because that's (http://wiki.cpqd.com.br/) what the internal server will use to build the links. (since http://wiki.cpqd.com.br/ will not match the value of $proxiedUrl).
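
To make the two cases concrete, here is a small sketch of the simplified variant on its own. The function name choose_url_host is hypothetical, and internal.com / external.com are just the example names used in this discussion. One detail worth checking: the regex in TWiki.pm captures $urlHost without a trailing slash, so the sketch assumes $proxiedUrl is written without one as well.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki: if the request URL host starts with
# $proxiedUrl, the request came through the proxy, so use $defaultUrlHost.
sub choose_url_host {
    my ( $urlHost, $proxiedUrl, $defaultUrlHost ) = @_;
    my $len = defined $proxiedUrl ? length $proxiedUrl : 0;
    if ( $len && substr( $urlHost, 0, $len ) eq $proxiedUrl ) {
        substr( $urlHost, 0, $len ) = $defaultUrlHost;
    }
    return $urlHost;
}

my $proxiedUrl     = "http://internal.com";    # no trailing slash (assumption)
my $defaultUrlHost = "http://external.com";

# Forwarded by the proxy: TWiki sees the internal name, links use the external one.
print choose_url_host( "http://internal.com", $proxiedUrl, $defaultUrlHost ), "\n";
# Direct internal access under another name: nothing matches, nothing changes.
print choose_url_host( "http://wiki.cpqd.com.br", $proxiedUrl, $defaultUrlHost ), "\n";
```

The first call prints http://external.com; the second prints http://wiki.cpqd.com.br.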

I am not sure I understand your explanation of DNS servers. It seems to me that the proxy server will have to know about the internal server (http://internal.com/), otherwise it cannot proxy connections to it. After all, proxying a connection means that your proxy has to be able to connect to the internal machine.

Am I missing something?

-- ArshavirGrigorian - 28 Aug 2003

I see. By setting $defaultUrlHost to http://external.com/ you don't need the explicit mapping. Fine.

Which leads me to another question: what is the purpose of $defaultUrlHost in the first place?

Regarding the DNS discussion, the problem is that internal.com has a private IP and is not registered in our external DNS zone. There are a few different ways to solve this problem. In our case, the /etc/hosts file of the proxy contains a record for internal.com. But this has little to do with the problem of this topic.

-- GustavoChaves - 02 Sep 2003

Well, from what I've been able to see in a quick grep through the code, this is the only place that $defaultUrlHost is used:

    if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
        if( $doGetScriptUrlFromCgi ) {
            $scriptUrlPath = $2;
        }
        $urlHost = $1;

        if( $doRemovePortNumber ) {
            $urlHost =~ s/\:[0-9]+$//;
        }
    } else {
        $urlHost = $defaultUrlHost;
    }

which leaves me puzzled, too, as to what $defaultUrlHost is for (since there is always going to be a $theUrl). Am I missing something?

-- ArshavirGrigorian - 02 Sep 2003

The TWiki libs are not just called from CGI-scripts, they are also used by shell scripts like mailnotify. If used by a CGI-script, the URL host is taken from the CGI environment, this allows you to access the same installation, say, under http and https. The $defaultUrlHost setting is reserved for shell scripts.
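
The two cases can be sketched in isolation like this. The function name pick_url_host is hypothetical and is not the TWiki API; the regex is the one already used in TWiki.pm, and the host names are only examples.

```perl
use strict;
use warnings;

# Hypothetical helper, not the TWiki API: returns the URL host used for links.
sub pick_url_host {
    my ( $theUrl, $defaultUrlHost ) = @_;
    if ( $theUrl && $theUrl =~ m!^([^:]*://[^/]*)(.*)/.*$! && $2 ) {
        return $1;    # CGI request: scheme and host come from the request URL
    }
    return $defaultUrlHost;    # shell script (e.g. mailnotify): no request URL
}

# Called from a CGI script, over https:
print pick_url_host( "https://grigorians.org/twiki/bin/view", "http://www.grigorians.org" ), "\n";
# Called from a shell script, where there is no URL at all:
print pick_url_host( undef, "http://www.grigorians.org" ), "\n";
```

The first call prints https://grigorians.org and the second falls back to http://www.grigorians.org.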

-- PeterThoeny - 03 Sep 2003

OK. Getting back to the original question, what is the purpose of having a base tag in the topic's header and some absolute references in its body? These things make it difficult to put TWiki behind a reverse proxy. If all links in the topic were relative, the problem wouldn't exist.

-- GustavoChaves - 09 Sep 2003

TWiki allows you to omit parts of the URL (e.g. the topic name, giving you WebHome, or the web name as well, giving you Main.WebHome) - the result is that purely relative URLs would not always work (depending on how you got to the page, or at least to WebHome). The BASE tag helps work around this, but TWiki doesn't implement this consistently. And in fact, if the WebHome page is written or rendered carefully (particularly Main.WebHome) it might be possible to avoid using the BASE tag completely.

See Google:twiki+base+tag for quite a lot of earlier discussion on this.

-- RichardDonkin - 09 Sep 2003

I found the topic WhyBaseTag most interesting. However, despite the fact that it is tagged as BugRejected, it seems to be inconclusive. I guess the last anonymous remarks summarize the problem quite well, i.e., if all links were purely relative (without even an absolute path) it should work.

Another interesting topic is RelativeURLs, in which PeterThoeny explains the need for absolute URLs citing PageRedirectionNotWorking, in which this need is justified on the grounds that "some Perl environments require a complete URL (including host name)". I'm definitely not an HTTP master, so don't take what I'm going to say too seriously, but I think his rationale is mistaken. The book "CGI Programming with Perl" (2nd edition) says on page 53:

[redirects to] an absolute URL or to a relative URL with a relative path is sent back to the browser, which then creates another request for the new URL. A relative URL with a full path produces an internal redirect. An internal redirect is handled by the web server without talking to the browser. It gets the content of the new resource as if it had received a new request, but it then returns the content for the new resource as if it is the output of your CGI script. This avoids a network response and request; the only difference to users is a faster response. The URL displayed by their browser does not change for internal redirects; it continues to show the URL of the original CGI script.

I think the problem cited on PageRedirectionNotWorking was caused by the redirection to relative URLs with full paths. The solution taken was to substitute absolute URLs for them. I guess they could be solved by using relative URLs with relative paths instead. And then, there probably would not be any other need for absolute URLs, solving the reverse proxy problem as a side effect.

A final interesting topic (external) is TwikiEnhancements, where Eli Mantel describes some "enhancements" made to TWiki, one of them being the removal of the "base href" tag. It's not clear if this led to any problems though.

-- GustavoChaves - 10 Sep 2003

I am having this exact same problem: our "content" server is behind the firewall and is only accessible via its "external" name, but TWiki is setting the URL for "pictures" (stuff in /twiki/pub) to the internal hostname. I only recently realized this when getting certificate errors for the internal server while trying to access the external server :) I "worked around" the problem by "forcing" urlHost like this:

    # initialize $urlHost and $scriptUrlPath
    if( ( $theUrl ) && ( $theUrl =~ /^([^\:]*\:\/\/[^\/]*)(.*)\/.*$/ ) && ( $2 ) ) {
        if( $doGetScriptUrlFromCgi ) {
            $scriptUrlPath = $2;
        }
        $urlHost = $1;
        if( $doRemovePortNumber ) {
            $urlHost =~ s/\:[0-9]+$//;
        }
        # Added by TommyMcNeely to force urlHost because of rProxy
        $urlHost = $defaultUrlHost;
    } else {
        $urlHost = $defaultUrlHost;
    }

However, after reading this ProxyConfiguration topic I noticed that $defaultUrlHost is for shell scripts, which would explain why they don't work anymore :). I am running the latest beta (19 Jan 2004), and can see no mention of the word "proxy" anywhere in the config or TWiki.pm. Is something like this available in alpha code, or are we doomed to hack in our "external" host like I did above (perhaps using a different variable)? I kind of like the idea of having a $proxyHost: if $proxyHost is set, check whether it appears in $urlHost, and if it's not, set $urlHost to $proxyHost. Right?

-- TommyMcNeely - 08 Feb 2004

Back to the problem with mod_proxy. mod_proxy receives the external HTTP request and forwards it to the configured internal server, thereby changing the Host header of the request to match that of the URL of the ProxyPass directive. This is exactly what's causing the problem. If the Host header remained unchanged, TWiki would see the original (external) host and port of the request (i.e., $theUrl would be the external URL) and would generate the links correctly with the external URL and port. For Apache 1.x, there doesn't seem to be a way to prevent mod_proxy from doing this. For Apache 2.x however, mod_proxy supports a new directive ProxyPreserveHost, which does what it says: the Host header is passed to the target server unmodified. This would solve the problem.
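
As a sketch only, the front-end proxy configuration from the original question would then look roughly like this on Apache 2.x (ProxyPreserveHost is available in Apache 2.0.31 and later). The host names are the ones from the question; I am assuming the internal server is set up to answer requests whose Host header carries the external name.

```apache
# Apache 2.x front-end proxy (sketch, untested against this exact setup)
ProxyPreserveHost On
ProxyPass        /twiki/ http://tsit.grigorians.org/twiki/
ProxyPassReverse /twiki/ http://tsit.grigorians.org/twiki/
```

With this in place, no patch to TWiki.pm should be needed, because the host in the URL TWiki sees is already the external one.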

If you really want to fiddle with name resolution, the easiest way for an Apache 1.x proxy would be to tweak its /etc/hosts to map the external hostname of the proxy (including the domain) to the internal server's IP address. This way, the ProxyPass directive could use the external URL (which would be converted to the internal IP address by /etc/hosts), and the internal server would receive the external URL, as desired. But this will not work in environments where such hostname mapping is not desired or not possible.

-- Thomas Schürger (thomas@schuergerPLEASENOSPAM.com) - 12 May 2005

A very flexible solution


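HTTP_X_FORWARDED_HOST is the CGI environment variable for the X-Forwarded-Host request header, which Apache 2's mod_proxy adds when it acts as a reverse proxy.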

/usr/local/src/twiki_1/lib/TWiki.pm

    if( ( $theUrl ) && ( $theUrl =~ m!^([^:]*://[^/]*)(.*)/.*$! ) && ( $2 ) ) {
        if( $doGetScriptUrlFromCgi ) {
            $scriptUrlPath = $2;
        }
        $urlHost = $1;

+       # URL conversion for TWiki when accessed by a reverse proxy.
+       # Tested with Apache2 (ProxyPass and RewriteRule)
+       if ( $ENV{'HTTP_X_FORWARDED_HOST'} ) {
+           $urlHost = "https://".$ENV{'HTTP_X_FORWARDED_HOST'};
+       }

        if( $doRemovePortNumber ) {

Maybe someone can find a solution for the hard-coded "https".
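
One possible direction, as a sketch only: take the scheme from a second forwarded header if the proxy provides one, and otherwise keep the scheme of the URL TWiki saw. The function name forwarded_url_host is hypothetical. The X-Forwarded-Proto header is an assumption: mod_proxy does not add it by itself, so the front-end proxy would have to set it explicitly (for example with mod_headers in the SSL virtual host).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TWiki. $env is a hash ref such as \%ENV.
sub forwarded_url_host {
    my ( $urlHost, $env ) = @_;
    my $fwdHost = $env->{HTTP_X_FORWARDED_HOST};
    return $urlHost unless defined $fwdHost && length $fwdHost;

    # With chained proxies the header may hold a comma-separated list;
    # the first entry is the host the client originally asked for.
    ( $fwdHost ) = split /\s*,\s*/, $fwdHost;

    # ASSUMPTION: the front-end proxy sets X-Forwarded-Proto itself.
    # If it is missing or odd, keep the scheme of the URL TWiki saw.
    my $scheme = $env->{HTTP_X_FORWARDED_PROTO};
    if ( !defined $scheme || $scheme !~ /^https?$/i ) {
        ( $scheme ) = $urlHost =~ m!^([^:]+)://!;
        $scheme = 'http' unless defined $scheme;
    }
    return lc( $scheme ) . '://' . $fwdHost;
}

print forwarded_url_host(
    "http://tsit.grigorians.org",
    { HTTP_X_FORWARDED_HOST => "grigorians.org", HTTP_X_FORWARDED_PROTO => "https" }
), "\n";
```

This prints https://grigorians.org. Without the forwarded host the function returns $urlHost unchanged, so direct internal access keeps working. These headers should only be trusted when the internal server is reachable solely through the proxy, as in the setups described in this topic.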

-- DirkHeitzmann - 06 Oct 2005

Topic revision: r21 - 2005-10-06 - DirkHeitzmann
Copyright © 1999-2026 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.