
Feature Proposal: Add getUrl to TWiki::Func

Motivation

TWiki has TWiki::Net::getUrl to fetch a web page via HTTP, but extensions that need to retrieve web content have no official function to call. Their authors are left to:

  • Reinvent the wheel (such as HeadlinesPlugin)
  • Call the unofficial TWiki::Net::getUrl (such as BlackListPlugin) and pray that the function does not change (which just happened in 4.1)
  • Wait for an API function

Description and Documentation

Add the TWiki::Net::getUrl() functionality to TWiki::Func for extensions to use.

Simple case:

    my $response = TWiki::Func::getExternalContent( $url );
    $text = $response->content() unless( $response->is_error() );

Impact and Available Solutions

WhatDoesItAffect: Plugins, Published API
AffectedExtensions:  
HaveQuickFixFor:  

Implementation

Currently implemented, but spec is under debate at this time.

-- Contributors: PeterThoeny

Discussion

Possibly simplify the parameters? I think a single URL is easier than separate parameters for protocol, host, port, path, user, password.

-- PeterThoeny - 12 Dec 2006

I think it would be much better to recommend that plugin authors use LWP, for the following reasons:

  1. LWP handles https and many other protocols
  2. LWP handles POST, so you can pass complex parameters. getUrl is fundamentally limited to the (ill-defined) URL length limit, which is often a PITA for plugin authors.
  3. LWP handles remote authentication challenge/response cleanly
  4. LWP parses the response and handles status codes for you
  5. LWP is well tested, and is already available on most platforms
  6. Requires no change to the API
On the downside, as you have pointed out previously, LWP imposes significant excise.

-- CrawfordCurrie - 13 Dec 2006

Erm, TWiki::Net::getUrl is already using LWP if installed, with fallback to low level socket handling. TWiki is evolving into an OS for extensions and applications; the OS should provide a way to interact via http to the outside world.

-- PeterThoeny - 13 Dec 2006

You are missing my point. I want to ditch the internal sockets implementation of getUrl and switch over to using LWP exclusively. The current getUrl interface is a lowest common denominator and hides almost all of the "juicy bits" of LWP. As such it rapidly runs out of steam in real applications, and certainly can't be classed as adequate for an OS.

-- CrawfordCurrie - 14 Dec 2006

No, not all environments have LWP installed (notably Solaris), and we cannot arbitrarily raise the complexity of installing TWiki. Therefore we need a fallback. The current fallback of using internal sockets implementation decreases the likelihood of failed installation attempts.

-- PeterThoeny - 14 Dec 2006

Crawford added something to Func. Please share the details here, and please follow our TWikiRelease04x01Process. For now I added this topic to TWikiFeature04x02.

-- PeterThoeny - 07 Feb 2007

I have seen that the implemented function is TWiki::Func::GET(). What is the reason to not use the suggested TWiki::Func::getUrl() name? GET() is not descriptive, it does not mean anything in the Func context.

-- PeterThoeny - 08 Feb 2007

Thanks for adding it to the process docs. :-)

The reason I avoided getUrl was simply that the xxxUrl meme is already established in Func to mean "get a Url" - for example, getScriptUrl, getPubUrl etc. I didn't want to introduce a function with a similar name, but utterly different semantics. GET suggested itself because it is an implementation of http GET.

Here's the doc I wrote. Note that the point about HTTP/1.0 URLs is hearsay, as I don't really understand why it can't be trusted, but I thought it safer to include that note. Perhaps someone more knowledgeable can advise.

=pod

getExternalResource( $url ) -> $response

Get whatever is at the other end of a URL (using an HTTP GET request). Will only work for encrypted protocols such as https if the LWP CPAN module is installed.

Note that the $url may have an optional user and password, as specified by the relevant RFC. Any proxy set in configure is honoured.

The $response is an object that is known to implement the following subset of the methods of HTTP::Response (the response class that LWP returns). It may in fact be an HTTP::Response object, but it may also not be if LWP is not available, so callers may only assume the following subset of methods is available:

code()
message()
header($field)
content()
is_error()
is_redirect()

Note that if LWP is not available, this function:

  1. Can only really be trusted for HTTP/1.0 URLs. If HTTP/1.1 or another protocol is required, you are strongly recommended to require LWP.
  2. Will not parse multipart content.

In the event of the server returning an error, then is_error() will return true, code() will return a valid HTTP status code in the range 100..507 as specified in RFC 2616 and RFC 2518, and message() will return the message that was received from the server. In the event of a client-side error (e.g. an unparseable URL) then is_error() will return true and message() will return an explanatory message. code() will return 400 (BAD REQUEST).

You can identify valid HTTP status codes using the HTTP::Status CPAN module.

Note: Callers can easily check the availability of other HTTP::Response methods as follows:

my $response = TWiki::Func::getExternalResource($url);
if (!$response->is_error() && $response->isa('HTTP::Response')) {
    # other methods of HTTP::Response may be called
} else {
    # only the methods listed above may be called
}

Since: TWiki::Plugins::VERSION 1.12

=cut

-- CrawfordCurrie - 08 Feb 2007
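
As a rough sketch of what coding against only the documented subset might look like: the example below uses a hypothetical MockResponse class (not part of TWiki or LWP) in place of whatever getExternalResource would return, and a hypothetical helper summarize_response. The mock's is_redirect treats every 3xx code as a redirect, which is a simplification. The point is only that the helper touches nothing beyond the six methods listed in the draft doc.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the response object; not part of TWiki or LWP.
# It implements only the six methods the draft doc guarantees.
package MockResponse;

sub new {
    my ( $class, %args ) = @_;
    my %defaults = ( code => 200, message => 'OK', content => '', headers => {} );
    return bless { %defaults, %args }, $class;
}
sub code    { return $_[0]->{code} }
sub message { return $_[0]->{message} }
sub content { return $_[0]->{content} }
sub header  { return $_[0]->{headers}->{ $_[1] } }

# Simplified status checks: 4xx/5xx are errors, any 3xx is a redirect.
sub is_error    { my $c = $_[0]->{code}; return ( $c >= 400 && $c < 600 ) ? 1 : 0 }
sub is_redirect { my $c = $_[0]->{code}; return ( $c >= 300 && $c < 400 ) ? 1 : 0 }

package main;

# Hypothetical helper: summarize any conforming response in one line,
# using only the documented subset of methods.
sub summarize_response {
    my ($response) = @_;
    if ( $response->is_error() ) {
        return 'error ' . $response->code() . ': ' . $response->message();
    }
    if ( $response->is_redirect() ) {
        return 'redirect to ' . ( $response->header('Location') || '(unknown)' );
    }
    return 'ok, ' . length( $response->content() ) . ' bytes';
}

print summarize_response( MockResponse->new( content => 'hello' ) ), "\n";
print summarize_response(
    MockResponse->new( code => 404, message => 'Not Found' ) ), "\n";
```

Because the helper never calls anything outside the subset, it should behave the same whether the real object is an HTTP::Response or the sockets fallback.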

I find the name GET() confusing. Here is why. TWiki::Func is there for plugins to interact with the TWiki engine and with TWiki content. TWiki::Func::GET() suggests "getting" some kind of TWiki content, but what content? Since you do not seem to like the previously suggested descriptive TWiki::Func::getUrl(), how about TWiki::Func::getWebContent()? I find this even more descriptive.

-- PeterThoeny - 15 Feb 2007

The try/catch error handling is powerful in general. However, I find it too complex in this context since the programmer has to worry about two layers of error handling: The internal one with the is_error() method, and the try/catch one. For simplicity I suggest to rely just on methods for error handling, e.g. just is_error(), and possibly message(). That way it is as simple as:

    my $response = TWiki::Func::getWebContent( $url );
    if( $response->is_error() ) {
        print "%RED%" . $response->message . "%ENDCOLOR%\n";
    } else {
        print $response->content;
    }

AFAIK, no other TWiki::Func requires programmers to use an eval with a catch.

In any case, all methods need to be documented.

-- PeterThoeny - 15 Feb 2007
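
For contrast with the single-layer example above, here is a rough sketch of what a caller would have to write if client-side errors were thrown and server-side errors were reported through the response. Everything here is hypothetical: fetch_two_layer is a stand-in that does no networking, a plain hash stands in for the response object, and core eval/die is used in place of the try/catch syntax so the example runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical fetcher standing in for the debated API. It dies on a
# client-side problem and returns a status hash for a server-side one.
sub fetch_two_layer {
    my ($url) = @_;
    die "unparseable URL: $url\n" unless $url =~ m{^https?://[^/\s]+}i;
    return { is_error => 1, message => 'Not Found' } if $url =~ m{/missing$};
    return { is_error => 0, content => "content of $url" };
}

# What the caller needs in order to cover both layers.
sub get_text_two_layer {
    my ($url) = @_;
    my $response = eval { fetch_two_layer($url) };
    unless ( defined $response ) {    # layer 1: client-side exception
        chomp( my $err = $@ );
        return "client error: $err";
    }
    if ( $response->{is_error} ) {    # layer 2: server-side status
        return "server error: $response->{message}";
    }
    return $response->{content};
}

print get_text_two_layer($_), "\n"
    for 'not a url', 'http://example.com/missing', 'http://example.com/page';
```

The extra eval block is the cost being objected to: every call site needs it in addition to the is_error() check.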

GET is the name used by LWP because it reflects the HTTP standard, and if other HTTP methods are subsequently added (such as POST) it sets the baseline of a naming standard. Personally I would expect getWebContent to return me the content of a TWiki Web, as "Web" is an accepted TWiki concept.

I considered all three options, including using is_success exclusively, and using exceptions exclusively. The LWP API, which uses is_error (and is_success) exclusively is confusing and inadequately documented in this respect, and using is_success would have involved reverse-engineering it to find out the status code of internal errors, some of which simply don't map to the sockets implementation. Using exceptions to report client side errors allows a much cleaner separation between the remote server HTTP status codes reflected by is_success, and the purely client side errors reported via exceptions.

On a general point, I really don't like the is_error reporting approach, as it relies on the caller checking the return status, which they rarely bother to do. Before I started using exceptions, almost all "error-status" returning functions had their return status ignored in the TWiki core. Further, using exceptions means you automatically handle all errors that are thrown by called functions - for example, in CPAN and other third-party modules and system calls that die, but don't bother to document that fact.

I would have changed all error reporting in Func entirely over to exceptions if I hadn't been concerned to minimise changes to the API. As it is, almost every Func function could be wrapped in a try..catch for error handling, if the author wants to do a completely thorough job of trapping errors. The TWiki core uses exceptions to report errors, which are by default trapped by UI.pm, but can be intercepted by a plugin author seeking to perform their own handling.

-- CrawfordCurrie - 15 Feb 2007

I am trying to find a win/win for all.

On function name, how about getWwwContent or getExternalContent?

On error handling, I feel it just complicates things. Look for example at what the BlackListPlugin needs to do in order to remain compatible with all TWiki versions. Example without a try/catch:

    if( $TWiki::Plugins::VERSION < 1.1 ) {
        # TWiki 01 Sep 2004 and older
        $text = TWiki::Net::getUrl( $host, $port, $path );
    } elsif( $TWiki::Plugins::VERSION < 1.11 ) {
        # TWiki 4.0
        $text = $TWiki::Plugins::SESSION->{net}->getUrl( $host, $port, $path );
    } elsif( $TWiki::Plugins::VERSION < 1.12 ) {
        # TWiki 4.1
        $text = $TWiki::Plugins::SESSION->{net}->getUrl( 'http', $host, $port, $path );
    } else {
        # TWiki 4.2
        my $response = TWiki::Func::getExternalContent( "http://$host:$port/$path" );
        $text = $response->content() unless( $response->is_error() );
    }

Now, if you add the try/catch it gets even more complex; especially because you have to handle two layers of error reporting.

I find it good that TWiki is using try/catch internally in the core. Along the same lines, the try/catch can be handled internally in TWiki::Func::getExternalContent() so that the plugin programmer has an easier task interacting with TWiki. This is also for consistency with the current Func API.

-- PeterThoeny - 16 Feb 2007

Continuing to look for a win/win for all with this proposal: by default, for KISS, do only a single layer of error handling. For those programmers who would like to use nested error handling (is_error() and try/catch), add a parameter specifying that. For example:

    my $response = TWiki::Func::getExternalContent( $url, 1 );

where 1 means "use two layer error reporting with try/catch."

-- PeterThoeny - 17 Feb 2007

No way am I adding an option for different error handling!

However I think you are right on the error handling. Using the is_error API is easier. We can embed the try..catch into the socket function (though translating from an exception to an error status is going to be "interesting").

On the function name; well, I looked around the web to see what names are used in other APIs and there really is no consistency (unsurprisingly, I suppose). However, if you think about what URL actually stands for (Uniform Resource Locator) then the logic behind the most commonly used name, getResource, is clear. But at the end of the day it really doesn't matter as any plugin author will have to read the API doc anyway, so the name doesn't have to be much more than unique. Content doesn't work for me, because the function returns a lot more than content. getExternalResource is a happy compromise.

-- CrawfordCurrie - 18 Feb 2007
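
One possible shape for embedding the exception handling inside the function, sketched under assumptions: SketchResponse and fetch_with_status are hypothetical names, the fetcher is passed in as a code reference so the example needs no network, and core eval/die stands in for try..catch. The 400 code for client-side failures follows the draft doc above. This is not the actual TWiki implementation.

```perl
use strict;
use warnings;

# Hypothetical minimal response class with only the methods used below.
package SketchResponse;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub code     { return $_[0]->{code} }
sub message  { return $_[0]->{message} }
sub content  { return $_[0]->{content} }
sub is_error { return $_[0]->{code} >= 400 ? 1 : 0 }

package main;

# Hypothetical wrapper: run a fetcher that may die, and fold any exception
# into a 400 response so the caller sees a single layer of error reporting.
sub fetch_with_status {
    my ( $fetcher, $url ) = @_;
    my $response = eval { $fetcher->($url) };
    return $response if defined $response;

    my $err = $@ || 'unknown client-side error';
    $err =~ s/\s+\z//;    # drop the trailing newline left by die
    return SketchResponse->new( code => 400, message => $err, content => '' );
}

# Usage: the caller checks is_error() only, with no eval of its own.
my $response = fetch_with_status( sub { die "unparseable URL\n" }, '::' );
print $response->is_error()
    ? 'failed: ' . $response->code() . ' ' . $response->message() . "\n"
    : $response->content();
```

The awkward part is the mapping itself: every client-side failure collapses into one status code here, so the message text is the only way to tell them apart.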

Agreed on both, I am glad we found a solution that satisfies our needs.

-- PeterThoeny - 18 Feb 2007

Since consensus was reached and no open concerns are left - I did not evaluate that this needed a release meeting decision.

-- KennethLavrsen - 09 Apr 2007

Topic revision: r17 - 2007-04-25 - KennethLavrsen
 
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.