Feature Proposal: Topic Diffs should support by-word comparison
Motivation
The current topic diff does not make it easy to see what changed. The output format is marginal for code, but quite unhelpful for the text that TWiki is oriented toward. I spent 30 minutes this morning trying to find what turned out to be a 2 word change in a 300 word paragraph - at least, as DIFF saw it.
Description and Documentation
Just highlight the insertions and deletions
It would be nice if the rdiff script used to compare topic revisions highlighted changes within each line.
With each paragraph in a topic being a very long line, and users who add a comma, delete a space, or fix a single word, it can be difficult to see what happened.
Examples
Currrent:
| Changed: |
< < | The web is represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated |
> > | The web represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated |
Desired:
Caution: the
WYSIWYG editor will remove the formatting in the following line - if you don't see red, try an earlier revision
|
|
| | The web is represents each person as a primary topic that has a standardized format. These topics contain a form for vital statistics, a machine-generated |
Impact
Implementation
I suppose I should just complain - but here's a solution:
I'm not sure that I understand all the subtleties of the renderer, but I do have a prototype that seems functional.
This patch:
* Is against TWiki 4.2.3
* implements renderstyle={sequentialbyword, sequentialbyline}.
byword is the default and what you get with renderstyle=sequential.
byline is for anyone who relies on the old style. One might argue that the the default should be backwards compatibility - but then I'd have had to patch the differences GUI. (Exercise for the reader who cares.)
* Modifies UI/RDiff.pm
* Adds UI/WDiff.pm
* Requires
CPAN's Text:::WordDiff
* Is a lot simpler than it appears - there are only 13 new/changed executable statements in RDiff and 7 in WDiff - the rest is indentation & comments.
* Could probably use some more testing.
You're welcome to take it as is, or send it through your new feature process. It seems a simple enough patch that the latter would be overkill.
As usual, I'm providing the technlogy - the release mechanics need to come from someone else.
Enjoy.
<verbatim> --- lib/TWiki/UI/RDiff.pm.4.2.3 2008-09-11 23:41:58.000000000 -0400
+++ lib/TWiki/UI/RDiff.pm 2009-01-01 10:17:56.000000000 -0500
@@ -244,13 +244,13 @@
#| Parameter: =$left= | the text blob before the opteration |
#| Parameter: =$right= | the text after the operation |
#| Return: =$result= | Formatted html text |
#| TODO: | this should move to Render.pm |
sub _renderSequential
{
- my ( $session, $web, $topic, $diffType, $left, $right ) = @_;
+ my ( $session, $web, $topic, $diffType, $left, $right, $seqByLine ) = @_;
my $result = '';
ASSERT($topic) if DEBUG;
#note: I have made the colspan 9 to make sure that it spans all columns (thought there are only 2 now)
if ( $diffType eq '-') {
$result .=
@@ -274,26 +274,40 @@
'Unchanged',
'Unchanged',
_renderCellData( $session, $right, $web, $topic ),
'u', '',
$session );
} elsif ( $diffType eq 'c') {
- $result .=
- _sequentialRow( '#D0FFD0',
- 'Changed',
- 'Deleted',
- _renderCellData( $session, $left, $web, $topic ),
- '-', '<',
- $session );
- $result .=
- _sequentialRow( undef,
- 'Changed',
- 'Added',
- _renderCellData( $session, $right, $web, $topic ),
- '+', '>',
- $session );
+ if( $seqByLine ) {
+ $result .=
+ _sequentialRow( '#D0FFD0',
+ 'Changed',
+ 'Deleted',
+ _renderCellData( $session, $left, $web, $topic ),
+ '-', '<',
+ $session );
+ $result .=
+ _sequentialRow( undef,
+ 'Changed',
+ 'Added',
+ _renderCellData( $session, $right, $web, $topic ),
+ '+', '>',
+ $session );
+ } else {
+ my $diff = word_diff( \$left, \$right, { STYLE=>'TWiki::UI::WDiff',
+ } );
+ $diff = _renderCellData( $session, $diff, $web, $topic );
+
+ $result .= CGI::Tr(CGI::td({bgcolor=>'#D0FFD0',
+ class=>"twikiDiffChangedHeader",
+ colspan=>9},
+ CGI::b( $session->i18n->maketext('Changed').': ')));
+ $result .=CGI::Tr(CGI::td( " " ),
+ CGI::td({class=>"twikiDiffChangedText", colspan=>9}, $diff));
+
+ }
} elsif ( $diffType eq 'l' && $left ne '' && $right ne '' ) {
$result .= CGI::Tr({bgcolor=>$format{l}[0],
class=>'twikiDiffLineNumberHeader'},
CGI::th({align=>'left',
colspan=>9},
($session->i18n->maketext('Line: [_1] to [_2]',$left,$right))
@@ -311,13 +325,13 @@
#| Parameter: =$diffArray= | array generated by parseRevisionDiff |
#| Parameter: =$renderStyle= | style of rendering { debug, sequential, sidebyside} |
#| Return: =$text= | output html for one renderes revision diff |
#| TODO: | move into Render.pm |
sub _renderRevisionDiff
{
- my( $session, $web, $topic, $sdiffArray_ref, $renderStyle ) = @_;
+ my( $session, $web, $topic, $sdiffArray_ref, $renderStyle, $seqByLine ) = @_;
#combine sequential array elements that are the same diffType
my @diffArray = ();
foreach my $ele ( @$sdiffArray_ref ) {
if( ( @$ele[1] =~ /^\%META\:TOPICINFO/ ) || ( @$ele[2] =~ /^\%META\:TOPICINFO/ ) ) {
# do nothing, ignore redundant topic info
@@ -348,26 +362,26 @@
}
if (( @$diff_ref[0] eq '-' ) && ( @$next_ref[0] eq '+' )) {
$diff_ref = ['c', @$diff_ref[1], @$next_ref[2]];
$next_ref = undef;
}
if ( $renderStyle eq 'sequential' ) {
- $result .= _renderSequential( $session, $web, $topic, @$diff_ref );
+ $result .= _renderSequential( $session, $web, $topic, @$diff_ref, $seqByLine );
} elsif ( $renderStyle eq 'sidebyside' ) {
$result .= CGI::Tr(CGI::td({ width=>'50%'}, ''),
CGI::td({ width=>'50%'}, ''));
$result .= _renderSideBySide( $session, $web, $topic, @$diff_ref );
} elsif ( $renderStyle eq 'debug' ) {
$result .= _renderDebug( @$diff_ref );
}
$diff_ref = $next_ref;
}
#don't forget the last one ;)
if ( $diff_ref ) {
if ( $renderStyle eq 'sequential' ) {
- $result .= _renderSequential ( $session, $web, $topic, @$diff_ref );
+ $result .= _renderSequential ( $session, $web, $topic, @$diff_ref, $seqByLine );
} elsif ( $renderStyle eq 'sidebyside' ) {
$result .= CGI::Tr(CGI::td({ width=>'50%'}, ''),
CGI::td({ width=>'50%'}, ''));
$result .= _renderSideBySide( $session, $web, $topic, @$diff_ref );
} elsif ( $renderStyle eq 'debug' ) {
$result .= _renderDebug( @$diff_ref );
@@ -388,13 +402,13 @@
invoked via the =UI::run= method.
Renders the differences between version of a TwikiTopic
| topic | topic that we are showing the differences of |
| rev1 | the higher revision |
| rev2 | the lower revision |
-| render | the rendering style {sequential, sidebyside, raw, debug} | (preferences) DIFFRENDERSTYLE, =sequential= |
+| render | the rendering style {sequential, sidebyside, raw, debug} | (preferences) DIFFRENDERSTYLE, =sequential= ; sequentialbyword (default)
or sequentialbyline |
| type | {history, diff, last} history diff, version to version, last version to previous | =history= |
| context | number of lines of context |
| skin | the skin(s) to use to display the diff |
TODO:
* add a {word} render style
* move the common CGI param handling to one place
@@ -412,12 +426,17 @@
TWiki::UI::checkWebExists( $session, $webName, $topic, 'diff' );
TWiki::UI::checkTopicExists( $session, $webName, $topic, 'diff' ); my $renderStyle = $query->param('render') ||
$session->{prefs}->getPreferencesValue( 'DIFFRENDERSTYLE' ) ||
'sequential';
+ my $seqByLine = ($renderStyle eq 'sequentialbyline');
+ $renderStyle =~ s/^sequential.*$/sequential/;
+ if( $renderStyle eq 'sequential' && !$seqByLine ) {
+ use Text::WordDiff;
+ }
my $diffType = $query->param('type') || 'history';
my $contextLines = $query->param('context');
unless( defined $contextLines ) {
$session->{prefs}->getPreferencesValue( 'DIFFCONTEXTLINES' );
$contextLines = 3 unless defined $contextLines;
}
@@ -491,13 +510,13 @@
# eliminate white space to prevent wrap around in HR table:
$rInfo =~ s/\s+/ /g;
$rInfo2 =~ s/\s+/ /g;
my $diffArrayRef = $session->{store}->getRevisionDiff(
$session->{user}, $webName, $topic, $r2, $r1, $contextLines );
$text = _renderRevisionDiff( $session, $webName, $topic,
- $diffArrayRef, $renderStyle );
+ $diffArrayRef, $renderStyle, $seqByLine );
$diff =~ s/%REVINFO1%/$rInfo/go;
$diff =~ s/%REVINFO2%/$rInfo2/go;
$diff =~ s/%TEXT%/$text/go;
$page .= $diff;
$r1 = $r1 - 1;
$r2 = $r2 - 1;
--- /dev/null 2008-04-26 06:27:25.709194980 -0400
+++ lib/TWiki/UI/WDiff.pm 2009-01-01 10:22:56.000000000 -0500
@@ -0,0 +1,43 @@
+package TWiki::UI::WDiff;
+
+use strict;
+use HTML::Entities qw(encode_entities);
+use vars qw($VERSION @ISA);
+
+$VERSION = '0.01';
+@ISA = qw(Text::WordDiff::Base);
+
+sub file_header {
+ return '';
+}
+
+sub same_items {
+ shift;
+ return encode_entities( join '', @_ );
+}
+
+sub delete_items {
+ shift;
+ return '<del style="background:#ff9999;" class="twikiDiffDeletedMarker">' . encode_entities( join'', @_ ) . '</del>';
+}
+
+sub insert_items {
+ shift;
+ return '<ins style="background:#ccccff;" class="twikiDiffAddedMarker">' . encode_entities( join'', @_ ) . '</ins>';
+}
+
+1;
+__END__
+
+
+=begin comment
+
+=head1 Copyright and License
+
+Copyright (c) 2005-2008 David Wheeler. Some Rights Reserved.
+Copyright (c) 2009 Timothe Litt. Some Rights Reserved.
+
+This module is free software; you can redistribute it and/or modify it under the
+same terms as Perl itself.
+
+=cut </verbatim>
--
Contributors: TimotheLitt - 01 Jan 2009
Discussion
Thank you Timothe for providing this feature request
and a patch! Line-only diff was bugging me for a long time, I think a solid word-by-word diff should be part of the core TWiki.
For inspiration, look at
DiffWordByWordAddOn (can't be used in tWiki distribution since it depends on Python) and
CompareRevisionsAddOn (word diff of page revisons.)
--
PeterThoeny - 01 Jan 2009
I looked at
CompareRevisions - it's pretty good, but not fully integrated. For example, mailer-contrib doesn't sense & use it, though pattern.viewtopicbuttons does. It requires the history plugin for a GUI - and then still leaves the rdiff plugin as the more topic actions default. Compare Revisions doesn't return 1 on load, and still uses the ineffecient common tags handler. So there's lots of work to be done. None particularly hard, but it's work.
As for inspiration, I'm done. This patch isn't perfect, but it seems to do pretty well for the cases it's encountered so far. I'm currently using it +
CompareRevisions + History. Seems to handle the various situations & was high return on small effort.
For a production version, I suggest integrating the three components into the core (including
MailerContrib), fixing the more topic actions screen, and perhaps improving the more topic actions GUI. The latter should default to comparing the last two revisions & should have a place to explicitly compare two numeric revisions.
But what I have is good enough for my present needs.
--
TimotheLitt - 05 Jan 2009