- id: perltidy
name: perltidy
description: Run the perltidy source code formatter on Perl source files
- minimum_pre_commit_version: 2.1.0
entry: perltidy --nostandard-output --backup-and-modify-in-place
args: [--standard-error-output, --backup-file-extension=/]
language: perl
types: [perl]
+ stages: [pre-commit, pre-merge-commit, pre-push, manual]
+ minimum_pre_commit_version: 3.2.0 # for "new" names in stages
# Perltidy open BUGS and LIMITATIONS
-You can help perltidy evolve into a better program. If you think you
-have hit a bug or weird behavior, or have a suggested improvement,
-please send a note to perltidy at users.sourceforge.net.
-
-This file only lists open bugs. For bugs which have been fixed,
-see the ChangeLog.
-
-## The --extrude and --mangle options can produce code with syntax errors
-
-The --extrude tries to put as many newlines in the formatted code as possible.
-The --mangle tries to remove as many newlines as possible. These options are
-very useful for stress testing perltidy (and Perl) but not so much for normal
-formatting. Occasionally they will produce code which Perl considers to have a
-syntax error. These problems often involve code where Perl is having to guess
-the tokenization based on whitespace. The given/when and switch/case
-statements are also particularly vulnerable to unusual line breaks and
-whitespace. This type of error should not normally occur in practice, but if
-it does it should be easy to fix the problem by rerunning perltidy with more
-normal parameters or by manually changing whitespace or newlines.
+This file only lists open bugs. For bugs which have been fixed, see the
+ChangeLog.
## The Pod:Html module has some bugs
-For the most part Pod::Html works very well and is very convenient because
-it part of the standard Perl distribution. But for example the following line
+Perltidy uses the module Pod::Html, and for the most part it works very well
+and is very convenient because it part of the standard Perl distribution. But
+for example the following line
=item B<< <Deck> = Session->new_cflt_deck; >>
```
print "${ \<<END1 }${ \<<END2 }";
-Hello
+Hello
END1
-World
+World
END2
```
Perltidy will not look for the here-doc targets within the quotes, so it
will not format the script correctly.
-## Latest Bug and Wishlist at CPAN:
+## Issues and Feature Requests
-For the latest list of bugs and feature requests at CPAN see:
-
-https://rt.cpan.org/Public/Dist/Display.html?Name=Perl-Tidy
+The most recent Issues and Feature requests can be seen [at GitHub](https://github.com/perltidy/perltidy)
# Perltidy Change Log
+## 2025 01 05
+
+ - If a file consists only of comments, then the starting indentation will
+ be guessed from the indentation of the first comment. Previously it would
+ be guessed to be zero. Parameter --starting-indentation-level=n can be
+ used to specify an indentation and avoid a guess. This issue can
+ arise when formatting a block of comments from within an editor.
+
+ - Added missing 'use File::Temp' for -html option. This was causing the
+ message: "Undefined subroutine &File::Temp::tempfile called at ..."
+ See git #176.
+
+ - A new parameter --dump-unique-keys, or -duk, dumps a list of hash keys
+ which appear to be used just once, and do not appear among the quoted
+ strings in a file. For example:
+
+ perltidy -duk File.pm >output.txt
+
+ This can help locate misspelled hash keys.
+
+ - Line breaks at long chains of method calls now break at all calls
+ with args in parens, as in this example from git #171
+
+ # Old default
+ sub bla_p( $value = 42 ) {
+ return Mojo::Promise->resolve($value)->then( sub { shift() / 2 } )
+ ->then( sub { shift() + 6 } )->then( sub { shift() / 2 } )
+ ->catch( sub { warn shift } );
+ }
+
+ # New default
+ sub bla_p( $value = 42 ) {
+ return Mojo::Promise->resolve($value)
+ ->then( sub { shift() / 2 } )
+ ->then( sub { shift() + 6 } )
+ ->then( sub { shift() / 2 } )
+ ->catch( sub { warn shift } );
+ }
+
+ - Parameter --break-at-old-method-breakpoints, or -bom, has been
+ updated to insure that it only applies to lines beginning with
+ method calls, as intended. Line breaks for all lines beginning with
+ '->', even non-method calls, can be retained by using
+ --keep-old-breakpoints_before='->'.
+
+ - Added parameter --multiple-token-tightness=s, or -mutt=s.
+ The default value --paren-tightness=1 adds space within the parens
+ if, and only if, the container holds multiple tokens. Some perltidy
+ tokens may be rather long, and it can be preferable to also space some of
+ them as if they were multiple tokens. This can be done with this parameter,
+ and it applies to parens as well as square brackets and curly braces.
+ For example, the default below has no space within the square brackets:
+
+ # perltidy
+ my $rlist = [qw( alpha beta gamma )];
+
+ Spaces can be obtained with:
+
+ # perltidy -mutt='q*'
+ my $rlist = [ qw( alpha beta gamma ) ];
+
+ The parameter -mutt='q*' means treat qw and similar quote operators as
+ multiple tokens. The manual has details; git #120 has another example.
+
+ - Added parameter --indent-leading-semicolon, -ils; see git #171. When
+ this is negated, a line with a leading semicolon does not get the extra
+ leading continuation indentation spaces (defined with -ci=n).
+
+ - Space around here doc delimiters follow spacing controls better. For
+ example, a space is now added before the closing paren here:
+
+ OLD: (without the here doc):
+ push( @script, <<'EOT');
+
+ NEW:
+ push( @script, <<'EOT' );
+
+ Also, any spaces between the '<<' and here target are removed (git #174):
+
+ OLD:
+ push( @script, << 'EOT');
+
+ NEW:
+ push( @script, <<'EOT' );
+
+ - Added parameter --break-at-trailing-comma-types=s, or -btct=s, where
+ s is a string which selects trailing commas. For example, -btct='f(b'
+ places a line break after all bare trailing commas in function calls.
+ The manual has details.
+
+ - Fix git #165, strings beginning with v before => gave an incorrect error
+ message.
+
+ - The parameter --add-lone-trailing-commas, -altc, is now on by default.
+ This will simplify input for trailing comma operations. Use
+ --noadd-lone-trailing-commas, or -naltc to turn it off.
+
+ - More edge cases for adding and deleting trailing commas are now handled
+ (git #156).
+
+ - A problem has been fixed in which the addition or deletion of trailing
+ commas with the -atc or -dtc flags did not occur due to early convergence
+ when the -conv flag was set (git #143).
+
+ - Added parameter --qw-as-function, or -qwaf, discussed in git #164.
+ When this parameter is set, a qw list which begins with 'qw(' is
+ formatted as if it were a function call with call args being a list
+ of comma-separated quoted items. For example, given this input:
+
+ @fields = qw( $st_dev $st_ino $st_mode $st_nlink $st_uid
+ $st_gid $st_rdev $st_size $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks);
+
+ # perltidy -qwaf
+ @fields = qw(
+ $st_dev $st_ino $st_mode $st_nlink
+ $st_uid $st_gid $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime $st_blksize
+ $st_blocks
+ );
+
+## 2024 09 03
+
+ - Add partial support for Syntax::Operator::In and Syntax::Keyword::Match
+ (see git #162).
+
+ - Add --timeout-in-seconds=n, or -tos=n. When the standard input supplies
+ the input stream, and the input has not been received within n seconds,
+ perltidy will end with a timeout message. The intention is to catch
+ a situation where perltidy is accidentally invoked without a file to
+ process and therefore waits for input from the system standard input
+ (stdin), which never arrives. The default is n=10.
+ This check can be turned off with -tos=0.
+
+ - Add parameter --closing-side-comment-exclusion-list=string, or
+ -cscxl=string, where string is a list of block types to exclude
+ for closing side comment operations. Also, closing side comments
+ now work for anonymous subs if a --closing-side-comment-list (-cscl)
+ is not specified, and when 'asub' is requested with -cscl=asub.
+ Use -cscxl=asub to prevent this.
+
+ - Include check for unused constants in --dump-unusual-variables and
+ --warn-variable-types (new issue type 'c'). Also expand checks to
+ cover variables introduced with 'use vars'.
+
+ - Include signature variables in --dump-unusual-variables and
+ --warn-variable-types; see git #158.
+
+ - Add logical xor operator ^^ available in perl version 5.40, as
+ noted in git #157.
+
+ - Keyword 'state' now has default space before a paren, like 'my'.
+ Previously there was no space and no control. So the default
+ is now "state ($x)". This space can be removed with -nsak='state'.
+
+ - Add options --add-lone-trailing-commas, -altc and
+ --delete-lone-trailing-commas, -dltc, to provide control over adding
+ and deleting the only comma in a list. See discussion in git #143
+ and the updated manual.
+
+ - Add options --dump-mismatched-returns (or -dmr) and
+ --warn-mismatched-returns (or -wmr). These options report function
+ calls where the number of values requested may disagree with sub
+ return statements. The -dump version writes the results for a single
+ file to standard output and exits:
+
+ perltidy -dmr somefile.pl >results.txt
+
+ The -warn version formats as normal but reports any issues as warnings in
+ the error file:
+
+ perltidy -wmr somefile.pl
+
+ The -warn version may be customized with the following additional
+ parameters if necessary to avoid needless warnings:
+
+ --warn-mismatched-return-types=s (or -wmrt=s),
+ --warn-mismatched-return-exclusion-list=s (or -wmrxl=s)
+
+ where 's' is a control string. These are explained in the manual.
+
+ - Updates for issue git #151:
+ (1) --warn-variable-types=u is now okay if a named file is processed.
+ (2) --warn-variable-exclusion-list=s now allows leading and/or
+ trailing * on variable names to allow a wildcard match. For example
+ -wvxl='*_unused' is okay and would match $var1_unused and $var2_unused.
+ (3) --dump-unusual-variables now outputs the filename.
+
+ - A option was added to filter unimplemented parameters from perltidy
+ configuration files, suggested in git #146. It works like this: if
+ a line in the config file begins with three dashes followed by a
+ parameter name (rather than two dashes), then the line will be removed
+ if the parameter is unknown. Otherwise, a dash will be removed to make
+ the line valid.
+
+ - Parameters --dump-mismatched-args (or -dma) and
+ --warn-mismatched-args (or -wma) have been updated to catch more
+ arg count issues.
+
+ - Fixed issue git #143, extend -add-trailing-commas to apply to a list
+ with just a fat comma.
+
+ - The minimum perl version is 5.8.1. Previously it was 5.8.0, which was
+ not correct because of the use of utf8::is_utf8.
+
+ - Fixed issue git #142, test failure installing on perl versions before
+ version 5.10. The error caused the new parameter
+ -interbracket-arrow-style=s not to work. Except for this limitation,
+ Version 20240511 will work on older perl versions.
+
+## 2024 05 11
+
+ - The option --valign-signed-numbers, or -vsn is now the default. It
+ was introduced in the previous release has been found to significantly
+ improve the overall appearance of columns of signed and unsigned
+ numbers. See the previous Change Log entry for an example.
+ This will change the formatting in scripts with columns
+ of vertically aligned signed and unsigned numbers.
+ Use -nvsn to turn this option off and avoid this change.
+
+ - The option --delete-repeated-commas is now the default.
+
+ It makes the following checks and changes:
+ - Repeated commas like ',,' are removed with a warning
+ - Repeated fat commas like '=> =>' are removed with a warning
+ - The combination '=>,' produces a warning but is not changed
+ These warnings are only output if --warning-output, or -w, is set.
+
+ Use --nodelete-repeated-commas, or -ndrc, to retain repeated commas.
+
+ - Previously, a line break was always made before a concatenated
+ quoted string, such as "\n", if the previous line had a greater
+ starting indentation. An exception is now made for a short concatenated
+ terminal quote. This keeps code a little more compact. For example:
+
+ # basic rule: break before "\n" here because '$name' has more indentation:
+ my $html = $this->SUPER::genObject( $query, $bindNode, $field . ":$var",
+ $name, "remove", "UNCHECKED" )
+ . "\n";
+
+ # modified rule: make an exception for a short terminal quote like "\n"
+ my $html = $this->SUPER::genObject( $query, $bindNode, $field . ":$var",
+ $name, "remove", "UNCHECKED" ) . "\n";
+
+ - The operator ``**=`` now has spaces on both sides by default. Previously,
+ there was no space on the left. This change makes its spacing the same
+ as all other assignment operators. The previous behavior can be obtained
+ with the parameter setting -nwls='**='.
+
+ - The option --file-size-order, or -fso is now the default. When
+ perltidy is given a list of multiple filenames to process, they
+ are sorted by size and processed in order of increasing size.
+ This can significantly reduce memory usage by Perl. This
+ option has always been used in testing, where typically several
+ jobs each operating on thousands of filenames are running at the
+ same time and competing for system resources. If this option
+ is not wanted for some reason, it can be deactivated with -nfso.
+
+ - In the option --dump-block-summary, the number of sub arguments indicated
+ for each sub now includes any leading object variable passed with
+ an arrow-operator call. Previously the count would have been decreased
+ by one in this case. This change is needed for compatibility with future
+ updates.
+
+ - Fix issue git #138 involving -xlp (--extended-line-up-parentheses).
+ When multiple-line quotes and regexes have long secondary lines, these
+ line lengths could influencing some spacing and indentation, but they
+ should not have since perltidy has no control over their indentation.
+ This has been fixed. This will mainly influence code which uses -xlp
+ and has long multi-line quotes.
+
+ - Add option --minimize-continuation-indentation, -mci (see git #137).
+ This flag allows perltidy to remove continuation indentation in some
+ special cases where it is not really unnecessary. For a simple example,
+ the default formatting for the following snippet is:
+
+ # perltidy -nmci
+ $self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+ The second and third lines are one level deep in a container, and
+ are also statement continuations, so they get indented by the sum
+ of the -i value and the -ci value. If this flag is set, the
+ indentation is reduced by -ci spaces, giving
+
+ # perltidy -mci
+ $self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+ This situation is relatively rare except in code which has long
+ quoted strings and the -nolq flag is also set. This flag is currently
+ off by default, but it could become the default in a future version.
+
+ - Add options --dump-mismatched-args (or -dma) and
+ --warn-mismatched-args (or -wma). These options look
+ for and report instances where the number of args expected by a
+ sub appear to differ from the number passed to the sub. The -dump
+ version writes the results for a single file to standard output
+ and exits:
+
+ perltidy -dma somefile.pl >results.txt
+
+ The -warn version formats as normal but reports any issues as warnings in
+ the error file:
+
+ perltidy -wma somefile.pl
+
+ The -warn version may be customized with the following additional parameters
+ if necessary to avoid needless warnings:
+
+ --warn-mismatched-arg-types=s (or -wmat=s),
+ --warn-mismatched-arg-exclusion-list=s (or -wmaxl=s), and
+ --warn-mismatched-arg-undercount-cutoff=n (or -wmauc=n).
+ --warn-mismatched-arg-overcount-cutoff=n (or -wmaoc=n).
+
+ These are explained in the manual.
+
+ - Add option --valign-wide-equals, or -vwe, for issue git #135.
+ Setting this parameter causes the following assignment operators
+
+ = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
+
+ to be aligned vertically with the ending = all aligned. For example,
+ here is the default formatting of a snippet of code:
+
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+ And here is the same code formatted with -vwe:
+
+ # perltidy -vwe
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+ This option currently is off by default to avoid changing existing
+ formatting.
+
+ - Added control --delete-interbracket-arrows, or -dia, to delete optional
+ hash ref and array ref arrows between brackets as in the following
+ expression (see git #131)
+
+ return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+
+ # perltidy -dia gives:
+ return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+
+ Added the opposite control --aia-interbracket-arrows, or -aia, to
+ add arrows. So applied to the previous line the arrows are restored:
+
+ # perltidy -aia
+ return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+
+ The manual describes additional controls for adding and deleting
+ just selected interbracket arrows.
+
+## 2024 02 02
+
+ - Added --valign-signed-numbers, or -vsn. This improves the appearance
+ of columns of numbers by aligning leading algebraic signs. For example:
+
+ # perltidy -vsn
+ my $xyz_shield = [
+ [ -0.060, -0.060, 0. ],
+ [ 0.060, -0.060, 0. ],
+ [ 0.060, 0.060, 0. ],
+ [ -0.060, 0.060, 0. ],
+ [ -0.0925, -0.0925, 0.092 ],
+ [ 0.0925, -0.0925, 0.092 ],
+ [ 0.0925, 0.0925, 0.092 ],
+ [ -0.0925, 0.0925, 0.092 ],
+ ];
+
+ # perltidy -nvsn (current DEFAULT)
+ my $xyz_shield = [
+ [ -0.060, -0.060, 0. ],
+ [ 0.060, -0.060, 0. ],
+ [ 0.060, 0.060, 0. ],
+ [ -0.060, 0.060, 0. ],
+ [ -0.0925, -0.0925, 0.092 ],
+ [ 0.0925, -0.0925, 0.092 ],
+ [ 0.0925, 0.0925, 0.092 ],
+ [ -0.0925, 0.0925, 0.092 ],
+ ];
+
+ This new option works well but is currently OFF to allow more testing
+ and fine-tuning. It is expected to be activated in a future release.
+
+ - Added --dump-mixed-call-parens (-dmcp ) which will dump a list of
+ operators which are sometimes followed by parens and sometimes not.
+ This can be useful for developing a uniform style for selected operators.
+ Issue git #128. For example
+
+ perltidy -dmcp somefile.pl >out.txt
+
+ produces lines like this, where the first number is the count of
+ uses with parens, and the second number is the count without parens.
+
+ k:caller:2:1
+ k:chomp:3:4
+ k:close:7:4
+
+ - Added --want-call-parens=s (-wcp=s) and --nowant-call-parens=s (-nwcp=s)
+ options which will warn of paren uses which do not match a selected
+ style. The manual has details. But for example,
+
+ perltidy -wcp='&' somefile.pl
+
+ will format as normal but warn if any user subs are called without parens.
+
+ - Added --dump-unusual-variables (-duv) option to dump a list of
+ variables with certain properties of interest. For example
+
+ perltidy -duv somefile.pl >vars.txt
+
+ produces a file with lines which look something like
+
+ 1778:u: my $input_file
+ 6089:r: my $j: reused - see line 6076
+
+ The values on the line which are separated by colons are:
+
+ line number - the number of the line of the input file
+ issue - a single letter indicating the issue, see below
+ variable name - the name of the variable, preceded by a keyword
+ note - an optional note referring to another line
+
+ The issue is indicated by a letter which may be one of:
+
+ r: reused variable name
+ s: sigil change but reused bareword
+ p: lexical variable with scope in multiple packages
+ u: unused variable
+
+ This is very useful for locating problem areas and bugs in code.
+
+ - Added a related flag --warn-variable-types=string (-wvt=string) option
+ to warn if certain types of variables are found in a script. The types
+ are a space-separated string which may include 'r', 's', and 'p' but
+ not 'u'. For example
+
+ perltidy -wvt='r s' somefile.pl
+
+ will check for and warn if any variabls of type 'r', or 's' are seen,
+ but not 'p'. All possible checks may be indicated with a '*' or '1':
+
+ perltidy -wvt='*' somefile.pl
+
+ The manual has further details.
+
+ - All parameters taking integer values are now checked for
+ out-of-range values before processing starts. When a maximum or
+ maximum range is exceeded, the new default behavior is to write a
+ warning message, reset the value to its default setting, and continue.
+ This default behavior can be changed with the new parameter
+ --integer-range-check=n, or -irc=n, as follows:
+
+ n=0 skip check completely (for stress-testing perltidy only)
+ n=1 reset bad values to defaults but do not issue a warning
+ n=2 reset bad values to defaults and issue a warning [DEFAULT]
+ n=3 stop immediately if any values are out of bounds
+
+ The settings n=0 and n=1 are mainly useful for testing purposes.
+
+ - The --dump-block-summary (-dbs) option now includes the number of sub
+ args in the 'type' column. For example, 'sub(9)' indicates a sub
+ with 9 args. Subs whose arg count cannot easily be determined are
+ indicated as 'sub(*)'. The count does not include a leading '$self'
+ or '$class' arg.
+
+ - Added flag --space-signature-paren=n, or -ssp=n (issue git #125).
+ This flag works the same as the existing flag --space-prototype-paren=n
+ except that it applies to the space before the opening paren of a sub
+ signature instead of a sub prototype. Previously, there was no control
+ over this (a space always occurred). For example, given the following
+ line:
+
+ sub circle( $xc, $yc, $rad );
+
+ The following results can now be obtained, according to the value of n:
+
+ sub circle( $xc, $yc, $rad ); # n=0 [no space]
+ sub circle( $xc, $yc, $rad ); # n=1 [default; same as input]
+ sub circle ( $xc, $yc, $rad ); # n=2 [space]
+
+ The spacing in previous versions of perltidy corresponded to n=2 (always
+ a space). The new default value, n=1, will produce a space if and only
+ if there was a space in the input text.
+
+ - The --dump-block-summary option can report an if-elsif-elsif-.. chain
+ as a single line item with the notation -dbt='elsif3', for example,
+ where the '3' is an integer which specifies the minimum number of elsif
+ blocks required for a chain to be reported. The manual has details.
+
+ - Fix problem c269, in which the new -ame parameter could incorrectly
+ emit an else block when two elsif blocks were separated by a hanging
+ side comment (a very rare situation).
+
+ - When braces are detected to be unbalanced, an attempt is made to
+ localize the error by comparing the indentation at closing braces
+ with their actual nesting levels. This can be useful for files which
+ have previously been formatted by perltidy. To illustrate, a test was
+ made in which the closing brace at line 30644 was commented out in
+ a file with a total of over 62000 lines. The new error message is
+
+ Final nesting depth of '{'s is 1
+ The most recent un-matched '{' is on line 6858
+ ...
+ Table of nesting level differences at closing braces.
+ This might help localize brace errors if the file was previously formatted.
+ line: (brace level) - (level expected from old indentation)
+ 30643: 0
+ 30645: 1
+
+ Previously, the error file only indicated that the error in this case
+ was somewhere after line 6858, so the new table is very helpful. Closing
+ brace indentation is checked because it is unambiguous and can be done
+ very efficiently.
+
+ - The -DEBUG option no longer automatically also writes a .LOG file.
+ Use --show-options if the .LOG file is needed.
+
+ - The run time of this version with all new options in use is no greater
+ than that of the previous version thanks to optimization work.
+
+## 2023 09 12
+
+ - Fix for git #124: remove a syntax error check which could cause
+ an incorrect error message when List::Gather::gather was used.
+
+## 2023 09 09
+
+ - Added new parameters -wme, or --warn-missing-else, and -ame,
+ or --add-missing else. The parameter -wme tells perltidy to issue
+ a warning if an if-elsif-... chain does not end in an else block.
+ The parameter -ame tells perltidy to insert an else block at the
+ end of such a chain if there is none.
+
+ For example, given the following snippet:
+
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+
+ # perltidy -ame
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+ else {
+ ##FIXME - added with perltidy -ame
+ }
+
+ The resulting code should be carefully reviewed, and the ##FIXME comment
+ should be updated as appropriate. The text of the ##FIXME comment can be
+ changed with parameter -amec=s, where 's' is the comment to mark the new
+ else block. The man pages have more details.
+
+ - The syntax of the parameter --use-feature=class, or -uf=class, which
+ new in the previous release, has been changed slightly for clarity.
+ The default behavior, which occurs if this flag is not entered, is
+ to automatically try to handle both old and new uses of the keywords
+ 'class', 'method', 'field', and 'ADJUST'.
+ To force these keywords to only follow the -use feature 'class' syntax,
+ enter --use-feature=class.
+ To force perltidy to ignore the -use feature 'class' syntax, enter
+ --use-feature=noclass.
+
+ - Issue git #122. Added parameter -lrt=n1:n2, or --line-range-tidy=n1:n2
+ to limit tidy operations to a limited line range. Line numbers start
+ with 1. This parameter is mainly of interest to editing programs which
+ drive perltidy. The man pages have details.
+
+ - Some fairly rare instances of incorrect spacing have been fixed. The
+ problem was that the tokenizer being overly conservative in marking
+ terms as possible filehandles or indirect objects. This causes the space
+ after the possible filehandle to be frozen to its input value in order not
+ to introduce an error in case Perl had to guess. The problem was fixed
+ by having the tokenizer look ahead for operators which can eliminate the
+ uncertainty. To illustrate, in the following line the term ``$d`` was
+ previously marked as a possible filehandle, so no space was added after it.
+
+ print $d== 1 ? " [ON]\n" : $d ? " [$d]\n" : "\n";
+ ^
+
+ In the current version, the next token is seen to be an equality, so
+ ``$d`` is marked as an ordinary identifier and normal spacing rules
+ can apply:
+
+ print $d == 1 ? " [ON]\n" : $d ? " [$d]\n" : "\n";
+ ^
+
+ - This version runs 7 to 10 percent faster than the previous release on
+ large files, depending on options and file type. Much of the gain comes
+ from streamlined I/O operations.
+
+ - This version was stress-tested for many cpu hours with random
+ input parameters. No failures to converge, internal fault checks,
+ undefined variable references or other irregularities were seen.
+
+
+## 2023 07 01
+
+ - Issue git #121. Added parameters -xbt, or --extended-block-tightness,
+ and -xbtl=s, or --extended-block-tightness-list=s, to allow
+ certain small code blocks to have internal spacing controlled by
+ -bbt=n rather than -bt=n. The man pages have details.
+
+ - Issue git #118. A warning will be issued if a duplicate format-skipping
+ starting marker is seen within a format-skipping section. The same
+ applies to duplicate code-skipping starting markers within code-skipping
+ sections.
+
+ - Issue git #116. A new flag --valign-if-unless, -viu, was added to
+ allow postfix 'unless' terms to align with postfix 'if' terms. The
+ default remains not to do this.
+
+ - Fixed git #115. In the two most recent CPAN releases, when the
+ Perl::Tidy module was called with the source pointing to a file,
+ but no destination specified, the output went to the standard
+ output instead of to a file with extension ``.tdy``, as it should
+ have. This has been fixed.
+
+ - Fixed git #110, add missing documentation for new options
+ -cpb and -bfvt=n. These work in version 20230309 but the pod
+ documentation was missing and has been added.
+
+ - Fixed an undefined reference message when running with
+ --dump-block-summary on a file without any subs or other
+ selected block types.
+
+ - Add parameter -ipc, or --ignore-perlcritic-comments. Perltidy, by
+ default, will look for side comments beginning with ``## no critic`` and
+ ignore their lengths when making line break decisions, even if the user
+ has not set ``-iscl``. The reason is that an unwanted line break can
+ make these special comments ineffective in controlling ``perlcritic``.
+ The parameter -ipc can be set if, for some reason, this is not wanted.
+
+ - Some minor issues with continuation indentation have been fixed.
+ Most scripts will remain unchanged. The main change is that block
+ comments which occur just before a closing brace, bracket or paren
+ now have an indentation which is independent of the existence of
+ an optional comma or semicolon. Previously, adding or deleting
+ an optional trailing comma could cause their indentation to jump.
+ Also, indentation of comments within ternary statements has been
+ improved. For additional details see:
+
+ https://github.com/perltidy/perltidy/blob/master/docs/ci_update.md
+
+ - This version was stress-tested for many cpu hours with random
+ input parameters. No failures to converge, internal fault checks,
+ undefined variable references or other irregularities were seen.
+
+ - This version runs several percent faster than the previous release
+ on large files.
+
## 2023 03 09
- No significant bugs have been found since the last release to CPAN.
- Added a new option '--code-skipping', requested in git #65, in which code
between comment lines '#<<V' and '#>>V' is passed verbatim to the output
- stream without error checking. It is simmilar to --format-skipping
+ stream without error checking. It is similar to --format-skipping
but there is no error checking of the skipped code. This can be useful for
skipping past code which employs an extended syntax.
were added to request that old breakpoints be kept before or after
selected token types. For example, -kbb='=>' means that newlines before
fat commas should be kept.
-
+
- Fix git #44, fix exit status for assert-tidy/untidy. The exit status was
always 0 for --assert-tidy if the user had turned off all error messages with
the -quiet flag. This has been fixed.
- Add flag -maxfs=n, --maximum-file-size-mb=n. This parameter is provided to
- avoid causing system problems by accidentally attempting to format an
- extremely large data file. The default is n=10. The command to increase
+ avoid causing system problems by accidentally attempting to format an
+ extremely large data file. The default is n=10. The command to increase
the limit to 20 MB for example would be -mfs=20. This only applies to
files specified by filename on the command line.
- - Skip formatting if there are too many indentation level errors. This is
- controlled with -maxle=n, --maximum-level-errors=n. This means that if
+ - Skip formatting if there are too many indentation level errors. This is
+ controlled with -maxle=n, --maximum-level-errors=n. This means that if
the ending indentation differs from the starting indentation by more than
- n levels, the file will be output verbatim. The default is n=1.
+ n levels, the file will be output verbatim. The default is n=1.
To skip this check, set n=-1 or set n to a large number.
- A related new flag, --maximum-unexpected-errors=n, or -maxue=n, is available
- Add flag -xci, --extended-continuation-indentation, regarding issue git #28
This flag causes continuation indentation to "extend" deeper into structures.
- Since this is a fairly new flag, the default is -nxci to avoid disturbing
+ Since this is a fairly new flag, the default is -nxci to avoid disturbing
existing formatting. BUT you will probably see some improved formatting
- in complex data structures by setting this flag if you currently use -ci=n
- and -i=n with the same value of 'n' (as is the case if you use -pbp,
+ in complex data structures by setting this flag if you currently use -ci=n
+ and -i=n with the same value of 'n' (as is the case if you use -pbp,
--perl-best-practices, where n=4).
- Fix issue git #42, clarify how --break-at-old-logical-breakpoints works.
- Fix issue git #41, typo in manual regarding -fsb.
- - Fix issue git #40: when using the -bli option, a closing brace followed by
- a semicolon was not being indented. This applies to braces which require
+ - Fix issue git #40: when using the -bli option, a closing brace followed by
+ a semicolon was not being indented. This applies to braces which require
semicolons, such as a 'do' block.
- Added 'state' as a keyword.
- A better test for convergence has been added. When iterations are requested,
the new test will stop after the first pass if no changes in line break
- locations are made. Previously, file checksums were used and required at least two
- passes to verify convergence unless no formatting changes were made. With the new test,
- only a single pass is needed when formatting changes are limited to adjustments of
+ locations are made. Previously, file checksums were used and required at least two
+ passes to verify convergence unless no formatting changes were made. With the new test,
+ only a single pass is needed when formatting changes are limited to adjustments of
indentation and whitespace on the lines of code. Extensive testing has been made to
verify the correctness of the new convergence test.
- - Line breaks are now automatically placed after 'use overload' to
+ - Line breaks are now automatically placed after 'use overload' to
improve formatting when there are numerous overloaded operators. For
example
-
+
use overload
'+' => sub {
...
- A number of minor problems with parsing signatures and prototypes have
- been corrected, particularly multi-line signatures. Some signatures
- had previously been parsed as if they were prototypes, which meant the
+ been corrected, particularly multi-line signatures. Some signatures
+ had previously been parsed as if they were prototypes, which meant the
normal spacing rules were not applied. For example
-
+
OLD:
sub echo ($message= 'Hello World!' ) {
...;
}
- Numerous minor issues that the average user would not encounter were found
- and fixed. They can be seen in the more complete list of updates at
+ and fixed. They can be seen in the more complete list of updates at
https://github.com/perltidy/perltidy/blob/master/local-docs/BugLog.pod
## 2020 10 01
- - Robustness of perltidy has been significantly improved. Updating is recommended. Continual
- automated testing runs began about 1 Sep 2020 and numerous issues have been found and fixed.
+ - Robustness of perltidy has been significantly improved. Updating is recommended. Continual
+ automated testing runs began about 1 Sep 2020 and numerous issues have been found and fixed.
Many involve references to uninitialized variables when perltidy is fed random text and random
- control parameters.
+ control parameters.
- Added the token '->' to the list of alignment tokens, as suggested in git
#39, so that it can be vertically aligned if a space is placed before them with -wls='->'.
comments generated by the -csc parameter, a separating newline was missing.
The resulting script will not then run, but worse, if it is reformatted with
the same parameters then closing side comments could be overwritten and data
- lost.
+ lost.
This problem was found during automated random testing. The parameter
-scbb is rarely used, which is probably why this has not been reported. Please
upgrade your version.
- Added parameter --non-indenting-braces, or -nib, which prevents
- code from indenting one level if it follows an opening brace marked
+ code from indenting one level if it follows an opening brace marked
with a special side comment, '#<<<'. For example,
{ #<<< a closure to contain lexical vars
This is on by default. If your code happens to have some
opening braces followed by '#<<<', and you
- don't want this, you can use -nnib to deactivate it.
+ don't want this, you can use -nnib to deactivate it.
- Side comment locations reset at a line ending in a level 0 open
- block, such as when a new multi-line sub begins. This is intended to
+ block, such as when a new multi-line sub begins. This is intended to
help keep side comments from drifting to far to the right.
## 2020 08 22
- Fix RT #133166, encoding not set for -st. Also reported as RT #133171
- and git #35.
+ and git #35.
This is a significant bug in version 20200616 which can corrupt data if
perltidy is run as a filter on encoded text.
- Vertical alignment has been improved. Numerous minor issues have
been fixed.
- - Formatting with the -lp option is improved.
+ - Formatting with the -lp option is improved.
- Fixed issue git #32, misparse of bare 'ref' in ternary
- Added support for Switch::Plain syntax, issue git #31.
- - Fixed minor problem where trailing 'unless' clauses were not
+ - Fixed minor problem where trailing 'unless' clauses were not
getting vertically aligned.
- Added a parameter --logical-padding or -lop to allow logical padding
'teefile' call parameters. These output streams are rarely used but
they are now treated the same as any 'logfile' stream.
- - add option --break-at-old-semicolon-breakpoints', -bos, requested
+ - add option --break-at-old-semicolon-breakpoints', -bos, requested
in RT#131644. This flag will keep lines beginning with a semicolon.
- Added --use-unicode-gcstring to control use of Unicode::GCString for
- evaluating character widths of encoded data. The default is
+ evaluating character widths of encoded data. The default is
not to use this (--nouse-unicode-gcstring). If this flag is set,
- perltidy will look for Unicode::GCString and, if found, will use it
+ perltidy will look for Unicode::GCString and, if found, will use it
to evaluate character display widths. This can improve displayed
vertical alignment for files with wide characters. It is a nice
feature but it is off by default to avoid conflicting formatting
- when there are multiple developers. Perltidy installation does not
- require Unicode::GCString, so users wanting to use this feature need
+ when there are multiple developers. Perltidy installation does not
+ require Unicode::GCString, so users wanting to use this feature need
set this flag and also to install Unicode::GCString separately.
- Added --character-encoding=guess or -guess to have perltidy guess
- if a file (or other input stream) is encoded as -utf8 or some
- other single-byte encoding. This is useful when processing a mixture
+ if a file (or other input stream) is encoded as -utf8 or some
+ other single-byte encoding. This is useful when processing a mixture
of file types, such as utf8 and latin-1.
Please Note: The default encoding has been set to be 'guess'
- instead of 'none'. This seems like the best default, since
+ instead of 'none'. This seems like the best default, since
it allows perltidy work properly with both
utf8 files and older latin-1 files. The guess mode uses Encode::Guess,
- which is included in standard perl distributions, and only tries to
- guess if a file is utf8 or not, never any other encoding. If the guess is
- utf8, and if the file successfully decodes as utf8, then it the encoding
- is assumed to be utf8. Otherwise, no encoding is assumed.
- If you do not want to use this new default guess mode, or have a
- problem with it, you can set --character-encoding=none (the previous
+ which is included in standard perl distributions, and only tries to
+ guess if a file is utf8 or not, never any other encoding. If the guess is
+ utf8, and if the file successfully decodes as utf8, then it the encoding
+ is assumed to be utf8. Otherwise, no encoding is assumed.
+ If you do not want to use this new default guess mode, or have a
+ problem with it, you can set --character-encoding=none (the previous
default) or --character-encoding=utf8 (if you deal with utf8 files).
- Specific encodings of input files other than utf8 may now be given, for
## 2020 01 10
- This release adds a flag to control the feature RT#130394 (allow short nested blocks)
- introduced in the previous release. Unfortunately that feature breaks
+ introduced in the previous release. Unfortunately that feature breaks
RPerl installations, so a control flag has been introduced and that feature is now
off by default. The flag is:
- --one-line-block-nesting=n, or -olbn=n, where n is an integer as follows:
+ --one-line-block-nesting=n, or -olbn=n, where n is an integer as follows:
-olbn=0 break nested one-line blocks into multiple lines [new DEFAULT]
-olbn=1 stable; keep existing nested-one line blocks intact [previous DEFAULT]
- Fixed issue RT#131288: parse error for un-prototyped constant function without parenthesized
call parameters followed by ternary.
- - Fixed issue RT#131360, installation documentation. Added a note that the binary
- 'perltidy' comes with the Perl::Tidy module. They can both normally be installed with
+ - Fixed issue RT#131360, installation documentation. Added a note that the binary
+ 'perltidy' comes with the Perl::Tidy module. They can both normally be installed with
'cpanm Perl::Tidy'
- Fixed issue RT#130394: Allow short nested blocks. Given the following
$factorial = sub { reduce { $a * $b } 1 .. 11 };
-
+
Previous versions would always break the sub block because it
contains another block (the reduce block). The fix keeps
short one-line blocks such as this intact.
one or more aliases for 'sub', separated by spaces or commas.
For example,
- perltidy -sal='method fun'
+ perltidy -sal='method fun'
will cause the perltidy to treat the words 'method' and 'fun' to be
treated the same as if they were 'sub'.
- - Added flag --space-prototype-paren=i, or -spp=i, to control spacing
+ - Added flag --space-prototype-paren=i, or -spp=i, to control spacing
before the opening paren of a prototype, where i=0, 1, or 2:
i=0 no space
i=1 follow input [current and default]
i=2 always space
Previously, perltidy always followed the input.
- For example, given the following input
+ For example, given the following input
sub usage();
## 2019 09 15
- - fixed issue RT#130344: false warning "operator in print statement"
- for "use lib".
+ - fixed issue RT#130344: false warning "operator in print statement"
+ for "use lib".
- fixed issue RT#130304: standard error output should include filename.
- When perltidy error messages are directed to the standard error output
- with -se or --standard-error-output, the message lines now have a prefix
- 'filename:' for clarification in case multiple files
- are processed, where 'filename' is the name of the input file. If
- input is from the standard input the displayed filename is '<stdin>',
- and if it is from a data structure then displayed filename
+ When perltidy error messages are directed to the standard error output
+ with -se or --standard-error-output, the message lines now have a prefix
+ 'filename:' for clarification in case multiple files
+ are processed, where 'filename' is the name of the input file. If
+ input is from the standard input the displayed filename is '<stdin>',
+ and if it is from a data structure then displayed filename
is '<source_stream>'.
- implement issue RT#130425: check mode. A new flag '--assert-tidy'
has also been added. The next item, RT#130297, insures that the script
will exit with a non-zero exit flag if the assertion fails.
- - fixed issue RT#130297; the perltidy script now exits with a nonzero exit
- status if it wrote to the standard error output. Prevously only fatal
+ - fixed issue RT#130297; the perltidy script now exits with a nonzero exit
+ status if it wrote to the standard error output. Previously only fatal
run errors produced a non-zero exit flag. Now, even non-fatal messages
requested with the -w flag will cause a non-zero exit flag. The exit
flag now has these values:
- fixed issue git#13, needless trailing whitespace in error message
- fixed issue git#9: if the -ce (--cuddled-else) flag is used,
- do not try to form new one line blocks for a block type
+ do not try to form new one line blocks for a block type
specified with -cbl, particularly map, sort, grep
- iteration speedup for unchanged code. Previously, when iterations were
## 2019 06 01
- - rt #128477: Prevent inconsistent owner/group and setuid/setgid bits.
+ - rt #128477: Prevent inconsistent owner/group and setuid/setgid bits.
In the -b (--backup-and-modify-in-place) mode, an attempt is made to set ownership
of the output file equal to the input file, if they differ.
In all cases, if the final output file ownership differs from input file, any setuid/setgid bits are cleared.
- RT #123749, partial fix. "Continuation indentation" is removed from lines
with leading closing parens which are part of a call chain.
For example, the call to pack() is is now outdented to the starting
- indentation in the following experession:
+ indentation in the following expression:
# OLD
$mw->Button(
-it>1.
- Fixed bug where a line occasionally ended with an extra space. This reduces
- rhe number of instances where a second iteration gives a result different
+ the number of instances where a second iteration gives a result different
from the first.
- Updated documentation to note that the Tidy.pm module <stderr> parameter may
- Allow configuration file to be 'perltidy.ini' for Windows systems.
i.e. C:\Documents and Settings\User\perltidy.ini
- and added documentation for setting configuation file under Windows in man
+ and added documentation for setting configuration file under Windows in man
page. Thanks to Stuart Clark.
- Corrected problem of unwanted semicolons in hash ref within given/when code.
Thanks to Mark Olesen for suggesting this.
- -Improved alignement of '='s in certain cases.
+ -Improved alignment of '='s in certain cases.
Thanks to Norbert Gruener for sending an example.
-Outdent-long-comments (-olc) has been re-instated as a default, since
);
-Lists which do not format well in uniform columns are now better
- identified and formated.
+ identified and formatted.
OLD:
return $c->create( 'polygon', $x, $y, $x + $ruler_info{'size'},
to control what text is appended to 'else' and 'elsif' blocks.
Default is to just add leading 'if' text to an 'else'. See manual.
- -The -csc option now labels 'else' blocks with additinal information
+ -The -csc option now labels 'else' blocks with additional information
from the opening if statement and elsif statements, if space.
Thanks to Wolfgang Weisselberg for suggesting this.
'92', '94', '96', '98', '100', '102', '104'
);
- -Lists of complex items, such as matricies, are now detected
+ -Lists of complex items, such as matrices, are now detected
and displayed with just one item per row:
OLD:
if ( ( $tmp >= 0x80_00_00 ) || ( $tmp < -0x80_00_00 ) ) { }
-'**=' was incorrectly tokenized as '**' and '='. This only
- caused a problem with the -extrude opton.
+ caused a problem with the -extrude option.
-Corrected a divide by zero when -extrude option is used
COPYING
docs/BugLog.html
docs/ChangeLog.html
+docs/ci_update.md
docs/eos_flag.md
docs/index.html
docs/index.md
examples/bbtidy.pl
examples/break_long_quotes.pl
examples/delete_ending_blank_lines.pl
+examples/dump_unique_keys.pl
examples/ex_mp.pl
examples/filter_example.in
examples/filter_example.pl
examples/perlmask.pl
examples/perltidy_hide.pl
examples/perltidy_okw.pl
-examples/perltidyrc_dump.pl
examples/perlxmltok.pl
examples/pt.bat
examples/README
lib/Perl/Tidy.pm
lib/Perl/Tidy.pod
lib/Perl/Tidy/Debugger.pm
-lib/Perl/Tidy/DevNull.pm
lib/Perl/Tidy/Diagnostics.pm
lib/Perl/Tidy/FileWriter.pm
lib/Perl/Tidy/Formatter.pm
lib/Perl/Tidy/IndentationItem.pm
lib/Perl/Tidy/IOScalar.pm
lib/Perl/Tidy/IOScalarArray.pm
-lib/Perl/Tidy/LineBuffer.pm
-lib/Perl/Tidy/LineSink.pm
-lib/Perl/Tidy/LineSource.pm
lib/Perl/Tidy/Logger.pm
lib/Perl/Tidy/Tokenizer.pm
lib/Perl/Tidy/VerticalAligner.pm
lib/Perl/Tidy/VerticalAligner/Alignment.pm
lib/Perl/Tidy/VerticalAligner/Line.pm
Makefile.PL
-MANIFEST This list of files
+MANIFEST
pm2pl
README.md
t/.gitattributes
t/snippets26.t
t/snippets27.t
t/snippets28.t
+t/snippets29.t
t/snippets3.t
+t/snippets30.t
+t/snippets31.t
t/snippets4.t
t/snippets5.t
t/snippets6.t
t/testwide-tidy.t
t/testwide.pl.src
t/testwide.t
+t/zero.t
META.yml Module YAML meta-data (added by MakeMaker)
META.json Module JSON meta-data (added by MakeMaker)
"web" : "https://github.com/perltidy/perltidy"
}
},
- "version" : "20230309",
+ "version" : "20250105",
"x_serialization_backend" : "JSON::PP version 4.04"
}
resources:
bugtracker: https://github.com/perltidy/perltidy/issues
repository: https://github.com/perltidy/perltidy.git
-version: '20230309'
+version: '20250105'
x_serialization_backend: 'CPAN::Meta::YAML version 0.012'
#!/usr/bin/perl
package main;
-
use Perl::Tidy;
my $arg_string = undef;
if ( $^O =~ /Mac/ ) {
$arg_string = MacPerl::Ask(
'Please enter @ARGV (-h for help)',
- defined $ARGV[0] ? "\"$ARGV[0]\"" : ""
+ defined( $ARGV[0] ) ? "\"$ARGV[0]\"" : ""
);
}
=head1 EXAMPLES
+Here are some example perltidy commands followed by their meanings:
+
perltidy somefile.pl
This will produce a file F<somefile.pl.tdy> containing the script reformatted
which shows the nesting of braces, parentheses, and square brackets at
the start of every line.
+ perltidy -dbs -dbl=10 somefile.pl >blocks.csv
+
+This will dump a table of comma-separated metrics for subroutines longer than
+10 lines to F<blocks.csv>.
+
+ perltidy -duv somefile.pl >vars.txt
+
+This will dump a list of unused and reused lexical variable names to
+F<vars.txt>.
+
perltidy -html somefile.pl
This will produce a file F<somefile.pl.html> containing the script with
html markup. The output file will contain an embedded style sheet in
-the <HEAD> section which may be edited to change the appearance.
+the C<<HEAD>> section which may be edited to change the appearance.
perltidy -html -css=mystyle.css somefile.pl
perltidy -html -pre somefile.pl
-Write an html snippet with only the PRE section to F<somefile.pl.html>.
+Write an html snippet with only the C<<PRE>> section to F<somefile.pl.html>.
This is useful when code snippets are being formatted for inclusion in a
larger web page. No style sheet will be written in this case.
Show summary of usage and exit.
-=item B<-o>=filename, B<--outfile>=filename
+=item B<-v>, B<--version>
+
+Show perltidy VERSION number and exit.
+
+=item B<-o>=filename, B<--outfile>=filename
Name of the output file (only if a single input file is being
processed). If no output file is specified, and output is not
F<filename.tdy>. [Note: - does not redirect to standard output. Use
B<-st> instead.]
-=item B<-st>, B<--standard-output>
+=item B<-st>, B<--standard-output>
Perltidy must be able to operate on an arbitrarily large number of files
in a single run, with each output being directed to a different output
This option may only be used if there is just a single input file.
The default is B<-nst> or B<--nostandard-output>.
-=item B<-se>, B<--standard-error-output>
+=item B<-se>, B<--standard-error-output>
If perltidy detects an error when processing file F<somefile.pl>, its
default behavior is to write error messages to file F<somefile.pl.ERR>.
Thus, you may place B<-se> in a F<.perltidyrc> and override it when
desired with B<-nse> on the command line.
-=item B<-oext>=ext, B<--output-file-extension>=ext
+=item B<-oext>=ext, B<--output-file-extension>=ext
Change the extension of the output file to be F<ext> instead of the
default F<tdy> (or F<html> in case the -B<-html> option is used).
See L<"Specifying File Extensions">.
-=item B<-opath>=path, B<--output-path>=path
+=item B<-opath>=path, B<--output-path>=path
When perltidy creates a filename for an output file, by default it merely
appends an extension to the path and basename of the input file. This
If the path contains spaces, it should be placed in quotes.
This parameter will be ignored if output is being directed to standard output,
-or if it is being specified explicitly with the B<-o=s> parameter.
+or if it is being specified explicitly with the B<--outfile=s> parameter.
-=item B<-b>, B<--backup-and-modify-in-place>
+=item B<-b>, B<--backup-and-modify-in-place>
Modify the input file or files in-place and save the original with the
extension F<.bak>. Any existing F<.bak> file will be deleted. See next
A B<-b> flag will be ignored if input is from standard input or goes to
standard output, or if the B<-html> flag is set.
-In particular, if you want to use both the B<-b> flag and the B<-pbp>
-(--perl-best-practices) flag, then you must put a B<-nst> flag after the
-B<-pbp> flag because it contains a B<-st> flag as one of its components,
-which means that output will go to the standard output stream.
+In particular, if you want to use both the B<-b> flag and the
+B<--perl-best-practices> (B<-pbp>) flag, then you must put a
+B<--nostandard-output> (B<-nst>) flag after the B<-pbp> flag because it contains a
+B<--standard-output> flag as one of its components, which means that output
+will go to the standard output stream.
-=item B<-bext>=ext, B<--backup-file-extension>=ext
+=item B<-bext>=ext, B<--backup-file-extension>=ext
This parameter serves two purposes: (1) to change the extension of the backup
file to be something other than the default F<.bak>, and (2) to indicate
Here are some examples:
- Parameter Extension Backup File Treatment
- <-bext=bak> F<.bak> Keep (same as the default behavior)
- <-bext='/'> F<.bak> Delete if no errors
- <-bext='/backup'> F<.backup> Delete if no errors
- <-bext='original/'> F<.original> Delete if no errors
+ Parameter Extension Backup File Treatment
+ -bext=bak .bak Keep (same as default behavior)
+ -bext='/' .bak Delete if no errors
+ -bext='/backup' .backup Delete if no errors
+ -bext='original/' .original Delete if no errors
=item B<-bm=s>, B<--backup-method=s>
To simplify testing and switching .perltidyrc files, this command may be
used to specify a configuration file which will override the default
-name of .perltidyrc. There must not be a space on either side of the
+name of F<.perltidyrc>. There must not be a space on either side of the
'=' sign. For example, the line
perltidy -pro=testcfg
parameters. If this is not the case, an error message noting this is produced.
This flag has no other effect on the functioning of perltidy.
+=item B<-tos=n>, B<--timeout-in-seconds=n>
+
+When the standard input supplies the input stream, and the input has not been
+received within B<n> seconds, perltidy will end with a timeout message. The
+intention is to catch a situation where perltidy is accidentally invoked
+without a file to process and therefore waits for input from the system
+standard input (stdin), which never arrives. The default is B<n=10> seconds.
+This check can be turned off with B<n=0>.
+
=back
=head1 FORMATTING OPTIONS
=over 4
+=item B<-fmt=s>, B<--format=s>
+
+If B<--format=tidy> (the default) then perltidy will reformat the input file,
+and if B<--format=html> then perltidy will produce html output.
+
+For convenience, the abbreviation B<--tidy> is equivalent to B<--format=tidy>,
+and B<-html> is equivalent to B<--format=html>.
+
=item B<--notidy>
This flag disables all formatting and causes the input to be copied unchanged
=item B<-l=n>, B<--maximum-line-length=n>
-The default maximum line length is n=80 characters. Perltidy will try
+The default maximum line length is B<n=80> characters. Perltidy will try
to find line break points to keep lines below this length. However, long
-quotes and side comments may cause lines to exceed this length.
+quotes and side comments may cause lines to exceed this length. And
+long lines may sometimes be broken at a length less than B<n> characters
+because some of the line break decisions employ small tolerances
+to prevent formatting instabilities.
The default length of 80 comes from the past when this was the standard CRT
screen width. Many programmers prefer to increase this to something like 120.
from there. If the whitespace never exceeds this limit the formatting remains
unchanged.
-The combination of B<-vmll> and B<-wc=n> provides a solution to the problem of
-displaying arbitrarily deep data structures and code in a finite window,
-although B<-wc=n> may of course be used without B<-vmll>.
+The combination of B<--variable-maximum-line-length> and
+B<--whitespace-cycle=n> provides a solution to the problem of displaying
+arbitrarily deep data structures and code in a finite window, although B<--whitespace-cycle=n> may of course be used without B<--variable-maximum-line-length>.
The default is not to use this, which can also be indicated using B<-wc=0>.
Except for possibly introducing tab indentation characters, as outlined
below, perltidy does not introduce any tab characters into your file,
and it removes any tabs from the code (unless requested not to do so
-with B<-fws>). If you have any tabs in your comments, quotes, or
+with B<--freeze-whitespace>). If you have any tabs in your comments, quotes, or
here-documents, they will remain.
=over 4
indentation level. You may want to coordinate the value of B<n> with what your
display software assumes for the spacing of a tab.
+The default is not to use this, which can also be indicated using B<-et=0>.
+
=item B<-t>, B<--tabs>
This flag causes one leading tab character to be inserted for each level
of indentation. Certain other features are incompatible with this
option, and if these options are also given, then a warning message will
-be issued and this flag will be unset. One example is the B<-lp>
+be issued and this flag will be unset. One example is the B<--line-up-parentheses>
option. This flag is retained for backwards compatibility, but
-if you use tabs, the B<-et=n> flag is recommended. If both B<-t> and
-B<-et=n> are set, the B<-et=n> is used.
+if you use tabs, the B<--entab-leading-whitespace=n> flag is recommended. If both B<--tabs> and
+B<--entab-leading-whitespace=n> are set, then B<--entab-leading-whitespace=n> is used.
=item B<-dt=n>, B<--default-tabsize=n>
With this option perltidy is still free to modify the indenting (and
outdenting) of code and comments as it normally would. If you also want to
-prevent long comment lines from being outdented, you can add either B<-noll> or
-B<-l=0>.
+prevent long comment lines from being outdented, you can add either B<--nooutdent-long-lines> (B<-noll>) or B<--maximum-line-length=0> (B<l=0>).
Setting this flag will prevent perltidy from doing any special operations on
closing side comments. You may still delete all side comments however when
this flag is in effect.
-
=item B<-enc=s>, B<--character-encoding=s>
-This flag indicates if the input data stream use a character encoding.
+This flag indicates if the input data stream uses a character encoding.
Perltidy does not look for the encoding directives in the source stream, such
as B<use utf8>, and instead relies on this flag to determine the encoding.
-(Note that perltidy often works on snippets of code rather than complete files
-so it cannot rely on B<use utf8> directives).
+(This is because perltidy often works on snippets of code rather than complete
+files, so it cannot rely on B<use utf8> directives). Consequently perltidy is
+likely to encounter problems formatting a file which is only partially encoded.
The possible values for B<s> are:
=item B<-it=n>, B<--iterations=n>
This flag causes perltidy to do B<n> complete iterations. The reason for this
-flag is that code beautification is an iterative process and in some
+flag is that code formatting is an iterative process and in some
cases the output from perltidy can be different if it is applied a second time.
For most purposes the default of B<n=1> should be satisfactory. However B<n=2>
can be useful when a major style change is being made, or when code is being
=item B<-conv>, B<--converge>
This flag is equivalent to B<-it=4> and is included to simplify iteration
-control. For all practical purposes one either does or does not want to be
-sure that the output is converged, and there is no penalty to using a large
-iteration limit since perltidy will check for convergence and stop iterating as
-soon as possible. The default is B<-nconv> (no convergence check). Using
-B<-conv> will approximately double run time since typically one extra iteration
-is required to verify convergence. No extra iterations are required if no new
+control. Perltidy will check for convergence and stop iterating as soon as
+possible. The default is B<-nconv> (no convergence check). Using B<-conv>
+will approximately double run time since typically one extra iteration is
+required to verify convergence. No extra iterations are required if no new
line breaks are made, and two extra iterations are occasionally needed when
reformatting complex code structures, such as deeply nested ternary statements.
The value given to B<-ci> is also used by some commands when a small
space is required. Examples are commands for outdenting labels,
-B<-ola>, and control keywords, B<-okw>.
+B<--outdent-labels> (B<-ola>), and control keywords, B<--outdent-keywords> (B<-okw>).
When default values are not used, it is recommended that either
(1) the value B<n> given with B<-ci=n> be no more than about one-half of the
number of spaces assigned to a full indentation level on the B<-i=n> command, or
-(2) the flag B<-extended-continuation-indentation> is used (see next section).
+(2) the flag B<--extended-continuation-indentation> is used (see next section).
=item B<-xci>, B<--extended-continuation-indentation>
Please see the section L<"B<-pbp>, B<--perl-best-practices>"> for an example of
how this flag can improve the formatting of ternary statements. It can also
-improve indentation of some multi-line qw lists as shown below.
+improve indentation of some multiline qw lists as shown below.
+
+ # perltidy
+ foreach $color (
+ qw(
+ AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
+ SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
+ ),
+ qw(
+ LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
+ SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
+ )
+ )
- # perltidy
- foreach $color (
- qw(
+ # perltidy -xci
+ foreach $color (
+ qw(
AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
- ),
- qw(
+ ),
+ qw(
LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
- )
- )
+ )
+ )
- # perltidy -xci
- foreach $color (
- qw(
- AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
- SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
- ),
- qw(
- LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
- SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
- )
- )
+=item B<-mci>, B<--minimize-continuation-indentation>
+
+This flag allows perltidy to remove continuation indentation in some
+special cases where it is not really unnecessary. For a simple example,
+the default formatting for the following snippet is:
+
+ # perltidy -nmci
+ $self->blurt( "Error: No INPUT for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+The second and third lines are one level deep in a container, and are also
+statement continuations, so they get indented by the sum of the full indentation B<-i>
+value and the continuation indentation B<-ci> value. If this flag is set, the indentation is reduced by
+B<-ci> spaces, giving
+
+ # perltidy -mci
+ $self->blurt( "Error: No INPUT for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+This flag is off by default.
=item B<-sil=n> B<--starting-indentation-level=n>
indentation scheme used to create the code snippet is the same as is being used
for the current perltidy process. This is the only sensible guess that can be
made. It should be correct if this is true, but otherwise it probably won't.
-For example, if the input script was written with -i=2 and the current perltidy
-flags have -i=4, the wrong initial indentation will be guessed for a code
+For example, if the input script was written with B<-i=2> and the current perltidy
+flags have B<-i=4>, the wrong initial indentation will be guessed for a code
snippet which has non-zero initial indentation. Likewise, if an entabbing
scheme is used in the input script and not in the current process then the
guessed indentation will be wrong.
A limitation on B<-lp>, but not B<-xlp>, occurs in situations where perltidy
does not have complete freedom to choose line breaks. Then it may temporarily revert
to its default indentation method. This can occur for example if there are
-blank lines, block comments, multi-line quotes, or side comments between the
+blank lines, block comments, multiline quotes, or side comments between the
opening and closing parens, braces, or brackets. It will also occur if a
-multi-line anonymous sub occurs within a container since that will impose
+multiline anonymous sub occurs within a container since that will impose
specific line breaks (such as line breaks after statements).
=item *
For both the B<-lp> and B<-xlp> flags, any parameter which significantly
restricts the ability of perltidy to choose newlines will conflict with these
-flags and will cause them to be deactivated. These include B<-io>, B<-fnl>,
-B<-nanl>, and B<-ndnl>.
+flags and will cause them to be deactivated. These include B<--indent-only>, B<--freeze-newlines>,
+B<--noadd-newlines>, and B<--nodelete-old-newlines>.
=item *
The set of containers to which the B<-lp> style applies can be reduced by
either one of these two flags:
-Use B<-lpil=s> to specify the containers to which B<-lp> applies, or
+Use B<--line-up-parentheses-inclusion-list=s> (B<-lpil=s>) to specify the containers to which B<-lp> applies, or
-use B<-lpxl=s> to specify the containers to which B<-lp> does NOT apply.
+use B<--line-up-parentheses-exclusion-list=s> (B<-lpxl=s>) to specify the containers to which B<-lp> does NOT apply.
Only one of these two flags may be used. Both flags can achieve the same
result, but the B<-lpil=s> flag is much easier to describe and use and is
token immediately before the paren. The possible letters are currently 'k',
'K', 'f', 'F', 'w', and 'W', with these meanings for matching whatever precedes an opening paren:
- 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+ 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.
An optional numeric code may follow any of the container types to further refine the selection based
on container contents. The numeric codes are:
- '0' or blank: no check on contents is made
- '1' exclude B<-lp> unless the contents is a simple list without sublists
- '2' exclude B<-lp> unless the contents is a simple list without sublists, without
- code blocks, and without ternary operators
+ '0' or blank: no restriction is placed on container contents
+ '1' the container contents must be a simple list without sublists
+ '2' the container contents must be a simple list without sublists,
+ without code blocks, and without ternary operators
For example,
-lpil = 'f(2'
-means only apply -lp to function call lists which do not contain any sublists,
-code blocks or ternary expressions.
+means only apply -lp to function calls with simple lists (not containing any sublists,
+code blocks or ternary expressions).
=item B<-cti=n>, B<--closing-token-indentation>
); or ]; or };
-cti = 3 one extra indentation level always
-The flags B<-cti=1> and B<-cti=2> work well with the B<-lp> flag (previous
-section).
+The flags B<-cti=1> and B<-cti=2> work well with the B<--line-up-parentheses>
+(B<-lp>) flag (previous section).
# perltidy -lp -cti=1
@month_of_year = (
The default is not to do this, indicated by B<-nicb>.
+=item B<-ils>, B<--indent-leading-semicolon>
+
+A line which begins with a leading semicolon will, by default, have the extra
+number of indentation spaces defined by B<--continuation-indentation=n>.
+This extra indentation can be removed by setting B<-nils>.
+
+ # default
+ $z = sqrt( $x**2 + $y**2 )
+
+ ; # <-- indented by ci spaces
+
+ # -nils
+ $z = sqrt( $x**2 + $y**2 )
+
+ ; # <-- not indented by ci spaces
+
+Note that leading semicolons do not normally occur unless requested with
+B<--break-at-old-semicolon-breakpoints> or forced, for example by
+a blank line as in this example.
=item B<-nib>, B<--non-indenting-braces>
corresponding closing brace will not be given the normal extra indentation
level. For example:
- { #<<< a closure to contain lexical vars
+ { #<<< a closure to contain lexical vars
- my $var; # this line does not get one level of indentation
- ...
+ my $var; # this line does not get one level of indentation
+ ...
- }
+ }
- # this line does not 'see' $var;
+ # this line does not 'see' $var;
This can be useful, for example, when combining code from different files.
Different sections of code can be placed within braces to keep their lexical
non-indenting braces. The default is equivalent to -nibp='#<<<'. The string
that you enter must begin with a # and should be in quotes as necessary to get
past the command shell of your system. This string is the leading text of a
-regex pattern that is constructed by appending pre-pending a '^' and appending
+regex pattern that is constructed by prepending a '^' and appending
a'\s', so you must also include backslashes for characters to be taken
literally rather than as patterns.
When B<-olq> is set, lines which is a quoted string longer than the
value B<maximum-line-length> will have their indentation removed to make
them more readable. This is the default. To prevent such out-denting,
-use B<-nolq> or B<--nooutdent-long-lines>.
+use B<-nolq>.
=item B<-oll>, B<--outdent-long-lines>
-This command is equivalent to B<--outdent-long-quotes> and
+This command is equivalent to the combination B<--outdent-long-quotes> and
B<--outdent-long-comments>, and it is included for compatibility with previous
versions of perltidy. The negation of this also works, B<-noll> or
B<--nooutdent-long-lines>, and is equivalent to setting B<-nolq> and B<-nolc>.
=item B<Outdenting Labels:> B<-ola>, B<--outdent-labels>
-This command will cause labels to be outdented by 2 spaces (or whatever B<-ci>
-has been set to), if possible. This is the default. For example:
+This command will cause labels to be outdented by the number of spaces defined
+by B<--continuation-indentation=n>, if possible.
+This is the default. For example:
my $i;
LOOP: while ( $i = <FOTOS> ) {
fixit($i);
}
-Use B<-nola> to not outdent labels. To control line breaks after labels see L<"-bal=n, --break-after-labels=n">.
+Use B<-nola> to prevent this. To control line breaks after labels see L<"-bal=n, --break-after-labels=n">.
=item B<Outdenting Keywords>
=item B<-okw>, B<--outdent-keywords>
The command B<-okw> will cause certain leading control keywords to
-be outdented by 2 spaces (or whatever B<-ci> has been set to), if
+be outdented by the number of spaces defined by
+B<--continuation-indentation=n>spaces, if
possible. By default, these keywords are C<redo>, C<next>, C<last>,
C<goto>, and C<return>. The intention is to make these control keywords
easier to see. To change this list of keywords being outdented, see
fixit($i);
}
-The default is not to do this.
+Notice that the keyword B<next> has been outdented. The default is not to do
+this.
=item B<Specifying Outdented Keywords:> B<-okwl=string>, B<--outdent-keyword-list=string>
command is still required.
For example, the commands C<-okwl="next last redo goto" -okw> will cause
-those four keywords to be outdented. It is probably simplest to place
-any B<-okwl> command in a F<.perltidyrc> file.
+those four keywords to be outdented.
=back
And finally, curly braces which contain blocks of code are controlled by the
parameter B<-bbt=n> or B<--block-brace-tightness=n> as illustrated in the
-example below.
+example below (B<-bbt=0> is the default).
- %bf = map { $_ => -M $_ } grep { /\.deb$/ } dirents '.'; # -bbt=0 (default)
+ %bf = map { $_ => -M $_ } grep { /\.deb$/ } dirents '.'; # -bbt=0
%bf = map { $_ => -M $_ } grep {/\.deb$/} dirents '.'; # -bbt=1
%bf = map {$_ => -M $_} grep {/\.deb$/} dirents '.'; # -bbt=2
To simplify input in the case that all of the tightness flags have the same
-value <n>, the parameter <-act=n> or B<--all-containers-tightness=n> is an
-abbreviation for the combination <-pt=n -sbt=n -bt=n -bbt=n>.
+value B<n>, the parameter B<-act=n> or B<--all-containers-tightness=n> is an
+abbreviation for the combination B<-pt=n -sbt=n -bt=n -bbt=n>.
+
+=item B<-mutt=s>, B<--multiple-token-tightness=s>
+
+To review, the tightness controls described in the previous section have three
+possible integer values: 0, 1, and 2, where B<n=0> always adds a space, and
+B<n=2> never adds a space.
+
+The default value B<n=1> adds space only if the container holds multiple
+tokens. Some perltidy tokens may be rather long, and it can be
+preferable to space some of them as if they were multple tokens. This can be
+done with this paramter.
+
+For example, in the following expression the C<qw> list is a single token and
+therefore there the default formatting does not put spaces within the square
+brackets:
+
+ my $rlist = [qw(alpha beta gamma)];
+
+This can be changed with
+
+ # perltidy -mutt='qw'
+ my $rlist = [ qw(alpha beta gamma) ];
+
+This tells perltidy to space a C<qw>list as if it were multiple tokens when the
+default tightness is used.
+
+The parameter B<s> may contain any of the following perl operators:
+
+ qw q qq qx qr s y tr m
+
+Other possible specifications are
+
+ q* - all of the above operators
+ Q - all of the above plus any quoted text
+ h - a here-doc target, such as '<<EOT'
+ <<>> - the double-diamond operator
+
+A symbol may be negated by preceding it with a carat B<^> symbol.
+The double-diamond operator is always included unless negated in this way.
+
+=item B<-xbt>, B<--extended-block-tightness>
+
+There are two controls for spacing within curly braces, namely
+B<--block-brace-tightness=n> for code block braces and B<--brace-tightness=n>
+for all other braces.
+
+There is a little fuzziness in this division of brace types though because the
+curly braces considered by perltidy to contain code blocks for formatting
+purposes, such as highlighting code structure, exclude some of the small code
+blocks used by Perl mainly for isolating terms. These include curly braces
+following a keyword where an indirect object might occur, or curly braces
+following a type symbol. For example, perltidy does not mark the following
+braces as code block braces:
+
+ print {*STDERR} $message;
+ return ${$foo};
+
+Consequently, the spacing within these small braced containers by default
+follows the flag B<--brace-tightness=n> rather than
+B<--block-brace-tightness=n>, as one might expect.
+
+If desired, small blocks such as these can be made to instead follow the
+spacing defined by the B<--block-brace-tightness=n> flag by setting
+B<--extended-block-tightness>. The specific types of small blocks to which
+this parameter applies is controlled by a companion control parameter,
+described in the next section.
+
+Note that if the two flags B<--block-brace-tightness=n> and
+B<--brace-tightness=n> have the same value B<n> then there would be no reason
+to set this flag.
+
+=item B<-xbtl=s>, B<--extended-block-tightness-list=s>
+
+The previous parameter B<--extended-block-tightness> (B<-xbt>) can be made to
+apply to curly braces preceded by any of the keywords
+
+ print printf exec system say
+
+and/or the special symbols
+
+ $ @ % & * $#
+
+The parameter string B<s> may contain a selection of these keywords and symbols
+to indicate the brace types to which B<--extended-block-tightness> applies.
+For convenience, all of
+the keywords can be selected with 'k', and all of the special symbols
+can be selected with 't'. The default is equivalent to B<-xbtl='k'>, which
+selects all of the keywords.
+
+Examples:
+
+ -xbtl='k' # selects just the keywords [DEFAULT]
+ -xbtl="t" # selects just the special type symbols
+ -xbtl="k t" # selects all keywords and symbols, or simply
+ -xbtl="kt" # selects all keywords and symbols
+ -xbtl="print say" # selects just keywords B<print> and B<say>:
+Here are some formatting examples using the default values of
+B<--brace-tightness=n> and B<--block-brace-tightness=n>. Note that in these
+examples B<$ref> is in block braces but B<$key> is not.
+
+ # default formatting
+ print {*STDERR} $message;
+ my $val = ${$ref}{$key};
+
+ # perltidy -xbt or
+ # perltidy -xbt -xbtl=k
+ print { *STDERR } $message;
+ my $val = ${$ref}{$key};
+
+ # perltidy -xbt -xbtl=t
+ print {*STDERR} $message;
+ my $val = ${ $ref }{$key};
+
+ # perltidy -xbt -xbtl=kt
+ print { *STDERR } $message;
+ my $val = ${ $ref }{$key};
+
+Finally, note that this parameter merely changes the way that the parameter
+B<--extended-block-tightness> works. It has no effect unless
+B<--extended-block-tightness> is actually set.
=item B<-tso>, B<--tight-secret-operators>
default is for no such space, and is indicated with B<-nsts> or
B<--nospace-terminal-semicolon>.
- $i = 1 ; # -sts
- $i = 1; # -nsts (default)
+ $i = 1 ; # -sts
+ $i = 1; # -nsts (default)
=item B<-sfs>, B<--space-for-semicolon>
Setting this option allows perltidy to add certain whitespace to improve
code readability. This is the default. If you do not want any
whitespace added, but are willing to have some whitespace deleted, use
-B<-naws>. (Use B<-fws> to leave whitespace completely unchanged).
+B<-naws>. (Use B<--freeze-whitespace> (B<-fws>) to leave whitespace completely unchanged).
=item B<-dws>, B<--delete-old-whitespace>
-Setting this option allows perltidy to remove some old whitespace
-between characters, if necessary. This is the default. If you
-do not want any old whitespace removed, use B<-ndws> or
-B<--nodelete-old-whitespace>.
+Setting this option allows perltidy to remove optional whitespace between
+characters in the input file. The default is to not to do this
+(B<-nodelete-old-whitespace>). This parameter has little effect by itself.
+But in combination with B<--noadd-whitespace> it will cause most of the
+whitespace in a file to be removed.
=item B<Detailed whitespace controls around tokens>
It is necessary to have a list of all token types in order to create
this type of input. Such a list can be obtained by the command
B<--dump-token-types>. Also try the B<-D> flag on a short snippet of code
-and look at the .DEBUG file to see the tokenization.
+and look at the F<.DEBUG> file to see the tokenization.
+
+To illustrate, suppose we do not want a space after a colon which introduces a
+sub attribute. We need to know its type. It is not a colon because that is the
+type of a ternary operator. The output of B<--dump-token-types> states that it
+is token type 'A'. To verify this, we can run C<perltidy -D> on a short piece
+of code containing such a colon, such as
+
+ sub foo : lvalue;
+
+This produces a F<.DEBUG> file which contains
+
+ 1: sub foo : lvalue;
+ 1: SSSSSSSbAbwwwwww;
+
+The top line is the input, and the bottom line shows the token types.
+The 'A' beneath the colon verifies that it is type B<A>.
+
+So to prevent a space on the right of this colon we can use
+
+ # perltidy -nwrs='A'
+ sub foo :lvalue;
B<WARNING> Be sure to put these tokens in quotes to avoid having them
misinterpreted by your command shell.
If formatted in this way, the program will not run (at least with recent versions of perl) because the $x is taken to be a filehandle and / is assumed to start a quote. In a complex program, there might happen to be a / which terminates the multiline quote without a syntax error, allowing the program to run, but not as intended.
-Related issues arise with other binary operator symbols, such as + and -, and in older versions of perl there could be problems with ternary operators. So to avoid changing program behavior, perltidy has the simple rule that whitespace around possible filehandles is left unchanged. Likewise, whitespace around barewords is left unchanged. The reason is that if the barewords are defined in other modules, or in code that has not even been written yet, perltidy will not have seen their prototypes and must treat them cautiously.
+Related issues arise with other binary operator symbols, such as + and -, and in older versions of perl there could be problems with ternary operators. So to avoid changing program behavior, perltidy has the simple rule that whitespace around possible filehandles is left unchanged. Likewise, whitespace around unknown barewords is left unchanged. The reason is that if the barewords are defined in other modules, or in code that has not even been written yet, perltidy will not have seen their prototypes and must treat them cautiously.
In perltidy this is implemented in the tokenizer by marking token following a
B<print> keyword as a special type B<Z>. When formatting is being done,
For another example, the following two lines will be parsed without syntax error:
# original programming, syntax ok
- for my $severity ( reverse $SEVERITY_LOWEST+1 .. $SEVERITY_HIGHEST ) { ... }
+ for my $severity ( reverse $LOWEST+1 .. $HIGHEST ) { ... }
# perltidy default, syntax ok
- for my $severity ( reverse $SEVERITY_LOWEST + 1 .. $SEVERITY_HIGHEST ) { ... }
+ for my $severity ( reverse $LOWEST + 1 .. $HIGHEST ) { ... }
But the following will give a syntax error:
# perltidy -nwrs='+', syntax error:
- for my $severity ( reverse $SEVERITY_LOWEST +1 .. $SEVERITY_HIGHEST ) { ... }
+ for my $severity ( reverse $LOWEST +1 .. $HIGHEST ) { ... }
To avoid subtle parsing problems like this, it is best to avoid spacing a
binary operator asymmetrically with a space on the left but not on the right.
When an opening paren follows a Perl keyword, no space is introduced after the
keyword, unless it is (by default) one of these:
- my local our and or xor eq ne if else elsif until unless
- while for foreach return switch case given when
+ my local our state and or xor err eq ne if else elsif until unless
+ while for foreach return switch case given when catch
These defaults can be modified with two commands:
sub usage(); # n=1 [default; follows input]
sub usage (); # n=2 [space]
+=item B<-ssp=n> or B<--space-signature-paren=n>
+
+This flag is analogous to the previous except that it applies to the space before the opening paren of a sub B<signature> rather than a sub B<prototype>.
+
+For example, consider the following line:
+
+ sub circle( $xc, $yc, $rad )
+
+This space before the opening paren can be controlled with integer B<n> which
+may have the value 0, 1, or 2 with these meanings:
+
+ -ssp=0 means no space before the paren
+ -ssp=1 means follow the example of the source code [DEFAULT]
+ -ssp=2 means always put a space before the paren
+
+The default is B<-ssp=1>, meaning that will be a space in the output if, and only if, there is one in the input. Given the above line of code, the result of
+applying the different options would be:
+
+ sub circle( $xc, $yc, $rad ) # n=0 [no space]
+ sub circle( $xc, $yc, $rad ) # n=1 [default; same as input]
+ sub circle ( $xc, $yc, $rad ) # n=2 [space]
+
=item B<-kpit=n> or B<--keyword-paren-inner-tightness=n>
The space inside of an opening paren, which itself follows a certain keyword,
These can be changed with the parameter B<-kpitl=s> described in the next section.
-=item B<-kpitl=string> or B<--keyword-paren-inner-tightness=string>
+=item B<-kpitl=string> or B<--keyword-paren-inner-tightness-list=string>
-This command can be used to change the keywords to which the the B<-kpit=n>
-command applies. The parameter B<string> is a required list either keywords or
+This command can be used to change the keywords to which the previous
+parameter, B<-kpit=n>,
+applies. The parameter B<string> is a required list either keywords or
functions, which should be placed in quotes if there are more than one. By
itself, this parameter does not cause any change in spacing, so the B<-kpit=n>
command is still required.
Note that this is considered to be a different operation from "vertical
alignment" because space at just one line is being adjusted, whereas in
-"vertical alignment" the spaces at all lines are being adjusted. So it sort of
-a local version of vertical alignment.
+"vertical alignment" the spaces at all lines are being adjusted. So it is
+sort of a local version of vertical alignment.
Here is an example involving a ternary operator:
=item B<Trimming whitespace around C<qw> quotes>
B<-tqw> or B<--trim-qw> provide the default behavior of trimming
-spaces around multi-line C<qw> quotes and indenting them appropriately.
+spaces around multiline C<qw> quotes and indenting them appropriately.
B<-ntqw> or B<--notrim-qw> cause leading and trailing whitespace around
-multi-line C<qw> quotes to be left unchanged. This option will not
+multiline C<qw> quotes to be left unchanged. This option will not
normally be necessary, but was added for testing purposes, because in
some versions of perl, trimming C<qw> quotes changes the syntax tree.
comment, whereas B<side comment> will refer to a comment which appears on a
line to the right of some code.
+Perltidy does not do any word wrapping of commented text to match a selected
+maximum line length. This is because there is no way to determine if this is
+appropriate for the given content. However, an interactive program named
+B<perlcomment.pl> is available in the B<examples> folder of the perltidy
+distribution which can assist in doing this.
+
=over 4
=item B<-ibc>, B<--indent-block-comments>
example:
# this comment is indented (-ibc, default)
- if ($task) { yyy(); }
+ if ($task) { yyy(); }
The alternative is B<-nibc>:
# this comment is not indented (-nibc)
- if ($task) { yyy(); }
+ if ($task) { yyy(); }
See also the next item, B<-isbc>, as well as B<-sbc>, for other ways to
have some indented and some outdented block comments.
long single line would remain intact with -l=80 and -iscl:
perltidy -l=80 -iscl
- $vmsfile =~ s/;[\d\-]*$//; # Clip off version number; we can use a newer version as well
+ $vmsfile =~ s/;[\d\-]*$//; # Clip off version number; we can use a newer version
whereas without the -iscl flag the line will be broken:
perltidy -l=80
$vmsfile =~ s/;[\d\-]*$//
- ; # Clip off version number; we can use a newer version as well
+ ; # Clip off version number; we can use a newer version
+
+=item B<-ipc>, B<--ignore-perlcritic-comments>
+Perltidy, by default, will look for side comments beginning with
+B<## no critic> and ignore their lengths when making line break decisions,
+even if the user has not set B<-iscl>. The reason is that an unwanted line
+break can make these special comments ineffective in controlling B<perlcritic>.
+
+Setting B<--ignore-perlcritic-comments> tells perltidy not to look for these
+B<## no critic> comments.
=item B<-hsc>, B<--hanging-side-comments>
command changes the default list to be any selected block types; see
L<"Specifying Block Types">.
For example, the following command
-requests that only C<sub>'s, labels, C<BEGIN>, and C<END> blocks be
-affected by any B<-csc> or B<-dcsc> operation:
+requests that only C<sub>'s, labels, C<BEGIN>, and C<END> blocks be affected by
+operations which add (B<-csc>) or delete (B<-dcsc>) closing side comments:
- -cscl="sub : BEGIN END"
+ --closing-side-comment-list='sub : BEGIN END'
+
+=item B<-cscxl=string>, or B<--closing-side-comment-exclusion-list>
+
+where C<string> is a list of block types which should NOT be tagged with
+closing side comments. If a block type appears in both B<-cscl> and
+B<-cscxl>, then B<-cscxl> has priority and the block will not be tagged.
+
+For example, the following command requests that anonymous subs
+should not be affected by any B<-csc> or B<-dcsc> operation:
+
+ --closing-side-comment-exclusion-list='asub'
+
+By default, no block types are excluded.
=item B<-csct=n>, or B<--closing-side-comment-maximum-text=n>
=item B<-osbc>, B<--outdent-static-block-comments>
-The command B<-osbc> will cause static block comments to be outdented by 2
-spaces (or whatever B<-ci=n> has been set to), if possible.
+The command B<-osbc> will cause static block comments to be outdented by
+the number of B<--continuation-spaces=n>, if possible.
=back
B<--code-skipping> uses starting and ending markers '#<<V' and '#>>V', like
this:
- #<<V code skipping: perltidy will pass this verbatim without error checking
+ #<<V code skipping: perltidy passes this verbatim, no error checking
token ident_digit {
[ [ <?word> | _ | <?digit> ] <?ident_digit>
#>>V
-Additional text may appear on the special comment lines provided that it
-is separated from the marker by at least one space, as in the above examples.
+(The last character V is like a < or > rotated 90 degrees).
+Additional text may appear on the special comment lines provided that it is
+separated from the marker by at least one space to highlight the sign, as in
+the above examples.
Any number of code-skipping or format-skipping sections may appear in a file.
If an opening code-skipping or format-skipping comment is not followed by a
Some examples show how example strings become patterns:
- -fsb='#\{\{\{' becomes /^#\{\{\{\s/ which matches #{{{ but not #{{{{
- -fsb='#\*\*' becomes /^#\*\*\s/ which matches #** but not #***
- -fsb='#\*{2,}' becomes /^#\*{2,}\s/ which matches #** and #*****
+ -fsb='#\{\{\{' becomes /^#\{\{\{\s/ which matches #{{{ but not #{{{{
+ -fsb='#\*\*' becomes /^#\*\*\s/ which matches #** but not #***
+ -fsb='#\*{2,}' becomes /^#\*{2,}\s/ which matches #** and #*****
=item B<-fse=string>, B<--format-skipping-end=string>
=back
+=head2 Formatting a Limited Range of Lines
+
+A command B<--line-range-tidy=n1:n2> is available to process just a selected
+range of lines of an input stream with perltidy. This command is mainly of
+interest for programming interactive code editors. When it is used, the entire
+input stream is read but just the selected range of lines of the input file are
+processed by the perltidy tokenizer and formatter, and then the stream is
+reassembled for output. The selected lines need to contain a complete statement
+or balanced container. Otherwise, a syntax error will occur and the code will
+not be tidied. There are a couple of limitations on the use of this command:
+(1) it may not be applied to multiple files, and (2) it only applies to code
+tidying and not, for example, html formatting.
+
+=over 4
+
+=item B<-lrt=n1:n2>, B<--line-range-tidy=n1:n2>
+
+The range of lines is specified by integers B<n1> and B<n2>, where B<n1> is the first line number to be formatted (start counting with 1) and B<n2> is the last line number to be formatted. If B<n2> is not given, or exceeds the actual number of lines, then formatting continues to the end of the file.
+
+Examples:
+
+ --line-range-tidy=43:109 # tidy lines 43 through 109
+ --line-range-tidy=' 43 : 109' # tidy lines 43 through 109
+ --line-range-tidy=1: # tidy all lines
+ --line-range-tidy=0:90 # ERROR (n1 must be >= 1)
+
+The second example shows that spaces are okay if placed in quotes.
+
+=back
+
=head2 Line Break Control
The parameters in this and the next sections control breaks after
this section and sections
L<"Controlling List Formatting">,
L<"Retaining or Ignoring Existing Line Breaks">.
-You may want to use B<-noll> with this.
+You may want to use B<--nooutdent-long-lines> with this.
Note: If you also want to keep your blank lines exactly
-as they are, you can use the B<-fbl> flag which is described
+as they are, you can use the B<--freeze-blank-lines> flag which is described
in the section L<"Blank Line Control">.
=back
# -nce (default)
if ($task) {
- yyy();
+ yyy();
}
else {
- zzz();
+ zzz();
}
In this example the keyword B<else> is placed on the same line which begins with
this "cuddled" style are B<elsif>, B<continue>, B<catch>, B<finally>.
Other block types can be formatted by specifying their names on a
-separate parameter B<-cbl>, described in a later section.
+separate parameter B<--cuddled-block-list>, described in a later section.
Cuddling between a pair of code blocks requires that the closing brace of the
first block start a new line. If this block is entirely on one line in the
input file, it is necessary to decide if it should be broken to allow cuddling.
-This decision is controlled by the flag B<-cbo=n> discussed below. The default
+This decision is controlled by the flag B<--cuddled-break-options=n>
+(B<-cbo=n>) discussed below. The default
and recommended value of B<-cbo=1> bases this decision on the first block in
the chain. If it spans multiple lines then cuddling is made and continues
along the chain, regardless of the sizes of subsequent blocks. Otherwise, short
lines remain intact.
-So for example, the B<-ce> flag would not have any effect if the above snippet
-is rewritten as
+So for example, the B<--cuddled-else> flag would not have any effect if the
+above snippet is rewritten as
if ($task) { yyy() }
else { zzz() }
=item B<-cb>, B<--cuddled-blocks>
-This flag is equivalent to B<-ce>.
+This flag is equivalent to B<--cuddled-else> (B<-ce>).
=item B<-cbl>, B<--cuddled-block-list>
The built-in default cuddled block types are B<else, elsif, continue, catch, finally>.
-Additional block types to which the B<-cuddled-blocks> style applies can be defined by
+Additional block types to which the B<--cuddled-blocks> style applies can be defined by
this parameter. This parameter is a character string, giving a list of
block types separated by commas or spaces. For example, to cuddle code blocks
of type sort, map and grep, in addition to the default types, the string could
As a diagnostic check, the flag B<--dump-cuddled-block-list> or B<-dcbl> can be
used to view the hash of values that are generated by this flag.
-Finally, note that the B<-cbl> flag by itself merely specifies which blocks are formatted
-with the cuddled format. It has no effect unless this formatting style is activated with
-B<-ce>.
+Finally, note that the B<--cuddled-block-list> parameter by itself merely
+specifies which blocks are formatted with the cuddled format. It has no effect
+unless this formatting style is activated with B<--cuddled-else>.
=item B<-cblx>, B<--cuddled-block-list-exclusive>
-When cuddled else formatting is selected with B<-ce>, setting this flag causes
-perltidy to ignore its built-in defaults and rely exclusively on the block types
-specified on the B<-cbl> flag described in the previous section. For example,
-to avoid using cuddled B<catch> and B<finally>, which are among the defaults, the
-following set of parameters could be used:
+When cuddled else formatting is selected with B<--cuddled-else>, setting this
+flag causes perltidy to ignore its built-in defaults and rely exclusively on
+the block types specified on the B<--cuddled-block-list> flag described in the
+previous section. For example, to avoid using cuddled B<catch> and B<finally>,
+which are among the defaults, the following set of parameters could be used:
perltidy -ce -cbl='else elsif continue' -cblx
-
=item B<-cbo=n>, B<--cuddled-break-option=n>
Cuddled formatting is only possible between a pair of code blocks if the
options are:
cbo=0 Never force a short block to break.
- cbo=1 If the first of a pair of blocks is broken in the input file,
- then break the second [DEFAULT].
+ cbo=1 If the first of a pair of blocks is broken in the input
+ file, then break the second [DEFAULT].
cbo=2 Break open all blocks for maximal cuddled formatting.
The default and recommended value is B<cbo=1>. With this value, if the starting
The option B<cbo=2> produces maximal cuddling but will not allow any short blocks.
-
=item B<-bl>, B<--opening-brace-on-new-line>, or B<--brace-left>
Use the flag B<-bl> to place an opening block brace on a new line:
}
When B<-bl> is set, the blocks to which this applies can be controlled with the
-parameters B<--brace-left-list> and B<-brace-left-exclusion-list> described in the next sections.
+parameters B<--brace-left-list> and B<--brace-left-exclusion-list> described in the next sections.
=item B<-bll=s>, B<--brace-left-list=s>
mean all blocks except B<sort map grep eval> and anonymous sub blocks.
Note that the lists B<-bll=s> and B<-blxl=s> control the behavior of the
-B<-bl> flag but have no effect unless the B<-bl> flag is set.
+B<-bl> flag but have no effect unless the B<-bl> flag is set. These
+two lists provide complete control for this flag, but two shortcut
+flags are available and described in the next sections.
=item B<-sbl>, B<--opening-sub-brace-on-new-line>
-The flag B<-sbl> provides a shortcut way to turn on B<-bl> just for named
-subs. The same effect can be achieved by turning on B<-bl>
-with the block list set as B<-bll='sub'>.
+The flag B<-sbl> provides a shortcut way to turn on B<-bl> just for named subs.
+The same effect can be achieved by turning on B<-bl> with the block list set as
+B<-bll='sub'>. To avoid conflicts, it is recommended to either use the more
+general list method described above to control B<-bl>, or this shortcut method,
+but not both.
For example,
}
}
-This flag is negated with B<-nsbl>, which is the default.
+This negative version of this flag, B<-nsbl>, turns off B<-bl> for
+named subs. The same effect can be achieved with the exclusion
+list method, B<-blxl=sub>.
=item B<-asbl>, B<--opening-anonymous-sub-brace-on-new-line>
-The flag B<-asbl> is like the B<-sbl> flag except that it applies
-to anonymous sub's instead of named subs. For example
+The flag B<-asbl> is like the B<-sbl> flag except that it applies to anonymous
+sub's instead of named subs. The same effect can be achieved by turning on
+B<-bl> with the block list set to include B<-bll='asub'>.
+
+For example
perltidy -asbl
}
};
-This flag is negated with B<-nasbl>, and the default is B<-nasbl>.
+This negative version of this flag, B<-nasbl>, turns off B<-bl> for
+anonymous subs.
=item B<-bli>, B<--brace-left-and-indent>
The flag B<-bli> is similar to the B<-bl> flag but in addition it causes one
-unit of continuation indentation ( see B<-ci> ) to be placed before
-an opening and closing block braces.
+unit of continuation indentation ( see B<--continuation-indentation> ) to be
+placed before an opening and closing block braces.
-For example, perltidy -bli gives
+For example
+ # perltidy -bli
if ( $input_file eq '-' )
{
important_function();
By default, this extra indentation occurs for block types:
B<if>, B<elsif>, B<else>, B<unless>, B<while>, B<for>, B<foreach>, B<do>, and
-also B<named subs> and blocks preceded by a B<label>. The next item shows how to
-change this.
+also B<named subs> and blocks preceded by a B<label>. The next item shows how
+to change this.
-B<Note>: The B<-bli> flag is similar to the B<-bl> flag, with the difference being
+B<Note>: The B<-bli> flag is similar to the B<-bl> flag, with the difference
+being
that braces get indented. But these two flags are implemented independently,
and have different default settings for historical reasons. If desired, a
mixture of effects can be achieved if desired by turning them both on with
big_waste_of_time();
}
-A conflict occurs if both B<-bl> and B<-bar> are specified.
+A conflict occurs if both B<--opening-brace-on_new-line> (B<-bl>) and B<-bar>
+are specified.
+
+=item B<-cpb>, B<--cuddled-paren-brace>
+
+A related parameter, B<--cuddled-paren-brace>, causes perltidy to join
+two lines which otherwise would be
+
+ )
+ {
+
+to be
+
+ ) {
+
+For example:
+
+ # default
+ foreach my $dir (
+ '05_lexer', '07_token', '08_regression', '11_util',
+ '13_data', '15_transform'
+ )
+ {
+ ...
+ }
+
+ # perltidy -cpb
+ foreach my $dir (
+ '05_lexer', '07_token', '08_regression', '11_util',
+ '13_data', '15_transform'
+ ) {
+ ...;
+ }
=item B<-otr>, B<--opening-token-right> and related flags
=item B<-bbpi=n>, B<--break-before-paren-and-indent=n>
-This flag is a companion to B<-bbp=n> for controlling the indentation of an opening paren
-which is placed on a new line by that parameter. The indentation is as follows:
+This flag is a companion to B<-bbp=n> for controlling the indentation of an
+opening paren which is placed on a new line by that parameter. The indentation
+is as follows:
-bbpi=0 one continuation level [default]
-bbpi=1 outdent by one continuation level
-bbpi=2 indent one full indentation level
+=item B<-bfvt=n>, B<--brace-follower-vertical-tightness=n>
+
+Some types of closing block braces, such as B<eval>, may be followed by
+additional code. A line break may be inserted between such a closing
+brace and the following code depending on the parameter B<n> and
+the length of the trailing code, as follows:
+
+If the trailing code fits on a single line, then
+
+ -bfvt=0 Follow the input style regarding break/no-break
+ -bfvt=1 Follow the input style regarding break/no-break [Default]
+ -bfvt=2 Do not insert a line break
+
+If the trailing code requires multiple lines, then
+
+ -bfvt=0 Insert a line break
+ -bfvt=1 Insert a line break except for a cuddled block chain
+ -bfvt=2 Do not insert a line break
+
+The default is B<-bfvt=1>. The most compact code is achieved with B<-bfvt=2>.
+
+Example (non-cuddled, multiple lines ):
+
+ # -bfvt=0 or -bvft=1 [DEFAULT]
+ eval {
+ ( $line, $cond ) = $self->_normalize_if_elif($line);
+ 1;
+ }
+ or die sprintf "Error at line %d\nLine %d: %s\n%s",
+ ( $line_info->start_line_num() ) x 2, $line, $@;
+
+ # -bfvt=2
+ eval {
+ ( $line, $cond ) = $self->_normalize_if_elif($line);
+ 1;
+ } or die sprintf "Error at line %d\nLine %d: %s\n%s",
+ ( $line_info->start_line_num() ) x 2, $line, $@;
+
+Example (cuddled, multiple lines):
+
+ # -bfvt=0
+ eval {
+ #STUFF;
+ 1; # return true
+ }
+ or do {
+ ##handle error
+ };
+
+ # -bfvt=1 [DEFAULT] or -bfvt=2
+ eval {
+ #STUFF;
+ 1; # return true
+ } or do {
+ ##handle error
+ };
+
=back
=head2 Welding
For example:
- # default formatting
+ # default formatting
do {
{
next if $x == $y;
}
} until $x++ > $z;
- # perltidy -wn
+ # perltidy -wn
do { {
next if $x == $y;
} } until $x++ > $z;
opening or closing symbols may join together in weld. For example, here are
three levels of wrapped function calls:
- # default formatting
+ # default formatting
my (@date_time) = Localtime(
Date_to_Time(
Add_Delta_DHMS(
The inner sandwich layer is required to be at least one line thick. If this
cannot be achieved, welding does not occur. This constraint can cause
formatting to take a couple of iterations to stabilize when it is first applied
-to a script. The B<-conv> flag can be used to insure that the final format is
-achieved in a single run.
+to a script. The B<--converge> flag can be used to insure that the final format
+is achieved in a single run.
Here is an example illustrating a welded container within a welded containers:
- # default formatting
- $x->badd(
- bmul(
- $class->new(
- abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() )
- )
- ),
- $m
- )
- );
-
- # perltidy -wn
- $x->badd( bmul(
- $class->new( abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() )
- ) ),
+ # default formatting
+ $x->badd(
+ bmul(
+ $class->new(
+ abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() )
+ )
+ ),
$m
- ) );
+ )
+ );
+
+ # perltidy -wn
+ $x->badd( bmul(
+ $class->new( abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() )
+ ) ),
+ $m
+ ) );
The welded closing tokens are by default on a separate line but this can be
-modified with the B<-vtc=n> flag (described in the next section). For example,
+modified with the B<--vertical-tightness-closing=n> (B<-vtc=n>) flag (described
+in the next section). For example,
the same example adding B<-vtc=2> is
- # perltidy -wn -vtc=2
- $x->badd( bmul(
- $class->new( abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() ) ) ),
- $m ) );
+ # perltidy -wn -vtc=2
+ $x->badd( bmul(
+ $class->new( abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() ) ) ),
+ $m ) );
This format option is quite general but there are some limitations.
One limitation is that any line length limit still applies and can cause long
welded sections to be broken into multiple lines.
-Another limitation is that an opening symbol which delimits quoted text cannot
-be included in a welded pair. This is because quote delimiters are treated
-specially in perltidy.
-
-Finally, the stacking of containers defined by this flag have priority over
+Also, the stacking of containers defined by this flag have priority over
any other container stacking flags. This is because any welding is done first.
=item B<-wfc>, B<--weld-fat-comma >
-When the B<-wfc> flag is set, along with B<-wn>, perltidy is allowed to weld
-an opening paren to an inner opening container when they are separated by a hash key and fat comma (=>). for example
+When the B<-wfc> flag is set, along with B<-wn> (B<--weld-nested-containers>),
+perltidy is
+allowed to weld an opening paren to an inner opening container when they are
+separated by a hash key and fat comma (=>). for example
# perltidy -wn -wfc
elf->call_method( method_name_foo => {
container symbol. The possible letters are currently 'k', 'K', 'f', 'F',
'w', and 'W', with these meanings:
- 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
- 'F' matches if 'f' does not.
- 'w' matches if either 'k' or 'f' match.
- 'W' matches if 'w' does not.
+ 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
+ 'F' matches if 'f' does not
+ 'w' matches if either 'k' or 'f' match
+ 'W' matches if 'w' does not
For example, compare
# perltidy -wn
if ( defined( $_Cgi_Query{
- $Config{'methods'}{'authentication'}{'remote'}{'cgi'}{'username'}
+ $Config{'methods'}{'auth'}{'remote'}{'cgi'}{'username'}
} ) )
with
# perltidy -wn -wnxl='^K( {'
if ( defined(
- $_Cgi_Query{ $Config{'methods'}{'authentication'}{'remote'}{'cgi'}
+ $_Cgi_Query{ $Config{'methods'}{'auth'}{'remote'}{'cgi'}
{'username'} }
) )
Here are some additional example strings and their meanings:
- '^(' - the weld must not start with a paren
- '.(' - the second and later tokens may not be parens
- '.w(' - the second and later tokens may not keyword or function call parens
- '(' - no parens in a weld
- '^K(' - exclude a leading paren preceded by a non-keyword
- '.k(' - exclude a secondary paren preceded by a keyword
- '[ {' - exclude all brackets and braces
- '[ ( ^K{' - exclude everything except nested structures like do {{ ... }}
+ '^(' - the weld must not start with a paren
+ '.(' - second and later tokens may not be parens
+ '.w(' - second and later tokens may not be a keyword or call parens
+ '(' - no parens in a weld
+ '^K(' - exclude a leading paren preceded by a non-keyword
+ '.k(' - exclude a secondary paren preceded by a keyword
+ '[ {' - exclude all brackets and braces
+ '[ ( ^K{' - exclude all except nested structures like do {{ ... }}
=item B<Vertical tightness> of non-block curly braces, parentheses, and square brackets.
=item *
-You must also use the B<-lp> flag when you use the B<-vt> flag; the
-reason is explained below.
+You must also use the B<-lp> (B<--line-up-parentheses>) flag when you use the
+B<-vt> flag; the reason is explained below.
=item *
Finally, please note that these vertical tightness flags are merely
hints to the formatter, and it cannot always follow them. Things which
make it difficult or impossible include comments, blank lines, blocks of
-code within a list, and possibly the lack of the B<-lp> parameter.
+code within a list, and possibly the lack of the B<--line-up-parentheses>
+parameter.
Also, these flags may be ignored for very small lists (2 or 3 lines in
length).
The difference between B<-vt=1> and B<-vt=2> is shown here:
- # perltidy -lp -vt=1
- $init->add(
- mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] )
- )
- );
+ # perltidy -lp -vt=1
+ $init->add(
+ mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] )
+ )
+ );
- # perltidy -lp -vt=2
- $init->add( mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] )
- )
- );
+ # perltidy -lp -vt=2
+ $init->add( mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] )
+ )
+ );
With B<-vt=1>, the line ending in C<add(> does not combine with the next
line because the next line is not balanced. This can help with
The tightest, and least readable, code is produced with both C<-vt=2> and
C<-vtc=2>:
- # perltidy -lp -vt=2 -vtc=2
- $init->add( mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] ) ) );
+ # perltidy -lp -vt=2 -vtc=2
+ $init->add( mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] ) ) );
Notice how the code in all of these examples collapses vertically as
B<-vt> increases, but the indentation remains unchanged. This is
because perltidy implements the B<-vt> parameter by first formatting as
if B<-vt=0>, and then simply overwriting one output line on top of the
next, if possible, to achieve the desired vertical tightness. The
-B<-lp> indentation style has been designed to allow this vertical
+B<-lp> (B<--line-up-parentheses>) indentation style has been designed to allow
+this vertical
collapse to occur, which is why it is required for the B<-vt> parameter.
The B<-vt=n> and B<-vtc=n> parameters apply to each type of container
There is no vertical tightness control for closing block braces; with
one exception they will be placed on separate lines.
The exception is that a cascade of closing block braces may
-be stacked on a single line. See B<-scbb>.
+be stacked on a single line. See B<--stack-closing-block-brace>.
=item B<-sot>, B<--stack-opening-tokens> and related flags
always_quote => 1,
} );
-The B<-sct> flag is somewhat similar to the B<-vtc> flags, and in some
+The B<-sct> flag is somewhat similar to the B<-vtc>
+(B<--vertical-tightness-closing> flags, and in some
cases it can give a similar result. The difference is that the B<-vtc>
flags try to avoid lines with leading opening tokens by "hiding" them at
the end of a previous line, whereas the B<-sct> flag merely tries to
To simplify input even further for the case in which both opening and closing
non-block containers are stacked, the flag B<-sac> or B<--stack-all-containers>
-is an abbreviation for B<-sot -sct>.
+is an abbreviation for B<--stack-opening-tokens --stack-closing-tokens>.
Please note that if both opening and closing tokens are to be stacked, then the
-newer flag B<-weld-nested-containers> may be preferable because it insures that
+newer flag B<--weld-nested-containers> may be preferable because it insures that
stacking is always done symmetrically. It also removes an extra level of
unnecessary indentation within welded containers. It is able to do this
-because it works on formatting globally rather than locally, as the B<-sot> and
-B<-sct> flags do.
+because it works on formatting globally rather than locally, as the B<--stack-opening-tokens> and B<--stack-closing-tokens> flags do.
=back
snippet of code and look at the .DEBUG file to see the tokenization. However,
for a few token types there may be conflicts with hardwired logic which cause
unexpected results. One example is curly braces, which should be controlled
-with the parameter B<bl> provided for that purpose.
+with the parameter B<--opening-brace-on-new-line> provided for that purpose.
B<WARNING> Be sure to put these tokens in quotes to avoid having them
misinterpreted by your command shell.
and the B<-bbao> flag sets the default to break before all of these operators.
These can be used to define an initial break preference which can be fine-tuned
-with the B<-wba> and B<-wbb> flags. For example, to break before all operators
-except an B<=> one could use --bbao -wba='=' rather than listing every
-single perl operator except B<=> on a -wbb flag.
+with the B<--want-break-after> and B<--want-break-before> flags. For example,
+to break before all operators except an B<=> one could use C<-bbao -wba='='>
+rather than listing every single perl operator except B<=> on a B<-wbb> flag.
=over 4
# perltidy (default)
my @list = ( 1, 1, 1, 1, 2, 1, 1, 3, 3, 1, 1, 4, 6, 4, 1, );
-This formatting loses the nice structure. If we place a side comment anywhere
-between the opening and closing parens, the original line break points are
-retained. For example,
+This formatting loses the nice structure. The original line breaks can be
+retained by adding comment or a blank line somewhere between the
+two parens. For example,
+
+ my @list = (
+ 1, # a side comment forces the original breakpoints to be kept
+ 1, 1,
+ 1, 2, 1,
+ 1, 3, 3, 1,
+ 1, 4, 6, 4, 1,
+ );
+
+We could achieve the same result with a blank line or full comment
+anywhere between the opening and closing parens. Vertical alignment
+of the list items will still occur if possible. The blank line method
+is shown here:
my @list = (
- 1, # a side comment forces the original line breakpoints to be kept
+
+ 1,
1, 1,
1, 2, 1,
1, 3, 3, 1,
1, 4, 6, 4, 1,
);
-The side comment can be a single hash symbol without any text.
-We could achieve the same result with a blank line or full comment
-anywhere between the opening and closing parens. Vertical alignment
-of the list items will still occur if possible.
-
For another possibility see
the -fs flag in L<"Skipping Selected Sections of Code">.
A disadvantage of this flag compared to the methods discussed above is that all
tables in the file must already be nicely formatted.
+=item B<-btct=s>, B<--break-at-trailing-comma-types=s>
+
+A B<trailing comma> is an optional comma following the last item of a list.
+The B<-btct=s> tells perltidy to end a line at selected trailing commas.
+The string B<s> selects the trailing commas, as follows:
+
+ s=1 or '*' : every trailing comma
+ s=m a trailing comma in a multiline list
+ s=b a bare trailing comma
+ s=0 none
+
+For example, given the following input
+
+ $w->bind(
+ '<Page_Down>' => xx,
+ );
+
+The default formatting would flatten this into a single line. But the
+container can be kept open with
+
+ # perltidy -btct='b'
+ $w->bind(
+ '<Page_Down>' => xx,
+ );
+
+This can be particularly useful for short function calls like this,
+where the default perltidy formatting does not keep the container open.
+
+The options B<s=m> and B<s=1> can be used to open up containers with
+non-bare trailing commas. For example, given this input
+
+ $w->bind( '<Page_Down>' => xx, );
+
+we can break it open with
+
+ # perltidy -btct=1
+ $w->bind(
+ '<Page_Down>' => xx,
+ );
+
+Afterwards, we could switch to B<-btct='b'> since the trailing comma is now
+bare.
+But the B<-btct> parameter must be retained in this case because otherwise this
+small list will be flattened the next time it is formatted.
+
+This logic can be restricted to specific container types by including an
+opening token ahead of the letter in the above table. For example
+
+ -btct='(b'
+
+means that this operation should only apply to bare trailing commas within
+parentheses.
+
+For parentheses, an additional item of information which can be given is an
+alphanumeric letter which is used to limit the selection further depending on
+the type of token immediately before the opening paren. The possible letters
+are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings for
+matching whatever precedes an opening paren:
+
+ 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
+ 'F' matches if 'f' does not.
+ 'w' matches if either 'k' or 'f' match.
+ 'W' matches if 'w' does not.
+
+These are the same codes used for B<--line-up-parentheses-inclusion-list>.
+For example,
+
+ -btct='f(b'
+
+means that bare trailing commas in function call lists in the input stream
+should be followed by line breaks in the formatted output stream.
+
+The section L<"Adding and Deleting Commas"> describes additional controls for
+working with trailing commas. These can be combined with the
+B<-break-trailing-comma-types> parameter for additional control of list
+formatting.
+
=item B<-mft=n>, B<--maximum-fields-per-table=n>
If B<n> is a positive number, and the computed number of fields for any table
exceeds B<n>, then it will be reduced to B<n>. This parameter might be used on
a small section of code to force a list to have a particular number of fields
per line, and then either the B<-boc> flag could be used to retain this
-formatting, or a single comment could be introduced somewhere to freeze the
-formatting in future applications of perltidy. For example
+formatting, or a blank line or comment could be introduced somewhere to freeze
+the formatting in future applications of perltidy. For example
# perltidy -mft=2
@month_of_year = (
"09" => 30, "10" => 31, "11" => 30, "12" => 31
);
+=item B<-qwaf>, B<--qw-as-function>
+
+This option tells perltidy to format a B<qw> list which is delimited with
+parentheses as if it were a function call whose call args are a list of quoted
+items. Normally, a B<qw> list is output verbatim except for an adjustment of
+leading whitespace to indicate the indentation level. For example, here is an
+example of the default formatting of a poorly formatted B<qw> list:
+
+ # perltidy
+ @fields = qw( $st_dev $st_ino $st_mode $st_nlink $st_uid
+ $st_gid $st_rdev $st_size $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks);
+
+If we format with B<-qwaf> then the result will be:
+
+ # perltidy -qwaf
+ @fields = qw(
+ $st_dev $st_ino $st_mode $st_nlink
+ $st_uid $st_gid $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime $st_blksize
+ $st_blocks
+ );
+
+The way this works is that just before formatting begins, the tokens of the
+B<qw> text are replaced with the tokens of an equivalent function call with a
+comma-separated list of quoted items as call args. Then it is formatted like
+any other list. Special comma tokens are employed which have no display text, so
+when the code is eventually displayed it remains a valid B<qw> quote.
+
+Some things to note are:
+
+=over 4
+
+=item *
+
+This only works for B<qw> quotes which begin with B<qw(>, with no space
+before the paren.
+
+=item *
+
+If the option B<--space-function-paren> is employed, it is ignored for
+these special function calls because it would deactivate them.
+
+=item *
+
+Otherwise the various formatting control flags operate on these lists the same
+as for other comma-separated lists. In particular, note that if
+B<--break-at-old-comma-breakpoints>, or B<-boc>, is set, then the old line
+break locations will be retained. And old line breaks will be retained if
+there are any blank lines between the opening and closing parens.
+
+=item *
+
+Before using this option for the first time, it is a good idea to scan the code
+and decide if any lists have a special order which should be retained. This
+can be accomplished for example by changing the quote delimiters to something
+other than parens, or by inserting a blank line as discussed at the start of
+this section.
+
+=back
+
=back
=head2 Adding and Deleting Commas
=item B<-drc>, B<--delete-repeated-commas>
-Repeated commas in a list are undesirable and can be removed with this flag.
+This option causes repeated commas to be removed.
For example, given this list with a repeated comma
ignoreSpec( $file, "file",, \%spec, \%Rspec );
# perltidy -drc:
ignoreSpec( $file, "file", \%spec, \%Rspec );
-Since the default is not to add or delete commas, this feature is off by default and must be requested.
+This parameter also deletes repeated fat commas, '=>'. The complete list of
+actions taken when this flag is set are as follows:
+=over 4
-=item B<--want-trailing-commas=s> or B<-wtc=s>, B<--add-trailing-commas> or B<-atc>, and B<--delete-trailing-commas> or B<-dtc>
+=item *
-A trailing comma is a comma following the last item of a list. Perl allows
-trailing commas but they are not required. By default, perltidy does not add
-or delete trailing commas, but it is possible to manipulate them with the
-following set of three related parameters:
+Repeated commas like ',,' are removed with a warning.
- --want-trailing-commas=s, -wtc=s - defines where trailing commas are wanted
- --add-trailing-commas, -atc - gives permission to add trailing commas to match the style wanted
- --delete-trailing-commas, -dtc - gives permission to delete trailing commas which do not match the style wanted
+=item *
-The parameter B<--want-trailing-commas=s>, or B<-wtc=s>, defines a preferred style. The string B<s> indicates which lists should get trailing commas, as follows:
+Repeated fat commas like '=> =>' are removed with a warning.
- s=0 : no list should have a trailing comma
- s=1 or * : every list should have a trailing comma
- s=m a multi-line list should have a trailing commas
- s=b trailing commas should be 'bare' (comma followed by newline)
- s=h lists of key=>value pairs, with about one one '=>' and one ',' per line,
- with a bare trailing comma
- s=i lists with about one comma per line, with a bare trailing comma
- s=' ' or -wtc not defined : leave trailing commas unchanged [DEFAULT].
+=item *
-This parameter by itself only indicates the where trailing commas are
-wanted. Perltidy only adds these trailing commas if the flag B<--add-trailing-commas>, or B<-atc> is set. And perltidy only removes unwanted trailing commas
-if the flag B<--delete-trailing-commas>, or B<-dtc> is set.
+The combination '=>,' produces a warning but is not changed (it is
+likely an error but only its author would know how to fix it).
-Here are some example parameter combinations and their meanings
+=item *
- -wtc=0 -dtc : delete all trailing commas
- -wtc=1 -atc : all lists get trailing commas
- -wtc=m -atc : all multi-line lists get trailing commas, but
- single line lists remain unchanged.
- -wtc=m -dtc : multi-line lists remain unchanged, but
- any trailing commas on single line lists are removed.
- -wtc=m -atc -dtc : all multi-line lists get trailing commas, and
- any trailing commas on single line lists are removed.
+The remaining combination ',=>' (sometimes called a 'winking fat comma') is
+ignored by this parameter.
-For example, given the following input without a trailing comma
+=item *
- bless {
- B => $B,
- Root => $Root
- } => $package;
+These warnings are only output if the B<--warning-output>, or B<-w>, flag is
+set.
-we can add a trailing comma after the variable C<$Root> using
+=back
- # perltidy -wtc=m -atc
- bless {
- B => $B,
- Root => $Root,
- } => $package;
+This option is on by default. Use B<-ndrc> to turn it off.
-This could also be achieved in this case with B<-wtc=b> instead of B<-wtc=m>
-because the trailing comma here is bare (separated from its closing brace by a
-newline). And it could also be achieved with B<-wtc=h> because this particular
-list is a list of key=>value pairs.
+=item B<Adding and Deleting Trailing Commas>
-The above styles should cover the main of situations of interest, but it is
-possible to apply a different style to each type of container token by
-including an opening token ahead of the style character in the above table.
-For example
+A B<trailing comma> is a comma following the last item of a list. Perl allows
+trailing commas but they are not required. Including them can sometimes
+simplify the maintenance of large or complex lists, and help display structure.
+But they may not be appropriate in all lists, for example in a list which
+always has just one term. By default, perltidy does not add or delete trailing
+commas, but it is possible to manipulate them with the following set of related
+parameters:
- -wtc='(m [b'
+=over 4
-means that lists within parens should have multi-line trailing commas, and that
-lists within square brackets have bare trailing commas. Since there is no
-specification for curly braces in this example, their trailing commas would
-remain unchanged.
+=item *
+
+B<--want-trailing-commas=s, -wtc=s> - defines where trailing commas are wanted (the style)
+
+=item *
+
+B<--add-trailing-commas, -atc> - gives permission to add trailing commas to match the style wanted
+
+=item *
+
+B<--delete-trailing-commas, -dtc> - gives permission to delete trailing commas which do not match the style wanted
+
+=back
+
+The parameter B<--want-trailing-commas=s>, or B<-wtc=s>, defines a preferred style. The string B<s> indicates which lists should get trailing commas, as follows:
+
+ s=1 or '*' : every list
+ s=m a multiline list
+ s=b a multiline list, bare trailing comma
+ s=i a multiline list, bare trailing comma, about one comma per line
+ s=h a multiline list, bare trailing comma, about one key=>value
+ pair per line
+ s=0 : no list
+
+ s=' ' or not defined : leave trailing commas unchanged [DEFAULT]
+
+where:
+
+=over 4
+
+=item *
+
+A B<list> here is basically taken to be a container of items (parens, square
+brackets, or braces), which is not a code block, which contains one or more
+commas or fat commas. These parameters only apply to something that fits this
+definition of a list.
+
+A paren-less list of parameters is not a list by this definition, so
+these parameters do not apply to a paren-less list.
+
+=item *
+
+A B<multiline list> is a list for which the opening and closing brackets
+on different lines.
+
+=item *
+
+A B<bare trailing comma> is a comma which is at the end of a line. That is,
+the closing container token follows on a different line. So a list with a
+bare trailing comma is a special case of a multiline list.
+
+=item *
+
+In fact the above options for trailing commas can be seen as a hierarchy
+of nesting sets which can be expressed as
+
+ 1 > m > b > i > h > 0
+
+This indicates that multiline trailing commas B<m> are a
+subset of all trailing commas, and bare trailing commas B<b> are a subset
+of all multiline trailing commas, and so on.
+
+=back
+
+This parameter by itself only indicates where trailing commas are wanted.
+Perltidy only adds these trailing commas if permission is granted by setting
+the flag B<--add-trailing-commas>, or B<-atc>. And perltidy only removes
+unwanted trailing commas if the flag B<--delete-trailing-commas>, or B<-dtc> is
+set.
+
+Here are some example parameter combinations and their meanings
+
+ -wtc=0 -dtc : delete all trailing commas
+ -wtc=1 -atc : add trailing commas to all lists
+ -wtc=m -atc : add trailing commas to all multiline lists
+ (single line lists remain unchanged)
+ -wtc=b -atc : add commas so that all lists whose closing
+ bracket starts a new line have trailing commas
+ -wtc=b -dtc : all trailing commas which are not bare
+ (not followed by a newline) get deleted.
+ -wtc=b -atc -dtc : do both of the above operations so that
+ all trailing commas are bare
+
+For example, given the following input
+
+ $wine_list = $top->Box(
+ "-length" => 5,
+ "-width" => 3
+ )->select( "red", "white", "gold", );
+
+we have
+
+ # perltidy -wtc=b -atc -dtc
+ $wine_list = $top->Box(
+ "-length" => 5,
+ "-width" => 3,
+ )->select( "red", "white", "gold" );
+
+A comma was added after the C<3>, since it is bare, and a comma was removed
+after C<"gold">, since it not bare.
+
+It is possible to apply a different style to each type of container token by
+including an opening token ahead of the style character in the above table.
+For example
+
+ -wtc='(m [b'
+
+means that lists within parens should have multiline trailing commas, and that
+lists within square brackets have bare trailing commas. Since there is no
+specification for curly braces in this example, their trailing commas would
+remain unchanged.
For parentheses, an additional item of information which can be given is an
alphanumeric letter which is used to limit the selection further depending on
are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings for
matching whatever precedes an opening paren:
- 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+ 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.
These are the same codes used for B<--line-up-parentheses-inclusion-list>.
For example,
- -wtc = 'w(m'
+ -wtc='w(m'
+
+means that trailing commas are wanted for multiline parenthesized lists following a function call or keyword.
+
+Finally, a leading B<+> can be placed on any term to indicate that it only
+applies when adding commas. A leading B<-> indicates that it only applies when
+deleting commas. For example,
+
+ -wtc='+h -b' -atc -dtc
-means that trailing commas are wanted for multi-line parenthesized lists following a function call or keyword.
+means that missing trailing commas should be added to lists of key => value
+pairs, and trailing commas which are not bare should be removed. No other
+changes are made. When both plus and minus terms are used like this, they must
+not be in conflict. There is no conflict in this example because the trailing
+comma locations of the key=>value pairs selected by the B<+h> term are a subset
+of all bare trailing commas, and thus will not be deleted by the B<-b> term.
+The general rule is that the letter of the plus term should occur after the
+letter of the minus term in the hierarchical nesting order,
+B<< 1 > m > b > i > h > 0 >>.
-Here are some points to note regarding adding and deleting trailing commas:
+B<Some points to note> regarding adding and deleting trailing commas:
=over 4
=item *
-For the implementation of these parameters, a B<list> is basically taken to be
-a container of items (parens, square brackets, or braces), which is not a code
-block, with one or more commas. These parameters only apply to something that
-fits this definition of a list.
+It is recommended to also use the B<--converge> parameter when adding and/or
+deleting trailing commas, especially if the formatter may be making other line
+break changes at the same time. The reason is that the decision regarding
+whether or not a list is multiline or bare is made based on the B<input> stream
+if only one iteration is made, which is the default.
-Note that a paren-less list of parameters is not a list by this definition, so
-these parameters have no effect on a peren-less list.
+When iterations are requested with the B<--converge> parameter, any comma
+deletion operations are postponed until the start of the B<second iteration>,
+after most changes in line breaks have been made.
-Another consequence is that if the only comma in a list is deleted, then it
-cannot later be added back with these parameters because the container no
-longer fits this definition of a list. For example, given
+To illustrate, if we start with
- my ( $self, ) = @_;
+ f(
+ a => 1,
+ b => 2, );
-and if we remove the comma with
+and attempt to delete non-bare commas,
- # perltidy -wtc=m -dtc
- my ( $self ) = @_;
+ # perltidy -wtc=b -dtc
+ f(
+ a => 1,
+ b => 2
+ );
-then we cannot use these trailing comma controls to add this comma back.
+we delete a comma which has become bare, which is not what is wanted. This
+happened because the change was based on the starting rather than the final
+line breaks. Running with B<--converge> gives the desired result:
-=item *
+ # perltidy -wtc=b -dtc --converge
+ f(
+ a => 1,
+ b => 2,
+ );
+
+because comma changes are based on the line breaks after the first iteration.
-By B<multiline> list is meant a list for which the first comma and trailing comma
-are on different lines.
+A parameter B<--delay-trailing-comma-operations>, or B<-dtco>, is available to
+control this behavior if desired. Negating this parameter, with B<-ndtco>, tells
+perltidy to always use the starting state to make decisions regarding comma
+addition and deletion, even when iterations are requested. This should not
+normally be necessary.
=item *
-A B<bare> trailing comma is a comma which is at the end of a line. That is,
-the closing container token follows on a different line. So a list with a
-bare trailing comma is a special case of a multi-line list.
+Perltidy does not add a trailing comma in some B<edge cases> which appear to
+be near a stability limit. So if a comma is unexpectedly not added, this is
+probably the reason.
=item *
-The decision regarding whether or not a list is multi-line or bare is
-made based on the B<input> stream. In some cases it may take an iteration
-or two to reach a final state.
+If the parameter B<--break-at-trailing-comma-types>, or B<-btct>. is also
+employed, it operates on the state after any adding or deleting of commas. And
+it will allow trailing commas to be added in most edge cases. For example,
+given the following input text
+
+ plot(
+ 'g', Canvas => $overview_canvas
+ );
+
+formatting with C<-wtc=f(b) -atc> will not add a trailing comma because the
+list will be flattened and the comma will not remain bare. But we can add a
+trailing comma, and keep the container open, with
+
+ # perltidy -wtc='f(b' -atc -btct='f(b'
+ plot(
+ 'g', Canvas => $overview_canvas,
+ );
+
+As another example, given the same text on a single line without a trailing comma
+
+ plot( 'g', Canvas => $overview_canvas );
+
+we can add a trailing comma and break the container open with
+
+ # perltidy -wtc=1 -atc -btct=1
+ plot(
+ 'g', Canvas => $overview_canvas,
+ );
+
+After that, we could use C<-btct='f(b'> to keep the container open.
=item *
When using these parameters for the first time it is a good idea to practice
on some test scripts and verify that the results are as expected.
+=back
+
+B<Special Considerations for Lone Trailing Commas>
+
+Adding or deleting the only comma in a list can have some implications which
+need to be explained and possibly controlled. Two additional controls are
+available for these lone commas:
+
+=over 4
+
+=item *
+
+B<--add-lone-trailing-commas, -altc> - gives permission to add a comma if it will be the only comma. This is on by default and explained below.
+
=item *
-Since the default behavior is not to add or delete commas, these parameters
-can be useful on a temporary basis for reformatting a script.
+B<--delete-lone-trailing-commas, -dltc> - gives permission to delete the only comma in a list. This is on by default and explained below.
=back
+One issue with deleting a lone comma is that if it is deleted, then it
+might not be possible add it back automatically since perltidy uses the
+existence of commas to help locate containers where commas are appropriate. For
+example, given
+
+ my ( $self, ) = @_;
+
+and if we remove the comma with
+
+ # perltidy -wtc=m -dtc
+ my ( $self ) = @_;
+
+then we cannot use the trailing comma controls to add this comma back. The
+parameter B<--delete-lone-trailing-commas> allows such a comma to be deleted,
+and is on by default, but can be turned off to prevent this. This might
+be useful if one is experimenting with formatting options and wants
+to restrict testing to operations which are reversible. Note that this
+parameter is a fine-tuning control for B<--delete-trailing-commas> which
+must also be set for it to have any effect.
+
+However, if a single item in a list is itself is a list with multiple lines,
+such as the item in braces here
+
+ $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ }
+ );
+
+then perltidy can add and/or delete a lone comma:
+
+ # perltidy -atc -wtc=b
+ $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ },
+ );
+
+But it turns out that these cases usually coincide with situations where the
+B<--weld-nested-containers>, or B<-wn>, parameter would apply, and adding such
+commas can block welding. For example, the B<-wn> parameter would succeed on
+the first of the above snippets, but it would fail on the second because of the
+added trailing comma.
+
+The parameter B<--add-lone-trailing-commas>, or B<-altc> allows these commas to
+be added, provide that B<--add-trailing-commas> is also set. It is on by
+default. Users of B<-wn> may want to turn it off with
+B<--noadd-lone-trailing-commas>, B<-naltc> to prevent such commas from being
+added.
+
+If such commas do get added, then can be removed to allow welding with the
+control described in the next section.
+
=item B<-dwic>, B<--delete-weld-interfering-commas>
If the closing tokens of two nested containers are separated by a comma, then
For example, a comma in this script prevents welding:
# perltidy -wn
- skip_symbols(
- [ qw(
- Perl_dump_fds
- Perl_ErrorNo
- Perl_GetVars
- PL_sys_intern
- ) ],
+ $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ },
);
-Using B<-dwic> removes the comma and allows welding:
+Adding B<-dwic> removes the comma and allows welding:
# perltidy -wn -dwic
- skip_symbols( [ qw(
- Perl_dump_fds
- Perl_ErrorNo
- Perl_GetVars
- PL_sys_intern
- ) ] );
+ $self->make_grammar( {
+ iterator => $self->_iterator,
+ parser => $self,
+ } );
-Since the default is not to add or delete commas, this feature is off by default.
+This feature is off by default.
Here are some points to note about the B<-dwic> parameter
=over 4
=item *
-This operation is not reversible, so please check results of using this parameter carefully.
+This operation is not always reversible, so please check results of using this
+parameter carefully.
=item *
possible, but not sufficient. So welding will not always occur where these
commas are removed.
+=item *
+
+This operation is independent of B<--add-trailing-commas> and B<--delete-trailing-commas>. If it conflicts with any of those settings, it has priority.
+
+=back
+
+=back
+
+=head2 Adding and Deleting Interbracket Arrows
+
+In the following expression, the arrow operator '->' between the closing and
+opening brackets of hash keys and array indexes are optional:
+
+ return $self->{'commandline'}->{'args'}->[0]->[0]->{'hgroups'};
+
+These will be called B<interbracket arrows> here, for lack of a better term.
+Perltidy will not change them by default, but they can be added or removed with
+the following parameters.
+
+=over 4
+
+=item B<-dia>, B<--delete-interbracket-arrows>
+
+This parameter deletes interbracket arrows. Applied to the above example we have
+
+ # perltidy -dia
+ return $self->{'commandline'}{'args'}[0][0]{'hgroups'};
+
+By default this applies to all interbracket arrows, but selective deletion is possible
+with controls described below.
+
+=item B<-aia>, B<--add-interbracket-arrows>
+
+This parameter adds interbracket arrows. Applied to the line of code above, we
+get back the original line.
+
+ # perltidy -aia
+ return $self->{'commandline'}->{'args'}->[0]->[0]->{'hgroups'};
+
+Selective changes can be made with controls described below.
+
+=item B<-ias=s>, B<--interbracket-arrow-style=s>
+
+By default the B<-add-> and B<-delete-> parameters apply to all interbracket
+arrows.
+
+An optional style can be specified with this parameter string B<s>. In that
+case the parameters B<--add-interbracket-arrows> and
+B<--delete-interbracket-arrows> only apply where they would bring the
+formatting into agreement with the specified style. They may both be used in a
+single run if a mixed style is specified since there is no conflict.
+
+The style string B<s> gives a graphical description of the desired style. It
+lists up to four possible pairs of bracket types with an optional
+arrow between them. For example:
+
+ -ias='][ }->{'
+
+This means no arrows are wanted between '][' but arrows should be between '}{'.
+And it means that the unlisted pairs ']{' and '}[' should remain unchanged,
+either with or without arrows.
+
+In this particular example, if the parameter B<--delete-interbracket-arrows> is
+used, then only arrows like ']->[' will be deleted, since they
+are the only ones which disagree with the style.
+
+And likewise, if B<--add-interbracket-arrows> is used, then arrows will
+only be inserted between brackets like '}{' to bring the formatting into
+conformity with the style in this example.
+
+Spaces in the string B<s> are optional. They are ignored when the
+expression is parsed.
+
+The style corresponding to all possible arrows is
+
+ -ias=']->[ ]->{ }->[ }->{'
+
+For convenience, this may also be requested with B<-ias=1> or B<-ias='*'>.
+
+The style corresponding to no interbracket arrows is
+
+ -ias='] [ ] { } [ } {'
+
+which may also be requested with B<-ias=0>.
+
+=item B<-wia>, B<--warn-interbracket-arrows>
+
+If this parameter is set, then a message is written to the error file in the
+following cases:
+
+=over 4
+
+=item *
+
+If an arrow is added or deleted by an add or delete command.
+
+=item *
+
+If a style is defined and an arrow would have been added or deleted if requested.
+So for example, the command
+
+ perltidy -wia -ias=']['
+
+will show where a file has arrows like ]->[' since they do not match the style,
+but no changes will be made because the delete command B<-dia> has not been
+given. And
+
+ perltidy -wia -ias=0
+
+will warn if any arrows exist, since the flag -ias=0 means that no arrows
+are wanted.
+
+=back
+
+=item B<-iac=n>, B<--interbracket-arrow-complexity=n>
+
+This parameter makes it possible to skip adding or deleting arrows following a
+container which is complex in some sense. Three levels of complexity can be
+specified with the integer B<n>, as follows:
+
+ n=0 the left container must contain be a single thing (token)
+ n=1 the left container must not contain other containers [DEFAULT]
+ n=2 the left container may contain anything
+
+Some examples:
+
+ # Container complexity
+ {'commandline'} 0 single token OK by default
+ { $type . $name } 1 multiple tokens OK by default
+ [ $plot{'x-axis'} - 1 ] 2 contains a container SKIPPED by default
+
+So, with the default complexity level of 1, an arrow could be added or deleted
+following the first two of these containers but not the third.
+
+=back
+
+B<Some points to consider> when working with these parameters are:
+
+=over 4
+
+=item *
+
+There are no known bugs, but this is a relatively new feature. So please
+carefully check file differences and run tests when interbracket arrows are
+added or removed.
+
+=item *
+
+For some unusual spacing parameters, it could take an extra iteration for
+the spaces between brackets to reach their final state after arrows are
+added or deleted.
+
+=item *
+
+Any comments between brackets will prevent the adding and deleting of arrows.
+
=back
+=head2 Missing Else Blocks
+
+A defensive programming technique is to require that every B<if-elsif-> chain
+be terminated with an B<else> block, even though it is not strictly required.
+This helps insure that there are no holes in the logic.
+
+For example, consider the following snippet:
+
+ my $type = get_cards();
+ if ( $type = 1 ) { action("hold 'em") }
+ elsif ( $type = 2 ) { action("fold 'em") }
+ elsif ( $type = 3 ) { action("walk away") }
+
+What if the variable B<$type> is some other value? It might have been obvious
+that this was okay when the code was first written, but it might not be so
+clear when the code is reviewed a few years later. A terminal B<else> block
+with a comment would help clarify things.
+
+The parameters in this section can help by either issuing a warning if an
+B<else> is missing, or even inserting an empty B<else> block where one is
+missing, or both.
+
+=over 4
+
+=item B<-wme>, B<--warn-missing-else>
+
+This flag tells perltidy to issue a warning if a program is missing a terminal B<else> block. The default is not to issue such a warning.
+
+=item B<-ame>, B<--add-missing-else>
+
+This flag tells perltidy to output an empty else block wherever a program is
+missing a terminal B<else> block. To get a warning when this is done you
+should also set B<-wme>. The default is not to add missing else blocks.
+
+=item B<-amec=s>, B<--add-missing-else-comment=s>
+
+This string is a side comment which will be written to highlight a
+new empty else block. The default is:
+
+ -amec='##FIXME - added with perltidy -ame'
+
=back
+For example, on the above example we can add a missing else and also get
+a warning notice with:
+
+ # perltidy -ame -wme
+ my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else {
+ ##FIXME - added with perltidy -ame
+ }
+
+Any B<##FIXME> comments created in this way should be reviewed and changed
+appropriately. For example, one might decide that the code fine as is, and just
+change the comment to indicate that nothing has been overlooked:
+
+ my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else {
+ # ok - no worries
+ }
+
+Or maybe a deeper analysis reveals that something was missed:
+
+ my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else { action("run") }
+
+Sometimes it turns out that the else block should not reachable, in which case
+an error exit might be appropriate. In any case, having the B<else> block can
+improve code maintainability.
+
=head2 Retaining or Ignoring Existing Line Breaks
Several additional parameters are available for controlling the extent
For example, given this snippet:
return unless $cmd = $cmd || ($dot
- && $Last_Shell) || &prompt('|');
+ && $Last) || &prompt('|');
# perltidy -bol [default]
return
unless $cmd = $cmd
|| ( $dot
- && $Last_Shell )
+ && $Last )
|| &prompt('|');
# perltidy -nbol
- return unless $cmd = $cmd || ( $dot && $Last_Shell ) || &prompt('|');
+ return unless $cmd = $cmd || ( $dot && $Last ) || &prompt('|');
=item B<-bom>, B<--break-at-old-method-breakpoints>
considered. With B<-bom>, breaks before the arrow are preserved, so if you
have pre-formatted a method chain:
- my $q = $rs
- ->related_resultset('CDs')
- ->related_resultset('Tracks')
- ->search({
- 'track.id' => {-ident => 'none_search.id'},
- })->as_query;
+ # perltidy -bom
+ $Document
+ ->schild(0)
+ ->schildren();
-It will B<keep> these breaks, rather than become this:
+the flag B<-bom> will B<keep> these line breaks, rather than become this:
- my $q = $rs->related_resultset('CDs')->related_resultset('Tracks')->search({
- 'track.id' => {-ident => 'none_search.id'},
- })->as_query;
+ # perltidy [DEFAULT]
+ $Document->schild(0)->schildren();
This flag will also look for and keep a 'cuddled' style of calls,
in which lines begin with a closing paren followed by a call arrow,
as in this example:
- # perltidy -bom -wn
- my $q = $rs->related_resultset(
- 'CDs'
- )->related_resultset(
- 'Tracks'
- )->search( {
- 'track.id' => { -ident => 'none_search.id' },
- } )->as_query;
+ # perltidy -bom
+ my $q = $rs->related_resultset(
+ 'CDs'
+ )->related_resultset(
+ 'Tracks'
+ )->search(
+ {
+ 'track.id' => { -ident => 'none_search.id' },
+ }
+ )->as_query;
-You may want to include the B<-weld-nested-containers> flag in this case to keep
-nested braces and parens together, as in the last line.
=item B<-bos>, B<--break-at-old-semicolon-breakpoints>
$z = sqrt( $x**2 + $y**2 );
-The result using B<perltidy -bos> keeps the isolated semicolon:
+Using the B<-bos> flag keeps the isolated semicolon:
+ # perltidy -bos
$z = sqrt( $x**2 + $y**2 )
;
+The extra continuation indentation spaces on the semicolon can be
+removed by also setting B<--noindent-leading-semicolon>.
+
+ # perltidy -bos -nils
+ $z = sqrt( $x**2 + $y**2 )
+ ;
+
The default is not to do this, B<-nbos>.
=item B<-bok>, B<--break-at-old-keyword-breakpoints>
By default, perltidy will retain a breakpoint before keywords which may
-return lists, such as C<sort> and <map>. This allows chains of these
+return lists, such as C<sort> and C<map>. This allows chains of these
operators to be displayed one per line. Use B<-nbok> to prevent
retaining these breakpoints.
B<--weld-nested-exclusion-list> and B<--line-up-parentheses-exclusion-list>
parameters):
- 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+ 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.
The parameter B<-blbs=n> requests that least B<n> blank lines precede a sub
definition which does not follow a comment and which is more than one-line
-long. The default is <-blbs=1>. B<BEGIN> and B<END> blocks are included.
+long. The default is B<-blbs=1>. B<BEGIN> and B<END> blocks are included.
The requested number of blanks statement will be inserted regardless of the
value of B<--maximum-consecutive-blank-lines=n> (B<-mbl=n>) with the exception
=item B<-bbs>, B<--blanks-before-subs>
For compatibility with previous versions, B<-bbs> or B<--blanks-before-subs>
-is equivalent to F<-blbp=1> and F<-blbs=1>.
+is equivalent to B<-blbp=1> and B<-blbs=1>.
Likewise, B<-nbbs> or B<--noblanks-before-subs>
-is equivalent to F<-blbp=0> and F<-blbs=0>.
+is equivalent to B<-blbp=0> and B<-blbs=0>.
=item B<-bbb>, B<--blanks-before-blocks>
perltidy retains single blank lines, so the blank lines remain.
We can easily fix this by telling perltidy to ignore old blank lines by
-including the added parameter B<-kbl=0> and rerunning. Then the unwanted blank
-lines will be gone. However, this will cause all old blank lines to be
-ignored, perhaps even some that were added by hand to improve formatting. So
-please be cautious when using these parameters.
+including the added parameter B<--keep-old-blank-lines=0> and rerunning. Then
+the unwanted blank lines will be gone. However, this will cause all old blank
+lines to be ignored, perhaps even some that were added by hand to improve
+formatting. So please be cautious when using these parameters.
=item B<-mbl=n> B<--maximum-consecutive-blank-lines=n>
This parameter specifies the maximum number of consecutive blank lines which
will be output within code sections of a script. The default is n=1. If the
input file has more than n consecutive blank lines, the number will be reduced
-to n except as noted above for the B<-blbp> and B<-blbs> parameters. If B<n=0>
+to B<n> except as noted above for the B<--blank-lines-before-subs> and B<--blank-lines-before-subs> parameters. If B<n=0>
then no blank lines will be output (unless all old blank lines are retained
-with the B<-kbl=2> flag of the next section).
+with the B<--keep-old-blank-lines=2> flag of the next section).
This flag obviously does not apply to pod sections,
here-documents, and quotes.
The possible values of B<n> are:
n=0 ignore all old blank lines
- n=1 stable: keep old blanks, but limited by the value of the B<-mbl=n> flag
- n=2 keep all old blank lines, regardless of the value of the B<-mbl=n> flag
+ n=1 stable: keep old blanks, but limited by the B<-mbl=n> flag
+ n=2 keep all old blank lines, regardless of the B<-mbl=n> flag
The default is B<n=1>.
=item B<-sob>, B<--swallow-optional-blank-lines>
-This is equivalent to B<kbl=0> and is included for compatibility with
-previous versions.
+This is equivalent to B<--keep-old-blank-lines=0> and is included for compatibility with previous versions.
=item B<-nsob>, B<--noswallow-optional-blank-lines>
-This is equivalent to B<kbl=1> and is included for compatibility with
-previous versions.
+This is equivalent to B<--keep-old-blank-lines=1> and is included for
+compatibility with previous versions.
=back
Before describing the meaning of the parameters in detail let us look at an
example which is formatted with default parameter settings.
- print "Entering test 2\n";
- use Test;
- use Encode qw(from_to encode decode
- encode_utf8 decode_utf8
- find_encoding is_utf8);
- use charnames qw(greek);
- my @encodings = grep( /iso-?8859/, Encode::encodings() );
- my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
- my @source = qw(ascii iso8859-1 cp1250);
- my @destiny = qw(cp1047 cp37 posix-bc);
- my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
- my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
- return unless ($str);
+ print "Entering test 2\n";
+ use Test;
+ use Encode qw(from_to encode decode
+ encode_utf8 decode_utf8
+ find_encoding is_utf8);
+ use charnames qw(greek);
+ my @encodings = grep( /iso-?8859/, Encode::encodings() );
+ my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
+ my @source = qw(ascii iso8859-1 cp1250);
+ my @destiny = qw(cp1047 cp37 posix-bc);
+ my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
+ my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
+ return unless ($str);
using B<perltidy -kgb> gives:
- print "Entering test 2\n";
- <----------this blank controlled by -kgbb
- use Test;
- use Encode qw(from_to encode decode
- encode_utf8 decode_utf8
- find_encoding is_utf8);
- use charnames qw(greek);
- <---------this blank controlled by -kgbi
- my @encodings = grep( /iso-?8859/, Encode::encodings() );
- my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
- my @source = qw(ascii iso8859-1 cp1250);
- my @destiny = qw(cp1047 cp37 posix-bc);
- my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
- my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
- <----------this blank controlled by -kgba
- return unless ($str);
+ print "Entering test 2\n";
+ <----------this blank controlled by -kgbb
+ use Test;
+ use Encode qw(from_to encode decode
+ encode_utf8 decode_utf8
+ find_encoding is_utf8);
+ use charnames qw(greek);
+ <---------this blank controlled by -kgbi
+ my @encodings = grep( /iso-?8859/, Encode::encodings() );
+ my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
+ my @source = qw(ascii iso8859-1 cp1250);
+ my @destiny = qw(cp1047 cp37 posix-bc);
+ my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
+ my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
+ <----------this blank controlled by -kgba
+ return unless ($str);
Blank lines have been introduced around the B<my> and B<use> sequences. What
happened is that the default keyword list includes B<my> and B<use> but not
-lp -bl -noll -pt=2 -bt=2 -sbt=2 -icp
-To use this style with B<-xlp> instead of B<-lp> use B<-gnu -xlp>.
+To use this style with B<-xlp> (B<--extended-line-up-parentheses>) instead of B<-lp> (B<--line-up-parentheses>) use B<-gnu -xlp>.
=item B<-pbp>, B<--perl-best-practices>
B<-pbp> is an abbreviation for the parameters in the book B<Perl Best Practices>
by Damian Conway:
- -l=78 -i=4 -ci=4 -st -se -vt=2 -cti=0 -pt=1 -bt=1 -sbt=1 -bbt=1 -nsfs -nolq
+ -l=78 -i=4 -ci=4 -st -se -vt=2 -cti=0 -pt=1 -bt=1 -sbt=1 -bbt=1
+ -nsfs -nolq
-wbb="% + - * / x != == >= <= =~ !~ < > | & =
**= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x="
-Please note that this parameter set includes -st and -se flags, which make
-perltidy act as a filter on one file only. These can be overridden by placing
-B<-nst> and/or B<-nse> after the -pbp parameter.
+Please note that this parameter set includes B<-st> (B<--standard-output>) and
+B<-se> (B<--standard-error-output>) flags, which make perltidy act as a filter
+on one file only. These can be overridden by placing B<-nst> and/or B<-nse>
+after the -pbp parameter.
-Also note that the value of continuation indentation, -ci=4, is equal to the
-value of the full indentation, -i=4. It is recommended that the either (1) the
-parameter B<-ci=2> be used instead, or (2) the flag B<-xci> be set. This will
-help show structure, particularly when there are ternary statements. The
+Also note that the value of continuation indentation, B<-ci=4>, is equal to the
+value of the full indentation, B<-i=4>. It is recommended that the either (1)
+the parameter B<-ci=2> be used instead, or (2) the flag B<-xci> be set. This
+will help show structure, particularly when there are ternary statements. The
following snippet illustrates these options.
# perltidy -pbp
)
. (
$page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);
)
. (
$page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);
: "the section on $section"
)
. ( $page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);
The B<-xci> flag was developed after the B<-pbp> parameters were published so you need
to include it separately.
+=item B<Making a file unreadable>
+
+The goal of perltidy is to improve the readability of files, but there
+are two commands which have the opposite effect, B<--mangle> and
+B<--extrude>. They are actually
+merely aliases for combinations of other parameters. Both of these
+strip all possible whitespace, but leave comments and pod documents,
+so that they are essentially reversible. The
+difference between these is that B<--mangle> puts the fewest possible
+line breaks in a script while B<--extrude> puts the maximum possible.
+Note that these options do not provided any meaningful obfuscation, because
+perltidy can be used to reformat the files. They were originally
+developed to help test the tokenization logic of perltidy, but they
+have other uses.
+One use for B<--mangle> is the following:
+
+ perltidy --mangle myfile.pl -st | perltidy -o myfile.pl.new
+
+This will form the maximum possible number of one-line blocks (see next
+section), and can sometimes help clean up a badly formatted script.
+
+A similar technique can be used with B<--extrude> instead of B<--mangle>
+to make the minimum number of one-line blocks.
+
+Another use for B<--mangle> is to combine it with B<--delete-all-comments (-dac)> to reduce
+the file size of a perl script.
+
=back
=head2 One-Line Blocks
B<--one-line-block-exclusion-list> described below.
When the B<cuddled-else> style is used, the default treatment of one-line blocks
-may interfere with the cuddled style. In this case, the default behavior may
-be changed with the flag B<--cuddled-break-option=n> described elsehwere.
+may interfere with the cuddled style. In this case, the default behavior may
+be changed with the flag B<--cuddled-break-option=n> described elsewhere.
When an existing one-line block is longer than the maximum line length, and
must therefore be broken into multiple lines, perltidy checks for and adds any
include any of the words C<sort>, C<map>, C<grep>, C<eval>, or it may be C<*>
to indicate all of these.
-So for example to prevent multi-line B<eval> blocks from becoming one-line
-blocks, the command would be B<-olbxl='eval'>. In this case, existing one-line B<eval> blocks will remain on one-line if possible, and existing multi-line
-B<eval> blocks will remain multi-line blocks.
+So for example to prevent multiline B<eval> blocks from becoming one-line
+blocks, the command would be B<-olbxl='eval'>. In this case, existing one-line B<eval> blocks will remain on one-line if possible, and existing multiline
+B<eval> blocks will remain multiline blocks.
=item B<-olbn=n>, B<--one-line-block-nesting=n>
By default, perltidy follows the input file regarding these semicolons,
but this behavior can be controlled by this flag. The values of n are:
- n=0 remove terminal semicolons in one-line blocks having a single statement
- n=1 stable; keep input file placement of terminal semicolons [DEFAULT ]
- n=2 add terminal semicolons in all one-line blocks
+ n=0 remove terminal semicolons in single-statement one-line blocks
+ n=1 stable; keep input terminal semicolons [DEFAULT ]
+ n=2 add terminal semicolons in all one-line blocks
Note that the B<n=2> option has no effect if adding semicolons is prohibited
with the B<-nasc> flag. Also not that while B<n=2> adds missing semicolons to
Sometimes it might be desirable to convert a script to have one-line blocks
whenever possible. Although there is currently no flag for this, a simple
-workaround is to execute perltidy twice, once with the flag B<-noadd-newlines>
+workaround is to execute perltidy twice, once with the flag B<--noadd-newlines>
and then once again with normal parameters, like this:
cat infile | perltidy -nanl | perltidy >outfile
which were broken will remain broken (with the exceptions noted above).
Another trick for doing this for certain block types is to format one time with
-the B<-cuddled-else> flag and B<--cuddled-break-option=2>. Then format again
+the B<--cuddled-else> flag and B<--cuddled-break-option=2>. Then format again
with the normal parameters. This will break any one-line blocks which are
involved in a cuddled-else style.
These alignment types correspond to perl symbols, operators and keywords except
for 'q', which refers to the special case of alignment in a 'use' statement of
-qw quotes and empty parens.
+qw quotes and empty parens.
They are all enabled by default, but they can be selectively disabled by including one or more of these tokens in the space-separated list B<valign-exclusion-list=s>.
For example, the following would prevent alignment at B<=> and B<if>:
In that case you may leave the B<valign-exclusion-list> undefined, or use the special symbol B<*> for the exclusion list.
For example, the following parameters enable alignment only at commas and 'fat commas':
- --valign-inclusion-list=', =>'
- --valign-exclusion-list='*' ( this is optional and may be omitted )
+ --valign-inclusion-list=', =>'
+ --valign-exclusion-list='*' ( this is optional and may be omitted )
These parameter lists should consist of space-separated tokens from the above
list of possible alignment tokens, or a '*'. If an unrecognized token
To illustrate, consider the following snippet with default formatting
- # perltidy
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
+ # perltidy
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
To exclude all alignments except the equals (i.e., include only equals) we could use:
- # perltidy -vil='='
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
+ # perltidy -vil='='
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
To exclude only the equals we could use:
- # perltidy -vxl='='
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
+ # perltidy -vxl='='
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused
Notice in this last example that although only the equals alignment was
excluded, the ternary alignments were also lost. This happens because the
alignment cannot be made for some reason.
But also notice that side comments remain aligned because their alignment is
-controlled separately with the parameter B<--valign-side_comments> described above.
+controlled separately with the parameter B<--valign-side-comments> described above.
-=back
+=item B<Aligning postfix unless and if with --valign-if-unless or -viu>
-=head2 Extended Syntax
+By default, postfix B<if> terms align and postfix B<unless> terms align,
+but separately. For example,
-This section describes some parameters for dealing with extended syntax.
+ # perltidy [DEFAULT]
+ print "Tried to add: @Resolve\n" if ( @Resolve and !$Quiet );
+ print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+ print "Output:\n" unless $Quiet;
+ print join( "\n", @Output ) . "\n" unless $Quiet;
-For another method of handling extended syntax see the section L<"Skipping Selected Sections of Code">.
+The B<-viu> flag causes a postfix B<unless> to be treated as if it were a
+postfix B<if> for purposes of alignment, and thus they align:
-Also note that the module F<Perl::Tidy> supplies a pre-filter and post-filter capability. This requires calling the module from a separate program rather than through the binary F<perltidy>.
+ # perltidy -viu
+ print "Tried to add: @Resolve\n" if ( @Resolve and !$Quiet );
+ print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+ print "Output:\n" unless $Quiet;
+ print join( "\n", @Output ) . "\n" unless $Quiet;
-=over 4
+=item B<Aligning signed numbers with --valign-signed-numbers or -vsn>
-=item B<-xs>, B<--extended-syntax>
+Setting B<-vsn> causes columns of numbers containing both signed and unsigned
+values to have leading signs placed in their own column. For example:
-This flag allows perltidy to handle certain common extensions
-to the standard syntax without complaint.
+ # perltidy -vsn
+ my @correct = (
+ [ 123456.79, 86753090000.868, 11 ],
+ [ -123456.79, -86753090000.868, -11 ],
+ [ 123456.001, 80.080, 10 ],
+ [ -123456.001, -80.080, 0 ],
+ [ 10.9, 10.9, 11 ],
+ );
-For example, without this flag a structure such as the following would generate
-a syntax error:
+The default is B<-vsn>. This can be turned off to get is strict left
+justification:
+
+ # perltidy -nvsn
+ my @correct = (
+ [ 123456.79, 86753090000.868, 11 ],
+ [ -123456.79, -86753090000.868, -11 ],
+ [ 123456.001, 80.080, 10 ],
+ [ -123456.001, -80.080, 0 ],
+ [ 10.9, 10.9, 11 ],
+ );
+
+Some points regarding B<-vsn> are:
+
+=over 4
+
+=item *
+
+This option works by inserting a single space ahead of unsigned numbers
+when possible. This is not done if it would require increasing the
+maximum width of a column.
+
+=item *
+
+This option is mainly limited to lists of comma-separated numbers. For
+multiline lists of numbers, having trailing commas can sometimes improve the
+results. If missing, perltidy can add them for example
+with parameters B<-wtc=b -atc>. See L<"Adding and Deleting Commas">.
+
+=item *
+
+This option has a control parameter B<--valign-signed-number-limit=N>, or
+B<-vsnl=N>. This value controls formatting of very long columns of numbers and
+should not normally need to be changed. To see its purpose, consider a very
+long column of just unsigned numbers, say 1000 lines. If we add a single
+negative number, it is undesirable to move all of the other numbers over by one
+space. This could create many lines of file differences but not really improve
+the appearance when a local section of the table was viewed. The number B<N>
+avoids this problem by not adding extra indentation to a run of more than B<N>
+lines of unsigned numbers. The default value, B<N=20>, is set to be a number
+of lines for which the ends of a long column of unsigned numbers are not
+normally both in view.
+
+=back
+
+=item B<Aligning assignment operators with --valign-wide-equals or -vwe>
+
+The following assignment operators are aligned independently by default:
+
+ = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
+
+Setting B<--valign-wide-equals>, or B<-vwe>, causes them to be vertically
+aligned together with the trailing B<=> all aligned. For example, here
+is the default formatting for some code with several of these operators:
+
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+And here is the same code formatted with B<-vme>:
+
+ # perltidy -vme
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+This option was added for issue git #135 and can improve readability, but it is
+off by default to avoid changing existing formatting.
+
+=back
+
+=head2 Extended Syntax
+
+This section describes some parameters for dealing with extended syntax.
+
+For another method of handling extended syntax see the section L<"Skipping Selected Sections of Code">.
+
+Also note that the module F<Perl::Tidy> supplies a pre-filter and post-filter capability. This requires calling the module from a separate program rather than through the binary F<perltidy>.
+
+=over 4
+
+=item B<-xs>, B<--extended-syntax>
+
+This flag allows perltidy to handle certain common extensions
+to the standard syntax without complaint.
+
+For example, without this flag a structure such as the following would generate
+a syntax error:
Method deposit( Num $amount) {
$self->balance( $self->balance + $amount );
=item B<-uf=s>, B<--use-feature=s>
-This flag tells perltidy to allow the syntax associated a pragma in string
-B<s>. Currently only the recognized values for the string are B<s='class'> or
-string B<s=' '>. The default is B<--use-feature='class'>. This enables
-perltidy to recognized the special words B<class>, B<method>, B<field>, and
-B<ADJUST>. If this causes a conflict with other uses of these words, the
-default can be turned off with B<--use-feature=' '>.
+This flag tells perltidy to allow or disallow the syntax associated a pragma in
+string B<s>. The current possible settings are:
+
+=over 4
+
+=item *
+
+B<--use-feature='class'>. This tells perltidy to recognized the special words
+B<class>, B<method>, B<field>, and B<ADJUST> as defined for this feature.
+
+=item *
+
+B<--use-feature='noclass'>. This tells perltidy B<not> to treat words B<class>, B<method>, B<field>, B<ADJUST> specially.
+
+=item *
+
+B<Neither of these> (B<--use-feature> not defined). This is the DEFAULT and
+recommended setting. In this case perltidy will try to automatically handle
+both the newer --use-feature 'class' syntax as well as some conflicting
+uses of some of these special words by existing modules.
+
+=back
+
+Note that this parameter is independent of any B<use feature> control lines
+within a script. Perltidy does not look for or read such control lines. This
+is because perltidy must be able to work on small chunks of code sent from an
+editor, so it cannot assume that such lines will be within the lines being
+formatted.
+
+=item B<Working around problems with older version of Perl>
+
+Perltidy contains a number of rules which help avoid known subtleties
+and problems with older versions of perl, and these rules always
+take priority over whatever formatting flags have been set. For example,
+perltidy will usually avoid starting a new line with a bareword, because
+this might cause problems if C<use strict> is active.
+
+There is no way to override these rules.
=back
-=head2 Other Controls
+=head2 Deleting and Extracting Pod or Comments
=over 4
The negatives of these commands also work, and are the defaults.
+=back
+
+=head2 The perltidyrc file
+
+=over 4
+
=item B<Using a F<.perltidyrc> command file>
If you use perltidy frequently, you probably won't be happy until you
Another option is the use of the PERLTIDY environment variable.
The method for setting environment variables depends upon the version of
-Windows that you are using. Instructions for Windows 95 and later versions can
-be found here:
-
-http://www.netmanage.com/000/20021101_005_tcm21-6336.pdf
+Windows that you are using.
Under Windows NT / 2000 / XP the PERLTIDY environment variable can be placed in
either the user section or the system section. The later makes the
# This implements a highly spaced style
-se # errors to standard error output
-w # show all warnings
- -bl # braces on new lines
+ -bl # braces on new lines
-pt=0 # parens not tight at all
-bt=0 # braces not tight
-sbt=0 # square brackets not tight
parameters given on the command line will have priority over them.
To avoid confusion, perltidy ignores any command in the .perltidyrc
-file which would cause some kind of dump and an exit. These are:
+file which would cause some kind of dump and an exit. These include:
-h -v -ddf -dln -dop -dsn -dtt -dwls -dwrs -ss
=item *
The parameters in the F<.perltidyrc> file can be switched off with
-the B<-npro> option.
+the B<-npro> option on the command line.
+
+=item *
+
+Any parameter in the F<.perltidyrc> file can be overridden with a replacement
+value on the command line. This is because the command line is processed
+after the F<.perltidyrc> file.
=item *
=back
+=item B<Skipping a line with an unknown parameter>
+
+Versions of perltidy greater than 20240511 have an option to filter
+unrecognized parameters from a perltidy command file. If a line in the file
+begins with B<three dashes> followed by a parameter name (rather than one or
+two), then the line will be removed if the parameter is unknown. Otherwise, a
+dash will be removed to make the line valid. This option was added to allow a
+single command file to be used during the transition to a new version of
+perltidy.
+
=item B<Creating a new abbreviation>
A special notation is available for use in a F<.perltidyrc> file
used. The notation is to group the options within curly braces which
are preceded by the name of the alias (without leading dashes), like this:
- newword {
- -opt1
- -opt2
- }
+ newword {
+ -opt1
+ -opt2
+ }
where B<newword> is the abbreviation, and B<opt1>, etc, are existing parameters
I<or other abbreviations>. The main syntax requirement is that the new
For a specific example, the following line
- oneliner { --maximum-line-length=0 --noadd-newlines --noadd-terminal-newline}
+ oneliner { --maximum-line-length=0 --noadd-newlines --noadd-terminal-newline}
or equivalently with abbreviations
- oneliner { -l=0 -nanl -natnl }
+ oneliner { -l=0 -nanl -natnl }
could be placed in a F<.perltidyrc> file to temporarily override the maximum
line length with a large value, to temporarily prevent new line breaks from
file. All other settings in the F<.perltidyrc> file still apply. Thus it
provides a way to format a long 'one liner' when perltidy is invoked with
- perltidy --oneliner ...
+ perltidy --oneliner ...
(Either C<-oneliner> or C<--oneliner> may be used).
the actual first hash-bang is in a here-doc. In that case a parsing error will
occur because the tokenization will begin in the middle of the here-doc.
-=item B<Making a file unreadable>
-
-The goal of perltidy is to improve the readability of files, but there
-are two commands which have the opposite effect, B<--mangle> and
-B<--extrude>. They are actually
-merely aliases for combinations of other parameters. Both of these
-strip all possible whitespace, but leave comments and pod documents,
-so that they are essentially reversible. The
-difference between these is that B<--mangle> puts the fewest possible
-line breaks in a script while B<--extrude> puts the maximum possible.
-Note that these options do not provided any meaningful obfuscation, because
-perltidy can be used to reformat the files. They were originally
-developed to help test the tokenization logic of perltidy, but they
-have other uses.
-One use for B<--mangle> is the following:
-
- perltidy --mangle myfile.pl -st | perltidy -o myfile.pl.new
-
-This will form the maximum possible number of one-line blocks (see next
-section), and can sometimes help clean up a badly formatted script.
+=back
-A similar technique can be used with B<--extrude> instead of B<--mangle>
-to make the minimum number of one-line blocks.
+=head2 Debugging perltidy input
-Another use for B<--mangle> is to combine it with B<-dac> to reduce
-the file size of a perl script.
+=over 4
-=item B<Debugging>
+=item The B<--dump-...> parameters
-The following flags are available for debugging:
+The following flags are available for debugging. Note that all commands
+named B<--dump-...> will simply write some requested information to standard
+output and then immediately exit.
B<--dump-cuddled-block-list> or B<-dcbl> will dump to standard output the
-internal hash of cuddled block types created by a B<-cuddled-block-list> input
+internal hash of cuddled block types created by a B<--cuddled-block-list> input
string.
B<--dump-defaults> or B<-ddf> will write the default option set to standard output and quit
+B<--dump-integer-option-range> or B<-dior> will write a list of comma-separated values. Each line contains the name of an integer option along with its minimum, maximum, and default values.
+
B<--dump-profile> or B<-dpro> will write the name of the current
configuration file and its contents to standard output and quit.
B<--dump-token-types> or B<-dtt> will write a list of all token types
to standard output and quit.
-B<--dump-want-left-space> or B<-dwls> will write the hash %want_left_space
+B<--dump-want-left-space> or B<-dwls> will write the hash C<%want_left_space>
to standard output and quit. See the section on controlling whitespace
around tokens.
-B<--dump-want-right-space> or B<-dwrs> will write the hash %want_right_space
+B<--dump-want-right-space> or B<-dwrs> will write the hash C<%want_right_space>
to standard output and quit. See the section on controlling whitespace
around tokens.
+See L<"Analyzing Code"> for additional B<--dump-> parameters.
+
+=item B<Other parameters related to processing>
+
B<--no-memoize> or B<-nmem> will turn of memoizing.
Memoization can reduce run time when running perltidy repeatedly in a
single process. It is on by default but can be deactivated for
to allow timestamps (B<--timestamp> or B<-ts>).
B<--file-size-order> or B<-fso> will cause files to be processed in order of
-increasing size, when multiple files are being processed. This is useful
-during program development, when large numbers of files with varying sizes are
-processed, because it can reduce virtual memory usage.
+increasing size, when multiple files are being processed. This is particularly
+useful during program development, when large numbers of files with varying
+sizes are processed, because it can reduce virtual memory usage. This is
+the default and can be deactivated with B<-nfso>.
B<--maximum-file-size-mb=n> or B<-maxfs=n> specifies the maximum file size in
megabytes that perltidy will attempt to format. This parameter is provided to
perltidy -maxfs=20
-This only applies to files specified by filename on the command line.
+This length test is applied to named files before they are read into memory.
+It is applied to files arriving from standard input after they are read into
+memory. It is not applied to character strings arriving by a call to the
+Perl::Tidy module.
+
+=item B<Controls for when to stop processing>
B<--maximum-level-errors=n> or B<-maxle=n> specifies the maximum number of
indentation level errors are allowed before perltidy skips formatting and just
A recommended value is B<n=3>. However, the default is B<n=0> (skip this check)
to avoid causing problems with scripts which have extended syntaxes.
-B<-DEBUG> will write a file with extension F<.DEBUG> for each input file
-showing the tokenization of all lines of code.
+=item B<Handling errors in options which take integer values>
+
+Many of the input parameters take integer values. Before processing
+begins, a check is made to see if any of these integer parameters exceed
+their valid ranges. The default behavior when a range is exceeded is to
+write a warning message and reset the value to its default setting. This
+default behavior can be changed with the parameter
+B<--integer-range-check=n>, or B<-irc=n>, as follows:
+
+ n=0 skip check completely (for stress-testing perltidy only)
+ n=1 reset bad values to defaults but do not issue a warning
+ n=2 reset bad values to defaults and issue warning [DEFAULT]
+ n=3 stop if any values are out of bounds
+
+The values B<n=0> and B<n=1> are mainly useful for testing purposes.
-=item B<Making a table of information on code blocks>
+=item B<Debugging perltidy tokenization>
+
+B<-DEBUG, -D> will write a file with extension F<.DEBUG> for each input file
+showing the tokenization of all lines of code. This can produce a lot of
+output and is mainly useful for debugging tokenization issues during
+perltidy development.
+
+=back
+
+=head2 Analyzing Code
+
+Perltidy reports any obvious issues that are found during formatting, such as
+unbalanced braces. But several parameters are available for making certain
+additional checks for issues which might be of interest to a programmer. These
+parameters fall into two categories as indicated by their prefix, B<--dump-> or
+B<--warn->:
+
+=over 4
+
+=item *
+
+The B<--dump-> parameters read a file, write information to the standard output,
+and then exit without doing any formatting.
+
+=item *
+
+The B<--warn-> parameters, on the other hand, cause perltidy to function
+normally but issue warnings to the error output when certain conditions are
+encountered.
+
+=back
+
+Some of these have associated control parameters.
+
+=over 4
+
+=item B<Use --dump-block-summary to make a table of information on code blocks>
A table listing information about the blocks of code in a file can be made with
B<--dump-block-summary>, or B<-dbs>. This causes perltidy to read and parse
produces an output file F<blocks.csv> whose lines hold these
parameters:
- filename - the name of the file
- line - the line number of the opening brace of this block
- line_count - the number of lines between opening and closing braces
- code_lines - the number of lines excluding blanks, comments, and pod
- type - the block type (sub, for, foreach, ...)
- name - the block name if applicable (sub name, label, asub name)
- depth - the nesting depth of the opening block brace
- max_change - the change in depth to the most deeply nested code block
- block_count - the total number of code blocks nested in this block
- mccabe_count - the McCabe complexity measure of this code block
+ filename - name of the file
+ line - line number of the opening brace of this block
+ line_count - number of lines between opening and closing braces
+ code_lines - number of lines excluding blanks, comments, and pod
+ type - block type (sub, for, foreach, ...)
+ name - block name if applicable (sub or asub name, label..)
+ depth - nesting depth of the opening block brace
+ max_change - change in depth to the most deeply nested code block
+ block_count - total number of code blocks nested in this block
+ mccabe_count - McCabe complexity measure of this code block
This feature was developed to help identify complex sections of code as an aid
in refactoring. The McCabe complexity measure follows the definition used by
otherwise indicate which blocks are nested in other blocks, this can be
determined by computing and comparing the block ending line numbers.
+For subroutines, the number of call arguments (args) is listed in parentheses
+in the C<type> column. For example, C<sub(9)> indicates a sub with 9 args.
+Subroutines whose arg count cannot easily be determined are indicated
+as C<sub(*)>. The count includes any leading object passed in a method
+call (such as B<$self>).
+
By default the table lists subroutines with more than 20 C<code_lines>, but
this can be changed with the following two parameters:
B<--dump-block-minimum-lines=n>, or B<-dbl=n>, where B<n> is the minimum
number of C<code_lines> to be included. The default is B<-n=20>. Note that
-C<code_lines> is the number of lines excluding and comments, blanks and pod.
+C<code_lines> is the number of lines excluding comments, blanks and pod.
B<--dump-block-types=s>, or B<-dbt=s>, where string B<s> is a list of block
types to be included. The type of a block is either the name of the perl
a few symbols for special block types, as follows:
if elsif else for foreach ... any keyword introducing a block
- sub - any sub or anynomous sub
+ sub - any sub or anonymous sub
asub - any anonymous sub
* - any block except nameless blocks
+ - any nested inner block loop
package - any package or class
closure - any nameless block
+ elsif3 - an if-elsif- chain with 3 or more elsif's (see below)
+
+A chain of B<if-elsif-...> blocks may be reported as a single line item by entering the word B<elsif> with an appended integer, as indicated by the last item in
+this list. The integer indicates the number of B<elsif> blocks required for
+a chain to be reported. If you use this, you may want to also use B<-dbl=n>,
+with a smaller number of lines B<n> than the default.
In addition, specific block loop types which are nested in other loops can be
selected by adding a B<+> after the block name. (Nested loops are sometimes
=over 4
=item *
+
This selects both C<subs> and C<packages> which have 20 or more lines of code.
This can be useful in code which contains multiple packages.
perltidy -dbs -dbt='sub package' somefile.pl >blocks.csv
=item *
+
This selects block types C<sub for foreach while> with 10 or more code lines.
perltidy -dbs -dbl=10 -dbt='sub for foreach while' somefile.pl >blocks.csv
=item *
+
This selects blocks with 2 or more code lines which are type C<sub> or which
are inner loops.
perltidy -dbs -dbl=2 -dbt='sub +' somefile.pl >blocks.csv
=item *
+
This selects every block and package.
perltidy -dbs -dbl=1 -dbt='* closure' somefile.pl >blocks.csv
+=item *
+
+This selects every if-chain which contains 2 or more C<elsif> blocks:
+
+ perltidy -dbs -dbl=1 -dbt='elsif2' somefile.pl >blocks.csv
+
+=item *
+
+This selects every C<while> block with 6 or more code lines
+
+ perltidy -dbs -dbt=while -dbl=6 somfile.pl >while.csv
+
+=back
+
+=item B<Use --dump-unusual-variables to find unused, reused, and certain other variables of interest>
+
+Variables with certain properties of interest to a programmer can be
+listed with B<--dump-unusual-variables> or B<-duv>. This parameter must be on
+the command line, along with a single file name. It causes perltidy to scan
+the file for certain variable types, write any found to the standard output,
+and then immediately exit without doing any formatting. For example
+
+ perltidy -duv somefile.pl >vars.txt
+
+produces a file with lines which look something like
+
+ 1778:u: my $input_file
+ 6089:r: my $j: reused - see line 6076
+
+The values on the line are separated by colons and have the following
+meaning:
+
+ line number - the number of the line of the input file
+ issue - a single letter indicating the issue, see below
+ variable name - the name of the variable, preceded by a keyword
+ note - an optional note referring to another line
+
+If there are a large number of issues it can be convenient to read the file
+into a spreadsheet.
+
+The checks are made for variables introduced by the keywords B<my>, B<state>,
+and B<our>, along with variables defined with B<use vars> and B<use constant>.
+It is a good idea to also set B<use strict> in a script so that Perl itself
+can find issues with variables which appear in a script without one of these
+methods.
+
+The types of checks which are made are identified in the output with one of the
+letters B<r>, B<s>, B<p>, B<u>, and B<c> as follows:
+
+=over 4
+
+=item B<r: reused variable name>
+
+These are variables which are re-declared in the scope of a variable with the
+identical name. This can be confusing, perhaps not when the code is first
+written, but possibly later during maintenance work. For example, this
+can make it difficult to locate the correct variable with an editor when
+changes are being made. This issue can be avoided by renaming one of the
+conflicting variables. Note that this is
+similar to the B<Perl::Critic> policy B<Variables::ProhibitReusedNames>.
+
+=item B<s: sigil change but reused bareword>
+
+These are variables which have the same bareword name but a different sigil
+(B<$>, B<@>, or B<%>) as another variable in the same scope. For example, this
+occurs if variables B<$data> and B<%data> share the same scope. This can also
+be confusing for the reasons mentioned above and can be avoided by renaming
+one of the variables.
+
+=item B<p: package-crossing variables>
+
+These are lexical variables which are declared in one package and still visible
+in subroutines of a different package in the same file. This can be confusing,
+and it might cause the program to run differently, or fail, if the the packages
+were ever split into separate files. This issue can usually be avoided by
+placing code in block braces of some type. For example, this issue is often
+found in test code and can sometimes be fixed by using the structure
+
+ main();
+
+ sub main { #<<<
+ ## old main code goes here
+ }
+
+The B<non-indenting-braces> side comment C<#<<<> is not required but will keep
+the indentation of the old code unchanged.
+
+This check is only applied to package statements which are not enclosed in
+block braces in order avoid warnings at temporary package changes.
+
+=item B<u: unused variables>
+
+These are lexical variables declared with C<my> or C<state> (but not C<our>)
+and not referenced again within their scope. Calling them B<unused> is
+convenient but not really accurate; this is a "gray area" for a program. There
+are some good reasons for having such variables. For example, they might occur
+in a list of values provided by another routine or data structure, and
+therefore must be listed, even though they might not be referenced again.
+Having such variables can make them immediately available for future
+development and debugging, and can be beneficial for program clarity.
+
+B<But> sometimes they can occur due to being orphaned by a coding
+change, due to a misspelling, or by having an unintentional preceding
+C<my>. So it is worth reviewing them, especially for new code. Here is
+an example of an unused variable in a script located with this method:
+
+ BEGIN { my $string = "" }
+ ...
+ $string .= "ok";
+
+This looks nice at first glance, but the scope of the C<my> declaration is
+limited to the surrounding braces, so it is not the same variable as the other
+C<$string> and must therefore be reported as unused. This particular problem
+would have also been caught by perl if the author had used C<strict>.
+
+=item B<c: unused constants>
+
+These are names which are declared with a C<use constant> and a reference was
+not seen again within their package. They might be needed by an external
+package, or a set of standard definitions, or available for future development.
+And in some unusual cases a reference may have been missed by perltidy. But
+they might also be unused remnants from code development, or due to a
+misspelling, so it can be worthwhile reviewing them.
+
+=back
+
+B<Exception>: The following B<our> variables are exempt from warnings:
+B<$VERSION>, B<@EXPORT>, B<@EXPORT_OK>, B<%EXPORT_TAGS>, B<@ISA, $AUTOLOAD>.
+
+=item B<Use --warn-variable-types to warn about certain variable types>
+
+The flag B<--warn-variable-types=string>, or B<-wvt=string>, is the B<--warn>
+counterpart to B<--dump-unusual-variables>, and can be used to
+produce a warning message if certain of the above variable types are
+encountered during formatting. All possible variable warnings may be
+requested with B<-wvt='*'> or B<-wvt=1>.
+
+For example,
+
+ perltidy -wvt='*' somefile.pl
+
+The default is not to do any of these checks, and it can also be indicated with
+B<-wvt=0>.
+
+To restrict the check to a specific set warnings, set the input B<string> to be
+a space-separated or comma-separated list of the letters associated with the
+types of variables to be checked. For example:
+
+ perltidy -wvt='s r' somefile.pl
+
+will process F<somefile.pl> normally but issue a warning if either of
+the issues B<s> or B<r>, described above, are encountered.
+
+A companion flag, B<--warn-variable-exclusion-list=string>, or B<-wvxl=string>,
+can be used to skip warning checks for a list of variable names. A leading
+and/or trailing '*' may be placed on any of these variable names to allow a
+partial match.
+
+For example,
+
+ perltidy -wvt=1 -wvxl='$self $class *_unused' somefile.pl
+
+will do all possible checks but not report any warnings for variables C<$self>,
+C<$class>, and for example C<$value_unused>.
+
+This partial match option provides a way to trigger a warning message when a
+new unused variable is detected in a script. This can be accomplished by
+adding a unique suffix to the names of existing unused variables, such as
+C<_unused>. This suffix is then added to the exclusion list.
+
+As a specific example, consider the following line which is part of some debug
+code which only references the latter three variables (but might someday need
+to reference the package variable too).
+
+ my ( $package_uu, $filename, $line, $subroutine ) = caller();
+
+The unused variable, C<$package_uu>, has been specially marked with suffix
+C<_uu>. No type B<u> (unused variable) warning will be produced provided that
+this wildcard suffix is in the exclusion list:
+
+ -wvxl='*_uu'
+
+=item B<Use --dump-unique-keys> to help locate misspelled hash keys
+
+The parameter B<--dump-unique-keys>, or B<-duk>, dumps a list of hash keys
+which appear to be used just once, and do not appear among the quoted
+strings in a file. For example:
+
+ perltidy -duk File.pm >output.txt
+
+The lines in the output file list each unique key and its line number.
+Typically, most of the listed keys listed will be perfectly valid keys needed,
+for example, for communication with other modules or for future development.
+But the list might also include something unexpected, such as a misspelled key.
+
+A program C<dump_unique_keys.pl> at
+L<https://github.com/perltidy/perltidy/tree/master/examples> can run perltidy
+with B<-duk> on multiple files, and then remove any common keys from the list.
+
+=item B<Use --dump-mixed-call-parens to find functions called both with and without parens>
+
+The parameter B<--dump-mixed-call-parens>, or B<-dmcp>, provides information on
+the use of call parens within a program. It produces a list of keywords and sub
+names which occur both both with and without parens. In other words, with
+a mixed style. This might be useful if one is working to standardize the
+call style for some particular keyword or function. For example,
+
+ perltidy -dmcp somefile.pl >output.txt
+
+will analyze the text of F<somefile.pl>, write the results to F<output.txt>,
+and then immediately exit (like all B<dump-> parameters).
+
+The output shows a list of operators and the number of times they were
+used with parens and the number of times without parens. For example, here
+is a small section of the output from one file in a past Perl distribution:
+
+ k:length:17:9
+ k:open:30:9
+ k:pop:3:4
+
+The first line shows that the C<length> function occurs 17 times with parens
+and 9 times without parens. The 'k' indicates that C<length> is a Perl builtin
+keyword ('U' would mean user-defined sub, and 'w' would mean unknown bareword).
+So from this partial output we see that the author had a preference for parens
+around the args of C<length> and C<open>, whereas C<pop> was about equally
+likely to have parens as not.
+
+More detailed information can be obtained with the parameters described in the
+next section.
+
+=item B<Use --want-call-parens=s and --nowant-call-parens=s to warn about specific missing or extra call parens>
+
+The parameter B<--want-call-parens=s>, or B<-wcp=s>, can be used to to produce
+a warning message if call parens are missing from selected functions.
+Likewise, B<--nowant-call-parens=s>, or B<-nwcp=s>, can warn if call parens
+exist for selected functions. When either of these parameters are set,
+perltidy will report any discrepancies from the requested style in its error
+output.
+
+Before using either of these parameters, it may be helpful to first use
+B<--dump-mixed-call-parens=s>, described in the previous section, to get an
+overview of the existing paren usage in a file.
+
+The string arguments B<s> are space-separated lists of the names of the
+functions to be checked. The function names may be builtin keywords or
+user-defined subs. They may not include a package prefix or sigil.
+To illustrate,
+
+ perltidy -wcp='length open' -nwcp='pop' somefile.pl
+
+means that the builtin functions C<length> and C<open> should have parens
+around their call args but C<pop> should not. The error output might contain
+lines such as:
+
+ 2314:open FD_TO_CLOSE: no call parens
+ 3652:pop (: has call parens
+ 3783:length $DB: no call parens
+ ...
+
+For builtin keywords which have both a block form and a trailing modifier form,
+such as C<if>, only the trailing modifier form will be checked since
+parens are mandatory for the block form.
+
+The symbol B<&> may entered instead of a function name to mean all user-defined
+subs not explicitly listed. So the compact expression
+
+ perltidy -wcp='&' somefile.pl
+
+means that calls to all user-defined subs in the file being processed
+should have their call arguments enclosed in parens.
+
+Perltidy does not have the ability to add or delete call parens because it is
+difficult to automate, so changes must be made manually. When adding or
+removing parentheses, it is essential to pay attention to operator precedence
+issues. For example, if the parens in the following statement are
+removed, then C<||> must be changed to C<or>:
+
+ open( IN, "<", $infile ) || die("cannot open $infile:$!\n");
+
+Otherwise, the C<||> will operate on C<$infile> rather than the return value of
+C<open>.
+
+=item B<Use --dump-mismatched-args to find function calls with args not matching sub declarations>
+
+The parameter B<--dump-mismatched-args>, or B<-dma>, causes perltidy to
+examine the definitions of subroutines in a file, and calls to those subs,
+and report certain differences. Like all B<--dump> commands, it
+writes its report to standard output and exits immediately. For example
+
+ perltidy -dma somefile.pl >results.txt
+
+Four types of issues are reported, types B<a>, B<o>, B<u>, and B<i>:
+
+=over 4
+
+=item B<a:> calls made to a sub both with and without the B<arrow> operator
+
+For example the following two lines would be reported as a mismatch:
+
+ Fault();
+
+and
+
+ $self->Fault();
+
+This may or may not be an error, but it is worth checking. It might become an
+error in the future if sub C<Fault> starts to access C<$self>.
+
+=item B<o:> (B<overcount>): the number of call args exceeds the expected number.
+
+=item B<u:> (B<undercount>): the number of call args is less than the expected number.
+
+For example
+
+ sub gnab_gib {
+ my $self=shift;
+ my ($v1,$v2)=@_;
+ ...
+ }
+
+ $self->gnab_gib(42);
+
+In this case, the sub is expecting a total of three args (C<$self>, C<$v1>, and
+C<$v2>) but only receives two (C<$self> and C<42>), so an undercount is
+reported. This is not necessarily an error because the sub may allow for this
+possibility, but it is worth checking.
+
+Although it is not possible to automatically determine which sub args are
+optional, if optional sub args are enclosed in an extra set of parentheses,
+perltidy will take this a signal that they are optional and not issue a
+warning. So if the above example is written as
+
+ sub gnab_gib {
+ my $self = shift;
+ my ( $v1, ($v2) ) = @_; # <-- $v2 is optional
+ ...;
+ }
+
+then perltidy will consider that the second arg is optional and not issue
+a warning for:
+
+ $self->gnab_gib(42);
+
+For multiple default call args, place one set of parens around them all. Some
+examples:
+
+ my ( ($v1) ) = @_; # <-- $v1 is optional
+ my ( $v1, ( $v2, $v3 ) ) = @_; # <-- $v2, $v3 are optional
+
+=item B<i:> B<indeterminate:> a specific number of expected args for a sub could not be determined, but it is called with a specific number. This issue is reported for the B<--dump-> option but not the B<--warn-> option.
+
=back
+B<Notes and Limitations:>
-=item B<Working with MakeMaker, AutoLoader and SelfLoader>
+=over 4
+
+=item *
+
+This option works best for subs which unpack call args in an orderly manner
+near the beginning of the sub from C<@_> and/or with C<shift> operations. It
+will also work for direct access to specific elements of the @_ array. However
+if the coding for arg extraction is complex then the number of sub args will be
+considered indeterminate and a count cannot be checked.
+
+=item *
+
+Sub calls made without parentheses around the args are not checked.
+
+=item *
+
+Anonymous subs and lexical subs (introduced with C<my>) are not checked.
+
+=item *
+
+Only calls which appear to be to subs defined within the file being
+processed are checked. But note that a file may contain multiple packages.
+
+=back
+
+=item B<Use --warn-mismatched-args to produce a warning for function calls with
+args not matching sub declarations>.
+
+This is similar to the B<-dump> parameter described above except that any
+mismatches are reported in the error file and otherwise formatting continues
+normally. The basic usage is
+
+ perltidy -wma somefile.pl
+
+Several companion controls are available to avoid unwanted error messages:
+
+=over 4
+
+=item *
+
+B<--warn-mismatched-arg-types=s>, or B<-wmat=s>, can be used to
+select specific tests, type B<a> (arrow test) or B<o> (overcounts) or B<u> (undercounts). All checks may be requested with B<-wmat='*'> or B<-wmat=1>. This is the default.
+
+To restrict the check to a specific warning type, set the string equal to the letter of that warning, any B<a>, B<o>, or B<u>. For example
+
+ perltidy -wmat='a o' somefile.pl
+
+will format F<somefile.pl> and report any arrow-type mismatches and overcount mismatches, but will skip undercount mismatches.
+
+=item *
+
+B<--warn-mismatched-arg-exclusion-list>, or B<-wmaxl=string>, can be given to
+skip the warning checks for a list of subroutine names, entered as a quoted
+string of space- or comma-separated names, without a package prefix. All subs
+with those names will be skipped, regardless of package. A leading and/or
+trailing B<*> on a name may be used to indicate a partial string match.
+
+=item *
+
+B<--warn-mismatched-arg-undercount-cutoff=n>, or B<-wmauc=n>, can be used to
+avoid B<undercount> warnings when the expected number of args is less than B<n>.
+Please note that this number B<n> is the number of args from the point of
+view of the sub definition, so an object like C<$self> passed with an arrow
+operator counts as one arg.
+
+The default value is B<n=4>. This has been found to allow most programs to
+pass without warnings, but it should be reduced if possible for better error
+checking. The minimum possible value of B<n> needed to avoid triggering an
+error for a program can be determined by running with B<-wma -wmauc=0>. If
+there are undercount errors, a note at the bottom of the error output
+indicates the value of B<n> required to avoid reporting them.
+
+As noted above for the parameter B<--dump-mismatched-args>, if optional call
+args are enclosed in separate parentheses, then perltidy will recognize them as
+optional args and avoid needless warnings. If this method is used,
+then B<-wmauc=0> should be used for maximal checking.
+
+=item *
+
+B<--warn-mismatched-arg-overcount-cutoff=n>, or B<-wmaoc=n>, can be used to
+avoid B<overcount> warnings when the expected number of args is less than B<n>.
+The default value is B<n=1>. This avoids warning messages for subroutines
+which are dummy placeholders for possible expansion.
+
+=back
+
+To illustrate these controls,
+
+ perltidy -wma -wmat='o u' -wmaxl='new old' -wmauc=2 somefile.pl
+
+means format F<somefile.pl> as usual and check for mismatched overcounts and
+undercounts but not arrows. Skip checking for any sub named C<new> or C<old>,
+and only warn of undercounts for subs expecting more than 2 args.
+
+=item B<Use --dump-mismatched-returns> to find function calls where the number of requested values may disagree with sub return statements
+
+The parameter B<--dump-mismatched-returns>, or B<-dmr>, examines the return
+side of sub call statements. Like all B<--dump> commands, it writes its report
+to standard output and exits immediately. For example
+
+ perltidy -dmr somefile.pl >results.txt
+
+The following types of issues are reported:
+
+=over 4
+
+=item B<x:> calls requesting an array from a sub with no return statements.
+
+=item B<y:> calls requesting a scalar from a sub with no return statements.
+
+=item B<o:> (B<overwant>): calls requesting an array with a count which exceeds the maximum number returned by the sub.
+
+=item B<u:> (B<underwant>): calls requesting an array with a count which is below the maximum and which does not match a number returned by the sub.
+
+=item B<s:> calls requesting a scalar from a sub which only returns two or more items.
+
+=back
+
+These issue types are illustrated with the following code
+
+ sub macho {
+ ...
+ ( $name, $flags ); # 2 values but no 'return' statement
+ }
+
+ ( $name, $flags ) = macho(); # 'x' (want array, but no return)
+ $name = macho(); # 'y' (want scalar but no return)
+
+ sub wimp {
+ ...;
+ return ( $name, $flags ); # 2 values with 'return' statement
+ }
+
+ ( $name, $flags, $access) = wimp(); # 'o' (want array 3 > 2)
+ ($name) = wimp(); # 'u' (want array 1 < 2)
+ $name = wimp(); # 's' (want scalar but 2 values returned)
+
+This analysis works by scanning all call statements and all sub return
+statements, and comparing the the number of items wanted with the possible
+number of items returned. If a specific value for either of these numbers
+cannot be determined for a call then it cannot be checked.
+
+Since only return
+statements are scanned for return values, this analysis will not be useful for
+programming which relies on the default return mechanism, as in
+the first sub above.
+Note that the B<Perl::Critic> policy B<RequireFinalReturn> can be used to check for code in this situation.
+
+Reported issues are
+not necessarily errors, but they might be, or they might indicate potentially
+confusing code.
+
+=item B<Use --warn-mismatched-returns> to issue warnings when the number of requested values may disagree with sub return statements
+
+This is similar to the B<-dump> parameter described above except that any
+mismatches are reported in the error file and otherwise formatting continues
+normally. The basic usage is
+
+ perltidy -wmr somefile.pl
+
+The following companion controls are available to avoid unwanted error messages:
+
+=over 4
+
+=item *
+
+B<--warn-mismatched-return-types=string>, or B<-wmrt=string>, can be used to limit checks.
+
+To restrict the checking, set the string equal to the letter(s) of that warning,
+any B<x>, B<y>, B<o>, B<u>, or B<s>. For example
+
+ perltidy -wmrt='x o s' somefile.pl
+
+will format F<somefile.pl> and report issue types B<x>, B<o>, and B<s> but not
+types B<u> and B<y>. All checks may be requested with B<-wmrt='*'> or
+B<-wmrt=1>. This is the default.
+
+=item *
+
+B<--warn-mismatched-return-exclusion-list>, or B<-wmrxl=string>, can be given to
+skip the warning checks for a list of subroutine names, entered as a quoted
+string of space- or comma-separated names, without a package prefix. All subs
+with those names will be skipped, regardless of package. A leading and/or
+trailing B<*> on a name may be used to indicate a partial string match.
+
+=back
+
+=back
+
+=head2 B<Working with MakeMaker, AutoLoader and SelfLoader>
The first $VERSION line of a file which might be eval'd by MakeMaker
is passed through unchanged except for indentation.
+The default B<--pass-version-line>, or B<-pvl>, is to do this.
Use B<--nopass-version-line>, or B<-npvl>, to deactivate this feature.
If the AutoLoader module is used, perltidy will continue formatting
code after seeing an __END__ line.
+The default B<--look-for-autoloader>, or B<-lal>, is to do this.
Use B<--nolook-for-autoloader>, or B<-nlal>, to deactivate this feature.
Likewise, if the SelfLoader module is used, perltidy will continue formatting
code after seeing a __DATA__ line.
+The default B<--look-for-selfloader>, or B<-lsl>, is to do this.
Use B<--nolook-for-selfloader>, or B<-nlsl>, to deactivate this feature.
-=item B<Working around problems with older version of Perl>
-
-Perltidy contains a number of rules which help avoid known subtleties
-and problems with older versions of perl, and these rules always
-take priority over whatever formatting flags have been set. For example,
-perltidy will usually avoid starting a new line with a bareword, because
-this might cause problems if C<use strict> is active.
-
-There is no way to override these rules.
-
-=back
-
=head1 HTML OPTIONS
=over 4
The flag B<-html> causes perltidy to write an html file with extension
F<.html>. So, for example, the following command
- perltidy -html somefile.pl
+ perltidy -html somefile.pl
will produce a syntax-colored html file named F<somefile.pl.html>
which may be viewed with a browser.
=item The B<-pre> flag for code snippets
When the B<-pre> flag is given, only the pre-formatted section, within
-the <PRE> and </PRE> tags, will be output. This simplifies inclusion
+the C<<PRE>> and C<</PRE>> tags, will be output. This simplifies inclusion
of the output in other files. The default is to output a complete
web page.
identifier identifier i
bareword, function bareword w
keyword keyword k
- quite, pattern quote q
+ quote, pattern quote q
here doc text here-doc-text h
here doc target here-doc-target hh
punctuation punctuation pu
To illustrate, the following command will produce an html
file F<somefile.pl.html> with "aqua" keywords:
- perltidy -html -hck=00ffff somefile.pl
+ perltidy -html -hck=00ffff somefile.pl
and this should be equivalent for most browsers:
- perltidy -html -hck=aqua somefile.pl
+ perltidy -html -hck=aqua somefile.pl
Perltidy merely writes any non-hex names that it sees in the html file.
The following 16 color names are defined in the HTML 3.2 standard:
- black => 000000,
- silver => c0c0c0,
- gray => 808080,
- white => ffffff,
- maroon => 800000,
- red => ff0000,
- purple => 800080,
- fuchsia => ff00ff,
- green => 008000,
- lime => 00ff00,
- olive => 808000,
- yellow => ffff00
- navy => 000080,
- blue => 0000ff,
- teal => 008080,
- aqua => 00ffff,
+ black => 000000,
+ silver => c0c0c0,
+ gray => 808080,
+ white => ffffff,
+ maroon => 800000,
+ red => ff0000,
+ purple => 800080,
+ fuchsia => ff00ff,
+ green => 008000,
+ lime => 00ff00,
+ olive => 808000,
+ yellow => ffff00
+ navy => 000080,
+ blue => 0000ff,
+ teal => 008080,
+ aqua => 00ffff,
Many more names are supported in specific browsers, but it is safest
to use the hex codes for other colors. Helpful color tables can be
For example, to use bold braces and lime color, non-bold, italics keywords the
following command would be used:
- perltidy -html -hbs -hck=00FF00 -nhbk -hik somefile.pl
+ perltidy -html -hbs -hck=00FF00 -nhbk -hik somefile.pl
The background color can be specified with B<--html-color-background=n>,
or B<-hcbg=n> for short, where n is a 6 character hex RGB value. The
The following list shows all short parameter names which allow a prefix
'n' to produce the negated form:
- D anl asbl asc ast asu atc atnl aws b
- baa baao bar bbao bbb bbc bbs bl bli boa
- boc bok bol bom bos bot cblx ce conv cpb
- cs csc cscb cscw dac dbc dbs dcbl dcsc ddf
- dln dnl dop dp dpro drc dsc dsm dsn dtc
- dtt dwic dwls dwrs dws eos f fll fpva frm
- fs fso gcs hbc hbcm hbco hbh hbhh hbi hbj
- hbk hbm hbn hbp hbpd hbpu hbq hbs hbsc hbv
- hbw hent hic hicm hico hih hihh hii hij hik
- him hin hip hipd hipu hiq his hisc hiv hiw
- hsc html ibc icb icp iob isbc iscl kgb kgbd
- kgbi kis lal log lop lp lsl mem nib ohbr
- okw ola olc oll olq opr opt osbc osbr otr
- ple pod pvl q sac sbc sbl scbb schb scp
- scsb sct se sfp sfs skp sob sobb sohb sop
- sosb sot ssc st sts t tac tbc toc tp
- tqw trp ts tsc tso vbc vc vmll vsc w
- wfc wn x xci xlp xs
+ D aia altc ame anl asbl asc ast asu atc
+ atnl aws b baa baao bar bbao bbb bbc bbs
+ bl bli boa boc bok bol bom bos bot cblx
+ ce conv cpb cs csc cscb cscw dac dbc dbs
+ dcbl dcsc ddf dia dior dln dltc dma dmcp dmr
+ dnl dop dp dpro drc dsc dsm dsn dtc dtco
+ dtt duk duv dwic dwls dwrs dws eos f fpva
+ frm fs fso gcs hbc hbcm hbco hbh hbhh hbi
+ hbj hbk hbm hbn hbp hbpd hbpu hbq hbs hbsc
+ hbv hbw hent hic hicm hico hih hihh hii hij
+ hik him hin hip hipd hipu hiq his hisc hiv
+ hiw hsc html ibc icb icp ils iob ipc isbc
+ iscl kgb kgbd kgbi kis lal log lop lp lsl
+ mci mem nib ohbr okw ola olc oll olq opr
+ opt osbc osbr otr ple pod pvl q qwaf sac
+ sbc sbl scbb schb scp scsb sct se sfp sfs
+ skp sob sobb sohb sop sosb sot ssc st sts
+ t tac tbc toc tp tqw trp ts tsc tso
+ vbc vc viu vmll vsc vsn vwe w wfc wia
+ wma wme wmr wn x xbt xci xlp xs
Equivalently, the prefix 'no' or 'no-' on the corresponding long names may be
used.
=head1 VERSION
-This man page documents perltidy version 20230309
+This man page documents perltidy version 20250105
=head1 BUG REPORTS
=head1 COPYRIGHT
-Copyright (c) 2000-2022 by Steve Hancock
+Copyright (c) 2000-2025 by Steve Hancock
=head1 LICENSE
<h1>Perltidy Change Log</h1>
+<h2>2025 01 05</h2>
+
+<pre><code>- If a file consists only of comments, then the starting indentation will
+ be guessed from the indentation of the first comment. Previously it would
+ be guessed to be zero. Parameter --starting-indentation-level=n can be
+ used to specify an indentation and avoid a guess. This issue can
+ arise when formatting a block of comments from within an editor.
+
+- Added missing 'use File::Temp' for -html option. This was causing the
+ message: "Undefined subroutine &File::Temp::tempfile called at ..."
+ See git #176.
+
+- A new parameter --dump-unique-keys, or -duk, dumps a list of hash keys
+ which appear to be used just once, and do not appear among the quoted
+ strings in a file. For example:
+
+ perltidy -duk File.pm >output.txt
+
+ This can help locate misspelled hash keys.
+
+- Line breaks at long chains of method calls now break at all calls
+ with args in parens, as in this example from git #171
+
+ # Old default
+ sub bla_p( $value = 42 ) {
+ return Mojo::Promise->resolve($value)->then( sub { shift() / 2 } )
+ ->then( sub { shift() + 6 } )->then( sub { shift() / 2 } )
+ ->catch( sub { warn shift } );
+ }
+
+ # New default
+ sub bla_p( $value = 42 ) {
+ return Mojo::Promise->resolve($value)
+ ->then( sub { shift() / 2 } )
+ ->then( sub { shift() + 6 } )
+ ->then( sub { shift() / 2 } )
+ ->catch( sub { warn shift } );
+ }
+
+- Parameter --break-at-old-method-breakpoints, or -bom, has been
+updated to insure that it only applies to lines beginning with
+method calls, as intended. Line breaks for all lines beginning with
+'->', even non-method calls, can be retained by using
+--keep-old-breakpoints_before='->'.
+
+- Added parameter --multiple-token-tightness=s, or -mutt=s.
+The default value --paren-tightness=1 adds space within the parens
+if, and only if, the container holds multiple tokens. Some perltidy
+tokens may be rather long, and it can be preferable to also space some of
+them as if they were multiple tokens. This can be done with this parameter,
+and it applies to parens as well as square brackets and curly braces.
+For example, the default below has no space within the square brackets:
+
+ # perltidy
+ my $rlist = [qw( alpha beta gamma )];
+
+Spaces can be obtained with:
+
+ # perltidy -mutt='q*'
+ my $rlist = [ qw( alpha beta gamma ) ];
+
+The parameter -mutt='q*' means treat qw and similar quote operators as
+multiple tokens. The manual has details; git #120 has another example.
+
+- Added parameter --indent-leading-semicolon, -ils; see git #171. When
+this is negated, a line with a leading semicolon does not get the extra
+leading continuation indentation spaces (defined with -ci=n).
+
+- Space around here doc delimiters follow spacing controls better. For
+example, a space is now added before the closing paren here:
+
+ OLD: (without the here doc):
+ push( @script, <<'EOT');
+
+ NEW:
+ push( @script, <<'EOT' );
+
+Also, any spaces between the '<<' and here target are removed (git #174):
+
+ OLD:
+ push( @script, << 'EOT');
+
+ NEW:
+ push( @script, <<'EOT' );
+
+- Added parameter --break-at-trailing-comma-types=s, or -btct=s, where
+s is a string which selects trailing commas. For example, -btct='f(b'
+places a line break after all bare trailing commas in function calls.
+The manual has details.
+
+- Fix git #165, strings beginning with v before => gave an incorrect error
+message.
+
+- The parameter --add-lone-trailing-commas, -altc, is now on by default.
+This will simplify input for trailing comma operations. Use
+--noadd-lone-trailing-commas, or -naltc to turn it off.
+
+- More edge cases for adding and deleting trailing commas are now handled
+(git #156).
+
+- A problem has been fixed in which the addition or deletion of trailing
+commas with the -atc or -dtc flags did not occur due to early convergence
+when the -conv flag was set (git #143).
+
+- Added parameter --qw-as-function, or -qwaf, discussed in git #164.
+When this parameter is set, a qw list which begins with 'qw(' is
+formatted as if it were a function call with call args being a list
+of comma-separated quoted items. For example, given this input:
+
+@fields = qw( $st_dev $st_ino $st_mode $st_nlink $st_uid
+ $st_gid $st_rdev $st_size $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks);
+
+# perltidy -qwaf
+@fields = qw(
+ $st_dev $st_ino $st_mode $st_nlink
+ $st_uid $st_gid $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime $st_blksize
+ $st_blocks
+);
+</code></pre>
+
+<h2>2024 09 03</h2>
+
+<pre><code>- Add partial support for Syntax::Operator::In and Syntax::Keyword::Match
+ (see git #162).
+
+- Add --timeout-in-seconds=n, or -tos=n. When the standard input supplies
+ the input stream, and the input has not been received within n seconds,
+ perltidy will end with a timeout message. The intention is to catch
+ a situation where perltidy is accidentally invoked without a file to
+ process and therefore waits for input from the system standard input
+ (stdin), which never arrives. The default is n=10.
+ This check can be turned off with -tos=0.
+
+- Add parameter --closing-side-comment-exclusion-list=string, or
+ -cscxl=string, where string is a list of block types to exclude
+ for closing side comment operations. Also, closing side comments
+ now work for anonymous subs if a --closing-side-comment-list (-cscl)
+ is not specified, and when 'asub' is requested with -cscl=asub.
+ Use -cscxl=asub to prevent this.
+
+- Include check for unused constants in --dump-unusual-variables and
+ --warn-variable-types (new issue type 'c'). Also expand checks to
+ cover variables introduced with 'use vars'.
+
+- Include signature variables in --dump-unusual-variables and
+ --warn-variable-types; see git #158.
+
+- Add logical xor operator ^^ available in perl version 5.40, as
+ noted in git #157.
+
+- Keyword 'state' now has default space before a paren, like 'my'.
+ Previously there was no space and no control. So the default
+ is now "state ($x)". This space can be removed with -nsak='state'.
+
+- Add options --add-lone-trailing-commas, -altc and
+ --delete-lone-trailing-commas, -dltc, to provide control over adding
+ and deleting the only comma in a list. See discussion in git #143
+ and the updated manual.
+
+- Add options --dump-mismatched-returns (or -dmr) and
+ --warn-mismatched-returns (or -wmr). These options report function
+ calls where the number of values requested may disagree with sub
+ return statements. The -dump version writes the results for a single
+ file to standard output and exits:
+
+ perltidy -dmr somefile.pl >results.txt
+
+ The -warn version formats as normal but reports any issues as warnings in
+ the error file:
+
+ perltidy -wmr somefile.pl
+
+ The -warn version may be customized with the following additional
+ parameters if necessary to avoid needless warnings:
+
+ --warn-mismatched-return-types=s (or -wmrt=s),
+ --warn-mismatched-return-exclusion-list=s (or -wmrxl=s)
+
+ where 's' is a control string. These are explained in the manual.
+
+- Updates for issue git #151:
+ (1) --warn-variable-types=u is now okay if a named file is processed.
+ (2) --warn-variable-exclusion-list=s now allows leading and/or
+ trailing * on variable names to allow a wildcard match. For example
+ -wvxl='*_unused' is okay and would match $var1_unused and $var2_unused.
+ (3) --dump-unusual-variables now outputs the filename.
+
+- A option was added to filter unimplemented parameters from perltidy
+ configuration files, suggested in git #146. It works like this: if
+ a line in the config file begins with three dashes followed by a
+ parameter name (rather than two dashes), then the line will be removed
+ if the parameter is unknown. Otherwise, a dash will be removed to make
+ the line valid.
+
+- Parameters --dump-mismatched-args (or -dma) and
+ --warn-mismatched-args (or -wma) have been updated to catch more
+ arg count issues.
+
+- Fixed issue git #143, extend -add-trailing-commas to apply to a list
+ with just a fat comma.
+
+- The minimum perl version is 5.8.1. Previously it was 5.8.0, which was
+ not correct because of the use of utf8::is_utf8.
+
+- Fixed issue git #142, test failure installing on perl versions before
+ version 5.10. The error caused the new parameter
+ -interbracket-arrow-style=s not to work. Except for this limitation,
+ Version 20240511 will work on older perl versions.
+</code></pre>
+
+<h2>2024 05 11</h2>
+
+<pre><code>- The option --valign-signed-numbers, or -vsn is now the default. It
+ was introduced in the previous release has been found to significantly
+ improve the overall appearance of columns of signed and unsigned
+ numbers. See the previous Change Log entry for an example.
+ This will change the formatting in scripts with columns
+ of vertically aligned signed and unsigned numbers.
+ Use -nvsn to turn this option off and avoid this change.
+
+- The option --delete-repeated-commas is now the default.
+
+ It makes the following checks and changes:
+ - Repeated commas like ',,' are removed with a warning
+ - Repeated fat commas like '=> =>' are removed with a warning
+ - The combination '=>,' produces a warning but is not changed
+ These warnings are only output if --warning-output, or -w, is set.
+
+ Use --nodelete-repeated-commas, or -ndrc, to retain repeated commas.
+
+- Previously, a line break was always made before a concatenated
+ quoted string, such as "\n", if the previous line had a greater
+ starting indentation. An exception is now made for a short concatenated
+ terminal quote. This keeps code a little more compact. For example:
+
+# basic rule: break before "\n" here because '$name' has more indentation:
+my $html = $this->SUPER::genObject( $query, $bindNode, $field . ":$var",
+ $name, "remove", "UNCHECKED" )
+ . "\n";
+
+# modified rule: make an exception for a short terminal quote like "\n"
+my $html = $this->SUPER::genObject( $query, $bindNode, $field . ":$var",
+ $name, "remove", "UNCHECKED" ) . "\n";
+
+- The operator ``**=`` now has spaces on both sides by default. Previously,
+ there was no space on the left. This change makes its spacing the same
+ as all other assignment operators. The previous behavior can be obtained
+ with the parameter setting -nwls='**='.
+
+- The option --file-size-order, or -fso is now the default. When
+ perltidy is given a list of multiple filenames to process, they
+ are sorted by size and processed in order of increasing size.
+ This can significantly reduce memory usage by Perl. This
+ option has always been used in testing, where typically several
+ jobs each operating on thousands of filenames are running at the
+ same time and competing for system resources. If this option
+ is not wanted for some reason, it can be deactivated with -nfso.
+
+- In the option --dump-block-summary, the number of sub arguments indicated
+ for each sub now includes any leading object variable passed with
+ an arrow-operator call. Previously the count would have been decreased
+ by one in this case. This change is needed for compatibility with future
+ updates.
+
+- Fix issue git #138 involving -xlp (--extended-line-up-parentheses).
+ When multiple-line quotes and regexes have long secondary lines, these
+ line lengths could influencing some spacing and indentation, but they
+ should not have since perltidy has no control over their indentation.
+ This has been fixed. This will mainly influence code which uses -xlp
+ and has long multi-line quotes.
+
+- Add option --minimize-continuation-indentation, -mci (see git #137).
+ This flag allows perltidy to remove continuation indentation in some
+ special cases where it is not really unnecessary. For a simple example,
+ the default formatting for the following snippet is:
+
+ # perltidy -nmci
+ $self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+ The second and third lines are one level deep in a container, and
+ are also statement continuations, so they get indented by the sum
+ of the -i value and the -ci value. If this flag is set, the
+ indentation is reduced by -ci spaces, giving
+
+ # perltidy -mci
+ $self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+
+ This situation is relatively rare except in code which has long
+ quoted strings and the -nolq flag is also set. This flag is currently
+ off by default, but it could become the default in a future version.
+
+- Add options --dump-mismatched-args (or -dma) and
+ --warn-mismatched-args (or -wma). These options look
+ for and report instances where the number of args expected by a
+ sub appear to differ from the number passed to the sub. The -dump
+ version writes the results for a single file to standard output
+ and exits:
+
+ perltidy -dma somefile.pl >results.txt
+
+ The -warn version formats as normal but reports any issues as warnings in
+ the error file:
+
+ perltidy -wma somefile.pl
+
+ The -warn version may be customized with the following additional parameters
+ if necessary to avoid needless warnings:
+
+ --warn-mismatched-arg-types=s (or -wmat=s),
+ --warn-mismatched-arg-exclusion-list=s (or -wmaxl=s), and
+ --warn-mismatched-arg-undercount-cutoff=n (or -wmauc=n).
+ --warn-mismatched-arg-overcount-cutoff=n (or -wmaoc=n).
+
+ These are explained in the manual.
+
+- Add option --valign-wide-equals, or -vwe, for issue git #135.
+ Setting this parameter causes the following assignment operators
+
+ = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
+
+ to be aligned vertically with the ending = all aligned. For example,
+ here is the default formatting of a snippet of code:
+
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+ And here is the same code formatted with -vwe:
+
+ # perltidy -vwe
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+
+ This option currently is off by default to avoid changing existing
+ formatting.
+
+- Added control --delete-interbracket-arrows, or -dia, to delete optional
+ hash ref and array ref arrows between brackets as in the following
+ expression (see git #131)
+
+ return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+
+ # perltidy -dia gives:
+ return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+
+ Added the opposite control --aia-interbracket-arrows, or -aia, to
+ add arrows. So applied to the previous line the arrows are restored:
+
+ # perltidy -aia
+ return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+
+ The manual describes additional controls for adding and deleting
+ just selected interbracket arrows.
+</code></pre>
+
+<h2>2024 02 02</h2>
+
+<pre><code>- Added --valign-signed-numbers, or -vsn. This improves the appearance
+ of columns of numbers by aligning leading algebraic signs. For example:
+
+ # perltidy -vsn
+ my $xyz_shield = [
+ [ -0.060, -0.060, 0. ],
+ [ 0.060, -0.060, 0. ],
+ [ 0.060, 0.060, 0. ],
+ [ -0.060, 0.060, 0. ],
+ [ -0.0925, -0.0925, 0.092 ],
+ [ 0.0925, -0.0925, 0.092 ],
+ [ 0.0925, 0.0925, 0.092 ],
+ [ -0.0925, 0.0925, 0.092 ],
+ ];
+
+ # perltidy -nvsn (current DEFAULT)
+ my $xyz_shield = [
+ [ -0.060, -0.060, 0. ],
+ [ 0.060, -0.060, 0. ],
+ [ 0.060, 0.060, 0. ],
+ [ -0.060, 0.060, 0. ],
+ [ -0.0925, -0.0925, 0.092 ],
+ [ 0.0925, -0.0925, 0.092 ],
+ [ 0.0925, 0.0925, 0.092 ],
+ [ -0.0925, 0.0925, 0.092 ],
+ ];
+
+ This new option works well but is currently OFF to allow more testing
+ and fine-tuning. It is expected to be activated in a future release.
+
+- Added --dump-mixed-call-parens (-dmcp ) which will dump a list of
+ operators which are sometimes followed by parens and sometimes not.
+ This can be useful for developing a uniform style for selected operators.
+ Issue git #128. For example
+
+ perltidy -dmcp somefile.pl >out.txt
+
+ produces lines like this, where the first number is the count of
+ uses with parens, and the second number is the count without parens.
+
+ k:caller:2:1
+ k:chomp:3:4
+ k:close:7:4
+
+- Added --want-call-parens=s (-wcp=s) and --nowant-call-parens=s (-nwcp=s)
+ options which will warn of paren uses which do not match a selected
+ style. The manual has details. But for example,
+
+ perltidy -wcp='&' somefile.pl
+
+ will format as normal but warn if any user subs are called without parens.
+
+- Added --dump-unusual-variables (-duv) option to dump a list of
+ variables with certain properties of interest. For example
+
+ perltidy -duv somefile.pl >vars.txt
+
+ produces a file with lines which look something like
+
+ 1778:u: my $input_file
+ 6089:r: my $j: reused - see line 6076
+
+ The values on the line which are separated by colons are:
+
+ line number - the number of the line of the input file
+ issue - a single letter indicating the issue, see below
+ variable name - the name of the variable, preceded by a keyword
+ note - an optional note referring to another line
+
+ The issue is indicated by a letter which may be one of:
+
+ r: reused variable name
+ s: sigil change but reused bareword
+ p: lexical variable with scope in multiple packages
+ u: unused variable
+
+ This is very useful for locating problem areas and bugs in code.
+
+- Added a related flag --warn-variable-types=string (-wvt=string) option
+ to warn if certain types of variables are found in a script. The types
+ are a space-separated string which may include 'r', 's', and 'p' but
+ not 'u'. For example
+
+ perltidy -wvt='r s' somefile.pl
+
+ will check for and warn if any variabls of type 'r', or 's' are seen,
+ but not 'p'. All possible checks may be indicated with a '*' or '1':
+
+ perltidy -wvt='*' somefile.pl
+
+ The manual has further details.
+
+- All parameters taking integer values are now checked for
+ out-of-range values before processing starts. When a maximum or
+ maximum range is exceeded, the new default behavior is to write a
+ warning message, reset the value to its default setting, and continue.
+ This default behavior can be changed with the new parameter
+ --integer-range-check=n, or -irc=n, as follows:
+
+ n=0 skip check completely (for stress-testing perltidy only)
+ n=1 reset bad values to defaults but do not issue a warning
+ n=2 reset bad values to defaults and issue a warning [DEFAULT]
+ n=3 stop immediately if any values are out of bounds
+
+ The settings n=0 and n=1 are mainly useful for testing purposes.
+
+- The --dump-block-summary (-dbs) option now includes the number of sub
+ args in the 'type' column. For example, 'sub(9)' indicates a sub
+ with 9 args. Subs whose arg count cannot easily be determined are
+ indicated as 'sub(*)'. The count does not include a leading '$self'
+ or '$class' arg.
+
+- Added flag --space-signature-paren=n, or -ssp=n (issue git #125).
+ This flag works the same as the existing flag --space-prototype-paren=n
+ except that it applies to the space before the opening paren of a sub
+ signature instead of a sub prototype. Previously, there was no control
+ over this (a space always occurred). For example, given the following
+ line:
+
+ sub circle( $xc, $yc, $rad );
+
+ The following results can now be obtained, according to the value of n:
+
+ sub circle( $xc, $yc, $rad ); # n=0 [no space]
+ sub circle( $xc, $yc, $rad ); # n=1 [default; same as input]
+ sub circle ( $xc, $yc, $rad ); # n=2 [space]
+
+ The spacing in previous versions of perltidy corresponded to n=2 (always
+ a space). The new default value, n=1, will produce a space if and only
+ if there was a space in the input text.
+
+- The --dump-block-summary option can report an if-elsif-elsif-.. chain
+ as a single line item with the notation -dbt='elsif3', for example,
+ where the '3' is an integer which specifies the minimum number of elsif
+ blocks required for a chain to be reported. The manual has details.
+
+- Fix problem c269, in which the new -ame parameter could incorrectly
+ emit an else block when two elsif blocks were separated by a hanging
+ side comment (a very rare situation).
+
+- When braces are detected to be unbalanced, an attempt is made to
+ localize the error by comparing the indentation at closing braces
+ with their actual nesting levels. This can be useful for files which
+ have previously been formatted by perltidy. To illustrate, a test was
+ made in which the closing brace at line 30644 was commented out in
+ a file with a total of over 62000 lines. The new error message is
+
+ Final nesting depth of '{'s is 1
+ The most recent un-matched '{' is on line 6858
+ ...
+ Table of nesting level differences at closing braces.
+ This might help localize brace errors if the file was previously formatted.
+ line: (brace level) - (level expected from old indentation)
+ 30643: 0
+ 30645: 1
+
+ Previously, the error file only indicated that the error in this case
+ was somewhere after line 6858, so the new table is very helpful. Closing
+ brace indentation is checked because it is unambiguous and can be done
+ very efficiently.
+
+- The -DEBUG option no longer automatically also writes a .LOG file.
+ Use --show-options if the .LOG file is needed.
+
+- The run time of this version with all new options in use is no greater
+ than that of the previous version thanks to optimization work.
+</code></pre>
+
+<h2>2023 09 12</h2>
+
+<pre><code>- Fix for git #124: remove a syntax error check which could cause
+ an incorrect error message when List::Gather::gather was used.
+</code></pre>
+
+<h2>2023 09 09</h2>
+
+<pre><code>- Added new parameters -wme, or --warn-missing-else, and -ame,
+ or --add-missing else. The parameter -wme tells perltidy to issue
+ a warning if an if-elsif-... chain does not end in an else block.
+ The parameter -ame tells perltidy to insert an else block at the
+ end of such a chain if there is none.
+
+ For example, given the following snippet:
+
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+
+ # perltidy -ame
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+ else {
+ ##FIXME - added with perltidy -ame
+ }
+
+ The resulting code should be carefully reviewed, and the ##FIXME comment
+ should be updated as appropriate. The text of the ##FIXME comment can be
+ changed with parameter -amec=s, where 's' is the comment to mark the new
+ else block. The man pages have more details.
+
+- The syntax of the parameter --use-feature=class, or -uf=class, which
+ new in the previous release, has been changed slightly for clarity.
+ The default behavior, which occurs if this flag is not entered, is
+ to automatically try to handle both old and new uses of the keywords
+ 'class', 'method', 'field', and 'ADJUST'.
+ To force these keywords to only follow the -use feature 'class' syntax,
+ enter --use-feature=class.
+ To force perltidy to ignore the -use feature 'class' syntax, enter
+ --use-feature=noclass.
+
+- Issue git #122. Added parameter -lrt=n1:n2, or --line-range-tidy=n1:n2
+ to limit tidy operations to a limited line range. Line numbers start
+ with 1. This parameter is mainly of interest to editing programs which
+ drive perltidy. The man pages have details.
+
+- Some fairly rare instances of incorrect spacing have been fixed. The
+ problem was that the tokenizer being overly conservative in marking
+ terms as possible filehandles or indirect objects. This causes the space
+ after the possible filehandle to be frozen to its input value in order not
+ to introduce an error in case Perl had to guess. The problem was fixed
+ by having the tokenizer look ahead for operators which can eliminate the
+ uncertainty. To illustrate, in the following line the term ``$d`` was
+ previously marked as a possible filehandle, so no space was added after it.
+
+ print $d== 1 ? " [ON]\n" : $d ? " [$d]\n" : "\n";
+ ^
+
+ In the current version, the next token is seen to be an equality, so
+ ``$d`` is marked as an ordinary identifier and normal spacing rules
+ can apply:
+
+ print $d == 1 ? " [ON]\n" : $d ? " [$d]\n" : "\n";
+ ^
+
+- This version runs 7 to 10 percent faster than the previous release on
+ large files, depending on options and file type. Much of the gain comes
+ from streamlined I/O operations.
+
+- This version was stress-tested for many cpu hours with random
+ input parameters. No failures to converge, internal fault checks,
+ undefined variable references or other irregularities were seen.
+</code></pre>
+
+<h2>2023 07 01</h2>
+
+<pre><code>- Issue git #121. Added parameters -xbt, or --extended-block-tightness,
+ and -xbtl=s, or --extended-block-tightness-list=s, to allow
+ certain small code blocks to have internal spacing controlled by
+ -bbt=n rather than -bt=n. The man pages have details.
+
+- Issue git #118. A warning will be issued if a duplicate format-skipping
+ starting marker is seen within a format-skipping section. The same
+ applies to duplicate code-skipping starting markers within code-skipping
+ sections.
+
+- Issue git #116. A new flag --valign-if-unless, -viu, was added to
+ allow postfix 'unless' terms to align with postfix 'if' terms. The
+ default remains not to do this.
+
+- Fixed git #115. In the two most recent CPAN releases, when the
+ Perl::Tidy module was called with the source pointing to a file,
+ but no destination specified, the output went to the standard
+ output instead of to a file with extension ``.tdy``, as it should
+ have. This has been fixed.
+
+- Fixed git #110, add missing documentation for new options
+ -cpb and -bfvt=n. These work in version 20230309 but the pod
+ documentation was missing and has been added.
+
+- Fixed an undefined reference message when running with
+ --dump-block-summary on a file without any subs or other
+ selected block types.
+
+- Add parameter -ipc, or --ignore-perlcritic-comments. Perltidy, by
+ default, will look for side comments beginning with ``## no critic`` and
+ ignore their lengths when making line break decisions, even if the user
+ has not set ``-iscl``. The reason is that an unwanted line break can
+ make these special comments ineffective in controlling ``perlcritic``.
+ The parameter -ipc can be set if, for some reason, this is not wanted.
+
+- Some minor issues with continuation indentation have been fixed.
+ Most scripts will remain unchanged. The main change is that block
+ comments which occur just before a closing brace, bracket or paren
+ now have an indentation which is independent of the existence of
+ an optional comma or semicolon. Previously, adding or deleting
+ an optional trailing comma could cause their indentation to jump.
+ Also, indentation of comments within ternary statements has been
+ improved. For additional details see:
+
+ https://github.com/perltidy/perltidy/blob/master/docs/ci_update.md
+
+- This version was stress-tested for many cpu hours with random
+ input parameters. No failures to converge, internal fault checks,
+ undefined variable references or other irregularities were seen.
+
+- This version runs several percent faster than the previous release
+ on large files.
+</code></pre>
+
<h2>2023 03 09</h2>
<pre><code>- No significant bugs have been found since the last release to CPAN.
- Added a new option '--code-skipping', requested in git #65, in which code
between comment lines '#<<V' and '#>>V' is passed verbatim to the output
- stream without error checking. It is simmilar to --format-skipping
+ stream without error checking. It is similar to --format-skipping
but there is no error checking of the skipped code. This can be useful for
skipping past code which employs an extended syntax.
the -quiet flag. This has been fixed.
- Add flag -maxfs=n, --maximum-file-size-mb=n. This parameter is provided to
- avoid causing system problems by accidentally attempting to format an
- extremely large data file. The default is n=10. The command to increase
+ avoid causing system problems by accidentally attempting to format an
+ extremely large data file. The default is n=10. The command to increase
the limit to 20 MB for example would be -mfs=20. This only applies to
files specified by filename on the command line.
-- Skip formatting if there are too many indentation level errors. This is
- controlled with -maxle=n, --maximum-level-errors=n. This means that if
+- Skip formatting if there are too many indentation level errors. This is
+ controlled with -maxle=n, --maximum-level-errors=n. This means that if
the ending indentation differs from the starting indentation by more than
- n levels, the file will be output verbatim. The default is n=1.
+ n levels, the file will be output verbatim. The default is n=1.
To skip this check, set n=-1 or set n to a large number.
- A related new flag, --maximum-unexpected-errors=n, or -maxue=n, is available
- Add flag -xci, --extended-continuation-indentation, regarding issue git #28
This flag causes continuation indentation to "extend" deeper into structures.
- Since this is a fairly new flag, the default is -nxci to avoid disturbing
+ Since this is a fairly new flag, the default is -nxci to avoid disturbing
existing formatting. BUT you will probably see some improved formatting
- in complex data structures by setting this flag if you currently use -ci=n
- and -i=n with the same value of 'n' (as is the case if you use -pbp,
+ in complex data structures by setting this flag if you currently use -ci=n
+ and -i=n with the same value of 'n' (as is the case if you use -pbp,
--perl-best-practices, where n=4).
- Fix issue git #42, clarify how --break-at-old-logical-breakpoints works.
- Fix issue git #41, typo in manual regarding -fsb.
-- Fix issue git #40: when using the -bli option, a closing brace followed by
- a semicolon was not being indented. This applies to braces which require
+- Fix issue git #40: when using the -bli option, a closing brace followed by
+ a semicolon was not being indented. This applies to braces which require
semicolons, such as a 'do' block.
- Added 'state' as a keyword.
- A better test for convergence has been added. When iterations are requested,
the new test will stop after the first pass if no changes in line break
- locations are made. Previously, file checksums were used and required at least two
- passes to verify convergence unless no formatting changes were made. With the new test,
- only a single pass is needed when formatting changes are limited to adjustments of
+ locations are made. Previously, file checksums were used and required at least two
+ passes to verify convergence unless no formatting changes were made. With the new test,
+ only a single pass is needed when formatting changes are limited to adjustments of
indentation and whitespace on the lines of code. Extensive testing has been made to
verify the correctness of the new convergence test.
-- Line breaks are now automatically placed after 'use overload' to
+- Line breaks are now automatically placed after 'use overload' to
improve formatting when there are numerous overloaded operators. For
example
...
- A number of minor problems with parsing signatures and prototypes have
- been corrected, particularly multi-line signatures. Some signatures
- had previously been parsed as if they were prototypes, which meant the
+ been corrected, particularly multi-line signatures. Some signatures
+ had previously been parsed as if they were prototypes, which meant the
normal spacing rules were not applied. For example
OLD:
}
- Numerous minor issues that the average user would not encounter were found
- and fixed. They can be seen in the more complete list of updates at
+ and fixed. They can be seen in the more complete list of updates at
https://github.com/perltidy/perltidy/blob/master/local-docs/BugLog.pod
</code></pre>
<h2>2020 10 01</h2>
-<pre><code>- Robustness of perltidy has been significantly improved. Updating is recommended. Continual
- automated testing runs began about 1 Sep 2020 and numerous issues have been found and fixed.
+<pre><code>- Robustness of perltidy has been significantly improved. Updating is recommended. Continual
+ automated testing runs began about 1 Sep 2020 and numerous issues have been found and fixed.
Many involve references to uninitialized variables when perltidy is fed random text and random
- control parameters.
+ control parameters.
- Added the token '->' to the list of alignment tokens, as suggested in git
#39, so that it can be vertically aligned if a space is placed before them with -wls='->'.
comments generated by the -csc parameter, a separating newline was missing.
The resulting script will not then run, but worse, if it is reformatted with
the same parameters then closing side comments could be overwritten and data
- lost.
+ lost.
This problem was found during automated random testing. The parameter
-scbb is rarely used, which is probably why this has not been reported. Please
upgrade your version.
- Added parameter --non-indenting-braces, or -nib, which prevents
- code from indenting one level if it follows an opening brace marked
+ code from indenting one level if it follows an opening brace marked
with a special side comment, '#<<<'. For example,
{ #<<< a closure to contain lexical vars
This is on by default. If your code happens to have some
opening braces followed by '#<<<', and you
- don't want this, you can use -nnib to deactivate it.
+ don't want this, you can use -nnib to deactivate it.
- Side comment locations reset at a line ending in a level 0 open
- block, such as when a new multi-line sub begins. This is intended to
+ block, such as when a new multi-line sub begins. This is intended to
help keep side comments from drifting to far to the right.
</code></pre>
<h2>2020 08 22</h2>
<pre><code>- Fix RT #133166, encoding not set for -st. Also reported as RT #133171
- and git #35.
+ and git #35.
This is a significant bug in version 20200616 which can corrupt data if
perltidy is run as a filter on encoded text.
- Vertical alignment has been improved. Numerous minor issues have
been fixed.
-- Formatting with the -lp option is improved.
+- Formatting with the -lp option is improved.
- Fixed issue git #32, misparse of bare 'ref' in ternary
<pre><code>- Added support for Switch::Plain syntax, issue git #31.
-- Fixed minor problem where trailing 'unless' clauses were not
+- Fixed minor problem where trailing 'unless' clauses were not
getting vertically aligned.
- Added a parameter --logical-padding or -lop to allow logical padding
'teefile' call parameters. These output streams are rarely used but
they are now treated the same as any 'logfile' stream.
-- add option --break-at-old-semicolon-breakpoints', -bos, requested
+- add option --break-at-old-semicolon-breakpoints', -bos, requested
in RT#131644. This flag will keep lines beginning with a semicolon.
- Added --use-unicode-gcstring to control use of Unicode::GCString for
- evaluating character widths of encoded data. The default is
+ evaluating character widths of encoded data. The default is
not to use this (--nouse-unicode-gcstring). If this flag is set,
- perltidy will look for Unicode::GCString and, if found, will use it
+ perltidy will look for Unicode::GCString and, if found, will use it
to evaluate character display widths. This can improve displayed
vertical alignment for files with wide characters. It is a nice
feature but it is off by default to avoid conflicting formatting
- when there are multiple developers. Perltidy installation does not
- require Unicode::GCString, so users wanting to use this feature need
+ when there are multiple developers. Perltidy installation does not
+ require Unicode::GCString, so users wanting to use this feature need
set this flag and also to install Unicode::GCString separately.
- Added --character-encoding=guess or -guess to have perltidy guess
- if a file (or other input stream) is encoded as -utf8 or some
- other single-byte encoding. This is useful when processing a mixture
+ if a file (or other input stream) is encoded as -utf8 or some
+ other single-byte encoding. This is useful when processing a mixture
of file types, such as utf8 and latin-1.
Please Note: The default encoding has been set to be 'guess'
- instead of 'none'. This seems like the best default, since
+ instead of 'none'. This seems like the best default, since
it allows perltidy work properly with both
utf8 files and older latin-1 files. The guess mode uses Encode::Guess,
- which is included in standard perl distributions, and only tries to
- guess if a file is utf8 or not, never any other encoding. If the guess is
- utf8, and if the file successfully decodes as utf8, then it the encoding
- is assumed to be utf8. Otherwise, no encoding is assumed.
- If you do not want to use this new default guess mode, or have a
- problem with it, you can set --character-encoding=none (the previous
+ which is included in standard perl distributions, and only tries to
+ guess if a file is utf8 or not, never any other encoding. If the guess is
+ utf8, and if the file successfully decodes as utf8, then it the encoding
+ is assumed to be utf8. Otherwise, no encoding is assumed.
+ If you do not want to use this new default guess mode, or have a
+ problem with it, you can set --character-encoding=none (the previous
default) or --character-encoding=utf8 (if you deal with utf8 files).
- Specific encodings of input files other than utf8 may now be given, for
<h2>2020 01 10</h2>
<pre><code>- This release adds a flag to control the feature RT#130394 (allow short nested blocks)
- introduced in the previous release. Unfortunately that feature breaks
+ introduced in the previous release. Unfortunately that feature breaks
RPerl installations, so a control flag has been introduced and that feature is now
off by default. The flag is:
- --one-line-block-nesting=n, or -olbn=n, where n is an integer as follows:
+ --one-line-block-nesting=n, or -olbn=n, where n is an integer as follows:
-olbn=0 break nested one-line blocks into multiple lines [new DEFAULT]
-olbn=1 stable; keep existing nested-one line blocks intact [previous DEFAULT]
- Fixed issue RT#131288: parse error for un-prototyped constant function without parenthesized
call parameters followed by ternary.
-- Fixed issue RT#131360, installation documentation. Added a note that the binary
- 'perltidy' comes with the Perl::Tidy module. They can both normally be installed with
+- Fixed issue RT#131360, installation documentation. Added a note that the binary
+ 'perltidy' comes with the Perl::Tidy module. They can both normally be installed with
'cpanm Perl::Tidy'
</code></pre>
one or more aliases for 'sub', separated by spaces or commas.
For example,
- perltidy -sal='method fun'
+ perltidy -sal='method fun'
will cause the perltidy to treat the words 'method' and 'fun' to be
treated the same as if they were 'sub'.
-- Added flag --space-prototype-paren=i, or -spp=i, to control spacing
+- Added flag --space-prototype-paren=i, or -spp=i, to control spacing
before the opening paren of a prototype, where i=0, 1, or 2:
i=0 no space
i=1 follow input [current and default]
i=2 always space
Previously, perltidy always followed the input.
- For example, given the following input
+ For example, given the following input
sub usage();
<h2>2019 09 15</h2>
-<pre><code>- fixed issue RT#130344: false warning "operator in print statement"
- for "use lib".
+<pre><code>- fixed issue RT#130344: false warning "operator in print statement"
+ for "use lib".
- fixed issue RT#130304: standard error output should include filename.
- When perltidy error messages are directed to the standard error output
- with -se or --standard-error-output, the message lines now have a prefix
- 'filename:' for clarification in case multiple files
- are processed, where 'filename' is the name of the input file. If
- input is from the standard input the displayed filename is '<stdin>',
- and if it is from a data structure then displayed filename
+ When perltidy error messages are directed to the standard error output
+ with -se or --standard-error-output, the message lines now have a prefix
+ 'filename:' for clarification in case multiple files
+ are processed, where 'filename' is the name of the input file. If
+ input is from the standard input the displayed filename is '<stdin>',
+ and if it is from a data structure then displayed filename
is '<source_stream>'.
- implement issue RT#130425: check mode. A new flag '--assert-tidy'
has also been added. The next item, RT#130297, insures that the script
will exit with a non-zero exit flag if the assertion fails.
-- fixed issue RT#130297; the perltidy script now exits with a nonzero exit
- status if it wrote to the standard error output. Prevously only fatal
+- fixed issue RT#130297; the perltidy script now exits with a nonzero exit
+ status if it wrote to the standard error output. Previously only fatal
run errors produced a non-zero exit flag. Now, even non-fatal messages
requested with the -w flag will cause a non-zero exit flag. The exit
flag now has these values:
- fixed issue git#13, needless trailing whitespace in error message
- fixed issue git#9: if the -ce (--cuddled-else) flag is used,
- do not try to form new one line blocks for a block type
+ do not try to form new one line blocks for a block type
specified with -cbl, particularly map, sort, grep
- iteration speedup for unchanged code. Previously, when iterations were
<h2>2019 06 01</h2>
-<pre><code>- rt #128477: Prevent inconsistent owner/group and setuid/setgid bits.
+<pre><code>- rt #128477: Prevent inconsistent owner/group and setuid/setgid bits.
In the -b (--backup-and-modify-in-place) mode, an attempt is made to set ownership
of the output file equal to the input file, if they differ.
In all cases, if the final output file ownership differs from input file, any setuid/setgid bits are cleared.
<pre><code>- RT #123749, partial fix. "Continuation indentation" is removed from lines
with leading closing parens which are part of a call chain.
For example, the call to pack() is is now outdented to the starting
- indentation in the following experession:
+ indentation in the following expression:
# OLD
$mw->Button(
-it>1.
- Fixed bug where a line occasionally ended with an extra space. This reduces
- rhe number of instances where a second iteration gives a result different
+ the number of instances where a second iteration gives a result different
from the first.
- Updated documentation to note that the Tidy.pm module <stderr> parameter may
<pre><code>- Allow configuration file to be 'perltidy.ini' for Windows systems.
i.e. C:\Documents and Settings\User\perltidy.ini
- and added documentation for setting configuation file under Windows in man
+ and added documentation for setting configuration file under Windows in man
page. Thanks to Stuart Clark.
- Corrected problem of unwanted semicolons in hash ref within given/when code.
Thanks to Mark Olesen for suggesting this.
--Improved alignement of '='s in certain cases.
+-Improved alignment of '='s in certain cases.
Thanks to Norbert Gruener for sending an example.
-Outdent-long-comments (-olc) has been re-instated as a default, since
);
-Lists which do not format well in uniform columns are now better
- identified and formated.
+ identified and formatted.
OLD:
return $c->create( 'polygon', $x, $y, $x + $ruler_info{'size'},
to control what text is appended to 'else' and 'elsif' blocks.
Default is to just add leading 'if' text to an 'else'. See manual.
--The -csc option now labels 'else' blocks with additinal information
+-The -csc option now labels 'else' blocks with additional information
from the opening if statement and elsif statements, if space.
Thanks to Wolfgang Weisselberg for suggesting this.
'92', '94', '96', '98', '100', '102', '104'
);
--Lists of complex items, such as matricies, are now detected
+-Lists of complex items, such as matrices, are now detected
and displayed with just one item per row:
OLD:
if ( ( $tmp >= 0x80_00_00 ) || ( $tmp < -0x80_00_00 ) ) { }
-'**=' was incorrectly tokenized as '**' and '='. This only
- caused a problem with the -extrude opton.
+ caused a problem with the -extrude option.
-Corrected a divide by zero when -extrude option is used
errorfile => $errorfile,
teefile => $teefile,
debugfile => $debugfile,
- formatter => $formatter, # callback object (see below)
+ formatter => $formatter, # callback object (see below)
dump_options => $dump_options,
dump_options_type => $dump_options_type,
prefilter => $prefilter_coderef,
<p>The module accepts input and output streams by a variety of methods. The following list of parameters may be any of the following: a filename, an ARRAY reference, a SCALAR reference, or an object with either a <b>getline</b> or <b>print</b> method, as appropriate.</p>
-<pre><code> source - the source of the script to be formatted
- destination - the destination of the formatted output
- stderr - standard error output
- perltidyrc - the .perltidyrc file
- logfile - the .LOG file stream, if any
- errorfile - the .ERR file stream, if any
- dump_options - ref to a hash to receive parameters (see below),
- dump_options_type - controls contents of dump_options
- dump_getopt_flags - ref to a hash to receive Getopt flags
- dump_options_category - ref to a hash giving category of options
- dump_abbreviations - ref to a hash giving all abbreviations</code></pre>
+<pre><code> source - the source of the script to be formatted
+ destination - the destination of the formatted output
+ stderr - standard error output
+ perltidyrc - the .perltidyrc file
+ logfile - the .LOG file stream, if any
+ errorfile - the .ERR file stream, if any
+ dump_options - ref to a hash to receive parameters (see below),
+ dump_options_type - controls contents of dump_options
+ dump_getopt_flags - ref to a hash to receive Getopt flags
+ dump_options_category - ref to a hash giving category of options
+ dump_abbreviations - ref to a hash giving all abbreviations</code></pre>
<p>The following chart illustrates the logic used to decide how to treat a parameter.</p>
<p>An exit value of 1 indicates that the process had to be terminated early due to errors in the input parameters. This can happen for example if a parameter is misspelled or given an invalid value. The calling program should check for this flag because if it is set the destination stream will be empty or incomplete and should be ignored. Error messages in the <b>stderr</b> stream will indicate the cause of any problem.</p>
-<p>An exit value of 2 indicates that perltidy ran to completion but there there are warning messages in the <b>stderr</b> stream related to parameter errors or conflicts and/or warning messages in the <b>errorfile</b> stream relating to possible syntax errors in the source code being tidied.</p>
+<p>An exit value of 2 indicates that perltidy ran to completion but there are warning messages in the <b>stderr</b> stream related to parameter errors or conflicts and/or warning messages in the <b>errorfile</b> stream relating to possible syntax errors in the source code being tidied.</p>
<p>In the event of a catastrophic error for which recovery is not possible <b>perltidy</b> terminates by making calls to <b>croak</b> or <b>confess</b> to help the programmer localize the problem. These should normally only occur during program development.</p>
<p>Parameters which control formatting may be passed in several ways: in a <i>.perltidyrc</i> configuration file, in the <b>perltidyrc</b> parameter, and in the <b>argv</b> parameter.</p>
-<p>The <b>-syn</b> (<b>--check-syntax</b>) flag may be used with all source and destination streams except for standard input and output. However data streams which are not associated with a filename will be copied to a temporary file before being passed to Perl. This use of temporary files can cause somewhat confusing output from Perl.</p>
-
<p>If the <b>-pbp</b> style is used it will typically be necessary to also specify a <b>-nst</b> flag. This is necessary to turn off the <b>-st</b> flag contained in the <b>-pbp</b> parameter set which otherwise would direct the output stream to the standard output.</p>
<h1 id="EXAMPLES">EXAMPLES</h1>
source => \$source_string,
destination => \$dest_string,
stderr => \$stderr_string,
- errorfile => \$errorfile_string, # ignored when -se flag is set
- ##phasers => 'stun', # uncomment to trigger an error
+ errorfile => \$errorfile_string, # ignored when -se flag is set
+ ##phasers => 'stun', # uncomment to trigger an error
);
if ($error) {
<h1 id="VERSION">VERSION</h1>
-<p>This man page documents Perl::Tidy version 20230309</p>
+<p>This man page documents Perl::Tidy version 20250105</p>
<h1 id="LICENSE">LICENSE</h1>
--- /dev/null
+# An update to the basic Perl::Tidy continuation indentation model
+
+Perl::Tidy version 20230701 has several changes in the basic method for
+computing "continuation indentation". This has been on the TODO list
+for a long time. The changes mainly apply to some unusual situations,
+and most programs will remain unchanged. This note explains what the changes
+are and why they are needed.
+
+To briefly review, the indentation of a line is the sum of two parts:
+(1) **structural indentation**, and (2) **continuation indentation**.
+
+These are occasionally called primary and secondary indentation.
+
+**Structural indentation** is introduced by opening container tokens
+**{**, **(**, or **[**. Default structural indentation is 4 characters by
+default but can be changed with the **-i=n** parameter. The total structural
+indentation is easily determined by keeping a stack of the opening tokens which
+contain a given line.
+
+**Continuation indentation** is introduced to help show structure in multi-line
+statements, list items, and logical expressions. The first line of such long
+lines usually starts with the basic structural indentation. Subsequent
+lines are given the additional continuation indentation to emphasize that
+they are a continuation of the statement.
+
+The default continuation indentation is 2 characters but this can be changed
+with the **-ci=n** parameter.
+
+Previously, computation of continuation indentation was done in the
+initial pass through a file, and this placed some limits on what it could do.
+This computation has been moved downstream in the processing pipeline, where the
+entire file is accessible with full data structures, and this allows several
+improvements to be made. These mainly involve (1) the continuation
+indentation assigned to comments in unusual circumstances, or (2) the
+indentation of complex ternary expressions, or (3) the indentation of
+chains of ``sort/map/grep`` blocks. Some examples are as follows.
+
+## Block comment indentation changes before closing braces, brackets and parens
+
+The indentation of one-line comments, also called block comments, which
+appear near the end of a containing structure are now independent of the
+existence of any optional trailing comma or semicolon.
+
+To illustrate the issue, consider the following example, in which the last
+statement is not terminated with a semicolon. Previously, the subsequent
+comments would have continuation indentation, since the statement is not
+terminated:
+
+```
+BEGIN {
+
+ $my_hash{'word1'} = 1;
+ $my_hash{'word2'} = 1
+
+ # comment
+ # ...
+}
+```
+
+In the updated version, since the final semicolon is optional, the
+comments do not have the continuation indentation:
+
+```
+BEGIN {
+
+ $my_hash{'word1'} = 1;
+ $my_hash{'word2'} = 1
+
+ # comment
+ # ...
+}
+```
+
+This makes the comments have the same indentation as if there were a
+terminal semicolon. This update keeps large blocks of comments from shifting
+when an optional trailing semicolon or comma is added or removed.
+
+## Closing brace indentation changes
+
+A related issue which has been fixed is illustrated with the following
+example which shows the previous formatting:
+
+```
+ if ( $term->ReadLine eq "Term::ReadLine::Gnu" ) {
+ my $attribs = $term->Attribs;
+ $attribs->{attempted_completion_function} = sub {
+ &CPAN::Complete::gnu_cpl;
+ }
+
+ # comment
+ # comment
+ }
+```
+
+Here again, an optional terminal semicolon is missing after the closing sub
+brace, and there are some comments before the closing ``if`` block brace. The
+previous logic had a limited look-ahead ability, and in this case the
+continuation indentation of the closing sub brace was not removed.
+
+The updated logic fixes this problem:
+
+```
+ if ( $term->ReadLine eq "Term::ReadLine::Gnu" ) {
+ my $attribs = $term->Attribs;
+ $attribs->{attempted_completion_function} = sub {
+ &CPAN::Complete::gnu_cpl;
+ }
+
+ # comment
+ # comment
+ }
+```
+
+## Block comment indentation changes in ternary statements
+
+Another change is that the indentation of block comments within ternary
+statements is improved. These can be difficult to format. For example,
+here is the old default formatting of a complex ternary with lots of comments:
+
+```
+ # a) under an interactive shell?
+ my $rl_avail = ( !$term->isa('CPANPLUS::Shell::_Faked') )
+
+ # b) do we have a tty terminal?
+ ? ( -t STDIN )
+
+ # c) should we enable the term?
+ ? ( !$self->__is_bad_terminal($term) )
+
+ # d) external modules available?
+ ? ( $term->ReadLine ne "Term::ReadLine::Stub" )
+
+ # a+b+c+d => "Smart" terminal
+ ? loc("enabled")
+
+ # a+b+c => "Stub" terminal
+ : loc("available (try 'i Term::ReadLine::Perl')")
+
+ # a+b => "Bad" terminal
+ : loc("disabled")
+
+ # a => "Dumb" terminal
+ : loc("suppressed")
+
+ # none => "Faked" terminal
+ : loc("suppressed in batch mode");
+```
+
+The comment indentation is very poor here. Here is the new formatting:
+
+```
+ # a) under an interactive shell?
+ my $rl_avail = ( !$term->isa('CPANPLUS::Shell::_Faked') )
+
+ # b) do we have a tty terminal?
+ ? ( -t STDIN )
+
+ # c) should we enable the term?
+ ? ( !$self->__is_bad_terminal($term) )
+
+ # d) external modules available?
+ ? ( $term->ReadLine ne "Term::ReadLine::Stub" )
+
+ # a+b+c+d => "Smart" terminal
+ ? loc("enabled")
+
+ # a+b+c => "Stub" terminal
+ : loc("available (try 'i Term::ReadLine::Perl')")
+
+ # a+b => "Bad" terminal
+ : loc("disabled")
+
+ # a => "Dumb" terminal
+ : loc("suppressed")
+
+ # none => "Faked" terminal
+ : loc("suppressed in batch mode");
+```
+
+## Improved indentation for some nested welds.
+
+An issue has been fixed involving cases where the **--weld-nested**, or **-wn**
+parameter was used on comma-separated lists of items at block level (paren-less
+lists). For example, here is the old default formatting with the **-wn**
+parameter.
+
+```
+is_deeply $fixer->fix( {
+ demo => { nl => 'Tuin der lusten', en => 'The Garden of Earthly Delights' }
+} ),
+ {
+ demo => { NL => 'TUIN DER LUSTEN', en => 'The Garden of Earthly Delights' },
+ titles => ['The Garden of Earthly Delights']
+ },
+ 'specific testing';
+```
+
+The closing '} )' is missing some continuation indentation. The new default
+formatting is
+
+```
+is_deeply $fixer->fix( {
+ demo => { nl => 'Tuin der lusten', en => 'The Garden of Earthly Delights' }
+ } ),
+ {
+ demo => { NL => 'TUIN DER LUSTEN', en => 'The Garden of Earthly Delights' },
+ titles => ['The Garden of Earthly Delights']
+ },
+ 'specific testing';
+```
+
+## Problems with excess continuation indentation
+
+A very rare problem has been fixed in which excess indentation could occur.
+This is illustrated in the following example which is run with **-ci=4** to
+emphasize the problem:
+
+```
+ ( $foo, $dayC[$cnt], $foo ) = split /;/,
+ $slist[
+ &UnixDate(
+ &ParseDate(
+ $week_name[ $cnt - 1 ]
+ . " week "
+ . $uweek . " "
+ . $this_year
+ ),
+ "%j"
+ ) - 1
+ ];
+```
+
+The problem is that the lines with leading dots have twice the
+amount of indentation that they should. The new version fixes this:
+
+```
+ ( $foo, $dayC[$cnt], $foo ) = split /;/,
+ $slist[
+ &UnixDate(
+ &ParseDate(
+ $week_name[ $cnt - 1 ]
+ . " week "
+ . $uweek . " "
+ . $this_year
+ ),
+ "%j"
+ ) - 1
+ ];
+```
+
+Here is another example, also run with **-ci=4** for emphasis:
+
+```
+ $a
+ ? $b
+ ? $c
+ ? $d
+ ? $e
+ : $f
+ : $g
+ : $h
+ : print "hello\n";
+```
+
+Note how $e and $f have excess indentation. The updated version is:
+
+```
+ $a
+ ? $b
+ ? $c
+ ? $d
+ ? $e
+ : $f
+ : $g
+ : $h
+ : print "hello\n";
+```
+
+## Some problems with indentation in ternary expressions
+
+The continuation indentation in some complex ternary statements has been
+improved. For example, in the following old formatting the lines beginning
+with ``&&`` lack continuation indentation:
+
+```
+ if (
+ $file eq '-' ? open(PHONES, '<&STDIN')
+ : $file =~ /\.Z$/ ? open(PHONES, "zcat '$file' 2>/dev/null |")
+ : $file =~ /\.pgp$/ ? $usepgp
+ && length($ENV{PGPPASS})
+ && open(PHONES, "pgp -fd <'$file' |")
+ : open(PHONES, "< $file\0")
+ )
+ {
+ }
+```
+
+The updated version adds indentation to these lines to help indicate that
+they are a continuation of the previous line.
+
+```
+ if (
+ $file eq '-' ? open(PHONES, '<&STDIN')
+ : $file =~ /\.Z$/ ? open(PHONES, "zcat '$file' 2>/dev/null |")
+ : $file =~ /\.pgp$/ ? $usepgp
+ && length($ENV{PGPPASS})
+ && open(PHONES, "pgp -fd <'$file' |")
+ : open(PHONES, "< $file\0")
+ )
+ {
+ }
+```
+
+## Some improved indentation of filter block chains
+
+The lines of an isolated chain of ``sort/map/grep`` blocks are normally all
+given the same indentation. For example
+
+```
+ @new_in_dir = (
+ grep { not $seen{$_} }
+ map { $dir . "/" . $_ }
+ grep { not ignore_file($_) }
+ grep { not $skip{$_} } readdir(D)
+ );
+```
+
+Previously, there were a a number of situations where this could not be
+achieved. As an example, if the above example had side comments then the
+formatting would be
+
+```
+ @new_in_dir = (
+ grep { not $seen{$_} } # files not yet processed
+ map { $dir . "/" . $_ } # map from file to dir/file
+ grep { not ignore_file($_) } # ignore files in cvsignore
+ grep { not $skip{$_} } # skip files to be ignored
+ readdir(D)
+ );
+```
+
+The first line now has a different indentation from the rest, and this is
+undesirable because ideally indentation should be independent of the existence
+of side comments. The new version handles this correctly:
+
+```
+ @new_in_dir = (
+ grep { not $seen{$_} } # files not yet processed
+ map { $dir . "/" . $_ } # map from file to dir/file
+ grep { not ignore_file($_) } # ignore files in cvsignore
+ grep { not $skip{$_} } # skip files to be ignored
+ readdir(D)
+ );
+```
+
+A related change is that some undesirable alignments across changes in
+continuation indentation have been removed. For example, here is an
+example of this issue as previously formatted:
+
+```
+ print $fh map { $_->[0] }
+ sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
+ map { my $f = lc $_; $f =~ s/[^a-z0-9\s]//g; [ $_, $f ] } @manifest;
+```
+
+The alignment of the ``map`` and ``sort`` braces produces an undesirable
+gap. The revised formatting avoids this:
+
+```
+ print $fh map { $_->[0] }
+ sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0] }
+ map { my $f = lc $_; $f =~ s/[^a-z0-9\s]//g; [ $_, $f ] } @manifest;
+```
alert for the double encoding problem, and how to reset the default if
necessary. This is also the reason for waiting some time before the second step was made.
-If double encoding does appear to be occuring with the change in the default for some program which calls Perl::Tidy, then a quick emergency fix can be made by the program user by setting **-neos** to revert to the old default. A better fix can eventually be made by the program author by removing the second encoding using a technique such as illustrated above.
+If double encoding does appear to be occurring with the change in the default for some program which calls Perl::Tidy, then a quick emergency fix can be made by the program user by setting **-neos** to revert to the old default. A better fix can eventually be made by the program author by removing the second encoding using a technique such as illustrated above.
## Summary
The source can only be in 'C' mode if it arrives by a call from another Perl
program, and the destination can only be in 'C' mode if the destination is a
Perl program. Otherwise, if the destination is a file, or object with a print
-method, then it will be assumed to be ending its existance as a Perl string and
+method, then it will be assumed to be ending its existence as a Perl string and
will be placed in an end state which is 'B' mode.
Transition from a starting 'B' mode to 'C' mode is done by a decoding operation
Let us make a list of all possible sets of string storage modes to be sure that
all cases are covered. If each of the three stages list above (entry,
-intermedite, and exit) could be in 'B' or 'C' mode then we would have a total
+intermediate, and exit) could be in 'B' or 'C' mode then we would have a total
of 2 x 2 x 2 = 8 combinations of states. Each end point may either be a file
or a string reference. Here is a list of them, with a note indicating which
ones are possible, and when:
<p>Perltidy should run on any system with perl 5.008 or later.
The total disk space needed after removing the installation directory will be
-about 2 Mb.</p>
+about 4 Mb.</p>
<h2>Download</h2>
</code></pre>
<p>The <a href="./INSTALL.html">INSTALL file</a> has additional installation notes. They
-are mainly for older sytems but also tell how to use perltidy without doing an installation.</p>
+are mainly for older systems but also tell how to use perltidy without doing an installation.</p>
<h2>Links</h2>
Perltidy should run on any system with perl 5.008 or later.
The total disk space needed after removing the installation directory will be
-about 2 Mb.
+about 4 Mb.
## Download
make install
The [INSTALL file](./INSTALL.html) has additional installation notes. They
-are mainly for older sytems but also tell how to use perltidy without doing an installation.
+are mainly for older systems but also tell how to use perltidy without doing an installation.
## Links
<li><a href="#Whitespace-Control">Whitespace Control</a></li>
<li><a href="#Comment-Controls">Comment Controls</a></li>
<li><a href="#Skipping-Selected-Sections-of-Code">Skipping Selected Sections of Code</a></li>
+ <li><a href="#Formatting-a-Limited-Range-of-Lines">Formatting a Limited Range of Lines</a></li>
<li><a href="#Line-Break-Control">Line Break Control</a></li>
<li><a href="#Controlling-Breaks-at-Braces-Parens-and-Square-Brackets">Controlling Breaks at Braces, Parens, and Square Brackets</a></li>
<li><a href="#Welding">Welding</a></li>
<li><a href="#Breaking-Before-or-After-Operators">Breaking Before or After Operators</a></li>
<li><a href="#Controlling-List-Formatting">Controlling List Formatting</a></li>
<li><a href="#Adding-and-Deleting-Commas">Adding and Deleting Commas</a></li>
+ <li><a href="#Adding-and-Deleting-Interbracket-Arrows">Adding and Deleting Interbracket Arrows</a></li>
+ <li><a href="#Missing-Else-Blocks">Missing Else Blocks</a></li>
<li><a href="#Retaining-or-Ignoring-Existing-Line-Breaks">Retaining or Ignoring Existing Line Breaks</a></li>
<li><a href="#Blank-Line-Control">Blank Line Control</a></li>
<li><a href="#Styles">Styles</a></li>
<li><a href="#One-Line-Blocks">One-Line Blocks</a></li>
<li><a href="#Controlling-Vertical-Alignment">Controlling Vertical Alignment</a></li>
<li><a href="#Extended-Syntax">Extended Syntax</a></li>
- <li><a href="#Other-Controls">Other Controls</a></li>
+ <li><a href="#Deleting-and-Extracting-Pod-or-Comments">Deleting and Extracting Pod or Comments</a></li>
+ <li><a href="#The-perltidyrc-file">The perltidyrc file</a></li>
+ <li><a href="#Debugging-perltidy-input">Debugging perltidy input</a></li>
+ <li><a href="#Analyzing-Code">Analyzing Code</a></li>
+ <li><a href="#Working-with-MakeMaker-AutoLoader-and-SelfLoader">Working with MakeMaker, AutoLoader and SelfLoader</a></li>
</ul>
</li>
<li><a href="#HTML-OPTIONS">HTML OPTIONS</a></li>
<h1 id="EXAMPLES">EXAMPLES</h1>
+<p>Here are some example perltidy commands followed by their meanings:</p>
+
<pre><code> perltidy somefile.pl</code></pre>
<p>This will produce a file <i>somefile.pl.tdy</i> containing the script reformatted using the default options, which approximate the style suggested in perlstyle(1). The source file <i>somefile.pl</i> is unchanged.</p>
<p>Execute perltidy on file <i>somefile.pl</i> and save a log file <i>somefile.pl.LOG</i> which shows the nesting of braces, parentheses, and square brackets at the start of every line.</p>
+<pre><code> perltidy -dbs -dbl=10 somefile.pl >blocks.csv</code></pre>
+
+<p>This will dump a table of comma-separated metrics for subroutines longer than 10 lines to <i>blocks.csv</i>.</p>
+
+<pre><code> perltidy -duv somefile.pl >vars.txt</code></pre>
+
+<p>This will dump a list of unused and reused lexical variable names to <i>vars.txt</i>.</p>
+
<pre><code> perltidy -html somefile.pl</code></pre>
-<p>This will produce a file <i>somefile.pl.html</i> containing the script with html markup. The output file will contain an embedded style sheet in the <HEAD> section which may be edited to change the appearance.</p>
+<p>This will produce a file <i>somefile.pl.html</i> containing the script with html markup. The output file will contain an embedded style sheet in the <code><HEAD</code>> section which may be edited to change the appearance.</p>
<pre><code> perltidy -html -css=mystyle.css somefile.pl</code></pre>
<pre><code> perltidy -html -pre somefile.pl</code></pre>
-<p>Write an html snippet with only the PRE section to <i>somefile.pl.html</i>. This is useful when code snippets are being formatted for inclusion in a larger web page. No style sheet will be written in this case.</p>
+<p>Write an html snippet with only the <code><PRE</code>> section to <i>somefile.pl.html</i>. This is useful when code snippets are being formatted for inclusion in a larger web page. No style sheet will be written in this case.</p>
<pre><code> perltidy -html -ss >mystyle.css</code></pre>
<p>Show summary of usage and exit.</p>
+</dd>
+<dt id="v---version"><b>-v</b>, <b>--version</b></dt>
+<dd>
+
+<p>Show perltidy VERSION number and exit.</p>
+
</dd>
<dt id="o-filename---outfile-filename"><b>-o</b>=filename, <b>--outfile</b>=filename</dt>
<dd>
<p>If the path contains spaces, it should be placed in quotes.</p>
-<p>This parameter will be ignored if output is being directed to standard output, or if it is being specified explicitly with the <b>-o=s</b> parameter.</p>
+<p>This parameter will be ignored if output is being directed to standard output, or if it is being specified explicitly with the <b>--outfile=s</b> parameter.</p>
</dd>
<dt id="b---backup-and-modify-in-place"><b>-b</b>, <b>--backup-and-modify-in-place</b></dt>
<p>A <b>-b</b> flag will be ignored if input is from standard input or goes to standard output, or if the <b>-html</b> flag is set.</p>
-<p>In particular, if you want to use both the <b>-b</b> flag and the <b>-pbp</b> (--perl-best-practices) flag, then you must put a <b>-nst</b> flag after the <b>-pbp</b> flag because it contains a <b>-st</b> flag as one of its components, which means that output will go to the standard output stream.</p>
+<p>In particular, if you want to use both the <b>-b</b> flag and the <b>--perl-best-practices</b> (<b>-pbp</b>) flag, then you must put a <b>--nostandard-output</b> (<b>-nst</b>) flag after the <b>-pbp</b> flag because it contains a <b>--standard-output</b> flag as one of its components, which means that output will go to the standard output stream.</p>
</dd>
<dt id="bext-ext---backup-file-extension-ext"><b>-bext</b>=ext, <b>--backup-file-extension</b>=ext</dt>
<p>Here are some examples:</p>
-<pre><code> Parameter Extension Backup File Treatment
- <-bext=bak> F<.bak> Keep (same as the default behavior)
- <-bext='/'> F<.bak> Delete if no errors
- <-bext='/backup'> F<.backup> Delete if no errors
- <-bext='original/'> F<.original> Delete if no errors</code></pre>
+<pre><code> Parameter Extension Backup File Treatment
+ -bext=bak .bak Keep (same as default behavior)
+ -bext='/' .bak Delete if no errors
+ -bext='/backup' .backup Delete if no errors
+ -bext='original/' .original Delete if no errors</code></pre>
</dd>
<dt id="bm-s---backup-method-s"><b>-bm=s</b>, <b>--backup-method=s</b></dt>
<dt id="pro-filename-or---profile-filename"><b>-pro=filename</b> or <b>--profile=filename</b></dt>
<dd>
-<p>To simplify testing and switching .perltidyrc files, this command may be used to specify a configuration file which will override the default name of .perltidyrc. There must not be a space on either side of the '=' sign. For example, the line</p>
+<p>To simplify testing and switching .perltidyrc files, this command may be used to specify a configuration file which will override the default name of <i>.perltidyrc</i>. There must not be a space on either side of the '=' sign. For example, the line</p>
<pre><code> perltidy -pro=testcfg</code></pre>
<p>This flag asserts that the input and output code streams are different, or in other words that the input code is 'untidy' according to the formatting parameters. If this is not the case, an error message noting this is produced. This flag has no other effect on the functioning of perltidy.</p>
+</dd>
+<dt id="tos-n---timeout-in-seconds-n"><b>-tos=n</b>, <b>--timeout-in-seconds=n</b></dt>
+<dd>
+
+<p>When the standard input supplies the input stream, and the input has not been received within <b>n</b> seconds, perltidy will end with a timeout message. The intention is to catch a situation where perltidy is accidentally invoked without a file to process and therefore waits for input from the system standard input (stdin), which never arrives. The default is <b>n=10</b> seconds. This check can be turned off with <b>n=0</b>.</p>
+
</dd>
</dl>
<dl>
+<dt id="fmt-s---format-s"><b>-fmt=s</b>, <b>--format=s</b></dt>
+<dd>
+
+<p>If <b>--format=tidy</b> (the default) then perltidy will reformat the input file, and if <b>--format=html</b> then perltidy will produce html output.</p>
+
+<p>For convenience, the abbreviation <b>--tidy</b> is equivalent to <b>--format=tidy</b>, and <b>-html</b> is equivalent to <b>--format=html</b>.</p>
+
+</dd>
<dt id="notidy"><b>--notidy</b></dt>
<dd>
<dt id="l-n---maximum-line-length-n"><b>-l=n</b>, <b>--maximum-line-length=n</b></dt>
<dd>
-<p>The default maximum line length is n=80 characters. Perltidy will try to find line break points to keep lines below this length. However, long quotes and side comments may cause lines to exceed this length.</p>
+<p>The default maximum line length is <b>n=80</b> characters. Perltidy will try to find line break points to keep lines below this length. However, long quotes and side comments may cause lines to exceed this length. And long lines may sometimes be broken at a length less than <b>n</b> characters because some of the line break decisions employ small tolerances to prevent formatting instabilities.</p>
<p>The default length of 80 comes from the past when this was the standard CRT screen width. Many programmers prefer to increase this to something like 120.</p>
<p>For example if one level of indentation equals 4 spaces (<b>-i=4</b>, the default), and one uses <b>-wc=15</b>, then if the leading whitespace on a line exceeds about 4*15=60 spaces it will be reduced back to 4*1=4 spaces and continue increasing from there. If the whitespace never exceeds this limit the formatting remains unchanged.</p>
-<p>The combination of <b>-vmll</b> and <b>-wc=n</b> provides a solution to the problem of displaying arbitrarily deep data structures and code in a finite window, although <b>-wc=n</b> may of course be used without <b>-vmll</b>.</p>
+<p>The combination of <b>--variable-maximum-line-length</b> and <b>--whitespace-cycle=n</b> provides a solution to the problem of displaying arbitrarily deep data structures and code in a finite window, although <b>--whitespace-cycle=n</b> may of course be used without <b>--variable-maximum-line-length</b>.</p>
<p>The default is not to use this, which can also be indicated using <b>-wc=0</b>.</p>
<p>Using tab characters will almost certainly lead to future portability and maintenance problems, so the default and recommendation is not to use them. For those who prefer tabs, however, there are two different options.</p>
-<p>Except for possibly introducing tab indentation characters, as outlined below, perltidy does not introduce any tab characters into your file, and it removes any tabs from the code (unless requested not to do so with <b>-fws</b>). If you have any tabs in your comments, quotes, or here-documents, they will remain.</p>
+<p>Except for possibly introducing tab indentation characters, as outlined below, perltidy does not introduce any tab characters into your file, and it removes any tabs from the code (unless requested not to do so with <b>--freeze-whitespace</b>). If you have any tabs in your comments, quotes, or here-documents, they will remain.</p>
<dl>
<p>The value of the integer <b>n</b> can be any value but can be coordinated with the number of spaces used for indentation. For example, <b>-et=4 -ci=4 -i=4</b> will produce one tab for each indentation level and and one for each continuation indentation level. You may want to coordinate the value of <b>n</b> with what your display software assumes for the spacing of a tab.</p>
+<p>The default is not to use this, which can also be indicated using <b>-et=0</b>.</p>
+
</dd>
<dt id="t---tabs"><b>-t</b>, <b>--tabs</b></dt>
<dd>
-<p>This flag causes one leading tab character to be inserted for each level of indentation. Certain other features are incompatible with this option, and if these options are also given, then a warning message will be issued and this flag will be unset. One example is the <b>-lp</b> option. This flag is retained for backwards compatibility, but if you use tabs, the <b>-et=n</b> flag is recommended. If both <b>-t</b> and <b>-et=n</b> are set, the <b>-et=n</b> is used.</p>
+<p>This flag causes one leading tab character to be inserted for each level of indentation. Certain other features are incompatible with this option, and if these options are also given, then a warning message will be issued and this flag will be unset. One example is the <b>--line-up-parentheses</b> option. This flag is retained for backwards compatibility, but if you use tabs, the <b>--entab-leading-whitespace=n</b> flag is recommended. If both <b>--tabs</b> and <b>--entab-leading-whitespace=n</b> are set, then <b>--entab-leading-whitespace=n</b> is used.</p>
</dd>
<dt id="dt-n---default-tabsize-n"><b>-dt=n</b>, <b>--default-tabsize=n</b></dt>
<p>If you also want to keep your existing blank lines exactly as they are, you can add <b>--freeze-blank-lines</b>.</p>
-<p>With this option perltidy is still free to modify the indenting (and outdenting) of code and comments as it normally would. If you also want to prevent long comment lines from being outdented, you can add either <b>-noll</b> or <b>-l=0</b>.</p>
+<p>With this option perltidy is still free to modify the indenting (and outdenting) of code and comments as it normally would. If you also want to prevent long comment lines from being outdented, you can add either <b>--nooutdent-long-lines</b> (<b>-noll</b>) or <b>--maximum-line-length=0</b> (<b>l=0</b>).</p>
<p>Setting this flag will prevent perltidy from doing any special operations on closing side comments. You may still delete all side comments however when this flag is in effect.</p>
<dt id="enc-s---character-encoding-s"><b>-enc=s</b>, <b>--character-encoding=s</b></dt>
<dd>
-<p>This flag indicates if the input data stream use a character encoding. Perltidy does not look for the encoding directives in the source stream, such as <b>use utf8</b>, and instead relies on this flag to determine the encoding. (Note that perltidy often works on snippets of code rather than complete files so it cannot rely on <b>use utf8</b> directives).</p>
+<p>This flag indicates if the input data stream uses a character encoding. Perltidy does not look for the encoding directives in the source stream, such as <b>use utf8</b>, and instead relies on this flag to determine the encoding. (This is because perltidy often works on snippets of code rather than complete files, so it cannot rely on <b>use utf8</b> directives). Consequently perltidy is likely to encounter problems formatting a file which is only partially encoded.</p>
<p>The possible values for <b>s</b> are:</p>
<dt id="it-n---iterations-n"><b>-it=n</b>, <b>--iterations=n</b></dt>
<dd>
-<p>This flag causes perltidy to do <b>n</b> complete iterations. The reason for this flag is that code beautification is an iterative process and in some cases the output from perltidy can be different if it is applied a second time. For most purposes the default of <b>n=1</b> should be satisfactory. However <b>n=2</b> can be useful when a major style change is being made, or when code is being beautified on check-in to a source code control system. It has been found to be extremely rare for the output to change after 2 iterations. If a value <b>n</b> is greater than 2 is input then a convergence test will be used to stop the iterations as soon as possible, almost always after 2 iterations. See the next item for a simplified iteration control.</p>
+<p>This flag causes perltidy to do <b>n</b> complete iterations. The reason for this flag is that code formatting is an iterative process and in some cases the output from perltidy can be different if it is applied a second time. For most purposes the default of <b>n=1</b> should be satisfactory. However <b>n=2</b> can be useful when a major style change is being made, or when code is being beautified on check-in to a source code control system. It has been found to be extremely rare for the output to change after 2 iterations. If a value <b>n</b> is greater than 2 is input then a convergence test will be used to stop the iterations as soon as possible, almost always after 2 iterations. See the next item for a simplified iteration control.</p>
<p>This flag has no effect when perltidy is used to generate html.</p>
<dt id="conv---converge"><b>-conv</b>, <b>--converge</b></dt>
<dd>
-<p>This flag is equivalent to <b>-it=4</b> and is included to simplify iteration control. For all practical purposes one either does or does not want to be sure that the output is converged, and there is no penalty to using a large iteration limit since perltidy will check for convergence and stop iterating as soon as possible. The default is <b>-nconv</b> (no convergence check). Using <b>-conv</b> will approximately double run time since typically one extra iteration is required to verify convergence. No extra iterations are required if no new line breaks are made, and two extra iterations are occasionally needed when reformatting complex code structures, such as deeply nested ternary statements.</p>
+<p>This flag is equivalent to <b>-it=4</b> and is included to simplify iteration control. Perltidy will check for convergence and stop iterating as soon as possible. The default is <b>-nconv</b> (no convergence check). Using <b>-conv</b> will approximately double run time since typically one extra iteration is required to verify convergence. No extra iterations are required if no new line breaks are made, and two extra iterations are occasionally needed when reformatting complex code structures, such as deeply nested ternary statements.</p>
</dd>
</dl>
<pre><code> my $level = # -ci=0
( $max_index_to_go >= 0 ) ? $levels_to_go[0] : $last_output_level;</code></pre>
-<p>The value given to <b>-ci</b> is also used by some commands when a small space is required. Examples are commands for outdenting labels, <b>-ola</b>, and control keywords, <b>-okw</b>.</p>
+<p>The value given to <b>-ci</b> is also used by some commands when a small space is required. Examples are commands for outdenting labels, <b>--outdent-labels</b> (<b>-ola</b>), and control keywords, <b>--outdent-keywords</b> (<b>-okw</b>).</p>
<p>When default values are not used, it is recommended that either</p>
<p>(1) the value <b>n</b> given with <b>-ci=n</b> be no more than about one-half of the number of spaces assigned to a full indentation level on the <b>-i=n</b> command, or</p>
-<p>(2) the flag <b>-extended-continuation-indentation</b> is used (see next section).</p>
+<p>(2) the flag <b>--extended-continuation-indentation</b> is used (see next section).</p>
</dd>
<dt id="xci---extended-continuation-indentation"><b>-xci</b>, <b>--extended-continuation-indentation</b></dt>
<p>This flag allows perltidy to use some improvements which have been made to its indentation model. One of the things it does is "extend" continuation indentation deeper into structures, hence the name. The improved indentation is particularly noticeable when the flags <b>-ci=n</b> and <b>-i=n</b> use the same value of <b>n</b>. There are no significant disadvantages to using this flag, but to avoid disturbing existing formatting the default is not to use it, <b>-nxci</b>.</p>
-<p>Please see the section <a href="#pbp---perl-best-practices">"<b>-pbp</b>, <b>--perl-best-practices</b>"</a> for an example of how this flag can improve the formatting of ternary statements. It can also improve indentation of some multi-line qw lists as shown below.</p>
+<p>Please see the section <a href="#pbp---perl-best-practices">"<b>-pbp</b>, <b>--perl-best-practices</b>"</a> for an example of how this flag can improve the formatting of ternary statements. It can also improve indentation of some multiline qw lists as shown below.</p>
+
+<pre><code> # perltidy
+ foreach $color (
+ qw(
+ AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
+ SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
+ ),
+ qw(
+ LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
+ SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
+ )
+ )
-<pre><code> # perltidy
- foreach $color (
- qw(
+ # perltidy -xci
+ foreach $color (
+ qw(
AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
- ),
- qw(
+ ),
+ qw(
LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
- )
- )
+ )
+ )</code></pre>
- # perltidy -xci
- foreach $color (
- qw(
- AntiqueWhite3 Bisque1 Bisque2 Bisque3 Bisque4
- SlateBlue3 RoyalBlue1 SteelBlue2 DeepSkyBlue3
- ),
- qw(
- LightBlue1 DarkSlateGray1 Aquamarine2 DarkSeaGreen2
- SeaGreen1 Yellow1 IndianRed1 IndianRed2 Tan1 Tan4
- )
- )</code></pre>
+</dd>
+<dt id="mci---minimize-continuation-indentation"><b>-mci</b>, <b>--minimize-continuation-indentation</b></dt>
+<dd>
+
+<p>This flag allows perltidy to remove continuation indentation in some special cases where it is not really unnecessary. For a simple example, the default formatting for the following snippet is:</p>
+
+<pre><code> # perltidy -nmci
+ $self->blurt( "Error: No INPUT for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );</code></pre>
+
+<p>The second and third lines are one level deep in a container, and are also statement continuations, so they get indented by the sum of the full indentation <b>-i</b> value and the continuation indentation <b>-ci</b> value. If this flag is set, the indentation is reduced by <b>-ci</b> spaces, giving</p>
+
+<pre><code> # perltidy -mci
+ $self->blurt( "Error: No INPUT for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );</code></pre>
+
+<p>This flag is off by default.</p>
</dd>
<dt id="sil-n---starting-indentation-level-n"><b>-sil=n</b> <b>--starting-indentation-level=n</b></dt>
<p>By default, perltidy examines the input file and tries to determine the starting indentation level. While it is often zero, it may not be zero for a code snippet being sent from an editing session.</p>
-<p>To guess the starting indentation level perltidy simply assumes that indentation scheme used to create the code snippet is the same as is being used for the current perltidy process. This is the only sensible guess that can be made. It should be correct if this is true, but otherwise it probably won't. For example, if the input script was written with -i=2 and the current perltidy flags have -i=4, the wrong initial indentation will be guessed for a code snippet which has non-zero initial indentation. Likewise, if an entabbing scheme is used in the input script and not in the current process then the guessed indentation will be wrong.</p>
+<p>To guess the starting indentation level perltidy simply assumes that indentation scheme used to create the code snippet is the same as is being used for the current perltidy process. This is the only sensible guess that can be made. It should be correct if this is true, but otherwise it probably won't. For example, if the input script was written with <b>-i=2</b> and the current perltidy flags have <b>-i=4</b>, the wrong initial indentation will be guessed for a code snippet which has non-zero initial indentation. Likewise, if an entabbing scheme is used in the input script and not in the current process then the guessed indentation will be wrong.</p>
<p>If the default method does not work correctly, or you want to change the starting level, use <b>-sil=n</b>, to force the starting level to be n.</p>
<ul>
-<li><p>A limitation on <b>-lp</b>, but not <b>-xlp</b>, occurs in situations where perltidy does not have complete freedom to choose line breaks. Then it may temporarily revert to its default indentation method. This can occur for example if there are blank lines, block comments, multi-line quotes, or side comments between the opening and closing parens, braces, or brackets. It will also occur if a multi-line anonymous sub occurs within a container since that will impose specific line breaks (such as line breaks after statements).</p>
+<li><p>A limitation on <b>-lp</b>, but not <b>-xlp</b>, occurs in situations where perltidy does not have complete freedom to choose line breaks. Then it may temporarily revert to its default indentation method. This can occur for example if there are blank lines, block comments, multiline quotes, or side comments between the opening and closing parens, braces, or brackets. It will also occur if a multiline anonymous sub occurs within a container since that will impose specific line breaks (such as line breaks after statements).</p>
</li>
-<li><p>For both the <b>-lp</b> and <b>-xlp</b> flags, any parameter which significantly restricts the ability of perltidy to choose newlines will conflict with these flags and will cause them to be deactivated. These include <b>-io</b>, <b>-fnl</b>, <b>-nanl</b>, and <b>-ndnl</b>.</p>
+<li><p>For both the <b>-lp</b> and <b>-xlp</b> flags, any parameter which significantly restricts the ability of perltidy to choose newlines will conflict with these flags and will cause them to be deactivated. These include <b>--indent-only</b>, <b>--freeze-newlines</b>, <b>--noadd-newlines</b>, and <b>--nodelete-old-newlines</b>.</p>
</li>
<li><p>The <b>-lp</b> and <b>-xlp</b> options may not be used together with the <b>-t</b> tabs option. They may, however, be used with the <b>-et=n</b> tab method</p>
<p>The following discussion is written for <b>-lp</b> but applies equally to the newer <b>-xlp</b> version. By default, the <b>-lp</b> flag applies to as many containers as possible. The set of containers to which the <b>-lp</b> style applies can be reduced by either one of these two flags:</p>
-<p>Use <b>-lpil=s</b> to specify the containers to which <b>-lp</b> applies, or</p>
+<p>Use <b>--line-up-parentheses-inclusion-list=s</b> (<b>-lpil=s</b>) to specify the containers to which <b>-lp</b> applies, or</p>
-<p>use <b>-lpxl=s</b> to specify the containers to which <b>-lp</b> does NOT apply.</p>
+<p>use <b>--line-up-parentheses-exclusion-list=s</b> (<b>-lpxl=s</b>) to specify the containers to which <b>-lp</b> does NOT apply.</p>
<p>Only one of these two flags may be used. Both flags can achieve the same result, but the <b>-lpil=s</b> flag is much easier to describe and use and is recommended. The <b>-lpxl=s</b> flag was the original implementation and is only retained for backwards compatibility.</p>
<p>A second optional item of information which can be given for parentheses is an alphanumeric letter which is used to limit the selection further depending on the type of token immediately before the paren. The possible letters are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings for matching whatever precedes an opening paren:</p>
-<pre><code> 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+<pre><code> 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.</code></pre>
<p>An optional numeric code may follow any of the container types to further refine the selection based on container contents. The numeric codes are:</p>
-<pre><code> '0' or blank: no check on contents is made
- '1' exclude B<-lp> unless the contents is a simple list without sublists
- '2' exclude B<-lp> unless the contents is a simple list without sublists, without
- code blocks, and without ternary operators</code></pre>
+<pre><code> '0' or blank: no restriction is placed on container contents
+ '1' the container contents must be a simple list without sublists
+ '2' the container contents must be a simple list without sublists,
+ without code blocks, and without ternary operators</code></pre>
<p>For example,</p>
<pre><code> -lpil = 'f(2'</code></pre>
-<p>means only apply -lp to function call lists which do not contain any sublists, code blocks or ternary expressions.</p>
+<p>means only apply -lp to function calls with simple lists (not containing any sublists, code blocks or ternary expressions).</p>
</dd>
<dt id="cti-n---closing-token-indentation"><b>-cti=n</b>, <b>--closing-token-indentation</b></dt>
); or ]; or };
-cti = 3 one extra indentation level always</code></pre>
-<p>The flags <b>-cti=1</b> and <b>-cti=2</b> work well with the <b>-lp</b> flag (previous section).</p>
+<p>The flags <b>-cti=1</b> and <b>-cti=2</b> work well with the <b>--line-up-parentheses</b> (<b>-lp</b>) flag (previous section).</p>
<pre><code> # perltidy -lp -cti=1
@month_of_year = (
<p>The default is not to do this, indicated by <b>-nicb</b>.</p>
+</dd>
+<dt id="ils---indent-leading-semicolon"><b>-ils</b>, <b>--indent-leading-semicolon</b></dt>
+<dd>
+
+<p>A line which begins with a leading semicolon will, by default, have the extra number of indentation spaces defined by <b>--continuation-indentation=n</b>. This extra indentation can be removed by setting <b>-nils</b>.</p>
+
+<pre><code> # default
+ $z = sqrt( $x**2 + $y**2 )
+
+ ; # <-- indented by ci spaces
+
+ # -nils
+ $z = sqrt( $x**2 + $y**2 )
+
+ ; # <-- not indented by ci spaces</code></pre>
+
+<p>Note that leading semicolons do not normally occur unless requested with <b>--break-at-old-semicolon-breakpoints</b> or forced, for example by a blank line as in this example.</p>
+
</dd>
<dt id="nib---non-indenting-braces"><b>-nib</b>, <b>--non-indenting-braces</b></dt>
<dd>
<p>Normally, lines of code contained within a pair of block braces receive one additional level of indentation. This flag, which is enabled by default, causes perltidy to look for opening block braces which are followed by a special side comment. This special side comment is <b>#<<<</b> by default. If found, the code between this opening brace and its corresponding closing brace will not be given the normal extra indentation level. For example:</p>
-<pre><code> { #<<< a closure to contain lexical vars
+<pre><code> { #<<< a closure to contain lexical vars
- my $var; # this line does not get one level of indentation
- ...
+ my $var; # this line does not get one level of indentation
+ ...
- }
+ }
- # this line does not 'see' $var;</code></pre>
+ # this line does not 'see' $var;</code></pre>
<p>This can be useful, for example, when combining code from different files. Different sections of code can be placed within braces to keep their lexical variables from being visible to the end of the file. To keep the new braces from causing all of their contained code to be indented if you run perltidy, and possibly introducing new line breaks in long lines, you can mark the opening braces with this special side comment.</p>
<dt id="nibp-s---non-indenting-brace-prefix-s"><b>-nibp=s</b>, <b>--non-indenting-brace-prefix=s</b></dt>
<dd>
-<p>The <b>-nibp=string</b> parameter may be used to change the marker for non-indenting braces. The default is equivalent to -nibp='#<<<'. The string that you enter must begin with a # and should be in quotes as necessary to get past the command shell of your system. This string is the leading text of a regex pattern that is constructed by appending pre-pending a '^' and appending a'\s', so you must also include backslashes for characters to be taken literally rather than as patterns.</p>
+<p>The <b>-nibp=string</b> parameter may be used to change the marker for non-indenting braces. The default is equivalent to -nibp='#<<<'. The string that you enter must begin with a # and should be in quotes as necessary to get past the command shell of your system. This string is the leading text of a regex pattern that is constructed by prepending a '^' and appending a'\s', so you must also include backslashes for characters to be taken literally rather than as patterns.</p>
<p>For example, to match the side comment '#++', the parameter would be</p>
<dt id="olq---outdent-long-quotes"><b>-olq</b>, <b>--outdent-long-quotes</b></dt>
<dd>
-<p>When <b>-olq</b> is set, lines which is a quoted string longer than the value <b>maximum-line-length</b> will have their indentation removed to make them more readable. This is the default. To prevent such out-denting, use <b>-nolq</b> or <b>--nooutdent-long-lines</b>.</p>
+<p>When <b>-olq</b> is set, lines which is a quoted string longer than the value <b>maximum-line-length</b> will have their indentation removed to make them more readable. This is the default. To prevent such out-denting, use <b>-nolq</b>.</p>
</dd>
<dt id="oll---outdent-long-lines"><b>-oll</b>, <b>--outdent-long-lines</b></dt>
<dd>
-<p>This command is equivalent to <b>--outdent-long-quotes</b> and <b>--outdent-long-comments</b>, and it is included for compatibility with previous versions of perltidy. The negation of this also works, <b>-noll</b> or <b>--nooutdent-long-lines</b>, and is equivalent to setting <b>-nolq</b> and <b>-nolc</b>.</p>
+<p>This command is equivalent to the combination <b>--outdent-long-quotes</b> and <b>--outdent-long-comments</b>, and it is included for compatibility with previous versions of perltidy. The negation of this also works, <b>-noll</b> or <b>--nooutdent-long-lines</b>, and is equivalent to setting <b>-nolq</b> and <b>-nolc</b>.</p>
</dd>
<dt id="Outdenting-Labels:--ola---outdent-labels"><b>Outdenting Labels:</b> <b>-ola</b>, <b>--outdent-labels</b></dt>
<dd>
-<p>This command will cause labels to be outdented by 2 spaces (or whatever <b>-ci</b> has been set to), if possible. This is the default. For example:</p>
+<p>This command will cause labels to be outdented by the number of spaces defined by <b>--continuation-indentation=n</b>, if possible. This is the default. For example:</p>
<pre><code> my $i;
LOOP: while ( $i = <FOTOS> ) {
fixit($i);
}</code></pre>
-<p>Use <b>-nola</b> to not outdent labels. To control line breaks after labels see <a href="#bal-n---break-after-labels-n">"-bal=n, --break-after-labels=n"</a>.</p>
+<p>Use <b>-nola</b> to prevent this. To control line breaks after labels see <a href="#bal-n---break-after-labels-n">"-bal=n, --break-after-labels=n"</a>.</p>
</dd>
<dt id="Outdenting-Keywords"><b>Outdenting Keywords</b></dt>
<dt id="okw---outdent-keywords"><b>-okw</b>, <b>--outdent-keywords</b></dt>
<dd>
-<p>The command <b>-okw</b> will cause certain leading control keywords to be outdented by 2 spaces (or whatever <b>-ci</b> has been set to), if possible. By default, these keywords are <code>redo</code>, <code>next</code>, <code>last</code>, <code>goto</code>, and <code>return</code>. The intention is to make these control keywords easier to see. To change this list of keywords being outdented, see the next section.</p>
+<p>The command <b>-okw</b> will cause certain leading control keywords to be outdented by the number of spaces defined by <b>--continuation-indentation=n</b>spaces, if possible. By default, these keywords are <code>redo</code>, <code>next</code>, <code>last</code>, <code>goto</code>, and <code>return</code>. The intention is to make these control keywords easier to see. To change this list of keywords being outdented, see the next section.</p>
<p>For example, using <code>perltidy -okw</code> on the previous example gives:</p>
fixit($i);
}</code></pre>
-<p>The default is not to do this.</p>
+<p>Notice that the keyword <b>next</b> has been outdented. The default is not to do this.</p>
</dd>
<dt id="Specifying-Outdented-Keywords:--okwl-string---outdent-keyword-list-string"><b>Specifying Outdented Keywords:</b> <b>-okwl=string</b>, <b>--outdent-keyword-list=string</b></dt>
<p>This command can be used to change the keywords which are outdented with the <b>-okw</b> command. The parameter <b>string</b> is a required list of perl keywords, which should be placed in quotes if there are more than one. By itself, it does not cause any outdenting to occur, so the <b>-okw</b> command is still required.</p>
-<p>For example, the commands <code>-okwl="next last redo goto" -okw</code> will cause those four keywords to be outdented. It is probably simplest to place any <b>-okwl</b> command in a <i>.perltidyrc</i> file.</p>
+<p>For example, the commands <code>-okwl="next last redo goto" -okw</code> will cause those four keywords to be outdented.</p>
</dd>
</dl>
$obj->{ $parsed_sql->{'table'}[0] }; # -bt=1 (default)
$obj->{$parsed_sql->{'table'}[0]}; # -bt=2</code></pre>
-<p>And finally, curly braces which contain blocks of code are controlled by the parameter <b>-bbt=n</b> or <b>--block-brace-tightness=n</b> as illustrated in the example below.</p>
+<p>And finally, curly braces which contain blocks of code are controlled by the parameter <b>-bbt=n</b> or <b>--block-brace-tightness=n</b> as illustrated in the example below (<b>-bbt=0</b> is the default).</p>
-<pre><code> %bf = map { $_ => -M $_ } grep { /\.deb$/ } dirents '.'; # -bbt=0 (default)
+<pre><code> %bf = map { $_ => -M $_ } grep { /\.deb$/ } dirents '.'; # -bbt=0
%bf = map { $_ => -M $_ } grep {/\.deb$/} dirents '.'; # -bbt=1
%bf = map {$_ => -M $_} grep {/\.deb$/} dirents '.'; # -bbt=2</code></pre>
-<p>To simplify input in the case that all of the tightness flags have the same value <n>, the parameter <-act=n> or <b>--all-containers-tightness=n</b> is an abbreviation for the combination <-pt=n -sbt=n -bt=n -bbt=n>.</p>
+<p>To simplify input in the case that all of the tightness flags have the same value <b>n</b>, the parameter <b>-act=n</b> or <b>--all-containers-tightness=n</b> is an abbreviation for the combination <b>-pt=n -sbt=n -bt=n -bbt=n</b>.</p>
+
+</dd>
+<dt id="mutt-s---multiple-token-tightness-s"><b>-mutt=s</b>, <b>--multiple-token-tightness=s</b></dt>
+<dd>
+
+<p>To review, the tightness controls described in the previous section have three possible integer values: 0, 1, and 2, where <b>n=0</b> always adds a space, and <b>n=2</b> never adds a space.</p>
+
+<p>The default value <b>n=1</b> adds space only if the container holds multiple tokens. Some perltidy tokens may be rather long, and it can be preferable to space some of them as if they were multple tokens. This can be done with this paramter.</p>
+
+<p>For example, in the following expression the <code>qw</code> list is a single token and therefore there the default formatting does not put spaces within the square brackets:</p>
+
+<pre><code> my $rlist = [qw(alpha beta gamma)];</code></pre>
+
+<p>This can be changed with</p>
+
+<pre><code> # perltidy -mutt='qw'
+ my $rlist = [ qw(alpha beta gamma) ];</code></pre>
+
+<p>This tells perltidy to space a <code>qw</code>list as if it were multiple tokens when the default tightness is used.</p>
+
+<p>The parameter <b>s</b> may contain any of the following perl operators:</p>
+
+<pre><code> qw q qq qx qr s y tr m</code></pre>
+
+<p>Other possible specifications are</p>
+
+<pre><code> q* - all of the above operators
+ Q - all of the above plus any quoted text
+ h - a here-doc target, such as '<<EOT'
+ <<>> - the double-diamond operator</code></pre>
+
+<p>A symbol may be negated by preceding it with a carat <b>^</b> symbol. The double-diamond operator is always included unless negated in this way.</p>
+
+</dd>
+<dt id="xbt---extended-block-tightness"><b>-xbt</b>, <b>--extended-block-tightness</b></dt>
+<dd>
+
+<p>There are two controls for spacing within curly braces, namely <b>--block-brace-tightness=n</b> for code block braces and <b>--brace-tightness=n</b> for all other braces.</p>
+
+<p>There is a little fuzziness in this division of brace types though because the curly braces considered by perltidy to contain code blocks for formatting purposes, such as highlighting code structure, exclude some of the small code blocks used by Perl mainly for isolating terms. These include curly braces following a keyword where an indirect object might occur, or curly braces following a type symbol. For example, perltidy does not mark the following braces as code block braces:</p>
+
+<pre><code> print {*STDERR} $message;
+ return ${$foo};</code></pre>
+
+<p>Consequently, the spacing within these small braced containers by default follows the flag <b>--brace-tightness=n</b> rather than <b>--block-brace-tightness=n</b>, as one might expect.</p>
+
+<p>If desired, small blocks such as these can be made to instead follow the spacing defined by the <b>--block-brace-tightness=n</b> flag by setting <b>--extended-block-tightness</b>. The specific types of small blocks to which this parameter applies is controlled by a companion control parameter, described in the next section.</p>
+
+<p>Note that if the two flags <b>--block-brace-tightness=n</b> and <b>--brace-tightness=n</b> have the same value <b>n</b> then there would be no reason to set this flag.</p>
+
+</dd>
+<dt id="xbtl-s---extended-block-tightness-list-s"><b>-xbtl=s</b>, <b>--extended-block-tightness-list=s</b></dt>
+<dd>
+
+<p>The previous parameter <b>--extended-block-tightness</b> (<b>-xbt</b>) can be made to apply to curly braces preceded by any of the keywords</p>
+
+<pre><code> print printf exec system say</code></pre>
+
+<p>and/or the special symbols</p>
+
+<pre><code> $ @ % & * $#</code></pre>
+
+<p>The parameter string <b>s</b> may contain a selection of these keywords and symbols to indicate the brace types to which <b>--extended-block-tightness</b> applies. For convenience, all of the keywords can be selected with 'k', and all of the special symbols can be selected with 't'. The default is equivalent to <b>-xbtl='k'</b>, which selects all of the keywords.</p>
+
+<p>Examples:</p>
+
+<pre><code> -xbtl='k' # selects just the keywords [DEFAULT]
+ -xbtl="t" # selects just the special type symbols
+ -xbtl="k t" # selects all keywords and symbols, or simply
+ -xbtl="kt" # selects all keywords and symbols
+ -xbtl="print say" # selects just keywords B<print> and B<say>:</code></pre>
+
+<p>Here are some formatting examples using the default values of <b>--brace-tightness=n</b> and <b>--block-brace-tightness=n</b>. Note that in these examples <b>$ref</b> is in block braces but <b>$key</b> is not.</p>
+
+<pre><code> # default formatting
+ print {*STDERR} $message;
+ my $val = ${$ref}{$key};
+
+ # perltidy -xbt or
+ # perltidy -xbt -xbtl=k
+ print { *STDERR } $message;
+ my $val = ${$ref}{$key};
+
+ # perltidy -xbt -xbtl=t
+ print {*STDERR} $message;
+ my $val = ${ $ref }{$key};
+
+ # perltidy -xbt -xbtl=kt
+ print { *STDERR } $message;
+ my $val = ${ $ref }{$key};</code></pre>
+
+<p>Finally, note that this parameter merely changes the way that the parameter <b>--extended-block-tightness</b> works. It has no effect unless <b>--extended-block-tightness</b> is actually set.</p>
</dd>
<dt id="tso---tight-secret-operators"><b>-tso</b>, <b>--tight-secret-operators</b></dt>
<dt id="aws---add-whitespace"><b>-aws</b>, <b>--add-whitespace</b></dt>
<dd>
-<p>Setting this option allows perltidy to add certain whitespace to improve code readability. This is the default. If you do not want any whitespace added, but are willing to have some whitespace deleted, use <b>-naws</b>. (Use <b>-fws</b> to leave whitespace completely unchanged).</p>
+<p>Setting this option allows perltidy to add certain whitespace to improve code readability. This is the default. If you do not want any whitespace added, but are willing to have some whitespace deleted, use <b>-naws</b>. (Use <b>--freeze-whitespace</b> (<b>-fws</b>) to leave whitespace completely unchanged).</p>
</dd>
<dt id="dws---delete-old-whitespace"><b>-dws</b>, <b>--delete-old-whitespace</b></dt>
<dd>
-<p>Setting this option allows perltidy to remove some old whitespace between characters, if necessary. This is the default. If you do not want any old whitespace removed, use <b>-ndws</b> or <b>--nodelete-old-whitespace</b>.</p>
+<p>Setting this option allows perltidy to remove optional whitespace between characters in the input file. The default is to not to do this (<b>-nodelete-old-whitespace</b>). This parameter has little effect by itself. But in combination with <b>--noadd-whitespace</b> it will cause most of the whitespace in a file to be removed.</p>
</dd>
<dt id="Detailed-whitespace-controls-around-tokens"><b>Detailed whitespace controls around tokens</b></dt>
<p>These parameters should be considered to be hints to perltidy rather than fixed rules, because perltidy must try to resolve conflicts that arise between them and all of the other rules that it uses. One conflict that can arise is if, between two tokens, the left token wants a space and the right one doesn't. In this case, the token not wanting a space takes priority.</p>
-<p>It is necessary to have a list of all token types in order to create this type of input. Such a list can be obtained by the command <b>--dump-token-types</b>. Also try the <b>-D</b> flag on a short snippet of code and look at the .DEBUG file to see the tokenization.</p>
+<p>It is necessary to have a list of all token types in order to create this type of input. Such a list can be obtained by the command <b>--dump-token-types</b>. Also try the <b>-D</b> flag on a short snippet of code and look at the <i>.DEBUG</i> file to see the tokenization.</p>
+
+<p>To illustrate, suppose we do not want a space after a colon which introduces a sub attribute. We need to know its type. It is not a colon because that is the type of a ternary operator. The output of <b>--dump-token-types</b> states that it is token type 'A'. To verify this, we can run <code>perltidy -D</code> on a short piece of code containing such a colon, such as</p>
+
+<pre><code> sub foo : lvalue;</code></pre>
+
+<p>This produces a <i>.DEBUG</i> file which contains</p>
+
+<pre><code> 1: sub foo : lvalue;
+ 1: SSSSSSSbAbwwwwww;</code></pre>
+
+<p>The top line is the input, and the bottom line shows the token types. The 'A' beneath the colon verifies that it is type <b>A</b>.</p>
+
+<p>So to prevent a space on the right of this colon we can use</p>
+
+<pre><code> # perltidy -nwrs='A'
+ sub foo :lvalue;</code></pre>
<p><b>WARNING</b> Be sure to put these tokens in quotes to avoid having them misinterpreted by your command shell.</p>
<p>If formatted in this way, the program will not run (at least with recent versions of perl) because the $x is taken to be a filehandle and / is assumed to start a quote. In a complex program, there might happen to be a / which terminates the multiline quote without a syntax error, allowing the program to run, but not as intended.</p>
-<p>Related issues arise with other binary operator symbols, such as + and -, and in older versions of perl there could be problems with ternary operators. So to avoid changing program behavior, perltidy has the simple rule that whitespace around possible filehandles is left unchanged. Likewise, whitespace around barewords is left unchanged. The reason is that if the barewords are defined in other modules, or in code that has not even been written yet, perltidy will not have seen their prototypes and must treat them cautiously.</p>
+<p>Related issues arise with other binary operator symbols, such as + and -, and in older versions of perl there could be problems with ternary operators. So to avoid changing program behavior, perltidy has the simple rule that whitespace around possible filehandles is left unchanged. Likewise, whitespace around unknown barewords is left unchanged. The reason is that if the barewords are defined in other modules, or in code that has not even been written yet, perltidy will not have seen their prototypes and must treat them cautiously.</p>
<p>In perltidy this is implemented in the tokenizer by marking token following a <b>print</b> keyword as a special type <b>Z</b>. When formatting is being done, whitespace following this token type is generally left unchanged as a precaution against changing program behavior. This is excessively conservative but simple and easy to implement. Keywords which are treated similarly to <b>print</b> include <b>printf</b>, <b>sort</b>, <b>exec</b>, <b>system</b>. Changes in spacing around parameters following these keywords may have to be made manually. For example, the space, or lack of space, after the parameter $foo in the following line will be unchanged in formatting.</p>
<p>For another example, the following two lines will be parsed without syntax error:</p>
<pre><code> # original programming, syntax ok
- for my $severity ( reverse $SEVERITY_LOWEST+1 .. $SEVERITY_HIGHEST ) { ... }
+ for my $severity ( reverse $LOWEST+1 .. $HIGHEST ) { ... }
# perltidy default, syntax ok
- for my $severity ( reverse $SEVERITY_LOWEST + 1 .. $SEVERITY_HIGHEST ) { ... }</code></pre>
+ for my $severity ( reverse $LOWEST + 1 .. $HIGHEST ) { ... }</code></pre>
<p>But the following will give a syntax error:</p>
<pre><code> # perltidy -nwrs='+', syntax error:
- for my $severity ( reverse $SEVERITY_LOWEST +1 .. $SEVERITY_HIGHEST ) { ... }</code></pre>
+ for my $severity ( reverse $LOWEST +1 .. $HIGHEST ) { ... }</code></pre>
<p>To avoid subtle parsing problems like this, it is best to avoid spacing a binary operator asymmetrically with a space on the left but not on the right.</p>
<p>When an opening paren follows a Perl keyword, no space is introduced after the keyword, unless it is (by default) one of these:</p>
-<pre><code> my local our and or xor eq ne if else elsif until unless
- while for foreach return switch case given when</code></pre>
+<pre><code> my local our state and or xor err eq ne if else elsif until unless
+ while for foreach return switch case given when catch</code></pre>
<p>These defaults can be modified with two commands:</p>
sub usage(); # n=1 [default; follows input]
sub usage (); # n=2 [space]</code></pre>
+</dd>
+<dt id="ssp-n-or---space-signature-paren-n"><b>-ssp=n</b> or <b>--space-signature-paren=n</b></dt>
+<dd>
+
+<p>This flag is analogous to the previous except that it applies to the space before the opening paren of a sub <b>signature</b> rather than a sub <b>prototype</b>.</p>
+
+<p>For example, consider the following line:</p>
+
+<pre><code> sub circle( $xc, $yc, $rad )</code></pre>
+
+<p>This space before the opening paren can be controlled with integer <b>n</b> which may have the value 0, 1, or 2 with these meanings:</p>
+
+<pre><code> -ssp=0 means no space before the paren
+ -ssp=1 means follow the example of the source code [DEFAULT]
+ -ssp=2 means always put a space before the paren</code></pre>
+
+<p>The default is <b>-ssp=1</b>, meaning that will be a space in the output if, and only if, there is one in the input. Given the above line of code, the result of applying the different options would be:</p>
+
+<pre><code> sub circle( $xc, $yc, $rad ) # n=0 [no space]
+ sub circle( $xc, $yc, $rad ) # n=1 [default; same as input]
+ sub circle ( $xc, $yc, $rad ) # n=2 [space]</code></pre>
+
</dd>
<dt id="kpit-n-or---keyword-paren-inner-tightness-n"><b>-kpit=n</b> or <b>--keyword-paren-inner-tightness=n</b></dt>
<dd>
<p>These can be changed with the parameter <b>-kpitl=s</b> described in the next section.</p>
</dd>
-<dt id="kpitl-string-or---keyword-paren-inner-tightness-string"><b>-kpitl=string</b> or <b>--keyword-paren-inner-tightness=string</b></dt>
+<dt id="kpitl-string-or---keyword-paren-inner-tightness-list-string"><b>-kpitl=string</b> or <b>--keyword-paren-inner-tightness-list=string</b></dt>
<dd>
-<p>This command can be used to change the keywords to which the the <b>-kpit=n</b> command applies. The parameter <b>string</b> is a required list either keywords or functions, which should be placed in quotes if there are more than one. By itself, this parameter does not cause any change in spacing, so the <b>-kpit=n</b> command is still required.</p>
+<p>This command can be used to change the keywords to which the previous parameter, <b>-kpit=n</b>, applies. The parameter <b>string</b> is a required list either keywords or functions, which should be placed in quotes if there are more than one. By itself, this parameter does not cause any change in spacing, so the <b>-kpit=n</b> command is still required.</p>
<p>For example, the commands <code>-kpitl="if else while" -kpit=2</code> will cause the just the spaces inside parens following 'if', 'else', and 'while' keywords to follow the tightness value indicated by the <b>-kpit=2</b> flag.</p>
&& ( $a->{'title'} eq $b->{'title'} )
&& ( $a->{'href'} eq $b->{'href'} ) );</code></pre>
-<p>Note that this is considered to be a different operation from "vertical alignment" because space at just one line is being adjusted, whereas in "vertical alignment" the spaces at all lines are being adjusted. So it sort of a local version of vertical alignment.</p>
+<p>Note that this is considered to be a different operation from "vertical alignment" because space at just one line is being adjusted, whereas in "vertical alignment" the spaces at all lines are being adjusted. So it is sort of a local version of vertical alignment.</p>
<p>Here is an example involving a ternary operator:</p>
<dt id="Trimming-whitespace-around-qw-quotes"><b>Trimming whitespace around <code>qw</code> quotes</b></dt>
<dd>
-<p><b>-tqw</b> or <b>--trim-qw</b> provide the default behavior of trimming spaces around multi-line <code>qw</code> quotes and indenting them appropriately.</p>
+<p><b>-tqw</b> or <b>--trim-qw</b> provide the default behavior of trimming spaces around multiline <code>qw</code> quotes and indenting them appropriately.</p>
-<p><b>-ntqw</b> or <b>--notrim-qw</b> cause leading and trailing whitespace around multi-line <code>qw</code> quotes to be left unchanged. This option will not normally be necessary, but was added for testing purposes, because in some versions of perl, trimming <code>qw</code> quotes changes the syntax tree.</p>
+<p><b>-ntqw</b> or <b>--notrim-qw</b> cause leading and trailing whitespace around multiline <code>qw</code> quotes to be left unchanged. This option will not normally be necessary, but was added for testing purposes, because in some versions of perl, trimming <code>qw</code> quotes changes the syntax tree.</p>
</dd>
<dt id="sbq-n-or---space-backslash-quote-n"><b>-sbq=n</b> or <b>--space-backslash-quote=n</b></dt>
<p>Perltidy has a number of ways to control the appearance of both block comments and side comments. The term <b>block comment</b> here refers to a full-line comment, whereas <b>side comment</b> will refer to a comment which appears on a line to the right of some code.</p>
+<p>Perltidy does not do any word wrapping of commented text to match a selected maximum line length. This is because there is no way to determine if this is appropriate for the given content. However, an interactive program named <b>perlcomment.pl</b> is available in the <b>examples</b> folder of the perltidy distribution which can assist in doing this.</p>
+
<dl>
<dt id="ibc---indent-block-comments"><b>-ibc</b>, <b>--indent-block-comments</b></dt>
<p>This parameter causes perltidy to ignore the length of side comments when setting line breaks. The default, <b>-niscl</b>, is to include the length of side comments when breaking lines to stay within the length prescribed by the <b>-l=n</b> maximum line length parameter. For example, the following long single line would remain intact with -l=80 and -iscl:</p>
<pre><code> perltidy -l=80 -iscl
- $vmsfile =~ s/;[\d\-]*$//; # Clip off version number; we can use a newer version as well</code></pre>
+ $vmsfile =~ s/;[\d\-]*$//; # Clip off version number; we can use a newer version</code></pre>
<p>whereas without the -iscl flag the line will be broken:</p>
<pre><code> perltidy -l=80
$vmsfile =~ s/;[\d\-]*$//
- ; # Clip off version number; we can use a newer version as well</code></pre>
+ ; # Clip off version number; we can use a newer version</code></pre>
+
+</dd>
+<dt id="ipc---ignore-perlcritic-comments"><b>-ipc</b>, <b>--ignore-perlcritic-comments</b></dt>
+<dd>
+
+<p>Perltidy, by default, will look for side comments beginning with <b>## no critic</b> and ignore their lengths when making line break decisions, even if the user has not set <b>-iscl</b>. The reason is that an unwanted line break can make these special comments ineffective in controlling <b>perlcritic</b>.</p>
+
+<p>Setting <b>--ignore-perlcritic-comments</b> tells perltidy not to look for these <b>## no critic</b> comments.</p>
</dd>
<dt id="hsc---hanging-side-comments"><b>-hsc</b>, <b>--hanging-side-comments</b></dt>
<dt id="cscl-string-or---closing-side-comment-list"><b>-cscl=string</b>, or <b>--closing-side-comment-list</b></dt>
<dd>
-<p>where <code>string</code> is a list of block types to be tagged with closing side comments. By default, all code block types preceded by a keyword or label (such as <code>if</code>, <code>sub</code>, and so on) will be tagged. The <b>-cscl</b> command changes the default list to be any selected block types; see <a href="#Specifying-Block-Types">"Specifying Block Types"</a>. For example, the following command requests that only <code>sub</code>'s, labels, <code>BEGIN</code>, and <code>END</code> blocks be affected by any <b>-csc</b> or <b>-dcsc</b> operation:</p>
+<p>where <code>string</code> is a list of block types to be tagged with closing side comments. By default, all code block types preceded by a keyword or label (such as <code>if</code>, <code>sub</code>, and so on) will be tagged. The <b>-cscl</b> command changes the default list to be any selected block types; see <a href="#Specifying-Block-Types">"Specifying Block Types"</a>. For example, the following command requests that only <code>sub</code>'s, labels, <code>BEGIN</code>, and <code>END</code> blocks be affected by operations which add (<b>-csc</b>) or delete (<b>-dcsc</b>) closing side comments:</p>
-<pre><code> -cscl="sub : BEGIN END"</code></pre>
+<pre><code> --closing-side-comment-list='sub : BEGIN END'</code></pre>
+
+</dd>
+<dt id="cscxl-string-or---closing-side-comment-exclusion-list"><b>-cscxl=string</b>, or <b>--closing-side-comment-exclusion-list</b></dt>
+<dd>
+
+<p>where <code>string</code> is a list of block types which should NOT be tagged with closing side comments. If a block type appears in both <b>-cscl</b> and <b>-cscxl</b>, then <b>-cscxl</b> has priority and the block will not be tagged.</p>
+
+<p>For example, the following command requests that anonymous subs should not be affected by any <b>-csc</b> or <b>-dcsc</b> operation:</p>
+
+<pre><code> --closing-side-comment-exclusion-list='asub'</code></pre>
+
+<p>By default, no block types are excluded.</p>
</dd>
<dt id="csct-n-or---closing-side-comment-maximum-text-n"><b>-csct=n</b>, or <b>--closing-side-comment-maximum-text=n</b></dt>
<dt id="osbc---outdent-static-block-comments"><b>-osbc</b>, <b>--outdent-static-block-comments</b></dt>
<dd>
-<p>The command <b>-osbc</b> will cause static block comments to be outdented by 2 spaces (or whatever <b>-ci=n</b> has been set to), if possible.</p>
+<p>The command <b>-osbc</b> will cause static block comments to be outdented by the number of <b>--continuation-spaces=n</b>, if possible.</p>
</dd>
</dl>
<p><b>--code-skipping</b> uses starting and ending markers '#<<V' and '#>>V', like this:</p>
-<pre><code> #<<V code skipping: perltidy will pass this verbatim without error checking
+<pre><code> #<<V code skipping: perltidy passes this verbatim, no error checking
token ident_digit {
[ [ <?word> | _ | <?digit> ] <?ident_digit>
#>>V</code></pre>
-<p>Additional text may appear on the special comment lines provided that it is separated from the marker by at least one space, as in the above examples.</p>
+<p>(The last character V is like a < or > rotated 90 degrees). Additional text may appear on the special comment lines provided that it is separated from the marker by at least one space to highlight the sign, as in the above examples.</p>
<p>Any number of code-skipping or format-skipping sections may appear in a file. If an opening code-skipping or format-skipping comment is not followed by a corresponding closing comment, then skipping continues to the end of the file. If a closing code-skipping or format-skipping comment appears in a file but does not follow a corresponding opening comment, then it is treated as an ordinary comment without any special meaning.</p>
<p>Some examples show how example strings become patterns:</p>
-<pre><code> -fsb='#\{\{\{' becomes /^#\{\{\{\s/ which matches #{{{ but not #{{{{
- -fsb='#\*\*' becomes /^#\*\*\s/ which matches #** but not #***
- -fsb='#\*{2,}' becomes /^#\*{2,}\s/ which matches #** and #*****</code></pre>
+<pre><code> -fsb='#\{\{\{' becomes /^#\{\{\{\s/ which matches #{{{ but not #{{{{
+ -fsb='#\*\*' becomes /^#\*\*\s/ which matches #** but not #***
+ -fsb='#\*{2,}' becomes /^#\*{2,}\s/ which matches #** and #*****</code></pre>
</dd>
<dt id="fse-string---format-skipping-end-string"><b>-fse=string</b>, <b>--format-skipping-end=string</b></dt>
</dd>
</dl>
+<h2 id="Formatting-a-Limited-Range-of-Lines">Formatting a Limited Range of Lines</h2>
+
+<p>A command <b>--line-range-tidy=n1:n2</b> is available to process just a selected range of lines of an input stream with perltidy. This command is mainly of interest for programming interactive code editors. When it is used, the entire input stream is read but just the selected range of lines of the input file are processed by the perltidy tokenizer and formatter, and then the stream is reassembled for output. The selected lines need to contain a complete statement or balanced container. Otherwise, a syntax error will occur and the code will not be tidied. There are a couple of limitations on the use of this command: (1) it may not be applied to multiple files, and (2) it only applies to code tidying and not, for example, html formatting.</p>
+
+<dl>
+
+<dt id="lrt-n1:n2---line-range-tidy-n1:n2"><b>-lrt=n1:n2</b>, <b>--line-range-tidy=n1:n2</b></dt>
+<dd>
+
+<p>The range of lines is specified by integers <b>n1</b> and <b>n2</b>, where <b>n1</b> is the first line number to be formatted (start counting with 1) and <b>n2</b> is the last line number to be formatted. If <b>n2</b> is not given, or exceeds the actual number of lines, then formatting continues to the end of the file.</p>
+
+<p>Examples:</p>
+
+<pre><code> --line-range-tidy=43:109 # tidy lines 43 through 109
+ --line-range-tidy=' 43 : 109' # tidy lines 43 through 109
+ --line-range-tidy=1: # tidy all lines
+ --line-range-tidy=0:90 # ERROR (n1 must be >= 1)</code></pre>
+
+<p>The second example shows that spaces are okay if placed in quotes.</p>
+
+</dd>
+</dl>
+
<h2 id="Line-Break-Control">Line Break Control</h2>
<p>The parameters in this and the next sections control breaks after non-blank lines of code. Blank lines are controlled separately by parameters in the section <a href="#Blank-Line-Control">"Blank Line Control"</a>.</p>
<dt id="fnl---freeze-newlines"><b>-fnl</b>, <b>--freeze-newlines</b></dt>
<dd>
-<p>If you do not want any changes to the line breaks within lines of code in your script, set <b>-fnl</b>, and they will remain fixed, and the rest of the commands in this section and sections <a href="#Controlling-List-Formatting">"Controlling List Formatting"</a>, <a href="#Retaining-or-Ignoring-Existing-Line-Breaks">"Retaining or Ignoring Existing Line Breaks"</a>. You may want to use <b>-noll</b> with this.</p>
+<p>If you do not want any changes to the line breaks within lines of code in your script, set <b>-fnl</b>, and they will remain fixed, and the rest of the commands in this section and sections <a href="#Controlling-List-Formatting">"Controlling List Formatting"</a>, <a href="#Retaining-or-Ignoring-Existing-Line-Breaks">"Retaining or Ignoring Existing Line Breaks"</a>. You may want to use <b>--nooutdent-long-lines</b> with this.</p>
-<p>Note: If you also want to keep your blank lines exactly as they are, you can use the <b>-fbl</b> flag which is described in the section <a href="#Blank-Line-Control">"Blank Line Control"</a>.</p>
+<p>Note: If you also want to keep your blank lines exactly as they are, you can use the <b>--freeze-blank-lines</b> flag which is described in the section <a href="#Blank-Line-Control">"Blank Line Control"</a>.</p>
</dd>
</dl>
# -nce (default)
if ($task) {
- yyy();
+ yyy();
}
else {
- zzz();
+ zzz();
}</code></pre>
<p>In this example the keyword <b>else</b> is placed on the same line which begins with the preceding closing block brace and is followed by its own opening block brace on the same line. Other keywords and function names which are formatted with this "cuddled" style are <b>elsif</b>, <b>continue</b>, <b>catch</b>, <b>finally</b>.</p>
-<p>Other block types can be formatted by specifying their names on a separate parameter <b>-cbl</b>, described in a later section.</p>
+<p>Other block types can be formatted by specifying their names on a separate parameter <b>--cuddled-block-list</b>, described in a later section.</p>
-<p>Cuddling between a pair of code blocks requires that the closing brace of the first block start a new line. If this block is entirely on one line in the input file, it is necessary to decide if it should be broken to allow cuddling. This decision is controlled by the flag <b>-cbo=n</b> discussed below. The default and recommended value of <b>-cbo=1</b> bases this decision on the first block in the chain. If it spans multiple lines then cuddling is made and continues along the chain, regardless of the sizes of subsequent blocks. Otherwise, short lines remain intact.</p>
+<p>Cuddling between a pair of code blocks requires that the closing brace of the first block start a new line. If this block is entirely on one line in the input file, it is necessary to decide if it should be broken to allow cuddling. This decision is controlled by the flag <b>--cuddled-break-options=n</b> (<b>-cbo=n</b>) discussed below. The default and recommended value of <b>-cbo=1</b> bases this decision on the first block in the chain. If it spans multiple lines then cuddling is made and continues along the chain, regardless of the sizes of subsequent blocks. Otherwise, short lines remain intact.</p>
-<p>So for example, the <b>-ce</b> flag would not have any effect if the above snippet is rewritten as</p>
+<p>So for example, the <b>--cuddled-else</b> flag would not have any effect if the above snippet is rewritten as</p>
<pre><code> if ($task) { yyy() }
else { zzz() }</code></pre>
<dt id="cb---cuddled-blocks"><b>-cb</b>, <b>--cuddled-blocks</b></dt>
<dd>
-<p>This flag is equivalent to <b>-ce</b>.</p>
+<p>This flag is equivalent to <b>--cuddled-else</b> (<b>-ce</b>).</p>
</dd>
<dt id="cbl---cuddled-block-list"><b>-cbl</b>, <b>--cuddled-block-list</b></dt>
<p>The built-in default cuddled block types are <b>else, elsif, continue, catch, finally</b>.</p>
-<p>Additional block types to which the <b>-cuddled-blocks</b> style applies can be defined by this parameter. This parameter is a character string, giving a list of block types separated by commas or spaces. For example, to cuddle code blocks of type sort, map and grep, in addition to the default types, the string could be set to</p>
+<p>Additional block types to which the <b>--cuddled-blocks</b> style applies can be defined by this parameter. This parameter is a character string, giving a list of block types separated by commas or spaces. For example, to cuddle code blocks of type sort, map and grep, in addition to the default types, the string could be set to</p>
<pre><code> -cbl="sort map grep"</code></pre>
<p>As a diagnostic check, the flag <b>--dump-cuddled-block-list</b> or <b>-dcbl</b> can be used to view the hash of values that are generated by this flag.</p>
-<p>Finally, note that the <b>-cbl</b> flag by itself merely specifies which blocks are formatted with the cuddled format. It has no effect unless this formatting style is activated with <b>-ce</b>.</p>
+<p>Finally, note that the <b>--cuddled-block-list</b> parameter by itself merely specifies which blocks are formatted with the cuddled format. It has no effect unless this formatting style is activated with <b>--cuddled-else</b>.</p>
</dd>
<dt id="cblx---cuddled-block-list-exclusive"><b>-cblx</b>, <b>--cuddled-block-list-exclusive</b></dt>
<dd>
-<p>When cuddled else formatting is selected with <b>-ce</b>, setting this flag causes perltidy to ignore its built-in defaults and rely exclusively on the block types specified on the <b>-cbl</b> flag described in the previous section. For example, to avoid using cuddled <b>catch</b> and <b>finally</b>, which are among the defaults, the following set of parameters could be used:</p>
+<p>When cuddled else formatting is selected with <b>--cuddled-else</b>, setting this flag causes perltidy to ignore its built-in defaults and rely exclusively on the block types specified on the <b>--cuddled-block-list</b> flag described in the previous section. For example, to avoid using cuddled <b>catch</b> and <b>finally</b>, which are among the defaults, the following set of parameters could be used:</p>
<pre><code> perltidy -ce -cbl='else elsif continue' -cblx</code></pre>
<p>Cuddled formatting is only possible between a pair of code blocks if the closing brace of the first block starts a new line. If a block is encountered which is entirely on a single line, and cuddled formatting is selected, it is necessary to make a decision as to whether or not to "break" the block, meaning to cause it to span multiple lines. This parameter controls that decision. The options are:</p>
<pre><code> cbo=0 Never force a short block to break.
- cbo=1 If the first of a pair of blocks is broken in the input file,
- then break the second [DEFAULT].
+ cbo=1 If the first of a pair of blocks is broken in the input
+ file, then break the second [DEFAULT].
cbo=2 Break open all blocks for maximal cuddled formatting.</code></pre>
<p>The default and recommended value is <b>cbo=1</b>. With this value, if the starting block of a chain spans multiple lines, then a cascade of breaks will occur for remaining blocks causing the entire chain to be cuddled.</p>
...
}</code></pre>
-<p>When <b>-bl</b> is set, the blocks to which this applies can be controlled with the parameters <b>--brace-left-list</b> and <b>-brace-left-exclusion-list</b> described in the next sections.</p>
+<p>When <b>-bl</b> is set, the blocks to which this applies can be controlled with the parameters <b>--brace-left-list</b> and <b>--brace-left-exclusion-list</b> described in the next sections.</p>
</dd>
<dt id="bll-s---brace-left-list-s"><b>-bll=s</b>, <b>--brace-left-list=s</b></dt>
<p>Use this parameter to exclude types of block braces for which the <b>-bl</b> flag applies; see <a href="#Specifying-Block-Types">"Specifying Block Types"</a>. For example, the default settings <b>-bll='*'</b> and <b>-blxl='sort map grep eval asub'</b> mean all blocks except <b>sort map grep eval</b> and anonymous sub blocks.</p>
-<p>Note that the lists <b>-bll=s</b> and <b>-blxl=s</b> control the behavior of the <b>-bl</b> flag but have no effect unless the <b>-bl</b> flag is set.</p>
+<p>Note that the lists <b>-bll=s</b> and <b>-blxl=s</b> control the behavior of the <b>-bl</b> flag but have no effect unless the <b>-bl</b> flag is set. These two lists provide complete control for this flag, but two shortcut flags are available and described in the next sections.</p>
</dd>
<dt id="sbl---opening-sub-brace-on-new-line"><b>-sbl</b>, <b>--opening-sub-brace-on-new-line</b></dt>
<dd>
-<p>The flag <b>-sbl</b> provides a shortcut way to turn on <b>-bl</b> just for named subs. The same effect can be achieved by turning on <b>-bl</b> with the block list set as <b>-bll='sub'</b>.</p>
+<p>The flag <b>-sbl</b> provides a shortcut way to turn on <b>-bl</b> just for named subs. The same effect can be achieved by turning on <b>-bl</b> with the block list set as <b>-bll='sub'</b>. To avoid conflicts, it is recommended to either use the more general list method described above to control <b>-bl</b>, or this shortcut method, but not both.</p>
<p>For example,</p>
}
}</code></pre>
-<p>This flag is negated with <b>-nsbl</b>, which is the default.</p>
+<p>This negative version of this flag, <b>-nsbl</b>, turns off <b>-bl</b> for named subs. The same effect can be achieved with the exclusion list method, <b>-blxl=sub</b>.</p>
</dd>
<dt id="asbl---opening-anonymous-sub-brace-on-new-line"><b>-asbl</b>, <b>--opening-anonymous-sub-brace-on-new-line</b></dt>
<dd>
-<p>The flag <b>-asbl</b> is like the <b>-sbl</b> flag except that it applies to anonymous sub's instead of named subs. For example</p>
+<p>The flag <b>-asbl</b> is like the <b>-sbl</b> flag except that it applies to anonymous sub's instead of named subs. The same effect can be achieved by turning on <b>-bl</b> with the block list set to include <b>-bll='asub'</b>.</p>
+
+<p>For example</p>
<pre><code> perltidy -asbl</code></pre>
}
};</code></pre>
-<p>This flag is negated with <b>-nasbl</b>, and the default is <b>-nasbl</b>.</p>
+<p>This negative version of this flag, <b>-nasbl</b>, turns off <b>-bl</b> for anonymous subs.</p>
</dd>
<dt id="bli---brace-left-and-indent"><b>-bli</b>, <b>--brace-left-and-indent</b></dt>
<dd>
-<p>The flag <b>-bli</b> is similar to the <b>-bl</b> flag but in addition it causes one unit of continuation indentation ( see <b>-ci</b> ) to be placed before an opening and closing block braces.</p>
+<p>The flag <b>-bli</b> is similar to the <b>-bl</b> flag but in addition it causes one unit of continuation indentation ( see <b>--continuation-indentation</b> ) to be placed before an opening and closing block braces.</p>
-<p>For example, perltidy -bli gives</p>
+<p>For example</p>
-<pre><code> if ( $input_file eq '-' )
+<pre><code> # perltidy -bli
+ if ( $input_file eq '-' )
{
important_function();
}</code></pre>
big_waste_of_time();
}</code></pre>
-<p>A conflict occurs if both <b>-bl</b> and <b>-bar</b> are specified.</p>
+<p>A conflict occurs if both <b>--opening-brace-on_new-line</b> (<b>-bl</b>) and <b>-bar</b> are specified.</p>
+
+</dd>
+<dt id="cpb---cuddled-paren-brace"><b>-cpb</b>, <b>--cuddled-paren-brace</b></dt>
+<dd>
+
+<p>A related parameter, <b>--cuddled-paren-brace</b>, causes perltidy to join two lines which otherwise would be</p>
+
+<pre><code> )
+ {</code></pre>
+
+<p>to be</p>
+
+<pre><code> ) {</code></pre>
+
+<p>For example:</p>
+
+<pre><code> # default
+ foreach my $dir (
+ '05_lexer', '07_token', '08_regression', '11_util',
+ '13_data', '15_transform'
+ )
+ {
+ ...
+ }
+
+ # perltidy -cpb
+ foreach my $dir (
+ '05_lexer', '07_token', '08_regression', '11_util',
+ '13_data', '15_transform'
+ ) {
+ ...;
+ }</code></pre>
</dd>
<dt id="otr---opening-token-right-and-related-flags"><b>-otr</b>, <b>--opening-token-right</b> and related flags</dt>
-bbpi=1 outdent by one continuation level
-bbpi=2 indent one full indentation level</code></pre>
+</dd>
+<dt id="bfvt-n---brace-follower-vertical-tightness-n"><b>-bfvt=n</b>, <b>--brace-follower-vertical-tightness=n</b></dt>
+<dd>
+
+<p>Some types of closing block braces, such as <b>eval</b>, may be followed by additional code. A line break may be inserted between such a closing brace and the following code depending on the parameter <b>n</b> and the length of the trailing code, as follows:</p>
+
+<p>If the trailing code fits on a single line, then</p>
+
+<pre><code> -bfvt=0 Follow the input style regarding break/no-break
+ -bfvt=1 Follow the input style regarding break/no-break [Default]
+ -bfvt=2 Do not insert a line break</code></pre>
+
+<p>If the trailing code requires multiple lines, then</p>
+
+<pre><code> -bfvt=0 Insert a line break
+ -bfvt=1 Insert a line break except for a cuddled block chain
+ -bfvt=2 Do not insert a line break</code></pre>
+
+<p>The default is <b>-bfvt=1</b>. The most compact code is achieved with <b>-bfvt=2</b>.</p>
+
+<p>Example (non-cuddled, multiple lines ):</p>
+
+<pre><code> # -bfvt=0 or -bvft=1 [DEFAULT]
+ eval {
+ ( $line, $cond ) = $self->_normalize_if_elif($line);
+ 1;
+ }
+ or die sprintf "Error at line %d\nLine %d: %s\n%s",
+ ( $line_info->start_line_num() ) x 2, $line, $@;
+
+ # -bfvt=2
+ eval {
+ ( $line, $cond ) = $self->_normalize_if_elif($line);
+ 1;
+ } or die sprintf "Error at line %d\nLine %d: %s\n%s",
+ ( $line_info->start_line_num() ) x 2, $line, $@;</code></pre>
+
+<p>Example (cuddled, multiple lines):</p>
+
+<pre><code> # -bfvt=0
+ eval {
+ #STUFF;
+ 1; # return true
+ }
+ or do {
+ ##handle error
+ };
+
+ # -bfvt=1 [DEFAULT] or -bfvt=2
+ eval {
+ #STUFF;
+ 1; # return true
+ } or do {
+ ##handle error
+ };</code></pre>
+
</dd>
</dl>
<p>Notice how the indentation of the inner lines are reduced by two levels in this case. This example also shows the typical result of this formatting, namely it is a sandwich consisting of an initial opening layer, a central section of any complexity forming the "meat" of the sandwich, and a final closing layer. This predictable structure helps keep the compacted structure readable.</p>
-<p>The inner sandwich layer is required to be at least one line thick. If this cannot be achieved, welding does not occur. This constraint can cause formatting to take a couple of iterations to stabilize when it is first applied to a script. The <b>-conv</b> flag can be used to insure that the final format is achieved in a single run.</p>
+<p>The inner sandwich layer is required to be at least one line thick. If this cannot be achieved, welding does not occur. This constraint can cause formatting to take a couple of iterations to stabilize when it is first applied to a script. The <b>--converge</b> flag can be used to insure that the final format is achieved in a single run.</p>
<p>Here is an example illustrating a welded container within a welded containers:</p>
-<pre><code> # default formatting
- $x->badd(
- bmul(
- $class->new(
- abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() )
- )
- ),
- $m
- )
- );
-
- # perltidy -wn
- $x->badd( bmul(
- $class->new( abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() )
- ) ),
+<pre><code> # default formatting
+ $x->badd(
+ bmul(
+ $class->new(
+ abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() )
+ )
+ ),
$m
- ) );</code></pre>
+ )
+ );
+
+ # perltidy -wn
+ $x->badd( bmul(
+ $class->new( abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() )
+ ) ),
+ $m
+ ) );</code></pre>
-<p>The welded closing tokens are by default on a separate line but this can be modified with the <b>-vtc=n</b> flag (described in the next section). For example, the same example adding <b>-vtc=2</b> is</p>
+<p>The welded closing tokens are by default on a separate line but this can be modified with the <b>--vertical-tightness-closing=n</b> (<b>-vtc=n</b>) flag (described in the next section). For example, the same example adding <b>-vtc=2</b> is</p>
-<pre><code> # perltidy -wn -vtc=2
- $x->badd( bmul(
- $class->new( abs(
- $sx * int( $xr->numify() ) & $sy * int( $yr->numify() ) ) ),
- $m ) );</code></pre>
+<pre><code> # perltidy -wn -vtc=2
+ $x->badd( bmul(
+ $class->new( abs(
+ $sx * int( $xr->num() ) & $sy * int( $yr->num() ) ) ),
+ $m ) );</code></pre>
<p>This format option is quite general but there are some limitations.</p>
<p>One limitation is that any line length limit still applies and can cause long welded sections to be broken into multiple lines.</p>
-<p>Another limitation is that an opening symbol which delimits quoted text cannot be included in a welded pair. This is because quote delimiters are treated specially in perltidy.</p>
-
-<p>Finally, the stacking of containers defined by this flag have priority over any other container stacking flags. This is because any welding is done first.</p>
+<p>Also, the stacking of containers defined by this flag have priority over any other container stacking flags. This is because any welding is done first.</p>
</dd>
<dt id="wfc---weld-fat-comma"><b>-wfc</b>, <b>--weld-fat-comma </b></dt>
<dd>
-<p>When the <b>-wfc</b> flag is set, along with <b>-wn</b>, perltidy is allowed to weld an opening paren to an inner opening container when they are separated by a hash key and fat comma (=>). for example</p>
+<p>When the <b>-wfc</b> flag is set, along with <b>-wn</b> (<b>--weld-nested-containers</b>), perltidy is allowed to weld an opening paren to an inner opening container when they are separated by a hash key and fat comma (=>). for example</p>
<pre><code> # perltidy -wn -wfc
elf->call_method( method_name_foo => {
<p>A third optional item of information which can be given is an alphanumeric letter which is used to limit the selection further depending on the type of token immediately before the container. If given, it goes just before the container symbol. The possible letters are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings:</p>
-<pre><code> 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
- 'F' matches if 'f' does not.
- 'w' matches if either 'k' or 'f' match.
- 'W' matches if 'w' does not.</code></pre>
+<pre><code> 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
+ 'F' matches if 'f' does not
+ 'w' matches if either 'k' or 'f' match
+ 'W' matches if 'w' does not</code></pre>
<p>For example, compare</p>
<pre><code> # perltidy -wn
if ( defined( $_Cgi_Query{
- $Config{'methods'}{'authentication'}{'remote'}{'cgi'}{'username'}
+ $Config{'methods'}{'auth'}{'remote'}{'cgi'}{'username'}
} ) )</code></pre>
<p>with</p>
<pre><code> # perltidy -wn -wnxl='^K( {'
if ( defined(
- $_Cgi_Query{ $Config{'methods'}{'authentication'}{'remote'}{'cgi'}
+ $_Cgi_Query{ $Config{'methods'}{'auth'}{'remote'}{'cgi'}
{'username'} }
) )</code></pre>
<p>Here are some additional example strings and their meanings:</p>
-<pre><code> '^(' - the weld must not start with a paren
- '.(' - the second and later tokens may not be parens
- '.w(' - the second and later tokens may not keyword or function call parens
- '(' - no parens in a weld
- '^K(' - exclude a leading paren preceded by a non-keyword
- '.k(' - exclude a secondary paren preceded by a keyword
- '[ {' - exclude all brackets and braces
- '[ ( ^K{' - exclude everything except nested structures like do {{ ... }}</code></pre>
+<pre><code> '^(' - the weld must not start with a paren
+ '.(' - second and later tokens may not be parens
+ '.w(' - second and later tokens may not be a keyword or call parens
+ '(' - no parens in a weld
+ '^K(' - exclude a leading paren preceded by a non-keyword
+ '.k(' - exclude a secondary paren preceded by a keyword
+ '[ {' - exclude all brackets and braces
+ '[ ( ^K{' - exclude all except nested structures like do {{ ... }}</code></pre>
</dd>
<dt id="Vertical-tightness-of-non-block-curly-braces-parentheses-and-square-brackets"><b>Vertical tightness</b> of non-block curly braces, parentheses, and square brackets.</dt>
-vt=2 never break a line after opening token</code></pre>
</li>
-<li><p>You must also use the <b>-lp</b> flag when you use the <b>-vt</b> flag; the reason is explained below.</p>
+<li><p>You must also use the <b>-lp</b> (<b>--line-up-parentheses</b>) flag when you use the <b>-vt</b> flag; the reason is explained below.</p>
</li>
<li><p>Closing tokens (except for block braces) are controlled by <b>-vtc=n</b>, or <b>--vertical-tightness-closing=n</b>, where</p>
<li><p>Different controls may be applied to different token types, and it is also possible to control block braces; see below.</p>
</li>
-<li><p>Finally, please note that these vertical tightness flags are merely hints to the formatter, and it cannot always follow them. Things which make it difficult or impossible include comments, blank lines, blocks of code within a list, and possibly the lack of the <b>-lp</b> parameter. Also, these flags may be ignored for very small lists (2 or 3 lines in length).</p>
+<li><p>Finally, please note that these vertical tightness flags are merely hints to the formatter, and it cannot always follow them. Things which make it difficult or impossible include comments, blank lines, blocks of code within a list, and possibly the lack of the <b>--line-up-parentheses</b> parameter. Also, these flags may be ignored for very small lists (2 or 3 lines in length).</p>
</li>
</ul>
<p>The difference between <b>-vt=1</b> and <b>-vt=2</b> is shown here:</p>
-<pre><code> # perltidy -lp -vt=1
- $init->add(
- mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] )
- )
- );
+<pre><code> # perltidy -lp -vt=1
+ $init->add(
+ mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] )
+ )
+ );
- # perltidy -lp -vt=2
- $init->add( mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] )
- )
- );</code></pre>
+ # perltidy -lp -vt=2
+ $init->add( mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] )
+ )
+ );</code></pre>
<p>With <b>-vt=1</b>, the line ending in <code>add(</code> does not combine with the next line because the next line is not balanced. This can help with readability, but <b>-vt=2</b> can be used to ignore this rule.</p>
<p>The tightest, and least readable, code is produced with both <code>-vt=2</code> and <code>-vtc=2</code>:</p>
-<pre><code> # perltidy -lp -vt=2 -vtc=2
- $init->add( mysprintf( "(void)find_threadsv(%s);",
- cstring( $threadsv_names[ $op->targ ] ) ) );</code></pre>
+<pre><code> # perltidy -lp -vt=2 -vtc=2
+ $init->add( mysprintf( "(void)find_threadsv(%s);",
+ cstring( $threadsv_names[ $op->targ ] ) ) );</code></pre>
-<p>Notice how the code in all of these examples collapses vertically as <b>-vt</b> increases, but the indentation remains unchanged. This is because perltidy implements the <b>-vt</b> parameter by first formatting as if <b>-vt=0</b>, and then simply overwriting one output line on top of the next, if possible, to achieve the desired vertical tightness. The <b>-lp</b> indentation style has been designed to allow this vertical collapse to occur, which is why it is required for the <b>-vt</b> parameter.</p>
+<p>Notice how the code in all of these examples collapses vertically as <b>-vt</b> increases, but the indentation remains unchanged. This is because perltidy implements the <b>-vt</b> parameter by first formatting as if <b>-vt=0</b>, and then simply overwriting one output line on top of the next, if possible, to achieve the desired vertical tightness. The <b>-lp</b> (<b>--line-up-parentheses</b>) indentation style has been designed to allow this vertical collapse to occur, which is why it is required for the <b>-vt</b> parameter.</p>
<p>The <b>-vt=n</b> and <b>-vtc=n</b> parameters apply to each type of container token. If desired, vertical tightness controls can be applied independently to each of the closing container token types.</p>
<p>For example, if we want to just apply this style to <code>if</code>, <code>elsif</code>, and <code>else</code> blocks, we could use <code>perltidy -bli -bbvt=1 -bbvtl='if elsif else'</code>.</p>
-<p>There is no vertical tightness control for closing block braces; with one exception they will be placed on separate lines. The exception is that a cascade of closing block braces may be stacked on a single line. See <b>-scbb</b>.</p>
+<p>There is no vertical tightness control for closing block braces; with one exception they will be placed on separate lines. The exception is that a cascade of closing block braces may be stacked on a single line. See <b>--stack-closing-block-brace</b>.</p>
</dd>
<dt id="sot---stack-opening-tokens-and-related-flags"><b>-sot</b>, <b>--stack-opening-tokens</b> and related flags</dt>
always_quote => 1,
} );</code></pre>
-<p>The <b>-sct</b> flag is somewhat similar to the <b>-vtc</b> flags, and in some cases it can give a similar result. The difference is that the <b>-vtc</b> flags try to avoid lines with leading opening tokens by "hiding" them at the end of a previous line, whereas the <b>-sct</b> flag merely tries to reduce the number of lines with isolated closing tokens by stacking them but does not try to hide them. For example:</p>
+<p>The <b>-sct</b> flag is somewhat similar to the <b>-vtc</b> (<b>--vertical-tightness-closing</b> flags, and in some cases it can give a similar result. The difference is that the <b>-vtc</b> flags try to avoid lines with leading opening tokens by "hiding" them at the end of a previous line, whereas the <b>-sct</b> flag merely tries to reduce the number of lines with isolated closing tokens by stacking them but does not try to hide them. For example:</p>
<pre><code> # -vtc=2
$opt_c = Text::CSV_XS->new(
push( @lines, "$w1 $w2 $w3 $w4\n" );
} } } }</code></pre>
-<p>To simplify input even further for the case in which both opening and closing non-block containers are stacked, the flag <b>-sac</b> or <b>--stack-all-containers</b> is an abbreviation for <b>-sot -sct</b>.</p>
+<p>To simplify input even further for the case in which both opening and closing non-block containers are stacked, the flag <b>-sac</b> or <b>--stack-all-containers</b> is an abbreviation for <b>--stack-opening-tokens --stack-closing-tokens</b>.</p>
-<p>Please note that if both opening and closing tokens are to be stacked, then the newer flag <b>-weld-nested-containers</b> may be preferable because it insures that stacking is always done symmetrically. It also removes an extra level of unnecessary indentation within welded containers. It is able to do this because it works on formatting globally rather than locally, as the <b>-sot</b> and <b>-sct</b> flags do.</p>
+<p>Please note that if both opening and closing tokens are to be stacked, then the newer flag <b>--weld-nested-containers</b> may be preferable because it insures that stacking is always done symmetrically. It also removes an extra level of unnecessary indentation within welded containers. It is able to do this because it works on formatting globally rather than locally, as the <b>--stack-opening-tokens</b> and <b>--stack-closing-tokens</b> flags do.</p>
</dd>
</dl>
<pre><code> -wbb="+ - / *"</code></pre>
-<p>These commands should work well for most of the token types that perltidy uses (use <b>--dump-token-types</b> for a list). Also try the <b>-D</b> flag on a short snippet of code and look at the .DEBUG file to see the tokenization. However, for a few token types there may be conflicts with hardwired logic which cause unexpected results. One example is curly braces, which should be controlled with the parameter <b>bl</b> provided for that purpose.</p>
+<p>These commands should work well for most of the token types that perltidy uses (use <b>--dump-token-types</b> for a list). Also try the <b>-D</b> flag on a short snippet of code and look at the .DEBUG file to see the tokenization. However, for a few token types there may be conflicts with hardwired logic which cause unexpected results. One example is curly braces, which should be controlled with the parameter <b>--opening-brace-on-new-line</b> provided for that purpose.</p>
<p><b>WARNING</b> Be sure to put these tokens in quotes to avoid having them misinterpreted by your command shell.</p>
= **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
. : ? && || and or err xor</code></pre>
-<p>and the <b>-bbao</b> flag sets the default to break before all of these operators. These can be used to define an initial break preference which can be fine-tuned with the <b>-wba</b> and <b>-wbb</b> flags. For example, to break before all operators except an <b>=</b> one could use --bbao -wba='=' rather than listing every single perl operator except <b>=</b> on a -wbb flag.</p>
+<p>and the <b>-bbao</b> flag sets the default to break before all of these operators. These can be used to define an initial break preference which can be fine-tuned with the <b>--want-break-after</b> and <b>--want-break-before</b> flags. For example, to break before all operators except an <b>=</b> one could use <code>-bbao -wba='='</code> rather than listing every single perl operator except <b>=</b> on a <b>-wbb</b> flag.</p>
<dl>
<pre><code> # perltidy (default)
my @list = ( 1, 1, 1, 1, 2, 1, 1, 3, 3, 1, 1, 4, 6, 4, 1, );</code></pre>
-<p>This formatting loses the nice structure. If we place a side comment anywhere between the opening and closing parens, the original line break points are retained. For example,</p>
+<p>This formatting loses the nice structure. The original line breaks can be retained by adding comment or a blank line somewhere between the two parens. For example,</p>
+
+<pre><code> my @list = (
+ 1, # a side comment forces the original breakpoints to be kept
+ 1, 1,
+ 1, 2, 1,
+ 1, 3, 3, 1,
+ 1, 4, 6, 4, 1,
+ );</code></pre>
+
+<p>We could achieve the same result with a blank line or full comment anywhere between the opening and closing parens. Vertical alignment of the list items will still occur if possible. The blank line method is shown here:</p>
<pre><code> my @list = (
- 1, # a side comment forces the original line breakpoints to be kept
+
+ 1,
1, 1,
1, 2, 1,
1, 3, 3, 1,
1, 4, 6, 4, 1,
);</code></pre>
-<p>The side comment can be a single hash symbol without any text. We could achieve the same result with a blank line or full comment anywhere between the opening and closing parens. Vertical alignment of the list items will still occur if possible.</p>
-
<p>For another possibility see the -fs flag in <a href="#Skipping-Selected-Sections-of-Code">"Skipping Selected Sections of Code"</a>.</p>
<dl>
<p>A disadvantage of this flag compared to the methods discussed above is that all tables in the file must already be nicely formatted.</p>
+</dd>
+<dt id="btct-s---break-at-trailing-comma-types-s"><b>-btct=s</b>, <b>--break-at-trailing-comma-types=s</b></dt>
+<dd>
+
+<p>A <b>trailing comma</b> is an optional comma following the last item of a list. The <b>-btct=s</b> tells perltidy to end a line at selected trailing commas. The string <b>s</b> selects the trailing commas, as follows:</p>
+
+<pre><code> s=1 or '*' : every trailing comma
+ s=m a trailing comma in a multiline list
+ s=b a bare trailing comma
+ s=0 none</code></pre>
+
+<p>For example, given the following input</p>
+
+<pre><code> $w->bind(
+ '<Page_Down>' => xx,
+ );</code></pre>
+
+<p>The default formatting would flatten this into a single line. But the container can be kept open with</p>
+
+<pre><code> # perltidy -btct='b'
+ $w->bind(
+ '<Page_Down>' => xx,
+ );</code></pre>
+
+<p>This can be particularly useful for short function calls like this, where the default perltidy formatting does not keep the container open.</p>
+
+<p>The options <b>s=m</b> and <b>s=1</b> can be used to open up containers with non-bare trailing commas. For example, given this input</p>
+
+<pre><code> $w->bind( '<Page_Down>' => xx, );</code></pre>
+
+<p>we can break it open with</p>
+
+<pre><code> # perltidy -btct=1
+ $w->bind(
+ '<Page_Down>' => xx,
+ );</code></pre>
+
+<p>Afterwards, we could switch to <b>-btct='b'</b> since the trailing comma is now bare. But the <b>-btct</b> parameter must be retained in this case because otherwise this small list will be flattened the next time it is formatted.</p>
+
+<p>This logic can be restricted to specific container types by including an opening token ahead of the letter in the above table. For example</p>
+
+<pre><code> -btct='(b'</code></pre>
+
+<p>means that this operation should only apply to bare trailing commas within parentheses.</p>
+
+<p>For parentheses, an additional item of information which can be given is an alphanumeric letter which is used to limit the selection further depending on the type of token immediately before the opening paren. The possible letters are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings for matching whatever precedes an opening paren:</p>
+
+<pre><code> 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
+ 'F' matches if 'f' does not.
+ 'w' matches if either 'k' or 'f' match.
+ 'W' matches if 'w' does not.</code></pre>
+
+<p>These are the same codes used for <b>--line-up-parentheses-inclusion-list</b>. For example,</p>
+
+<pre><code> -btct='f(b'</code></pre>
+
+<p>means that bare trailing commas in function call lists in the input stream should be followed by line breaks in the formatted output stream.</p>
+
+<p>The section <a href="#Adding-and-Deleting-Commas">"Adding and Deleting Commas"</a> describes additional controls for working with trailing commas. These can be combined with the <b>-break-trailing-comma-types</b> parameter for additional control of list formatting.</p>
+
</dd>
<dt id="mft-n---maximum-fields-per-table-n"><b>-mft=n</b>, <b>--maximum-fields-per-table=n</b></dt>
<dd>
-<p>If <b>n</b> is a positive number, and the computed number of fields for any table exceeds <b>n</b>, then it will be reduced to <b>n</b>. This parameter might be used on a small section of code to force a list to have a particular number of fields per line, and then either the <b>-boc</b> flag could be used to retain this formatting, or a single comment could be introduced somewhere to freeze the formatting in future applications of perltidy. For example</p>
+<p>If <b>n</b> is a positive number, and the computed number of fields for any table exceeds <b>n</b>, then it will be reduced to <b>n</b>. This parameter might be used on a small section of code to force a list to have a particular number of fields per line, and then either the <b>-boc</b> flag could be used to retain this formatting, or a blank line or comment could be introduced somewhere to freeze the formatting in future applications of perltidy. For example</p>
<pre><code> # perltidy -mft=2
@month_of_year = (
"09" => 30, "10" => 31, "11" => 30, "12" => 31
);</code></pre>
+</dd>
+<dt id="qwaf---qw-as-function"><b>-qwaf</b>, <b>--qw-as-function</b></dt>
+<dd>
+
+<p>This option tells perltidy to format a <b>qw</b> list which is delimited with parentheses as if it were a function call whose call args are a list of quoted items. Normally, a <b>qw</b> list is output verbatim except for an adjustment of leading whitespace to indicate the indentation level. For example, here is an example of the default formatting of a poorly formatted <b>qw</b> list:</p>
+
+<pre><code> # perltidy
+ @fields = qw( $st_dev $st_ino $st_mode $st_nlink $st_uid
+ $st_gid $st_rdev $st_size $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks);</code></pre>
+
+<p>If we format with <b>-qwaf</b> then the result will be:</p>
+
+<pre><code> # perltidy -qwaf
+ @fields = qw(
+ $st_dev $st_ino $st_mode $st_nlink
+ $st_uid $st_gid $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime $st_blksize
+ $st_blocks
+ );</code></pre>
+
+<p>The way this works is that just before formatting begins, the tokens of the <b>qw</b> text are replaced with the tokens of an equivalent function call with a comma-separated list of quoted items as call args. Then it is formatted like any other list. Special comma tokens are employed which have no display text, so when the code is eventually displayed it remains a valid <b>qw</b> quote.</p>
+
+<p>Some things to note are:</p>
+
+<ul>
+
+<li><p>This only works for <b>qw</b> quotes which begin with <b>qw(</b>, with no space before the paren.</p>
+
+</li>
+<li><p>If the option <b>--space-function-paren</b> is employed, it is ignored for these special function calls because it would deactivate them.</p>
+
+</li>
+<li><p>Otherwise the various formatting control flags operate on these lists the same as for other comma-separated lists. In particular, note that if <b>--break-at-old-comma-breakpoints</b>, or <b>-boc</b>, is set, then the old line break locations will be retained. And old line breaks will be retained if there are any blank lines between the opening and closing parens.</p>
+
+</li>
+<li><p>Before using this option for the first time, it is a good idea to scan the code and decide if any lists have a special order which should be retained. This can be accomplished for example by changing the quote delimiters to something other than parens, or by inserting a blank line as discussed at the start of this section.</p>
+
+</li>
+</ul>
+
</dd>
</dl>
<dt id="drc---delete-repeated-commas"><b>-drc</b>, <b>--delete-repeated-commas</b></dt>
<dd>
-<p>Repeated commas in a list are undesirable and can be removed with this flag. For example, given this list with a repeated comma</p>
+<p>This option causes repeated commas to be removed. For example, given this list with a repeated comma</p>
<pre><code> ignoreSpec( $file, "file",, \%spec, \%Rspec );</code></pre>
<pre><code> # perltidy -drc:
ignoreSpec( $file, "file", \%spec, \%Rspec );</code></pre>
-<p>Since the default is not to add or delete commas, this feature is off by default and must be requested.</p>
+<p>This parameter also deletes repeated fat commas, '=>'. The complete list of actions taken when this flag is set are as follows:</p>
-</dd>
-<dt id="want-trailing-commas-s-or--wtc-s---add-trailing-commas-or--atc-and---delete-trailing-commas-or--dtc"><b>--want-trailing-commas=s</b> or <b>-wtc=s</b>, <b>--add-trailing-commas</b> or <b>-atc</b>, and <b>--delete-trailing-commas</b> or <b>-dtc</b></dt>
-<dd>
+<ul>
-<p>A trailing comma is a comma following the last item of a list. Perl allows trailing commas but they are not required. By default, perltidy does not add or delete trailing commas, but it is possible to manipulate them with the following set of three related parameters:</p>
+<li><p>Repeated commas like ',,' are removed with a warning.</p>
-<pre><code> --want-trailing-commas=s, -wtc=s - defines where trailing commas are wanted
- --add-trailing-commas, -atc - gives permission to add trailing commas to match the style wanted
- --delete-trailing-commas, -dtc - gives permission to delete trailing commas which do not match the style wanted</code></pre>
+</li>
+<li><p>Repeated fat commas like '=> =>' are removed with a warning.</p>
-<p>The parameter <b>--want-trailing-commas=s</b>, or <b>-wtc=s</b>, defines a preferred style. The string <b>s</b> indicates which lists should get trailing commas, as follows:</p>
+</li>
+<li><p>The combination '=>,' produces a warning but is not changed (it is likely an error but only its author would know how to fix it).</p>
-<pre><code> s=0 : no list should have a trailing comma
- s=1 or * : every list should have a trailing comma
- s=m a multi-line list should have a trailing commas
- s=b trailing commas should be 'bare' (comma followed by newline)
- s=h lists of key=>value pairs, with about one one '=>' and one ',' per line,
- with a bare trailing comma
- s=i lists with about one comma per line, with a bare trailing comma
- s=' ' or -wtc not defined : leave trailing commas unchanged [DEFAULT].</code></pre>
+</li>
+<li><p>The remaining combination ',=>' (sometimes called a 'winking fat comma') is ignored by this parameter.</p>
-<p>This parameter by itself only indicates the where trailing commas are wanted. Perltidy only adds these trailing commas if the flag <b>--add-trailing-commas</b>, or <b>-atc</b> is set. And perltidy only removes unwanted trailing commas if the flag <b>--delete-trailing-commas</b>, or <b>-dtc</b> is set.</p>
+</li>
+<li><p>These warnings are only output if the <b>--warning-output</b>, or <b>-w</b>, flag is set.</p>
-<p>Here are some example parameter combinations and their meanings</p>
+</li>
+</ul>
-<pre><code> -wtc=0 -dtc : delete all trailing commas
- -wtc=1 -atc : all lists get trailing commas
- -wtc=m -atc : all multi-line lists get trailing commas, but
- single line lists remain unchanged.
- -wtc=m -dtc : multi-line lists remain unchanged, but
- any trailing commas on single line lists are removed.
- -wtc=m -atc -dtc : all multi-line lists get trailing commas, and
- any trailing commas on single line lists are removed.</code></pre>
+<p>This option is on by default. Use <b>-ndrc</b> to turn it off.</p>
-<p>For example, given the following input without a trailing comma</p>
+</dd>
+<dt id="Adding-and-Deleting-Trailing-Commas"><b>Adding and Deleting Trailing Commas</b></dt>
+<dd>
-<pre><code> bless {
- B => $B,
- Root => $Root
- } => $package;</code></pre>
+<p>A <b>trailing comma</b> is a comma following the last item of a list. Perl allows trailing commas but they are not required. Including them can sometimes simplify the maintenance of large or complex lists, and help display structure. But they may not be appropriate in all lists, for example in a list which always has just one term. By default, perltidy does not add or delete trailing commas, but it is possible to manipulate them with the following set of related parameters:</p>
+
+<ul>
-<p>we can add a trailing comma after the variable <code>$Root</code> using</p>
+<li><p><b>--want-trailing-commas=s, -wtc=s</b> - defines where trailing commas are wanted (the style)</p>
-<pre><code> # perltidy -wtc=m -atc
- bless {
- B => $B,
- Root => $Root,
- } => $package;</code></pre>
+</li>
+<li><p><b>--add-trailing-commas, -atc</b> - gives permission to add trailing commas to match the style wanted</p>
+
+</li>
+<li><p><b>--delete-trailing-commas, -dtc</b> - gives permission to delete trailing commas which do not match the style wanted</p>
+
+</li>
+</ul>
+
+<p>The parameter <b>--want-trailing-commas=s</b>, or <b>-wtc=s</b>, defines a preferred style. The string <b>s</b> indicates which lists should get trailing commas, as follows:</p>
+
+<pre><code> s=1 or '*' : every list
+ s=m a multiline list
+ s=b a multiline list, bare trailing comma
+ s=i a multiline list, bare trailing comma, about one comma per line
+ s=h a multiline list, bare trailing comma, about one key=>value
+ pair per line
+ s=0 : no list
+
+ s=' ' or not defined : leave trailing commas unchanged [DEFAULT]</code></pre>
+
+<p>where:</p>
+
+<ul>
+
+<li><p>A <b>list</b> here is basically taken to be a container of items (parens, square brackets, or braces), which is not a code block, which contains one or more commas or fat commas. These parameters only apply to something that fits this definition of a list.</p>
+
+<p>A paren-less list of parameters is not a list by this definition, so these parameters do not apply to a paren-less list.</p>
+
+</li>
+<li><p>A <b>multiline list</b> is a list for which the opening and closing brackets on different lines.</p>
+
+</li>
+<li><p>A <b>bare trailing comma</b> is a comma which is at the end of a line. That is, the closing container token follows on a different line. So a list with a bare trailing comma is a special case of a multiline list.</p>
+
+</li>
+<li><p>In fact the above options for trailing commas can be seen as a hierarchy of nesting sets which can be expressed as</p>
+
+<pre><code> 1 > m > b > i > h > 0</code></pre>
+
+<p>This indicates that multiline trailing commas <b>m</b> are a subset of all trailing commas, and bare trailing commas <b>b</b> are a subset of all multiline trailing commas, and so on.</p>
+
+</li>
+</ul>
+
+<p>This parameter by itself only indicates where trailing commas are wanted. Perltidy only adds these trailing commas if permission is granted by setting the flag <b>--add-trailing-commas</b>, or <b>-atc</b>. And perltidy only removes unwanted trailing commas if the flag <b>--delete-trailing-commas</b>, or <b>-dtc</b> is set.</p>
+
+<p>Here are some example parameter combinations and their meanings</p>
+
+<pre><code> -wtc=0 -dtc : delete all trailing commas
+ -wtc=1 -atc : add trailing commas to all lists
+ -wtc=m -atc : add trailing commas to all multiline lists
+ (single line lists remain unchanged)
+ -wtc=b -atc : add commas so that all lists whose closing
+ bracket starts a new line have trailing commas
+ -wtc=b -dtc : all trailing commas which are not bare
+ (not followed by a newline) get deleted.
+ -wtc=b -atc -dtc : do both of the above operations so that
+ all trailing commas are bare</code></pre>
+
+<p>For example, given the following input</p>
+
+<pre><code> $wine_list = $top->Box(
+ "-length" => 5,
+ "-width" => 3
+ )->select( "red", "white", "gold", );</code></pre>
-<p>This could also be achieved in this case with <b>-wtc=b</b> instead of <b>-wtc=m</b> because the trailing comma here is bare (separated from its closing brace by a newline). And it could also be achieved with <b>-wtc=h</b> because this particular list is a list of key=>value pairs.</p>
+<p>we have</p>
-<p>The above styles should cover the main of situations of interest, but it is possible to apply a different style to each type of container token by including an opening token ahead of the style character in the above table. For example</p>
+<pre><code> # perltidy -wtc=b -atc -dtc
+ $wine_list = $top->Box(
+ "-length" => 5,
+ "-width" => 3,
+ )->select( "red", "white", "gold" );</code></pre>
+
+<p>A comma was added after the <code>3</code>, since it is bare, and a comma was removed after <code>"gold"</code>, since it not bare.</p>
+
+<p>It is possible to apply a different style to each type of container token by including an opening token ahead of the style character in the above table. For example</p>
<pre><code> -wtc='(m [b'</code></pre>
-<p>means that lists within parens should have multi-line trailing commas, and that lists within square brackets have bare trailing commas. Since there is no specification for curly braces in this example, their trailing commas would remain unchanged.</p>
+<p>means that lists within parens should have multiline trailing commas, and that lists within square brackets have bare trailing commas. Since there is no specification for curly braces in this example, their trailing commas would remain unchanged.</p>
<p>For parentheses, an additional item of information which can be given is an alphanumeric letter which is used to limit the selection further depending on the type of token immediately before the opening paren. The possible letters are currently 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings for matching whatever precedes an opening paren:</p>
-<pre><code> 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+<pre><code> 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.</code></pre>
<p>These are the same codes used for <b>--line-up-parentheses-inclusion-list</b>. For example,</p>
-<pre><code> -wtc = 'w(m'</code></pre>
+<pre><code> -wtc='w(m'</code></pre>
+
+<p>means that trailing commas are wanted for multiline parenthesized lists following a function call or keyword.</p>
+
+<p>Finally, a leading <b>+</b> can be placed on any term to indicate that it only applies when adding commas. A leading <b>-</b> indicates that it only applies when deleting commas. For example,</p>
-<p>means that trailing commas are wanted for multi-line parenthesized lists following a function call or keyword.</p>
+<pre><code> -wtc='+h -b' -atc -dtc</code></pre>
-<p>Here are some points to note regarding adding and deleting trailing commas:</p>
+<p>means that missing trailing commas should be added to lists of key => value pairs, and trailing commas which are not bare should be removed. No other changes are made. When both plus and minus terms are used like this, they must not be in conflict. There is no conflict in this example because the trailing comma locations of the key=>value pairs selected by the <b>+h</b> term are a subset of all bare trailing commas, and thus will not be deleted by the <b>-b</b> term. The general rule is that the letter of the plus term should occur after the letter of the minus term in the hierarchical nesting order, <b>1 > m > b > i > h > 0</b>.</p>
+
+<p><b>Some points to note</b> regarding adding and deleting trailing commas:</p>
<ul>
-<li><p>For the implementation of these parameters, a <b>list</b> is basically taken to be a container of items (parens, square brackets, or braces), which is not a code block, with one or more commas. These parameters only apply to something that fits this definition of a list.</p>
+<li><p>It is recommended to also use the <b>--converge</b> parameter when adding and/or deleting trailing commas, especially if the formatter may be making other line break changes at the same time. The reason is that the decision regarding whether or not a list is multiline or bare is made based on the <b>input</b> stream if only one iteration is made, which is the default.</p>
-<p>Note that a paren-less list of parameters is not a list by this definition, so these parameters have no effect on a peren-less list.</p>
+<p>When iterations are requested with the <b>--converge</b> parameter, any comma deletion operations are postponed until the start of the <b>second iteration</b>, after most changes in line breaks have been made.</p>
-<p>Another consequence is that if the only comma in a list is deleted, then it cannot later be added back with these parameters because the container no longer fits this definition of a list. For example, given</p>
+<p>To illustrate, if we start with</p>
-<pre><code> my ( $self, ) = @_;</code></pre>
+<pre><code> f(
+ a => 1,
+ b => 2, );</code></pre>
-<p>and if we remove the comma with</p>
+<p>and attempt to delete non-bare commas,</p>
-<pre><code> # perltidy -wtc=m -dtc
- my ( $self ) = @_;</code></pre>
+<pre><code> # perltidy -wtc=b -dtc
+ f(
+ a => 1,
+ b => 2
+ );</code></pre>
-<p>then we cannot use these trailing comma controls to add this comma back.</p>
+<p>we delete a comma which has become bare, which is not what is wanted. This happened because the change was based on the starting rather than the final line breaks. Running with <b>--converge</b> gives the desired result:</p>
-</li>
-<li><p>By <b>multiline</b> list is meant a list for which the first comma and trailing comma are on different lines.</p>
+<pre><code> # perltidy -wtc=b -dtc --converge
+ f(
+ a => 1,
+ b => 2,
+ );</code></pre>
+
+<p>because comma changes are based on the line breaks after the first iteration.</p>
+
+<p>A parameter <b>--delay-trailing-comma-operations</b>, or <b>-dtco</b>, is available to control this behavior if desired. Negating this parameter, with <b>-ndtco</b>, tells perltidy to always use the starting state to make decisions regarding comma addition and deletion, even when iterations are requested. This should not normally be necessary.</p>
</li>
-<li><p>A <b>bare</b> trailing comma is a comma which is at the end of a line. That is, the closing container token follows on a different line. So a list with a bare trailing comma is a special case of a multi-line list.</p>
+<li><p>Perltidy does not add a trailing comma in some <b>edge cases</b> which appear to be near a stability limit. So if a comma is unexpectedly not added, this is probably the reason.</p>
</li>
-<li><p>The decision regarding whether or not a list is multi-line or bare is made based on the <b>input</b> stream. In some cases it may take an iteration or two to reach a final state.</p>
+<li><p>If the parameter <b>--break-at-trailing-comma-types</b>, or <b>-btct</b>. is also employed, it operates on the state after any adding or deleting of commas. And it will allow trailing commas to be added in most edge cases. For example, given the following input text</p>
+
+<pre><code> plot(
+ 'g', Canvas => $overview_canvas
+ );</code></pre>
+
+<p>formatting with <code>-wtc=f(b) -atc</code> will not add a trailing comma because the list will be flattened and the comma will not remain bare. But we can add a trailing comma, and keep the container open, with</p>
+
+<pre><code> # perltidy -wtc='f(b' -atc -btct='f(b'
+ plot(
+ 'g', Canvas => $overview_canvas,
+ );</code></pre>
+
+<p>As another example, given the same text on a single line without a trailing comma</p>
+
+<pre><code> plot( 'g', Canvas => $overview_canvas );</code></pre>
+
+<p>we can add a trailing comma and break the container open with</p>
+
+<pre><code> # perltidy -wtc=1 -atc -btct=1
+ plot(
+ 'g', Canvas => $overview_canvas,
+ );</code></pre>
+
+<p>After that, we could use <code>-btct='f(b'</code> to keep the container open.</p>
</li>
<li><p>When using these parameters for the first time it is a good idea to practice on some test scripts and verify that the results are as expected.</p>
</li>
-<li><p>Since the default behavior is not to add or delete commas, these parameters can be useful on a temporary basis for reformatting a script.</p>
+</ul>
+
+<p><b>Special Considerations for Lone Trailing Commas</b></p>
+
+<p>Adding or deleting the only comma in a list can have some implications which need to be explained and possibly controlled. Two additional controls are available for these lone commas:</p>
+
+<ul>
+
+<li><p><b>--add-lone-trailing-commas, -altc</b> - gives permission to add a comma if it will be the only comma. This is on by default and explained below.</p>
+
+</li>
+<li><p><b>--delete-lone-trailing-commas, -dltc</b> - gives permission to delete the only comma in a list. This is on by default and explained below.</p>
</li>
</ul>
+<p>One issue with deleting a lone comma is that if it is deleted, then it might not be possible add it back automatically since perltidy uses the existence of commas to help locate containers where commas are appropriate. For example, given</p>
+
+<pre><code> my ( $self, ) = @_;</code></pre>
+
+<p>and if we remove the comma with</p>
+
+<pre><code> # perltidy -wtc=m -dtc
+ my ( $self ) = @_;</code></pre>
+
+<p>then we cannot use the trailing comma controls to add this comma back. The parameter <b>--delete-lone-trailing-commas</b> allows such a comma to be deleted, and is on by default, but can be turned off to prevent this. This might be useful if one is experimenting with formatting options and wants to restrict testing to operations which are reversible. Note that this parameter is a fine-tuning control for <b>--delete-trailing-commas</b> which must also be set for it to have any effect.</p>
+
+<p>However, if a single item in a list is itself is a list with multiple lines, such as the item in braces here</p>
+
+<pre><code> $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ }
+ );</code></pre>
+
+<p>then perltidy can add and/or delete a lone comma:</p>
+
+<pre><code> # perltidy -atc -wtc=b
+ $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ },
+ );</code></pre>
+
+<p>But it turns out that these cases usually coincide with situations where the <b>--weld-nested-containers</b>, or <b>-wn</b>, parameter would apply, and adding such commas can block welding. For example, the <b>-wn</b> parameter would succeed on the first of the above snippets, but it would fail on the second because of the added trailing comma.</p>
+
+<p>The parameter <b>--add-lone-trailing-commas</b>, or <b>-altc</b> allows these commas to be added, provide that <b>--add-trailing-commas</b> is also set. It is on by default. Users of <b>-wn</b> may want to turn it off with <b>--noadd-lone-trailing-commas</b>, <b>-naltc</b> to prevent such commas from being added.</p>
+
+<p>If such commas do get added, then can be removed to allow welding with the control described in the next section.</p>
+
</dd>
<dt id="dwic---delete-weld-interfering-commas"><b>-dwic</b>, <b>--delete-weld-interfering-commas</b></dt>
<dd>
<p>If the closing tokens of two nested containers are separated by a comma, then welding requested with <b>--weld-nested-containers</b> cannot occur. Any commas in this situation are optional trailing commas and can be removed with <b>-dwic</b>. For example, a comma in this script prevents welding:</p>
<pre><code> # perltidy -wn
- skip_symbols(
- [ qw(
- Perl_dump_fds
- Perl_ErrorNo
- Perl_GetVars
- PL_sys_intern
- ) ],
+ $self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ },
);</code></pre>
-<p>Using <b>-dwic</b> removes the comma and allows welding:</p>
+<p>Adding <b>-dwic</b> removes the comma and allows welding:</p>
<pre><code> # perltidy -wn -dwic
- skip_symbols( [ qw(
- Perl_dump_fds
- Perl_ErrorNo
- Perl_GetVars
- PL_sys_intern
- ) ] );</code></pre>
+ $self->make_grammar( {
+ iterator => $self->_iterator,
+ parser => $self,
+ } );</code></pre>
-<p>Since the default is not to add or delete commas, this feature is off by default. Here are some points to note about the <b>-dwic</b> parameter</p>
+<p>This feature is off by default. Here are some points to note about the <b>-dwic</b> parameter</p>
<ul>
-<li><p>This operation is not reversible, so please check results of using this parameter carefully.</p>
+<li><p>This operation is not always reversible, so please check results of using this parameter carefully.</p>
</li>
<li><p>Removing this type of isolated trailing comma is necessary for welding to be possible, but not sufficient. So welding will not always occur where these commas are removed.</p>
+</li>
+<li><p>This operation is independent of <b>--add-trailing-commas</b> and <b>--delete-trailing-commas</b>. If it conflicts with any of those settings, it has priority.</p>
+
+</li>
+</ul>
+
+</dd>
+</dl>
+
+<h2 id="Adding-and-Deleting-Interbracket-Arrows">Adding and Deleting Interbracket Arrows</h2>
+
+<p>In the following expression, the arrow operator '->' between the closing and opening brackets of hash keys and array indexes are optional:</p>
+
+<pre><code> return $self->{'commandline'}->{'args'}->[0]->[0]->{'hgroups'};</code></pre>
+
+<p>These will be called <b>interbracket arrows</b> here, for lack of a better term. Perltidy will not change them by default, but they can be added or removed with the following parameters.</p>
+
+<dl>
+
+<dt id="dia---delete-interbracket-arrows"><b>-dia</b>, <b>--delete-interbracket-arrows</b></dt>
+<dd>
+
+<p>This parameter deletes interbracket arrows. Applied to the above example we have</p>
+
+<pre><code> # perltidy -dia
+ return $self->{'commandline'}{'args'}[0][0]{'hgroups'};</code></pre>
+
+<p>By default this applies to all interbracket arrows, but selective deletion is possible with controls described below.</p>
+
+</dd>
+<dt id="aia---add-interbracket-arrows"><b>-aia</b>, <b>--add-interbracket-arrows</b></dt>
+<dd>
+
+<p>This parameter adds interbracket arrows. Applied to the line of code above, we get back the original line.</p>
+
+<pre><code> # perltidy -aia
+ return $self->{'commandline'}->{'args'}->[0]->[0]->{'hgroups'};</code></pre>
+
+<p>Selective changes can be made with controls described below.</p>
+
+</dd>
+<dt id="ias-s---interbracket-arrow-style-s"><b>-ias=s</b>, <b>--interbracket-arrow-style=s</b></dt>
+<dd>
+
+<p>By default the <b>-add-</b> and <b>-delete-</b> parameters apply to all interbracket arrows.</p>
+
+<p>An optional style can be specified with this parameter string <b>s</b>. In that case the parameters <b>--add-interbracket-arrows</b> and <b>--delete-interbracket-arrows</b> only apply where they would bring the formatting into agreement with the specified style. They may both be used in a single run if a mixed style is specified since there is no conflict.</p>
+
+<p>The style string <b>s</b> gives a graphical description of the desired style. It lists up to four possible pairs of bracket types with an optional arrow between them. For example:</p>
+
+<pre><code> -ias='][ }->{'</code></pre>
+
+<p>This means no arrows are wanted between '][' but arrows should be between '}{'. And it means that the unlisted pairs ']{' and '}[' should remain unchanged, either with or without arrows.</p>
+
+<p>In this particular example, if the parameter <b>--delete-interbracket-arrows</b> is used, then only arrows like ']->[' will be deleted, since they are the only ones which disagree with the style.</p>
+
+<p>And likewise, if <b>--add-interbracket-arrows</b> is used, then arrows will only be inserted between brackets like '}{' to bring the formatting into conformity with the style in this example.</p>
+
+<p>Spaces in the string <b>s</b> are optional. They are ignored when the expression is parsed.</p>
+
+<p>The style corresponding to all possible arrows is</p>
+
+<pre><code> -ias=']->[ ]->{ }->[ }->{'</code></pre>
+
+<p>For convenience, this may also be requested with <b>-ias=1</b> or <b>-ias='*'</b>.</p>
+
+<p>The style corresponding to no interbracket arrows is</p>
+
+<pre><code> -ias='] [ ] { } [ } {'</code></pre>
+
+<p>which may also be requested with <b>-ias=0</b>.</p>
+
+</dd>
+<dt id="wia---warn-interbracket-arrows"><b>-wia</b>, <b>--warn-interbracket-arrows</b></dt>
+<dd>
+
+<p>If this parameter is set, then a message is written to the error file in the following cases:</p>
+
+<ul>
+
+<li><p>If an arrow is added or deleted by an add or delete command.</p>
+
+</li>
+<li><p>If a style is defined and an arrow would have been added or deleted if requested. So for example, the command</p>
+
+<pre><code> perltidy -wia -ias=']['</code></pre>
+
+<p>will show where a file has arrows like ]->[' since they do not match the style, but no changes will be made because the delete command <b>-dia</b> has not been given. And</p>
+
+<pre><code> perltidy -wia -ias=0</code></pre>
+
+<p>will warn if any arrows exist, since the flag -ias=0 means that no arrows are wanted.</p>
+
+</li>
+</ul>
+
+</dd>
+<dt id="iac-n---interbracket-arrow-complexity-n"><b>-iac=n</b>, <b>--interbracket-arrow-complexity=n</b></dt>
+<dd>
+
+<p>This parameter makes it possible to skip adding or deleting arrows following a container which is complex in some sense. Three levels of complexity can be specified with the integer <b>n</b>, as follows:</p>
+
+<pre><code> n=0 the left container must contain be a single thing (token)
+ n=1 the left container must not contain other containers [DEFAULT]
+ n=2 the left container may contain anything</code></pre>
+
+<p>Some examples:</p>
+
+<pre><code> # Container complexity
+ {'commandline'} 0 single token OK by default
+ { $type . $name } 1 multiple tokens OK by default
+ [ $plot{'x-axis'} - 1 ] 2 contains a container SKIPPED by default</code></pre>
+
+<p>So, with the default complexity level of 1, an arrow could be added or deleted following the first two of these containers but not the third.</p>
+
+</dd>
+</dl>
+
+<p><b>Some points to consider</b> when working with these parameters are:</p>
+
+<ul>
+
+<li><p>There are no known bugs, but this is a relatively new feature. So please carefully check file differences and run tests when interbracket arrows are added or removed.</p>
+
+</li>
+<li><p>For some unusual spacing parameters, it could take an extra iteration for the spaces between brackets to reach their final state after arrows are added or deleted.</p>
+
+</li>
+<li><p>Any comments between brackets will prevent the adding and deleting of arrows.</p>
+
</li>
</ul>
+<h2 id="Missing-Else-Blocks">Missing Else Blocks</h2>
+
+<p>A defensive programming technique is to require that every <b>if-elsif-</b> chain be terminated with an <b>else</b> block, even though it is not strictly required. This helps insure that there are no holes in the logic.</p>
+
+<p>For example, consider the following snippet:</p>
+
+<pre><code> my $type = get_cards();
+ if ( $type = 1 ) { action("hold 'em") }
+ elsif ( $type = 2 ) { action("fold 'em") }
+ elsif ( $type = 3 ) { action("walk away") }</code></pre>
+
+<p>What if the variable <b>$type</b> is some other value? It might have been obvious that this was okay when the code was first written, but it might not be so clear when the code is reviewed a few years later. A terminal <b>else</b> block with a comment would help clarify things.</p>
+
+<p>The parameters in this section can help by either issuing a warning if an <b>else</b> is missing, or even inserting an empty <b>else</b> block where one is missing, or both.</p>
+
+<dl>
+
+<dt id="wme---warn-missing-else"><b>-wme</b>, <b>--warn-missing-else</b></dt>
+<dd>
+
+<p>This flag tells perltidy to issue a warning if a program is missing a terminal <b>else</b> block. The default is not to issue such a warning.</p>
+
+</dd>
+<dt id="ame---add-missing-else"><b>-ame</b>, <b>--add-missing-else</b></dt>
+<dd>
+
+<p>This flag tells perltidy to output an empty else block wherever a program is missing a terminal <b>else</b> block. To get a warning when this is done you should also set <b>-wme</b>. The default is not to add missing else blocks.</p>
+
+</dd>
+<dt id="amec-s---add-missing-else-comment-s"><b>-amec=s</b>, <b>--add-missing-else-comment=s</b></dt>
+<dd>
+
+<p>This string is a side comment which will be written to highlight a new empty else block. The default is:</p>
+
+<pre><code> -amec='##FIXME - added with perltidy -ame'</code></pre>
+
</dd>
</dl>
+<p>For example, on the above example we can add a missing else and also get a warning notice with:</p>
+
+<pre><code> # perltidy -ame -wme
+ my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else {
+ ##FIXME - added with perltidy -ame
+ }</code></pre>
+
+<p>Any <b>##FIXME</b> comments created in this way should be reviewed and changed appropriately. For example, one might decide that the code fine as is, and just change the comment to indicate that nothing has been overlooked:</p>
+
+<pre><code> my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else {
+ # ok - no worries
+ }</code></pre>
+
+<p>Or maybe a deeper analysis reveals that something was missed:</p>
+
+<pre><code> my $type = get_cards();
+ if ( $type == 1 ) { action("hold 'em") }
+ elsif ( $type == 2 ) { action("fold 'em") }
+ elsif ( $type == 3 ) { action("walk away") }
+ else { action("run") }</code></pre>
+
+<p>Sometimes it turns out that the else block should not reachable, in which case an error exit might be appropriate. In any case, having the <b>else</b> block can improve code maintainability.</p>
+
<h2 id="Retaining-or-Ignoring-Existing-Line-Breaks">Retaining or Ignoring Existing Line Breaks</h2>
<p>Several additional parameters are available for controlling the extent to which line breaks in the input script influence the output script. In most cases, the default parameter values are set so that, if a choice is possible, the output style follows the input style. For example, if a short logical container is broken in the input script, then the default behavior is for it to remain broken in the output script.</p>
<p>For example, given this snippet:</p>
<pre><code> return unless $cmd = $cmd || ($dot
- && $Last_Shell) || &prompt('|');
+ && $Last) || &prompt('|');
# perltidy -bol [default]
return
unless $cmd = $cmd
|| ( $dot
- && $Last_Shell )
+ && $Last )
|| &prompt('|');
# perltidy -nbol
- return unless $cmd = $cmd || ( $dot && $Last_Shell ) || &prompt('|');</code></pre>
+ return unless $cmd = $cmd || ( $dot && $Last ) || &prompt('|');</code></pre>
</dd>
<dt id="bom---break-at-old-method-breakpoints"><b>-bom</b>, <b>--break-at-old-method-breakpoints</b></dt>
<p>By default, a method call arrow <code>-></code> is considered a candidate for a breakpoint, but method chains will fill to the line width before a break is considered. With <b>-bom</b>, breaks before the arrow are preserved, so if you have pre-formatted a method chain:</p>
-<pre><code> my $q = $rs
- ->related_resultset('CDs')
- ->related_resultset('Tracks')
- ->search({
- 'track.id' => {-ident => 'none_search.id'},
- })->as_query;</code></pre>
+<pre><code> # perltidy -bom
+ $Document
+ ->schild(0)
+ ->schildren();</code></pre>
-<p>It will <b>keep</b> these breaks, rather than become this:</p>
+<p>the flag <b>-bom</b> will <b>keep</b> these line breaks, rather than become this:</p>
-<pre><code> my $q = $rs->related_resultset('CDs')->related_resultset('Tracks')->search({
- 'track.id' => {-ident => 'none_search.id'},
- })->as_query;</code></pre>
+<pre><code> # perltidy [DEFAULT]
+ $Document->schild(0)->schildren();</code></pre>
<p>This flag will also look for and keep a 'cuddled' style of calls, in which lines begin with a closing paren followed by a call arrow, as in this example:</p>
-<pre><code> # perltidy -bom -wn
- my $q = $rs->related_resultset(
- 'CDs'
- )->related_resultset(
- 'Tracks'
- )->search( {
- 'track.id' => { -ident => 'none_search.id' },
- } )->as_query;</code></pre>
-
-<p>You may want to include the <b>-weld-nested-containers</b> flag in this case to keep nested braces and parens together, as in the last line.</p>
+<pre><code> # perltidy -bom
+ my $q = $rs->related_resultset(
+ 'CDs'
+ )->related_resultset(
+ 'Tracks'
+ )->search(
+ {
+ 'track.id' => { -ident => 'none_search.id' },
+ }
+ )->as_query;</code></pre>
</dd>
<dt id="bos---break-at-old-semicolon-breakpoints"><b>-bos</b>, <b>--break-at-old-semicolon-breakpoints</b></dt>
<pre><code> $z = sqrt( $x**2 + $y**2 );</code></pre>
-<p>The result using <b>perltidy -bos</b> keeps the isolated semicolon:</p>
+<p>Using the <b>-bos</b> flag keeps the isolated semicolon:</p>
-<pre><code> $z = sqrt( $x**2 + $y**2 )
+<pre><code> # perltidy -bos
+ $z = sqrt( $x**2 + $y**2 )
;</code></pre>
+<p>The extra continuation indentation spaces on the semicolon can be removed by also setting <b>--noindent-leading-semicolon</b>.</p>
+
+<pre><code> # perltidy -bos -nils
+ $z = sqrt( $x**2 + $y**2 )
+ ;</code></pre>
+
<p>The default is not to do this, <b>-nbos</b>.</p>
</dd>
<dt id="bok---break-at-old-keyword-breakpoints"><b>-bok</b>, <b>--break-at-old-keyword-breakpoints</b></dt>
<dd>
-<p>By default, perltidy will retain a breakpoint before keywords which may return lists, such as <code>sort</code> and <map>. This allows chains of these operators to be displayed one per line. Use <b>-nbok</b> to prevent retaining these breakpoints.</p>
+<p>By default, perltidy will retain a breakpoint before keywords which may return lists, such as <code>sort</code> and <code>map</code>. This allows chains of these operators to be displayed one per line. Use <b>-nbok</b> to prevent retaining these breakpoints.</p>
</dd>
<dt id="bot---break-at-old-ternary-breakpoints"><b>-bot</b>, <b>--break-at-old-ternary-breakpoints</b></dt>
<p>It is possible to be more specific in matching parentheses by preceding them with a letter. The possible letters are 'k', 'K', 'f', 'F', 'w', and 'W', with these meanings (these are the same as used in the <b>--weld-nested-exclusion-list</b> and <b>--line-up-parentheses-exclusion-list</b> parameters):</p>
-<pre><code> 'k' matches if the previous nonblank token is a perl built-in keyword (such as 'if', 'while'),
- 'K' matches if 'k' does not, meaning that the previous token is not a keyword.
- 'f' matches if the previous token is a function other than a keyword.
+<pre><code> 'k' matches if the previous nonblank token is a perl keyword
+ (such as 'if', 'while'),
+ 'K' matches if 'k' does not: previous token is not a keyword
+ 'f' matches if previous token is a function (not a keyword)
'F' matches if 'f' does not.
'w' matches if either 'k' or 'f' match.
'W' matches if 'w' does not.</code></pre>
<dt id="blbs-n---blank-lines-before-subs-n"><b>-blbs=n</b>, <b>--blank-lines-before-subs=n</b></dt>
<dd>
-<p>The parameter <b>-blbs=n</b> requests that least <b>n</b> blank lines precede a sub definition which does not follow a comment and which is more than one-line long. The default is <-blbs=1>. <b>BEGIN</b> and <b>END</b> blocks are included.</p>
+<p>The parameter <b>-blbs=n</b> requests that least <b>n</b> blank lines precede a sub definition which does not follow a comment and which is more than one-line long. The default is <b>-blbs=1</b>. <b>BEGIN</b> and <b>END</b> blocks are included.</p>
<p>The requested number of blanks statement will be inserted regardless of the value of <b>--maximum-consecutive-blank-lines=n</b> (<b>-mbl=n</b>) with the exception that if <b>-mbl=0</b> then no blanks will be output.</p>
<dt id="bbs---blanks-before-subs"><b>-bbs</b>, <b>--blanks-before-subs</b></dt>
<dd>
-<p>For compatibility with previous versions, <b>-bbs</b> or <b>--blanks-before-subs</b> is equivalent to <i>-blbp=1</i> and <i>-blbs=1</i>.</p>
+<p>For compatibility with previous versions, <b>-bbs</b> or <b>--blanks-before-subs</b> is equivalent to <b>-blbp=1</b> and <b>-blbs=1</b>.</p>
-<p>Likewise, <b>-nbbs</b> or <b>--noblanks-before-subs</b> is equivalent to <i>-blbp=0</i> and <i>-blbs=0</i>.</p>
+<p>Likewise, <b>-nbbs</b> or <b>--noblanks-before-subs</b> is equivalent to <b>-blbp=0</b> and <b>-blbs=0</b>.</p>
</dd>
<dt id="bbb---blanks-before-blocks"><b>-bbb</b>, <b>--blanks-before-blocks</b></dt>
<p>Now suppose the script continues to be developed, but at some later date we decide we don't want these spaces after all. We might expect that running with the flags <b>-blao=0</b> and <b>-blbc=0</b> will undo them. However, by default perltidy retains single blank lines, so the blank lines remain.</p>
-<p>We can easily fix this by telling perltidy to ignore old blank lines by including the added parameter <b>-kbl=0</b> and rerunning. Then the unwanted blank lines will be gone. However, this will cause all old blank lines to be ignored, perhaps even some that were added by hand to improve formatting. So please be cautious when using these parameters.</p>
+<p>We can easily fix this by telling perltidy to ignore old blank lines by including the added parameter <b>--keep-old-blank-lines=0</b> and rerunning. Then the unwanted blank lines will be gone. However, this will cause all old blank lines to be ignored, perhaps even some that were added by hand to improve formatting. So please be cautious when using these parameters.</p>
</dd>
<dt id="mbl-n---maximum-consecutive-blank-lines-n"><b>-mbl=n</b> <b>--maximum-consecutive-blank-lines=n</b></dt>
<dd>
-<p>This parameter specifies the maximum number of consecutive blank lines which will be output within code sections of a script. The default is n=1. If the input file has more than n consecutive blank lines, the number will be reduced to n except as noted above for the <b>-blbp</b> and <b>-blbs</b> parameters. If <b>n=0</b> then no blank lines will be output (unless all old blank lines are retained with the <b>-kbl=2</b> flag of the next section).</p>
+<p>This parameter specifies the maximum number of consecutive blank lines which will be output within code sections of a script. The default is n=1. If the input file has more than n consecutive blank lines, the number will be reduced to <b>n</b> except as noted above for the <b>--blank-lines-before-subs</b> and <b>--blank-lines-before-subs</b> parameters. If <b>n=0</b> then no blank lines will be output (unless all old blank lines are retained with the <b>--keep-old-blank-lines=2</b> flag of the next section).</p>
<p>This flag obviously does not apply to pod sections, here-documents, and quotes.</p>
<p>The possible values of <b>n</b> are:</p>
<pre><code> n=0 ignore all old blank lines
- n=1 stable: keep old blanks, but limited by the value of the B<-mbl=n> flag
- n=2 keep all old blank lines, regardless of the value of the B<-mbl=n> flag</code></pre>
+ n=1 stable: keep old blanks, but limited by the B<-mbl=n> flag
+ n=2 keep all old blank lines, regardless of the B<-mbl=n> flag</code></pre>
<p>The default is <b>n=1</b>.</p>
<dt id="sob---swallow-optional-blank-lines"><b>-sob</b>, <b>--swallow-optional-blank-lines</b></dt>
<dd>
-<p>This is equivalent to <b>kbl=0</b> and is included for compatibility with previous versions.</p>
+<p>This is equivalent to <b>--keep-old-blank-lines=0</b> and is included for compatibility with previous versions.</p>
</dd>
<dt id="nsob---noswallow-optional-blank-lines"><b>-nsob</b>, <b>--noswallow-optional-blank-lines</b></dt>
<dd>
-<p>This is equivalent to <b>kbl=1</b> and is included for compatibility with previous versions.</p>
+<p>This is equivalent to <b>--keep-old-blank-lines=1</b> and is included for compatibility with previous versions.</p>
</dd>
</dl>
<p>Before describing the meaning of the parameters in detail let us look at an example which is formatted with default parameter settings.</p>
-<pre><code> print "Entering test 2\n";
- use Test;
- use Encode qw(from_to encode decode
- encode_utf8 decode_utf8
- find_encoding is_utf8);
- use charnames qw(greek);
- my @encodings = grep( /iso-?8859/, Encode::encodings() );
- my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
- my @source = qw(ascii iso8859-1 cp1250);
- my @destiny = qw(cp1047 cp37 posix-bc);
- my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
- my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
- return unless ($str);</code></pre>
+<pre><code> print "Entering test 2\n";
+ use Test;
+ use Encode qw(from_to encode decode
+ encode_utf8 decode_utf8
+ find_encoding is_utf8);
+ use charnames qw(greek);
+ my @encodings = grep( /iso-?8859/, Encode::encodings() );
+ my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
+ my @source = qw(ascii iso8859-1 cp1250);
+ my @destiny = qw(cp1047 cp37 posix-bc);
+ my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
+ my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
+ return unless ($str);</code></pre>
<p>using <b>perltidy -kgb</b> gives:</p>
-<pre><code> print "Entering test 2\n";
- <----------this blank controlled by -kgbb
- use Test;
- use Encode qw(from_to encode decode
- encode_utf8 decode_utf8
- find_encoding is_utf8);
- use charnames qw(greek);
- <---------this blank controlled by -kgbi
- my @encodings = grep( /iso-?8859/, Encode::encodings() );
- my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
- my @source = qw(ascii iso8859-1 cp1250);
- my @destiny = qw(cp1047 cp37 posix-bc);
- my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
- my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
- <----------this blank controlled by -kgba
- return unless ($str);</code></pre>
+<pre><code> print "Entering test 2\n";
+ <----------this blank controlled by -kgbb
+ use Test;
+ use Encode qw(from_to encode decode
+ encode_utf8 decode_utf8
+ find_encoding is_utf8);
+ use charnames qw(greek);
+ <---------this blank controlled by -kgbi
+ my @encodings = grep( /iso-?8859/, Encode::encodings() );
+ my @character_set = ( '0' .. '9', 'A' .. 'Z', 'a' .. 'z' );
+ my @source = qw(ascii iso8859-1 cp1250);
+ my @destiny = qw(cp1047 cp37 posix-bc);
+ my @ebcdic_sets = qw(cp1047 cp37 posix-bc);
+ my $str = join( '', map( chr($_), 0x20 .. 0x7E ) );
+ <----------this blank controlled by -kgba
+ return unless ($str);</code></pre>
<p>Blank lines have been introduced around the <b>my</b> and <b>use</b> sequences. What happened is that the default keyword list includes <b>my</b> and <b>use</b> but not <b>print</b> and <b>return</b>. So a continuous sequence of nine <b>my</b> and <b>use</b> statements was located. This number exceeds the default threshold of five, so blanks were placed before and after the entire group. Then, since there was also a subsequence of six <b>my</b> lines, a blank line was introduced to separate them.</p>
<pre><code> -lp -bl -noll -pt=2 -bt=2 -sbt=2 -icp</code></pre>
-<p>To use this style with <b>-xlp</b> instead of <b>-lp</b> use <b>-gnu -xlp</b>.</p>
+<p>To use this style with <b>-xlp</b> (<b>--extended-line-up-parentheses</b>) instead of <b>-lp</b> (<b>--line-up-parentheses</b>) use <b>-gnu -xlp</b>.</p>
</dd>
<dt id="pbp---perl-best-practices"><b>-pbp</b>, <b>--perl-best-practices</b></dt>
<p><b>-pbp</b> is an abbreviation for the parameters in the book <b>Perl Best Practices</b> by Damian Conway:</p>
-<pre><code> -l=78 -i=4 -ci=4 -st -se -vt=2 -cti=0 -pt=1 -bt=1 -sbt=1 -bbt=1 -nsfs -nolq
+<pre><code> -l=78 -i=4 -ci=4 -st -se -vt=2 -cti=0 -pt=1 -bt=1 -sbt=1 -bbt=1
+ -nsfs -nolq
-wbb="% + - * / x != == >= <= =~ !~ < > | & =
**= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x="</code></pre>
-<p>Please note that this parameter set includes -st and -se flags, which make perltidy act as a filter on one file only. These can be overridden by placing <b>-nst</b> and/or <b>-nse</b> after the -pbp parameter.</p>
+<p>Please note that this parameter set includes <b>-st</b> (<b>--standard-output</b>) and <b>-se</b> (<b>--standard-error-output</b>) flags, which make perltidy act as a filter on one file only. These can be overridden by placing <b>-nst</b> and/or <b>-nse</b> after the -pbp parameter.</p>
-<p>Also note that the value of continuation indentation, -ci=4, is equal to the value of the full indentation, -i=4. It is recommended that the either (1) the parameter <b>-ci=2</b> be used instead, or (2) the flag <b>-xci</b> be set. This will help show structure, particularly when there are ternary statements. The following snippet illustrates these options.</p>
+<p>Also note that the value of continuation indentation, <b>-ci=4</b>, is equal to the value of the full indentation, <b>-i=4</b>. It is recommended that the either (1) the parameter <b>-ci=2</b> be used instead, or (2) the flag <b>-xci</b> be set. This will help show structure, particularly when there are ternary statements. The following snippet illustrates these options.</p>
<pre><code> # perltidy -pbp
$self->{_text} = (
)
. (
$page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);
)
. (
$page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);
: "the section on $section"
)
. ( $page
- ? ( $section ? ' in ' : '' ) . "the $page$page_ext manpage"
+ ? ( $section ? ' in ' : '' ) . "the $page_ext manpage"
: ' elsewhere in this document'
);</code></pre>
<p>The <b>-xci</b> flag was developed after the <b>-pbp</b> parameters were published so you need to include it separately.</p>
+</dd>
+<dt id="Making-a-file-unreadable"><b>Making a file unreadable</b></dt>
+<dd>
+
+<p>The goal of perltidy is to improve the readability of files, but there are two commands which have the opposite effect, <b>--mangle</b> and <b>--extrude</b>. They are actually merely aliases for combinations of other parameters. Both of these strip all possible whitespace, but leave comments and pod documents, so that they are essentially reversible. The difference between these is that <b>--mangle</b> puts the fewest possible line breaks in a script while <b>--extrude</b> puts the maximum possible. Note that these options do not provided any meaningful obfuscation, because perltidy can be used to reformat the files. They were originally developed to help test the tokenization logic of perltidy, but they have other uses. One use for <b>--mangle</b> is the following:</p>
+
+<pre><code> perltidy --mangle myfile.pl -st | perltidy -o myfile.pl.new</code></pre>
+
+<p>This will form the maximum possible number of one-line blocks (see next section), and can sometimes help clean up a badly formatted script.</p>
+
+<p>A similar technique can be used with <b>--extrude</b> instead of <b>--mangle</b> to make the minimum number of one-line blocks.</p>
+
+<p>Another use for <b>--mangle</b> is to combine it with <b>--delete-all-comments (-dac)</b> to reduce the file size of a perl script.</p>
+
</dd>
</dl>
<p>The main exception to this rule is that perltidy will attempt to form new one-line blocks following the keywords <code>map</code>, <code>eval</code>, and <code>sort</code>, <code>eval</code>, because these code blocks are often small and most clearly displayed in a single line. This behavior can be controlled with the flag <b>--one-line-block-exclusion-list</b> described below.</p>
-<p>When the <b>cuddled-else</b> style is used, the default treatment of one-line blocks may interfere with the cuddled style. In this case, the default behavior may be changed with the flag <b>--cuddled-break-option=n</b> described elsehwere.</p>
+<p>When the <b>cuddled-else</b> style is used, the default treatment of one-line blocks may interfere with the cuddled style. In this case, the default behavior may be changed with the flag <b>--cuddled-break-option=n</b> described elsewhere.</p>
<p>When an existing one-line block is longer than the maximum line length, and must therefore be broken into multiple lines, perltidy checks for and adds any optional terminating semicolon (unless the <b>-nasc</b> option is used) if the block is a code block.</p>
<p>As noted above, perltidy will, by default, attempt to create new one-line blocks for certain block types. This flag allows the user to prevent this behavior for the block types listed in the string <b>s</b>. The list <b>s</b> may include any of the words <code>sort</code>, <code>map</code>, <code>grep</code>, <code>eval</code>, or it may be <code>*</code> to indicate all of these.</p>
-<p>So for example to prevent multi-line <b>eval</b> blocks from becoming one-line blocks, the command would be <b>-olbxl='eval'</b>. In this case, existing one-line <b>eval</b> blocks will remain on one-line if possible, and existing multi-line <b>eval</b> blocks will remain multi-line blocks.</p>
+<p>So for example to prevent multiline <b>eval</b> blocks from becoming one-line blocks, the command would be <b>-olbxl='eval'</b>. In this case, existing one-line <b>eval</b> blocks will remain on one-line if possible, and existing multiline <b>eval</b> blocks will remain multiline blocks.</p>
</dd>
<dt id="olbn-n---one-line-block-nesting-n"><b>-olbn=n</b>, <b>--one-line-block-nesting=n</b></dt>
<p>This flag controls the placement of semicolons at the end of one-line blocks. Semicolons are optional before a closing block brace, and frequently they are omitted at the end of a one-line block containing just a single statement. By default, perltidy follows the input file regarding these semicolons, but this behavior can be controlled by this flag. The values of n are:</p>
-<pre><code> n=0 remove terminal semicolons in one-line blocks having a single statement
- n=1 stable; keep input file placement of terminal semicolons [DEFAULT ]
- n=2 add terminal semicolons in all one-line blocks</code></pre>
+<pre><code> n=0 remove terminal semicolons in single-statement one-line blocks
+ n=1 stable; keep input terminal semicolons [DEFAULT ]
+ n=2 add terminal semicolons in all one-line blocks</code></pre>
<p>Note that the <b>n=2</b> option has no effect if adding semicolons is prohibited with the <b>-nasc</b> flag. Also not that while <b>n=2</b> adds missing semicolons to all one-line blocks, regardless of complexity, the <b>n=0</b> option only removes ending semicolons which terminate one-line blocks containing just one semicolon. So these two options are not exact inverses.</p>
<dt id="Forming-new-one-line-blocks"><b>Forming new one-line blocks</b></dt>
<dd>
-<p>Sometimes it might be desirable to convert a script to have one-line blocks whenever possible. Although there is currently no flag for this, a simple workaround is to execute perltidy twice, once with the flag <b>-noadd-newlines</b> and then once again with normal parameters, like this:</p>
+<p>Sometimes it might be desirable to convert a script to have one-line blocks whenever possible. Although there is currently no flag for this, a simple workaround is to execute perltidy twice, once with the flag <b>--noadd-newlines</b> and then once again with normal parameters, like this:</p>
<pre><code> cat infile | perltidy -nanl | perltidy >outfile</code></pre>
<p>There is no automatic way to break existing long one-line blocks into multiple lines, but this can be accomplished by processing a script, or section of a script, with a short value of the parameter <b>maximum-line-length=n</b>. Then, when the script is reformatted again with the normal parameters, the blocks which were broken will remain broken (with the exceptions noted above).</p>
-<p>Another trick for doing this for certain block types is to format one time with the <b>-cuddled-else</b> flag and <b>--cuddled-break-option=2</b>. Then format again with the normal parameters. This will break any one-line blocks which are involved in a cuddled-else style.</p>
+<p>Another trick for doing this for certain block types is to format one time with the <b>--cuddled-else</b> flag and <b>--cuddled-break-option=2</b>. Then format again with the normal parameters. This will break any one-line blocks which are involved in a cuddled-else style.</p>
</dd>
</dl>
<p>If it is simpler to specify only the token types which are to be aligned, then include the types which are to be aligned in the list of <b>--valign-inclusion-list</b>. In that case you may leave the <b>valign-exclusion-list</b> undefined, or use the special symbol <b>*</b> for the exclusion list. For example, the following parameters enable alignment only at commas and 'fat commas':</p>
-<pre><code> --valign-inclusion-list=', =>'
- --valign-exclusion-list='*' ( this is optional and may be omitted )</code></pre>
+<pre><code> --valign-inclusion-list=', =>'
+ --valign-exclusion-list='*' ( this is optional and may be omitted )</code></pre>
<p>These parameter lists should consist of space-separated tokens from the above list of possible alignment tokens, or a '*'. If an unrecognized token appears, it is simply ignored. And if a specific token is entered in both lists by mistake then the exclusion list has priority.</p>
<p>To illustrate, consider the following snippet with default formatting</p>
-<pre><code> # perltidy
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
+<pre><code> # perltidy
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
<p>To exclude all alignments except the equals (i.e., include only equals) we could use:</p>
-<pre><code> # perltidy -vil='='
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
+<pre><code> # perltidy -vil='='
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
<p>To exclude only the equals we could use:</p>
-<pre><code> # perltidy -vxl='='
- $co_description = ($color) ? 'bold cyan' : ''; # description
- $co_prompt = ($color) ? 'bold green' : ''; # prompt
- $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
+<pre><code> # perltidy -vxl='='
+ $co_description = ($color) ? 'bold cyan' : ''; # descr
+ $co_prompt = ($color) ? 'bold green' : ''; # prompt
+ $co_unused = ($color) ? 'on_green' : 'reverse'; # unused</code></pre>
<p>Notice in this last example that although only the equals alignment was excluded, the ternary alignments were also lost. This happens because the vertical aligner sweeps from left-to-right and usually stops if an important alignment cannot be made for some reason.</p>
-<p>But also notice that side comments remain aligned because their alignment is controlled separately with the parameter <b>--valign-side_comments</b> described above.</p>
+<p>But also notice that side comments remain aligned because their alignment is controlled separately with the parameter <b>--valign-side-comments</b> described above.</p>
</dd>
-</dl>
-
-<h2 id="Extended-Syntax">Extended Syntax</h2>
+<dt id="Aligning-postfix-unless-and-if-with---valign-if-unless-or--viu"><b>Aligning postfix unless and if with --valign-if-unless or -viu</b></dt>
+<dd>
-<p>This section describes some parameters for dealing with extended syntax.</p>
+<p>By default, postfix <b>if</b> terms align and postfix <b>unless</b> terms align, but separately. For example,</p>
-<p>For another method of handling extended syntax see the section <a href="#Skipping-Selected-Sections-of-Code">"Skipping Selected Sections of Code"</a>.</p>
+<pre><code> # perltidy [DEFAULT]
+ print "Tried to add: @Resolve\n" if ( @Resolve and !$Quiet );
+ print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+ print "Output:\n" unless $Quiet;
+ print join( "\n", @Output ) . "\n" unless $Quiet;</code></pre>
-<p>Also note that the module <i>Perl::Tidy</i> supplies a pre-filter and post-filter capability. This requires calling the module from a separate program rather than through the binary <i>perltidy</i>.</p>
+<p>The <b>-viu</b> flag causes a postfix <b>unless</b> to be treated as if it were a postfix <b>if</b> for purposes of alignment, and thus they align:</p>
-<dl>
+<pre><code> # perltidy -viu
+ print "Tried to add: @Resolve\n" if ( @Resolve and !$Quiet );
+ print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+ print "Output:\n" unless $Quiet;
+ print join( "\n", @Output ) . "\n" unless $Quiet;</code></pre>
-<dt id="xs---extended-syntax"><b>-xs</b>, <b>--extended-syntax</b></dt>
+</dd>
+<dt id="Aligning-signed-numbers-with---valign-signed-numbers-or--vsn"><b>Aligning signed numbers with --valign-signed-numbers or -vsn</b></dt>
<dd>
-<p>This flag allows perltidy to handle certain common extensions to the standard syntax without complaint.</p>
+<p>Setting <b>-vsn</b> causes columns of numbers containing both signed and unsigned values to have leading signs placed in their own column. For example:</p>
-<p>For example, without this flag a structure such as the following would generate a syntax error:</p>
+<pre><code> # perltidy -vsn
+ my @correct = (
+ [ 123456.79, 86753090000.868, 11 ],
+ [ -123456.79, -86753090000.868, -11 ],
+ [ 123456.001, 80.080, 10 ],
+ [ -123456.001, -80.080, 0 ],
+ [ 10.9, 10.9, 11 ],
+ );</code></pre>
-<pre><code> Method deposit( Num $amount) {
- $self->balance( $self->balance + $amount );
- }</code></pre>
+<p>The default is <b>-vsn</b>. This can be turned off to get is strict left justification:</p>
-<p>This flag is enabled by default but it can be deactivated with <b>-nxs</b>. Probably the only reason to deactivate this flag is to generate more diagnostic messages when debugging a script.</p>
+<pre><code> # perltidy -nvsn
+ my @correct = (
+ [ 123456.79, 86753090000.868, 11 ],
+ [ -123456.79, -86753090000.868, -11 ],
+ [ 123456.001, 80.080, 10 ],
+ [ -123456.001, -80.080, 0 ],
+ [ 10.9, 10.9, 11 ],
+ );</code></pre>
-</dd>
-<dt id="sal-s---sub-alias-list-s"><b>-sal=s</b>, <b>--sub-alias-list=s</b></dt>
-<dd>
+<p>Some points regarding <b>-vsn</b> are:</p>
-<p>This flag causes one or more words to be treated the same as if they were the keyword <b>sub</b>. The string <b>s</b> contains one or more alias words, separated by spaces or commas.</p>
+<ul>
-<p>For example,</p>
+<li><p>This option works by inserting a single space ahead of unsigned numbers when possible. This is not done if it would require increasing the maximum width of a column.</p>
-<pre><code> perltidy -sal='method fun _sub M4'</code></pre>
+</li>
+<li><p>This option is mainly limited to lists of comma-separated numbers. For multiline lists of numbers, having trailing commas can sometimes improve the results. If missing, perltidy can add them for example with parameters <b>-wtc=b -atc</b>. See <a href="#Adding-and-Deleting-Commas">"Adding and Deleting Commas"</a>.</p>
+
+</li>
+<li><p>This option has a control parameter <b>--valign-signed-number-limit=N</b>, or <b>-vsnl=N</b>. This value controls formatting of very long columns of numbers and should not normally need to be changed. To see its purpose, consider a very long column of just unsigned numbers, say 1000 lines. If we add a single negative number, it is undesirable to move all of the other numbers over by one space. This could create many lines of file differences but not really improve the appearance when a local section of the table was viewed. The number <b>N</b> avoids this problem by not adding extra indentation to a run of more than <b>N</b> lines of unsigned numbers. The default value, <b>N=20</b>, is set to be a number of lines for which the ends of a long column of unsigned numbers are not normally both in view.</p>
+
+</li>
+</ul>
+
+</dd>
+<dt id="Aligning-assignment-operators-with---valign-wide-equals-or--vwe"><b>Aligning assignment operators with --valign-wide-equals or -vwe</b></dt>
+<dd>
+
+<p>The following assignment operators are aligned independently by default:</p>
+
+<pre><code> = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=</code></pre>
+
+<p>Setting <b>--valign-wide-equals</b>, or <b>-vwe</b>, causes them to be vertically aligned together with the trailing <b>=</b> all aligned. For example, here is the default formatting for some code with several of these operators:</p>
+
+<pre><code> $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];</code></pre>
+
+<p>And here is the same code formatted with <b>-vme</b>:</p>
+
+<pre><code> # perltidy -vme
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];</code></pre>
+
+<p>This option was added for issue git #135 and can improve readability, but it is off by default to avoid changing existing formatting.</p>
+
+</dd>
+</dl>
+
+<h2 id="Extended-Syntax">Extended Syntax</h2>
+
+<p>This section describes some parameters for dealing with extended syntax.</p>
+
+<p>For another method of handling extended syntax see the section <a href="#Skipping-Selected-Sections-of-Code">"Skipping Selected Sections of Code"</a>.</p>
+
+<p>Also note that the module <i>Perl::Tidy</i> supplies a pre-filter and post-filter capability. This requires calling the module from a separate program rather than through the binary <i>perltidy</i>.</p>
+
+<dl>
+
+<dt id="xs---extended-syntax"><b>-xs</b>, <b>--extended-syntax</b></dt>
+<dd>
+
+<p>This flag allows perltidy to handle certain common extensions to the standard syntax without complaint.</p>
+
+<p>For example, without this flag a structure such as the following would generate a syntax error:</p>
+
+<pre><code> Method deposit( Num $amount) {
+ $self->balance( $self->balance + $amount );
+ }</code></pre>
+
+<p>This flag is enabled by default but it can be deactivated with <b>-nxs</b>. Probably the only reason to deactivate this flag is to generate more diagnostic messages when debugging a script.</p>
+
+</dd>
+<dt id="sal-s---sub-alias-list-s"><b>-sal=s</b>, <b>--sub-alias-list=s</b></dt>
+<dd>
+
+<p>This flag causes one or more words to be treated the same as if they were the keyword <b>sub</b>. The string <b>s</b> contains one or more alias words, separated by spaces or commas.</p>
+
+<p>For example,</p>
+
+<pre><code> perltidy -sal='method fun _sub M4'</code></pre>
<p>will cause the perltidy to treat the words 'method', 'fun', '_sub' and 'M4' the same as if they were 'sub'. Note that if the alias words are separated by spaces then the string of words should be placed in quotes.</p>
<dt id="uf-s---use-feature-s"><b>-uf=s</b>, <b>--use-feature=s</b></dt>
<dd>
-<p>This flag tells perltidy to allow the syntax associated a pragma in string <b>s</b>. Currently only the recognized values for the string are <b>s='class'</b> or string <b>s=' '</b>. The default is <b>--use-feature='class'</b>. This enables perltidy to recognized the special words <b>class</b>, <b>method</b>, <b>field</b>, and <b>ADJUST</b>. If this causes a conflict with other uses of these words, the default can be turned off with <b>--use-feature=' '</b>.</p>
+<p>This flag tells perltidy to allow or disallow the syntax associated a pragma in string <b>s</b>. The current possible settings are:</p>
+
+<ul>
+
+<li><p><b>--use-feature='class'</b>. This tells perltidy to recognized the special words <b>class</b>, <b>method</b>, <b>field</b>, and <b>ADJUST</b> as defined for this feature.</p>
+
+</li>
+<li><p><b>--use-feature='noclass'</b>. This tells perltidy <b>not</b> to treat words <b>class</b>, <b>method</b>, <b>field</b>, <b>ADJUST</b> specially.</p>
+
+</li>
+<li><p><b>Neither of these</b> (<b>--use-feature</b> not defined). This is the DEFAULT and recommended setting. In this case perltidy will try to automatically handle both the newer --use-feature 'class' syntax as well as some conflicting uses of some of these special words by existing modules.</p>
+
+</li>
+</ul>
+
+<p>Note that this parameter is independent of any <b>use feature</b> control lines within a script. Perltidy does not look for or read such control lines. This is because perltidy must be able to work on small chunks of code sent from an editor, so it cannot assume that such lines will be within the lines being formatted.</p>
+
+</dd>
+<dt id="Working-around-problems-with-older-version-of-Perl"><b>Working around problems with older version of Perl</b></dt>
+<dd>
+
+<p>Perltidy contains a number of rules which help avoid known subtleties and problems with older versions of perl, and these rules always take priority over whatever formatting flags have been set. For example, perltidy will usually avoid starting a new line with a bareword, because this might cause problems if <code>use strict</code> is active.</p>
+
+<p>There is no way to override these rules.</p>
</dd>
</dl>
-<h2 id="Other-Controls">Other Controls</h2>
+<h2 id="Deleting-and-Extracting-Pod-or-Comments">Deleting and Extracting Pod or Comments</h2>
<dl>
<p>The negatives of these commands also work, and are the defaults.</p>
</dd>
+</dl>
+
+<h2 id="The-perltidyrc-file">The perltidyrc file</h2>
+
+<dl>
+
<dt id="Using-a-.perltidyrc-command-file"><b>Using a <i>.perltidyrc</i> command file</b></dt>
<dd>
<p>Under Windows, perltidy will also search for a configuration file named <i>perltidy.ini</i> since Windows does not allow files with a leading period (.). Use <code>perltidy -dpro</code> to see the possible locations for your system. An example might be <i>C:\Documents and Settings\All Users\perltidy.ini</i>.</p>
-<p>Another option is the use of the PERLTIDY environment variable. The method for setting environment variables depends upon the version of Windows that you are using. Instructions for Windows 95 and later versions can be found here:</p>
-
-<p>http://www.netmanage.com/000/20021101_005_tcm21-6336.pdf</p>
+<p>Another option is the use of the PERLTIDY environment variable. The method for setting environment variables depends upon the version of Windows that you are using.</p>
<p>Under Windows NT / 2000 / XP the PERLTIDY environment variable can be placed in either the user section or the system section. The later makes the configuration file common to all users on the machine. Be sure to enter the full path of the configuration file in the value of the environment variable. Ex. PERLTIDY=C:\Documents and Settings\perltidy.ini</p>
<p>The parameters in the <i>.perltidyrc</i> file are installed first, so any parameters given on the command line will have priority over them.</p>
-<p>To avoid confusion, perltidy ignores any command in the .perltidyrc file which would cause some kind of dump and an exit. These are:</p>
+<p>To avoid confusion, perltidy ignores any command in the .perltidyrc file which would cause some kind of dump and an exit. These include:</p>
<pre><code> -h -v -ddf -dln -dop -dsn -dtt -dwls -dwrs -ss</code></pre>
<li><p>It may be simplest to develop and test configuration files with alternative names, and invoke them with <b>-pro=filename</b> on the command line. Then rename the desired file to <i>.perltidyrc</i> when finished.</p>
</li>
-<li><p>The parameters in the <i>.perltidyrc</i> file can be switched off with the <b>-npro</b> option.</p>
+<li><p>The parameters in the <i>.perltidyrc</i> file can be switched off with the <b>-npro</b> option on the command line.</p>
+
+</li>
+<li><p>Any parameter in the <i>.perltidyrc</i> file can be overridden with a replacement value on the command line. This is because the command line is processed after the <i>.perltidyrc</i> file.</p>
</li>
<li><p>The commands <b>--dump-options</b>, <b>--dump-defaults</b>, <b>--dump-long-names</b>, and <b>--dump-short-names</b>, all described below, may all be helpful.</p>
</li>
</ul>
+</dd>
+<dt id="Skipping-a-line-with-an-unknown-parameter"><b>Skipping a line with an unknown parameter</b></dt>
+<dd>
+
+<p>Versions of perltidy greater than 20240511 have an option to filter unrecognized parameters from a perltidy command file. If a line in the file begins with <b>three dashes</b> followed by a parameter name (rather than one or two), then the line will be removed if the parameter is unknown. Otherwise, a dash will be removed to make the line valid. This option was added to allow a single command file to be used during the transition to a new version of perltidy.</p>
+
</dd>
<dt id="Creating-a-new-abbreviation"><b>Creating a new abbreviation</b></dt>
<dd>
<p>For a specific example, the following line</p>
-<pre><code> oneliner { --maximum-line-length=0 --noadd-newlines --noadd-terminal-newline}</code></pre>
+<pre><code> oneliner { --maximum-line-length=0 --noadd-newlines --noadd-terminal-newline}</code></pre>
<p>or equivalently with abbreviations</p>
-<pre><code> oneliner { -l=0 -nanl -natnl }</code></pre>
+<pre><code> oneliner { -l=0 -nanl -natnl }</code></pre>
<p>could be placed in a <i>.perltidyrc</i> file to temporarily override the maximum line length with a large value, to temporarily prevent new line breaks from being added, and to prevent an extra newline character from being added the file. All other settings in the <i>.perltidyrc</i> file still apply. Thus it provides a way to format a long 'one liner' when perltidy is invoked with</p>
<p>Please note: do not use this flag unless you are sure your script needs it. Parsing errors can occur if it does not have a hash-bang, or, for example, if the actual first hash-bang is in a here-doc. In that case a parsing error will occur because the tokenization will begin in the middle of the here-doc.</p>
</dd>
-<dt id="Making-a-file-unreadable"><b>Making a file unreadable</b></dt>
-<dd>
-
-<p>The goal of perltidy is to improve the readability of files, but there are two commands which have the opposite effect, <b>--mangle</b> and <b>--extrude</b>. They are actually merely aliases for combinations of other parameters. Both of these strip all possible whitespace, but leave comments and pod documents, so that they are essentially reversible. The difference between these is that <b>--mangle</b> puts the fewest possible line breaks in a script while <b>--extrude</b> puts the maximum possible. Note that these options do not provided any meaningful obfuscation, because perltidy can be used to reformat the files. They were originally developed to help test the tokenization logic of perltidy, but they have other uses. One use for <b>--mangle</b> is the following:</p>
+</dl>
-<pre><code> perltidy --mangle myfile.pl -st | perltidy -o myfile.pl.new</code></pre>
+<h2 id="Debugging-perltidy-input">Debugging perltidy input</h2>
-<p>This will form the maximum possible number of one-line blocks (see next section), and can sometimes help clean up a badly formatted script.</p>
-
-<p>A similar technique can be used with <b>--extrude</b> instead of <b>--mangle</b> to make the minimum number of one-line blocks.</p>
-
-<p>Another use for <b>--mangle</b> is to combine it with <b>-dac</b> to reduce the file size of a perl script.</p>
+<dl>
-</dd>
-<dt id="Debugging"><b>Debugging</b></dt>
+<dt id="The---dump-...-parameters">The <b>--dump-...</b> parameters</dt>
<dd>
-<p>The following flags are available for debugging:</p>
+<p>The following flags are available for debugging. Note that all commands named <b>--dump-...</b> will simply write some requested information to standard output and then immediately exit.</p>
-<p><b>--dump-cuddled-block-list</b> or <b>-dcbl</b> will dump to standard output the internal hash of cuddled block types created by a <b>-cuddled-block-list</b> input string.</p>
+<p><b>--dump-cuddled-block-list</b> or <b>-dcbl</b> will dump to standard output the internal hash of cuddled block types created by a <b>--cuddled-block-list</b> input string.</p>
<p><b>--dump-defaults</b> or <b>-ddf</b> will write the default option set to standard output and quit</p>
+<p><b>--dump-integer-option-range</b> or <b>-dior</b> will write a list of comma-separated values. Each line contains the name of an integer option along with its minimum, maximum, and default values.</p>
+
<p><b>--dump-profile</b> or <b>-dpro</b> will write the name of the current configuration file and its contents to standard output and quit.</p>
<p><b>--dump-options</b> or <b>-dop</b> will write current option set to standard output and quit.</p>
<p><b>--dump-token-types</b> or <b>-dtt</b> will write a list of all token types to standard output and quit.</p>
-<p><b>--dump-want-left-space</b> or <b>-dwls</b> will write the hash %want_left_space to standard output and quit. See the section on controlling whitespace around tokens.</p>
+<p><b>--dump-want-left-space</b> or <b>-dwls</b> will write the hash <code>%want_left_space</code> to standard output and quit. See the section on controlling whitespace around tokens.</p>
+
+<p><b>--dump-want-right-space</b> or <b>-dwrs</b> will write the hash <code>%want_right_space</code> to standard output and quit. See the section on controlling whitespace around tokens.</p>
+
+<p>See <a href="#Analyzing-Code">"Analyzing Code"</a> for additional <b>--dump-</b> parameters.</p>
-<p><b>--dump-want-right-space</b> or <b>-dwrs</b> will write the hash %want_right_space to standard output and quit. See the section on controlling whitespace around tokens.</p>
+</dd>
+<dt id="Other-parameters-related-to-processing"><b>Other parameters related to processing</b></dt>
+<dd>
<p><b>--no-memoize</b> or <b>-nmem</b> will turn of memoizing. Memoization can reduce run time when running perltidy repeatedly in a single process. It is on by default but can be deactivated for testing with <b>-nmem</b>.</p>
<p><b>--no-timestamp</b> or <b>-nts</b> will eliminate any time stamps in output files to prevent differences in dates from causing test installation scripts to fail. There are just a couple of places where timestamps normally occur. One is in the headers of html files, and another is when the <b>-cscw</b> option is selected. The default is to allow timestamps (<b>--timestamp</b> or <b>-ts</b>).</p>
-<p><b>--file-size-order</b> or <b>-fso</b> will cause files to be processed in order of increasing size, when multiple files are being processed. This is useful during program development, when large numbers of files with varying sizes are processed, because it can reduce virtual memory usage.</p>
+<p><b>--file-size-order</b> or <b>-fso</b> will cause files to be processed in order of increasing size, when multiple files are being processed. This is particularly useful during program development, when large numbers of files with varying sizes are processed, because it can reduce virtual memory usage. This is the default and can be deactivated with <b>-nfso</b>.</p>
<p><b>--maximum-file-size-mb=n</b> or <b>-maxfs=n</b> specifies the maximum file size in megabytes that perltidy will attempt to format. This parameter is provided to avoid causing system problems by accidentally attempting to format an extremely large data file. Most perl scripts are less than about 2 MB in size. The integer <b>n</b> has a default value of 10, so perltidy will skip formatting files which have a size greater than 10 MB. The command to increase the limit to 20 MB for example would be</p>
<pre><code> perltidy -maxfs=20</code></pre>
-<p>This only applies to files specified by filename on the command line.</p>
+<p>This length test is applied to named files before they are read into memory. It is applied to files arriving from standard input after they are read into memory. It is not applied to character strings arriving by a call to the Perl::Tidy module.</p>
+
+</dd>
+<dt id="Controls-for-when-to-stop-processing"><b>Controls for when to stop processing</b></dt>
+<dd>
<p><b>--maximum-level-errors=n</b> or <b>-maxle=n</b> specifies the maximum number of indentation level errors are allowed before perltidy skips formatting and just outputs a file verbatim. The default is <b>n=1</b>. This means that if the final indentation of a script differs from the starting indentation by more than 1 levels, the file will be output verbatim. To avoid formatting if there are any indentation level errors use -maxle=0. To skip this check you can either set n equal to a large number, such as <b>n=100</b>, or set <b>n=-1</b>.</p>
<p>A recommended value is <b>n=3</b>. However, the default is <b>n=0</b> (skip this check) to avoid causing problems with scripts which have extended syntaxes.</p>
-<p><b>-DEBUG</b> will write a file with extension <i>.DEBUG</i> for each input file showing the tokenization of all lines of code.</p>
+</dd>
+<dt id="Handling-errors-in-options-which-take-integer-values"><b>Handling errors in options which take integer values</b></dt>
+<dd>
+
+<p>Many of the input parameters take integer values. Before processing begins, a check is made to see if any of these integer parameters exceed their valid ranges. The default behavior when a range is exceeded is to write a warning message and reset the value to its default setting. This default behavior can be changed with the parameter <b>--integer-range-check=n</b>, or <b>-irc=n</b>, as follows:</p>
+
+<pre><code> n=0 skip check completely (for stress-testing perltidy only)
+ n=1 reset bad values to defaults but do not issue a warning
+ n=2 reset bad values to defaults and issue warning [DEFAULT]
+ n=3 stop if any values are out of bounds</code></pre>
+
+<p>The values <b>n=0</b> and <b>n=1</b> are mainly useful for testing purposes.</p>
</dd>
-<dt id="Making-a-table-of-information-on-code-blocks"><b>Making a table of information on code blocks</b></dt>
+<dt id="Debugging-perltidy-tokenization"><b>Debugging perltidy tokenization</b></dt>
+<dd>
+
+<p><b>-DEBUG, -D</b> will write a file with extension <i>.DEBUG</i> for each input file showing the tokenization of all lines of code. This can produce a lot of output and is mainly useful for debugging tokenization issues during perltidy development.</p>
+
+</dd>
+</dl>
+
+<h2 id="Analyzing-Code">Analyzing Code</h2>
+
+<p>Perltidy reports any obvious issues that are found during formatting, such as unbalanced braces. But several parameters are available for making certain additional checks for issues which might be of interest to a programmer. These parameters fall into two categories as indicated by their prefix, <b>--dump-</b> or <b>--warn-</b>:</p>
+
+<ul>
+
+<li><p>The <b>--dump-</b> parameters read a file, write information to the standard output, and then exit without doing any formatting.</p>
+
+</li>
+<li><p>The <b>--warn-</b> parameters, on the other hand, cause perltidy to function normally but issue warnings to the error output when certain conditions are encountered.</p>
+
+</li>
+</ul>
+
+<p>Some of these have associated control parameters.</p>
+
+<dl>
+
+<dt id="Use---dump-block-summary-to-make-a-table-of-information-on-code-blocks"><b>Use --dump-block-summary to make a table of information on code blocks</b></dt>
<dd>
<p>A table listing information about the blocks of code in a file can be made with <b>--dump-block-summary</b>, or <b>-dbs</b>. This causes perltidy to read and parse the file, write a table of comma-separated values for selected code blocks to the standard output, and then exit. This parameter must be on the command line, not in a <i>.perlticyrc</i> file, and it requires a single file name on the command line. For example</p>
<p>produces an output file <i>blocks.csv</i> whose lines hold these parameters:</p>
-<pre><code> filename - the name of the file
- line - the line number of the opening brace of this block
- line_count - the number of lines between opening and closing braces
- code_lines - the number of lines excluding blanks, comments, and pod
- type - the block type (sub, for, foreach, ...)
- name - the block name if applicable (sub name, label, asub name)
- depth - the nesting depth of the opening block brace
- max_change - the change in depth to the most deeply nested code block
- block_count - the total number of code blocks nested in this block
- mccabe_count - the McCabe complexity measure of this code block</code></pre>
+<pre><code> filename - name of the file
+ line - line number of the opening brace of this block
+ line_count - number of lines between opening and closing braces
+ code_lines - number of lines excluding blanks, comments, and pod
+ type - block type (sub, for, foreach, ...)
+ name - block name if applicable (sub or asub name, label..)
+ depth - nesting depth of the opening block brace
+ max_change - change in depth to the most deeply nested code block
+ block_count - total number of code blocks nested in this block
+ mccabe_count - McCabe complexity measure of this code block</code></pre>
<p>This feature was developed to help identify complex sections of code as an aid in refactoring. The McCabe complexity measure follows the definition used by Perl::Critic. By default the table contains these values for subroutines, but the user may request them for any or all blocks of code or packages. For blocks which are loops nested within loops, a postfix '+' to the <code>type</code> is added to indicate possible code complexity. Although the table does not otherwise indicate which blocks are nested in other blocks, this can be determined by computing and comparing the block ending line numbers.</p>
+<p>For subroutines, the number of call arguments (args) is listed in parentheses in the <code>type</code> column. For example, <code>sub(9)</code> indicates a sub with 9 args. Subroutines whose arg count cannot easily be determined are indicated as <code>sub(*)</code>. The count includes any leading object passed in a method call (such as <b>$self</b>).</p>
+
<p>By default the table lists subroutines with more than 20 <code>code_lines</code>, but this can be changed with the following two parameters:</p>
-<p><b>--dump-block-minimum-lines=n</b>, or <b>-dbl=n</b>, where <b>n</b> is the minimum number of <code>code_lines</code> to be included. The default is <b>-n=20</b>. Note that <code>code_lines</code> is the number of lines excluding and comments, blanks and pod.</p>
+<p><b>--dump-block-minimum-lines=n</b>, or <b>-dbl=n</b>, where <b>n</b> is the minimum number of <code>code_lines</code> to be included. The default is <b>-n=20</b>. Note that <code>code_lines</code> is the number of lines excluding comments, blanks and pod.</p>
<p><b>--dump-block-types=s</b>, or <b>-dbt=s</b>, where string <b>s</b> is a list of block types to be included. The type of a block is either the name of the perl builtin keyword for that block (such as <b>sub if elsif else for foreach ..</b>) or the word immediately before the opening brace. In addition, there are a few symbols for special block types, as follows:</p>
<pre><code> if elsif else for foreach ... any keyword introducing a block
- sub - any sub or anynomous sub
+ sub - any sub or anonymous sub
asub - any anonymous sub
* - any block except nameless blocks
+ - any nested inner block loop
package - any package or class
- closure - any nameless block</code></pre>
+ closure - any nameless block
+ elsif3 - an if-elsif- chain with 3 or more elsif's (see below)</code></pre>
+
+<p>A chain of <b>if-elsif-...</b> blocks may be reported as a single line item by entering the word <b>elsif</b> with an appended integer, as indicated by the last item in this list. The integer indicates the number of <b>elsif</b> blocks required for a chain to be reported. If you use this, you may want to also use <b>-dbl=n</b>, with a smaller number of lines <b>n</b> than the default.</p>
<p>In addition, specific block loop types which are nested in other loops can be selected by adding a <b>+</b> after the block name. (Nested loops are sometimes good candidates for restructuring).</p>
<pre><code> perltidy -dbs -dbl=1 -dbt='* closure' somefile.pl >blocks.csv</code></pre>
+</li>
+<li><p>This selects every if-chain which contains 2 or more <code>elsif</code> blocks:</p>
+
+<pre><code> perltidy -dbs -dbl=1 -dbt='elsif2' somefile.pl >blocks.csv</code></pre>
+
+</li>
+<li><p>This selects every <code>while</code> block with 6 or more code lines</p>
+
+<pre><code> perltidy -dbs -dbt=while -dbl=6 somfile.pl >while.csv</code></pre>
+
</li>
</ul>
</dd>
-<dt id="Working-with-MakeMaker-AutoLoader-and-SelfLoader"><b>Working with MakeMaker, AutoLoader and SelfLoader</b></dt>
+<dt id="Use---dump-unusual-variables-to-find-unused-reused-and-certain-other-variables-of-interest"><b>Use --dump-unusual-variables to find unused, reused, and certain other variables of interest</b></dt>
+<dd>
+
+<p>Variables with certain properties of interest to a programmer can be listed with <b>--dump-unusual-variables</b> or <b>-duv</b>. This parameter must be on the command line, along with a single file name. It causes perltidy to scan the file for certain variable types, write any found to the standard output, and then immediately exit without doing any formatting. For example</p>
+
+<pre><code> perltidy -duv somefile.pl >vars.txt</code></pre>
+
+<p>produces a file with lines which look something like</p>
+
+<pre><code> 1778:u: my $input_file
+ 6089:r: my $j: reused - see line 6076</code></pre>
+
+<p>The values on the line are separated by colons and have the following meaning:</p>
+
+<pre><code> line number - the number of the line of the input file
+ issue - a single letter indicating the issue, see below
+ variable name - the name of the variable, preceded by a keyword
+ note - an optional note referring to another line</code></pre>
+
+<p>If there are a large number of issues it can be convenient to read the file into a spreadsheet.</p>
+
+<p>The checks are made for variables introduced by the keywords <b>my</b>, <b>state</b>, and <b>our</b>, along with variables defined with <b>use vars</b> and <b>use constant</b>. It is a good idea to also set <b>use strict</b> in a script so that Perl itself can find issues with variables which appear in a script without one of these methods.</p>
+
+<p>The types of checks which are made are identified in the output with one of the letters <b>r</b>, <b>s</b>, <b>p</b>, <b>u</b>, and <b>c</b> as follows:</p>
+
+<dl>
+
+<dt id="r:-reused-variable-name"><b>r: reused variable name</b></dt>
+<dd>
+
+<p>These are variables which are re-declared in the scope of a variable with the identical name. This can be confusing, perhaps not when the code is first written, but possibly later during maintenance work. For example, this can make it difficult to locate the correct variable with an editor when changes are being made. This issue can be avoided by renaming one of the conflicting variables. Note that this is similar to the <b>Perl::Critic</b> policy <b>Variables::ProhibitReusedNames</b>.</p>
+
+</dd>
+<dt id="s:-sigil-change-but-reused-bareword"><b>s: sigil change but reused bareword</b></dt>
<dd>
-<p>The first $VERSION line of a file which might be eval'd by MakeMaker is passed through unchanged except for indentation. Use <b>--nopass-version-line</b>, or <b>-npvl</b>, to deactivate this feature.</p>
+<p>These are variables which have the same bareword name but a different sigil (<b>$</b>, <b>@</b>, or <b>%</b>) as another variable in the same scope. For example, this occurs if variables <b>$data</b> and <b>%data</b> share the same scope. This can also be confusing for the reasons mentioned above and can be avoided by renaming one of the variables.</p>
+
+</dd>
+<dt id="p:-package-crossing-variables"><b>p: package-crossing variables</b></dt>
+<dd>
-<p>If the AutoLoader module is used, perltidy will continue formatting code after seeing an __END__ line. Use <b>--nolook-for-autoloader</b>, or <b>-nlal</b>, to deactivate this feature.</p>
+<p>These are lexical variables which are declared in one package and still visible in subroutines of a different package in the same file. This can be confusing, and it might cause the program to run differently, or fail, if the the packages were ever split into separate files. This issue can usually be avoided by placing code in block braces of some type. For example, this issue is often found in test code and can sometimes be fixed by using the structure</p>
-<p>Likewise, if the SelfLoader module is used, perltidy will continue formatting code after seeing a __DATA__ line. Use <b>--nolook-for-selfloader</b>, or <b>-nlsl</b>, to deactivate this feature.</p>
+<pre><code> main();
+
+ sub main { #<<<
+ ## old main code goes here
+ }</code></pre>
+
+<p>The <b>non-indenting-braces</b> side comment <code>#<<<</code> is not required but will keep the indentation of the old code unchanged.</p>
+
+<p>This check is only applied to package statements which are not enclosed in block braces in order avoid warnings at temporary package changes.</p>
</dd>
-<dt id="Working-around-problems-with-older-version-of-Perl"><b>Working around problems with older version of Perl</b></dt>
+<dt id="u:-unused-variables"><b>u: unused variables</b></dt>
<dd>
-<p>Perltidy contains a number of rules which help avoid known subtleties and problems with older versions of perl, and these rules always take priority over whatever formatting flags have been set. For example, perltidy will usually avoid starting a new line with a bareword, because this might cause problems if <code>use strict</code> is active.</p>
+<p>These are lexical variables declared with <code>my</code> or <code>state</code> (but not <code>our</code>) and not referenced again within their scope. Calling them <b>unused</b> is convenient but not really accurate; this is a "gray area" for a program. There are some good reasons for having such variables. For example, they might occur in a list of values provided by another routine or data structure, and therefore must be listed, even though they might not be referenced again. Having such variables can make them immediately available for future development and debugging, and can be beneficial for program clarity.</p>
-<p>There is no way to override these rules.</p>
+<p><b>But</b> sometimes they can occur due to being orphaned by a coding change, due to a misspelling, or by having an unintentional preceding <code>my</code>. So it is worth reviewing them, especially for new code. Here is an example of an unused variable in a script located with this method:</p>
+
+<pre><code> BEGIN { my $string = "" }
+ ...
+ $string .= "ok";</code></pre>
+
+<p>This looks nice at first glance, but the scope of the <code>my</code> declaration is limited to the surrounding braces, so it is not the same variable as the other <code>$string</code> and must therefore be reported as unused. This particular problem would have also been caught by perl if the author had used <code>strict</code>.</p>
+
+</dd>
+<dt id="c:-unused-constants"><b>c: unused constants</b></dt>
+<dd>
+
+<p>These are names which are declared with a <code>use constant</code> and a reference was not seen again within their package. They might be needed by an external package, or a set of standard definitions, or available for future development. And in some unusual cases a reference may have been missed by perltidy. But they might also be unused remnants from code development, or due to a misspelling, so it can be worthwhile reviewing them.</p>
+
+</dd>
+</dl>
+
+<p><b>Exception</b>: The following <b>our</b> variables are exempt from warnings: <b>$VERSION</b>, <b>@EXPORT</b>, <b>@EXPORT_OK</b>, <b>%EXPORT_TAGS</b>, <b>@ISA, $AUTOLOAD</b>.</p>
+
+</dd>
+<dt id="Use---warn-variable-types-to-warn-about-certain-variable-types"><b>Use --warn-variable-types to warn about certain variable types</b></dt>
+<dd>
+
+<p>The flag <b>--warn-variable-types=string</b>, or <b>-wvt=string</b>, is the <b>--warn</b> counterpart to <b>--dump-unusual-variables</b>, and can be used to produce a warning message if certain of the above variable types are encountered during formatting. All possible variable warnings may be requested with <b>-wvt='*'</b> or <b>-wvt=1</b>.</p>
+
+<p>For example,</p>
+
+<pre><code> perltidy -wvt='*' somefile.pl</code></pre>
+
+<p>The default is not to do any of these checks, and it can also be indicated with <b>-wvt=0</b>.</p>
+
+<p>To restrict the check to a specific set warnings, set the input <b>string</b> to be a space-separated or comma-separated list of the letters associated with the types of variables to be checked. For example:</p>
+
+<pre><code> perltidy -wvt='s r' somefile.pl</code></pre>
+
+<p>will process <i>somefile.pl</i> normally but issue a warning if either of the issues <b>s</b> or <b>r</b>, described above, are encountered.</p>
+
+<p>A companion flag, <b>--warn-variable-exclusion-list=string</b>, or <b>-wvxl=string</b>, can be used to skip warning checks for a list of variable names. A leading and/or trailing '*' may be placed on any of these variable names to allow a partial match.</p>
+
+<p>For example,</p>
+
+<pre><code> perltidy -wvt=1 -wvxl='$self $class *_unused' somefile.pl</code></pre>
+
+<p>will do all possible checks but not report any warnings for variables <code>$self</code>, <code>$class</code>, and for example <code>$value_unused</code>.</p>
+
+<p>This partial match option provides a way to trigger a warning message when a new unused variable is detected in a script. This can be accomplished by adding a unique suffix to the names of existing unused variables, such as <code>_unused</code>. This suffix is then added to the exclusion list.</p>
+
+<p>As a specific example, consider the following line which is part of some debug code which only references the latter three variables (but might someday need to reference the package variable too).</p>
+
+<pre><code> my ( $package_uu, $filename, $line, $subroutine ) = caller();</code></pre>
+
+<p>The unused variable, <code>$package_uu</code>, has been specially marked with suffix <code>_uu</code>. No type <b>u</b> (unused variable) warning will be produced provided that this wildcard suffix is in the exclusion list:</p>
+
+<pre><code> -wvxl='*_uu'</code></pre>
+
+</dd>
+<dt id="Use---dump-unique-keys-to-help-locate-misspelled-hash-keys"><b>Use --dump-unique-keys</b> to help locate misspelled hash keys</dt>
+<dd>
+
+<p>The parameter <b>--dump-unique-keys</b>, or <b>-duk</b>, dumps a list of hash keys which appear to be used just once, and do not appear among the quoted strings in a file. For example:</p>
+
+<pre><code> perltidy -duk File.pm >output.txt</code></pre>
+
+<p>The lines in the output file list each unique key and its line number. Typically, most of the listed keys listed will be perfectly valid keys needed, for example, for communication with other modules or for future development. But the list might also include something unexpected, such as a misspelled key.</p>
+
+<p>A program <code>dump_unique_keys.pl</code> at <a href="https://github.com/perltidy/perltidy/tree/master/examples">https://github.com/perltidy/perltidy/tree/master/examples</a> can run perltidy with <b>-duk</b> on multiple files, and then remove any common keys from the list.</p>
+
+</dd>
+<dt id="Use---dump-mixed-call-parens-to-find-functions-called-both-with-and-without-parens"><b>Use --dump-mixed-call-parens to find functions called both with and without parens</b></dt>
+<dd>
+
+<p>The parameter <b>--dump-mixed-call-parens</b>, or <b>-dmcp</b>, provides information on the use of call parens within a program. It produces a list of keywords and sub names which occur both both with and without parens. In other words, with a mixed style. This might be useful if one is working to standardize the call style for some particular keyword or function. For example,</p>
+
+<pre><code> perltidy -dmcp somefile.pl >output.txt</code></pre>
+
+<p>will analyze the text of <i>somefile.pl</i>, write the results to <i>output.txt</i>, and then immediately exit (like all <b>dump-</b> parameters).</p>
+
+<p>The output shows a list of operators and the number of times they were used with parens and the number of times without parens. For example, here is a small section of the output from one file in a past Perl distribution:</p>
+
+<pre><code> k:length:17:9
+ k:open:30:9
+ k:pop:3:4</code></pre>
+
+<p>The first line shows that the <code>length</code> function occurs 17 times with parens and 9 times without parens. The 'k' indicates that <code>length</code> is a Perl builtin keyword ('U' would mean user-defined sub, and 'w' would mean unknown bareword). So from this partial output we see that the author had a preference for parens around the args of <code>length</code> and <code>open</code>, whereas <code>pop</code> was about equally likely to have parens as not.</p>
+
+<p>More detailed information can be obtained with the parameters described in the next section.</p>
+
+</dd>
+<dt id="Use---want-call-parens-s-and---nowant-call-parens-s-to-warn-about-specific-missing-or-extra-call-parens"><b>Use --want-call-parens=s and --nowant-call-parens=s to warn about specific missing or extra call parens</b></dt>
+<dd>
+
+<p>The parameter <b>--want-call-parens=s</b>, or <b>-wcp=s</b>, can be used to to produce a warning message if call parens are missing from selected functions. Likewise, <b>--nowant-call-parens=s</b>, or <b>-nwcp=s</b>, can warn if call parens exist for selected functions. When either of these parameters are set, perltidy will report any discrepancies from the requested style in its error output.</p>
+
+<p>Before using either of these parameters, it may be helpful to first use <b>--dump-mixed-call-parens=s</b>, described in the previous section, to get an overview of the existing paren usage in a file.</p>
+
+<p>The string arguments <b>s</b> are space-separated lists of the names of the functions to be checked. The function names may be builtin keywords or user-defined subs. They may not include a package prefix or sigil. To illustrate,</p>
+
+<pre><code> perltidy -wcp='length open' -nwcp='pop' somefile.pl</code></pre>
+
+<p>means that the builtin functions <code>length</code> and <code>open</code> should have parens around their call args but <code>pop</code> should not. The error output might contain lines such as:</p>
+
+<pre><code> 2314:open FD_TO_CLOSE: no call parens
+ 3652:pop (: has call parens
+ 3783:length $DB: no call parens
+ ...</code></pre>
+
+<p>For builtin keywords which have both a block form and a trailing modifier form, such as <code>if</code>, only the trailing modifier form will be checked since parens are mandatory for the block form.</p>
+
+<p>The symbol <b>&</b> may entered instead of a function name to mean all user-defined subs not explicitly listed. So the compact expression</p>
+
+<pre><code> perltidy -wcp='&' somefile.pl</code></pre>
+
+<p>means that calls to all user-defined subs in the file being processed should have their call arguments enclosed in parens.</p>
+
+<p>Perltidy does not have the ability to add or delete call parens because it is difficult to automate, so changes must be made manually. When adding or removing parentheses, it is essential to pay attention to operator precedence issues. For example, if the parens in the following statement are removed, then <code>||</code> must be changed to <code>or</code>:</p>
+
+<pre><code> open( IN, "<", $infile ) || die("cannot open $infile:$!\n");</code></pre>
+
+<p>Otherwise, the <code>||</code> will operate on <code>$infile</code> rather than the return value of <code>open</code>.</p>
+
+</dd>
+<dt id="Use---dump-mismatched-args-to-find-function-calls-with-args-not-matching-sub-declarations"><b>Use --dump-mismatched-args to find function calls with args not matching sub declarations</b></dt>
+<dd>
+
+<p>The parameter <b>--dump-mismatched-args</b>, or <b>-dma</b>, causes perltidy to examine the definitions of subroutines in a file, and calls to those subs, and report certain differences. Like all <b>--dump</b> commands, it writes its report to standard output and exits immediately. For example</p>
+
+<pre><code> perltidy -dma somefile.pl >results.txt</code></pre>
+
+<p>Four types of issues are reported, types <b>a</b>, <b>o</b>, <b>u</b>, and <b>i</b>:</p>
+
+<dl>
+
+<dt id="a:-calls-made-to-a-sub-both-with-and-without-the-arrow-operator"><b>a:</b> calls made to a sub both with and without the <b>arrow</b> operator</dt>
+<dd>
+
+<p>For example the following two lines would be reported as a mismatch:</p>
+
+<pre><code> Fault();</code></pre>
+
+<p>and</p>
+
+<pre><code> $self->Fault();</code></pre>
+
+<p>This may or may not be an error, but it is worth checking. It might become an error in the future if sub <code>Fault</code> starts to access <code>$self</code>.</p>
+
+</dd>
+<dt id="o:-overcount-:-the-number-of-call-args-exceeds-the-expected-number"><b>o:</b> (<b>overcount</b>): the number of call args exceeds the expected number.</dt>
+<dd>
+
+</dd>
+<dt id="u:-undercount-:-the-number-of-call-args-is-less-than-the-expected-number"><b>u:</b> (<b>undercount</b>): the number of call args is less than the expected number.</dt>
+<dd>
+
+<p>For example</p>
+
+<pre><code> sub gnab_gib {
+ my $self=shift;
+ my ($v1,$v2)=@_;
+ ...
+ }
+
+ $self->gnab_gib(42);</code></pre>
+
+<p>In this case, the sub is expecting a total of three args (<code>$self</code>, <code>$v1</code>, and <code>$v2</code>) but only receives two (<code>$self</code> and <code>42</code>), so an undercount is reported. This is not necessarily an error because the sub may allow for this possibility, but it is worth checking.</p>
+
+<p>Although it is not possible to automatically determine which sub args are optional, if optional sub args are enclosed in an extra set of parentheses, perltidy will take this a signal that they are optional and not issue a warning. So if the above example is written as</p>
+
+<pre><code> sub gnab_gib {
+ my $self = shift;
+ my ( $v1, ($v2) ) = @_; # <-- $v2 is optional
+ ...;
+ }</code></pre>
+
+<p>then perltidy will consider that the second arg is optional and not issue a warning for:</p>
+
+<pre><code> $self->gnab_gib(42);</code></pre>
+
+<p>For multiple default call args, place one set of parens around them all. Some examples:</p>
+
+<pre><code> my ( ($v1) ) = @_; # <-- $v1 is optional
+ my ( $v1, ( $v2, $v3 ) ) = @_; # <-- $v2, $v3 are optional</code></pre>
+
+</dd>
+<dt id="i:-indeterminate:-a-specific-number-of-expected-args-for-a-sub-could-not-be-determined-but-it-is-called-with-a-specific-number.-This-issue-is-reported-for-the---dump--option-but-not-the---warn--option"><b>i:</b> <b>indeterminate:</b> a specific number of expected args for a sub could not be determined, but it is called with a specific number. This issue is reported for the <b>--dump-</b> option but not the <b>--warn-</b> option.</dt>
+<dd>
</dd>
</dl>
+<p><b>Notes and Limitations:</b></p>
+
+<ul>
+
+<li><p>This option works best for subs which unpack call args in an orderly manner near the beginning of the sub from <code>@_</code> and/or with <code>shift</code> operations. It will also work for direct access to specific elements of the @_ array. However if the coding for arg extraction is complex then the number of sub args will be considered indeterminate and a count cannot be checked.</p>
+
+</li>
+<li><p>Sub calls made without parentheses around the args are not checked.</p>
+
+</li>
+<li><p>Anonymous subs and lexical subs (introduced with <code>my</code>) are not checked.</p>
+
+</li>
+<li><p>Only calls which appear to be to subs defined within the file being processed are checked. But note that a file may contain multiple packages.</p>
+
+</li>
+</ul>
+
+</dd>
+<dt id="Use---warn-mismatched-args-to-produce-a-warning-for-function-calls-with-args-not-matching-sub-declarations"><b>Use --warn-mismatched-args to produce a warning for function calls with args not matching sub declarations</b>.</dt>
+<dd>
+
+<p>This is similar to the <b>-dump</b> parameter described above except that any mismatches are reported in the error file and otherwise formatting continues normally. The basic usage is</p>
+
+<pre><code> perltidy -wma somefile.pl</code></pre>
+
+<p>Several companion controls are available to avoid unwanted error messages:</p>
+
+<ul>
+
+<li><p><b>--warn-mismatched-arg-types=s</b>, or <b>-wmat=s</b>, can be used to select specific tests, type <b>a</b> (arrow test) or <b>o</b> (overcounts) or <b>u</b> (undercounts). All checks may be requested with <b>-wmat='*'</b> or <b>-wmat=1</b>. This is the default.</p>
+
+<p>To restrict the check to a specific warning type, set the string equal to the letter of that warning, any <b>a</b>, <b>o</b>, or <b>u</b>. For example</p>
+
+<pre><code> perltidy -wmat='a o' somefile.pl</code></pre>
+
+<p>will format <i>somefile.pl</i> and report any arrow-type mismatches and overcount mismatches, but will skip undercount mismatches.</p>
+
+</li>
+<li><p><b>--warn-mismatched-arg-exclusion-list</b>, or <b>-wmaxl=string</b>, can be given to skip the warning checks for a list of subroutine names, entered as a quoted string of space- or comma-separated names, without a package prefix. All subs with those names will be skipped, regardless of package. A leading and/or trailing <b>*</b> on a name may be used to indicate a partial string match.</p>
+
+</li>
+<li><p><b>--warn-mismatched-arg-undercount-cutoff=n</b>, or <b>-wmauc=n</b>, can be used to avoid <b>undercount</b> warnings when the expected number of args is less than <b>n</b>. Please note that this number <b>n</b> is the number of args from the point of view of the sub definition, so an object like <code>$self</code> passed with an arrow operator counts as one arg.</p>
+
+<p>The default value is <b>n=4</b>. This has been found to allow most programs to pass without warnings, but it should be reduced if possible for better error checking. The minimum possible value of <b>n</b> needed to avoid triggering an error for a program can be determined by running with <b>-wma -wmauc=0</b>. If there are undercount errors, a note at the bottom of the error output indicates the value of <b>n</b> required to avoid reporting them.</p>
+
+<p>As noted above for the parameter <b>--dump-mismatched-args</b>, if optional call args are enclosed in separate parentheses, then perltidy will recognize them as optional args and avoid needless warnings. If this method is used, then <b>-wmauc=0</b> should be used for maximal checking.</p>
+
+</li>
+<li><p><b>--warn-mismatched-arg-overcount-cutoff=n</b>, or <b>-wmaoc=n</b>, can be used to avoid <b>overcount</b> warnings when the expected number of args is less than <b>n</b>. The default value is <b>n=1</b>. This avoids warning messages for subroutines which are dummy placeholders for possible expansion.</p>
+
+</li>
+</ul>
+
+<p>To illustrate these controls,</p>
+
+<pre><code> perltidy -wma -wmat='o u' -wmaxl='new old' -wmauc=2 somefile.pl</code></pre>
+
+<p>means format <i>somefile.pl</i> as usual and check for mismatched overcounts and undercounts but not arrows. Skip checking for any sub named <code>new</code> or <code>old</code>, and only warn of undercounts for subs expecting more than 2 args.</p>
+
+</dd>
+<dt id="Use---dump-mismatched-returns-to-find-function-calls-where-the-number-of-requested-values-may-disagree-with-sub-return-statements"><b>Use --dump-mismatched-returns</b> to find function calls where the number of requested values may disagree with sub return statements</dt>
+<dd>
+
+<p>The parameter <b>--dump-mismatched-returns</b>, or <b>-dmr</b>, examines the return side of sub call statements. Like all <b>--dump</b> commands, it writes its report to standard output and exits immediately. For example</p>
+
+<pre><code> perltidy -dmr somefile.pl >results.txt</code></pre>
+
+<p>The following types of issues are reported:</p>
+
+<dl>
+
+<dt id="x:-calls-requesting-an-array-from-a-sub-with-no-return-statements"><b>x:</b> calls requesting an array from a sub with no return statements.</dt>
+<dd>
+
+</dd>
+<dt id="y:-calls-requesting-a-scalar-from-a-sub-with-no-return-statements"><b>y:</b> calls requesting a scalar from a sub with no return statements.</dt>
+<dd>
+
+</dd>
+<dt id="o:-overwant-:-calls-requesting-an-array-with-a-count-which-exceeds-the-maximum-number-returned-by-the-sub"><b>o:</b> (<b>overwant</b>): calls requesting an array with a count which exceeds the maximum number returned by the sub.</dt>
+<dd>
+
+</dd>
+<dt id="u:-underwant-:-calls-requesting-an-array-with-a-count-which-is-below-the-maximum-and-which-does-not-match-a-number-returned-by-the-sub"><b>u:</b> (<b>underwant</b>): calls requesting an array with a count which is below the maximum and which does not match a number returned by the sub.</dt>
+<dd>
+
+</dd>
+<dt id="s:-calls-requesting-a-scalar-from-a-sub-which-only-returns-two-or-more-items"><b>s:</b> calls requesting a scalar from a sub which only returns two or more items.</dt>
+<dd>
+
+</dd>
+</dl>
+
+<p>These issue types are illustrated with the following code</p>
+
+<pre><code> sub macho {
+ ...
+ ( $name, $flags ); # 2 values but no 'return' statement
+ }
+
+ ( $name, $flags ) = macho(); # 'x' (want array, but no return)
+ $name = macho(); # 'y' (want scalar but no return)
+
+ sub wimp {
+ ...;
+ return ( $name, $flags ); # 2 values with 'return' statement
+ }
+
+ ( $name, $flags, $access) = wimp(); # 'o' (want array 3 > 2)
+ ($name) = wimp(); # 'u' (want array 1 < 2)
+ $name = wimp(); # 's' (want scalar but 2 values returned)</code></pre>
+
+<p>This analysis works by scanning all call statements and all sub return statements, and comparing the the number of items wanted with the possible number of items returned. If a specific value for either of these numbers cannot be determined for a call then it cannot be checked.</p>
+
+<p>Since only return statements are scanned for return values, this analysis will not be useful for programming which relies on the default return mechanism, as in the first sub above. Note that the <b>Perl::Critic</b> policy <b>RequireFinalReturn</b> can be used to check for code in this situation.</p>
+
+<p>Reported issues are not necessarily errors, but they might be, or they might indicate potentially confusing code.</p>
+
+</dd>
+<dt id="Use---warn-mismatched-returns-to-issue-warnings-when-the-number-of-requested-values-may-disagree-with-sub-return-statements"><b>Use --warn-mismatched-returns</b> to issue warnings when the number of requested values may disagree with sub return statements</dt>
+<dd>
+
+<p>This is similar to the <b>-dump</b> parameter described above except that any mismatches are reported in the error file and otherwise formatting continues normally. The basic usage is</p>
+
+<pre><code> perltidy -wmr somefile.pl</code></pre>
+
+<p>The following companion controls are available to avoid unwanted error messages:</p>
+
+<ul>
+
+<li><p><b>--warn-mismatched-return-types=string</b>, or <b>-wmrt=string</b>, can be used to limit checks.</p>
+
+<p>To restrict the checking, set the string equal to the letter(s) of that warning, any <b>x</b>, <b>y</b>, <b>o</b>, <b>u</b>, or <b>s</b>. For example</p>
+
+<pre><code> perltidy -wmrt='x o s' somefile.pl</code></pre>
+
+<p>will format <i>somefile.pl</i> and report issue types <b>x</b>, <b>o</b>, and <b>s</b> but not types <b>u</b> and <b>y</b>. All checks may be requested with <b>-wmrt='*'</b> or <b>-wmrt=1</b>. This is the default.</p>
+
+</li>
+<li><p><b>--warn-mismatched-return-exclusion-list</b>, or <b>-wmrxl=string</b>, can be given to skip the warning checks for a list of subroutine names, entered as a quoted string of space- or comma-separated names, without a package prefix. All subs with those names will be skipped, regardless of package. A leading and/or trailing <b>*</b> on a name may be used to indicate a partial string match.</p>
+
+</li>
+</ul>
+
+</dd>
+</dl>
+
+<h2 id="Working-with-MakeMaker-AutoLoader-and-SelfLoader"><b>Working with MakeMaker, AutoLoader and SelfLoader</b></h2>
+
+<p>The first $VERSION line of a file which might be eval'd by MakeMaker is passed through unchanged except for indentation. The default <b>--pass-version-line</b>, or <b>-pvl</b>, is to do this. Use <b>--nopass-version-line</b>, or <b>-npvl</b>, to deactivate this feature.</p>
+
+<p>If the AutoLoader module is used, perltidy will continue formatting code after seeing an __END__ line. The default <b>--look-for-autoloader</b>, or <b>-lal</b>, is to do this. Use <b>--nolook-for-autoloader</b>, or <b>-nlal</b>, to deactivate this feature.</p>
+
+<p>Likewise, if the SelfLoader module is used, perltidy will continue formatting code after seeing a __DATA__ line. The default <b>--look-for-selfloader</b>, or <b>-lsl</b>, is to do this. Use <b>--nolook-for-selfloader</b>, or <b>-nlsl</b>, to deactivate this feature.</p>
+
<h1 id="HTML-OPTIONS">HTML OPTIONS</h1>
<dl>
<dt id="The--pre-flag-for-code-snippets">The <b>-pre</b> flag for code snippets</dt>
<dd>
-<p>When the <b>-pre</b> flag is given, only the pre-formatted section, within the <PRE> and </PRE> tags, will be output. This simplifies inclusion of the output in other files. The default is to output a complete web page.</p>
+<p>When the <b>-pre</b> flag is given, only the pre-formatted section, within the <code><PRE</code>> and <code></PRE</code>> tags, will be output. This simplifies inclusion of the output in other files. The default is to output a complete web page.</p>
</dd>
<dt id="The--nnn-flag-for-line-numbering">The <b>-nnn</b> flag for line numbering</dt>
identifier identifier i
bareword, function bareword w
keyword keyword k
- quite, pattern quote q
+ quote, pattern quote q
here doc text here-doc-text h
here doc target here-doc-target hh
punctuation punctuation pu
<p>The following list shows all short parameter names which allow a prefix 'n' to produce the negated form:</p>
-<pre><code> D anl asbl asc ast asu atc atnl aws b
- baa baao bar bbao bbb bbc bbs bl bli boa
- boc bok bol bom bos bot cblx ce conv cpb
- cs csc cscb cscw dac dbc dbs dcbl dcsc ddf
- dln dnl dop dp dpro drc dsc dsm dsn dtc
- dtt dwic dwls dwrs dws eos f fll fpva frm
- fs fso gcs hbc hbcm hbco hbh hbhh hbi hbj
- hbk hbm hbn hbp hbpd hbpu hbq hbs hbsc hbv
- hbw hent hic hicm hico hih hihh hii hij hik
- him hin hip hipd hipu hiq his hisc hiv hiw
- hsc html ibc icb icp iob isbc iscl kgb kgbd
- kgbi kis lal log lop lp lsl mem nib ohbr
- okw ola olc oll olq opr opt osbc osbr otr
- ple pod pvl q sac sbc sbl scbb schb scp
- scsb sct se sfp sfs skp sob sobb sohb sop
- sosb sot ssc st sts t tac tbc toc tp
- tqw trp ts tsc tso vbc vc vmll vsc w
- wfc wn x xci xlp xs</code></pre>
+<pre><code> D aia altc ame anl asbl asc ast asu atc
+ atnl aws b baa baao bar bbao bbb bbc bbs
+ bl bli boa boc bok bol bom bos bot cblx
+ ce conv cpb cs csc cscb cscw dac dbc dbs
+ dcbl dcsc ddf dia dior dln dltc dma dmcp dmr
+ dnl dop dp dpro drc dsc dsm dsn dtc dtco
+ dtt duk duv dwic dwls dwrs dws eos f fpva
+ frm fs fso gcs hbc hbcm hbco hbh hbhh hbi
+ hbj hbk hbm hbn hbp hbpd hbpu hbq hbs hbsc
+ hbv hbw hent hic hicm hico hih hihh hii hij
+ hik him hin hip hipd hipu hiq his hisc hiv
+ hiw hsc html ibc icb icp ils iob ipc isbc
+ iscl kgb kgbd kgbi kis lal log lop lp lsl
+ mci mem nib ohbr okw ola olc oll olq opr
+ opt osbc osbr otr ple pod pvl q qwaf sac
+ sbc sbl scbb schb scp scsb sct se sfp sfs
+ skp sob sobb sohb sop sosb sot ssc st sts
+ t tac tbc toc tp tqw trp ts tsc tso
+ vbc vc viu vmll vsc vsn vwe w wfc wia
+ wma wme wmr wn x xbt xci xlp xs</code></pre>
<p>Equivalently, the prefix 'no' or 'no-' on the corresponding long names may be used.</p>
<h1 id="VERSION">VERSION</h1>
-<p>This man page documents perltidy version 20230309</p>
+<p>This man page documents perltidy version 20250105</p>
<h1 id="BUG-REPORTS">BUG REPORTS</h1>
<h1 id="COPYRIGHT">COPYRIGHT</h1>
-<p>Copyright (c) 2000-2022 by Steve Hancock</p>
+<p>Copyright (c) 2000-2025 by Steve Hancock</p>
<h1 id="LICENSE">LICENSE</h1>
<li><a href="#Block-Brace-Vertical-Tightness">Block Brace Vertical Tightness</a></li>
<li><a href="#Closing-Block-Brace-Indentation">Closing Block Brace Indentation</a></li>
<li><a href="#Indentation-Style-for-Other-Containers">Indentation Style for Other Containers</a></li>
+ <li><a href="#Welding">Welding</a></li>
<li><a href="#Opening-Vertical-Tightness">Opening Vertical Tightness</a></li>
<li><a href="#Closing-Token-Placement">Closing Token Placement</a></li>
<li><a href="#Stack-Opening-Tokens">Stack Opening Tokens</a></li>
<li><a href="#Outdenting-Long-Quotes">Outdenting Long Quotes</a></li>
<li><a href="#Many-Other-Parameters">Many Other Parameters</a></li>
<li><a href="#Example-.perltidyrc-files">Example .perltidyrc files</a></li>
- <li><a href="#Tidyview">Tidyview</a></li>
<li><a href="#Additional-Information">Additional Information</a></li>
</ul>
</li>
<p>Before you begin, experiment using just <code>perltidy filename.pl</code> on some of your files. From the results (which you will find in files with a <i>.tdy</i> extension), you will get a sense of what formatting changes, if any, you'd like to make. If the default formatting is acceptable, you do not need a <i>.perltidyrc</i> file.</p>
+<p>The default is based on the recommendations in the <a href="https://perldoc.perl.org/perlstyle">Perl style guide</a>. It is worth noting that, although many variations are possible with the available parameters, this style has some significant advantages when small sections of code are sent to perltidy from within an editor. The reason is that perltidy can usually format a small container spanning multiple lines of code provided that the parens, braces and brackets are balanced. For the default style, the number of lines required to make a balanced selection of code is generally less than for other styles. For example, if a cuddled style is used, then an entire <code>if-elsif-</code> chain must be selected for formatting rather than an individual <code>elsif</code> block. This can be tedious and time consuming.</p>
+
<h2 id="Use-as-Filter">Use as Filter?</h2>
<p>Do you almost always want to run perltidy as a standard filter on just one input file? If yes, use <b>-st</b> and <b>-se</b>.</p>
<p>Do you want the maximum line length to be 80 columns? If no, use <b>-l=n</b>, where <b>n</b> is the number of columns you prefer.</p>
+<p>When setting the maximum line length, something to consider is that perltidy will use this to decide when a list of items should be broken into multiple lines. So if it is set excessively large, lists may be so wide that they are hard to read.</p>
+
<h2 id="Indentation-in-Code-Blocks">Indentation in Code Blocks</h2>
<p>In the block below, the variable <code>$anchor</code> is one indentation level deep and is indented by 4 spaces as shown here:</p>
<pre><code> if ( $flag eq "a" ) {
$anchor = $header;
- } </code></pre>
+ }</code></pre>
<p>If you want to change this to be a different number <b>n</b> of spaces per indentation level, use <b>-i=n</b>.</p>
<h2 id="Tabs">Tabs</h2>
-<p>The default, and recommendation, is to represent leading whitespace with actual space characters. However, if you prefer to entab leading whitespace with one tab character for each <b>n</b> spaces, use <b>-et=n</b>. Typically, <b>n</b> would be 8.</p>
+<p>The default, and recommendation, is to represent leading whitespace with actual space characters. However, if you prefer to entab leading whitespace of lines of code with one tab character for each <b>n</b> spaces, use <b>-et=n</b>. The value of <b>n</b> should be set to be the same as used by your display software. If there is a disagreement, then vertical alignment will not be displayed correctly.</p>
+
+<p>Please note that this number <b>n</b> has nothing to do with the number of spaces for one level of indentation, which is specified separately with <b>-i=n</b>.</p>
<h2 id="Opening-Block-Brace-Right-or-Left">Opening Block Brace Right or Left?</h2>
<pre><code> if ( $flag eq "h" ) {
$headers = 0;
- } </code></pre>
+ }</code></pre>
<p>If you like opening braces on the left, like this, go to <a href="#Opening-Braces-Left">"Opening Braces Left"</a>.</p>
<pre><code> if ( $flag eq "h" ) {
$headers = 0;
- }
+ }
elsif ( $flag eq "f" ) {
$sectiontype = 3;
- }
- else {
+ }
+ else {
print "invalid option: " . substr( $arg, $i, 1 ) . "\n";
dohelp();
}</code></pre>
}
);</code></pre>
-<p>In this default indentation scheme, a simple formula is used to find the indentation of every line. Notice how the first 'undef' is indented 4 spaces (one level) to the right, and how 'PrintError' is indented 4 more speces (one more level) to the right.</p>
+<p>In this default indentation scheme, a simple formula is used to find the indentation of every line. Notice how the first 'undef' is indented 4 spaces (one level) to the right, and how 'PrintError' is indented 4 more spaces (one more level) to the right.</p>
<p>The alternate is to let the location of the opening paren (or square bracket, or curly brace) define the indentation, like this:</p>
<p>The first scheme is completely robust. The second scheme often looks a little nicer, but be aware that deeply nested structures it can be spoiled if the line length limit is exceeded. Also, if there are comments or blank lines within a complex structure perltidy will temporarily fall back on the default indentation scheme. You may want to try both on large sections of code to see which works best.</p>
+<p>Also note that a disadvantage of this second scheme is that small changes in code, such as a change in the length of a sub name, can cause changes in many lines of code. For example, if we decide to change the name <code>connect</code> to <code>connect_to_destination</code>, then all of the call args would have to move right by 15 spaces. This can produce a lot of lines of differences when changes are committed.</p>
+
<p>If you prefer the first (default) scheme, no parameter is needed.</p>
-<p>If you prefer the latter scheme, use <b>-lp</b>.</p>
+<p>If you prefer the latter scheme, use <b>--line-up-parentheses</b>, or <b>-lp</b>. There is an alternative version of this option named <b>--extended-line-up-parentheses</b>, or <b>-xlp</b> which can also be used. For simplicity, the name <b>-lp</b> will refer to either of these options in the following text.</p>
+
+<h2 id="Welding">Welding</h2>
+
+<p>The following snippet is displayed with the default formatting:</p>
+
+<pre><code> $opt_c = Text::CSV_XS->new(
+ {
+ binary => 1,
+ sep_char => $opt_c,
+ always_quote => 1,
+ }
+ );</code></pre>
+
+<p>For this type of structure, where an inner container is nested within an outer container, we can get a more compact display with the parameter <b>--weld-nested-containers</b>, or <b>-wn</b>:</p>
+
+<pre><code> # perltidy -wn
+ $opt_c = Text::CSV_XS->new( {
+ binary => 1,
+ sep_char => $opt_c,
+ always_quote => 1,
+ } );</code></pre>
+
+<p>The name of the parameter comes from the idea that the two opening and two closing tokens have been 'welded' together to act as a single unit. The indentation spaces of the contents has also been reduced by one level.</p>
+
+<p>This is a nice transformation because it is symmetric at the opening and closing, and leaves a sort of 'sandwich' structure 0which is still quite readable.</p>
+
+<p>Some parameters available for asymmetric compressions, at just the opening or closing of complex structures, are described in the next sections.</p>
<h2 id="Opening-Vertical-Tightness">Opening Vertical Tightness</h2>
<p>The <b>-vt=2</b> style does not limit itself to a single indentation step per line.</p>
-<p>Note that in the above example the function 'do_sumething_about_it' started on a new line. This is because it follows an opening code block brace and is governed by the flag previously set in <a href="#Block-Brace-Vertical-Tightness">"Block Brace Vertical Tightness"</a>.</p>
+<p>Note that in the above example the function 'do_something_about_it' started on a new line. This is because it follows an opening code block brace and is governed by the flag previously set in <a href="#Block-Brace-Vertical-Tightness">"Block Brace Vertical Tightness"</a>.</p>
<h2 id="Closing-Token-Placement">Closing Token Placement</h2>
<p>The manual shows how all of these vertical tightness controls may be applied independently to each type of non-block opening and opening token.</p>
+<p>Also, note that <b>--weld-nested-containers</b>, or <b>-wn</b>, mentioned previously, operates like the combination of <b>-sot</b> and <b>-sct</b> and also reduces the indentation level of the contents.</p>
+
<h2 id="Define-Horizontal-Tightness">Define Horizontal Tightness</h2>
<p>Horizontal tightness parameters define how much space is included within a set of container tokens.</p>
<pre><code> $width = $col[ $j + $k ] - $col[ $j ]; # -sbt=0
$width = $col[ $j + $k ] - $col[$j]; # -sbt=1 (default)
- $width = $col[$j + $k] - $col[$j]; # -sbt=2 </code></pre>
+ $width = $col[$j + $k] - $col[$j]; # -sbt=2</code></pre>
<p>For curly braces, decide which of the following values of <b>-bt=n</b> you prefer:</p>
<p>The default is not to place a space after a function call:</p>
-<pre><code> myfunc( $a, $b, $c ); # default </code></pre>
+<pre><code> myfunc( $a, $b, $c ); # default</code></pre>
<p>If you prefer a space:</p>
<p>The default is to place a space between only these keywords and an opening paren:</p>
-<pre><code> my local our and or eq ne if else elsif until unless
+<pre><code> my local our and or eq ne if else elsif until unless
while for foreach return switch case given when</code></pre>
<p>but no others. For example, the default is:</p>
<p>If you prefer a space, like this:</p>
-<pre><code> $i = 1 ; </code></pre>
+<pre><code> $i = 1 ;</code></pre>
<p>enter <b>-sts</b>.</p>
-vt=1
-vtc=1</code></pre>
-<h2 id="Tidyview">Tidyview</h2>
-
-<p>There is a graphical program called <b>tidyview</b> which you can use to read a preliminary <i>.perltidyrc</i> file, make trial adjustments and immediately see their effect on a test file, and then write a new <i>.perltidyrc</i>. You can download a copy at</p>
-
-<p>http://sourceforge.net/projects/tidyview</p>
-
<h2 id="Additional-Information">Additional Information</h2>
-<p>This document has covered the main parameters. Many more parameters are available for special purposes and for fine-tuning a style. For complete information see the perltidy manual http://perltidy.sourceforge.net/perltidy.html</p>
-
-<p>For an introduction to using perltidy, see the tutorial http://perltidy.sourceforge.net/tutorial.html</p>
+<p>For further information see the perltidy documentation at <a href="http://perltidy.sourceforge.net">Sourceforge</a> or at <a href="https://metacpan.org/pod/distribution/Perl-Tidy/bin/perltidy">metacpan</a>. or at <a href="https://perltidy.github.io/perltidy/">GitHub</a></p>
-<p>Suggestions for improving this document are welcome and may be sent to perltidy at users.sourceforge.net</p>
+<p>The source code is maintained at <a href="https://github.com/perltidy/perltidy">GitHub</a>.</p>
</body>
<li><a href="#A-First-Test">A First Test</a></li>
<li><a href="#Indentation">Indentation</a></li>
<li><a href="#Input-Flags">Input Flags</a></li>
- <li><a href="#Line-Length-and-Continuation-Indentation">Line Length and Continuation Indentation.</a></li>
+ <li><a href="#Continuation-Indentation">Continuation Indentation.</a></li>
+ <li><a href="#Line-Length">Line Length</a></li>
<li><a href="#Tabs-or-Spaces">Tabs or Spaces?</a></li>
<li><a href="#Input-Output-Control">Input/Output Control</a></li>
<li><a href="#Style-Variations">Style Variations</a></li>
<li><a href="#Configuration-Files">Configuration Files</a></li>
<li><a href="#Error-Reporting">Error Reporting</a></li>
- <li><a href="#The-Log-File">The Log File</a></li>
<li><a href="#Using-Perltidy-as-a-Filter-on-Selected-Text-from-an-Editor">Using Perltidy as a Filter on Selected Text from an Editor</a></li>
+ <li><a href="#Adding-Blank-Lines-to-Control-List-Formatting">Adding Blank Lines to Control List Formatting</a></li>
+ <li><a href="#Adding-Blank-Lines-to-Control-Vertical-Alignment">Adding Blank Lines to Control Vertical Alignment</a></li>
+ <li><a href="#Format-Skipping">Format Skipping</a></li>
+ <li><a href="#Finding-Unused-Variables">Finding Unused Variables</a></li>
<li><a href="#Writing-an-HTML-File">Writing an HTML File</a></li>
<li><a href="#Summary">Summary</a></li>
</ul>
print "I think that's the problem\n";
}</code></pre>
-<p>You'll notice an immediate style change from the "cuddled-else" style of the original to the default "non-cuddled-else" style. This is because perltidy has to make some kind of default selection of formatting options, and this default tries to follow the suggestions in the perlstyle man pages.</p>
+<p>You'll notice an immediate style change from the "cuddled-else" style of the original to the default "non-cuddled-else" style. This is because perltidy has to make some kind of default selection of formatting options, and this default tries to follow the suggestions in the <a href="https://metacpan.org/dist/perl/view/pod/perlstyle.pod">perlstyle</a> man pages.</p>
<p>If you prefer the original "cuddled-else" style, don't worry, you can indicate that with a <b>-ce</b> flag. So if you rerun with that flag</p>
<p>The short forms are convenient for entering parameters by hand, whereas the long forms, though often ridiculously long, are self-documenting and therefore useful in configuration scripts. You may use either one or two dashes ahead of the parameters. Also, the '=' sign is optional, and may be a single space instead. However, the value of a parameter must NOT be adjacent to the flag, like this <b>-i3</b> (WRONG). Also, flags must be input separately, never bundled together.</p>
-<h2 id="Line-Length-and-Continuation-Indentation">Line Length and Continuation Indentation.</h2>
+<h2 id="Continuation-Indentation">Continuation Indentation.</h2>
<p>If you change the indentation spaces you will probably also need to change the continuation indentation spaces with the parameter <b>-ci=n</b>. The continuation indentation is the extra indentation -- 2 spaces by default -- given to that portion of a long line which has been placed below the start of a statement. For example:</p>
unless sysread( $impl->{file}, $element, $impl->{group} )
and truncate( $impl->{file}, $new_end );</code></pre>
-<p>There is no fixed rule for setting the value for <b>-ci=n</b>, but it should probably not exceed one-half of the number of spaces of a full indentation level.</p>
+<p>It works well to use a value <b>n</b> equal to one-half the number of spaces to a full indentation level. If it is set equal to the full indentation level, then formatting will be improved by also setting <b>--extended-continuation-indentation</b>, or <b>-xci</b>. The manual has some examples.</p>
+
+<h2 id="Line-Length">Line Length</h2>
<p>In the above snippet, the statement was broken into three lines. The actual number is governed by a parameter, the maximum line length, as well as by what perltidy considers to be good break points. The maximum line length is 80 characters by default. You can change this to be any number <b>n</b> with the <b>-l=n</b> flag. Perltidy tries to produce lines which do not exceed this length, and it does this by finding good break points. For example, the above snippet would look like this with <b>perltidy -l=40</b>:</p>
<p>What happens in this case is that the shell takes care of the redirected input files, '<somefile.pl', and so perltidy never sees the filename. Therefore, it knows to use the standard input and standard output channels.</p>
+<p>If you ever find that you enter <i>perltidy</i> and nothing seems to happen, it could be that you did not give it source to work on. So in that case it is waiting for input from the standard input, which is probably the keyboard.</p>
+
<p>If you are executing perltidy on a file and want to force the output to standard output, rather than create a <i>.tdy</i> file, you can indicate this with the flag <b>-st</b>, like this:</p>
<pre><code> perltidy somefile.pl -st >otherfile.pl</code></pre>
+<p>If you just enter</p>
+
<p>You can also control the name of the output file with the <b>-o</b> flag, like this:</p>
<pre><code> perltidy testfile.pl -o=testfile.new.pl</code></pre>
<h2 id="Style-Variations">Style Variations</h2>
-<p>Perltidy has to make some kind of default selection of formatting options, and its choice is to try to follow the suggestions in the perlstyle man pages. Many programmers more or less follow these suggestions with a few exceptions. In this section we will look at just a few of the most commonly used style parameters. Later, you may want to systematically develop a set of style parameters with the help of the perltidy <b>stylekey</b> web page at http://perltidy.sourceforge.net/stylekey.html</p>
+<p>Perltidy has to make some kind of default selection of formatting options, and its choice is to try to follow the suggestions in the perlstyle man pages. The default parameter settings will produce quite readable code, and should be sufficient for many purposes. Many programmers more or less follow these suggestions with a few exceptions. In this section we will look at just a few of the most commonly used style parameters. Later, you may want to systematically develop a set of style parameters with the help of the perltidy <b>stylekey</b> web page at http://perltidy.sourceforge.net/stylekey.html</p>
<dl>
print "I think that's the problem\n";
}</code></pre>
-</dd>
-<dt id="lp-Lining-up-with-parentheses"><b>-lp</b>, Lining up with parentheses</dt>
-<dd>
-
-<p>The <b>-lp</b> parameter can enhance the readability of lists by adding extra indentation. Consider:</p>
-
-<pre><code> %romanNumerals = (
- one => 'I',
- two => 'II',
- three => 'III',
- four => 'IV',
- five => 'V',
- six => 'VI',
- seven => 'VII',
- eight => 'VIII',
- nine => 'IX',
- ten => 'X'
- );</code></pre>
-
-<p>With the <b>-lp</b> flag, this is formatted as:</p>
-
-<pre><code> %romanNumerals = (
- one => 'I',
- two => 'II',
- three => 'III',
- four => 'IV',
- five => 'V',
- six => 'VI',
- seven => 'VII',
- eight => 'VIII',
- nine => 'IX',
- ten => 'X'
- );</code></pre>
-
-<p>which is preferred by some. (I've actually used <b>-lp</b> and <b>-cti=1</b> to format this block. The <b>-cti=1</b> flag causes the closing paren to align vertically with the opening paren, which works well with the <b>-lp</b> indentation style). An advantage of <b>-lp</b> indentation are that it displays lists nicely. A disadvantage is that deeply nested lists can require a long line length.</p>
-
</dd>
<dt id="bt--pt--sbt:-Container-tightness"><b>-bt</b>,<b>-pt</b>,<b>-sbt</b>: Container tightness</dt>
<dd>
<p>Here is what <i>testfile.pl.ERR</i> contains:</p>
<pre><code> 10: final indentation level: 1
-
+
Final nesting depth of '{'s is 1
The most recent un-matched '{' is on line 6
6: } elsif ($temperature < 68) {{
<p>This kind of error can otherwise be hard to find.</p>
-<h2 id="The-Log-File">The Log File</h2>
+<h2 id="Using-Perltidy-as-a-Filter-on-Selected-Text-from-an-Editor">Using Perltidy as a Filter on Selected Text from an Editor</h2>
-<p>One last topic that needs to be touched upon concerns the <i>.LOG</i> file. This is where perltidy records messages that are not normally of any interest, but which just might occasionally be useful. This file is not saved, though, unless perltidy detects that it has made a mistake or you ask for it to be saved.</p>
+<p>Most programmer's editors allow a selected group of lines to be passed through an external filter. Perltidy has been designed to work well as a filter, and it is well worthwhile learning the appropriate commands to do this with your editor. This means that you can enter a few keystrokes and watch a block of text get reformatted. If you are not doing this, you are missing out of a lot of fun! You may want to supply the <b>-q</b> flag to prevent error messages regarding incorrect syntax, since errors may be obvious in the indentation of the reformatted text. This is entirely optional, but if you do not use the <b>-q</b> flag, you will need to use the undo keys in case an error message appears on the screen.</p>
-<p>There are a couple of ways to ask perltidy to save a log file. To create a relatively sparse log file, use</p>
+<p>For example, within the <b>vim</b> editor it is only necessary to select the text by any of the text selection methods, and then issue the command !perltidy in command mode. Thus, an entire file can be formatted using</p>
-<pre><code> perltidy -log testfile.pl</code></pre>
+<pre><code> :%!perltidy -q</code></pre>
-<p>and for a verbose log file, use</p>
+<p>or, without the <b>-q</b> flag, just</p>
-<pre><code> perltidy -g testfile.pl</code></pre>
+<pre><code> :%!perltidy</code></pre>
-<p>The difference is that the first form only saves detailed information at least every 50th line, while the second form saves detailed information about every line.</p>
+<p>It isn't necessary to format an entire file, however. Perltidy will probably work well as long as you select blocks of text whose braces, parentheses, and square brackets are properly balanced. You can even format an <code>elsif</code> block without the leading <code>if</code> block, as long as the text you select has all braces balanced.</p>
-<p>So returning to our example, lets force perltidy to save a verbose log file by issuing the following command</p>
+<p>For the <b>emacs</b> editor, first mark a region and then pipe it through perltidy. For example, to format an entire file, select it with <code>C-x h</code> and then pipe it with <code>M-1 M-|</code> and then <code>perltidy</code>. The numeric argument, <code>M-1</code> causes the output from perltidy to replace the marked text. See "GNU Emacs Manual" for more information, http://www.gnu.org/manual/emacs-20.3/html_node/emacs_toc.html</p>
-<pre><code> perltidy -g testfile.pl</code></pre>
+<p>If you have difficulty with an editor, try the <b>-st</b> flag, which will force perltidy to send output to standard output. This might be needed, for example, if the editor passes text to perltidy as temporary filename instead of through the standard input. If this works, you might put the <b>-st</b> flag in your <i>.perltidyrc</i> file.</p>
-<p>You will find that a file named <i>testfile.pl.LOG</i> has been created in your directory.</p>
+<p>After you get your editor and perltidy successfully talking to each other, try formatting a snippet of code with a brace error to see what happens. (Do not use the quiet flag, <b>-q</b>, for this test). Perltidy will send one line starting with <code>##</code> to standard error output. Your editor may either display it at the top of the reformatted text or at the bottom (or even midstream!). You probably cannot control this, and perltidy can't, but you need to know where to look when an actual error is detected.</p>
-<p>If you open this file, you will see that it is a text file with a combination of warning messages and informative messages. All you need to know for now is that it exists; someday it may be useful.</p>
+<h2 id="Adding-Blank-Lines-to-Control-List-Formatting">Adding Blank Lines to Control List Formatting</h2>
-<h2 id="Using-Perltidy-as-a-Filter-on-Selected-Text-from-an-Editor">Using Perltidy as a Filter on Selected Text from an Editor</h2>
+<p>In the following example, the return list of five values has been automatically formatted over two lines:</p>
-<p>Most programmer's editors allow a selected group of lines to be passed through an external filter. Perltidy has been designed to work well as a filter, and it is well worthwhile learning the appropriate commands to do this with your editor. This means that you can enter a few keystrokes and watch a block of text get reformatted. If you are not doing this, you are missing out of a lot of fun! You may want to supply the <b>-q</b> flag to prevent error messages regarding incorrect syntax, since errors may be obvious in the indentation of the reformatted text. This is entirely optional, but if you do not use the <b>-q</b> flag, you will need to use the undo keys in case an error message appears on the screen.</p>
+<pre><code> my (
+ $rinput_string, $is_encoded_data, $decoded_input_as,
+ $encoding_log_message, $length_function,
+ ) = $self->get_decoded_string_buffer($input_file);</code></pre>
-<p>For example, within the <b>vim</b> editor it is only necessary to select the text by any of the text selection methods, and then issue the command !perltidy in command mode. Thus, an entire file can be formatted using</p>
+<p>As a general rule, if there are no blank lines or comments between the opening and closing parens, as in this example, perltidy will use an automated rule to set line breaks. Otherwise, it will keep the line breaks unchanged. So by inserting a blank line somewhere within a list we can 'freeze' the line breaks to be the way we would like.</p>
-<pre><code> :%!perltidy -q</code></pre>
+<p>For example, here is the above list with a line break at every comma, and with a blank line to prevent perltidy from changing it:</p>
-<p>or, without the <b>-q</b> flag, just</p>
+<pre><code> my (
+ $rinput_string,
+ $is_encoded_data,
+ $decoded_input_as,
+ $encoding_log_message,
+ $length_function,
-<pre><code> :%!perltidy</code></pre>
+ ) = $self->get_decoded_string_buffer($input_file);</code></pre>
-<p>It isn't necessary to format an entire file, however. Perltidy will probably work well as long as you select blocks of text whose braces, parentheses, and square brackets are properly balanced. You can even format an <code>elsif</code> block without the leading <code>if</code> block, as long as the text you select has all braces balanced.</p>
+<p>An easy way to switch to a single column list such as this is to select the list from within an editor and reformat with with --maximum-fields-per-table=1, or -mft=1. This will format the list in a single column. Then insert a blank line to keep this format.</p>
-<p>For the <b>emacs</b> editor, first mark a region and then pipe it through perltidy. For example, to format an entire file, select it with <code>C-x h</code> and then pipe it with <code>M-1 M-|</code> and then <code>perltidy</code>. The numeric argument, <code>M-1</code> causes the output from perltidy to replace the marked text. See "GNU Emacs Manual" for more information, http://www.gnu.org/manual/emacs-20.3/html_node/emacs_toc.html</p>
+<h2 id="Adding-Blank-Lines-to-Control-Vertical-Alignment">Adding Blank Lines to Control Vertical Alignment</h2>
-<p>If you have difficulty with an editor, try the <b>-st</b> flag, which will force perltidy to send output to standard output. This might be needed, for example, if the editor passes text to perltidy as temporary filename instead of through the standard input. If this works, you might put the <b>-st</b> flag in your <i>.perltidyrc</i> file.</p>
+<p>Vertical alignment refers to the insertion of blank spaces to align tokens which successive lines have in common, such as the <b>=</b> here:</p>
-<p>If you have some tips for making perltidy work with your editor, and are willing to share them, please email me (see below) and I'll try to incorporate them in this document or put up a link to them.</p>
+<pre><code> my $self = shift;
+ my $debug_file = $self->{_debug_file};
+ my $is_encoded_data = $self->{_is_encoded_data};</code></pre>
-<p>After you get your editor and perltidy successfully talking to each other, try formatting a snippet of code with a brace error to see what happens. (Do not use the quiet flag, <b>-q</b>, for this test). Perltidy will send one line starting with <code>##</code> to standard error output. Your editor may either display it at the top of the reformatted text or at the bottom (or even midstream!). You probably cannot control this, and perltidy can't, but you need to know where to look when an actual error is detected.</p>
+<p>Vertical alignment is automatic unless it has been deactivated by one of its controls, but it always stops and tries to restart at blank lines. So a blank line can be inserted to stop an unwanted alignment. So, for example, we can can insert a blank line to break the alignment in the above example like this:</p>
+
+<pre><code> my $self = shift;
+
+ my $debug_file = $self->{_debug_file};
+ my $is_encoded_data = $self->{_is_encoded_data};</code></pre>
+
+<h2 id="Format-Skipping">Format Skipping</h2>
+
+<p>To have perltidy leave existing formatting unchanged, surround the lines to be skipped with special comments like this:</p>
+
+<pre><code> #<<<
+ my @list = (1,
+ 1, 1,
+ 1, 2, 1,
+ 1, 3, 3, 1,
+ 1, 4, 6, 4, 1,);
+ #>>></code></pre>
+
+<p>A related comment control is <b>--code-skipping</b>, indicated with '#<<V>. and '#>>V>', which simply passes lines of code to the output without tokenization. This is useful for some extended syntaxes. Another is <b>--non-indenting-braces</b>, indicated by placing a side comment '#<<<' following a block brace, which prevents code within the marked braces from getting an extra level indentation. This is useful if we want to put braces around code and want to minimize the changes in formatting.</p>
+
+<h2 id="Finding-Unused-Variables">Finding Unused Variables</h2>
+
+<p>Perltidy has several parameters which can assist in locating problems in code. One of these is <b>-dump-unusual-variables</b>, or <b>-duv</b>. It will scan a file and produce a list of unused, reused, and certain other lexical variables of interest. To use it to analyze a file named <i>testfile.pl</i> enter:</p>
+
+<pre><code> perltidy -duv testfile.pl >tmp.txt</code></pre>
+
+<p>The information will be dumped to the standard output, <i>tmp.txt</i> in this example, and perltidy will exit without formatting the file. The lines of output identify four types of issues, namely</p>
+
+<pre><code> u: unused variables
+ r: reused variable name in same scope
+ s: sigil change but reused bareword, such as %file and $file
+ p: package-crossing variables: a variable with scope in multiple packages</code></pre>
+
+<p>These issues, although not errors, can be worth reviewing, especially for new code. Other parameters which can be useful when reviewing code are <b>--warn-missing-else</b> and <b>--dump-block-summary</b>. The manual has further information.</p>
<h2 id="Writing-an-HTML-File">Writing an HTML File</h2>
<p>which will produce a file <i>testfile.pl.html</i>. There are many parameters available for adjusting the appearance of an HTML file, but a very easy way is to just write the HTML file with this simple command and then edit the stylesheet which is embedded at its top.</p>
-<p>One important thing to know about the <b>-html</b> flag is that perltidy can either send its output to its beautifier or to its HTML writer, but (unfortunately) not both in a single run. So the situation can be represented like this:</p>
+<p>One important thing to know about the <b>-html</b> flag is that perltidy can either send its output to its beautifier or to its HTML writer, but not both in a single run. So the situation can be represented like this:</p>
<pre><code> ------------
| | --->beautifier--> testfile.pl.tdy
| | --->HTML -------> testfile.pl.html
------------</code></pre>
-<p>And in the future, there may be more output filters. So if you would like to both beautify a script and write it to HTML, you need to do it in two steps.</p>
+<p>So if you would like to both beautify a script and write it to HTML, you need to do it in two steps.</p>
<h2 id="Summary">Summary</h2>
<p>That's enough to get started using perltidy. When you are ready to create a <i>.perltidyrc</i> file, you may find it helpful to use the <i>stylekey</i> page as a guide at http://perltidy.sourceforge.net/stylekey.html</p>
-<p>Many additional special features and capabilities can be found in the manual pages for perltidy at http://perltidy.sourceforge.net/perltidy.html</p>
+<p>We hope that perltidy makes perl programming a little more fun.</p>
-<p>We hope that perltidy makes perl programming a little more fun. Please check the perltidy web site http://perltidy.sourceforge.net occasionally for updates.</p>
+<p>Further documentation can be found at the web site <a href="https://perltidy.github.io/perltidy/">at GitHub</a> or <a href="http://perltidy.sourceforge.net">at Sourceforge</a> or <a href="https://metacpan.org/pod/distribution/Perl-Tidy/bin/perltidy">at metacpan</a></p>
-<p>The author may be contacted at perltidy at users.sourceforge.net.</p>
+<p>Issues can be reported at <a href="https://github.com/perltidy/perltidy">GitHub</a></p>
</body>
# This program was posted on the MacPerl mailing list by
# Charles Albrecht as one way to get perltidy to work as a filter
# under BBEdit.
+# 20240102: slh fixed obvious error found with -duv: ('my' inside BEGIN block)
use Perl::Tidy;
-BEGIN { my $input_string = ""; my $output_string = ""; }
+my ($input_string, $output_string);
+BEGIN { $input_string = ""; $output_string = ""; }
$input_string .= $_;
$| = 1;
use vars qw($opt_l $opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+
+ my $usage = <<EOM;
usage: break_long_quotes.pl [ -ln ] filename >outfile
where n=line length (default 72)
EOM
-getopts('hl:') or die "$usage";
-if ($opt_h) { die $usage }
-if ( !defined $opt_l ) {
- $opt_l = 70;
-}
-else {
- $opt_l =~ /^\d+$/ or die "$usage";
-}
+ getopts('hl:') or die "$usage";
+ if ($opt_h) { die $usage }
+ if ( !defined $opt_l ) {
+ $opt_l = 70;
+ }
+ else {
+ $opt_l =~ /^\d+$/ or die "$usage";
+ }
-unless ( @ARGV == 1 ) { die $usage }
-my $file = $ARGV[0];
-scan_file( $file, $opt_l );
+ unless ( @ARGV == 1 ) { die $usage }
+ my $file = $ARGV[0];
+ scan_file( $file, $opt_l );
+}
sub scan_file {
my ( $file, $line_length ) = @_;
unless ($fh) { die "cannot open '$file': $!\n" }
my $formatter = MyWriter->new($line_length);
- my $err=perltidy(
+ my $err = perltidy(
'formatter' => $formatter, # callback object
'source' => $fh,
'argv' => "-npro -se", # don't need .perltidyrc
# errors to STDOUT
);
- if ($err){
+ if ($err) {
die "Error calling perltidy\n";
}
$fh->close();
# find leading whitespace
my $leading_whitespace = ( $input_line =~ /^(\s*)/ ) ? $1 : "";
- if ($starting_in_quote) {$leading_whitespace=""};
+ if ($starting_in_quote) { $leading_whitespace = "" }
my $new_line = $leading_whitespace;
# loop over tokens looking for quotes (token type Q)
# look for long quoted strings on a single line
# (multiple line quotes not currently handled)
if ( $$rtoken_type[$j] eq 'Q'
- && !( $j == 0 && $starting_in_quote )
+ && !( $j == 0 && $starting_in_quote )
&& !( $j == $jmax && $ending_in_quote )
&& ( length($token) > $max_quote_length ) )
{
} ## end if ( $line_type eq 'CODE')
# print the line
- $self->print($input_line."\n");
+ $self->print( $input_line . "\n" );
return;
} ## end sub write_line
# break a string at one or more spaces so that the longest substring is
# less than the desired length (if possible).
my ( $str, $quote_char, $max_length ) = @_;
- my $blank = ' ';
- my $prev_char = "";
+ my $blank = ' ';
my @break_after_pos;
my $quote_pos = -1;
while ( ( $quote_pos = index( $str, $blank, 1 + $quote_pos ) ) >= 0 ) {
# called once after the last line of a file
sub finish_formatting {
my $self = shift;
- $self->flush_comments();
}
# Example script for removing trailing blank lines of code from a perl script
# This is from the examples/ directory of the perltidy distribution and may
-# be modified as needed.
+# be modified as needed.
# This was written in response to RT #118553, "leave only one newline at end of file".
# Adding the requested feature to perltidy itself would have very undesirable
use IO::File;
$| = 1;
use vars qw($opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+ my $usage = <<EOM;
usage: $0 filename >outfile
EOM
-getopts('h') or die "$usage";
-if ($opt_h) { die $usage }
-
-# Make the source for perltidy, which will be a filehandle
-# or just '-' if the source is stdin
-my ($file, $fh, $source);
-if ( @ARGV == 0 ) {
- $source = '-';
-}
-elsif ( @ARGV == 1 ) {
- $file = $ARGV[0];
- $fh = IO::File->new( $file, 'r' );
- unless ($fh) { die "cannot open '$file': $!\n" }
- $source = $fh;
-}
-else { die $usage }
-
-# make the callback object
-my $formatter = MyFormatter->new();
-
-my $dest;
-
-# start perltidy, which will start calling our write_line()
-my $err=perltidy(
- 'formatter' => $formatter, # callback object
- 'source' => $source,
- 'destination' => \$dest, # (not really needed)
- 'argv' => "-npro -se", # dont need .perltidyrc
- # errors to STDOUT
-);
-if ($err) {
- die "Error calling perltidy\n";
+ getopts('h') or die "$usage";
+ if ($opt_h) { die $usage }
+
+ # Make the source for perltidy, which will be a filehandle
+ # or just '-' if the source is stdin
+ my ( $file, $fh, $source );
+ if ( @ARGV == 0 ) {
+ $source = '-';
+ }
+ elsif ( @ARGV == 1 ) {
+ $file = $ARGV[0];
+ $fh = IO::File->new( $file, 'r' );
+ unless ($fh) { die "cannot open '$file': $!\n" }
+ $source = $fh;
+ }
+ else { die $usage }
+
+ # make the callback object
+ my $formatter = MyFormatter->new();
+
+ my $dest;
+
+ # start perltidy, which will start calling our write_line()
+ my $err = perltidy(
+ 'formatter' => $formatter, # callback object
+ 'source' => $source,
+ 'destination' => \$dest, # (not really needed)
+ 'argv' => "-npro -se", # dont need .perltidyrc
+ # errors to STDOUT
+ );
+ if ($err) {
+ die "Error calling perltidy\n";
+ }
+ $fh->close() if $fh;
+ return;
}
-$fh->close() if $fh;
package MyFormatter;
sub write_line {
# This is called from perltidy line-by-line; we just save lines
- my $self = shift;
- my $line_of_tokens = shift;
+ my $self = shift;
+ my $line_of_tokens = shift;
push @lines, $line_of_tokens;
}
my $self = shift;
# remove all trailing blank lines of code
- while (my $line_of_tokens = pop(@lines)) {
- my $line_type = $line_of_tokens->{_line_type};
- my $input_line = $line_of_tokens->{_line_text};
+ while ( my $line_of_tokens = pop(@lines) ) {
+ my $line_type = $line_of_tokens->{_line_type};
+ my $input_line = $line_of_tokens->{_line_text};
if ( $line_type eq 'CODE' ) {
chomp $input_line;
next unless ($input_line);
}
- push @lines, $line_of_tokens;
- last;
+ push @lines, $line_of_tokens;
+ last;
}
# write remaining lines
foreach my $line_of_tokens (@lines) {
- my $line_type = $line_of_tokens->{_line_type};
- my $input_line = $line_of_tokens->{_line_text};
- print $input_line;
+ my $line_type = $line_of_tokens->{_line_type};
+ my $input_line = $line_of_tokens->{_line_text};
+ print $input_line;
}
return;
}
--- /dev/null
+#!/usr/bin/perl
+use strict;
+use warnings;
+use File::Temp qw{ tempfile };
+
+# Run perltidy --dump-unique-keys on multiple files, and
+# show hash keys which just appear in one file.
+# Requires Perl::Tidy version 20240903.09 or higher
+
+# The latest version of this file should be at:
+# https://github.com/perltidy/perltidy/blob/master/examples/dump_unique_keys.pl
+
+my ( $fh_tmp, $tmpfile );
+END {
+ if ( defined($tmpfile) && -e $tmpfile ) {
+ unlink($tmpfile) or warn "Could not unlink $tmpfile: $!";
+ }
+}
+
+main();
+
+sub main {
+
+ my $usage = <<EOM;
+Run perltidy --dump-unique-keys on multiple files
+Usage: $0 file1 file2 ...
+ if no files are given, look for MANIFEST and use files lib/.../*.pm
+EOM
+
+ my @files = @ARGV;
+
+ if ( !@files ) {
+ my $MANIFEST = "MANIFEST";
+ if ( -e $MANIFEST && -f $MANIFEST ) {
+ my $rfiles = read_MANIFEST($MANIFEST);
+ @files = @{$rfiles};
+ my $num=@files;
+ print STDERR "Reading $MANIFEST...found $num files\n";
+ }
+ }
+
+ if ( !@files ) { die $usage }
+
+ foreach my $file (@files) {
+ if ( !-e $file ) { die "file '$file' not found\n" }
+ }
+
+ ( $fh_tmp, $tmpfile ) = tempfile();
+ if ( !$fh_tmp ) {
+ die "unable to open temporary file $tmpfile\n";
+ }
+
+ # Loop to run perltidy -duk on each file:
+ # - capture standard output to a file for further processing
+ # - any error messages go to the standard error output
+ my %seen;
+ my $saw_error;
+ foreach my $file (@files) {
+ next if ( $seen{$file}++ );
+ next if (!-e $file || -z $file );
+ my $cmd = "perltidy -npro -duk $file >>$tmpfile -se";
+ my $err = system($cmd);
+ if ($err) { $saw_error++; warn "perltidy returned error for '$file'\n" }
+ }
+
+ my $fh;
+ if ( !open( $fh, '<', $tmpfile ) ) {
+ die "cannot open my temp file '$tmpfile': $!\n";
+ }
+
+ # read the captured output and find duplicate words
+ my %word_count;
+ my @lines;
+ foreach my $line (<$fh>) {
+ my $word;
+ if ( $line =~ /^(.*),(\d+)\s*$/ ) {
+ $word = $1;
+ if ( !defined( $word_count{$word} ) ) {
+ $word_count{$word} = 1;
+ }
+ else {
+ $word_count{$word}++;
+ }
+ }
+ push @lines, [ $line, $word ];
+ }
+ $fh->close();
+
+ # remove duplicate words
+ my @dups = grep { $word_count{$_} > 1 } keys %word_count;
+ my %is_dup;
+ @is_dup{@dups} = (1) x scalar(@dups);
+
+ my $last_word = "START";
+ my @new_lines;
+ foreach my $item (@lines) {
+ my ( $line, $word ) = @{$item};
+ if ( defined($word) ) {
+
+ # line with word: skip duplicate words
+ next if ( $is_dup{$word} );
+ }
+ else {
+
+ # line with filename: remove previous line if it also was a filename
+ if ( !defined($last_word) ) { pop @new_lines }
+ }
+ $last_word = $word;
+ push @new_lines, $line;
+ }
+
+ my $output_string .= join "", @new_lines;
+ print {*STDOUT} $output_string;
+
+} ## end sub main
+
+sub read_MANIFEST {
+ my ($MANIFEST) = @_;
+
+ # scan MANIFEST for existing files of the form 'lib/.../*.pm'
+ my $fh;
+ if ( !open( $fh, '<', $MANIFEST ) ) {
+ die "cannot open '$MANIFEST': $!\n";
+ }
+ my @files;
+ foreach my $line (<$fh>) {
+ chomp $line;
+ next unless $line;
+ my @parts = split '/', $line;
+ if ( $parts[0] ne 'lib' ) { next }
+ if ( $parts[-1] !~ /\.pm$/i ) { next }
+ if ( -e $line ) { push @files, $line }
+ }
+ return \@files;
+} ## end sub read_MANIFEST
#!/usr/bin/perl -w
use Perl::Tidy;
-# Illustrate use of prefilter and postfilter parameters to perltidy.
+# Illustrate use of prefilter and postfilter parameters to perltidy.
# This example program uses a prefilter it to convert the 'method'
# keyword to 'sub', and a postfilter to convert back, so that perltidy will
-# work for Method::Signature::Simple code.
-# NOTE: This program illustrates the use of filters but has not been
-# extensively tested.
+# work for Method::Signature::Simple code.
+
+# NOTE:
+# In the current version of perltidy this can be accomplished simply
+# by using --sub-alias-list=method. However, this remains a good example
+# of the use of prefilters and postfilters.
# usage:
# perl filter_example.pl filter_example.in
# 2. Then perltidy formats the code
# 3. Then the postfilter changes 'sub METHOD_' to 'method ' everywhere.
# (This assumes that there are no methods named METHOD_*, and that the keyword
-# method always begins a line in the input file).
+# method always begins a line in the input file).
#
-# Debugging hints:
-# 1. Try commenting out the postfilter and running with
+# Debugging hints:
+# 1. Try commenting out the postfilter and running with
# the --notidy option to see what the prefilter alone is doing.
# 2. Then run with both pre- and post ters with --notidy to be sure
# that the postfilter properly undoes the prefilter.
}
# attributes
- method foo : lvalue { $self->{foo}
+ method foo : lvalue { $self->{foo}
}
# change invocant name
- method
+ method
foo ($class: $bar) { $class->bar($bar) }
#
# usage:
# find_naughty file1 [file2 [...]]
-# find_naughty <file.pl
+# find_naughty <file.pl
#
# Author: Steve Hancock, July 2003
#
use IO::File;
$| = 1;
use vars qw($opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+ my $usage = <<EOM;
usage:
find_naughty file1 [file2 [...]]
find_naughty <file.pl
EOM
-getopts('h') or die "$usage";
-if ($opt_h) { die $usage }
+ getopts('h') or die "$usage";
+ if ($opt_h) { die $usage }
-unless (@ARGV) { unshift @ARGV, '-' } # stdin
-foreach my $source (@ARGV) {
- PerlTokenSearch::find_naughty(
- _source => $source,
- );
+ unless (@ARGV) { unshift @ARGV, '-' } # stdin
+ foreach my $source (@ARGV) {
+ PerlTokenSearch::find_naughty( _source => $source, );
+ }
+ return;
}
#####################################################################
# source filehandle and looks for selected variables.
#
# It works by making a callback object with a write_line() method to
-# receive tokenized lines from perltidy.
+# receive tokenized lines from perltidy.
#
# Usage:
#
sub find_naughty {
- my %args = ( @_ );
+ my %args = (@_);
print "Testing File: $args{_source}\n";
# run perltidy, which will call $formatter's write_line() for each line
- my $err=perltidy(
+ my $err = perltidy(
'source' => $args{_source},
'formatter' => bless( \%args, __PACKAGE__ ), # callback object
- 'argv' => "-npro -se", # -npro : ignore .perltidyrc,
- # -se : errors to STDOUT
+ 'argv' => "-npro -se", # -npro : ignore .perltidyrc,
+ # -se : errors to STDOUT
);
if ($err) {
die "Error calling perltidy\n";
# This is called back from perltidy line-by-line
# We're looking for $`, $&, and $'
my ( $self, $line_of_tokens ) = @_;
- my $source = $self->{_source};
+ my $source = $self->{_source};
# pull out some stuff we might need
my $line_type = $line_of_tokens->{_line_type};
# and check it
if ( $token =~ /^\$[\`\&\']$/ ) {
- print STDERR
- "$source:$input_line_number: $token\n";
+ print STDERR "$source:$input_line_number: $token\n";
}
}
}
#!/usr/bin/perl -w
#
-# Walk through a perl script and reformat perl comments
+# Walk through a perl script and reformat perl comments
# using Text::Autoformat.
#
# usage:
#
# This file demonstrates using Perl::Tidy to walk through a perl file
# and find all of its comments. It offers to reformat each group of
-# consecutive full-line comments with Text::Autoformat.
+# consecutive full-line comments with Text::Autoformat.
#
# This may or may not be useful, depending on your coding style.
# Change it to suit your own purposes; see sub get_line().
#
-# Uses: Text::Autoformat
+# Uses: Text::Autoformat
# Perl::Tidy
#
# Steve Hancock, March 2003
# Based on a suggestion by Tim Maher
+# Also discussed in git#127
#
+# Note that the indentation which is assumed for blocks of comments is whatever
+# the indentation is in the input file. A potential problem is that if the
+# input file has very long comments which have been out-dented, this may not
+# give the desired result. This problem can be avoided by doing a preliminary
+# formatting pass with perltidy which includes the parameter
+# --nooutdent-long-comments (-nolc).
+
# TODO: (just ideas that probably won't get done)
# -Handle lines of stars, dashes, etc better
# -Need flag to limit changes to lines greater than some minimum length
# -reformat side and hanging side comments
+
use strict;
use Getopt::Std;
use Text::Autoformat;
$| = 1;
use vars qw($opt_l $opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+ my $usage = <<EOM;
usage: perlcomment [ -ln ] filename >outfile
where n=line length (default 72)
EOM
-getopts('hl:') or die "$usage";
-if ($opt_h) {die $usage}
-if ( !defined $opt_l ) {
- $opt_l = 72;
-}
-else {
- $opt_l =~ /^\d+$/ or die "$usage";
-}
+ getopts('hl:') or die "$usage";
+ if ($opt_h) { die $usage }
+ if ( !defined $opt_l ) {
+ $opt_l = 72;
+ }
+ else {
+ $opt_l =~ /^\d+$/ or die "$usage";
+ }
-unless ( @ARGV == 1 ) { die $usage }
-my $file = $ARGV[0];
-autoformat_file( $file, $opt_l );
+ unless ( @ARGV == 1 ) { die $usage }
+ my $file = $ARGV[0];
+ autoformat_file( $file, $opt_l );
+ return;
+}
sub autoformat_file {
my ( $file, $line_length ) = @_;
unless ($fh) { die "cannot open '$file': $!\n" }
my $formatter = CommentFormatter->new($line_length);
- my $err=perltidy(
- 'formatter' => $formatter, # callback object
+ my $err = perltidy(
+ 'formatter' => $formatter, # callback object
'source' => $fh,
- 'argv' => "-npro -se", # dont need .perltidyrc
- # errors to STDOUT
+ 'argv' => "-npro -se", # dont need .perltidyrc
+ # errors to STDOUT
);
if ($err) {
die "Error calling perltidy\n";
# Other lines go to stdout immediately
my $self = shift;
my $line_of_tokens = shift;
- my $line_type = $line_of_tokens->{_line_type};
- ## my $input_line_number = $line_of_tokens->{_line_number};
+ my $line_type = $line_of_tokens->{_line_type};
+ ## my $input_line_number = $line_of_tokens->{_line_number};
my $input_line = $line_of_tokens->{_line_text}; # the original line
my $rtoken_type = $line_of_tokens->{_rtoken_type}; # type of tokens
my $rtokens = $line_of_tokens->{_rtokens}; # text of tokens
# Just print non-code, non-comment lines
if (
- $line_type ne 'CODE' # if it's not code,
- || !@$rtokens # or is a blank line
+ $line_type ne 'CODE' # if it's not code,
+ || !@$rtokens # or is a blank line
|| $$rtoken_type[-1] ne '#' # or the last token isn't a comment
)
{
# - a full line comment (@$rtokens==1)
# Output a line with a side comment, but remember it
- if (@$rtokens > 1) {
+ if ( @$rtokens > 1 ) {
$self->print($input_line);
$self->{_in_hanging_side_comment} = 1;
return;
# A hanging side comment is a full-line comment immediately
# following a side comment or another hanging side comment.
# Output a hanging side comment directly
- if ($self->{_in_hanging_side_comment}) {
+ if ( $self->{_in_hanging_side_comment} ) {
$self->print($input_line);
return;
}
sub append_comment {
my ( $self, $input_line ) = @_;
- my $rcomment_block = $self->{_rcomment_block};
+ my $rcomment_block = $self->{_rcomment_block};
my $maximum_comment_length = $self->{_maximum_comment_length};
$$rcomment_block .= $input_line;
- if (length($input_line) > $maximum_comment_length) {
- $self->{_maximum_comment_length}=length($input_line);
+ if ( length($input_line) > $maximum_comment_length ) {
+ $self->{_maximum_comment_length} = length($input_line);
}
}
sub flush_comments {
- my ($self) = @_;
- my $rcomment_block = $self->{_rcomment_block};
- my $line_length = $self->{_line_length};
+ my ($self) = @_;
+ my $rcomment_block = $self->{_rcomment_block};
+ my $line_length = $self->{_line_length};
my $maximum_comment_length = $self->{_maximum_comment_length};
if ($$rcomment_block) {
- my $comments = $$rcomment_block;
+ my $comments = $$rcomment_block;
# we will just reformat lines longer than the desired length for now
# TODO: this can be changed
}
print $comments;
$$rcomment_block = "";
- $self->{_maximum_comment_length}=0;
+ $self->{_maximum_comment_length} = 0;
}
}
}
sub ifyes {
my $count = 0;
ASK:
- my $ans = queryu(@_);
+ my $ans = queryu(@_);
if ( $ans =~ /^Y/ ) { return 1 }
elsif ( $ans =~ /^N/ ) { return 0 }
else {
use IO::File;
$| = 1;
use vars qw($opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+ my $usage = <<EOM;
usage: perllinetype filename >outfile
EOM
-getopts('h') or die "$usage";
-if ($opt_h) { die $usage }
+ getopts('h') or die "$usage";
+ if ($opt_h) { die $usage }
-# Make the source for perltidy, which will be a filehandle
-# or just '-' if the source is stdin
-my ($file, $fh, $source);
-if ( @ARGV == 0 ) {
- $source = '-';
-}
-elsif ( @ARGV == 1 ) {
- $file = $ARGV[0];
- $fh = IO::File->new( $file, 'r' );
- unless ($fh) { die "cannot open '$file': $!\n" }
- $source = $fh;
-}
-else { die $usage }
+ # Make the source for perltidy, which will be a filehandle
+ # or just '-' if the source is stdin
+ my ( $file, $fh, $source );
+ if ( @ARGV == 0 ) {
+ $source = '-';
+ }
+ elsif ( @ARGV == 1 ) {
+ $file = $ARGV[0];
+ $fh = IO::File->new( $file, 'r' );
+ unless ($fh) { die "cannot open '$file': $!\n" }
+ $source = $fh;
+ }
+ else { die $usage }
-# make the callback object
-my $formatter = MyFormatter->new();
+ # make the callback object
+ my $formatter = MyFormatter->new();
-my $dest;
+ my $dest;
-# start perltidy, which will start calling our write_line()
-my $err=perltidy(
- 'formatter' => $formatter, # callback object
- 'source' => $source,
- 'destination' => \$dest, # (not really needed)
- 'argv' => "-npro -se", # dont need .perltidyrc
- # errors to STDOUT
-);
-if ($err) {
- die "Error calling perltidy\n";
+ # start perltidy, which will start calling our write_line()
+ my $err = perltidy(
+ 'formatter' => $formatter, # callback object
+ 'source' => $source,
+ 'destination' => \$dest, # (not really needed)
+ 'argv' => "-npro -se", # dont need .perltidyrc
+ # errors to STDOUT
+ );
+ if ($err) {
+ die "Error calling perltidy\n";
+ }
+ $fh->close() if $fh;
}
-$fh->close() if $fh;
package MyFormatter;
use IO::File;
$| = 1;
use vars qw($opt_c $opt_h);
-my $usage = <<EOM;
+main();
+
+sub main {
+ my $usage = <<EOM;
usage: perlmask [ -cn ] filename >outfile
EOM
-getopts('c:h') or die "$usage";
-if ($opt_h) { die $usage }
-unless ( defined($opt_c) ) { $opt_c = 0 }
-if (@ARGV > 1) { die $usage }
+ getopts('c:h') or die "$usage";
+ if ($opt_h) { die $usage }
+ unless ( defined($opt_c) ) { $opt_c = 0 }
+ if ( @ARGV > 1 ) { die $usage }
-my $source=$ARGV[0]; # an undefined filename will become stdin
+ my $source = $ARGV[0]; # an undefined filename will become stdin
-# strings to hold the files (arrays could be used to)
-my ( $masked_file, $original_file );
+ # strings to hold the files (arrays could be used to)
+ my ( $masked_file, $original_file );
-PerlMask::perlmask(
- _source => $source,
- _rmasked_file => \$masked_file,
- _roriginal_file => \$original_file, # optional
- _compression => $opt_c # optional, default=0
-);
+ PerlMask::perlmask(
+ _source => $source,
+ _rmasked_file => \$masked_file,
+ _roriginal_file => \$original_file, # optional
+ _compression => $opt_c # optional, default=0
+ );
-# Now we have the masked and original files in strings of equal length.
-# We could search for specific text in the masked file here. But here
-# we'll just print the masked file:
-if ($masked_file) { print $masked_file; }
+ # Now we have the masked and original files in strings of equal length.
+ # We could search for specific text in the masked file here. But here
+ # we'll just print the masked file:
+ if ($masked_file) { print $masked_file; }
+ return;
+}
#####################################################################
#
sub perlmask {
- my %args = ( _compression => 0, @_ );
+ my %args = ( _compression => 0, @_ );
my $rfile = $args{_rmasked_file};
unless ( defined($rfile) ) {
croak
"Missing required parameter '_rmasked_file' in call to perlmask\n";
}
- my $ref=ref($rfile);
+ my $ref = ref($rfile);
unless ( $ref =~ /^(SCALAR|ARRAY)$/ ) {
- croak <<EOM;
+ croak <<EOM;
Expecting _rmasked_file = ref to SCALAR or ARRAY in perlmask but got : ($ref)
EOM
}
# run perltidy, which will call $formatter's write_line() for each line
- my $err=perltidy(
+ my $err = perltidy(
'source' => $args{_source},
'formatter' => bless( \%args, __PACKAGE__ ), # callback object
- 'argv' => "-npro -se", # -npro : ignore .perltidyrc,
- # -se : errors to STDOUT
+ 'argv' => "-npro -se", # -npro : ignore .perltidyrc,
+ # -se : errors to STDOUT
);
if ($err) {
die "Error calling perltidy\n";
my $len = length($input_line);
if ( $opt_c == 0 && $len > 0 ) {
print_line( $roriginal_file, $input_line ) if $roriginal_file;
- print_line( $rmasked_file, '#' x $len );
+ print_line( $rmasked_file, '#' x $len );
}
else {
print_line( $roriginal_file, $input_line ) if $roriginal_file;
- print_line( $rmasked_file, "" );
+ print_line( $rmasked_file, "" );
}
return;
}
if ( $opt_c <= 1 ) {
# Find leading whitespace. But be careful..we don't want the
- # whitespace if it is part of quoted text, because it will
+ # whitespace if it is part of quoted text, because it will
# already be contained in a token.
if ( $input_line =~ /^(\s+)/ && !$line_of_tokens->{_starting_in_quote} )
{
}
}
print_line( $roriginal_file, $input_line ) if $roriginal_file;
- print_line( $rmasked_file, $masked_line );
+ print_line( $rmasked_file, $masked_line );
# self-check lengths; this error should never happen
if ( $opt_c == 0 && length($masked_line) != length($input_line) ) {
+++ /dev/null
-#!/usr/bin/perl -w
-use strict;
-
-# This program reads .perltidyrc files and writes them back out
-# into a standard format (but comments will be lost).
-#
-# It also demonstrates how to use the perltidy 'options-dump' and related call
-# parameters to read a .perltidyrc file, convert to long names, put it in a
-# hash, and write back to standard output in sorted order. Requires
-# Perl::Tidy.
-#
-# Steve Hancock, June 2006
-#
-my $usage = <<EOM;
- usage:
- perltidyrc_dump.pl [-d -s -q -h] [ filename ]
- filename is the name of a .perltidyrc config file to dump, or
- if no filename is given, find and dump the system default .perltidyrc.
- -d delete options which are the same as Perl::Tidy defaults
- (default is to keep them)
- -s write short parameter names
- (default is long names with short name in side comment)
- -q quiet: no comments
- -h help
-EOM
-use Getopt::Std;
-my %my_opts;
-my $cmdline = $0 . " " . join " ", @ARGV;
-getopts( 'hdsq', \%my_opts ) or die "$usage";
-if ( $my_opts{h} ) { die "$usage" }
-if ( @ARGV > 1 ) { die "$usage" }
-
-my $config_file = $ARGV[0];
-my (
- $error_message, $rOpts, $rGetopt_flags,
- $rsections, $rabbreviations, $rOpts_default,
- $rabbreviations_default,
-
-) = read_perltidyrc($config_file);
-
-# always check the error message first
-if ($error_message) {
- die "$error_message\n";
-}
-
-# make a list of perltidyrc options which are same as default
-my %equals_default;
-foreach my $long_name ( keys %{$rOpts} ) {
- my $val = $rOpts->{$long_name};
- if ( defined( $rOpts_default->{$long_name} ) ) {
- my $val2 = $rOpts_default->{$long_name};
- if ( defined($val2) && defined($val) ) {
- $equals_default{$long_name} = ( $val2 eq $val );
- }
- }
-}
-
-# Optional: minimize the perltidyrc file length by deleting long_names
-# in $rOpts which are also in $rOpts_default and have the same value.
-# This would be useful if a perltidyrc file has been constructed from a
-# full parameter dump, for example.
-if ( $my_opts{d} ) {
- foreach my $long_name ( keys %{$rOpts} ) {
- delete $rOpts->{$long_name} if $equals_default{$long_name};
- }
-}
-
-# find user-defined abbreviations
-my %abbreviations_user;
-foreach my $key ( keys %$rabbreviations ) {
- unless ( $rabbreviations_default->{$key} ) {
- $abbreviations_user{$key} = $rabbreviations->{$key};
- }
-}
-
-# dump the options, if any
-if ( %$rOpts || %abbreviations_user ) {
- dump_options( $cmdline, \%my_opts, $rOpts, $rGetopt_flags, $rsections,
- $rabbreviations, \%equals_default, \%abbreviations_user );
-}
-else {
- if ($config_file) {
- print STDERR <<EOM;
-No configuration parameters seen in file: $config_file
-EOM
- }
- else {
- print STDERR <<EOM;
-No .perltidyrc file found, use perltidy -dpro to see locations checked.
-EOM
- }
-}
-
-sub dump_options {
-
- # write the options back out as a valid .perltidyrc file
- # This version writes long names by sections
- my ( $cmdline, $rmy_opts, $rOpts, $rGetopt_flags, $rsections,
- $rabbreviations, $requals_default, $rabbreviations_user )
- = @_;
-
- # $rOpts is a reference to the hash returned by Getopt::Long
- # $rGetopt_flags are the flags passed to Getopt::Long
- # $rsections is a hash giving manual section {long_name}
-
- # build a hash giving section->long_name->parameter_value
- # so that we can write parameters by section
- my %section_and_name;
- my $rsection_name_value = \%section_and_name;
- my %saw_section;
- foreach my $long_name ( keys %{$rOpts} ) {
- my $section = $rsections->{$long_name};
- $section = "UNKNOWN" unless ($section); # shouldn't happen
-
- # build a hash giving section->long_name->parameter_value
- $rsection_name_value->{$section}->{$long_name} = $rOpts->{$long_name};
-
- # remember what sections are in this hash
- $saw_section{$section}++;
- }
-
- # build a table for long_name->short_name abbreviations
- my %short_name;
- foreach my $abbrev ( keys %{$rabbreviations} ) {
- foreach my $abbrev ( sort keys %$rabbreviations ) {
- my @list = @{ $$rabbreviations{$abbrev} };
-
- # an abbreviation may expand into one or more other words,
- # but only those that expand to a single word (which must be
- # one of the long names) are the short names that we want
- # here.
- next unless @list == 1;
- my $long_name = $list[0];
- $short_name{$long_name} = $abbrev;
- }
- }
-
- unless ( $rmy_opts->{q} ) {
- my $date = localtime();
- print "# perltidy configuration file created $date\n";
- print "# using: $cmdline\n";
- }
-
- # loop to write section-by-section
- foreach my $section ( sort keys %saw_section ) {
- unless ( $rmy_opts->{q} ) {
- print "\n";
-
- # remove leading section number, which is there
- # for sorting, i.e.,
- # 1. Basic formatting options -> Basic formatting options
- my $trimmed_section = $section;
- $trimmed_section =~ s/^\d+\. //;
- print "# $trimmed_section\n";
- }
-
- # loop over all long names for this section
- my $rname_value = $rsection_name_value->{$section};
- foreach my $long_name ( sort keys %{$rname_value} ) {
-
- # pull out getopt flag and actual parameter value
- my $flag = $rGetopt_flags->{$long_name};
- my $value = $rname_value->{$long_name};
-
- # turn this it back into a parameter
- my $prefix = '--';
- my $short_prefix = '-';
- my $suffix = "";
- if ($flag) {
- if ( $flag =~ /^=/ ) {
- if ( $value !~ /^\d+$/ ) { $value = '"' . $value . '"' }
- $suffix = "=" . $value;
- }
- elsif ( $flag =~ /^!/ ) {
- $prefix .= "no" unless ($value);
- $short_prefix .= "n" unless ($value);
- }
- elsif ( $flag =~ /^:/ ) {
- if ( $value !~ /^\d+$/ ) { $value = '"' . $value . '"' }
- $suffix = "=" . $value;
- }
- else {
-
- # shouldn't happen
- print
-"# ERROR in dump_options: unrecognized flag $flag for $long_name\n";
- }
- }
-
- # print the long version of the parameter
- # with the short version as a side comment
- my $short_name = $short_name{$long_name};
- my $long_option = $prefix . $long_name . $suffix;
-
- # A few options do not have a short abbreviation. These include
- # 'recombine' and 'valign', which are mainly for debugging. As a
- # workaround, we will make it the same as the long option. This
- # will insure that the -s -q flags work.
- my $short_option = $long_option;
- if ($short_name) {
- $short_option = $short_prefix . $short_name . $suffix;
- }
-
- my $note = $requals_default->{$long_name} ? " [=default]" : "";
- if ( $rmy_opts->{s} ) {
- print $short_option. "\n";
- }
- else {
- my $side_comment = "";
- unless ( $rmy_opts->{q} ) {
- my $spaces = 40 - length($long_option);
- $spaces = 2 if ( $spaces < 2 );
- $side_comment =
- ' ' x $spaces . '# ' . $short_option . $note;
- }
- print $long_option . $side_comment . "\n";
- }
- }
- }
-
- if ( %{$rabbreviations_user} ) {
- unless ( $rmy_opts->{q} ) {
- print "\n";
- print "# Abbreviations\n";
- }
- foreach my $key ( keys %$rabbreviations_user ) {
- my @vals = @{ $rabbreviations_user->{$key} };
- print $key. ' {' . join( ' ', @vals ) . '}' . "\n";
- }
- }
-}
-
-sub read_perltidyrc {
-
- # Example routine to have Perl::Tidy read and validate perltidyrc
- # file, and return related flags and abbreviations.
- #
- # input parameter -
- # $config_file is the name of a .perltidyrc file we want to read
- # or a reference to a string or array containing the .perltidyrc file
- # if not defined, Perl::Tidy will try to find the user's .perltidyrc
- # output parameters -
- # $error_message will be blank unless an error occurs
- # $rOpts - reference to the hash of options in the .perlticyrc
- # NOTE:
- # Perl::Tidy will croak or die on certain severe errors
-
- my ($config_file) = @_;
- my $error_message = "";
- my %Opts; # any options found will be put here
-
- # the module must be installed for this to work
- eval "use Perl::Tidy";
- if ($@) {
- $error_message = "Perl::Tidy not installed\n";
- return ( $error_message, \%Opts );
- }
-
- # be sure this version supports this
- my $version = $Perl::Tidy::VERSION;
- if ( $version < 20060528 ) {
- $error_message = "perltidy version $version cannot read options\n";
- return ( $error_message, \%Opts );
- }
-
- my $stderr = ""; # try to capture error messages
- my $argv = ""; # do not let perltidy see our @ARGV
-
- # we are going to make two calls to perltidy...
- # first with an empty .perltidyrc to get the default parameters
- my $empty_file = ""; # this will be our .perltidyrc file
- my %Opts_default; # this will receive the default options hash
- my %abbreviations_default;
- my $err = Perl::Tidy::perltidy(
- perltidyrc => \$empty_file,
- dump_options => \%Opts_default,
- dump_options_type => 'full', # 'full' gives everything
- dump_abbreviations => \%abbreviations_default,
- stderr => \$stderr,
- argv => \$argv,
- );
- if ($err) {
- die "Error calling perltidy\n";
- }
-
- # now we call with a .perltidyrc file to get its parameters
- my %Getopt_flags;
- my %sections;
- my %abbreviations;
- Perl::Tidy::perltidy(
- perltidyrc => $config_file,
- dump_options => \%Opts,
- dump_options_type => 'perltidyrc', # default is 'perltidyrc'
- dump_getopt_flags => \%Getopt_flags,
- dump_options_category => \%sections,
- dump_abbreviations => \%abbreviations,
- stderr => \$stderr,
- argv => \$argv,
- );
-
- # try to capture any errors generated by perltidy call
- # but for severe errors it will typically croak
- $error_message .= $stderr;
-
- # debug: show how everything is stored by printing it out
- my $DEBUG = 0;
- if ($DEBUG) {
- print "---Getopt Parameters---\n";
- foreach my $key ( sort keys %Getopt_flags ) {
- print "$key$Getopt_flags{$key}\n";
- }
- print "---Manual Sections---\n";
- foreach my $key ( sort keys %sections ) {
- print "$key -> $sections{$key}\n";
- }
- print "---Abbreviations---\n";
- foreach my $key ( sort keys %abbreviations ) {
- my @names = @{ $abbreviations{$key} };
- print "$key -> {@names}\n";
- unless ( $abbreviations_default{$key} ) {
- print "NOTE: $key is user defined\n";
- }
- }
- }
-
- return ( $error_message, \%Opts, \%Getopt_flags, \%sections,
- \%abbreviations, \%Opts_default, \%abbreviations_default, );
-}
use IO::File;
use Getopt::Std;
use vars qw($opt_h);
-my $file;
-my $usage = <<EOM;
+
+main();
+
+sub main {
+ my $file;
+ my $usage = <<EOM;
usage: perlxmltok filename >outfile
EOM
-getopts('h') or die "$usage";
-if ($opt_h) {die $usage}
-if ( @ARGV == 1 ) {
- $file = $ARGV[0];
-}
-else { die $usage }
-my $source;
-my $fh;
-if ($file) {
- $fh = IO::File->new( $file, 'r' );
- unless ($fh) { die "cannot open '$file': $!\n" }
- $source = $fh;
-}
-else {
- $source = '-';
-}
-my $formatter = Perl::Tidy::XmlWriter->new($file);
-my $dest;
-
-# start perltidy, which will start calling our write_line()
-my $err = perltidy(
- 'formatter' => $formatter, # callback object
- 'source' => $source,
- 'destination' => \$dest, # not really needed
- 'argv' => "-npro -se", # dont need .perltidyrc
- # errors to STDOUT
-);
-if ($err) {
- die "Error calling perltidy\n";
+ getopts('h') or die "$usage";
+ if ($opt_h) { die $usage }
+ if ( @ARGV == 1 ) {
+ $file = $ARGV[0];
+ }
+ else { die $usage }
+ my $source;
+ my $fh;
+ if ($file) {
+ $fh = IO::File->new( $file, 'r' );
+ unless ($fh) { die "cannot open '$file': $!\n" }
+ $source = $fh;
+ }
+ else {
+ $source = '-';
+ }
+ my $formatter = Perl::Tidy::XmlWriter->new($file);
+ my $dest;
+
+ # start perltidy, which will start calling our write_line()
+ my $err = perltidy(
+ 'formatter' => $formatter, # callback object
+ 'source' => $source,
+ 'destination' => \$dest, # not really needed
+ 'argv' => "-npro -se", # dont need .perltidyrc
+ # errors to STDOUT
+ );
+ if ($err) {
+ die "Error calling perltidy\n";
+ }
+ $fh->close() if $fh;
+ return;
}
-$fh->close() if $fh;
#####################################################################
#
sub markup_tokens {
my $self = shift;
my ( $rtokens, $rtoken_type ) = @_;
- my ( @marked_tokens, $j, $string, $type, $token );
+ my ( @marked_tokens, $j, $type, $token );
for ( $j = 0 ; $j < @$rtoken_type ; $j++ ) {
$type = $$rtoken_type[$j];
#
# perltidy - a perl script indenter and formatter
#
-# Copyright (c) 2000-2022 by Steve Hancock
+# Copyright (c) 2000-2025 by Steve Hancock
# Distributed under the GPL license agreement; see file COPYING
#
# This program is free software; you can redistribute it and/or modify
#
# For brief instructions, try 'perltidy -h'.
# For more complete documentation, try 'man perltidy'
-# or visit http://perltidy.sourceforge.net
+# or visit the GitHub site https://perltidy.github.io/perltidy/
#
# This script is an example of the default style. It was formatted with:
#
package Perl::Tidy;
-# perlver reports minimum version needed is 5.8.0
+# perlver reports minimum version needed is 5.8.1
# 5.004 needed for IO::File
# 5.008 needed for wide characters
-use 5.008;
+# 5.008001 needed for utf8::is_utf8
+# 5.008001 needed for Scalar::Util::refaddr
+use 5.008001;
use warnings;
use strict;
use Exporter;
use English qw( -no_match_vars );
use Digest::MD5 qw(md5_hex);
use Perl::Tidy::Debugger;
-use Perl::Tidy::DevNull;
use Perl::Tidy::Diagnostics;
use Perl::Tidy::FileWriter;
use Perl::Tidy::Formatter;
use Perl::Tidy::IOScalar;
use Perl::Tidy::IOScalarArray;
use Perl::Tidy::IndentationItem;
-use Perl::Tidy::LineSink;
-use Perl::Tidy::LineSource;
use Perl::Tidy::Logger;
use Perl::Tidy::Tokenizer;
use Perl::Tidy::VerticalAligner;
# DEVEL_MODE can be turned on for extra checking during development
use constant DEVEL_MODE => 0;
+use constant DIAGNOSTICS => 0;
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
+use constant CONST_1024 => 1024; # bytes per kb; 2**10
use vars qw{
$VERSION
use IO::File;
use File::Basename;
use File::Copy;
-use File::Temp qw(tempfile);
+use File::Spec ();
+
+# perl stat function index names, based on
+# https://perldoc.perl.org/functions/stat
+use constant {
+
+ _mode_ => 2, # file mode (type and permissions)
+ _uid_ => 4, # numeric user ID of file's owner
+ _gid_ => 5, # numeric group ID of file's owner
+ _atime_ => 8, # last access time in seconds since the epoch
+ _mtime_ => 9, # last modify time in seconds since the epoch
+
+## _dev_ => 0, # device number of filesystem
+## _ino_ => 1, # inode number
+## _nlink_ => 3, # number of (hard) links to the file
+## _rdev_ => 6, # the device identifier (special files only)
+## _size_ => 7, # total size of file, in bytes
+## _ctime_ => 10, # inode change time in seconds since the epoch (*)
+## _blksize_ => 11, # preferred I/O size in bytes for interacting with
+## # the file (may vary from file to file)
+## _blocks_ => 12, # actual number of system-specific blocks allocated
+## # on disk (often, but not always, 512 bytes each)
+};
BEGIN {
# then the Release version must be bumped, and it is probably past time for
# a release anyway.
- $VERSION = '20230309';
+ $VERSION = '20250105';
} ## end BEGIN
sub DESTROY {
+ my $self = shift;
# required to avoid call to AUTOLOAD in some versions of perl
-}
+ return;
+} ## end sub DESTROY
sub AUTOLOAD {
our $AUTOLOAD;
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Unexpected call to Autoload looking for sub $AUTOLOAD
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
sub streamhandle {
- # given filename and mode (r or w), create an object which:
+ my ( $filename, $mode, ($is_encoded_data) ) = @_;
+
+ # Given:
+ # $filename
+ # $mode = 'r' or 'w' (only 'w' is used now, see note below)
+ # $is_encoded_data (optional flag)
+
+ # Create an object which:
# has a 'getline' method if mode='r', and
# has a 'print' method if mode='w'.
# The objects also need a 'close' method.
# (check for 'print' method for 'w' mode)
# (check for 'getline' method for 'r' mode)
- # An optional flag $is_encoded_data may be given, as follows:
-
- # Case 1. Any non-empty string: encoded data is being transferred, set
- # encoding to be utf8 for files and for stdin.
-
- # Case 2. Not given, or an empty string: unencoded binary data is being
- # transferred, set binary mode for files and for stdin.
+ # An optional flag '$is_encoded_data' may be given, as follows:
+ # - true: encoded data is being transferred,
+ # set encoding to be utf8 for files and for stdin.
+ # - false: unencoded binary data is being transferred,
+ # set binary mode for files and for stdin.
- my ( $filename, $mode, $is_encoded_data ) = @_;
+ # NOTE: mode 'r' works but is no longer used.
+ # Use sub stream_slurp instead for mode 'r', for efficiency.
+ $mode = lc($mode);
+ if ( $mode ne 'w' ) {
+ if ( DEVEL_MODE || $mode ne 'r' ) {
+ Fault("streamhandle called in unexpected mode '$mode'\n");
+ }
+ }
my $ref = ref($filename);
my $New;
my $fh;
+ #-------------------
# handle a reference
+ #-------------------
if ($ref) {
if ( $ref eq 'ARRAY' ) {
$New = sub { Perl::Tidy::IOScalarArray->new( $filename, $mode ) };
# operator. If this causes trouble, the check can be
# skipped and we can just let it crash if there is no
# getline.
- if ( $mode =~ /[rR]/ ) {
+ if ( $mode eq 'r' ) {
# RT#97159; part 1 of 2: updated to use 'can'
- ##if ( $ref eq 'IO::File' || defined &{ $ref . "::getline" } ) {
if ( $ref->can('getline') ) {
$New = sub { $filename };
}
# Accept an object with a print method for writing.
# See note above about IO::File
- if ( $mode =~ /[wW]/ ) {
+ if ( $mode eq 'w' ) {
# RT#97159; part 2 of 2: updated to use 'can'
- ##if ( $ref eq 'IO::File' || defined &{ $ref . "::print" } ) {
if ( $ref->can('print') ) {
$New = sub { $filename };
}
}
}
+ #----------------
# handle a string
+ #----------------
else {
if ( $filename eq '-' ) {
$New = sub { $mode eq 'w' ? *STDOUT : *STDIN }
$New = sub { IO::File->new( $filename, $mode ) };
}
}
- $fh = $New->( $filename, $mode );
- if ( !$fh ) {
- Warn("Couldn't open file:$filename in mode:$mode : $ERRNO\n");
+ #--------------
+ # Open the file
+ #--------------
+ $fh = $New->( $filename, $mode );
+ if ( !$fh ) {
+ Warn("Couldn't open file:'$filename' in mode:$mode : $OS_ERROR\n");
}
- else {
- # Case 1: handle encoded data
- if ($is_encoded_data) {
- if ( ref($fh) eq 'IO::File' ) {
- ## binmode object call not available in older perl versions
- ## $fh->binmode(":raw:encoding(UTF-8)");
- binmode $fh, ":raw:encoding(UTF-8)";
- }
- elsif ( $filename eq '-' ) {
- binmode STDOUT, ":raw:encoding(UTF-8)";
- }
- else {
- # shouldn't happen
- }
+ #------------
+ # Set binmode
+ #------------
+ else {
+ if ( ref($fh) eq 'IO::File' ) {
+ ## binmode object call not available in older perl versions
+ ## $fh->binmode(":raw:encoding(UTF-8)");
+ if ($is_encoded_data) { binmode $fh, ":raw:encoding(UTF-8)"; }
+ else { binmode $fh }
+ }
+ elsif ( $filename eq '-' ) {
+ if ($is_encoded_data) { binmode STDOUT, ":raw:encoding(UTF-8)"; }
+ else { binmode STDOUT }
}
-
- # Case 2: handle unencoded data
else {
- if ( ref($fh) eq 'IO::File' ) { binmode $fh }
- elsif ( $filename eq '-' ) { binmode STDOUT }
- else { } # shouldn't happen
+
+ # shouldn't get here
+ if (DEVEL_MODE) {
+ my $ref_fh = ref($fh);
+ Fault(<<EOM);
+unexpected streamhandle state for file='$filename' mode='$mode' ref(fh)=$ref_fh
+EOM
+ }
}
}
- return $fh, ( $ref or $filename );
+ return $fh;
} ## end sub streamhandle
-sub find_input_line_ending {
-
- # Peek at a file and return first line ending character.
- # Return undefined value in case of any trouble.
- my ($input_file) = @_;
- my $ending;
-
- # silently ignore input from object or stdin
- if ( ref($input_file) || $input_file eq '-' ) {
- return $ending;
- }
-
- my $fh;
- open( $fh, '<', $input_file ) || return $ending;
-
- binmode $fh;
- my $buf;
- read( $fh, $buf, 1024 );
- close $fh || return $ending;
- if ( $buf && $buf =~ /([\012\015]+)/ ) {
- my $test = $1;
-
- # dos
- if ( $test =~ /^(\015\012)+$/ ) { $ending = "\015\012" }
+sub stream_slurp {
- # mac
- elsif ( $test =~ /^\015+$/ ) { $ending = "\015" }
+ my ( $filename, ($timeout_in_seconds) ) = @_;
- # unix
- elsif ( $test =~ /^\012+$/ ) { $ending = "\012" }
-
- # unknown
- else { }
- }
+ # Given:
+ # $filename
+ # $timeout_in_seconds (optional timeout, in seconds)
- # no ending seen
- else { }
+ # Read the text in $filename and
+ # return:
+ # undef if read error, or
+ # $rinput_string = ref to string of text
- return $ending;
-} ## end sub find_input_line_ending
+ # if $filename is: Read
+ # ---------------- -----------------
+ # ARRAY ref array ref
+ # SCALAR ref string ref
+ # object ref object with 'getline' method (exit if no 'getline')
+ # '-' STDIN
+ # string file named $filename
-{ ## begin closure for sub catfile
+ # Note that any decoding from utf8 must be done by the caller
- my $missing_file_spec;
+ my $ref = ref($filename);
+ my $rinput_string;
- BEGIN {
- $missing_file_spec = !eval { require File::Spec; 1 };
+ # handle a reference
+ if ($ref) {
+ if ( $ref eq 'ARRAY' ) {
+ my $buf = join EMPTY_STRING, @{$filename};
+ $rinput_string = \$buf;
+ }
+ elsif ( $ref eq 'SCALAR' ) {
+ $rinput_string = $filename;
+ }
+ else {
+ if ( $ref->can('getline') ) {
+ my $buf = EMPTY_STRING;
+ while ( defined( my $line = $filename->getline() ) ) {
+ $buf .= $line;
+ }
+ $rinput_string = \$buf;
+ }
+ else {
+ confess <<EOM;
+------------------------------------------------------------------------
+No 'getline' method is defined for object of class '$ref'
+Please check your call to Perl::Tidy::perltidy. Trace follows.
+------------------------------------------------------------------------
+EOM
+ }
+ }
}
- sub catfile {
-
- # concatenate a path and file basename
- # returns undef in case of error
-
- my @parts = @_;
-
- # use File::Spec if we can
- unless ($missing_file_spec) {
- return File::Spec->catfile(@parts);
+ # handle a string
+ else {
+ if ( $filename eq '-' ) {
+ local $INPUT_RECORD_SEPARATOR = undef;
+ my $buf;
+ if ( $timeout_in_seconds && $timeout_in_seconds > 0 ) {
+ eval {
+ local $SIG{ALRM} = sub { die "alarm\n" };
+ alarm($timeout_in_seconds);
+ $buf = <>;
+ alarm(0);
+ 1;
+ }
+ or Die(
+"Timeout reading stdin using -tos=$timeout_in_seconds seconds. Use -tos=0 to skip timeout check.\n"
+ );
+ }
+ else {
+ $buf = <>;
+ }
+ $rinput_string = \$buf;
}
+ else {
+ if ( open( my $fh, '<', $filename ) ) {
+ local $INPUT_RECORD_SEPARATOR = undef;
+ my $buf = <$fh>;
+ $fh->close() or Warn("Cannot close $filename\n");
+ $rinput_string = \$buf;
+ }
+ else {
+ Warn("Cannot open $filename: $OS_ERROR\n");
+ return;
+ }
+ }
+ }
- # Perl 5.004 systems may not have File::Spec so we'll make
- # a simple try. We assume File::Basename is available.
- # return if not successful.
- my $name = pop @parts;
- my $path = join '/', @parts;
- my $test_file = $path . $name;
- my ( $test_name, $test_path ) = fileparse($test_file);
- return $test_file if ( $test_name eq $name );
- return if ( $OSNAME eq 'VMS' );
-
- # this should work at least for Windows and Unix:
- $test_file = $path . '/' . $name;
- ( $test_name, $test_path ) = fileparse($test_file);
- return $test_file if ( $test_name eq $name );
- return;
- } ## end sub catfile
-} ## end closure for sub catfile
+ return $rinput_string;
+} ## end sub stream_slurp
# Here is a map of the flow of data from the input source to the output
# line sink:
#
-# LineSource-->Tokenizer-->Formatter-->VerticalAligner-->FileWriter-->
+# -->Tokenizer-->Formatter-->VerticalAligner-->FileWriter-->
# input groups output
# lines tokens lines of lines lines
# lines
#
# The overall process is controlled by the "main" package.
#
-# LineSource is the stream of input lines
-#
# Tokenizer analyzes a line and breaks it into tokens, peeking ahead
# if necessary. A token is any section of the input line which should be
# manipulated as a single entity during formatting. For example, a single
# This function isolates the call to Perl's internal function
# utf8::is_utf8() which is true for strings represented in an 'upgraded
- # form'. It is available after Perl version 5.8.
+ # form'. It is available AFTER Perl version 5.8.
# See https://perldoc.perl.org/Encode.
# See also comments in Carp.pm and other modules using this function
my $md5_hex = sub {
my ($buf) = @_;
- # Evaluate the MD5 sum for a string
+ # Evaluate the MD5 sum for a string:
+ # Given:
+ # $buf = a string
+ # Return:
+ # $digest = its MD5 sum
+
# Patch for [rt.cpan.org #88020]
# Use utf8::encode since md5_hex() only operates on bytes.
# my $digest = md5_hex( utf8::encode($sink_buffer) );
my $octets = Encode::encode( "utf8", $buf );
my $digest = md5_hex($octets);
return $digest;
-};
+}; ## end $md5_hex = sub
+
+sub get_iteration_count {
+ return $rstatus->{iteration_count};
+}
BEGIN {
_length_function_ => $i++,
_line_separator_default_ => $i++,
_line_separator_ => $i++,
+ _line_tidy_begin_ => $i++,
+ _line_tidy_end_ => $i++,
_logger_object_ => $i++,
_output_file_ => $i++,
_postfilter_ => $i++,
_prefilter_ => $i++,
_rOpts_ => $i++,
_saw_pbp_ => $i++,
- _tabsize_ => $i++,
_teefile_stream_ => $i++,
_user_formatter_ => $i++,
_input_copied_verbatim_ => $i++,
_input_output_difference_ => $i++,
+ _dump_to_stdout_ => $i++,
};
} ## end BEGIN
my %input_hash = @_;
+ # This is the main perltidy routine
+
my %defaults = (
argv => undef,
destination => undef,
$Warn_count = 0;
# don't overwrite callers ARGV
+ # Localization of @ARGV could be avoided by calling GetOptionsFromArray
+ # instead of GetOptions, but that is not available before perl 5.10
local @ARGV = @ARGV;
local *STDERR = *STDERR;
my ($key) = @_;
my $hash_ref = $input_hash{$key};
if ( defined($hash_ref) ) {
- unless ( ref($hash_ref) eq 'HASH' ) {
+ if ( ref($hash_ref) ne 'HASH' ) {
my $what = ref($hash_ref);
my $but_is =
$what ? "but is ref to $what" : "but is not a reference";
}
}
return $hash_ref;
- };
+ }; ## end $get_hash_ref = sub
%input_hash = ( %defaults, %input_hash );
my $argv = $input_hash{'argv'};
my $destination_stream = $input_hash{'destination'};
- my $errorfile_stream = $input_hash{'errorfile'};
- my $logfile_stream = $input_hash{'logfile'};
- my $teefile_stream = $input_hash{'teefile'};
- my $debugfile_stream = $input_hash{'debugfile'};
my $perltidyrc_stream = $input_hash{'perltidyrc'};
my $source_stream = $input_hash{'source'};
my $stderr_stream = $input_hash{'stderr'};
my $postfilter = $input_hash{'postfilter'};
if ($stderr_stream) {
- ( $fh_stderr, my $stderr_file ) =
- Perl::Tidy::streamhandle( $stderr_stream, 'w' );
+ $fh_stderr = Perl::Tidy::streamhandle( $stderr_stream, 'w' );
if ( !$fh_stderr ) {
croak <<EOM;
------------------------------------------------------------------------
my $flag = shift;
if ($flag) { goto ERROR_EXIT }
else { goto NORMAL_EXIT }
- croak "unexpectd return to Exit";
+ croak "unexpected return to sub Exit";
} ## end sub Exit
sub Die {
my $msg = shift;
Warn($msg);
Exit(1);
- croak "unexpected return to Die";
+ croak "unexpected return from sub Exit";
} ## end sub Die
sub Fault {
# except if there has been a bug introduced by a recent program change.
# Please add comments at calls to Fault to explain why the call
# should not occur, and where to look to fix it.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $pkg = __PACKAGE__;
my $input_stream_name = $rstatus->{'input_name'};
==============================================================================
EOM
- # This return is to keep Perl-Critic from complaining.
- return;
+ croak "unexpected return from sub Die";
} ## end sub Fault
# extract various dump parameters
# validate dump_options_type
if ( defined($dump_options) ) {
- unless ( defined($dump_options_type) ) {
+ if ( !defined($dump_options_type) ) {
$dump_options_type = 'perltidyrc';
}
if ( $dump_options_type ne 'perltidyrc'
croak <<EOM;
------------------------------------------------------------------------
Please check value of -dump_options_type in call to perltidy;
-saw: '$dump_options_type'
+saw: '$dump_options_type'
expecting: 'perltidyrc' or 'full'
------------------------------------------------------------------------
EOM
# if the user defines a formatter, there is no output stream,
# but we need a null stream to keep coding simple
- $destination_stream = Perl::Tidy::DevNull->new();
+ $destination_stream = \my $tmp;
}
# see if ARGV is overridden
if ( defined($argv) ) {
- my $rargv = ref $argv;
+ my $rargv = ref($argv);
if ( $rargv eq 'SCALAR' ) { $argv = ${$argv}; $rargv = undef }
# ref to ARRAY
my ( $rargv_str, $msg ) = parse_args($argv);
if ($msg) {
Die(<<EOM);
-Error parsing this string passed to to perltidy with 'argv':
+Error parsing this string passed to to perltidy with 'argv':
$msg
EOM
}
}
$self->[_file_extension_separator_] = $dot;
+ # save a copy of the last two input args for error checking later
+ my @ARGV_saved;
+ if ( @ARGV > 1 ) {
+ @ARGV_saved = ( $ARGV[-2], $ARGV[-1] );
+ }
+
+ # see if -wvt was entered on the command line before @ARGV is changed
+ my $wvt_in_args = grep { /-(wvt|warn-variable-types)=/ } @ARGV;
+
#-------------------------
# get command line options
#-------------------------
my ( $rOpts, $config_file, $rraw_options, $roption_string,
- $rexpansion, $roption_category, $roption_range )
+ $rexpansion, $roption_category, $roption_range, $rinteger_option_range )
= process_command_line(
$perltidyrc_stream, $is_Windows, $Windows_type,
$rpending_complaint, $dump_options_type,
# dump from command line
if ( $rOpts->{'dump-options'} ) {
- print STDOUT $readable_options;
+ print {*STDOUT} $readable_options;
Exit(0);
}
- # --dump-block-summary requires one filename in the arg list.
- # This is a safety precaution in case a user accidentally adds -dbs to the
- # command line parameters and is expecting formatted output to stdout.
- # Another precaution, added elsewhere, is to ignore -dbs in a .perltidyrc
- my $numf = @Arg_files;
- if ( $rOpts->{'dump-block-summary'} && $numf != 1 ) {
- Die(<<EOM);
---dump-block-summary expects 1 filename in the arg list but saw $numf filenames
+ # some dump options require one filename in the arg list. This is a safety
+ # precaution in case a user accidentally adds such an option to the command
+ # line parameters and is expecting formatted output to stdout. Another
+ # precaution, added elsewhere, is to ignore these in a .perltidyrc
+ my $num_files = @Arg_files;
+ foreach my $opt_name (
+ qw(
+ dump-block-summary
+ dump-unusual-variables
+ dump-mixed-call-parens
+ dump-mismatched-args
+ dump-mismatched-returns
+ dump-unique-keys
+ )
+ )
+ {
+ if ( $rOpts->{$opt_name} ) {
+ $self->[_dump_to_stdout_] = 1;
+ if ( $num_files != 1 ) {
+ Die(<<EOM);
+--$opt_name expects 1 filename in the arg list but saw $num_files filenames
EOM
+ }
+ }
}
#----------------------------------------
# check parameters and their interactions
#----------------------------------------
- $self->check_options( $is_Windows, $Windows_type, $rpending_complaint );
+ $self->check_options( $num_files, $rinteger_option_range );
if ($user_formatter) {
$rOpts->{'format'} = 'user';
$rOpts->{'encode-output-strings'} ? 'eos' : 'neos';
# be sure we have a valid output format
- unless ( exists $default_file_extension{ $rOpts->{'format'} } ) {
+ if ( !exists $default_file_extension{ $rOpts->{'format'} } ) {
my $formats = join SPACE,
sort map { "'" . $_ . "'" } keys %default_file_extension;
my $fmt = $rOpts->{'format'};
my ( $in_place_modify, $backup_extension, $delete_backup ) =
$self->check_in_place_modify( $source_stream, $destination_stream );
- Perl::Tidy::Formatter::check_options($rOpts);
+ my $line_range_clipped = $rOpts->{'line-range-tidy'}
+ && ( $self->[_line_tidy_begin_] > 1
+ || defined( $self->[_line_tidy_end_] ) );
+
+ Perl::Tidy::Formatter::check_options( $rOpts, $wvt_in_args, $num_files,
+ $line_range_clipped );
Perl::Tidy::Tokenizer::check_options($rOpts);
Perl::Tidy::VerticalAligner::check_options($rOpts);
if ( $rOpts->{'format'} eq 'html' ) {
Perl::Tidy::HtmlWriter->check_options($rOpts);
}
+ # Try to catch an unusual missing string parameter error, like this:
+ # perltidy -wvt perltidy.pl
+ # The problem is that -wvt wants a string, so it grabs 'perltidy.pl'.
+ # Then there is no output filename, so input is assumed to be stdin.
+ # This make perltidy unexpectedly wait for input. To the user, it
+ # appears that perltidy has gone into an infinite loop. Issue c312.
+ # To avoid getting this far, it is best for parameters which take a
+ # string to check the strings in one of the 'check_options' subs, and
+ # exit if there is an obvious error. This has been done for -wvt,
+ # but are undoubtedly other parameters where this problem might occur.
+ if ( !$num_files && @ARGV_saved > 1 ) {
+ my $opt_test = $ARGV_saved[-2];
+ my $file_test = $ARGV_saved[-1];
+ if ( $opt_test =~ s/^[-]+//
+ && $file_test !~ /^[-]/
+ && $file_test !~ /^\d+$/
+ && -e $file_test )
+ {
+
+ # These options can take filenames, so we will ignore them here
+ my %is_option_with_file_parameter;
+ my @qf = qw( outfile profile );
+ @is_option_with_file_parameter{@qf} = (1) x scalar(@qf);
+
+ # Expand an abbreviation into a long name
+ my $long_name;
+ my $exp = $rexpansion->{$opt_test};
+ if ( !$exp ) { $long_name = $opt_test }
+ elsif ( @{$exp} == 1 ) { $long_name = $exp->[0] }
+ else { }
+
+ # If this arg grabbed the file, then it must take a string arg
+ if ( $long_name
+ && defined( $rOpts->{$long_name} )
+ && $rOpts->{$long_name} eq $file_test
+ && !$is_option_with_file_parameter{$long_name} )
+ {
+ Die(<<EOM);
+Stopping on possible missing string parameter for '-$opt_test':
+This parameter takes a string and has been set equal to file '$file_test',
+and formatted output will go to standard output. If this is actually correct,
+you can skip this message by entering this as '-$opt_test=$file_test'.
+EOM
+ }
+ }
+ }
+
# make the pattern of file extensions that we shouldn't touch
my $forbidden_file_extensions = "(($dot_pattern)(LOG|DEBUG|ERR|TEE)";
if ($output_extension) {
# Create a diagnostics object if requested;
# This is only useful for code development
my $diagnostics_object = undef;
- if ( $rOpts->{'DIAGNOSTICS'} ) {
+ if (DIAGNOSTICS) {
$diagnostics_object = Perl::Tidy::Diagnostics->new();
}
}
# use stdin by default if no source array and no args
+ elsif ( !@Arg_files ) {
+ unshift( @Arg_files, '-' );
+ }
+
+ # check file existence and expand any globs
else {
- unshift( @Arg_files, '-' ) unless @Arg_files;
+ my @updated_files;
+ foreach my $input_file (@Arg_files) {
+ if ( -e $input_file ) {
+ push @updated_files, $input_file;
+ }
+ else {
+
+ # file doesn't exist - check for a file glob
+ if ( $input_file =~ /([\?\*\[\{])/ ) {
+
+ # Windows shell may not remove quotes, so do it
+ my $ifile = $input_file;
+ if ( $ifile =~ /^\'(.+)\'$/ ) { $ifile = $1 }
+ if ( $ifile =~ /^\"(.+)\"$/ ) { $ifile = $1 }
+ my $pattern = fileglob_to_re($ifile);
+ my $dh;
+ if ( opendir( $dh, './' ) ) {
+ my @files =
+ grep { /$pattern/ && !-d } readdir($dh);
+ closedir($dh);
+ next unless (@files);
+ push @updated_files, @files;
+ next;
+ }
+ }
+ Warn("skipping file: '$input_file': no matches found\n");
+ next;
+ }
+ } ## end loop over input filenames
+
+ @Arg_files = @updated_files;
+ if ( !@Arg_files ) {
+ Die("no matching input files found\n");
+ }
}
# Flag for loading module Unicode::GCString for evaluating text width:
}
my $logfile_header = make_logfile_header( $rOpts, $config_file,
- $rraw_options, $Windows_type, $readable_options, );
+ $rraw_options, $Windows_type, $readable_options );
# Store some values needed by lower level routines
$self->[_diagnostics_object_] = $diagnostics_object;
# loop to process all files
#--------------------------
$self->process_all_files(
-
- \%input_hash,
- \@Arg_files,
-
- # filename stuff...
- $output_extension,
- $forbidden_file_extensions,
- $in_place_modify,
- $backup_extension,
- $delete_backup,
-
- # logfile stuff...
- $logfile_header,
- $rpending_complaint,
- $rpending_logfile_message,
-
+ {
+ rinput_hash => \%input_hash,
+ rfiles => \@Arg_files,
+
+ # filename stuff...
+ source_stream => $source_stream,
+ output_extension => $output_extension,
+ forbidden_file_extensions => $forbidden_file_extensions,
+ in_place_modify => $in_place_modify,
+ backup_extension => $backup_extension,
+ delete_backup => $delete_backup,
+
+ # logfile stuff...
+ logfile_header => $logfile_header,
+ rpending_complaint => $rpending_complaint,
+ rpending_logfile_message => $rpending_logfile_message,
+ }
);
#-----
# Make a file extension, adding any leading '.' if necessary.
# (the '.' may actually be an '_' under VMS).
- my ( $self, $extension, $default ) = @_;
+ my ( $self, $extension, ($default) ) = @_;
- # '$extension' is the first choice (usually a user entry)
- # '$default' is a backup extension
+ # Given:
+ # $extension = the first choice (usually a user entry)
+ # $default = an optional backup extension
+ # Return:
+ # $extension = the actual file extension
$extension = EMPTY_STRING unless defined($extension);
$extension =~ s/^\s+//;
$extension =~ s/\s+$//;
# Use default extension if nothing remains of the first choice
- #
if ( length($extension) == 0 ) {
$extension = $default;
$extension = EMPTY_STRING unless defined($extension);
my ( $self, $source_stream, $destination_stream ) = @_;
- # get parameters associated with the -b option
+ # See if --backup-and-modify-in-place (-b) is set, and if so,
+ # return its associated parameters
my $rOpts = $self->[_rOpts_];
# check for -b option;
if ($in_place_modify) {
if ( $rOpts->{'standard-output'}
|| $destination_stream
- || ref $source_stream
+ || ref($source_stream)
|| $rOpts->{'outfile'}
|| defined( $rOpts->{'output-path'} ) )
{
sub backup_method_copy {
- my ( $self, $input_file, $output_file, $backup_extension, $delete_backup )
+ my ( $self, $input_file, $routput_string, $backup_extension,
+ $delete_backup )
= @_;
# Handle the -b (--backup-and-modify-in-place) option with -bm='copy':
# - First copy $input file to $backup_name.
- # - Then open input file and rewrite with contents of $output_file
+ # - Then open input file and rewrite with contents of $routput_string
# - Then delete the backup if requested
# NOTES:
# - Die immediately on any error.
- # - $output_file is actually an ARRAY ref
+ # - $routput_string is a SCALAR ref
my $backup_file = $input_file . $backup_extension;
- unless ( -f $input_file ) {
+ if ( !-f $input_file ) {
# no real file to backup ..
# This shouldn't happen because of numerous preliminary checks
if ( -f $backup_file ) {
unlink($backup_file)
or Die(
-"unable to remove previous '$backup_file' for -b option; check permissions: $ERRNO\n"
+"unable to remove previous '$backup_file' for -b option; check permissions: $OS_ERROR\n"
);
}
# Copy input file to backup
File::Copy::copy( $input_file, $backup_file )
- or Die("File::Copy failed trying to backup source: $ERRNO");
+ or Die("File::Copy failed trying to backup source: $OS_ERROR");
# set permissions of the backup file to match the input file
my @input_file_stat = stat($input_file);
$in_place_modify );
# set the modification time of the copy to the original value (rt#145999)
- my ( $read_time, $write_time ) = @input_file_stat[ 8, 9 ];
+ my ( $read_time, $write_time ) = @input_file_stat[ _atime_, _mtime_ ];
if ( defined($write_time) ) {
utime( $read_time, $write_time, $backup_file )
|| Warn("error setting times for backup file '$backup_file'\n");
# Open the original input file for writing ... opening with ">" will
# truncate the existing data.
open( my $fout, ">", $input_file )
- || Die(
-"problem re-opening $input_file for write for -b option; check file and directory permissions: $ERRNO\n"
+ or Die(
+"problem re-opening $input_file for write for -b option; check file and directory permissions: $OS_ERROR\n"
);
- if ( $self->[_is_encoded_data_] ) {
- binmode $fout, ":raw:encoding(UTF-8)";
- }
+ if ( $self->[_is_encoded_data_] ) { binmode $fout, ":raw:encoding(UTF-8)" }
+ else { binmode $fout }
# Now copy the formatted output to it..
-
- # if formatted output is in an ARRAY ref (normally this is true)...
- if ( ref($output_file) eq 'ARRAY' ) {
- foreach my $line ( @{$output_file} ) {
- $fout->print($line)
- or
- Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
- }
- }
-
- # or in a SCALAR ref (less efficient, and only used for testing)
- elsif ( ref($output_file) eq 'SCALAR' ) {
- foreach my $line ( split /^/, ${$output_file} ) {
- $fout->print($line)
- or
- Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
- }
+ # output must be SCALAR ref..
+ if ( ref($routput_string) eq 'SCALAR' ) {
+ $fout->print( ${$routput_string} )
+ or Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
}
# Error if anything else ...
- # This can only happen if the output was changed from \@tmp_buff
else {
- my $ref = ref($output_file);
+ my $ref = ref($routput_string);
Die(<<EOM);
Programming error: unable to print to '$input_file' with -b option:
unexpected ref type '$ref'; expecting 'ARRAY' or 'SCALAR'
else {
unlink($backup_file)
or Die(
-"unable to remove backup file '$backup_file' for -b option; check permissions: $ERRNO\n"
+"unable to remove backup file '$backup_file' for -b option; check permissions: $OS_ERROR\n"
);
}
}
sub backup_method_move {
- my ( $self, $input_file, $output_file, $backup_extension, $delete_backup )
+ my ( $self, $input_file, $routput_string, $backup_extension,
+ $delete_backup )
= @_;
# Handle the -b (--backup-and-modify-in-place) option with -bm='move':
# - First move $input file to $backup_name.
- # - Then copy $output_file to $input_file.
+ # - Then copy $routput_string to $input_file.
# - Then delete the backup if requested
# NOTES:
# - Die immediately on any error.
- # - $output_file is actually an ARRAY ref
+ # - $routput_string is a SCALAR ref
# - $input_file permissions will be set by sub set_output_file_permissions
my $backup_name = $input_file . $backup_extension;
- unless ( -f $input_file ) {
+ if ( !-f $input_file ) {
# oh, oh, no real file to backup ..
# shouldn't happen because of numerous preliminary checks
if ( -f $backup_name ) {
unlink($backup_name)
or Die(
-"unable to remove previous '$backup_name' for -b option; check permissions: $ERRNO\n"
+"unable to remove previous '$backup_name' for -b option; check permissions: $OS_ERROR\n"
);
}
# we use copy for symlinks, move for regular files
if ( -l $input_file ) {
File::Copy::copy( $input_file, $backup_name )
- or Die("File::Copy failed trying to backup source: $ERRNO");
+ or Die("File::Copy failed trying to backup source: $OS_ERROR");
}
else {
rename( $input_file, $backup_name )
or Die(
-"problem renaming $input_file to $backup_name for -b option: $ERRNO\n"
+"problem renaming $input_file to $backup_name for -b option: $OS_ERROR\n"
);
}
# Open a file with the original input file name for writing ...
- my $is_encoded_data = $self->[_is_encoded_data_];
- my ( $fout, $iname ) =
- Perl::Tidy::streamhandle( $input_file, 'w', $is_encoded_data );
- if ( !$fout ) {
- Die(
-"problem re-opening $input_file for write for -b option; check file and directory permissions: $ERRNO\n"
- );
- }
-
- # Now copy the formatted output to it..
+ open( my $fout, ">", $input_file )
+ or Die(
+"problem re-opening $input_file for write for -b option; check file and directory permissions: $OS_ERROR\n"
+ );
- # if formatted output is in an ARRAY ref ...
- if ( ref($output_file) eq 'ARRAY' ) {
- foreach my $line ( @{$output_file} ) {
- $fout->print($line)
- or
- Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
- }
- }
+ if ( $self->[_is_encoded_data_] ) { binmode $fout, ":raw:encoding(UTF-8)" }
+ else { binmode $fout }
- # or in a SCALAR ref (less efficient, for testing only)
- elsif ( ref($output_file) eq 'SCALAR' ) {
- foreach my $line ( split /^/, ${$output_file} ) {
- $fout->print($line)
- or
- Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
- }
+ # Now copy the formatted output to it..
+ # output must be SCALAR ref..
+ if ( ref($routput_string) eq 'SCALAR' ) {
+ $fout->print( ${$routput_string} )
+ or Die("cannot print to '$input_file' with -b option: $OS_ERROR\n");
}
# Error if anything else ...
- # This can only happen if the output was changed from \@tmp_buff
else {
- my $ref = ref($output_file);
+ my $ref = ref($routput_string);
Die(<<EOM);
Programming error: unable to print to '$input_file' with -b option:
unexpected ref type '$ref'; expecting 'ARRAY' or 'SCALAR'
$in_place_modify );
# Keep original modification time if no change (rt#145999)
- my ( $read_time, $write_time ) = @input_file_stat[ 8, 9 ];
+ my ( $read_time, $write_time ) = @input_file_stat[ _atime_, _mtime_ ];
if ( !$self->[_input_output_difference_] && defined($write_time) ) {
utime( $read_time, $write_time, $input_file )
|| Warn("error setting times for '$input_file'\n");
else {
unlink($backup_name)
or Die(
-"unable to remove previous '$backup_name' for -b option; check permissions: $ERRNO\n"
+"unable to remove previous '$backup_name' for -b option; check permissions: $OS_ERROR\n"
);
}
}
} ## end sub backup_method_move
+# masks for file permissions
+use constant OCT_777 => oct(777); # All users (O+G+W) + r/w/x bits
+use constant OCT_7777 => oct(7777); # Same + suid/sgid/sbit
+use constant OCT_600 => oct(600); # Owner RW permission
+
sub set_output_file_permissions {
my ( $self, $output_file, $rinput_file_stat, $in_place_modify ) = @_;
+ # Set the permissions for the output file
+
# Given:
# $output_file = the file whose permissions we will set
# $rinput_file_stat = the result of stat($input_file)
# $in_place_modify = true if --backup-and-modify-in-place is set
- my ( $mode_i, $uid_i, $gid_i ) = @{$rinput_file_stat}[ 2, 4, 5 ];
- my ( $uid_o, $gid_o ) = ( stat($output_file) )[ 4, 5 ];
- my $input_file_permissions = $mode_i & oct(7777);
+ my ( $mode_i, $uid_i, $gid_i ) =
+ @{$rinput_file_stat}[ _mode_, _uid_, _gid_ ];
+ my ( $uid_o, $gid_o ) = ( stat($output_file) )[ _uid_, _gid_ ];
+ my $input_file_permissions = $mode_i & OCT_7777;
my $output_file_permissions = $input_file_permissions;
#rt128477: avoid inconsistent owner/group and suid/sgid
else {
# owner or group differ: do not copy suid and sgid
- $output_file_permissions = $mode_i & oct(777);
+ $output_file_permissions = $mode_i & OCT_777;
if ( $input_file_permissions != $output_file_permissions ) {
Warn(
"Unable to copy setuid and/or setgid bits for output file '$output_file'\n"
# assumption that a previous backup can be unlinked even if
# not writable.
if ( !$in_place_modify ) {
- $output_file_permissions |= oct(600);
+ $output_file_permissions |= OCT_600;
}
if ( !chmod( $output_file_permissions, $output_file ) ) {
} ## end sub set_output_file_permissions
sub get_decoded_string_buffer {
- my ( $self, $input_file, $display_name, $rpending_logfile_message ) = @_;
- # Decode the input buffer if necessary or requested
+ my ( $self, $input_file, $display_name ) = @_;
- # Given
+ # Decode the input buffer from utf8 if necessary or requested
+
+ # Given:
# $input_file = the input file or stream
# $display_name = its name to use in error messages
- # Return
- # $buf = string buffer with input, decoded from utf8 if necessary
+ # Set $self->[_line_separator_], and
+
+ # Return:
+ # $rinput_string = ref to input string, decoded from utf8 if necessary
# $is_encoded_data = true if $buf is decoded from utf8
# $decoded_input_as = true if perltidy decoded input buf
# $encoding_log_message = messages for log file,
my $rOpts = $self->[_rOpts_];
- my $source_object = Perl::Tidy::LineSource->new(
- input_file => $input_file,
- rOpts => $rOpts,
- );
-
- # return nothing if error
- return unless ($source_object);
-
- my $buf = EMPTY_STRING;
- while ( my $line = $source_object->get_line() ) {
- $buf .= $line;
+ my $rinput_string =
+ stream_slurp( $input_file, $rOpts->{'timeout-in-seconds'} );
+ return unless ( defined($rinput_string) );
+
+ # Note that we could have a zero size input string here if it
+ # arrived from standard input or from a string ref. For example
+ # 'perltidy <null.pl'. If we issue a warning and stop, as we would
+ # for a zero length file ('perltidy null.pl'), then we could cause
+ # a call to the perltidy module to misbehave as a filter. So we will
+ # process this as any other file in this case without any warning (c286).
+ if ( !length( ${$rinput_string} ) ) {
+
+ # zero length, but keep going
+ }
+
+ # Check size of strings arriving from the standard input. These
+ # could not be checked until now.
+ if ( $input_file eq '-' ) {
+ my $size_in_mb =
+ length( ${$rinput_string} ) / ( CONST_1024 * CONST_1024 );
+ my $maximum_file_size_mb = $rOpts->{'maximum-file-size-mb'};
+ if ( $size_in_mb > $maximum_file_size_mb ) {
+ $size_in_mb = sprintf( "%0.1f", $size_in_mb );
+ Warn(
+"skipping file: <stdin>: size $size_in_mb MB exceeds limit $maximum_file_size_mb; use -maxfs=i to change\n"
+ );
+ return;
+ }
}
+ $rinput_string = $self->set_line_separator($rinput_string);
+
my $encoding_in = EMPTY_STRING;
my $rOpts_character_encoding = $rOpts->{'character-encoding'};
my $encoding_log_message;
# could also happen if the user has done some unusual manipulations of
# the source. In any case, we will not attempt to decode it because
# that could result in an output string in a different mode.
- if ( is_char_mode($buf) ) {
+ if ( is_char_mode( ${$rinput_string} ) ) {
$encoding_in = "utf8";
$rstatus->{'char_mode_source'} = 1;
}
# In testing I have found that including additional guess 'suspect'
# encodings sometimes works but can sometimes lead to disaster by
# using an incorrect decoding.
- my $buf_in = $buf;
- my $decoder = guess_encoding( $buf_in, 'utf8' );
- if ( ref($decoder) ) {
+ my $decoder;
+ if ( ${$rinput_string} =~ /[^[:ascii:]]/ ) {
+ $decoder = guess_encoding( ${$rinput_string}, 'utf8' );
+ }
+ if ( $decoder && ref($decoder) ) {
$encoding_in = $decoder->name;
if ( $encoding_in ne 'UTF-8' && $encoding_in ne 'utf8' ) {
$encoding_in = EMPTY_STRING;
- $buf = $buf_in;
$encoding_log_message .= <<EOM;
Guessed encoding '$encoding_in' is not utf8; no encoding will be used
EOM
}
else {
- if ( !eval { $buf = $decoder->decode($buf_in); 1 } ) {
+ my $buf;
+ if ( !eval { $buf = $decoder->decode( ${$rinput_string} ); 1 } )
+ {
$encoding_log_message .= <<EOM;
Guessed encoding '$encoding_in' but decoding was unsuccessful; no encoding is used
"file: $display_name: bad guess to decode source as $encoding_in\n"
);
$encoding_in = EMPTY_STRING;
- $buf = $buf_in;
}
else {
$encoding_log_message .= <<EOM;
Guessed encoding '$encoding_in' successfully decoded
EOM
$decoded_input_as = $encoding_in;
+ $rinput_string = \$buf;
}
}
}
# Case 4. Decode with a specific encoding
else {
$encoding_in = $rOpts_character_encoding;
+ my $buf;
if (
!eval {
- $buf = Encode::decode( $encoding_in, $buf,
+ $buf = Encode::decode( $encoding_in, ${$rinput_string},
Encode::FB_CROAK | Encode::LEAVE_SRC );
1;
}
Specified encoding '$encoding_in' successfully decoded
EOM
$decoded_input_as = $encoding_in;
+ $rinput_string = \$buf;
}
}
# Delete any Byte Order Mark (BOM), which can cause trouble
if ($is_encoded_data) {
- $buf =~ s/^\x{FEFF}//;
+ ${$rinput_string} =~ s/^\x{FEFF}//;
}
$rstatus->{'input_name'} = $display_name;
# Define the function to determine the display width of character
# strings
- my $length_function = sub { return length( $_[0] ) };
+ my $length_function;
if ($is_encoded_data) {
# Try to load Unicode::GCString for defining text display width, if
}
}
return (
- $buf,
+ $rinput_string,
$is_encoded_data,
$decoded_input_as,
$encoding_log_message,
);
} ## end sub get_decoded_string_buffer
-sub process_all_files {
+{ #<<<
- my (
+my $LF;
+my $CR;
+my $CRLF;
- $self,
- $rinput_hash,
- $rfiles,
+BEGIN {
+ $LF = chr(10);
+ $CR = chr(13);
+ $CRLF = $CR . $LF;
+}
- $output_extension,
- $forbidden_file_extensions,
- $in_place_modify,
- $backup_extension,
- $delete_backup,
+sub get_line_separator_default {
- $logfile_header,
- $rpending_complaint,
- $rpending_logfile_message,
+ my ($rOpts) = @_;
- ) = @_;
+ # Get the line separator that will apply unless overridden by a
+ # --preserve-line-endings flag for a specific file
+
+ my $line_separator_default = "\n";
+
+ my $ole = $rOpts->{'output-line-ending'};
+ if ($ole) {
+ my %endings = (
+ dos => $CRLF,
+ win => $CRLF,
+ mac => $CR,
+ unix => $LF,
+ );
+
+ $line_separator_default = $endings{ lc $ole };
+
+ if ( !$line_separator_default ) {
+ my $str = join SPACE, keys %endings;
+ Die(<<EOM);
+Unrecognized line ending '$ole'; expecting one of: $str
+EOM
+ }
+
+ # Check for conflict with -ple
+ if ( $rOpts->{'preserve-line-endings'} ) {
+ Warn("Ignoring -ple; conflicts with -ole\n");
+ $rOpts->{'preserve-line-endings'} = undef;
+ }
+ }
+
+ return $line_separator_default;
+
+} ## end sub get_line_separator_default
+
+sub set_line_separator {
+
+ my ( $self, $rinput_string ) = @_;
+
+ # Set the (output) line separator as requested or necessary
+
+ my $rOpts = $self->[_rOpts_];
+
+ # Start with the default (output) line separator
+ my $line_separator = $self->[_line_separator_default_];
+
+ # First try to find the line separator of the input stream
+ my $input_line_separator;
+
+ # Limit the search to a reasonable number of characters, in case we
+ # have a weird file
+ my $str = substr( ${$rinput_string}, 0, CONST_1024 );
+ if ($str) {
+
+ if ( $str =~ m/(($CR|$LF)+)/ ) {
+
+ my $test = $1;
+
+ # dos
+ if ( $test =~ /^($CRLF)+\z/ ) {
+ $input_line_separator = $CRLF;
+ }
+
+ # mac
+ elsif ( $test =~ /^($CR)+\z/ ) {
+ $input_line_separator = $CR;
+ }
+
+ # unix
+ elsif ( $test =~ /^($LF)+\z/ ) {
+ $input_line_separator = $LF;
+ }
+
+ # unknown
+ else { }
+ }
+
+ # no ending seen
+ else { }
+ }
+
+ if ( defined($input_line_separator) ) {
+
+ # Remember the input line separator if needed
+ if ( $rOpts->{'preserve-line-endings'} ) {
+ $line_separator = $input_line_separator;
+ }
+
+ # Convert line endings to "\n" for processing if necessary.
+ if ( $input_line_separator ne "\n" ) {
+ my @lines = split /^/, ${$rinput_string};
+
+ # try to convert CR to \n
+ if ( $input_line_separator eq $CR ) {
+
+ # if this file is currently a single line ..
+ if ( @lines == 1 ) {
+
+ # and becomes multiple lines with the change ..
+ @lines = map { $_ . "\n" } split /$CR/, ${$rinput_string};
+ if ( @lines > 1 ) {
+
+ # then make the change
+ my $buf = join EMPTY_STRING, @lines;
+ $rinput_string = \$buf;
+ }
+ }
+ }
+
+ # convert CR-LF to LF
+ elsif ( ( $input_line_separator eq $CRLF ) && ( "\n" eq $LF ) ) {
+ foreach my $line (@lines) { $line =~ s/$CRLF$/\n/ }
+ my $buf = join EMPTY_STRING, @lines;
+ $rinput_string = \$buf;
+ }
+
+ # unknown line ending scheme - leave it alone and let the tokenizer
+ # deal with it
+ else {
+ }
+ }
+ }
+
+ $self->[_line_separator_] = $line_separator;
+ return $rinput_string;
+} ## end sub set_line_separator
+}
+
+sub process_all_files {
+
+ my ( $self, $rcall_hash ) = @_;
# This routine is the main loop to process all files.
# Total formatting is done with these layers of subroutines:
# process_iteration_layer - handle any iterations on formatting
# process_single_case - solves one formatting problem
- my $rOpts = $self->[_rOpts_];
- my $dot = $self->[_file_extension_separator_];
- my $diagnostics_object = $self->[_diagnostics_object_];
- my $line_separator_default = $self->[_line_separator_default_];
+ my $rinput_hash = $rcall_hash->{rinput_hash};
+ my $rfiles = $rcall_hash->{rfiles};
+ my $source_stream = $rcall_hash->{source_stream};
+ my $output_extension = $rcall_hash->{output_extension};
+ my $forbidden_file_extensions = $rcall_hash->{forbidden_file_extensions};
+ my $in_place_modify = $rcall_hash->{in_place_modify};
+ my $backup_extension = $rcall_hash->{backup_extension};
+ my $delete_backup = $rcall_hash->{delete_backup};
+ my $logfile_header = $rcall_hash->{logfile_header};
+ my $rpending_complaint = $rcall_hash->{rpending_complaint};
+ my $rpending_logfile_message = $rcall_hash->{rpending_logfile_message};
+
+ my $rOpts = $self->[_rOpts_];
+ my $dot = $self->[_file_extension_separator_];
+ my $diagnostics_object = $self->[_diagnostics_object_];
my $destination_stream = $rinput_hash->{'destination'};
my $errorfile_stream = $rinput_hash->{'errorfile'};
my $logfile_stream = $rinput_hash->{'logfile'};
my $teefile_stream = $rinput_hash->{'teefile'};
my $debugfile_stream = $rinput_hash->{'debugfile'};
- my $source_stream = $rinput_hash->{'source'};
- my $stderr_stream = $rinput_hash->{'stderr'};
my $number_of_files = @{$rfiles};
- while ( my $input_file = shift @{$rfiles} ) {
-
+ foreach my $input_file ( @{$rfiles} ) {
my $fileroot;
my @input_file_stat;
my $display_name;
# If the source is from an array or string, then .LOG output
# is only possible if a logfile stream is specified. This prevents
- # unexpected perltidy.LOG files.
+ # unexpected perltidy.LOG files. If the stream is not defined
+ # then we will capture it in a string ref but it will not be
+ # accessible. Previously by Perl::Tidy::DevNull (fix c255);
if ( !defined($logfile_stream) ) {
- $logfile_stream = Perl::Tidy::DevNull->new();
+ $logfile_stream = \my $tmp;
# Likewise for .TEE and .DEBUG output
}
if ( !defined($teefile_stream) ) {
- $teefile_stream = Perl::Tidy::DevNull->new();
+ $teefile_stream = \my $tmp;
}
if ( !defined($debugfile_stream) ) {
- $debugfile_stream = Perl::Tidy::DevNull->new();
+ $debugfile_stream = \my $tmp;
}
}
elsif ( $input_file eq '-' ) { # '-' indicates input from STDIN
else {
$fileroot = $input_file;
$display_name = $input_file;
- unless ( -e $input_file ) {
+ if ( !-e $input_file ) {
+ Warn("skipping file: '$input_file': no matches found\n");
+ next;
+ }
- # file doesn't exist - check for a file glob
- if ( $input_file =~ /([\?\*\[\{])/ ) {
-
- # Windows shell may not remove quotes, so do it
- my $input_file = $input_file;
- if ( $input_file =~ /^\'(.+)\'$/ ) { $input_file = $1 }
- if ( $input_file =~ /^\"(.+)\"$/ ) { $input_file = $1 }
- my $pattern = fileglob_to_re($input_file);
- my $dh;
- if ( opendir( $dh, './' ) ) {
- my @files =
- grep { /$pattern/ && !-d } readdir($dh);
- closedir($dh);
- next unless (@files);
- unshift @{$rfiles}, @files;
- next;
- }
- }
- Warn("skipping file: '$input_file': no matches found\n");
- next;
- }
-
- unless ( -f $input_file ) {
+ if ( !-f $input_file ) {
Warn("skipping file: $input_file: not a regular file\n");
next;
}
# If for example a source file got clobbered somehow,
# the old .tdy or .bak files might still exist so we
# shouldn't overwrite them with zero length files.
- unless ( -s $input_file ) {
+ if ( !-s $input_file ) {
Warn("skipping file: $input_file: Zero size\n");
next;
}
# And avoid formatting extremely large files. Since perltidy reads
# files into memory, trying to process an extremely large file
# could cause system problems.
- my $size_in_mb = ( -s $input_file ) / ( 1024 * 1024 );
- if ( $size_in_mb > $rOpts->{'maximum-file-size-mb'} ) {
+ my $size_in_mb = ( -s $input_file ) / ( CONST_1024 * CONST_1024 );
+ my $maximum_file_size_mb = $rOpts->{'maximum-file-size-mb'};
+ if ( $size_in_mb > $maximum_file_size_mb ) {
$size_in_mb = sprintf( "%0.1f", $size_in_mb );
Warn(
-"skipping file: $input_file: size $size_in_mb MB exceeds limit $rOpts->{'maximum-file-size-mb'}; use -mfs=i to change\n"
+"skipping file: $input_file: size $size_in_mb MB exceeds limit $maximum_file_size_mb; use -maxfs=i to change\n"
);
next;
}
- unless ( ( -T $input_file ) || $rOpts->{'force-read-binary'} ) {
+ if ( !-T $input_file && !$rOpts->{'force-read-binary'} ) {
Warn("skipping file: $input_file: Non-text (override with -f)\n"
);
next;
if ( $in_place_modify && !-w $input_file ) {
my $backup_method = $rOpts->{'backup-method'};
if ( defined($backup_method) && $backup_method eq 'copy' ) {
- Warn
-"skipping file '$input_file' for -b option: file reported as non-writable\n";
+ Warn(
+"skipping file '$input_file' for -b option: file reported as non-writable\n"
+ );
next;
}
}
# add option to change path here
if ( defined( $rOpts->{'output-path'} ) ) {
- my ( $base, $old_path ) = fileparse($fileroot);
+ my ( $base, $old_path_uu ) = fileparse($fileroot);
my $new_path = $rOpts->{'output-path'};
- unless ( -d $new_path ) {
- unless ( mkdir $new_path, 0777 ) {
- Die("unable to create directory $new_path: $ERRNO\n");
- }
+ if ( !-d $new_path ) {
+ mkdir($new_path) # Default MODE is 0777
+ or
+ Die("unable to create directory $new_path: $OS_ERROR\n");
}
my $path = $new_path;
- $fileroot = catfile( $path, $base );
- unless ($fileroot) {
+ $fileroot = File::Spec->catfile( $path, $base );
+ if ( !$fileroot ) {
Die(<<EOM);
------------------------------------------------------------------------
Problem combining $new_path and $base to make a filename; check -opath
# copy source to a string buffer, decoding from utf8 if necessary
my (
- $buf,
+ $rinput_string,
$is_encoded_data,
$decoded_input_as,
$encoding_log_message,
$length_function,
- ) = $self->get_decoded_string_buffer( $input_file, $display_name,
- $rpending_logfile_message );
+ ) = $self->get_decoded_string_buffer( $input_file, $display_name );
# Skip this file on any error
- next if ( !defined($buf) );
+ next if ( !defined($rinput_string) );
# Register this file name with the Diagnostics package, if any.
$diagnostics_object->set_input_file($input_file)
if $diagnostics_object;
- # OK: the (possibly decoded) input is now in string $buf. We just need
- # to to prepare the output and error logger before formatting it.
+ # The (possibly decoded) input is now in string ref $rinput_string.
+ # Now prepare the output stream and error logger.
#--------------------------
# prepare the output stream
#--------------------------
- my $output_file = undef;
+ my $output_file;
my $output_name = EMPTY_STRING;
my $actual_output_extension;
$msg .= " (-pbp contains -st; see manual)" if ($saw_pbp);
Die("$msg\n");
}
- elsif ($destination_stream) {
+
+ if ($destination_stream) {
Die(
"You may not specify a destination array and -o together\n"
);
}
- elsif ( defined( $rOpts->{'output-path'} ) ) {
+
+ if ( defined( $rOpts->{'output-path'} ) ) {
Die("You may not specify -o and -opath together\n");
}
- elsif ( defined( $rOpts->{'output-file-extension'} ) ) {
+
+ if ( defined( $rOpts->{'output-file-extension'} ) ) {
Die("You may not specify -o and -oext together\n");
}
+
$output_file = $rOpts->{outfile};
$output_name = $output_file;
}
else {
if ($in_place_modify) {
-
- # Send output to a temporary array buffer. This will
- # allow efficient copying back to the input by
- # sub backup_and_modify_in_place, below.
- my @tmp_buff;
- $output_file = \@tmp_buff;
$output_name = $display_name;
}
else {
}
}
+ # prepare standard output in case of a dump to stdout
+ if ( $is_encoded_data && $self->[_dump_to_stdout_] ) {
+ binmode *STDOUT, ':encoding(UTF-8)';
+ }
+
$rstatus->{'file_count'} += 1;
$rstatus->{'output_name'} = $output_name;
$rstatus->{'iteration_count'} = 0;
$logger_object->complain( ${$rpending_complaint} );
}
- # Use input line endings if requested
- my $line_separator = $line_separator_default;
- if ( $rOpts->{'preserve-line-endings'} ) {
- my $ls_input = find_input_line_ending($input_file);
- if ( defined($ls_input) ) { $line_separator = $ls_input }
- }
-
# additional parameters needed by lower level routines
$self->[_actual_output_extension_] = $actual_output_extension;
$self->[_debugfile_stream_] = $debugfile_stream;
$self->[_fileroot_] = $fileroot;
$self->[_is_encoded_data_] = $is_encoded_data;
$self->[_length_function_] = $length_function;
- $self->[_line_separator_] = $line_separator;
$self->[_logger_object_] = $logger_object;
$self->[_output_file_] = $output_file;
$self->[_teefile_stream_] = $teefile_stream;
$self->[_input_copied_verbatim_] = 0;
$self->[_input_output_difference_] = 1; ## updated later if -b used
- #----------------------------------------------------------
- # Do all formatting of this buffer.
- # Results will go to the selected output file or streams(s)
- #----------------------------------------------------------
- $self->process_filter_layer($buf);
+ #--------------------
+ # process this buffer
+ #--------------------
+ my $routput_string = $self->process_filter_layer($rinput_string);
- #--------------------------------------------------
- # Handle the -b option (backup and modify in-place)
- #--------------------------------------------------
- if ($in_place_modify) {
+ #------------------------------------------------
+ # send the tidied output to its final destination
+ #------------------------------------------------
+ if ( $rOpts->{'format'} eq 'tidy' && defined($routput_string) ) {
- # For -b option, leave the file unchanged if a severe error caused
- # formatting to be skipped. Otherwise we will overwrite any backup.
- if ( !$self->[_input_copied_verbatim_] ) {
+ $self->write_tidy_output(
+ {
+ routput_string => $routput_string,
+ rinput_file_stat => \@input_file_stat,
+ in_place_modify => $in_place_modify,
+ input_file => $input_file,
+ backup_extension => $backup_extension,
+ delete_backup => $delete_backup,
+ }
+ );
+ }
- my $backup_method = $rOpts->{'backup-method'};
+ $logger_object->finish()
+ if $logger_object;
+ } ## end loop over files
- # Option 1, -bm='copy': uses newer version in which original is
- # copied to the backup and rewritten; see git #103.
- if ( defined($backup_method) && $backup_method eq 'copy' ) {
- $self->backup_method_copy(
- $input_file, $output_file,
- $backup_extension, $delete_backup
- );
- }
+ return;
+} ## end sub process_all_files
- # Option 2, -bm='move': uses older version, where original is
- # moved to the backup and formatted output goes to a new file.
- else {
- $self->backup_method_move(
- $input_file, $output_file,
- $backup_extension, $delete_backup
- );
- }
+sub write_tidy_output {
+
+ my ( $self, $rcall_hash ) = @_;
+
+ # Write tidied output in '$routput_string' to its final destination
+
+ my $routput_string = $rcall_hash->{routput_string};
+ my $rinput_file_stat = $rcall_hash->{rinput_file_stat};
+ my $in_place_modify = $rcall_hash->{in_place_modify};
+ my $input_file = $rcall_hash->{input_file};
+ my $backup_extension = $rcall_hash->{backup_extension};
+ my $delete_backup = $rcall_hash->{delete_backup};
+
+ my $rOpts = $self->[_rOpts_];
+ my $is_encoded_data = $self->[_is_encoded_data_];
+ my $output_file = $self->[_output_file_];
+
+ # There are three main output paths:
+
+ #-------------------------------------------------------------------------
+ # PATH 1: $output_file is not defined: --backup and modify in-place option
+ #-------------------------------------------------------------------------
+ if ($in_place_modify) {
+
+ # For -b option, leave the file unchanged if a severe error caused
+ # formatting to be skipped. Otherwise we will overwrite any backup.
+ if ( !$self->[_input_copied_verbatim_] ) {
+
+ my $backup_method = $rOpts->{'backup-method'};
+
+ #-------------------------------------------------------------
+ # PATH 1a: -bm='copy': uses newer version in which original is
+ # copied to the backup and rewritten; see git #103.
+ #-------------------------------------------------------------
+ if ( defined($backup_method) && $backup_method eq 'copy' ) {
+ $self->backup_method_copy(
+ $input_file, $routput_string,
+ $backup_extension, $delete_backup
+ );
+ }
+
+ #-------------------------------------------------------------
+ # PATH 1b: -bm='move': uses older version, where original is
+ # moved to the backup and formatted output goes to a new file.
+ #-------------------------------------------------------------
+ else {
+ $self->backup_method_move(
+ $input_file, $routput_string,
+ $backup_extension, $delete_backup
+ );
}
- $output_file = $input_file;
}
+ }
- #-------------------------------------------------------------------
- # Otherwise set output file ownership and permissions if appropriate
- #-------------------------------------------------------------------
- elsif ( $output_file && -f $output_file && !-l $output_file ) {
- if (@input_file_stat) {
- if ( $rOpts->{'format'} eq 'tidy' ) {
+ #--------------------------------------------------------------------------
+ # PATH 2: $output_file is a reference (=destination_stream): send output to
+ # a destination stream ref received from an external perl program. We use
+ # a sub to do this because the encoding rules are a bit tricky.
+ #--------------------------------------------------------------------------
+ elsif ( ref($output_file) ) {
+ $self->copy_buffer_to_external_ref( $routput_string, $output_file );
+ }
+
+ #--------------------------------------------------------------------------
+ # PATH 3: $output_file is named file or '-'; send output to the file system
+ #--------------------------------------------------------------------------
+ else {
+
+ #--------------------------
+ # PATH 3a: output to STDOUT
+ #--------------------------
+ if ( $output_file eq '-' ) {
+ my $fh = *STDOUT;
+ if ($is_encoded_data) { binmode $fh, ":raw:encoding(UTF-8)" }
+ else { binmode $fh }
+ $fh->print( ${$routput_string} );
+ }
+
+ #--------------------------------
+ # PATH 3b: output to a named file
+ #--------------------------------
+ else {
+ if ( open( my $fh, '>', $output_file ) ) {
+ if ($is_encoded_data) { binmode $fh, ":raw:encoding(UTF-8)" }
+ else { binmode $fh }
+ $fh->print( ${$routput_string} );
+ $fh->close() or Die("Cannot close '$output_file': $OS_ERROR\n");
+ }
+ else {
+ Die("Cannot open $output_file to write: $OS_ERROR\n");
+ }
+
+ # set output file ownership and permissions if appropriate
+ if ( $output_file && -f $output_file && !-l $output_file ) {
+ if ( @{$rinput_file_stat} ) {
$self->set_output_file_permissions( $output_file,
- \@input_file_stat, $in_place_modify );
+ \@{$rinput_file_stat}, $in_place_modify );
}
-
- # else use default permissions for html and any other format
}
}
- $logger_object->finish()
- if $logger_object;
- } ## end of main loop to process all files
+ # Save diagnostic info
+ if ($is_encoded_data) {
+ $rstatus->{'output_encoded_as'} = 'UTF-8';
+ }
+ }
return;
-} ## end sub process_all_files
+
+} ## end sub write_tidy_output
sub process_filter_layer {
- my ( $self, $buf ) = @_;
+ my ( $self, $rinput_string ) = @_;
# This is the filter layer of processing.
- # Do all requested formatting on the string '$buf', including any
- # pre- and post-processing with filters.
- # Store the results in the selected output file(s) or stream(s).
+ # Do all requested formatting on the string ref '$rinput_string', including
+ # any pre- and post-processing with filters.
+ # Returns:
+ # $routput_string = ref to tidied output if in 'tidy' mode
+ # (nothing) if not in 'tidy' mode [these modes handle output separately]
# Total formatting is done with these layers of subroutines:
# perltidy - main routine; checks run parameters
# process_single_case - solves one formatting problem
# Data Flow in this layer:
- # $buf
- # -> optional prefilter operation
+ # $rinput_string
+ # -> optional prefilter operations
# -> [ formatting by sub process_iteration_layer ]
- # -> ( optional postfilter_buffer for postfilter, other operations )
- # -> ( optional destination_buffer for encoding )
- # -> final sink_object
+ # -> return if not in 'tidy' mode
+ # -> optional postfilter operations
+ # -> $routput_string
# What is done based on format type:
# utf8 decoding is done for all format types
# prefiltering is applied to all format types
# - because it may be needed to get through the tokenizer
# postfiltering is only done for format='tidy'
- # - might cause problems operating on html text
+ # - not appropriate for html text, which has already been output
# encoding of decoded output is only done for format='tidy'
# - because html does its own encoding; user formatter does what it wants
- my $rOpts = $self->[_rOpts_];
- my $is_encoded_data = $self->[_is_encoded_data_];
- my $logger_object = $self->[_logger_object_];
- my $output_file = $self->[_output_file_];
- my $user_formatter = $self->[_user_formatter_];
- my $destination_stream = $self->[_destination_stream_];
- my $prefilter = $self->[_prefilter_];
- my $postfilter = $self->[_postfilter_];
- my $decoded_input_as = $self->[_decoded_input_as_];
- my $line_separator = $self->[_line_separator_];
-
- my $remove_terminal_newline =
- !$rOpts->{'add-terminal-newline'} && substr( $buf, -1, 1 ) !~ /\n/;
-
- # vars for postfilter, if used
- my $use_postfilter_buffer;
- my $postfilter_buffer;
-
- # vars for destination buffer, if used
- my $destination_buffer;
- my $use_destination_buffer;
- my $encode_destination_buffer;
+ # Be sure the string we received is defined
+ if ( !defined($rinput_string) ) {
+ Fault("bad call: the source string ref \$rinput_string is undefined\n");
+ }
+ if ( ref($rinput_string) ne 'SCALAR' ) {
+ Fault("bad call: the source string ref is not SCALAR\n");
+ }
+
+ my $rOpts = $self->[_rOpts_];
+ my $logger_object = $self->[_logger_object_];
- # vars for iterations, if done
- my $sink_object;
+ # vars for --line-range-tidy filter, if needed
+ my @input_lines_pre;
+ my @input_lines_post;
# vars for checking assertions, if needed
my $digest_input;
my $saved_input_buf;
- my $ref_destination_stream = ref($destination_stream);
+ # var for checking --noadd-terminal-newline
+ my $chomp_terminal_newline;
- # Setup vars for postfilter, destination buffer, assertions and sink object
- # if needed. These are only used for 'tidy' formatting.
+ # Setup post-filter vars; these apply to 'tidy' mode only
if ( $rOpts->{'format'} eq 'tidy' ) {
- # evaluate MD5 sum of input file, if needed, before any prefilter
- if ( $rOpts->{'assert-tidy'}
- || $rOpts->{'assert-untidy'}
- || $rOpts->{'backup-and-modify-in-place'} )
- {
- $digest_input = $md5_hex->($buf);
- $saved_input_buf = $buf;
- }
-
- #-----------------------
- # Setup postfilter buffer
- #-----------------------
- # If we need access to the output for filtering or checking assertions
- # before writing to its ultimate destination, then we will send it
- # to a temporary buffer. The variables are:
- # $postfilter_buffer = the buffer to capture the output
- # $use_postfilter_buffer = is a postfilter buffer used?
- # These are used below, just after iterations are made.
- $use_postfilter_buffer =
- $postfilter
- || $remove_terminal_newline
- || $rOpts->{'assert-tidy'}
- || $rOpts->{'assert-untidy'}
- || $rOpts->{'backup-and-modify-in-place'};
-
- #-------------------------
- # Setup destination_buffer
- #-------------------------
- # If the final output destination is not a file, then we might need to
- # encode the result at the end of processing. So in this case we will
- # send the output to a temporary buffer.
- # The key variables are:
- # $destination_buffer - receives the formatted output
- # $use_destination_buffer - is $destination_buffer used?
- # $encode_destination_buffer - encode $destination_buffer?
- # These are used by sub 'copy_buffer_to_destination', below
-
- if ($ref_destination_stream) {
- $use_destination_buffer = 1;
- $output_file = \$destination_buffer;
- $self->[_output_file_] = $output_file;
-
- # Strings and arrays use special encoding rules
- if ( $ref_destination_stream eq 'SCALAR'
- || $ref_destination_stream eq 'ARRAY' )
- {
- $encode_destination_buffer =
- $rOpts->{'encode-output-strings'} && $decoded_input_as;
- }
+ #---------------------------------------------------------------------
+ # for --line-range-tidy, clip '$rinput_string' to a limited line range
+ #---------------------------------------------------------------------
+ my $line_tidy_begin = $self->[_line_tidy_begin_];
+ if ($line_tidy_begin) {
+
+ my @input_lines = split /^/, ${$rinput_string};
- # An object with a print method will use file encoding rules
- elsif ( $ref_destination_stream->can('print') ) {
- $encode_destination_buffer = $is_encoded_data;
+ my $num = @input_lines;
+ if ( $line_tidy_begin > $num ) {
+ Die(<<EOM);
+#--line-range-tidy=n1:n2 has n1=$line_tidy_begin which exceeds max line number of $num
+EOM
}
else {
- confess <<EOM;
-------------------------------------------------------------------------
-No 'print' method is defined for object of class '$ref_destination_stream'
-Please check your call to Perl::Tidy::perltidy. Trace follows.
-------------------------------------------------------------------------
-EOM
+ my $line_tidy_end = $self->[_line_tidy_end_];
+ if ( !defined($line_tidy_end) || $line_tidy_end > $num ) {
+ $line_tidy_end = $num;
+ }
+ my $input_string = join EMPTY_STRING,
+ @input_lines[ $line_tidy_begin - 1 .. $line_tidy_end - 1 ];
+ $rinput_string = \$input_string;
+
+ @input_lines_pre = @input_lines[ 0 .. $line_tidy_begin - 2 ];
+ @input_lines_post = @input_lines[ $line_tidy_end .. $num - 1 ];
}
}
- #-------------------------------------------
- # Make a sink object for the iteration phase
- #-------------------------------------------
- $sink_object = Perl::Tidy::LineSink->new(
- output_file => $use_postfilter_buffer
- ? \$postfilter_buffer
- : $output_file,
- line_separator => $line_separator,
- is_encoded_data => $is_encoded_data,
- );
+ #------------------------------------------
+ # evaluate MD5 sum of input file, if needed
+ #------------------------------------------
+ if ( $rOpts->{'assert-tidy'}
+ || $rOpts->{'assert-untidy'}
+ || $rOpts->{'backup-and-modify-in-place'} )
+ {
+ $digest_input = $md5_hex->( ${$rinput_string} );
+ $saved_input_buf = ${$rinput_string};
+ }
+
+ # When -noadd-terminal-newline is set, and the input does not
+ # have a newline, then we remove the final newline of the output
+ $chomp_terminal_newline = !$rOpts->{'add-terminal-newline'}
+ && substr( ${$rinput_string}, -1, 1 ) !~ /\n/;
+
}
#-----------------------------------------------------------------------
# for all format types ('tidy', 'html', 'user') because it may be needed
# to avoid tokenization errors.
#-----------------------------------------------------------------------
- $buf = $prefilter->($buf) if $prefilter;
-
- #----------------------------------------------------------------------
- # Format contents of string '$buf', iterating if requested.
- # For 'tidy', formatted result will be written to '$sink_object'
- # For 'html' and 'user', result goes directly to its ultimate destination.
- #----------------------------------------------------------------------
- $self->process_iteration_layer( $buf, $sink_object );
-
- #--------------------------------
- # Do postfilter buffer processing
- #--------------------------------
- if ($use_postfilter_buffer) {
-
- my $sink_object_post = Perl::Tidy::LineSink->new(
- output_file => $output_file,
- line_separator => $line_separator,
- is_encoded_data => $is_encoded_data,
- );
+ my $prefilter = $self->[_prefilter_];
+ if ($prefilter) {
+ my $input_string = $prefilter->( ${$rinput_string} );
+ $rinput_string = \$input_string;
+ }
- #----------------------------------------------------------------------
- # Apply any postfilter. The postfilter is a code reference that will be
- # applied to the source after tidying.
- #----------------------------------------------------------------------
- my $buf_post =
- $postfilter
- ? $postfilter->($postfilter_buffer)
- : $postfilter_buffer;
-
- if ( defined($digest_input) ) {
- my $digest_output = $md5_hex->($buf_post);
- $self->[_input_output_difference_] =
- $digest_output ne $digest_input;
- }
-
- # Check if file changed if requested, but only after any postfilter
- if ( $rOpts->{'assert-tidy'} ) {
- if ( $self->[_input_output_difference_] ) {
- my $diff_msg =
- compare_string_buffers( $saved_input_buf, $buf_post,
- $is_encoded_data );
- $logger_object->warning(<<EOM);
-assertion failure: '--assert-tidy' is set but output differs from input
-EOM
- $logger_object->interrupt_logfile();
- $logger_object->warning( $diff_msg . "\n" );
- $logger_object->resume_logfile();
- }
- }
+ #-------------------------------------------
+ # Format contents of string '$rinput_string'
+ #-------------------------------------------
+ my $routput_string = $self->process_iteration_layer($rinput_string);
- if ( $rOpts->{'assert-untidy'} ) {
- if ( !$self->[_input_output_difference_] ) {
- $logger_object->warning(
-"assertion failure: '--assert-untidy' is set but output equals input\n"
- );
- }
- }
+ #-------------------------------
+ # All done if not in 'tidy' mode
+ #-------------------------------
+ if ( $rOpts->{'format'} ne 'tidy' ) {
+ return;
+ }
- my $source_object = Perl::Tidy::LineSource->new(
- input_file => \$buf_post,
- rOpts => $rOpts,
- );
+ #---------------------
+ # apply any postfilter
+ #---------------------
+ my $postfilter = $self->[_postfilter_];
+ if ($postfilter) {
+ my $output_string = $postfilter->( ${$routput_string} );
+ $routput_string = \$output_string;
+ }
- # Copy the filtered buffer to the final destination
- if ( !$remove_terminal_newline ) {
- while ( my $line = $source_object->get_line() ) {
- $sink_object_post->write_line($line);
- }
- }
- else {
+ if ( defined($digest_input) ) {
+ my $digest_output = $md5_hex->( ${$routput_string} );
+ $self->[_input_output_difference_] = $digest_output ne $digest_input;
+ }
- # Copy the filtered buffer but remove the newline char from the
- # final line
- my $line;
- while ( my $next_line = $source_object->get_line() ) {
- $sink_object_post->write_line($line) if ($line);
- $line = $next_line;
- }
- if ($line) {
- $sink_object_post->set_line_separator(undef);
- chomp $line;
- $sink_object_post->write_line($line);
- }
+ #-----------------------------------------------------
+ # check for changes if requested by 'assert-...' flags
+ #-----------------------------------------------------
+ if ( $rOpts->{'assert-tidy'} ) {
+ if ( $self->[_input_output_difference_] ) {
+ my $diff_msg =
+ compare_string_buffers( \$saved_input_buf, $routput_string );
+ $logger_object->warning(<<EOM);
+assertion failure: '--assert-tidy' is set but output differs from input
+EOM
+ $logger_object->interrupt_logfile();
+ $logger_object->warning( $diff_msg . "\n" );
+ $logger_object->resume_logfile();
}
- $sink_object_post->close_output_file();
- $source_object->close_input_file();
}
- #--------------------------------------------------------
- # Do destination buffer processing, encoding if required.
- #--------------------------------------------------------
- if ($use_destination_buffer) {
- $self->copy_buffer_to_destination( $destination_buffer,
- $destination_stream, $encode_destination_buffer );
+ if ( $rOpts->{'assert-untidy'} ) {
+ if ( !$self->[_input_output_difference_] ) {
+ $logger_object->warning(
+"assertion failure: '--assert-untidy' is set but output equals input\n"
+ );
+ }
}
- else {
- # output went to a file in 'tidy' mode...
- if ( $is_encoded_data && $rOpts->{'format'} eq 'tidy' ) {
- $rstatus->{'output_encoded_as'} = 'UTF-8';
+ #----------------------------------------
+ # do --line-range-tidy line recombination
+ #----------------------------------------
+ if ( @input_lines_pre || @input_lines_post ) {
+ my $str_pre = join EMPTY_STRING, @input_lines_pre;
+ my $str_post = join EMPTY_STRING, @input_lines_post;
+ my $output_string = $str_pre . ${$routput_string} . $str_post;
+ $routput_string = \$output_string;
+ }
+
+ #-----------------------------------------
+ # handle a '--noadd-terminal-newline' flag
+ #-----------------------------------------
+ if ($chomp_terminal_newline) {
+ chomp ${$routput_string};
+ }
+
+ #-------------------------------------------------------------
+ # handle --preserve-line-endings or -output-line-endings flags
+ #-------------------------------------------------------------
+ # The native line separator has been used in all intermediate
+ # iterations and filter operations until here so that string
+ # operations work ok.
+ if ( $self->[_line_separator_] ne "\n" ) {
+ my $line_separator = $self->[_line_separator_];
+ my @output_lines = split /^/, ${$routput_string};
+ foreach my $line (@output_lines) {
+
+ # must check chomp because last line might not have a newline
+ # if --noadd-terminal-newline is also set (c283)
+ if ( chomp $line ) {
+ $line .= $line_separator;
+ }
}
+ my $output_string = join EMPTY_STRING, @output_lines;
+ $routput_string = \$output_string;
}
- # The final formatted result should now be in the selected output file(s)
- # or stream(s).
- return;
-
+ return $routput_string;
} ## end sub process_filter_layer
+# For safety, set an upper bound on number of iterations before stopping.
+# The average number of iterations is 2. No known cases exceed 4.
+use constant ITERATION_LIMIT => 6;
+
sub process_iteration_layer {
- my ( $self, $buf, $sink_object ) = @_;
+ my ( $self, $rinput_string ) = @_;
# This is the iteration layer of processing.
- # Do all formatting, iterating if requested, on the source string $buf.
+ # Do all formatting, iterating if requested, on the source $rinput_string
# Output depends on format type:
# For 'tidy' formatting, output goes to sink object
# For 'html' formatting, output goes to the ultimate destination
# process_single_case - solves one formatting problem
# Data Flow in this layer:
- # $buf -> [ loop over iterations ] -> $sink_object
-
- # Only 'tidy' formatting can use multiple iterations.
+ # $rinput_string -> [ loop over iterations ] -> $routput_string
my $diagnostics_object = $self->[_diagnostics_object_];
my $display_name = $self->[_display_name_];
my $fileroot = $self->[_fileroot_];
my $is_encoded_data = $self->[_is_encoded_data_];
my $length_function = $self->[_length_function_];
- my $line_separator = $self->[_line_separator_];
my $logger_object = $self->[_logger_object_];
my $rOpts = $self->[_rOpts_];
- my $tabsize = $self->[_tabsize_];
my $user_formatter = $self->[_user_formatter_];
- # create a source object for the buffer
- my $source_object = Perl::Tidy::LineSource->new(
- input_file => \$buf,
- rOpts => $rOpts,
- );
-
# make a debugger object if requested
my $debugger_object;
if ( $rOpts->{DEBUG} ) {
# make a tee file handle if requested
my $fh_tee;
+ my $tee_file;
if ( $rOpts->{'tee-pod'}
|| $rOpts->{'tee-block-comments'}
|| $rOpts->{'tee-side-comments'} )
{
- my $tee_file = $self->[_teefile_stream_]
+ $tee_file = $self->[_teefile_stream_]
|| $fileroot . $self->make_file_extension('TEE');
- ( $fh_tee, my $tee_filename ) =
- Perl::Tidy::streamhandle( $tee_file, 'w', $is_encoded_data );
+ $fh_tee = Perl::Tidy::streamhandle( $tee_file, 'w', $is_encoded_data );
if ( !$fh_tee ) {
- Warn("couldn't open TEE file $tee_file: $ERRNO\n");
+ Warn("couldn't open TEE file $tee_file: $OS_ERROR\n");
}
}
# vars for iterations and convergence test
my $max_iterations = 1;
my $convergence_log_message;
- my $do_convergence_test;
my %saw_md5;
# Only 'tidy' formatting can use multiple iterations
# check iteration count and quietly fix if necessary:
# - iterations option only applies to code beautification mode
# - the convergence check should stop most runs on iteration 2, and
- # virtually all on iteration 3. But we'll allow up to 6.
+ # virtually all on iteration 3. We allow up to ITERATION_LIMIT.
$max_iterations = $rOpts->{'iterations'};
if ( !defined($max_iterations)
|| $max_iterations <= 0 )
{
$max_iterations = 1;
}
- elsif ( $max_iterations > 6 ) {
- $max_iterations = 6;
+
+ if ( $max_iterations > ITERATION_LIMIT ) {
+ $max_iterations = ITERATION_LIMIT;
}
# get starting MD5 sum for convergence test
if ( $max_iterations > 1 ) {
- $do_convergence_test = 1;
- my $digest = $md5_hex->($buf);
+ my $digest = $md5_hex->( ${$rinput_string} );
$saw_md5{$digest} = 0;
}
}
# save objects to allow redirecting output during iterations
- my $sink_object_final = $sink_object;
my $logger_object_final = $logger_object;
my $iteration_of_formatter_convergence;
+ my $routput_string;
#---------------------
# Loop over iterations
$rstatus->{'iteration_count'} += 1;
- # send output stream to temp buffers until last iteration
- my $sink_buffer;
- if ( $iter < $max_iterations ) {
- $sink_object = Perl::Tidy::LineSink->new(
- output_file => \$sink_buffer,
- line_separator => $line_separator,
- is_encoded_data => $is_encoded_data,
- );
- }
- else {
- $sink_object = $sink_object_final;
- }
+ # create a string to capture the output
+ my $sink_buffer = EMPTY_STRING;
+ $routput_string = \$sink_buffer;
# Save logger, debugger and tee output only on pass 1 because:
# (1) line number references must be to the starting
# being deleted.
if ( $iter > 1 ) {
- $debugger_object->close_debug_file() if ($debugger_object);
- $fh_tee->close() if ($fh_tee);
+ $debugger_object->close_debug_file()
+ if ($debugger_object);
+
+ if ( $fh_tee
+ && $fh_tee->can('close')
+ && !ref($tee_file)
+ && $tee_file ne '-' )
+ {
+ $fh_tee->close()
+ or Warn("couldn't close TEE file $tee_file: $OS_ERROR\n");
+ }
$debugger_object = undef;
$logger_object = undef;
$formatter = Perl::Tidy::Formatter->new(
logger_object => $logger_object,
diagnostics_object => $diagnostics_object,
- sink_object => $sink_object,
+ sink_object => $routput_string,
length_function => $length_function,
is_encoded_data => $is_encoded_data,
fh_tee => $fh_tee,
Die("I don't know how to do -format=$rOpts->{'format'}\n");
}
- unless ($formatter) {
+ if ( !$formatter ) {
Die("Unable to continue with $rOpts->{'format'} formatting\n");
}
# create the tokenizer for this file
#-----------------------------------
my $tokenizer = Perl::Tidy::Tokenizer->new(
- source_object => $source_object,
+ source_object => $rinput_string,
logger_object => $logger_object,
debugger_object => $debugger_object,
diagnostics_object => $diagnostics_object,
- tabsize => $tabsize,
rOpts => $rOpts,
-
- starting_level => $rOpts->{'starting-indentation-level'},
- indent_columns => $rOpts->{'indent-columns'},
- look_for_hash_bang => $rOpts->{'look-for-hash-bang'},
- look_for_autoloader => $rOpts->{'look-for-autoloader'},
- look_for_selfloader => $rOpts->{'look-for-selfloader'},
- trim_qw => $rOpts->{'trim-qw'},
- extended_syntax => $rOpts->{'extended-syntax'},
-
- continuation_indentation => $rOpts->{'continuation-indentation'},
- outdent_labels => $rOpts->{'outdent-labels'},
+ starting_level => $rOpts->{'starting-indentation-level'},
);
#---------------------------------
#---------------------------------
$self->process_single_case( $tokenizer, $formatter );
- #-----------------------------------------
- # close the input source and report errors
- #-----------------------------------------
- $source_object->close_input_file();
+ #--------------
+ # report errors
+ #--------------
# see if the formatter is converged
if ( $max_iterations > 1
# temporary output buffer
if ( $iter < $max_iterations ) {
- $sink_object->close_output_file();
- $source_object = Perl::Tidy::LineSource->new(
- input_file => \$sink_buffer,
- rOpts => $rOpts,
- );
+ $rinput_string = \$sink_buffer;
# stop iterations if errors or converged
my $stop_now = $self->[_input_copied_verbatim_];
my $stopping_on_error = $stop_now;
if ($stop_now) {
$convergence_log_message = <<EOM;
-Stopping iterations because of severe errors.
+Stopping iterations because of severe errors.
EOM
}
- elsif ($do_convergence_test) {
+
+ # or do convergence test
+ else {
# stop if the formatter has converged
$stop_now ||= defined($iteration_of_formatter_convergence);
if ( !defined( $saw_md5{$digest} ) ) {
$saw_md5{$digest} = $iter;
}
+
+ # do a second iteration if all ok and requested by formatter
+ # to allow delayed adding/deleting of commas (git156, git143)
+ elsif ( $iter == 1
+ && !$stop_now
+ && $formatter->can('want_second_iteration')
+ && $formatter->want_second_iteration() )
+ {
+ ## deja vu, but do not set $stop_now
+ $saw_md5{$digest} = $iter;
+ }
else {
# Deja vu, stop iterating
+
$stop_now = 1;
my $iterm = $iter - 1;
if ( $saw_md5{$digest} != $iterm ) {
# them when they happen.
$rstatus->{'blinking'} = 1;
$convergence_log_message = <<EOM;
-BLINKER. Output for iteration $iter same as for $saw_md5{$digest}.
+BLINKER. Output for iteration $iter same as for $saw_md5{$digest}.
EOM
$stopping_on_error ||= $convergence_log_message;
DEVEL_MODE
- && print STDERR $convergence_log_message;
+ && print {*STDERR} $convergence_log_message;
$diagnostics_object->write_diagnostics(
$convergence_log_message)
if $diagnostics_object;
$rstatus->{'converged'} = 1;
}
}
- } ## end if ($do_convergence_test)
+ }
if ($stop_now) {
# convergence test above is temporarily skipped for
# testing.
if ( $iteration_of_formatter_convergence < $iter - 1 ) {
- print STDERR
+ print {*STDERR}
"STRANGE Early conv in $display_name: Stopping on it=$iter, converged in formatter on $iteration_of_formatter_convergence\n";
}
}
elsif ( !$stopping_on_error ) {
- print STDERR
+
+ # The md5 sum implies convergence but the convergence
+ # was not detected by the Formatter. This is not
+ # critical but should be investigated. It happened
+ # once when a line break was placed before a phantom
+ # comma under -qwaf, and has been fixed.
+ print {*STDERR}
"STRANGE no conv in $display_name: stopping on it=$iter, but not converged in formatter\n";
}
+ else {
+ ## looks ok
+ }
}
# we are stopping the iterations early;
- # copy the output stream to its final destination
- $sink_object = $sink_object_final;
- while ( my $line = $source_object->get_line() ) {
- $sink_object->write_line($line);
- }
- $source_object->close_input_file();
last;
}
} ## end if ( $iter < $max_iterations)
} ## end loop over iterations for one source file
- $sink_object->close_output_file() if $sink_object;
- $debugger_object->close_debug_file() if $debugger_object;
- $fh_tee->close() if $fh_tee;
+ $debugger_object->close_debug_file()
+ if $debugger_object;
+
+ if ( $fh_tee
+ && $fh_tee->can('close')
+ && !ref($tee_file)
+ && $tee_file ne '-' )
+ {
+ $fh_tee->close()
+ or Warn("couldn't close TEE file $tee_file: $OS_ERROR\n");
+ }
# leave logger object open for additional messages
$logger_object = $logger_object_final;
$logger_object->write_logfile_entry($convergence_log_message)
if $convergence_log_message;
- return;
+ return $routput_string;
} ## end sub process_iteration_layer
sub process_single_case {
- # run the formatter on a single defined case
my ( $self, $tokenizer, $formatter ) = @_;
+ # Run the formatter on a single defined case
+
# Total formatting is done with these layers of subroutines:
# perltidy - main routine; checks run parameters
# process_all_files - main loop to process all files;
return;
} ## end sub process_single_case
-sub copy_buffer_to_destination {
+sub copy_buffer_to_external_ref {
- my ( $self, $destination_buffer, $destination_stream,
- $encode_destination_buffer )
- = @_;
+ my ( $self, $routput, $destination_stream ) = @_;
- # Copy $destination_buffer to the final $destination_stream,
+ # Copy $routput to the final $destination_stream,
# encoding if the flag $encode_destination_buffer is true.
# Data Flow:
# $destination_buffer -> [ encode? ] -> $destination_stream
+ my $destination_buffer = EMPTY_STRING;
+ if ( ref($routput) eq 'ARRAY' ) {
+ $destination_buffer = join EMPTY_STRING, @{$routput};
+ }
+ elsif ( ref($routput) eq 'SCALAR' ) {
+ $destination_buffer = ${$routput};
+ }
+ else {
+ Fatal(
+ "'copy_buffer_to_external_ref' expecting ref to ARRAY or SCALAR\n");
+ }
+
$rstatus->{'output_encoded_as'} = EMPTY_STRING;
+ my $ref_destination_stream = ref($destination_stream);
+
+ # Encode output? Strings and arrays use special encoding rules; see:
+ # https://github.com/perltidy/perltidy/blob/master/docs/eos_flag.md
+ my $encode_destination_buffer;
+ if ( $ref_destination_stream eq 'SCALAR'
+ || $ref_destination_stream eq 'ARRAY' )
+ {
+ my $rOpts = $self->[_rOpts_];
+ $encode_destination_buffer =
+ $rOpts->{'encode-output-strings'} && $self->[_decoded_input_as_];
+ }
+
+ # An object with a print method will use file encoding rules
+ elsif ( $ref_destination_stream->can('print') ) {
+ $encode_destination_buffer = $self->[_is_encoded_data_];
+ }
+ else {
+ confess <<EOM;
+------------------------------------------------------------------------
+No 'print' method is defined for object of class '$ref_destination_stream'
+Please check your call to Perl::Tidy::perltidy. Trace follows.
+------------------------------------------------------------------------
+EOM
+ }
if ($encode_destination_buffer) {
my $encoded_buffer;
}
# Send data for SCALAR, ARRAY & OBJ refs to its final destination
- if ( ref($destination_stream) eq 'SCALAR' ) {
+ if ( $ref_destination_stream eq 'SCALAR' ) {
${$destination_stream} = $destination_buffer;
}
- elsif ($destination_buffer) {
+ elsif ( defined($destination_buffer) ) {
my @lines = split /^/, $destination_buffer;
- if ( ref($destination_stream) eq 'ARRAY' ) {
+ if ( $ref_destination_stream eq 'ARRAY' ) {
@{$destination_stream} = @lines;
}
foreach my $line (@lines) {
$destination_stream->print($line);
}
- my $ref_destination_stream = ref($destination_stream);
if ( $ref_destination_stream->can('close') ) {
$destination_stream->close();
}
# happen for example if user deleted all pod or comments
}
return;
-} ## end sub copy_buffer_to_destination
+} ## end sub copy_buffer_to_external_ref
} ## end of closure for sub perltidy
sub line_diff {
- # Given two strings, return
- # $diff_marker = a string with carat (^) symbols indicating differences
- # $pos1 = character position of first difference; pos1=-1 if no difference
+ my ( $s1, $s2 ) = @_;
+
+ # Given two strings, Return
+ # $diff_marker = a string with caret (^) symbols indicating differences
+ # $pos1 = character position of first difference; pos1=-1 if no difference
# Form exclusive or of the strings, which has null characters where strings
# have same common characters so non-null characters indicate character
# differences.
- my ( $s1, $s2 ) = @_;
my $diff_marker = EMPTY_STRING;
my $pos = -1;
- my $pos1 = $pos;
+ my $pos1 = -1;
if ( defined($s1) && defined($s2) ) {
- my $count = 0;
- my $mask = $s1 ^ $s2;
+ my $mask = $s1 ^ $s2;
while ( $mask =~ /[^\0]/g ) {
- $count++;
my $pos_last = $pos;
$pos = $LAST_MATCH_START[0];
- if ( $count == 1 ) { $pos1 = $pos; }
+ if ( $pos1 < 0 ) { $pos1 = $pos; }
$diff_marker .= SPACE x ( $pos - $pos_last - 1 ) . '^';
# we could continue to mark all differences, but there is no point
last;
- }
+ } ## end while ( $mask =~ /[^\0]/g)
}
- return wantarray ? ( $diff_marker, $pos1 ) : $diff_marker;
+ return ( $diff_marker, $pos1 );
} ## end sub line_diff
sub compare_string_buffers {
+ my ( $string_i, $string_o, ($max_diff_count) ) = @_;
+
# Compare input and output string buffers and return a brief text
# description of the first difference.
- my ( $bufi, $bufo, $is_encoded_data ) = @_;
- my $leni = length($bufi);
- my $leno = defined($bufo) ? length($bufo) : 0;
- my $msg =
- "Input file length is $leni chars\nOutput file length is $leno chars\n";
- return $msg unless $leni && $leno;
-
- my ( $fhi, $fnamei ) = streamhandle( \$bufi, 'r', $is_encoded_data );
- my ( $fho, $fnameo ) = streamhandle( \$bufo, 'r', $is_encoded_data );
- return $msg unless ( $fho && $fhi ); # for safety, shouldn't happen
- my ( $linei, $lineo );
- my ( $counti, $counto ) = ( 0, 0 );
- my ( $last_nonblank_line, $last_nonblank_count ) = ( EMPTY_STRING, 0 );
+ # Given:
+ # $string_i = input string, or ref to input string
+ # $string_o = output string, or ref to output string
+ # $max_diff_count = optional maximum number of differences to show,
+ # default=1
+ # Return:
+ # a string showing differences
+
+ my $rbufi = ref($string_i) ? $string_i : \$string_i;
+ my $rbufo = ref($string_o) ? $string_o : \$string_o;
+
+ if ( !defined($max_diff_count) ) { $max_diff_count = 1 }
+
+ my ( @aryi, @aryo );
+ my ( $leni, $leno ) = ( 0, 0 );
+ if ( defined($rbufi) ) {
+ $leni = length( ${$rbufi} );
+ @aryi = split /^/, ${$rbufi};
+ }
+ if ( defined($rbufo) ) {
+ $leno = length( ${$rbufo} );
+ @aryo = split /^/, ${$rbufo};
+ }
+ my $nlines_i = @aryi;
+ my $nlines_o = @aryo;
+ my $msg = <<EOM;
+Input file length has $leni chars in $nlines_i lines
+Output file length has $leno chars in $nlines_o lines
+EOM
+ return $msg unless ( $leni && $leno );
+
my $truncate = sub {
my ( $str, $lenmax ) = @_;
if ( length($str) > $lenmax ) {
$str = substr( $str, 0, $lenmax ) . "...";
}
return $str;
- };
- while (1) {
- if ($linei) {
- $last_nonblank_line = $linei;
- $last_nonblank_count = $counti;
+ }; ## end $truncate = sub
+
+ my $last_nonblank_line = EMPTY_STRING;
+ my $last_nonblank_count = 0;
+
+ # loop over lines until we find a difference
+ my $count = 0;
+ my $diff_count = 0;
+ while ( @aryi && @aryo ) {
+ $count++;
+ my $linei = shift @aryi;
+ my $lineo = shift @aryo;
+ chomp $linei;
+ chomp $lineo;
+ if ( $linei eq $lineo ) {
+ if ( length($linei) ) {
+ $last_nonblank_line = $linei;
+ $last_nonblank_count = $count;
+ }
+ next;
}
- $linei = $fhi->getline();
- $lineo = $fho->getline();
- # compare chomp'ed lines
- if ( defined($linei) ) { $counti++; chomp $linei }
- if ( defined($lineo) ) { $counto++; chomp $lineo }
-
- # see if one or both ended before a difference
- last unless ( defined($linei) && defined($lineo) );
-
- next if ( $linei eq $lineo );
-
- # lines differ ...
+ #---------------------------
+ # lines differ ... finish up
+ #---------------------------
my ( $line_diff, $pos1 ) = line_diff( $linei, $lineo );
- my $reason = "Files first differ at character $pos1 of line $counti";
+ my $ch1 = $pos1 + 1;
+ my $reason = "Files first differ at character $ch1 of line $count";
my ( $leading_ws_i, $leading_ws_o ) = ( EMPTY_STRING, EMPTY_STRING );
if ( $linei =~ /^(\s+)/ ) { $leading_ws_i = $1; }
$last_nonblank_count:$last_nonblank_line
EOM
}
- $line_diff = SPACE x ( 2 + length($counto) ) . $line_diff;
+ $line_diff = SPACE x ( 2 + length($count) ) . $line_diff;
$msg .= <<EOM;
-<$counti:$linei
->$counto:$lineo
-$line_diff
+<$count:$linei
+>$count:$lineo
+$line_diff
EOM
- return $msg;
- } ## end while
+ $diff_count++;
+ last if ( $diff_count >= $max_diff_count );
+ } ## end while ( @aryi && @aryo )
- # no line differences found, but one file may have fewer lines
- if ( $counti > $counto ) {
+ if ($diff_count) { return $msg }
+
+ #------------------------------------------------------
+ # no differences found, see if one file has fewer lines
+ #------------------------------------------------------
+ if ( $nlines_i > $nlines_o ) {
$msg .= <<EOM;
Files initially match file but output file has fewer lines
EOM
}
- elsif ( $counti < $counto ) {
+ elsif ( $nlines_i < $nlines_o ) {
$msg .= <<EOM;
Files initially match file but input file has fewer lines
EOM
# modified (corrected) from version in find2perl
my $x = shift;
- $x =~ s#([./^\$()])#\\$1#g; # escape special characters
- $x =~ s#\*#.*#g; # '*' -> '.*'
- $x =~ s#\?#.#g; # '?' -> '.'
- return "^$x\\z"; # match whole word
+ $x =~ s/([.\/^\$()])/\\$1/g; # escape special characters
+ $x =~ s/\*/.*/g; # '*' -> '.*'
+ $x =~ s/\?/./g; # '?' -> '.'
+ return "^$x\\z"; # match whole word
} ## end sub fileglob_to_re
sub make_logfile_header {
}
my $options_string = join( SPACE, @{$rraw_options} );
- if ($config_file) {
+ if ( defined($config_file) ) {
$msg .= "Found Configuration File >>> $config_file \n";
}
$msg .= "Configuration and command line parameters for this run:\n";
$msg .= "$options_string\n";
- if ( $rOpts->{'DEBUG'} || $rOpts->{'show-options'} ) {
+ if ( $rOpts->{'show-options'} ) {
$rOpts->{'logfile'} = 1; # force logfile to be saved
$msg .= "Final parameter set for this run\n";
$msg .= "------------------------------------\n";
# %option_range - a hash giving the valid ranges of certain options
# Note: a few options are not documented in the man page and usage
- # message. This is because these are experimental or debug options and
- # may or may not be retained in future versions.
+ # message. This is because these are deprecated, experimental or debug
+ # options and may or may not be retained in future versions:
+
+ # These undocumented flags are accepted but not used:
+ # --check-syntax
+ # --fuzzy-line-length
#
- # Here are the undocumented flags as far as I know. Any of them
- # may disappear at any time. They are mainly for fine-tuning
- # and debugging.
+ # These undocumented flags are for debugging:
+ # --recombine # used to debug line breaks
+ # --short-concatenation-item-length # used to break a '.' chain
#
- # fll --> fuzzy-line-length # a trivial parameter which gets
- # turned off for the extrude option
- # which is mainly for debugging
- # scl --> short-concatenation-item-length # helps break at '.'
- # recombine # for debugging line breaks
- # I --> DIAGNOSTICS # for debugging [**DEACTIVATED**]
######################################################################
# here is a summary of the Getopt codes:
my %expansion = ();
my %option_category = ();
my %option_range = ();
- my $rexpansion = \%expansion;
+ my %integer_option_range;
# names of categories in manual
# leading integers will allow sorting
$option_category{$long_name} = $category_name[$category];
if ($short_name) {
if ( $expansion{$short_name} ) {
- my $existing_name = $expansion{$short_name}[0];
+ my $existing_name = $expansion{$short_name}->[0];
Die(
"redefining abbreviation $short_name for $long_name; already used for $existing_name\n"
);
my $nshort_name = 'n' . $short_name;
my $nolong_name = 'no' . $long_name;
if ( $expansion{$nshort_name} ) {
- my $existing_name = $expansion{$nshort_name}[0];
+ my $existing_name = $expansion{$nshort_name}->[0];
Die(
"attempting to redefine abbreviation $nshort_name for $nolong_name; already used for $existing_name\n"
);
}
}
return;
- };
+ }; ## end $add_option = sub
# Install long option names which have a simple abbreviation.
# Options with code '!' get standard negation ('no' for long names,
$add_option->( 'use-unicode-gcstring', 'gcs', '!' );
$add_option->( 'warning-output', 'w', '!' );
$add_option->( 'add-terminal-newline', 'atnl', '!' );
+ $add_option->( 'line-range-tidy', 'lrt', '=s' );
+ $add_option->( 'timeout-in-seconds', 'tos', '=i' );
# options which are both toggle switches and values moved here
# to hide from tidyview (which does not show category 0 flags):
########################################
$add_option->( 'continuation-indentation', 'ci', '=i' );
$add_option->( 'extended-continuation-indentation', 'xci', '!' );
+ $add_option->( 'minimize-continuation-indentation', 'mci', '!' );
$add_option->( 'line-up-parentheses', 'lp', '!' );
$add_option->( 'extended-line-up-parentheses', 'xlp', '!' );
$add_option->( 'line-up-parentheses-exclusion-list', 'lpxl', '=s' );
$add_option->( 'outdent-labels', 'ola', '!' );
$add_option->( 'outdent-long-quotes', 'olq', '!' );
$add_option->( 'indent-closing-brace', 'icb', '!' );
+ $add_option->( 'indent-leading-semicolon', 'ils', '!' );
$add_option->( 'closing-token-indentation', 'cti', '=i' );
$add_option->( 'closing-paren-indentation', 'cpi', '=i' );
$add_option->( 'closing-brace-indentation', 'cbi', '=i' );
$category = 3; # Whitespace control
########################################
$add_option->( 'add-trailing-commas', 'atc', '!' );
+ $add_option->( 'add-lone-trailing-commas', 'altc', '!' );
$add_option->( 'add-semicolons', 'asc', '!' );
$add_option->( 'add-whitespace', 'aws', '!' );
$add_option->( 'block-brace-tightness', 'bbt', '=i' );
$add_option->( 'delete-old-whitespace', 'dws', '!' );
$add_option->( 'delete-repeated-commas', 'drc', '!' );
$add_option->( 'delete-trailing-commas', 'dtc', '!' );
+ $add_option->( 'delete-lone-trailing-commas', 'dltc', '!' );
$add_option->( 'delete-weld-interfering-commas', 'dwic', '!' );
$add_option->( 'delete-semicolons', 'dsm', '!' );
$add_option->( 'function-paren-vertical-alignment', 'fpva', '!' );
+ $add_option->( 'delay-trailing-comma-operations', 'dtco', '!' );
$add_option->( 'keyword-paren-inner-tightness', 'kpit', '=i' );
$add_option->( 'keyword-paren-inner-tightness-list', 'kpitl', '=s' );
$add_option->( 'logical-padding', 'lop', '!' );
+ $add_option->( 'multiple-token-tightness', 'mutt', '=s' );
$add_option->( 'nospace-after-keyword', 'nsak', '=s' );
$add_option->( 'nowant-left-space', 'nwls', '=s' );
$add_option->( 'nowant-right-space', 'nwrs', '=s' );
$add_option->( 'want-right-space', 'wrs', '=s' );
$add_option->( 'want-trailing-commas', 'wtc', '=s' );
$add_option->( 'space-prototype-paren', 'spp', '=i' );
+ $add_option->( 'space-signature-paren', 'ssp', '=i' );
$add_option->( 'valign-code', 'vc', '!' );
$add_option->( 'valign-block-comments', 'vbc', '!' );
$add_option->( 'valign-side-comments', 'vsc', '!' );
$add_option->( 'valign-exclusion-list', 'vxl', '=s' );
$add_option->( 'valign-inclusion-list', 'vil', '=s' );
+ $add_option->( 'valign-if-unless', 'viu', '!' );
+ $add_option->( 'valign-signed-numbers', 'vsn', '!' );
+ $add_option->( 'valign-signed-numbers-limit', 'vsnl', '=i' );
+ $add_option->( 'valign-wide-equals', 'vwe', '!' );
+ $add_option->( 'extended-block-tightness', 'xbt', '!' );
+ $add_option->( 'extended-block-tightness-list', 'xbtl', '=s' );
+ $add_option->( 'qw-as-function', 'qwaf', '!' );
########################################
$category = 4; # Comment controls
########################################
- $add_option->( 'closing-side-comment-else-flag', 'csce', '=i' );
- $add_option->( 'closing-side-comment-interval', 'csci', '=i' );
- $add_option->( 'closing-side-comment-list', 'cscl', '=s' );
- $add_option->( 'closing-side-comment-maximum-text', 'csct', '=i' );
- $add_option->( 'closing-side-comment-prefix', 'cscp', '=s' );
- $add_option->( 'closing-side-comment-warnings', 'cscw', '!' );
- $add_option->( 'closing-side-comments', 'csc', '!' );
- $add_option->( 'closing-side-comments-balanced', 'cscb', '!' );
- $add_option->( 'code-skipping', 'cs', '!' );
- $add_option->( 'code-skipping-begin', 'csb', '=s' );
- $add_option->( 'code-skipping-end', 'cse', '=s' );
- $add_option->( 'format-skipping', 'fs', '!' );
- $add_option->( 'format-skipping-begin', 'fsb', '=s' );
- $add_option->( 'format-skipping-end', 'fse', '=s' );
- $add_option->( 'hanging-side-comments', 'hsc', '!' );
- $add_option->( 'indent-block-comments', 'ibc', '!' );
- $add_option->( 'indent-spaced-block-comments', 'isbc', '!' );
- $add_option->( 'fixed-position-side-comment', 'fpsc', '=i' );
- $add_option->( 'minimum-space-to-comment', 'msc', '=i' );
- $add_option->( 'non-indenting-braces', 'nib', '!' );
- $add_option->( 'non-indenting-brace-prefix', 'nibp', '=s' );
- $add_option->( 'outdent-long-comments', 'olc', '!' );
- $add_option->( 'outdent-static-block-comments', 'osbc', '!' );
- $add_option->( 'static-block-comment-prefix', 'sbcp', '=s' );
- $add_option->( 'static-block-comments', 'sbc', '!' );
- $add_option->( 'static-side-comment-prefix', 'sscp', '=s' );
- $add_option->( 'static-side-comments', 'ssc', '!' );
- $add_option->( 'ignore-side-comment-lengths', 'iscl', '!' );
+ $add_option->( 'closing-side-comment-else-flag', 'csce', '=i' );
+ $add_option->( 'closing-side-comment-interval', 'csci', '=i' );
+ $add_option->( 'closing-side-comment-list', 'cscl', '=s' );
+ $add_option->( 'closing-side-comment-exclusion-list', 'cscxl', '=s' );
+ $add_option->( 'closing-side-comment-maximum-text', 'csct', '=i' );
+ $add_option->( 'closing-side-comment-prefix', 'cscp', '=s' );
+ $add_option->( 'closing-side-comment-warnings', 'cscw', '!' );
+ $add_option->( 'closing-side-comments', 'csc', '!' );
+ $add_option->( 'closing-side-comments-balanced', 'cscb', '!' );
+ $add_option->( 'code-skipping', 'cs', '!' );
+ $add_option->( 'code-skipping-begin', 'csb', '=s' );
+ $add_option->( 'code-skipping-end', 'cse', '=s' );
+ $add_option->( 'format-skipping', 'fs', '!' );
+ $add_option->( 'format-skipping-begin', 'fsb', '=s' );
+ $add_option->( 'format-skipping-end', 'fse', '=s' );
+ $add_option->( 'hanging-side-comments', 'hsc', '!' );
+ $add_option->( 'indent-block-comments', 'ibc', '!' );
+ $add_option->( 'indent-spaced-block-comments', 'isbc', '!' );
+ $add_option->( 'fixed-position-side-comment', 'fpsc', '=i' );
+ $add_option->( 'minimum-space-to-comment', 'msc', '=i' );
+ $add_option->( 'non-indenting-braces', 'nib', '!' );
+ $add_option->( 'non-indenting-brace-prefix', 'nibp', '=s' );
+ $add_option->( 'outdent-long-comments', 'olc', '!' );
+ $add_option->( 'outdent-static-block-comments', 'osbc', '!' );
+ $add_option->( 'static-block-comment-prefix', 'sbcp', '=s' );
+ $add_option->( 'static-block-comments', 'sbc', '!' );
+ $add_option->( 'static-side-comment-prefix', 'sscp', '=s' );
+ $add_option->( 'static-side-comments', 'ssc', '!' );
+ $add_option->( 'ignore-side-comment-lengths', 'iscl', '!' );
+ $add_option->( 'ignore-perlcritic-comments', 'ipc', '!' );
########################################
$category = 5; # Linebreak controls
$add_option->( 'stack-opening-hash-brace', 'sohb', '!' );
$add_option->( 'stack-opening-paren', 'sop', '!' );
$add_option->( 'stack-opening-square-bracket', 'sosb', '!' );
- $add_option->( 'vertical-tightness', 'vt', '=i' );
- $add_option->( 'vertical-tightness-closing', 'vtc', '=i' );
- $add_option->( 'want-break-after', 'wba', '=s' );
- $add_option->( 'want-break-before', 'wbb', '=s' );
- $add_option->( 'break-after-all-operators', 'baao', '!' );
- $add_option->( 'break-before-all-operators', 'bbao', '!' );
- $add_option->( 'keep-interior-semicolons', 'kis', '!' );
- $add_option->( 'one-line-block-semicolons', 'olbs', '=i' );
- $add_option->( 'one-line-block-nesting', 'olbn', '=i' );
- $add_option->( 'one-line-block-exclusion-list', 'olbxl', '=s' );
- $add_option->( 'break-before-hash-brace', 'bbhb', '=i' );
- $add_option->( 'break-before-hash-brace-and-indent', 'bbhbi', '=i' );
- $add_option->( 'break-before-square-bracket', 'bbsb', '=i' );
- $add_option->( 'break-before-square-bracket-and-indent', 'bbsbi', '=i' );
- $add_option->( 'break-before-paren', 'bbp', '=i' );
- $add_option->( 'break-before-paren-and-indent', 'bbpi', '=i' );
- $add_option->( 'brace-left-list', 'bll', '=s' );
- $add_option->( 'brace-left-exclusion-list', 'blxl', '=s' );
- $add_option->( 'break-after-labels', 'bal', '=i' );
+
+ # FIXME: --vt and --vtc are actually expansions now, so these two lines
+ # should eventually be removed.
+ $add_option->( 'vertical-tightness', 'vt', '=i' );
+ $add_option->( 'vertical-tightness-closing', 'vtc', '=i' );
+
+ $add_option->( 'want-break-after', 'wba', '=s' );
+ $add_option->( 'want-break-before', 'wbb', '=s' );
+ $add_option->( 'break-after-all-operators', 'baao', '!' );
+ $add_option->( 'break-before-all-operators', 'bbao', '!' );
+ $add_option->( 'keep-interior-semicolons', 'kis', '!' );
+ $add_option->( 'one-line-block-semicolons', 'olbs', '=i' );
+ $add_option->( 'one-line-block-nesting', 'olbn', '=i' );
+ $add_option->( 'one-line-block-exclusion-list', 'olbxl', '=s' );
+ $add_option->( 'break-before-hash-brace', 'bbhb', '=i' );
+ $add_option->( 'break-before-hash-brace-and-indent', 'bbhbi', '=i' );
+ $add_option->( 'break-before-square-bracket', 'bbsb', '=i' );
+ $add_option->( 'break-before-square-bracket-and-indent', 'bbsbi', '=i' );
+ $add_option->( 'break-before-paren', 'bbp', '=i' );
+ $add_option->( 'break-before-paren-and-indent', 'bbpi', '=i' );
+ $add_option->( 'brace-left-list', 'bll', '=s' );
+ $add_option->( 'brace-left-exclusion-list', 'blxl', '=s' );
+ $add_option->( 'break-after-labels', 'bal', '=i' );
# This was an experiment mentioned in git #78, originally named -bopl. I
# expanded it to also open logical blocks, based on git discussion #100,
########################################
$category = 6; # Controlling list formatting
########################################
- $add_option->( 'break-at-old-comma-breakpoints', 'boc', '!' );
- $add_option->( 'comma-arrow-breakpoints', 'cab', '=i' );
- $add_option->( 'maximum-fields-per-table', 'mft', '=i' );
+ $add_option->( 'break-at-old-comma-breakpoints', 'boc', '!' );
+ $add_option->( 'break-at-trailing-comma-types', 'btct', '=s' );
+ $add_option->( 'comma-arrow-breakpoints', 'cab', '=i' );
+ $add_option->( 'maximum-fields-per-table', 'mft', '=i' );
########################################
$category = 7; # Retaining or ignoring existing line breaks
########################################
$category = 9; # Other controls
########################################
+ $add_option->( 'warn-missing-else', 'wme', '!' );
+ $add_option->( 'add-missing-else', 'ame', '!' );
+ $add_option->( 'add-missing-else-comment', 'amec', '=s' );
$add_option->( 'delete-block-comments', 'dbc', '!' );
$add_option->( 'delete-closing-side-comments', 'dcsc', '!' );
$add_option->( 'delete-pod', 'dp', '!' );
$add_option->( 'look-for-hash-bang', 'x', '!' );
$add_option->( 'look-for-selfloader', 'lsl', '!' );
$add_option->( 'pass-version-line', 'pvl', '!' );
+ $add_option->( 'warn-variable-types', 'wvt', '=s' );
+ $add_option->( 'warn-variable-exclusion-list', 'wvxl', '=s' );
+ $add_option->( 'want-call-parens', 'wcp', '=s' );
+ $add_option->( 'nowant-call-parens', 'nwcp', '=s' );
+
+ $add_option->( 'warn-mismatched-args', 'wma', '!' );
+ $add_option->( 'warn-mismatched-arg-types', 'wmat', '=s' );
+ $add_option->( 'warn-mismatched-arg-undercount-cutoff', 'wmauc', '=i' );
+ $add_option->( 'warn-mismatched-arg-overcount-cutoff', 'wmaoc', '=i' );
+ $add_option->( 'warn-mismatched-arg-exclusion-list', 'wmaxl', '=s' );
+ $add_option->( 'warn-mismatched-returns', 'wmr', '!' );
+ $add_option->( 'warn-mismatched-return-types', 'wmrt', '=s' );
+ $add_option->( 'warn-mismatched-return-exclusion-list', 'wmrxl', '=s' );
+
+ $add_option->( 'add-interbracket-arrows', 'aia', '!' );
+ $add_option->( 'delete-interbracket-arrows', 'dia', '!' );
+ $add_option->( 'warn-interbracket-arrows', 'wia', '!' );
+ $add_option->( 'interbracket-arrow-style', 'ias', '=s' );
+ $add_option->( 'interbracket-arrow-complexity', 'iac', '=i' );
########################################
$category = 13; # Debugging
########################################
- $add_option->( 'DIAGNOSTICS', 'I', '!' ) if (DEVEL_MODE);
- $add_option->( 'DEBUG', 'D', '!' );
- $add_option->( 'dump-block-summary', 'dbs', '!' );
- $add_option->( 'dump-block-minimum-lines', 'dbl', '=i' );
- $add_option->( 'dump-block-types', 'dbt', '=s' );
- $add_option->( 'dump-cuddled-block-list', 'dcbl', '!' );
- $add_option->( 'dump-defaults', 'ddf', '!' );
- $add_option->( 'dump-long-names', 'dln', '!' );
- $add_option->( 'dump-options', 'dop', '!' );
- $add_option->( 'dump-profile', 'dpro', '!' );
- $add_option->( 'dump-short-names', 'dsn', '!' );
- $add_option->( 'dump-token-types', 'dtt', '!' );
- $add_option->( 'dump-want-left-space', 'dwls', '!' );
- $add_option->( 'dump-want-right-space', 'dwrs', '!' );
- $add_option->( 'fuzzy-line-length', 'fll', '!' );
- $add_option->( 'help', 'h', EMPTY_STRING );
+ $add_option->( 'DEBUG', 'D', '!' );
+ $add_option->( 'dump-block-summary', 'dbs', '!' );
+ $add_option->( 'dump-block-minimum-lines', 'dbl', '=i' );
+ $add_option->( 'dump-block-types', 'dbt', '=s' );
+ $add_option->( 'dump-cuddled-block-list', 'dcbl', '!' );
+ $add_option->( 'dump-defaults', 'ddf', '!' );
+ $add_option->( 'dump-integer-option-range', 'dior', '!' );
+ $add_option->( 'dump-long-names', 'dln', '!' );
+ $add_option->( 'dump-mismatched-args', 'dma', '!' );
+ $add_option->( 'dump-mismatched-returns', 'dmr', '!' );
+ $add_option->( 'dump-mixed-call-parens', 'dmcp', '!' );
+ $add_option->( 'dump-options', 'dop', '!' );
+ $add_option->( 'dump-profile', 'dpro', '!' );
+ $add_option->( 'dump-short-names', 'dsn', '!' );
+ $add_option->( 'dump-token-types', 'dtt', '!' );
+ $add_option->( 'dump-unusual-variables', 'duv', '!' );
+ $add_option->( 'dump-unique-keys', 'duk', '!' );
+ $add_option->( 'dump-want-left-space', 'dwls', '!' );
+ $add_option->( 'dump-want-right-space', 'dwrs', '!' );
+ $add_option->( 'fuzzy-line-length', 'fll', '!' );
+ $add_option->( 'help', 'h', EMPTY_STRING );
$add_option->( 'short-concatenation-item-length', 'scl', '=i' );
$add_option->( 'show-options', 'opt', '!' );
$add_option->( 'timestamp', 'ts', '!' );
$add_option->( 'maximum-file-size-mb', 'maxfs', '=i' );
$add_option->( 'maximum-level-errors', 'maxle', '=i' );
$add_option->( 'maximum-unexpected-errors', 'maxue', '=i' );
+ $add_option->( 'integer-range-check', 'irc', '=i' );
#---------------------------------------------------------------------
foreach my $opt (@option_string) {
my $long_name = $opt;
$long_name =~ s/(!|=.*|:.*)$//;
- unless ( defined( $option_category{$long_name} ) ) {
+ if ( !defined( $option_category{$long_name} ) ) {
if ( $long_name =~ /^html-linked/ ) {
$category = 10; # HTML options
}
elsif ( $long_name =~ /^pod2html/ ) {
$category = 11; # Pod2html
}
+ else {
+ $category = 12; # HTML properties
+ }
$option_category{$long_name} = $category_name[$category];
}
}
- #---------------------------------------
- # Assign valid ranges to certain options
- #---------------------------------------
- # In the future, these may be used to make preliminary checks
- # hash keys are long names
- # If key or value is undefined:
- # strings may have any value
- # integer ranges are >=0
- # If value is defined:
- # value is [qw(any valid words)] for strings
- # value is [min, max] for integers
- # if min is undefined, there is no lower limit
- # if max is undefined, there is no upper limit
- # Parameters not listed here have defaults
- %option_range = (
- 'format' => [ 'tidy', 'html', 'user' ],
- 'output-line-ending' => [ 'dos', 'win', 'mac', 'unix' ],
- 'space-backslash-quote' => [ 0, 2 ],
- 'block-brace-tightness' => [ 0, 2 ],
- 'keyword-paren-inner-tightness' => [ 0, 2 ],
- 'brace-tightness' => [ 0, 2 ],
- 'paren-tightness' => [ 0, 2 ],
- 'square-bracket-tightness' => [ 0, 2 ],
-
- 'block-brace-vertical-tightness' => [ 0, 2 ],
- 'brace-follower-vertical-tightness' => [ 0, 2 ],
- 'brace-vertical-tightness' => [ 0, 2 ],
- 'brace-vertical-tightness-closing' => [ 0, 2 ],
- 'paren-vertical-tightness' => [ 0, 2 ],
- 'paren-vertical-tightness-closing' => [ 0, 2 ],
- 'square-bracket-vertical-tightness' => [ 0, 2 ],
- 'square-bracket-vertical-tightness-closing' => [ 0, 2 ],
- 'vertical-tightness' => [ 0, 2 ],
- 'vertical-tightness-closing' => [ 0, 2 ],
-
- 'closing-brace-indentation' => [ 0, 3 ],
- 'closing-paren-indentation' => [ 0, 3 ],
- 'closing-square-bracket-indentation' => [ 0, 3 ],
- 'closing-token-indentation' => [ 0, 3 ],
-
- 'closing-side-comment-else-flag' => [ 0, 2 ],
- 'comma-arrow-breakpoints' => [ 0, 5 ],
-
- 'keyword-group-blanks-before' => [ 0, 2 ],
- 'keyword-group-blanks-after' => [ 0, 2 ],
-
- 'space-prototype-paren' => [ 0, 2 ],
- 'break-after-labels' => [ 0, 2 ],
- );
-
- # Note: we could actually allow negative ci if someone really wants it:
- # $option_range{'continuation-indentation'} = [ undef, undef ];
-
#------------------------------------------------------------------
# DEFAULTS: Assign default values to the above options here, except
# for 'outfile' and 'help'.
# These settings should approximate the perlstyle(1) suggestions.
#------------------------------------------------------------------
my @defaults = qw(
+ add-lone-trailing-commas
add-newlines
add-terminal-newline
add-semicolons
noextended-continuation-indentation
cuddled-break-option=1
delete-old-newlines
+ delete-repeated-commas
+ delete-lone-trailing-commas
delete-semicolons
dump-block-minimum-lines=20
dump-block-types=sub
extended-syntax
encode-output-strings
+ file-size-order
function-paren-vertical-alignment
fuzzy-line-length
hanging-side-comments
indent-block-comments
indent-columns=4
+ indent-leading-semicolon
+ integer-range-check=2
+ interbracket-arrow-complexity=1
iterations=1
keep-old-blank-lines=1
keyword-paren-inner-tightness=1
maximum-unexpected-errors=0
memoize
minimum-space-to-comment=4
+ warn-mismatched-arg-undercount-cutoff=4
+ warn-mismatched-arg-overcount-cutoff=1
nobrace-left-and-indent
nocuddled-else
nodelete-old-whitespace
noweld-nested-containers
recombine
nouse-unicode-gcstring
- use-feature=class
valign-code
valign-block-comments
valign-side-comments
+ valign-signed-numbers
+ valign-signed-numbers-limit=20
short-concatenation-item-length=8
space-for-semicolon
space-backslash-quote=1
space-prototype-paren=1
+ space-signature-paren=1
square-bracket-tightness=1
square-bracket-vertical-tightness-closing=0
square-bracket-vertical-tightness=0
code-skipping
format-skipping
default-tabsize=8
+ timeout-in-seconds=10
+
+ whitespace-cycle=0
+ entab-leading-whitespace=0
+ blank-lines-before-closing-block=0
+ blank-lines-after-opening-block=0
+
+ pod2html
+ html-table-of-contents
+ html-entities
+ );
+
+ #---------------------------------------
+ # Assign valid ranges to certain options
+ #---------------------------------------
+ # In the future, these may be used to make preliminary checks
+ # hash keys are long names
+ # If key or value is undefined:
+ # strings may have any value
+ # integer ranges are >=0
+ # If value is defined:
+ # value is [qw(any valid words)] for strings
+ # value is [min, max] for integers
+ # if min is undefined, there is no lower limit
+ # if max is undefined, there is no upper limit
+ # Parameters not listed here have defaults
+ %option_range = (
+ 'format' => [ 'tidy', 'html', 'user' ],
+ 'output-line-ending' => [ 'dos', 'win', 'mac', 'unix' ],
+ 'space-backslash-quote' => [ 0, 2 ],
+ 'block-brace-tightness' => [ 0, 2 ],
+ 'keyword-paren-inner-tightness' => [ 0, 2 ],
+ 'brace-tightness' => [ 0, 2 ],
+ 'paren-tightness' => [ 0, 2 ],
+ 'square-bracket-tightness' => [ 0, 2 ],
+
+ 'block-brace-vertical-tightness' => [ 0, 2 ],
+ 'brace-follower-vertical-tightness' => [ 0, 2 ],
+ 'brace-vertical-tightness' => [ 0, 2 ],
+ 'brace-vertical-tightness-closing' => [ 0, 3 ],
+ 'paren-vertical-tightness' => [ 0, 2 ],
+ 'paren-vertical-tightness-closing' => [ 0, 3 ],
+ 'square-bracket-vertical-tightness' => [ 0, 2 ],
+ 'square-bracket-vertical-tightness-closing' => [ 0, 3 ],
+ 'vertical-tightness' => [ 0, 2 ],
+ 'vertical-tightness-closing' => [ 0, 3 ],
+
+ 'closing-brace-indentation' => [ 0, 3 ],
+ 'closing-paren-indentation' => [ 0, 3 ],
+ 'closing-square-bracket-indentation' => [ 0, 3 ],
+ 'closing-token-indentation' => [ 0, 3 ],
+
+ 'closing-side-comment-else-flag' => [ 0, 2 ],
+ 'comma-arrow-breakpoints' => [ 0, 5 ],
- pod2html
- html-table-of-contents
- html-entities
+ 'keyword-group-blanks-before' => [ 0, 2 ],
+ 'keyword-group-blanks-after' => [ 0, 2 ],
+
+ 'space-prototype-paren' => [ 0, 2 ],
+ 'space-signature-paren' => [ 0, 2 ],
+ 'break-after-labels' => [ 0, 2 ],
+ );
+
+ # Valid [min,max] ranges of all integer options (type '=i'). This hash is
+ # replacing %option_range, above, for use by sub 'check_options'
+ %integer_option_range = (
+ 'blank-lines-after-opening-block' => [ 0, undef ],
+ 'blank-lines-before-closing-block' => [ 0, undef ],
+ 'blank-lines-before-packages' => [ 0, undef ],
+ 'blank-lines-before-subs' => [ 0, undef ],
+ 'block-brace-tightness' => [ 0, 2 ],
+ 'block-brace-vertical-tightness' => [ 0, 2 ],
+ 'brace-follower-vertical-tightness' => [ 0, 2 ],
+ 'brace-tightness' => [ 0, 2 ],
+ 'brace-vertical-tightness' => [ 0, 2 ],
+ 'brace-vertical-tightness-closing' => [ 0, 3 ],
+ 'break-after-labels' => [ 0, 2 ],
+ 'break-before-hash-brace' => [ 0, 3 ],
+ 'break-before-hash-brace-and-indent' => [ 0, 2 ],
+ 'break-before-paren' => [ 0, 3 ],
+ 'break-before-paren-and-indent' => [ 0, 2 ],
+ 'break-before-square-bracket' => [ 0, 3 ],
+ 'break-before-square-bracket-and-indent' => [ 0, 2 ],
+ 'closing-brace-indentation' => [ 0, 3 ],
+ 'closing-paren-indentation' => [ 0, 3 ],
+ 'closing-side-comment-else-flag' => [ 0, 2 ],
+ 'closing-side-comment-interval' => [ 0, undef ],
+ 'closing-side-comment-maximum-text' => [ 0, undef ],
+ 'closing-square-bracket-indentation' => [ 0, 3 ],
+ 'closing-token-indentation' => [ 0, 3 ],
+ 'comma-arrow-breakpoints' => [ 0, 5 ],
+ 'continuation-indentation' => [ 0, undef ],
+ 'cuddled-break-option' => [ 0, 2 ],
+ 'default-tabsize' => [ 0, undef ],
+ 'dump-block-minimum-lines' => [ 0, undef ],
+ 'entab-leading-whitespace' => [ 0, undef ],
+ 'fixed-position-side-comment' => [ 0, undef ],
+ 'indent-columns' => [ 0, undef ],
+ 'interbracket-arrow-complexity' => [ 0, 2 ],
+ 'integer-range-check' => [ 0, 3 ],
+ 'iterations' => [ 0, undef ],
+ 'keep-old-blank-lines' => [ 0, 2 ],
+ 'keyword-group-blanks-after' => [ 0, 2 ],
+ 'keyword-group-blanks-before' => [ 0, 2 ],
+ 'keyword-group-blanks-repeat-count' => [ 0, undef ],
+ 'keyword-paren-inner-tightness' => [ 0, 2 ],
+ 'long-block-line-count' => [ 0, undef ],
+ 'maximum-consecutive-blank-lines' => [ 0, undef ],
+ 'maximum-fields-per-table' => [ 0, undef ],
+ 'maximum-file-size-mb' => [ 0, undef ],
+ 'maximum-level-errors' => [ 0, undef ],
+ 'maximum-line-length' => [ 0, undef ],
+ 'maximum-unexpected-errors' => [ 0, undef ],
+ 'minimum-space-to-comment' => [ 0, undef ],
+ 'warn-mismatched-arg-undercount-cutoff' => [ 0, undef ],
+ 'warn-mismatched-arg-overcount-cutoff' => [ 0, undef ],
+ 'one-line-block-nesting' => [ 0, 1 ],
+ 'one-line-block-semicolons' => [ 0, 2 ],
+ 'paren-tightness' => [ 0, 2 ],
+ 'paren-vertical-tightness' => [ 0, 2 ],
+ 'paren-vertical-tightness-closing' => [ 0, 3 ],
+ 'short-concatenation-item-length' => [ 0, undef ],
+ 'space-backslash-quote' => [ 0, 2 ],
+ 'space-prototype-paren' => [ 0, 2 ],
+ 'space-signature-paren' => [ 0, 2 ],
+ 'square-bracket-tightness' => [ 0, 2 ],
+ 'square-bracket-vertical-tightness' => [ 0, 2 ],
+ 'square-bracket-vertical-tightness-closing' => [ 0, 3 ],
+ 'starting-indentation-level' => [ 0, undef ],
+ 'timeout-in-seconds' => [ 0, undef ],
+ 'vertical-tightness' => [ 0, 2 ],
+ 'vertical-tightness-closing' => [ 0, 3 ],
+ 'valign-signed-numbers-limit' => [ 0, undef ],
+ 'whitespace-cycle' => [ 0, undef ],
);
+ # Enter default values into the integer option range table
+ foreach my $opt (@defaults) {
+ if ( $opt =~ /^(.*)=(\d+)$/ ) {
+ my $key = $1;
+ my $def = $2;
+ if ( defined( $integer_option_range{$key} ) ) {
+ $integer_option_range{$key}->[2] = $def;
+ }
+ }
+ }
+
+ # Enter special values which have undef as the default.
+ # Note that cti, vt, and vtc are aliases which are included to work
+ # around an old problem with msdos (see note in check_options).
+ foreach my $key (
+ qw(
+ closing-token-indentation
+ vertical-tightness
+ vertical-tightness-closing
+ fixed-position-side-comment
+ starting-indentation-level
+ )
+ )
+ {
+ if ( defined( $integer_option_range{$key} )
+ && @{ $integer_option_range{$key} } < 3 )
+ {
+ $integer_option_range{$key}->[2] = undef;
+ }
+ }
+
+ # Verify that only integers of type =i are in the above list during
+ # development. This will guard against spelling errors.
+ if (DEVEL_MODE) {
+ my %option_flag;
+ my $msg = EMPTY_STRING;
+ foreach my $opt (@option_string) {
+ my $key = $opt;
+ my $flag = EMPTY_STRING;
+ if ( $key =~ /(.*)(!|=.*|:.*)$/ ) {
+ $key = $1;
+ $flag = $2;
+ }
+ $option_flag{$key} = $flag;
+ }
+
+ # Be sure all keys of %integer_option_range have option type '=i'
+ foreach my $opt ( keys %integer_option_range ) {
+ my $flag = $option_flag{$opt};
+ if ( !defined($flag) ) { $flag = EMPTY_STRING }
+ if ( $flag ne '=i' ) {
+
+ # If this fault occurs, one of the items in the previous hash
+ # is not type =i, possibly due to incorrect spelling.
+ $msg .=
+"Option '$opt' has an entry in '%integer_option_range' but is not an integer\n";
+ }
+ }
+
+ # Be sure all '=i' options are in %integer_option_range. This is not
+ # strictly necessary but helps insure that nothing was missed.
+ foreach my $opt ( keys %option_flag ) {
+ my $flag = $option_flag{$opt};
+ next if ( $flag ne '=i' );
+ if ( !defined( $integer_option_range{$opt} ) ) {
+ $msg .=
+"Integer option '$opt' is needs an entry in '%integer_option_range'\n";
+ }
+ }
+
+ # look for integer options without default values
+ foreach my $opt ( keys %integer_option_range ) {
+ if ( @{ $integer_option_range{$opt} } < 3 ) {
+ $msg .= "Integer option '$opt' does not have a default value\n";
+ }
+ }
+
+ if ($msg) {
+ Fault($msg);
+ }
+ }
+
#-----------------------------------------------------------------------
# Define abbreviations which will be expanded into the above primitives.
# These may be defined recursively.
# Uncomment next line to dump all expansions for debugging:
# dump_short_names(\%expansion);
- return (
- \@option_string, \@defaults, \%expansion,
- \%option_category, \%option_range
- );
+ return ( \@option_string, \@defaults, \%expansion, \%option_category,
+ \%option_range, \%integer_option_range );
} ## end sub generate_options
+{ #<<< closure process_command_line
+
# Memoize process_command_line. Given same @ARGV passed in, return same
# values and same @ARGV back.
# This patch was supplied by Jonathan Swartz Nov 2012 and significantly speeds
my @q = @_;
my (
- $perltidyrc_stream, $is_Windows, $Windows_type,
- $rpending_complaint, $dump_options_type
+ $perltidyrc_stream, $is_Windows_uu, $Windows_type_uu,
+ $rpending_complaint_uu, $dump_options_type
) = @q;
+ # This is the outer sub which handles memoization
+
my $use_cache = !defined($perltidyrc_stream) && !$dump_options_type;
if ($use_cache) {
my $cache_key = join( chr(28), @ARGV );
return _process_command_line(@q);
}
} ## end sub process_command_line
+} ## end closure process_command_line
# (note the underscore here)
sub _process_command_line {
$rpending_complaint, $dump_options_type
) = @_;
+ # This is the inner sub which actually processes the command line
+
use Getopt::Long;
# Save any current Getopt::Long configuration
}
else { $glc = undef }
- my (
- $roption_string, $rdefaults, $rexpansion,
- $roption_category, $roption_range
- ) = generate_options();
+ my ( $roption_string, $rdefaults, $rexpansion,
+ $roption_category, $roption_range, $rinteger_option_range )
+ = generate_options();
#--------------------------------------------------------------
# set the defaults by passing the above list through GetOptions
local @ARGV = ();
# do not load the defaults if we are just dumping perltidyrc
- unless ( $dump_options_type eq 'perltidyrc' ) {
+ if ( $dump_options_type ne 'perltidyrc' ) {
for my $i ( @{$rdefaults} ) { push @ARGV, "--" . $i }
}
if ( !GetOptions( \%Opts, @{$roption_string} ) ) {
}
my @raw_options = ();
- my $config_file = EMPTY_STRING;
my $saw_ignore_profile = 0;
my $saw_dump_profile = 0;
+ my $config_file;
#--------------------------------------------------------------
# Take a first look at the command-line parameters. Do as many
$saw_dump_profile = 1;
}
elsif ( $i =~ /^-(pro|profile)=(.+)/ ) {
- if ($config_file) {
+ if ( defined($config_file) ) {
Warn(
"Only one -pro=filename allowed, using '$2' instead of '$config_file'\n"
);
{
$start_dir = '.' if !$start_dir;
$start_dir = Cwd::realpath($start_dir);
- if ( my $found_file =
- find_file_upwards( $start_dir, $search_file ) )
- {
+ my $found_file =
+ find_file_upwards( $start_dir, $search_file );
+ if ( defined($found_file) ) {
$config_file = $found_file;
}
}
}
- unless ( -e $config_file ) {
- Warn("cannot find file given with -pro=$config_file: $ERRNO\n");
- $config_file = EMPTY_STRING;
+ if ( !-e $config_file ) {
+ Die(
+ "cannot find file given with -pro=$config_file: $OS_ERROR\n"
+ );
}
}
elsif ( $i =~ /^-(pro|profile)=?$/ ) {
Die("usage: -pro=filename or --profile=filename, no spaces\n");
}
- elsif ( $i =~ /^-(help|h|HELP|H|\?)$/ ) {
+ elsif ( $i =~ /^-(?: help | [ h \? ] )$/xi ) {
usage();
Exit(0);
}
dump_defaults( @{$rdefaults} );
Exit(0);
}
+ elsif ( $i =~ /^-(dump-integer-option-range|dior)$/ ) {
+ dump_integer_option_range($rinteger_option_range);
+ Exit(0);
+ }
elsif ( $i =~ /^-(dump-long-names|dln)$/ ) {
dump_long_names( @{$roption_string} );
Exit(0);
Perl::Tidy::Tokenizer->dump_token_types(*STDOUT);
Exit(0);
}
+ else {
+ ## no more special cases
+ }
}
+ # The above commands processed before disambiguation and then Exited. So
+ # we need to check below to see if the user entered something like
+ # '-dump-t' or '-he'. This will slip past here and not get processed.
+ my %early_exit_commands = (
+ 'help' => 'h',
+ 'version' => 'v',
+ 'dump-defaults' => 'ddf',
+ 'dump-integer-option-range' => 'dior',
+ 'dump-long-names' => 'dln',
+ 'dump-short-names' => 'dsn',
+ 'dump-token-types' => 'dtt',
+ );
+
if ( $saw_dump_profile && $saw_ignore_profile ) {
Warn("No profile to dump because of -npro\n");
Exit(1);
#----------------------------------------
# read any .perltidyrc configuration file
#----------------------------------------
- unless ($saw_ignore_profile) {
+ if ( !$saw_ignore_profile ) {
# resolve possible conflict between $perltidyrc_stream passed
# as call parameter to perltidy and -pro=filename on command
# line.
if ($perltidyrc_stream) {
- if ($config_file) {
+ if ( defined($config_file) ) {
Warn(<<EOM);
Conflict: a perltidyrc configuration file was specified both as this
- perltidy call parameter: $perltidyrc_stream
+ perltidy call parameter: $perltidyrc_stream
and with this -profile=$config_file.
Using -profile=$config_file.
EOM
# look for a config file if we don't have one yet
my $rconfig_file_chatter;
${$rconfig_file_chatter} = EMPTY_STRING;
- $config_file =
- find_config_file( $is_Windows, $Windows_type, $rconfig_file_chatter,
- $rpending_complaint )
- unless $config_file;
+ if ( !defined($config_file) ) {
+ $config_file =
+ find_config_file( $is_Windows, $Windows_type,
+ $rconfig_file_chatter, $rpending_complaint );
+ }
# open any config file
- my $fh_config;
- if ($config_file) {
- ( $fh_config, $config_file ) =
- Perl::Tidy::streamhandle( $config_file, 'r' );
- unless ($fh_config) {
- ${$rconfig_file_chatter} .=
- "# $config_file exists but cannot be opened\n";
+ my $rconfig_string;
+ if ( defined($config_file) ) {
+ $rconfig_string = stream_slurp($config_file);
+ if ( !defined($rconfig_string) ) {
+ Die(
+"exiting because profile '$config_file' could not be opened\n"
+ );
}
+ filter_unknown_options(
+ $rconfig_string, $roption_category,
+ $rexpansion, $rconfig_file_chatter
+ );
}
-
if ($saw_dump_profile) {
- dump_config_file( $fh_config, $config_file, $rconfig_file_chatter );
+ dump_config_file( $rconfig_string, $config_file,
+ $rconfig_file_chatter );
Exit(0);
}
- if ($fh_config) {
+ if ( defined($rconfig_string) ) {
my ( $rconfig_list, $death_message ) =
- read_config_file( $fh_config, $config_file, $rexpansion );
+ read_config_file( $rconfig_string, $config_file, $rexpansion );
Die($death_message) if ($death_message);
# process any .perltidyrc parameters right now so we can
if ( !GetOptions( \%Opts, @{$roption_string} ) ) {
Die(
-"Error in this config file: $config_file \nUse -npro to ignore this file, -h for help'\n"
+"Error in this config file: $config_file \nUse -npro to ignore this file, -dpro to dump it, -h for help'\n"
);
}
# a look at @ARGV.
if (@ARGV) {
my $count = @ARGV;
- my $str = "\'" . pop(@ARGV) . "\'";
- while ( my $param = pop(@ARGV) ) {
+ my $str = EMPTY_STRING;
+ foreach my $param (@ARGV) {
if ( length($str) < 70 ) {
- $str .= ", '$param'";
+ if ($str) { $str .= ', ' }
+ $str .= "'$param'";
}
else {
$str .= ", ...";
# Undo any options which cause premature exit. They are not
# appropriate for a config file, and it could be hard to
# diagnose the cause of the premature exit.
- foreach (
+
+ # These are options include dump switches of the form
+ # '--dump-xxx-xxx!'.
+ my @dump_commands =
+ grep { /^(dump-.*)!$/ } @{$roption_string};
+ foreach (@dump_commands) { s/!$// }
+
+ # Here is a current list of these @dump_commands:
+ # dump-block-summary
+ # dump-cuddled-block-list
+ # dump-defaults
+ # dump-integer-option-range
+ # dump-long-names
+ # dump-mismatched-args
+ # dump-mismatched-returns
+ # dump-mixed-call-parens
+ # dump-options
+ # dump-profile
+ # dump-short-names
+ # dump-token-types
+ # dump-unusual-variables
+ # dump-want-left-space
+ # dump-want-right-space
+
+ # The following two dump configuration parameters which
+ # take =i or =s would still be allowed:
+ # dump-block-minimum-lines', 'dbl', '=i' );
+ # dump-block-types', 'dbt', '=s' );
+
+ foreach my $cmd (
+ @dump_commands,
qw{
- dump-cuddled-block-list
- dump-defaults
- dump-long-names
- dump-options
- dump-profile
- dump-short-names
- dump-token-types
- dump-want-left-space
- dump-want-right-space
- dump-block-summary
help
stylesheet
version
}
)
{
-
- if ( defined( $Opts{$_} ) ) {
- delete $Opts{$_};
- Warn("ignoring --$_ in config file: $config_file\n");
+ if ( defined( $Opts{$cmd} ) ) {
+ delete $Opts{$cmd};
+ Warn("ignoring --$cmd in config file: $config_file\n");
}
}
}
Die("Error on command line; for help try 'perltidy -h'\n");
}
+ # Catch ambiguous entries which should have exited above (c333)
+ foreach my $long_name ( keys %early_exit_commands ) {
+ if ( $Opts{$long_name} ) {
+ my $short_name = $early_exit_commands{$long_name};
+ Die(<<EOM);
+Ambiguous entry; please enter '--$long_name' or '-$short_name'
+EOM
+ }
+ }
+
# reset Getopt::Long configuration back to its previous value
if ( defined($glc) ) {
my $ok = eval { Getopt::Long::Configure($glc); 1 };
}
return ( \%Opts, $config_file, \@raw_options, $roption_string,
- $rexpansion, $roption_category, $roption_range );
+ $rexpansion, $roption_category, $roption_range,
+ $rinteger_option_range );
} ## end sub _process_command_line
sub make_grep_alias_string {
+
my ($rOpts) = @_;
+ # pre-process the --grep-alias-list parameter
+
# Defaults: list operators in List::Util
# Possible future additions: pairfirst pairgrep pairmap
- my $default_string = join SPACE, qw(
- all
- any
- first
- none
- notall
- reduce
- reductions
- );
+ my $default_string = join SPACE,
+ qw( all any first none notall reduce reductions );
# make a hash of any excluded words
my %is_excluded_word;
} ## end sub make_grep_alias_string
sub cleanup_word_list {
+
my ( $rOpts, $option_name, $rforced_words ) = @_;
# Clean up the list of words in a user option to simplify use by
sub check_options {
- my ( $self, $is_Windows, $Windows_type, $rpending_complaint ) = @_;
+ my ( $self, $num_files, $rinteger_option_range ) = @_;
+
+ # Check options at a high level. Note that other modules have their
+ # own sub 'check_options' for lower level checking.
+
+ # Input parameters:
+ # $num_files = the number of files to be processed in this call to
+ # perltidy, needed for error checks.
+ # $rinteger_option-range = hash with valid ranges of parameters which
+ # take an integer
my $rOpts = $self->[_rOpts_];
# Since perltidy only encodes in utf8, problems can occur if we let it
# decode anything else. See discussions for issue git #83.
my $encoding = $rOpts->{'character-encoding'};
- if ( $encoding !~ /^\s*(guess|none|utf8|utf-8)\s*$/i ) {
+ if ( $encoding !~ /^\s*(?:guess|none|utf8|utf-8)\s*$/i ) {
Die(<<EOM);
--character-encoding = '$encoding' is not allowed; the options are: 'none', 'guess', 'utf8'
EOM
}
- # Since -vt, -vtc, and -cti are abbreviations, but under
+ my $integer_range_check = $rOpts->{'integer-range-check'};
+ if ( !defined($integer_range_check)
+ || $integer_range_check < 0
+ || $integer_range_check > 3 )
+ {
+ $integer_range_check = 2;
+ }
+
+ # Check for integer values out of bounds as follows:
+ # $integer_range_check=
+ # 0 => skip check completely (for stress-testing perltidy only)
+ # 1 => quietly reset bad values to defaults
+ # 2 => issue warning and reset bad values defaults [DEFAULT]
+ # 3 => stop if any values are out of bounds
+ if ($integer_range_check) {
+ my $Error_message;
+ foreach my $opt ( keys %{$rinteger_option_range} ) {
+ my $range = $rinteger_option_range->{$opt};
+ next unless defined($range);
+ my ( $min, $max, $default ) = @{$range};
+
+ my $val = $rOpts->{$opt};
+ if ( defined($min) && defined($val) && $val < $min ) {
+ $Error_message .= "--$opt=$val but should be >= $min";
+ if ( $integer_range_check < 3 ) {
+ $rOpts->{$opt} = $default;
+ my $def = defined($default) ? $default : 'undef';
+ $Error_message .= "; using default $def";
+ }
+ $Error_message .= "\n";
+ }
+ if ( defined($max) && defined($val) && $val > $max ) {
+ $Error_message .= "--$opt=$val but should be <= $max";
+ if ( $integer_range_check < 3 ) {
+ $rOpts->{$opt} = $default;
+ my $def = defined($default) ? $default : 'undef';
+ $Error_message .= "; using default $def";
+ }
+ $Error_message .= "\n";
+ }
+ }
+ if ($Error_message) {
+ if ( $integer_range_check == 1 ) {
+ ## no warning
+ }
+ elsif ( $integer_range_check == 2 ) {
+ Warn($Error_message);
+ }
+ else {
+ Die($Error_message);
+ }
+ }
+ }
+
+ # Note that -vt, -vtc, and -cti are abbreviations. But under
# msdos, an unquoted input parameter like vtc=1 will be
# seen as 2 parameters, vtc and 1, so the abbreviations
# won't be seen. Therefore, we will catch them here if
# they get through.
-
- if ( defined $rOpts->{'vertical-tightness'} ) {
+ if ( defined( $rOpts->{'vertical-tightness'} ) ) {
my $vt = $rOpts->{'vertical-tightness'};
$rOpts->{'paren-vertical-tightness'} = $vt;
$rOpts->{'square-bracket-vertical-tightness'} = $vt;
$rOpts->{'brace-vertical-tightness'} = $vt;
}
- if ( defined $rOpts->{'vertical-tightness-closing'} ) {
+ if ( defined( $rOpts->{'vertical-tightness-closing'} ) ) {
my $vtc = $rOpts->{'vertical-tightness-closing'};
$rOpts->{'paren-vertical-tightness-closing'} = $vtc;
$rOpts->{'square-bracket-vertical-tightness-closing'} = $vtc;
$rOpts->{'brace-vertical-tightness-closing'} = $vtc;
}
- if ( defined $rOpts->{'closing-token-indentation'} ) {
+ if ( defined( $rOpts->{'closing-token-indentation'} ) ) {
my $cti = $rOpts->{'closing-token-indentation'};
$rOpts->{'closing-square-bracket-indentation'} = $cti;
$rOpts->{'closing-brace-indentation'} = $cti;
}
}
return;
- };
+ }; ## end $check_blank_count = sub
# check for reasonable number of blank lines and fix to avoid problems
$check_blank_count->( 'blank-lines-before-subs', '-blbs' );
if ( $rOpts->{'opening-brace-on-new-line'} ) {
Warn(<<EOM);
- Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
+ Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
'opening-brace-on-new-line' (-bl). Ignoring -bl.
EOM
$rOpts->{'opening-brace-on-new-line'} = 0;
}
if ( $rOpts->{'brace-left-and-indent'} ) {
Warn(<<EOM);
- Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
- '--brace-left-and-indent' (-bli). Ignoring -bli.
+ Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
+ '--brace-left-and-indent' (-bli). Ignoring -bli.
EOM
$rOpts->{'brace-left-and-indent'} = 0;
}
$rOpts->{'default-tabsize'} = 8;
}
- # Check and clean up any use-feature list
- my $saw_use_feature_class;
- if ( $rOpts->{'use-feature'} ) {
- my $rseen = cleanup_word_list( $rOpts, 'use-feature' );
- $saw_use_feature_class = $rseen->{'class'};
- }
-
# Check and clean up any sub-alias-list
- if (
- defined( $rOpts->{'sub-alias-list'} )
- && length( $rOpts->{'sub-alias-list'} )
-
- || $saw_use_feature_class
- )
+ if ( defined( $rOpts->{'sub-alias-list'} )
+ && length( $rOpts->{'sub-alias-list'} ) )
{
my @forced_words;
# include 'sub' for convenience if this option is used
push @forced_words, 'sub';
- # use-feature=class requires method as a sub alias
- push @forced_words, 'method' if ($saw_use_feature_class);
-
cleanup_word_list( $rOpts, 'sub-alias-list', \@forced_words );
}
make_grep_alias_string($rOpts);
# Turn on fuzzy-line-length unless this is an extrude run, as determined
- # by the -i and -ci settings. Otherwise blinkers can form (case b935)
+ # by the -i and -ci settings. Otherwise blinkers can form (case b935).
+ # This is an undocumented parameter used only for stress-testing when
+ # --extrude is set.
if ( !$rOpts->{'fuzzy-line-length'} ) {
if ( $rOpts->{'maximum-line-length'} != 1
|| $rOpts->{'continuation-indentation'} != 0 )
$rOpts->{'logical-padding'} = 0;
}
- # Define $tabsize, the number of spaces per tab for use in
- # guessing the indentation of source lines with leading tabs.
- # Assume same as for this run if tabs are used, otherwise assume
- # a default value, typically 8
- $self->[_tabsize_] =
- $rOpts->{'entab-leading-whitespace'}
- ? $rOpts->{'entab-leading-whitespace'}
- : $rOpts->{'tabs'} ? $rOpts->{'indent-columns'}
- : $rOpts->{'default-tabsize'};
-
# Define the default line ending, before any -ple option is applied
$self->[_line_separator_default_] = get_line_separator_default($rOpts);
- return;
-} ## end sub check_options
-
-sub get_line_separator_default {
-
- my ( $rOpts, $input_file ) = @_;
-
- # Get the line separator that will apply unless overriden by a
- # --preserve-line-endings flag for a specific file
-
- my $line_separator_default = "\n";
-
- my $ole = $rOpts->{'output-line-ending'};
- if ($ole) {
- my %endings = (
- dos => "\015\012",
- win => "\015\012",
- mac => "\015",
- unix => "\012",
- );
-
- $line_separator_default = $endings{ lc $ole };
+ $self->[_line_tidy_begin_] = undef;
+ $self->[_line_tidy_end_] = undef;
+ my $line_range_tidy = $rOpts->{'line-range-tidy'};
+ if ($line_range_tidy) {
- if ( !$line_separator_default ) {
- my $str = join SPACE, keys %endings;
+ if ( $num_files > 1 ) {
Die(<<EOM);
-Unrecognized line ending '$ole'; expecting one of: $str
+--line-range-tidy expects no more than 1 filename in the arg list but saw $num_files filenames
EOM
}
- # Check for conflict with -ple
- if ( $rOpts->{'preserve-line-endings'} ) {
- Warn("Ignoring -ple; conflicts with -ole\n");
- $rOpts->{'preserve-line-endings'} = undef;
+ $line_range_tidy =~ s/\s+//g;
+ if ( $line_range_tidy =~ /^(\d+):(\d+)?$/ ) {
+ my $n1 = $1;
+ my $n2 = $2;
+ if ( $n1 < 1 ) {
+ Die(<<EOM);
+--line-range-tidy=n1:n2 expects starting line number n1>=1 but n1=$n1
+EOM
+ }
+ if ( defined($n2) && $n2 < $n1 ) {
+ Die(<<EOM);
+--line-range-tidy=n1:n2 expects ending line number n2>=n1 but n1=$n1 and n2=$n2
+EOM
+ }
+ $self->[_line_tidy_begin_] = $n1;
+ $self->[_line_tidy_end_] = $n2;
+ }
+ else {
+ Die(
+"unrecognized 'line-range-tidy'; expecting format '-lrt=n1:n2'\n"
+ );
}
}
- return $line_separator_default;
-
-} ## end sub get_line_separator_default
+ return;
+} ## end sub check_options
sub find_file_upwards {
+
my ( $search_dir, $search_file ) = @_;
+ # This implements the ... upward search for a file
+
$search_dir =~ s{/+$}{};
$search_file =~ s{^/+}{};
else {
$search_dir = dirname($search_dir);
}
- }
+ } ## end while (1)
# This return is for Perl-Critic.
# We shouldn't get out of the while loop without a return
sub expand_command_abbreviations {
# go through @ARGV and expand any abbreviations
+ # note that @ARGV has been localized
my ( $rexpansion, $rraw_options, $config_file ) = @_;
EOM
}
- if ($config_file) {
+ if ( defined($config_file) ) {
Die(<<"DIE");
-Please check your configuration file $config_file for circular-references.
+Please check your configuration file $config_file for circular-references.
To deactivate it, use -npro.
DIE
}
# Debug routine -- this will dump the expansion hash
sub dump_short_names {
my $rexpansion = shift;
- print STDOUT <<EOM;
+ print {*STDOUT} <<EOM;
List of short names. This list shows how all abbreviations are
translated into other abbreviations and, eventually, into long names.
-New abbreviations may be defined in a .perltidyrc file.
+New abbreviations may be defined in a .perltidyrc file.
For a list of all long names, use perltidy --dump-long-names (-dln).
--------------------------------------------------------------------------
EOM
foreach my $abbrev ( sort keys %{$rexpansion} ) {
my @list = @{ $rexpansion->{$abbrev} };
- print STDOUT "$abbrev --> @list\n";
+ print {*STDOUT} "$abbrev --> @list\n";
}
return;
} ## end sub dump_short_names
$base =~ s/;-?\d*$//
# remove explicit . version ie two dots in filename NB ^ escapes a dot
- or $base =~ s/( # begin capture $1
+ or $base =~ s{( # begin capture $1
(?:^|[^^])\. # match a dot not preceded by a caret
(?: # followed by nothing
| # or
)
) # end capture $1
\.-?\d*$ # match . version number
- /$1/x;
+ }{$1}x;
# normalize filename, if there are no unescaped dots then append one
- $base .= '.' unless $base =~ /(?:^|[^^])\./;
+ $base .= '.' unless ( $base =~ /(?:^|[^^])\./ );
# if we don't already have an extension then we just append the extension
my $separator = ( $base =~ /\.$/ ) ? EMPTY_STRING : "_";
sub Win_OS_Type {
- # TODO: are these more standard names?
- # Win32s Win95 Win98 WinMe WinNT3.51 WinNT4 Win2000 WinXP/.Net Win2003
+ my $rpending_complaint = shift;
# Returns a string that determines what MS OS we are on.
# Returns win32s,95,98,Me,NT3.51,NT4,2000,XP/.Net,Win2003
# Original code contributed by: Yves Orton
# We need to know this to decide where to look for config files
- my $rpending_complaint = shift;
- my $os = EMPTY_STRING;
- return $os unless $OSNAME =~ /win32|dos/i; # is it a MS box?
+ # TODO: are these more standard names?
+ # Win32s Win95 Win98 WinMe WinNT3.51 WinNT4 Win2000 WinXP/.Net Win2003
+
+ my $os = EMPTY_STRING;
+ return $os unless ( $OSNAME =~ /win32|dos/i ); # is it a MS box?
# Systems built from Perl source may not have Win32.pm
# But probably have Win32::GetOSVersion() anyway so the
1 => "XP/.Net",
2 => "Win2003",
51 => "NT3.51",
- }
+ },
}->{$id}->{$minor};
# If $os is undefined, the above code is out of date. Suggested updates
# are welcome.
- unless ( defined $os ) {
+ if ( !defined($os) ) {
$os = EMPTY_STRING;
# Deactivated this message 20180322 because it was needlessly
# sub to check file existence and record all tests
my $exists_config_file = sub {
my $config_file = shift;
- return 0 unless $config_file;
+ return 0 unless defined($config_file);
${$rconfig_file_chatter} .= "# Testing: $config_file\n";
return -f $config_file;
- };
+ }; ## end $exists_config_file = sub
# Sub to search upward for config file
my $resolve_config_file = sub {
# resolve <dir>/.../<file>, meaning look upwards from directory
my $config_file = shift;
- if ($config_file) {
+ if ( defined($config_file) ) {
if ( my ( $start_dir, $search_file ) =
( $config_file =~ m{^(.*)\.\.\./(.*)$} ) )
{
"# Searching Upward: $config_file\n";
$start_dir = '.' if !$start_dir;
$start_dir = Cwd::realpath($start_dir);
- if ( my $found_file =
- find_file_upwards( $start_dir, $search_file ) )
- {
+ my $found_file = find_file_upwards( $start_dir, $search_file );
+ if ( defined($found_file) ) {
$config_file = $found_file;
${$rconfig_file_chatter} .= "# Found: $config_file\n";
}
}
}
return $config_file;
- };
+ }; ## end $resolve_config_file = sub
my $config_file;
}
# Default environment vars.
- my @envs = qw(PERLTIDY HOME);
+ my @envs = qw( PERLTIDY HOME );
# Check the NT/2k/XP locations, first a local machine def, then a
# network def
- push @envs, qw(USERPROFILE HOMESHARE) if $OSNAME =~ /win32/i;
+ push @envs, qw( USERPROFILE HOMESHARE ) if $OSNAME =~ /win32/i;
# Now go through the environment ...
foreach my $var (@envs) {
}
# test ENV as directory:
- $config_file = catfile( $ENV{$var}, ".perltidyrc" );
+ $config_file = File::Spec->catfile( $ENV{$var}, ".perltidyrc" );
$config_file = $resolve_config_file->($config_file);
return $config_file if $exists_config_file->($config_file);
if ($is_Windows) {
- $config_file = catfile( $ENV{$var}, "perltidy.ini" );
+ $config_file =
+ File::Spec->catfile( $ENV{$var}, "perltidy.ini" );
$config_file = $resolve_config_file->($config_file);
return $config_file if $exists_config_file->($config_file);
}
if ($is_Windows) {
if ($Windows_type) {
- my ( $os, $system, $allusers ) =
+ my ( $os_uu, $system, $allusers ) =
Win_Config_Locs( $rpending_complaint, $Windows_type );
# Check All Users directory, if there is one.
# i.e. C:\Documents and Settings\User\perltidy.ini
if ($allusers) {
- $config_file = catfile( $allusers, ".perltidyrc" );
+ $config_file = File::Spec->catfile( $allusers, ".perltidyrc" );
return $config_file if $exists_config_file->($config_file);
- $config_file = catfile( $allusers, "perltidy.ini" );
+ $config_file = File::Spec->catfile( $allusers, "perltidy.ini" );
return $config_file if $exists_config_file->($config_file);
}
# Check system directory.
# retain old code in case someone has been able to create
# a file with a leading period.
- $config_file = catfile( $system, ".perltidyrc" );
+ $config_file = File::Spec->catfile( $system, ".perltidyrc" );
return $config_file if $exists_config_file->($config_file);
- $config_file = catfile( $system, "perltidy.ini" );
+ $config_file = File::Spec->catfile( $system, "perltidy.ini" );
return $config_file if $exists_config_file->($config_file);
}
}
# 9x/Me box. Contributed by: Yves Orton.
my ( $rpending_complaint, $os ) = @_;
- if ( !$os ) { $os = Win_OS_Type(); }
+ if ( !$os ) { $os = Win_OS_Type($rpending_complaint) }
return unless $os;
"I dont know a sensible place to look for config files on an $os system.\n";
return;
}
- return wantarray ? ( $os, $system, $allusers ) : $os;
+ return ( $os, $system, $allusers );
} ## end sub Win_Config_Locs
sub dump_config_file {
- my ( $fh, $config_file, $rconfig_file_chatter ) = @_;
- print STDOUT "${$rconfig_file_chatter}";
- if ($fh) {
- print STDOUT "# Dump of file: '$config_file'\n";
- while ( my $line = $fh->getline() ) { print STDOUT $line }
- my $ok = eval { $fh->close(); 1 };
- if ( !$ok && DEVEL_MODE ) {
- Fault("Could not close file handle(): $EVAL_ERROR\n");
- }
+ my ( $rconfig_string, $config_file, $rconfig_file_chatter ) = @_;
+ print {*STDOUT} "${$rconfig_file_chatter}";
+ if ($rconfig_string) {
+ my @lines = split /^/, ${$rconfig_string};
+ print {*STDOUT} "# Dump of file: '$config_file'\n";
+ foreach my $line (@lines) { print {*STDOUT} $line }
}
else {
- print STDOUT "# ...no config file found\n";
+ print {*STDOUT} "# ...no config file found\n";
}
return;
} ## end sub dump_config_file
+sub filter_unknown_options {
+
+ my (
+ $rconfig_string, $roption_category,
+ $rexpansion, $rconfig_file_chatter
+ ) = @_;
+
+ # Look through the configuration file for lines beginning with '---' and
+ # - remove the line if the option is unknown, or
+ # - remove the extra dash if the option is known
+ # See git #146 for discussion
+
+ # Given:
+ # $rconfig_string = string ref to a .perltidyrc configuration file
+ # $roption_category = ref to hash with long_names as key
+ # $rexpansion = ref to hash with abbreviations as key
+ # $rconfig_file_chatter = messages displayed in --dump-profile
+ #
+ # Update:
+ # $rconfig_string and $rconfig_file_chatter
+
+ # quick check to skip most files
+ if ( ${$rconfig_string} !~ /^\s*---\w/m ) { return }
+
+ my $new_config_string;
+ my $change_notices = EMPTY_STRING;
+ my @lines = split /^/, ${$rconfig_string};
+ foreach my $line (@lines) {
+ chomp $line;
+
+ # look for lines beginning with '---'
+ if ( $line && $line =~ /^\s*---(\w[\w-]*)/ ) {
+ my $word = $1;
+
+ # first look for a long name or an abbreviation
+ my $is_known = $roption_category->{$word} || $rexpansion->{$word};
+
+ # then look for prefix 'no' or 'no-' on a long name
+ if ( !$is_known && $word =~ s/^no-?// ) {
+ $is_known = $roption_category->{$word};
+ }
+
+ if ( !$is_known ) {
+ $change_notices .= "# removing unknown option line $line\n";
+ next;
+ }
+ else {
+ $change_notices .= "# accepting and fixing line $line\n";
+ $line =~ s/-//;
+ }
+ }
+ $new_config_string .= $line . "\n";
+ }
+
+ if ($change_notices) {
+ ${$rconfig_file_chatter} .= "# Filter operations:\n" . $change_notices;
+ ${$rconfig_string} = $new_config_string;
+ }
+ return;
+} ## end sub filter_unknown_options
+
sub read_config_file {
- my ( $fh, $config_file, $rexpansion ) = @_;
+ my ( $rconfig_string, $config_file, $rexpansion ) = @_;
+
+ # Read and process the contents of a perltidyrc command file
+
+ # Given:
+ # $rconfig_string = ref to the file as a string
+ # $config_file = name of the file, for error reporting
+ # $rexpansion = ref to hash of abbreviations; if this config file defines
+ # any abbreviations they will be added to it
+
+ # Return:
+ # \@config_list = ref to final parameters and values which will be
+ # placed in @ARGV for processing by GetOptions
+ # $death_message = error message returned if a fatal error occurs
my @config_list = ();
+ # remove side comments and join multiline quotes
+ my ( $rline_hash, $death_message ) =
+ strip_comments_and_join_quotes( $rconfig_string, $config_file );
+
# file is bad if non-empty $death_message is returned
- my $death_message = EMPTY_STRING;
+ if ($death_message) {
+ return ( \@config_list, $death_message );
+ }
my $name = undef;
- my $line_no;
my $opening_brace_line;
- while ( my $line = $fh->getline() ) {
- $line_no++;
- chomp $line;
- ( $line, $death_message ) =
- strip_comment( $line, $config_file, $line_no );
- last if ($death_message);
- next unless $line;
- $line =~ s/^\s*(.*?)\s*$/$1/; # trim both ends
- next unless $line;
+ foreach my $item ( @{$rline_hash} ) {
+ my $line = $item->{line};
+ my $line_no = $item->{line_no};
+ $line =~ s/^\s+//;
+ $line =~ s/\s+$//;
+ next unless ( length($line) );
my $body = $line;
# name { body } or name { or name { body
# See rules in perltidy's perldoc page
# Section: Other Controls - Creating a new abbreviation
- if ( $line =~ /^((\w+)\s*\{)(.*)?$/ ) {
- ( $name, $body ) = ( $2, $3 );
+ if ( $line =~ /^(?: (\w+) \s* \{ ) (.*)? $/x ) {
+ ( $name, $body ) = ( $1, $2 );
# Cannot start new abbreviation unless old abbreviation is complete
last if ($opening_brace_line);
last;
}
}
+ else {
+ # no abbreviations to untangle
+ }
# Now store any parameters
if ($body) {
$death_message =
"Didn't see a '}' to match the '{' at line $opening_brace_line in config file '$config_file'\n";
}
- my $ok = eval { $fh->close(); 1 };
- if ( !$ok && DEVEL_MODE ) {
- Fault("Could not close file handle(): $EVAL_ERROR\n");
- }
return ( \@config_list, $death_message );
} ## end sub read_config_file
-sub strip_comment {
+sub strip_comments_and_join_quotes {
- # Strip any comment from a command line
- my ( $instr, $config_file, $line_no ) = @_;
- my $msg = EMPTY_STRING;
+ my ( $rconfig_string, $config_file ) = @_;
- # check for full-line comment
- if ( $instr =~ /^\s*#/ ) {
- return ( EMPTY_STRING, $msg );
- }
+ # Tasks:
+ # 1. Strip comments from .perltidyrc lines
+ # 2. Join lines which are spanned by a quote
- # nothing to do if no comments
- if ( $instr !~ /#/ ) {
- return ( $instr, $msg );
- }
+ # Given:
+ # $rconfig_string = the configuration file
+ # $config_file = filename, for error messages
+ # Return:
+ # $rline_hash = hash with modified lines and their input numbers
+ # $msg = any error message; code will die on any message.
- # handle case of no quotes
- elsif ( $instr !~ /['"]/ ) {
+ # return variables
+ my $msg = EMPTY_STRING;
+ my $rline_hash = [];
- # We now require a space before the # of a side comment
- # this allows something like:
- # -sbcp=#
- # Otherwise, it would have to be quoted:
- # -sbcp='#'
- $instr =~ s/\s+\#.*$//;
- return ( $instr, $msg );
- }
+ # quote state variables
+ my $quote_char = EMPTY_STRING;
+ my $quote_start_line = EMPTY_STRING;
+ my $quote_start_line_no = -1;
+ my $in_string = EMPTY_STRING;
+ my $out_string = EMPTY_STRING;
- # handle comments and quotes
- my $outstr = EMPTY_STRING;
- my $quote_char = EMPTY_STRING;
- while (1) {
+ my @lines = split /^/, ${$rconfig_string};
+ my $line_no = 0;
- # looking for ending quote character
- if ($quote_char) {
- if ( $instr =~ /\G($quote_char)/gc ) {
- $quote_char = EMPTY_STRING;
- $outstr .= $1;
- }
- elsif ( $instr =~ /\G(.)/gc ) {
- $outstr .= $1;
- }
+ # loop over lines
+ foreach my $line (@lines) {
+ $line_no++;
+ $line =~ s/^\s+//;
+ $line =~ s/\s+$//;
+ next unless ( length($line) );
- # error..we reached the end without seeing the ending quote char
- else {
- $msg = <<EOM;
-Error reading file $config_file at line number $line_no.
-Did not see ending quote character <$quote_char> in this text:
-$instr
-Please fix this line or use -npro to avoid reading this file
-EOM
- last;
+ if ( !$quote_char ) {
+
+ # skip a full-line comment
+ if ( substr( $line, 0, 1 ) eq '#' ) {
+ next;
}
+ $in_string = $line;
+ $out_string = EMPTY_STRING;
}
-
- # accumulating characters and looking for start of a quoted string
else {
- if ( $instr =~ /\G([\"\'])/gc ) {
- $outstr .= $1;
- $quote_char = $1;
- }
- # Note: not yet enforcing the space-before-hash rule for side
- # comments if the parameter is quoted.
- elsif ( $instr =~ /\G#/gc ) {
- last;
- }
- elsif ( $instr =~ /\G(.)/gc ) {
- $outstr .= $1;
+ # treat previous newline as a space
+ $in_string = SPACE . $line;
+ }
+
+ # loop over string characters
+ # $in_string = the input string
+ # $out_string = the output string
+ # $quote_char = quote character being sought
+ while (1) {
+
+ # accumulating characters not in quote
+ if ( !$quote_char ) {
+
+ if ( $in_string =~ /\G([\"\'])/gc ) {
+
+ # starting new quote..
+ $out_string .= $1;
+ $quote_char = $1;
+ $quote_start_line_no = $line_no;
+ $quote_start_line = $line;
+ }
+ elsif ( $in_string =~ /\G#/gc ) {
+
+ # A space is required before the # of a side comment
+ # This allows something like:
+ # -sbcp=#
+ # Otherwise, it would have to be quoted:
+ # -sbcp='#'
+ if ( !length($out_string) || $out_string =~ s/\s+$// ) {
+ last;
+ }
+ $out_string .= '#';
+ }
+ elsif ( $in_string =~ /\G([^\#\'\"]+)/gc ) {
+
+ # neither quote nor side comment
+ $out_string .= $1;
+ }
+ else {
+
+ # end of line
+ last;
+ }
}
+
+ # looking for ending quote character
else {
- last;
+ if ( $in_string =~ /\G($quote_char)/gc ) {
+
+ # end of quote
+ $out_string .= $1;
+ $quote_char = EMPTY_STRING;
+ }
+ elsif ( $in_string =~ /\G([^$quote_char]+)/gc ) {
+
+ # accumulate quoted text
+ $out_string .= $1;
+ }
+ else {
+
+ # end of line
+ last;
+ }
}
+ } ## end while (1)
+
+ if ( !$quote_char ) {
+ push @{$rline_hash},
+ {
+ line => $out_string,
+ line_no => $line_no,
+ };
+ }
+
+ } ## end loop over lines
+
+ if ($quote_char) {
+ my $max_len = 80;
+ if ( length($quote_start_line) > $max_len ) {
+ $quote_start_line =
+ substr( $quote_start_line, 0, $max_len - 3 ) . '...';
}
+ $msg = <<EOM;
+Error: hit EOF reading file '$config_file' looking for end of quoted text
+which started at line $quote_start_line_no with quote character <$quote_char>:
+$quote_start_line
+Please fix or use -npro to avoid reading this file
+EOM
}
- return ( $outstr, $msg );
-} ## end sub strip_comment
+ return ( $rline_hash, $msg );
+} ## end sub strip_comments_and_join_quotes
sub parse_args {
last;
}
}
- }
+ } ## end while (1)
return ( \@body_parts, $msg );
} ## end sub parse_args
sub dump_long_names {
my @names = @_;
- print STDOUT <<EOM;
+ print {*STDOUT} <<EOM;
# Command line long names (passed to GetOptions)
#--------------------------------------------------
# here is a summary of the Getopt codes:
#--------------------------------------------------
EOM
- foreach my $name ( sort @names ) { print STDOUT "$name\n" }
+ foreach my $name ( sort @names ) { print {*STDOUT} "$name\n" }
return;
} ## end sub dump_long_names
+sub dump_integer_option_range {
+ my ($rinteger_option_range) = @_;
+ print {*STDOUT} "Option, min, max, default\n";
+ foreach my $key ( sort keys %{$rinteger_option_range} ) {
+ my ( $min, $max, $default ) = @{ $rinteger_option_range->{$key} };
+ foreach ( $min, $max, $default ) {
+ $_ = 'undef' unless defined($_);
+ }
+ print {*STDOUT} "$key, $min, $max, $default\n";
+ }
+ return;
+} ## end sub dump_integer_option_range
+
sub dump_defaults {
my @defaults = @_;
- print STDOUT "Default command line options:\n";
- foreach my $line ( sort @defaults ) { print STDOUT "$line\n" }
+ print {*STDOUT} "Default command line options:\n";
+ foreach my $line ( sort @defaults ) { print {*STDOUT} "$line\n" }
return;
} ## end sub dump_defaults
} ## end sub readable_options
sub show_version {
- print STDOUT <<"EOM";
-This is perltidy, v$VERSION
+ print {*STDOUT} <<"EOM";
+This is perltidy, v$VERSION
-Copyright 2000-2022, Steve Hancock
+Copyright 2000-2025 by Steve Hancock
Perltidy is free software and may be copied under the terms of the GNU
General Public License, which is included in the distribution files.
-Complete documentation for perltidy can be found using 'man perltidy'
-or on the internet at http://perltidy.sourceforge.net.
+Documentation can be found using 'man perltidy'
+or at GitHub https://perltidy.github.io/perltidy/
+or at metacpan https://metacpan.org/pod/distribution/Perl-Tidy/bin/perltidy
+or at Sourceforge https://perltidy.sourceforge.net
EOM
return;
} ## end sub show_version
sub usage {
- print STDOUT <<EOF;
+ print {*STDOUT} <<EOF;
This is perltidy version $VERSION, a perl script indenter. Usage:
perltidy [ options ] file1 file2 file3 ...
-f force perltidy to read a binary file
-g like -log but writes more detailed .LOG file, for debugging scripts
-opt write the set of options actually used to a .LOG file
- -npro ignore .perltidyrc configuration command file
- -pro=file read configuration commands from file instead of .perltidyrc
+ -npro ignore .perltidyrc configuration command file
+ -pro=file read configuration commands from file instead of .perltidyrc
-st send output to standard output, STDOUT
-se send all error output to standard error output, STDERR
-v display version number to standard output and quit
-bbvtl=s make -bbvt to apply to selected list of block types
-pt=n paren tightness (n=0, 1 or 2)
-sbt=n square bracket tightness (n=0, 1, or 2)
- -bvt=n brace vertical tightness,
+ -bvt=n brace vertical tightness,
n=(0=open, 1=close unless multiple steps on a line, 2=always close)
-pvt=n paren vertical tightness (see -bvt for n)
-sbvt=n square bracket vertical tightness (see -bvt for n)
- -bvtc=n closing brace vertical tightness:
+ -bvtc=n closing brace vertical tightness:
n=(0=open, 1=sometimes close, 2=always close)
-pvtc=n closing paren vertical tightness, see -bvtc for n.
-sbvtc=n closing square bracket vertical tightness, see -bvtc for n.
-lp line up parentheses, brackets, and non-BLOCK braces
-sfs add space before semicolon in for( ; ; )
-aws allow perltidy to add whitespace (default)
- -dws delete all old non-essential whitespace
+ -dws delete all old non-essential whitespace
-icb indent closing brace of a code block
- -cti=n closing indentation of paren, square bracket, or non-block brace:
+ -cti=n closing indentation of paren, square bracket, or non-block brace:
n=0 none, =1 align with opening, =2 one full indentation level
-icp equivalent to -cti=2
-wls=s want space left of tokens in string; i.e. -nwls='+ - * /'
-cbl=s list of blocks to cuddled, default 'try-catch-finally'
-dnl delete old newlines (default)
-l=n maximum line length; default n=80
- -bl opening brace on new line
+ -bl opening brace on new line
-sbl opening sub brace on new line. value of -bl is used if not given.
-bli opening brace on new line and indented
-bar opening brace always on right, even for long clauses
-bom break at old method call breakpoints: ->
-bok break at old list keyword breakpoints such as map, sort (default)
-bot break at old conditional (ternary ?:) operator breakpoints (default)
- -boa break at old attribute breakpoints
+ -boa break at old attribute breakpoints
-cab=n break at commas after a comma-arrow (=>):
n=0 break at all commas after =>
n=1 stable: break unless this breaks an existing one-line container
-cscp=s change closing side comment prefix to be other than '## end'
-cscl=s change closing side comment to apply to selected list of blocks
-csci=n minimum number of lines needed to apply a -csc tag, default n=6
- -csct=n maximum number of columns of appended text, default n=20
+ -csct=n maximum number of columns of appended text, default n=20
-cscw causes warning if old side comment is overwritten with -csc
-sbc use 'static block comments' identified by leading '##' (default)
Delete selected text
-dac delete all comments AND pod
- -dbc delete block comments
- -dsc delete side comments
+ -dbc delete block comments
+ -dsc delete side comments
-dp delete pod
Send selected text to a '.TEE' file
-tac tee all comments AND pod
- -tbc tee block comments
- -tsc tee side comments
- -tp tee pod
+ -tbc tee block comments
+ -tsc tee side comments
+ -tp tee pod
Outdenting
- -olq outdent long quoted strings (default)
+ -olq outdent long quoted strings (default)
-olc outdent a long block comment line
-ola outdent statement labels
-okw outdent control keywords (redo, next, last, goto, return)
A prefix of "n" negates short form toggle switches, and a prefix of "no"
negates the long forms. For example, -nasc means don't add missing
-semicolons.
+semicolons.
If you are unable to see this entire text, try "perltidy -h | more"
For more detailed information, and additional options, try "man perltidy",
-or go to the perltidy home page at http://perltidy.sourceforge.net
+or see https://metacpan.org/pod/distribution/Perl-Tidy/bin/perltidy
EOF
return;
errorfile => $errorfile,
teefile => $teefile,
debugfile => $debugfile,
- formatter => $formatter, # callback object (see below)
+ formatter => $formatter, # callback object (see below)
dump_options => $dump_options,
dump_options_type => $dump_options_type,
prefilter => $prefilter_coderef,
filename, an ARRAY reference, a SCALAR reference, or an object with
either a B<getline> or B<print> method, as appropriate.
- source - the source of the script to be formatted
- destination - the destination of the formatted output
- stderr - standard error output
- perltidyrc - the .perltidyrc file
- logfile - the .LOG file stream, if any
- errorfile - the .ERR file stream, if any
- dump_options - ref to a hash to receive parameters (see below),
- dump_options_type - controls contents of dump_options
- dump_getopt_flags - ref to a hash to receive Getopt flags
- dump_options_category - ref to a hash giving category of options
- dump_abbreviations - ref to a hash giving all abbreviations
+ source - the source of the script to be formatted
+ destination - the destination of the formatted output
+ stderr - standard error output
+ perltidyrc - the .perltidyrc file
+ logfile - the .LOG file stream, if any
+ errorfile - the .ERR file stream, if any
+ dump_options - ref to a hash to receive parameters (see below),
+ dump_options_type - controls contents of dump_options
+ dump_getopt_flags - ref to a hash to receive Getopt flags
+ dump_options_category - ref to a hash giving category of options
+ dump_abbreviations - ref to a hash giving all abbreviations
The following chart illustrates the logic used to decide how to
treat a parameter.
incomplete and should be ignored. Error messages in the B<stderr> stream will
indicate the cause of any problem.
-An exit value of 2 indicates that perltidy ran to completion but there there
+An exit value of 2 indicates that perltidy ran to completion but there
are warning messages in the B<stderr> stream related to parameter errors or
conflicts and/or warning messages in the B<errorfile> stream relating to
possible syntax errors in the source code being tidied.
F<.perltidyrc> configuration file, in the B<perltidyrc> parameter, and in the
B<argv> parameter.
-The B<-syn> (B<--check-syntax>) flag may be used with all source and
-destination streams except for standard input and output. However
-data streams which are not associated with a filename will
-be copied to a temporary file before being passed to Perl. This
-use of temporary files can cause somewhat confusing output from Perl.
-
If the B<-pbp> style is used it will typically be necessary to also
specify a B<-nst> flag. This is necessary to turn off the B<-st> flag
contained in the B<-pbp> parameter set which otherwise would direct
source => \$source_string,
destination => \$dest_string,
stderr => \$stderr_string,
- errorfile => \$errorfile_string, # ignored when -se flag is set
- ##phasers => 'stun', # uncomment to trigger an error
+ errorfile => \$errorfile_string, # ignored when -se flag is set
+ ##phasers => 'stun', # uncomment to trigger an error
);
if ($error) {
=head1 VERSION
-This man page documents Perl::Tidy version 20230309
+This man page documents Perl::Tidy version 20250105
=head1 LICENSE
use strict;
use warnings;
use English qw( -no_match_vars );
-our $VERSION = '20230309';
+our $VERSION = '20250105';
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
my $self = shift;
my $debug_file = $self->{_debug_file};
my $is_encoded_data = $self->{_is_encoded_data};
- my ( $fh, $filename ) =
- Perl::Tidy::streamhandle( $debug_file, 'w', $is_encoded_data );
+ my $fh = Perl::Tidy::streamhandle( $debug_file, 'w', $is_encoded_data );
if ( !$fh ) {
- Perl::Tidy::Warn("can't open $debug_file: $ERRNO\n");
+ Perl::Tidy::Warn("can't open debug file '$debug_file'\n");
}
$self->{_debug_file_opened} = 1;
$self->{_fh} = $fh;
my $self = shift;
if ( $self->{_debug_file_opened} ) {
- if ( !eval { $self->{_fh}->close(); 1 } ) {
-
- # ok, maybe no close function
+ my $fh = $self->{_fh};
+ my $debug_file = $self->{_debug_file};
+ if ( $fh
+ && $fh->can('close')
+ && $debug_file ne '-'
+ && !ref($debug_file) )
+ {
+ $fh->close()
+ or Perl::Tidy::Warn(
+ "can't close DEBUG file '$debug_file': $OS_ERROR\n");
}
}
return;
# to the .DEBUG file when the -D flag is entered.
my ( $self, $line_of_tokens ) = @_;
- my $input_line = $line_of_tokens->{_line_text};
-
- my $rtoken_type = $line_of_tokens->{_rtoken_type};
- my $rtokens = $line_of_tokens->{_rtokens};
- my $rlevels = $line_of_tokens->{_rlevels};
-
+ my $rtoken_type = $line_of_tokens->{_rtoken_type};
+ my $rtokens = $line_of_tokens->{_rtokens};
my $input_line_number = $line_of_tokens->{_line_number};
- my $line_type = $line_of_tokens->{_line_type};
- my ( $j, $num );
+## uncomment if needed:
+## my $input_line = $line_of_tokens->{_line_text};
+## my $rlevels = $line_of_tokens->{_rlevels};
+## my $line_type = $line_of_tokens->{_line_type};
my $token_str = "$input_line_number: ";
my $reconstructed_original = "$input_line_number: ";
my $pattern = EMPTY_STRING;
my @next_char = ( '"', '"' );
my $i_next = 0;
- unless ( $self->{_debug_file_opened} ) { $self->really_open_debug_file() }
+ if ( !$self->{_debug_file_opened} ) {
+ $self->really_open_debug_file();
+ }
my $fh = $self->{_fh};
foreach my $j ( 0 .. @{$rtoken_type} - 1 ) {
$pattern .= $rtoken_type->[$j];
}
$reconstructed_original .= $rtokens->[$j];
- $num = length( $rtokens->[$j] );
+ my $num = length( $rtokens->[$j] );
my $type_str = $rtoken_type->[$j];
# be sure there are no blank tokens (shouldn't happen)
+++ /dev/null
-#####################################################################
-#
-# The Perl::Tidy::DevNull class supplies a dummy print method
-#
-#####################################################################
-
-package Perl::Tidy::DevNull;
-use strict;
-use warnings;
-our $VERSION = '20230309';
-sub new { my $self = shift; return bless {}, $self }
-sub print { return }
-sub close { return }
-
-1;
-
# scanned at once for some particular condition of interest. It was
# particularly useful for developing guessing strategies.
#
-# NOTE: This feature is deactivated in final releases but can be
-# reactivated for debugging by un-commenting the 'I' options flag
-#
#####################################################################
package Perl::Tidy::Diagnostics;
use strict;
use warnings;
use English qw( -no_match_vars );
-our $VERSION = '20230309';
+our $VERSION = '20250105';
use constant EMPTY_STRING => q{};
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
EOM
exit 1;
-}
+} ## end sub AUTOLOAD
sub DESTROY {
_input_file => EMPTY_STRING,
_fh => undef,
}, $class;
-}
+} ## end sub new
sub set_input_file {
my ( $self, $input_file ) = @_;
}
sub write_diagnostics {
- my ( $self, $msg ) = @_;
+ my ( $self, $msg, $line_number ) = @_;
+
+ # Write a message to the diagnostics file
+ # Input parameters:
+ # $msg = string describing the event
+ # $line_number = optional line number
- unless ( $self->{_write_diagnostics_count} ) {
+ if ( !$self->{_write_diagnostics_count} ) {
open( $self->{_fh}, ">", "DIAGNOSTICS" )
- or Perl::Tidy::Die("couldn't open DIAGNOSTICS: $ERRNO\n");
+ or Perl::Tidy::Die("couldn't open DIAGNOSTICS: $OS_ERROR\n");
+ }
+
+ if ( defined($line_number) ) {
+ $msg = "$line_number:\t$msg";
}
my $fh = $self->{_fh};
$fh->print("\nFILE:$input_file\n");
}
$self->{_last_diagnostic_file} = $input_file;
- my $input_line_number = Perl::Tidy::Tokenizer::get_input_line_number();
- $fh->print("$input_line_number:\t$msg");
+ $fh->print($msg);
$self->{_write_diagnostics_count}++;
return;
-}
+} ## end sub write_diagnostics
1;
-
#####################################################################
#
-# the Perl::Tidy::FileWriter class writes the output file
+# The Perl::Tidy::FileWriter class writes the output file created
+# by the formatter. It receives each output line and performs some
+# important monitoring services. These include:
+#
+# - Verifying that lines do not go out with tokens in the wrong order
+# - Checking for obvious iteration convergence when all output tokens
+# match all input tokens
+# - Keeping track of consecutive blank and non-blank lines
+# - Looking for line lengths which exceed the maximum requested length
+# - Reporting results to the log file
#
#####################################################################
package Perl::Tidy::FileWriter;
use strict;
use warnings;
-our $VERSION = '20230309';
+our $VERSION = '20250105';
+use Carp;
use constant DEVEL_MODE => 0;
use constant EMPTY_STRING => q{};
+# A limit on message length when a fault is detected
+use constant LONG_MESSAGE => 256;
+
+# Maximum number of little messages; probably need not be changed.
+use constant MAX_NAG_MESSAGES => 6;
+
sub AUTOLOAD {
# Catch any undefined sub calls so that we are sure to get
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
# required to avoid call to AUTOLOAD in some versions of perl
}
-my $input_stream_name = EMPTY_STRING;
-
-# Maximum number of little messages; probably need not be changed.
-use constant MAX_NAG_MESSAGES => 6;
-
BEGIN {
# Array index names for variables.
# Do not combine with other BEGIN blocks (c101).
my $i = 0;
use constant {
- _line_sink_object_ => $i++,
_logger_object_ => $i++,
_rOpts_ => $i++,
_output_line_number_ => $i++,
_K_sequence_error_msg_ => $i++,
_K_last_arrival_ => $i++,
_save_logfile_ => $i++,
+ _routput_string_ => $i++,
+ _input_stream_name_ => $i++,
};
} ## end BEGIN
sub Die {
my ($msg) = @_;
Perl::Tidy::Die($msg);
- return;
+ croak "unexpected return from Perl::Tidy::Die";
}
sub Fault {
- my ($msg) = @_;
+ my ( $self, $msg ) = @_;
# This routine is called for errors that really should not occur
# except if there has been a bug introduced by a recent program change.
# Please add comments at calls to Fault to explain why the call
# should not occur, and where to look to fix it.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $pkg = __PACKAGE__;
+ # Catch potential error of Fault not called as a method
+ my $input_stream_name;
+ if ( !ref($self) ) {
+ $input_stream_name = "(UNKNOWN)";
+ $msg = "Fault not called as a method - please fix\n";
+ if ( $self && length($self) < LONG_MESSAGE ) { $msg .= $self }
+ $self = undef;
+ }
+ else {
+ $input_stream_name = $self->[_input_stream_name_];
+ }
+
Die(<<EOM);
==============================================================================
While operating on input stream with name: '$input_stream_name'
==============================================================================
EOM
- # This return is to keep Perl-Critic from complaining.
- return;
+ croak "unexpected return from sub Die";
} ## end sub Fault
sub warning {
my ( $self, $msg ) = @_;
+
+ # log a warning message from any caller
my $logger_object = $self->[_logger_object_];
if ($logger_object) { $logger_object->warning($msg); }
return;
my ( $class, $line_sink_object, $rOpts, $logger_object ) = @_;
my $self = [];
- $self->[_line_sink_object_] = $line_sink_object;
+ bless $self, $class;
$self->[_logger_object_] = $logger_object;
$self->[_rOpts_] = $rOpts;
$self->[_output_line_number_] = 1;
$self->[_K_arrival_order_matches_] = 0;
$self->[_K_sequence_error_msg_] = EMPTY_STRING;
$self->[_K_last_arrival_] = -1;
- $self->[_save_logfile_] = defined($logger_object);
+ $self->[_save_logfile_] =
+ defined($logger_object) && $logger_object->get_save_logfile();
- # save input stream name for local error messages
- $input_stream_name = EMPTY_STRING;
+ # '$line_sink_object' is a SCALAR ref which receives the lines.
+ my $ref = ref($line_sink_object);
+ if ( !$ref ) {
+ $self->Fault("FileWriter expects line_sink_object to be a ref\n");
+ }
+ elsif ( $ref eq 'SCALAR' ) {
+ $self->[_routput_string_] = $line_sink_object;
+ }
+ else {
+ my $str = $ref;
+ if ( length($str) > 63 ) { $str = substr( $str, 0, 60 ) . '...' }
+ $self->Fault(<<EOM);
+FileWriter expects 'line_sink_object' to be ref to SCALAR but it is ref to:
+$str
+EOM
+ }
+
+ my $input_stream_name = EMPTY_STRING;
if ($logger_object) {
$input_stream_name = $logger_object->get_input_stream_name();
}
+ $self->[_input_stream_name_] = $input_stream_name;
- bless $self, $class;
return $self;
} ## end sub new
sub setup_convergence_test {
my ( $self, $rlist ) = @_;
+
+ # Setup the convergence test,
+
+ # Given:
+ # $rlist = a reference to a list of line-ending token indexes 'K' of
+ # the input stream. We will compare these with the line-ending token
+ # indexes of the output stream. If they are identical, then we have
+ # convergence.
if ( @{$rlist} ) {
- # We are going to destroy the list, so make a copy
- # and put in reverse order so we can pop values
+ # We are going to destroy the list, so make a copy and put in
+ # reverse order so we can pop values as they arrive
my @list = @{$rlist};
if ( $list[0] < $list[-1] ) {
@list = reverse @list;
}
$self->[_rK_checklist_] = \@list;
}
+
+ # We will zero this flag on any error in arrival order:
$self->[_K_arrival_order_matches_] = 1;
$self->[_K_sequence_error_msg_] = EMPTY_STRING;
$self->[_K_last_arrival_] = -1;
sub get_convergence_check {
my ($self) = @_;
- my $rlist = $self->[_rK_checklist_];
- # converged if all K arrived and in correct order
- return $self->[_K_arrival_order_matches_] && !@{$rlist};
+ # converged if:
+ # - all expected indexes arrived
+ # - and in correct order
+ return !@{ $self->[_rK_checklist_] }
+ && $self->[_K_arrival_order_matches_];
+
} ## end sub get_convergence_check
sub get_output_line_number {
- return $_[0]->[_output_line_number_];
+ my $self = shift;
+ return $self->[_output_line_number_];
}
sub decrement_output_line_number {
- $_[0]->[_output_line_number_]--;
+ my $self = shift;
+ $self->[_output_line_number_]--;
return;
}
sub get_consecutive_nonblank_lines {
- return $_[0]->[_consecutive_nonblank_lines_];
+ my $self = shift;
+ return $self->[_consecutive_nonblank_lines_];
}
sub get_consecutive_blank_lines {
- return $_[0]->[_consecutive_blank_lines_];
+ my $self = shift;
+ return $self->[_consecutive_blank_lines_];
}
sub reset_consecutive_blank_lines {
- $_[0]->[_consecutive_blank_lines_] = 0;
- return;
-}
-
-# This sub call allows termination of logfile writing for efficiency when we
-# know that the logfile will not be saved.
-sub set_save_logfile {
- my ( $self, $save_logfile ) = @_;
- $self->[_save_logfile_] = $save_logfile;
+ my $self = shift;
+ $self->[_consecutive_blank_lines_] = 0;
return;
}
sub want_blank_line {
my $self = shift;
- unless ( $self->[_consecutive_blank_lines_] ) {
+ if ( !$self->[_consecutive_blank_lines_] ) {
$self->write_blank_code_line();
}
return;
} ## end sub want_blank_line
sub require_blank_code_lines {
+ my ( $self, $count ) = @_;
- # write out the requested number of blanks regardless of the value of -mbl
+ # Given:
+ # $count = number of blank lines to write
+ # Write out $count blank lines regardless of the value of -mbl
# unless -mbl=0. This allows extra blank lines to be written for subs and
# packages even with the default -mbl=1
- my ( $self, $count ) = @_;
my $need = $count - $self->[_consecutive_blank_lines_];
my $rOpts = $self->[_rOpts_];
my $forced = $rOpts->{'maximum-consecutive-blank-lines'} > 0;
} ## end sub require_blank_code_lines
sub write_blank_code_line {
- my ( $self, $forced ) = @_;
+ my ( $self, ($forced) ) = @_;
# Write a blank line of code, given:
# $forced = optional flag which, if set, forces the blank line
return;
}
- $self->[_line_sink_object_]->write_line("\n");
- $self->[_output_line_number_]++;
+ ${ $self->[_routput_string_] } .= "\n";
+ $self->[_output_line_number_]++;
$self->[_consecutive_blank_lines_]++;
$self->[_consecutive_new_blank_lines_]++ if ($forced);
$self->[_consecutive_blank_lines_] = 0;
$self->[_consecutive_new_blank_lines_] = 0;
$self->[_consecutive_nonblank_lines_]++;
+ $self->[_output_line_number_]++;
+
+ ${ $self->[_routput_string_] } .= $str;
- $self->[_line_sink_object_]->write_line($str);
- if ( chomp $str ) { $self->[_output_line_number_]++; }
if ( $self->[_save_logfile_] ) { $self->check_line_lengths($str) }
#----------------------------
# caused lines to go out in the wrong order. This could happen if
# either the cache or buffer that it uses are emptied in the wrong
# order.
- if ( !$self->[_K_sequence_error_msg_] ) {
+ if ( $K < $self->[_K_last_arrival_]
+ && !$self->[_K_sequence_error_msg_] )
+ {
my $K_prev = $self->[_K_last_arrival_];
- if ( $K < $K_prev ) {
- chomp $str;
- if ( length($str) > MAX_PRINTED_CHARS ) {
- $str = substr( $str, 0, MAX_PRINTED_CHARS ) . "...";
- }
- my $msg = <<EOM;
-While operating on input stream with name: '$input_stream_name'
+ chomp $str;
+ if ( length($str) > MAX_PRINTED_CHARS ) {
+ $str = substr( $str, 0, MAX_PRINTED_CHARS ) . "...";
+ }
+
+ my $msg = <<EOM;
Lines have arrived out of order in sub 'write_code_line'
as detected by token index K=$K arriving after index K=$K_prev in the following line:
$str
This is probably due to a recent programming change and needs to be fixed.
EOM
- # Always die during development, this needs to be fixed
- if (DEVEL_MODE) { Fault($msg) }
+ # Always die during development, this needs to be fixed
+ if (DEVEL_MODE) { $self->Fault($msg) }
- # Otherwise warn if string is not empty (added for b1378)
- $self->warning($msg) if ( length($str) );
+ # Otherwise warn if string is not empty (added for b1378)
+ $self->warning($msg) if ( length($str) );
- # Only issue this warning once
- $self->[_K_sequence_error_msg_] = $msg;
+ # Only issue this warning once
+ $self->[_K_sequence_error_msg_] = $msg;
- }
}
$self->[_K_last_arrival_] = $K;
}
# Write a line directly to the output, without any counting of blank or
# non-blank lines.
- $self->[_line_sink_object_]->write_line($str);
+ # Given:
+ # $str = line of text to write
+
+ ${ $self->[_routput_string_] } .= $str;
+
if ( chomp $str ) { $self->[_output_line_number_]++; }
if ( $self->[_save_logfile_] ) { $self->check_line_lengths($str) }
sub check_line_lengths {
my ( $self, $str ) = @_;
- # collect info on line lengths for logfile
+ # Collect info on line lengths for logfile
+ # Given:
+ # $str = line of text being written
# This calculation of excess line length ignores any internal tabs
- my $rOpts = $self->[_rOpts_];
+ my $rOpts = $self->[_rOpts_];
+ chomp $str;
my $len_str = length($str);
my $exceed = $len_str - $rOpts->{'maximum-line-length'};
if ( $str && substr( $str, 0, 1 ) eq "\t" && $str =~ /^\t+/g ) {
#
# The Perl::Tidy::Formatter package adds indentation, whitespace, and
# line breaks to the token stream
+
+# Usage Outline:
+#
+# STEP 1: initialize or re-initialize Formatter with user options
+# Perl::Tidy::Formatter::check_options($rOpts);
+#
+# STEP 2: crate a tokenizer for the source stream
+#
+# STEP 3: create a formatter for the destination stream
+# my $formatter = Perl::Tidy::Formatter->new(
+# ...
+# sink_object => $destination,
+# ...
+# );
+#
+# STEP 4: process each input line (see sub Perl::Tidy::process_single_case)
+# while ( my $line = $tokenizer->get_line() ) {
+# $formatter->write_line($line);
+# }
+#
+# STEP 4: finish formatting
+# $formatter->finish_formatting($severe_error);
#
#####################################################################
use constant DEVEL_MODE => 0;
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
+use constant BACKSLASH => q{\\};
{ #<<< A non-indenting brace to contain all lexical variables
use Carp;
use English qw( -no_match_vars );
-use List::Util qw( min max ); # min, max are in Perl 5.8
-our $VERSION = '20230309';
+use List::Util qw( min max first ); # min, max first are in Perl 5.8
+our $VERSION = '20250105';
# The Tokenizer will be loaded with the Formatter
##use Perl::Tidy::Tokenizer; # for is_keyword()
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
sub DESTROY {
my $self = shift;
- $self->_decrement_count();
+ _decrement_count();
return;
}
# except if there has been a bug introduced by a recent program change.
# Please add comments at calls to Fault to explain why the call
# should not occur, and where to look to fix it.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $pkg = __PACKAGE__;
my $input_stream_name = get_input_stream_name();
$pkg reports VERSION='$VERSION'.
==============================================================================
EOM
-
- # We shouldn't get here, but this return is to keep Perl-Critic from
- # complaining.
- return;
+ croak "unexpected return from sub Die";
} ## end sub Fault
sub Fault_Warn {
# This is the same as Fault except that it calls Warn instead of Die
# and returns.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $input_stream_name = get_input_stream_name();
Warn(<<EOM);
# Section 1: Global variables which are either always constant or
# are constant after being configured by user-supplied
# parameters. They remain constant as a file is being processed.
+ # The INITIALIZER comment tells the sub responsible for initializing
+ # each variable. Failure to initialize or re-initialize a global
+ # variable can cause bugs which are hard to locate.
#-----------------------------------------------------------------
# INITIALIZER: sub check_options
$rOpts_add_newlines,
$rOpts_add_whitespace,
$rOpts_add_trailing_commas,
+ $rOpts_add_lone_trailing_commas,
$rOpts_blank_lines_after_opening_block,
$rOpts_block_brace_tightness,
$rOpts_block_brace_vertical_tightness,
$rOpts_delete_old_whitespace,
$rOpts_delete_side_comments,
$rOpts_delete_trailing_commas,
+ $rOpts_delete_lone_trailing_commas,
$rOpts_delete_weld_interfering_commas,
$rOpts_extended_continuation_indentation,
$rOpts_format_skipping,
$rOpts_fuzzy_line_length,
$rOpts_ignore_old_breakpoints,
$rOpts_ignore_side_comment_lengths,
+ $rOpts_ignore_perlcritic_comments,
$rOpts_indent_closing_brace,
$rOpts_indent_columns,
+ $rOpts_indent_leading_semicolon,
$rOpts_indent_only,
$rOpts_keep_interior_semicolons,
$rOpts_line_up_parentheses,
$rOpts_maximum_consecutive_blank_lines,
$rOpts_maximum_fields_per_table,
$rOpts_maximum_line_length,
+ $rOpts_minimize_continuation_indentation,
$rOpts_one_line_block_semicolons,
$rOpts_opening_brace_always_on_right,
$rOpts_outdent_keywords,
$rOpts_outdent_long_quotes,
$rOpts_outdent_static_block_comments,
$rOpts_recombine,
+ $rOpts_qw_as_function,
$rOpts_short_concatenation_item_length,
$rOpts_space_prototype_paren,
+ $rOpts_space_signature_paren,
$rOpts_stack_closing_block_brace,
$rOpts_static_block_comments,
+ $rOpts_add_missing_else,
+ $rOpts_warn_missing_else,
$rOpts_tee_block_comments,
$rOpts_tee_pod,
$rOpts_tee_side_comments,
$rOpts_variable_maximum_line_length,
$rOpts_valign_code,
$rOpts_valign_side_comments,
+ $rOpts_valign_if_unless,
+ $rOpts_valign_wide_equals,
$rOpts_whitespace_cycle,
+ $rOpts_extended_block_tightness,
$rOpts_extended_line_up_parentheses,
# Static hashes
%is_if_unless_and_or_last_next_redo_return,
%is_if_elsif_else_unless_while_until_for_foreach,
%is_if_unless_while_until_for_foreach,
+ %is_for_foreach,
%is_last_next_redo_return,
%is_if_unless,
%is_if_elsif,
%is_anon_sub_brace_follower,
%is_anon_sub_1_brace_follower,
%is_other_brace_follower,
+ %is_kwU,
+ %is_re_match_op,
+ %is_my_state_our,
+ %is_keyword_with_special_leading_term,
# INITIALIZER: sub check_options
$controlled_comma_style,
+
+ # INITIALIZER: sub initialize_tightness_vars
+ %tightness,
+
+ # INITIALIZER: sub initialize_multiple_token_tightness
+ %multiple_token_tightness,
+
+ # INITIALIZER: initialize_old_breakpoint_controls
%keep_break_before_type,
%keep_break_after_type,
- %outdent_keyword,
- %keyword_paren_inner_tightness,
+
+ # INITIALIZER: initialize_container_indentation_options
%container_indentation_options,
- %tightness,
+
+ # INITIALIZER: sub initialize_lpxl_lpil
%line_up_parentheses_control_hash,
- $line_up_parentheses_control_is_lxpl,
+ $line_up_parentheses_control_is_lpxl,
+
+ # INITIALIZER: sub outdent_keyword
+ %outdent_keyword,
+
+ # INITIALIZER: sub initialize_keyword_paren_inner_tightness
+ %keyword_paren_inner_tightness,
# These can be modified by grep-alias-list
# INITIALIZER: sub initialize_grep_and_friends
# INITIALIZER: sub initialize_space_after_keyword
%space_after_keyword,
+ # INITIALIZER: sub initialize_extended_block_tightness_list
+ %extended_block_tightness_list,
+
# INITIALIZED BY initialize_global_option_vars
%opening_vertical_tightness,
%closing_vertical_tightness,
# INITIALIZER: sub initialize_trailing_comma_rules
%trailing_comma_rules,
+ # INITIALIZER: sub initialize_trailing_comma_break_rules
+ %trailing_comma_break_rules,
+
+ # INITIALIZER: sub initialize_interbracket_arrow_style
+ %interbracket_arrow_style,
+
+ # INITIALIZER: sub initialize_call_paren_style
+ %call_paren_style,
+
+ # INITIALIZER: sub initialize_warn_variable_types
+ $rwarn_variable_types,
+ $ris_warn_variable_excluded_name,
+
+ # INITIALIZER: sub initialize_warn_mismatched_args
+ $rwarn_mismatched_arg_types,
+ $ris_warn_mismatched_arg_excluded_name,
+
+ # INITIALIZER: sub initialize_warn_mismatched_returns
+ $rwarn_mismatched_return_types,
+ $ris_warn_mismatched_return_excluded_name,
+
# regex patterns for text identification.
# Most can be configured by user parameters.
# Most are initialized in a sub make_**_pattern during configuration.
# INITIALIZER: sub make_sub_matching_pattern
$SUB_PATTERN,
$ASUB_PATTERN,
+ %matches_ASUB,
# INITIALIZER: make_static_block_comment_pattern
$static_block_comment_pattern,
# INITIALIZER: sub make_closing_side_comment_list_pattern
$closing_side_comment_list_pattern,
+ $closing_side_comment_want_asub,
+ $closing_side_comment_exclusion_pattern,
# Table to efficiently find indentation and max line length
# from level.
_CI_LEVEL_ => $i++,
_CUMULATIVE_LENGTH_ => $i++,
_LINE_INDEX_ => $i++,
- _KNEXT_SEQ_ITEM_ => $i++,
_LEVEL_ => $i++,
_TOKEN_ => $i++,
_TOKEN_LENGTH_ => $i++,
_Klimit_ => $i++,
_rdepth_of_opening_seqno_ => $i++,
_rSS_ => $i++,
- _Iss_opening_ => $i++,
- _Iss_closing_ => $i++,
+ _rI_opening_ => $i++,
+ _rI_closing_ => $i++,
+ _rK_next_seqno_by_K_ => $i++,
_rblock_type_of_seqno_ => $i++,
_ris_asub_block_ => $i++,
_ris_sub_block_ => $i++,
_K_closing_container_ => $i++,
_K_opening_ternary_ => $i++,
_K_closing_ternary_ => $i++,
- _K_first_seq_item_ => $i++,
+ _rK_sequenced_token_list_ => $i++,
_rtype_count_by_seqno_ => $i++,
_ris_function_call_paren_ => $i++,
_rlec_count_by_seqno_ => $i++,
_vertical_aligner_object_ => $i++,
_logger_object_ => $i++,
_radjusted_levels_ => $i++,
- _this_batch_ => $i++,
_ris_special_identifier_token_ => $i++,
_last_output_short_opening_token_ => $i++,
_in_brace_tabbing_disagreement_ => $i++,
_saw_VERSION_in_this_file_ => $i++,
+ _saw_use_strict_ => $i++,
_saw_END_or_DATA_ => $i++,
_rK_weld_left_ => $i++,
_rbreak_before_Kfirst_ => $i++,
_rbreak_after_Klast_ => $i++,
_converged_ => $i++,
+ _want_second_iteration_ => $i++,
_rstarting_multiline_qw_seqno_by_K_ => $i++,
_rending_multiline_qw_seqno_by_K_ => $i++,
_rmax_vertical_tightness_ => $i++,
_no_vertical_tightness_flags_ => $i++,
+ _last_vt_type_ => $i++,
+ _rwant_arrow_before_seqno_ => $i++,
+
+ _rseqno_arrow_call_chain_start_ => $i++,
+ _rarrow_call_chain_ => $i++,
+
+ # these vars are defined after call to respace tokens:
+ _rK_package_list_ => $i++,
+ _rK_AT_underscore_by_sub_seqno_ => $i++,
+ _rK_first_self_by_sub_seqno_ => $i++,
+ _rK_bless_by_sub_seqno_ => $i++,
+ _rK_return_by_sub_seqno_ => $i++,
+ _rK_wantarray_by_sub_seqno_ => $i++,
+ _rK_sub_by_seqno_ => $i++,
+ _ris_my_sub_by_seqno_ => $i++,
+ _rsub_call_paren_info_by_seqno_ => $i++,
+ _rDOLLAR_underscore_by_sub_seqno_ => $i++,
+ _this_batch_ => $i++,
_LAST_SELF_INDEX_ => $i - 1,
};
BEGIN {
- # Index names for batch variables.
+ # Index names for variables stored in _this_batch_.
# Do not combine with other BEGIN blocks (c101).
- # These are stored in _this_batch_, which is a sub-array of $self.
my $i = 0;
use constant {
_starting_in_quote_ => $i++,
# Initialize constant hashes ...
my @q;
- @q = qw(
- = **= += *= &= <<= &&=
- -= /= |= >>= ||= //=
- .= %= ^=
- x=
- );
+ @q = qw( = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x= );
@is_assignment{@q} = (1) x scalar(@q);
# a hash needed by break_lists for efficiency:
push @q, qw{ ; < > ~ f };
@is_non_list_type{@q} = (1) x scalar(@q);
- @q = qw(is if unless and or err last next redo return);
+ @q = qw( is if unless and or err last next redo return );
@is_if_unless_and_or_last_next_redo_return{@q} = (1) x scalar(@q);
# These block types may have text between the keyword and opening
# curly. Note: 'else' does not, but must be included to allow trailing
# if/elsif text to be appended.
# patch for SWITCH/CASE: added 'case' and 'when'
- @q = qw(if elsif else unless while until for foreach case when catch);
+ @q = qw( if elsif else unless while until for foreach case when catch );
@is_if_elsif_else_unless_while_until_for_foreach{@q} =
(1) x scalar(@q);
- @q = qw(if unless while until for foreach);
+ # These can either have the BLOCK form or trailing modifier form:
+ @q = qw( if unless while until for foreach );
@is_if_unless_while_until_for_foreach{@q} =
(1) x scalar(@q);
- @q = qw(last next redo return);
+ # These can have several forms
+ @q = qw( for foreach );
+ @is_for_foreach{@q} = (1) x scalar(@q);
+
+ @q = qw( last next redo return );
@is_last_next_redo_return{@q} = (1) x scalar(@q);
# Map related block names into a common name to allow vertical alignment
'grep' => 'map',
);
- @q = qw(if unless);
+ @q = qw( if unless );
@is_if_unless{@q} = (1) x scalar(@q);
- @q = qw(if elsif);
+ @q = qw( if elsif );
@is_if_elsif{@q} = (1) x scalar(@q);
- @q = qw(if unless elsif);
+ @q = qw( if unless elsif );
@is_if_unless_elsif{@q} = (1) x scalar(@q);
- @q = qw(if unless elsif else);
+ @q = qw( if unless elsif else );
@is_if_unless_elsif_else{@q} = (1) x scalar(@q);
- @q = qw(elsif else);
+ @q = qw( elsif else );
@is_elsif_else{@q} = (1) x scalar(@q);
- @q = qw(and or err);
+ @q = qw( and or err );
@is_and_or{@q} = (1) x scalar(@q);
# Identify certain operators which often occur in chains.
# Checkbutton => 'Transmission checked',
# -variable => \$TRANS
# This usually improves appearance so it seems ok.
- @q = qw(&& || and or : ? . + - * /);
+ @q = qw( && || and or : ? . + - * / );
@is_chain_operator{@q} = (1) x scalar(@q);
# Operators that the user can request break before or after.
# Note that some are keywords
- @all_operators = qw(% + - * / x != == >= <= =~ !~ < > | &
+ @all_operators = qw{
+ % + - * / x != == >= <= =~ !~ < > | &
= **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
. : ? && || and or err xor
- );
+ };
# We can remove semicolons after blocks preceded by these keywords
- @q =
- qw(BEGIN END CHECK INIT AUTOLOAD DESTROY UNITCHECK continue if elsif else
- unless while until for foreach given when default);
+ @q = qw(
+ BEGIN END CHECK INIT AUTOLOAD DESTROY
+ UNITCHECK continue if elsif else unless
+ while until for foreach given when
+ default
+ );
@is_block_without_semicolon{@q} = (1) x scalar(@q);
# We will allow semicolons to be added within these block types
# 3. But not okay for other perltidy types including:
# { } ; G t
# 4. Test files: blktype.t, blktype1.t, semicolon.t
- @q =
- qw( BEGIN END CHECK INIT AUTOLOAD DESTROY UNITCHECK continue if elsif else
- unless do while until eval for foreach );
+ @q = qw( BEGIN END CHECK INIT AUTOLOAD DESTROY UNITCHECK continue if elsif
+ else unless do while until eval for foreach );
@ok_to_add_semicolon_for_block_type{@q} = (1) x scalar(@q);
# 'L' is token for opening { at hash key
@is_soft_keep_break_type{@q} = (1) x scalar(@q);
# these functions allow an identifier in the indirect object slot
- @q = qw( print printf sort exec system say);
+ @q = qw( print printf sort exec system say );
@is_indirect_object_taker{@q} = (1) x scalar(@q);
# Define here tokens which may follow the closing brace of a do statement
# on the same line, as in:
# } while ( $something);
- my @dof = qw(until while unless if ; : );
+ my @dof = qw( until while unless if ; : );
push @dof, ',';
@is_do_follower{@dof} = (1) x scalar(@dof);
push @obf, ',';
@is_other_brace_follower{@obf} = (1) x scalar(@obf);
+ # 'k'=builtin keyword, 'U'=user defined sub, 'w'=unknown bareword
+ @q = qw( k w U );
+ @is_kwU{@q} = (1) x scalar(@q);
+
+ # regular expression match operators
+ @q = qw( =~ !~);
+ @is_re_match_op{@q} = (1) x scalar(@q);
+
+ @q = qw ( my state our );
+ @is_my_state_our{@q} = (1) x scalar(@q);
+
+ # These keywords have prototypes which allow a special leading item
+ # followed by a list
+ @q =
+ qw( chmod formline grep join kill map pack printf push sprintf unshift );
+ @is_keyword_with_special_leading_term{@q} = (1) x scalar(@q);
+
} ## end BEGIN
{ ## begin closure to count instances
# methods to count instances
my $_count = 0;
- sub get_count { return $_count; }
sub _increment_count { return ++$_count }
sub _decrement_count { return --$_count }
} ## end closure to count instances
sub new {
- my ( $class, @args ) = @_;
+ my ( $class, @arglist ) = @_;
+ if ( @arglist % 2 ) { croak "Odd number of items in arg hash list\n" }
# we are given an object with a write_line() method to take lines
my %defaults = (
sink_object => undef,
diagnostics_object => undef,
logger_object => undef,
- length_function => sub { return length( $_[0] ) },
+ length_function => undef,
is_encoded_data => EMPTY_STRING,
fh_tee => undef,
);
- my %args = ( %defaults, @args );
+ my %args = ( %defaults, @arglist );
my $length_function = $args{length_function};
my $is_encoded_data = $args{is_encoded_data};
file_writer_object => $file_writer_object,
logger_object => $logger_object,
diagnostics_object => $diagnostics_object,
- length_function => $length_function,
);
write_logfile_entry("\nStarting tokenization pass...\n");
# Initialize the $self array reference.
# To add an item, first add a constant index in the BEGIN block above.
my $self = [];
+ bless $self, $class;
# Basic data structures...
$self->[_rlines_] = []; # = ref to array of lines of the file
$self->[_K_closing_container_] = {};
$self->[_K_opening_ternary_] = {};
$self->[_K_closing_ternary_] = {};
- $self->[_K_first_seq_item_] = undef; # K of first token with a sequence #
+
+ # A list of index K of sequenced tokens to allow loops over them all
+ $self->[_rK_sequenced_token_list_] = [];
# 'rSS' is the 'Signed Sequence' list, a continuous list of all sequence
# numbers with + or - indicating opening or closing. This list represents
# the entire container tree and is invariant under reformatting. It can be
# used to quickly travel through the tree. Indexes in the rSS array begin
- # with '$I' by convention. The 'Iss' arrays give the indexes in this list
- # of opening and closing sequence numbers.
- $self->[_rSS_] = [];
- $self->[_Iss_opening_] = [];
- $self->[_Iss_closing_] = [];
+ # with '$I' by convention.
+ $self->[_rSS_] = [];
+ $self->[_rI_opening_] = [];
+ $self->[_rI_closing_] = [];
+ $self->[_rK_next_seqno_by_K_] = [];
# Arrays to help traverse the tree
$self->[_rdepth_of_opening_seqno_] = [];
$self->[_ris_asub_block_] = {};
$self->[_ris_sub_block_] = {};
+ # Variables for --warn-mismatched-args and
+ # --dump-mismatched-args
+ # --dump-mismatched-returns
+ # --warn-mismatched-returns
+ $self->[_rK_package_list_] = [];
+ $self->[_rK_AT_underscore_by_sub_seqno_] = {};
+ $self->[_rK_first_self_by_sub_seqno_] = {};
+ $self->[_rK_bless_by_sub_seqno_] = {};
+ $self->[_rK_return_by_sub_seqno_] = {};
+ $self->[_rK_wantarray_by_sub_seqno_] = {};
+ $self->[_rsub_call_paren_info_by_seqno_] = {};
+ $self->[_rDOLLAR_underscore_by_sub_seqno_] = {};
+ $self->[_rK_sub_by_seqno_] = {};
+ $self->[_ris_my_sub_by_seqno_] = {};
+ $self->[_this_batch_] = [];
+
# Mostly list characteristics and processing flags
$self->[_rtype_count_by_seqno_] = {};
$self->[_ris_function_call_paren_] = {};
$self->[_vertical_aligner_object_] = $vertical_aligner_object;
$self->[_logger_object_] = $logger_object;
- # Reference to the batch being processed
- $self->[_this_batch_] = [];
-
# Memory of processed text...
$self->[_ris_special_identifier_token_] = {};
$self->[_last_line_leading_level_] = 0;
$self->[_tabbing_disagreement_count_] = 0;
$self->[_in_tabbing_disagreement_] = 0;
$self->[_saw_VERSION_in_this_file_] = !$rOpts->{'pass-version-line'};
+ $self->[_saw_use_strict_] = 0;
$self->[_saw_END_or_DATA_] = 0;
$self->[_first_brace_tabbing_disagreement_] = undef;
$self->[_in_brace_tabbing_disagreement_] = undef;
$self->[_rbreak_before_Kfirst_] = {};
$self->[_rbreak_after_Klast_] = {};
$self->[_converged_] = 0;
+ $self->[_want_second_iteration_] = 0;
# qw stuff
$self->[_rstarting_multiline_qw_seqno_by_K_] = {};
$self->[_rmax_vertical_tightness_] = {};
$self->[_no_vertical_tightness_flags_] = 0;
+ $self->[_last_vt_type_] = 0;
+ $self->[_rwant_arrow_before_seqno_] = {};
- # This flag will be updated later by a call to get_save_logfile()
- $self->[_save_logfile_] = defined($logger_object);
+ $self->[_rseqno_arrow_call_chain_start_] = {};
+ $self->[_rarrow_call_chain_] = {};
+
+ $self->[_save_logfile_] =
+ defined($logger_object) && $logger_object->get_save_logfile();
# Be sure all variables in $self have been initialized above. To find the
# correspondence of index numbers and array names, copy a list to a file
if (DEVEL_MODE) {
my @non_existant;
foreach ( 0 .. _LAST_SELF_INDEX_ ) {
- if ( !exists( $self->[$_] ) ) {
+ if ( !exists $self->[$_] ) {
push @non_existant, $_;
}
}
}
}
- bless $self, $class;
-
# Safety check..this is not a class yet
if ( _increment_count() > 1 ) {
confess
sub check_token_array {
my $self = shift;
+ #--------------
+ # Check @{$rLL}
+ #--------------
# Check for errors in the array of tokens. This is only called
# when the DEVEL_MODE flag is set, so this Fault will only occur
# during code development.
}
}
}
+
+ #---------------------------------
+ # Check $rK_next_seqno_by_K->[$KK]
+ #---------------------------------
+ my $Klimit = @{$rLL} - 1;
+ my $K_last_seqno;
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
+ foreach my $KK ( 0 .. $Klimit ) {
+ my $K_next_seqno = $rK_next_seqno_by_K->[$KK];
+ if ( !defined($K_next_seqno) ) { $K_last_seqno = $KK; last }
+ if ( $K_next_seqno <= $KK || $K_next_seqno > $Klimit ) {
+ Fault(<<EOM);
+Error detected in array rK_next_seqno_by_K with limit K=$Klimit:
+at K=$KK the next seqno is $K_next_seqno
+K_next_seqno = $K_next_seqno is Out of bounds
+EOM
+ }
+ if ( !$rLL->[$K_next_seqno]->[_TYPE_SEQUENCE_] ) {
+ Fault(<<EOM);
+Error detected in array rK_next_seqno_by_K with limit K=$Klimit:
+at K=$KK the next seqno is $K_next_seqno:
+K_next_seqno = $K_next_seqno does not have a sequence number
+EOM
+ }
+ }
+
+ # upon hitting an undef, the remaining values should also be undef
+ if ( defined($K_last_seqno) ) {
+ foreach my $KK ( $K_last_seqno + 1 .. $Klimit ) {
+ my $Ktest = $rK_next_seqno_by_K->[$KK];
+ next if ( !defined($Ktest) );
+ Fault(<<EOM);
+Error detected in array rK_next_seqno_by_K with limit K=$Klimit
+with first undef at $K_last_seqno
+at K=$KK the next seqno is defined and is $Ktest
+EOM
+ }
+ }
+
+ #-----------------------------
+ # Check hash $rparent_of_seqno
+ #-----------------------------
+ my $rparent_of_seqno = $self->[_rparent_of_seqno_];
+ foreach my $seqno ( keys %{$rparent_of_seqno} ) {
+
+ # parent sequence numbers must always be less
+ my $seqno_parent = $rparent_of_seqno->{$seqno};
+ if ( $seqno_parent >= $seqno ) {
+ Fault(<<EOM);
+Error detected in hash rparent_of_seqno:
+The parent of seqno=$seqno is $seqno_parent but it should be less
+EOM
+ }
+ }
return;
} ## end sub check_token_array
{ ## begin closure check_line_hashes
- # This code checks that no autovivification occurs in the 'line' hash
+ # This code checks that no auto-vivification occurs in the 'line' hash
my %valid_line_hash;
_line_text
_line_type
_paren_depth
- _quote_character
_rK_range
_square_bracket_depth
_starting_in_quote
sub check_line_hashes {
my $self = shift;
my $rlines = $self->[_rlines_];
+
+ # Note that the keys ending in _0 are only required when a logfile
+ # is being saved, so we will just check for unknown keys, but not
+ # require an exact match.
foreach my $rline ( @{$rlines} ) {
my $iline = $rline->{_line_number};
my $line_type = $rline->{_line_type};
check_keys( $rline, \%valid_line_hash,
- "Checkpoint: line number =$iline, line_type=$line_type", 1 );
+ "Checkpoint: line number =$iline, line_type=$line_type", 0 );
}
return;
} ## end sub check_line_hashes
return;
}
- sub get_logger_object {
- return $logger_object;
- }
-
sub get_input_stream_name {
my $input_stream_name = EMPTY_STRING;
if ($logger_object) {
# interface to Perl::Tidy::Logger routines
sub warning {
- my ($msg) = @_;
- if ($logger_object) { $logger_object->warning($msg); }
+ my ( $msg, ($msg_line_number) ) = @_;
+
+ # Issue a warning message
+ # Given:
+ # $msg = text of warning
+ # $msg_line_number = optional line number prefix
+ if ($logger_object) {
+ $logger_object->warning( $msg, $msg_line_number );
+ }
return;
- }
+ } ## end sub warning
sub complain {
- my ($msg) = @_;
+ my ( $msg, ($msg_line_number) ) = @_;
+
+ # Issue a complaint message
+ # Given:
+ # $msg = text of complaint
+ # $msg_line_number = optional line number prefix
if ($logger_object) {
- $logger_object->complain($msg);
+ $logger_object->complain( $msg, $msg_line_number );
}
return;
} ## end sub complain
return;
}
+ # Available for debugging but not currently used:
sub write_diagnostics {
- my ($msg) = @_;
+ my ( $msg, $line_number ) = @_;
if ($diagnostics_object) {
- $diagnostics_object->write_diagnostics($msg);
+ $diagnostics_object->write_diagnostics( $msg, $line_number );
}
return;
} ## end sub write_diagnostics
return $self->[_converged_];
}
+sub want_second_iteration {
+ my ($self) = @_;
+ return $self->[_want_second_iteration_];
+}
+
sub get_output_line_number {
my ($self) = @_;
my $vao = $self->[_vertical_aligner_object_];
return;
} ## end sub write_unindented_line
+sub dump_verbatim {
+ my $self = shift;
+
+ # Dump the input file to the output verbatim. This is called when
+ # there is a severe error and formatted output cannot be made.
+ my $rlines = $self->[_rlines_];
+ foreach my $line ( @{$rlines} ) {
+ my $input_line = $line->{_line_text};
+ $self->write_unindented_line($input_line);
+ }
+ return;
+} ## end sub dump_verbatim
+
sub consecutive_nonblank_lines {
my ($self) = @_;
my $file_writer_object = $self->[_file_writer_object_];
sub split_words {
- # given a string containing words separated by whitespace,
- # return the list of words
+ # Given: a string containing words separated by whitespace,
+ # Return: the corresponding list of words
my ($str) = @_;
- return unless $str;
+ return unless defined($str);
$str =~ s/\s+$//;
$str =~ s/^\s+//;
- return split( /\s+/, $str );
+ return unless length($str);
+ return split /\s+/, $str;
} ## end sub split_words
-###########################################
-# CODE SECTION 3: Check and process options
-###########################################
-
-sub check_options {
+sub K_next_code {
+ my ( $self, $KK, ($rLL) ) = @_;
- # This routine is called to check the user-supplied run parameters
- # and to configure the control hashes to them.
- $rOpts = shift;
+ # Given:
+ # $KK = index of a token in $rLL
+ # $rLL = optional token array to use (default is $self->[_rLL_])
+ # Return:
+ # The index of the next nonblank, non-comment token after $KK, or
+ # undef if none
- $controlled_comma_style = 0;
+ return if ( !defined($KK) );
+ return if ( $KK < 0 );
- initialize_whitespace_hashes();
- initialize_bond_strength_hashes();
+ # The optional third arg is useful when we are copying tokens from an old
+ # $rLL to a new $rLL array.
+ $rLL = $self->[_rLL_] if ( !defined($rLL) );
- # This function must be called early to get hashes with grep initialized
- initialize_grep_and_friends();
+ my $Num = @{$rLL};
+ while ( ++$KK < $Num ) {
+ my $type = $rLL->[$KK]->[_TYPE_];
+ if ( $type ne 'b' && $type ne '#' ) {
+ return $KK;
+ }
+ } ## end while ( ++$KK < $Num )
- # Make needed regex patterns for matching text.
- # NOTE: sub_matching_patterns must be made first because later patterns use
- # them; see RT #133130.
- make_sub_matching_pattern(); # must be first pattern made
- make_static_block_comment_pattern();
- make_static_side_comment_pattern();
- make_closing_side_comment_prefix();
- make_closing_side_comment_list_pattern();
- $format_skipping_pattern_begin =
- make_format_skipping_pattern( 'format-skipping-begin', '#<<<' );
- $format_skipping_pattern_end =
- make_format_skipping_pattern( 'format-skipping-end', '#>>>' );
- make_non_indenting_brace_pattern();
+ return;
+} ## end sub K_next_code
- # If closing side comments ARE selected, then we can safely
- # delete old closing side comments unless closing side comment
- # warnings are requested. This is a good idea because it will
- # eliminate any old csc's which fall below the line count threshold.
- # We cannot do this if warnings are turned on, though, because we
- # might delete some text which has been added. So that must
- # be handled when comments are created. And we cannot do this
- # with -io because -csc will be skipped altogether.
- if ( $rOpts->{'closing-side-comments'} ) {
- if ( !$rOpts->{'closing-side-comment-warnings'}
- && !$rOpts->{'indent-only'} )
- {
- $rOpts->{'delete-closing-side-comments'} = 1;
- }
- }
+sub K_next_nonblank {
+ my ( $self, $KK, ($rLL) ) = @_;
- # If closing side comments ARE NOT selected, but warnings ARE
- # selected and we ARE DELETING csc's, then we will pretend to be
- # adding with a huge interval. This will force the comments to be
- # generated for comparison with the old comments, but not added.
- elsif ( $rOpts->{'closing-side-comment-warnings'} ) {
- if ( $rOpts->{'delete-closing-side-comments'} ) {
- $rOpts->{'delete-closing-side-comments'} = 0;
- $rOpts->{'closing-side-comments'} = 1;
- $rOpts->{'closing-side-comment-interval'} = 100_000_000;
- }
- }
+ # Given:
+ # $KK = index of a token in $rLL
+ # $rLL = optional token array to use (default is $self->[_rLL_])
+ # Return:
+ # The index of the next nonblank token after $KK, or
+ # undef if none
- make_bli_pattern();
+ # NOTE: does not skip over the leading type 'q' of a hanging side comment
+ # (use K_next_code)
+ return if ( !defined($KK) );
+ return if ( $KK < 0 );
- make_bl_pattern();
+ # use the standard array unless given otherwise
+ $rLL = $self->[_rLL_] if ( !defined($rLL) );
- make_block_brace_vertical_tightness_pattern();
+ # Normally, consecutive blanks do not occur. We could test for that
+ # here, but there are checks in the 'store_token' subs.
+ my $Num = @{$rLL};
+ while ( ++$KK < $Num ) {
+ if ( $rLL->[$KK]->[_TYPE_] ne 'b' ) { return $KK }
+ }
- make_blank_line_pattern();
+ return;
+} ## end sub K_next_nonblank
- make_keyword_group_list_pattern();
+sub K_previous_code {
- prepare_cuddled_block_types();
+ my ( $self, $KK, ($rLL) ) = @_;
- if ( $rOpts->{'dump-cuddled-block-list'} ) {
- dump_cuddled_block_list(*STDOUT);
- Exit(0);
- }
+ # Given:
+ # $KK = index of a token in $rLL
+ # $rLL = optional token array to use (default is $self->[_rLL_])
+ # Return:
+ # The index of the previous nonblank, non-comment token after $KK, or
+ # undef if none
+ # Call with $KK=undef to start search at the top of the array
- # -xlp implies -lp
- if ( $rOpts->{'extended-line-up-parentheses'} ) {
- $rOpts->{'line-up-parentheses'} ||= 1;
- }
+ # The optional third arg is useful when we are copying tokens from an old
+ # $rLL to a new $rLL array.
+ $rLL = $self->[_rLL_] unless ( defined($rLL) );
- if ( $rOpts->{'line-up-parentheses'} ) {
+ my $Num = @{$rLL};
+ if ( !defined($KK) ) { $KK = $Num }
- if ( $rOpts->{'indent-only'}
- || !$rOpts->{'add-newlines'}
- || !$rOpts->{'delete-old-newlines'} )
- {
- Warn(<<EOM);
------------------------------------------------------------------------
-Conflict: -lp conflicts with -io, -fnl, -nanl, or -ndnl; ignoring -lp
-
-The -lp indentation logic requires that perltidy be able to coordinate
-arbitrarily large numbers of line breakpoints. This isn't possible
-with these flags.
------------------------------------------------------------------------
-EOM
- $rOpts->{'line-up-parentheses'} = 0;
- $rOpts->{'extended-line-up-parentheses'} = 0;
- }
+ if ( $KK > $Num ) {
- if ( $rOpts->{'whitespace-cycle'} ) {
- Warn(<<EOM);
-Conflict: -wc cannot currently be used with the -lp option; ignoring -wc
+ # This fault can be caused by a programming error in which a bad $KK is
+ # given. The caller should make the first call with KK_new=undef to
+ # avoid this error.
+ if (DEVEL_MODE) {
+ Fault(<<EOM);
+Program Bug: K_previous_nonblank_new called with K=$KK which exceeds $Num
EOM
- $rOpts->{'whitespace-cycle'} = 0;
}
+ return;
}
- # At present, tabs are not compatible with the line-up-parentheses style
- # (it would be possible to entab the total leading whitespace
- # just prior to writing the line, if desired).
- if ( $rOpts->{'line-up-parentheses'} && $rOpts->{'tabs'} ) {
- Warn(<<EOM);
-Conflict: -t (tabs) cannot be used with the -lp option; ignoring -t; see -et.
-EOM
- $rOpts->{'tabs'} = 0;
+ while ( --$KK >= 0 ) {
+ my $type = $rLL->[$KK]->[_TYPE_];
+ if ( $type ne 'b' && $type ne '#' ) { return $KK }
}
+ return;
+} ## end sub K_previous_code
- # Likewise, tabs are not compatible with outdenting..
- if ( $rOpts->{'outdent-keywords'} && $rOpts->{'tabs'} ) {
- Warn(<<EOM);
-Conflict: -t (tabs) cannot be used with the -okw options; ignoring -t; see -et.
-EOM
- $rOpts->{'tabs'} = 0;
- }
+sub K_previous_nonblank {
- if ( $rOpts->{'outdent-labels'} && $rOpts->{'tabs'} ) {
- Warn(<<EOM);
-Conflict: -t (tabs) cannot be used with the -ola option; ignoring -t; see -et.
+ my ( $self, $KK, ($rLL) ) = @_;
+
+ # Given:
+ # $KK = index of a token in $rLL
+ # $rLL = optional token array to use (default is $self->[_rLL_])
+ # Return:
+ # The index of the previous nonblank token after $KK, or
+ # undef if none
+ # Call with $KK=undef to start search at the top of the array
+
+ # NOTE: does not skip over the leading type 'q' of a hanging side comment
+ # (use K_previous_code)
+
+ # use the standard array unless given otherwise
+ $rLL = $self->[_rLL_] unless ( defined($rLL) );
+ my $Num = @{$rLL};
+ if ( !defined($KK) ) { $KK = $Num }
+
+ if ( $KK > $Num ) {
+
+ # This fault can be caused by a programming error in which a bad $KK is
+ # given. The caller should make the first call with KK_new=undef to
+ # avoid this error.
+ if (DEVEL_MODE) {
+ Fault(<<EOM);
+Program Bug: K_previous_nonblank_new called with K=$KK which exceeds $Num
EOM
- $rOpts->{'tabs'} = 0;
+ }
+ return;
}
- if ( !$rOpts->{'space-for-semicolon'} ) {
- $want_left_space{'f'} = -1;
+ # Normally, consecutive blanks do not occur. We could test for that
+ # here, but there are checks in the 'store_token' subs.
+ while ( --$KK >= 0 ) {
+ if ( $rLL->[$KK]->[_TYPE_] ne 'b' ) { return $KK }
}
- if ( $rOpts->{'space-terminal-semicolon'} ) {
- $want_left_space{';'} = 1;
- }
+ return;
+} ## end sub K_previous_nonblank
- # We should put an upper bound on any -sil=n value. Otherwise enormous
- # files could be created by mistake.
- for ( $rOpts->{'starting-indentation-level'} ) {
- if ( $_ && $_ > 100 ) {
- Warn(<<EOM);
-The value --starting-indentation-level=$_ is very large; a mistake? resetting to 0;
-EOM
- $_ = 0;
- }
- }
+sub K_first_code {
- # Require -msp > 0 to avoid future parsing problems (issue c147)
- for ( $rOpts->{'minimum-space-to-comment'} ) {
- if ( !$_ || $_ <= 0 ) { $_ = 1 }
- }
+ my ( $self, ($rLL) ) = @_;
- # implement outdenting preferences for keywords
- %outdent_keyword = ();
- my @okw = split_words( $rOpts->{'outdent-keyword-list'} );
- unless (@okw) {
- @okw = qw(next last redo goto return); # defaults
- }
+ # Given:
+ # $rLL = optional token array to override default
+ # Return:
+ # index $K of first non-blank, non-comment code token, or
+ # undef if none (no tokens in the file)
- # FUTURE: if not a keyword, assume that it is an identifier
- foreach (@okw) {
- if ( $Perl::Tidy::Tokenizer::is_keyword{$_} ) {
- $outdent_keyword{$_} = 1;
- }
- else {
- Warn("ignoring '$_' in -okwl list; not a perl keyword");
- }
- }
+ $rLL = $self->[_rLL_] unless ( defined($rLL) );
- # setup hash for -kpit option
- %keyword_paren_inner_tightness = ();
- my $kpit_value = $rOpts->{'keyword-paren-inner-tightness'};
- if ( defined($kpit_value) && $kpit_value != 1 ) {
- my @kpit =
- split_words( $rOpts->{'keyword-paren-inner-tightness-list'} );
- unless (@kpit) {
- @kpit = qw(if elsif unless while until for foreach); # defaults
- }
+ return unless @{$rLL};
+ my $KK = 0;
+ my $type = $rLL->[$KK]->[_TYPE_];
+ if ( $type ne 'b' && $type ne '#' ) { return $KK }
+ return $self->K_next_code($KK);
+} ## end sub K_first_code
- # we will allow keywords and user-defined identifiers
- foreach (@kpit) {
- $keyword_paren_inner_tightness{$_} = $kpit_value;
- }
- }
+sub K_last_code {
- # implement user whitespace preferences
- if ( my @q = split_words( $rOpts->{'want-left-space'} ) ) {
- @want_left_space{@q} = (1) x scalar(@q);
- }
+ my ( $self, ($rLL) ) = @_;
- if ( my @q = split_words( $rOpts->{'want-right-space'} ) ) {
- @want_right_space{@q} = (1) x scalar(@q);
- }
+ # Given:
+ # $rLL = optional token array to override default
+ # Return:
+ # index of last non-blank, non-comment code token, or
+ # undef if none (no tokens in the file)
- if ( my @q = split_words( $rOpts->{'nowant-left-space'} ) ) {
- @want_left_space{@q} = (-1) x scalar(@q);
- }
+ $rLL = $self->[_rLL_] unless ( defined($rLL) );
- if ( my @q = split_words( $rOpts->{'nowant-right-space'} ) ) {
- @want_right_space{@q} = (-1) x scalar(@q);
- }
- if ( $rOpts->{'dump-want-left-space'} ) {
- dump_want_left_space(*STDOUT);
- Exit(0);
- }
+ return unless @{$rLL};
+ my $KK = @{$rLL} - 1;
+ my $type = $rLL->[$KK]->[_TYPE_];
+ if ( $type ne 'b' && $type ne '#' ) { return $KK }
+ return $self->K_previous_code($KK);
+} ## end sub K_last_code
- if ( $rOpts->{'dump-want-right-space'} ) {
- dump_want_right_space(*STDOUT);
- Exit(0);
+sub get_parent_containers {
+ my ( $self, $seqno ) = @_;
+
+ # Given:
+ # $seqno = sequence number of a container
+ # Return:
+ # ref to a list of parent container sequence numbers
+ my @list;
+ if ($seqno) {
+ my $rparent_of_seqno = $self->[_rparent_of_seqno_];
+ my $seqno_last = $seqno;
+ while ( $seqno = $rparent_of_seqno->{$seqno} ) {
+ last if ( $seqno == SEQ_ROOT );
+ if ( $seqno >= $seqno_last ) {
+ ## shouldn't happen - parent containers have lower seq numbers
+ DEVEL_MODE && Fault(<<EOM);
+Error in 'rparent_of_seqno': expecting seqno=$seqno < last seqno=$seqno_last
+EOM
+ last;
+ }
+ $seqno_last = $seqno;
+ push @list, $seqno;
+ } ## end while ( $seqno = $rparent_of_seqno...)
}
+ return \@list;
+} ## end sub get_parent_containers
- initialize_space_after_keyword();
+sub mark_parent_containers {
+ my ( $self, $seqno, $rhash, ($value) ) = @_;
- initialize_token_break_preferences();
+ # Task:
+ # set $rhash->{$seqno}=$value for all parent containers
+ # but not for $seqno itself
- #--------------------------------------------------------------
- # The combination -lp -iob -vmll -bbx=2 can be unstable (b1266)
- #--------------------------------------------------------------
- # The -vmll and -lp parameters do not really work well together.
- # To avoid instabilities, we will change any -bbx=2 to -bbx=1 (stable).
- # NOTE: we could make this more precise by looking at any exclusion
- # flags for -lp, and allowing -bbx=2 for excluded types.
- if ( $rOpts->{'variable-maximum-line-length'}
- && $rOpts->{'ignore-old-breakpoints'}
- && $rOpts->{'line-up-parentheses'} )
- {
- my @changed;
- foreach my $key ( keys %break_before_container_types ) {
- if ( $break_before_container_types{$key} == 2 ) {
- $break_before_container_types{$key} = 1;
- push @changed, $key;
- }
+ # Given:
+ # $seqno = sequence number of a container
+ # $rhash = ref to a hash with seqno as key
+ # $value = value for setting $rhash->{$seqno}=$value
+ # default = 1
+
+ return unless ($seqno);
+ if ( !defined($value) ) { $value = 1 }
+ my $rparent_of_seqno = $self->[_rparent_of_seqno_];
+ my $seqno_last = $seqno;
+ while ( $seqno = $rparent_of_seqno->{$seqno} ) {
+ last if ( $seqno == SEQ_ROOT );
+ if ( $seqno >= $seqno_last ) {
+ ## shouldn't happen - parent containers have lower sequence numbers
+ DEVEL_MODE && Fault(<<EOM);
+Error in 'rparent_of_seqno': expecting seqno=$seqno < last seqno=$seqno_last
+EOM
+ last;
}
- if (@changed) {
+ $seqno_last = $seqno;
+ $rhash->{$seqno} = $value;
+ } ## end while ( $seqno = $rparent_of_seqno...)
+ return;
+} ## end sub mark_parent_containers
- # we could write a warning here
- }
- }
+sub copy_token_as_type {
- #-----------------------------------------------------------
- # The combination -lp -vmll can be unstable if -ci<2 (b1267)
- #-----------------------------------------------------------
- # The -vmll and -lp parameters do not really work well together.
- # This is a very crude fix for an unusual parameter combination.
- if ( $rOpts->{'variable-maximum-line-length'}
- && $rOpts->{'line-up-parentheses'}
- && $rOpts->{'continuation-indentation'} < 2 )
- {
- $rOpts->{'continuation-indentation'} = 2;
- ##Warn("Increased -ci=n to n=2 for stability with -lp and -vmll\n");
- }
+ # This provides a quick way to create a new token by
+ # slightly modifying an existing token.
+ my ( $rold_token, $type, $token ) = @_;
- #-----------------------------------------------------------
- # The combination -lp -vmll -atc -dtc can be unstable
- #-----------------------------------------------------------
- # This fixes b1386 b1387 b1388 which had -wtc='b'
- # Updated to to include any -wtc to fix b1426
- if ( $rOpts->{'variable-maximum-line-length'}
- && $rOpts->{'line-up-parentheses'}
- && $rOpts->{'add-trailing-commas'}
- && $rOpts->{'delete-trailing-commas'}
- && $rOpts->{'want-trailing-commas'} )
- {
- $rOpts->{'delete-trailing-commas'} = 0;
-## Issuing a warning message causes trouble with test cases, and this combo is
-## so rare that it is unlikely to not occur in practice. So skip warning.
-## Warn(
-##"The combination -vmll -lp -atc -dtc can be unstable; turning off -dtc\n"
-## );
- }
+ my @rnew_token = @{$rold_token};
+ $rnew_token[_TYPE_] = $type;
+ $rnew_token[_TOKEN_] = $token;
+ $rnew_token[_TYPE_SEQUENCE_] = EMPTY_STRING;
+ return \@rnew_token;
+} ## end sub copy_token_as_type
- %container_indentation_options = ();
- foreach my $pair (
- [ 'break-before-hash-brace-and-indent', '{' ],
- [ 'break-before-square-bracket-and-indent', '[' ],
- [ 'break-before-paren-and-indent', '(' ],
- )
- {
- my ( $key, $tok ) = @{$pair};
- my $opt = $rOpts->{$key};
- if ( defined($opt) && $opt > 0 && $break_before_container_types{$tok} )
- {
+sub parent_seqno_by_K {
- # (1) -lp is not compatible with opt=2, silently set to opt=0
- # (2) opt=0 and 2 give same result if -i=-ci; but opt=0 is faster
- # (3) set opt=0 if -i < -ci (can be unstable, case b1355)
- if ( $opt == 2 ) {
- if (
- $rOpts->{'line-up-parentheses'}
- || ( $rOpts->{'indent-columns'} <=
- $rOpts->{'continuation-indentation'} )
- )
- {
- $opt = 0;
- }
- }
- $container_indentation_options{$tok} = $opt;
- }
- }
+ # Return the sequence number of the parent container of token K, if any.
- $right_bond_strength{'{'} = WEAK;
- $left_bond_strength{'{'} = VERY_STRONG;
+ my ( $self, $KK ) = @_;
+ my $rLL = $self->[_rLL_];
- # make -l=0 equal to -l=infinite
- if ( !$rOpts->{'maximum-line-length'} ) {
- $rOpts->{'maximum-line-length'} = 1_000_000;
+ # The task is to jump forward to the next container token
+ # and use the sequence number of either it or its parent.
+
+ # For example, consider the following with seqno=5 of the '[' and ']'
+ # being called with index K of the first token of each line:
+
+ # # result
+ # push @tests, # -
+ # [ # -
+ # sub { 99 }, 'do {&{%s} for 1,2}', # 5
+ # '(&{})(&{})', undef, # 5
+ # [ 2, 2, 0 ], 0 # 5
+ # ]; # -
+
+ # NOTE: The ending parent will be SEQ_ROOT for a balanced file. For
+ # unbalanced files, last sequence number will either be undefined or it may
+ # be at a deeper level. In either case we will just return SEQ_ROOT to
+ # have a defined value and allow formatting to proceed.
+ my $parent_seqno = SEQ_ROOT;
+ return $parent_seqno if ( !defined($KK) );
+ my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
+ if ($type_sequence) {
+ $parent_seqno = $self->[_rparent_of_seqno_]->{$type_sequence};
}
+ else {
+ my $Kt = $self->[_rK_next_seqno_by_K_]->[$KK];
+ if ( defined($Kt) ) {
+ $type_sequence = $rLL->[$Kt]->[_TYPE_SEQUENCE_];
+ my $type = $rLL->[$Kt]->[_TYPE_];
- # make -lbl=0 equal to -lbl=infinite
- if ( !$rOpts->{'long-block-line-count'} ) {
- $rOpts->{'long-block-line-count'} = 1_000_000;
+ # if next container token is closing, it is the parent seqno
+ if ( $is_closing_type{$type} ) {
+ $parent_seqno = $type_sequence;
+ }
+
+ # otherwise we want its parent container
+ else {
+ $parent_seqno = $self->[_rparent_of_seqno_]->{$type_sequence};
+ }
+ }
}
+ $parent_seqno = SEQ_ROOT if ( !defined($parent_seqno) );
+ return $parent_seqno;
+} ## end sub parent_seqno_by_K
- # hashes used to simplify setting whitespace
- %tightness = (
- '{' => $rOpts->{'brace-tightness'},
- '}' => $rOpts->{'brace-tightness'},
- '(' => $rOpts->{'paren-tightness'},
- ')' => $rOpts->{'paren-tightness'},
- '[' => $rOpts->{'square-bracket-tightness'},
- ']' => $rOpts->{'square-bracket-tightness'},
- );
+sub parent_sub_seqno {
+ my ( $self, $seqno_paren ) = @_;
- if ( $rOpts->{'ignore-old-breakpoints'} ) {
+ # Find sequence number of the named sub (not asub) which contains a given
+ # sequenced item
- my @conflicts;
- if ( $rOpts->{'break-at-old-method-breakpoints'} ) {
- $rOpts->{'break-at-old-method-breakpoints'} = 0;
- push @conflicts, '--break-at-old-method-breakpoints (-bom)';
- }
- if ( $rOpts->{'break-at-old-comma-breakpoints'} ) {
- $rOpts->{'break-at-old-comma-breakpoints'} = 0;
- push @conflicts, '--break-at-old-comma-breakpoints (-boc)';
- }
- if ( $rOpts->{'break-at-old-semicolon-breakpoints'} ) {
- $rOpts->{'break-at-old-semicolon-breakpoints'} = 0;
- push @conflicts, '--break-at-old-semicolon-breakpoints (-bos)';
- }
- if ( $rOpts->{'keep-old-breakpoints-before'} ) {
- $rOpts->{'keep-old-breakpoints-before'} = EMPTY_STRING;
- push @conflicts, '--keep-old-breakpoints-before (-kbb)';
- }
- if ( $rOpts->{'keep-old-breakpoints-after'} ) {
- $rOpts->{'keep-old-breakpoints-after'} = EMPTY_STRING;
- push @conflicts, '--keep-old-breakpoints-after (-kba)';
+ # Given:
+ # $seqno_paren = sequence number of a token within the sub
+ # Returns:
+ # $seqno of the sub, or
+ # nothing if no sub found
+ return unless defined($seqno_paren);
+
+ # Search upward
+ my $seqno = $seqno_paren;
+ my $seqno_last = $seqno_paren;
+ while ( $seqno = $self->[_rparent_of_seqno_]->{$seqno} ) {
+ last if ( $seqno == SEQ_ROOT );
+ if ( $self->[_ris_sub_block_]->{$seqno} ) {
+ return $seqno;
+ }
+ if ( $seqno >= $seqno_last ) {
+ ## shouldn't happen - parent containers have lower sequence numbers
+ DEVEL_MODE && Fault(<<EOM);
+Error in 'rparent_of_seqno': expecting seqno=$seqno < last seqno=$seqno_last
+EOM
+ last;
}
+ $seqno_last = $seqno;
+ } ## end while ( $seqno = $self->[...])
+ return;
+} ## end sub parent_sub_seqno
- if (@conflicts) {
- my $msg = join( "\n ",
-" Conflict: These conflicts with --ignore-old-breakponts (-iob) will be turned off:",
- @conflicts )
- . "\n";
- Warn($msg);
- }
+sub parent_sub_seqno_by_K {
+ my ( $self, $KK ) = @_;
- # Note: These additional parameters are made inactive by -iob.
- # They are silently turned off here because they are on by default.
- # We would generate unexpected warnings if we issued a warning.
- $rOpts->{'break-at-old-keyword-breakpoints'} = 0;
- $rOpts->{'break-at-old-logical-breakpoints'} = 0;
- $rOpts->{'break-at-old-ternary-breakpoints'} = 0;
- $rOpts->{'break-at-old-attribute-breakpoints'} = 0;
- }
+ #--------------------------------------------------------------------
+ # NOTE: not currently called but keep for possible future development
+ #--------------------------------------------------------------------
- %keep_break_before_type = ();
- initialize_keep_old_breakpoints( $rOpts->{'keep-old-breakpoints-before'},
- 'kbb', \%keep_break_before_type );
+ # Find sequence number of the named sub which contains a given token
+ # Given:
+ # $K = index K of a token
+ # Returns:
+ # $seqno of the sub, or
+ # nothing if no sub found
- %keep_break_after_type = ();
- initialize_keep_old_breakpoints( $rOpts->{'keep-old-breakpoints-after'},
- 'kba', \%keep_break_after_type );
+ return unless defined($KK);
- # Modify %keep_break_before and %keep_break_after to avoid conflicts
- # with %want_break_before; fixes b1436.
- # This became necessary after breaks for some tokens were converted
- # from hard to soft (see b1433).
- # We could do this for all tokens, but to minimize changes to existing
- # code we currently only do this for the soft break tokens.
- foreach my $key ( keys %keep_break_before_type ) {
- if ( defined( $want_break_before{$key} )
- && !$want_break_before{$key}
- && $is_soft_keep_break_type{$key} )
- {
- $keep_break_after_type{$key} = $keep_break_before_type{$key};
- delete $keep_break_before_type{$key};
+ my $seqno_sub;
+ my $parent_seqno = $self->parent_seqno_by_K($KK);
+ if ( $self->[_ris_sub_block_]->{$parent_seqno} ) {
+ $seqno_sub = $parent_seqno;
+ }
+ else {
+ $seqno_sub = $self->parent_sub_seqno($parent_seqno);
+ }
+ return $seqno_sub;
+} ## end sub parent_sub_seqno_by_K
+
+sub is_in_block_by_i {
+ my ( $self, $i ) = @_;
+
+ # Return true if
+ # token at i is contained in a BLOCK
+ # or is at root level
+ # or there is some kind of error (i.e. unbalanced file)
+ # Return false otherwise
+
+ if ( $i < 0 ) {
+ DEVEL_MODE && Fault("Bad call, i='$i'\n");
+ return 1;
+ }
+
+ my $seqno = $parent_seqno_to_go[$i];
+ return 1 if ( !$seqno || $seqno == SEQ_ROOT );
+ return 1 if ( $self->[_rblock_type_of_seqno_]->{$seqno} );
+ return;
+} ## end sub is_in_block_by_i
+
+sub is_in_block_by_K {
+ my ( $self, $KK ) = @_;
+
+ # Return true if
+ # token at $KK is contained in a BLOCK
+ # or is at root level
+ # or there is some kind of error (i.e. unbalanced file)
+ # Return false otherwise
+
+ my $parent_seqno = $self->parent_seqno_by_K($KK);
+ return SEQ_ROOT if ( !$parent_seqno || $parent_seqno == SEQ_ROOT );
+ return $self->[_rblock_type_of_seqno_]->{$parent_seqno};
+} ## end sub is_in_block_by_K
+
+sub is_in_list_by_i {
+ my ( $self, $i ) = @_;
+
+ # Return true if token at i is contained in a LIST
+ # Return false otherwise
+ my $seqno = $parent_seqno_to_go[$i];
+ return if ( !$seqno );
+ return if ( $seqno == SEQ_ROOT );
+ if ( $self->[_ris_list_by_seqno_]->{$seqno} ) {
+ return 1;
+ }
+ return;
+} ## end sub is_in_list_by_i
+
+sub is_list_by_seqno {
+
+ # Return true if the immediate contents of a container appears to be a
+ # list.
+ my ( $self, $seqno ) = @_;
+ return unless defined($seqno);
+ return $self->[_ris_list_by_seqno_]->{$seqno};
+} ## end sub is_list_by_seqno
+
+sub is_interpolated_here_doc {
+ my ($token) = @_;
+
+ # Given:
+ # $token = the token text of a type 'h' token
+ # Return:
+ # true if the here doc is interpolated
+ # false if not
+
+ # Examples:
+ # <<EOM <-- interpolated
+ # <<"EOM" <-- interpolated
+ # <<'EOM' <-- not interpolated
+ return $token !~ /^ [^<]* << [~]? \' /x;
+} ## end sub is_interpolated_here_doc
+
+sub get_here_text {
+ my ( $self, $ix_HERE_BEG ) = @_;
+
+ # Collect the text of a here-doc
+ # Given:
+ # $ix_HERE_BEG = index of the line BEFORE the start of this here-doc
+ # Returns:
+ # $ix_HERE_END = line index of the last line of this here-doc
+ # $here_text = the here-doc text
+
+ # Example of $here_text with 2 lines:
+
+ # my $str=<<EOM; <--this line has index $ix_HERE_BEG
+ # here text line 1
+ # here text line 2
+ # EOM <--this line has index $ix_HERE_END
+
+ # If here-docs are stacked, then caller will use $ix_HERE_END as
+ # the beginning of the next here-doc.
+
+ my $rlines = $self->[_rlines_];
+
+ # Loop to collect the here doc text
+ my $ix_max = @{$rlines} - 1;
+ my $ix = $ix_HERE_BEG;
+ my $ix_HERE_END;
+ my $here_text = EMPTY_STRING;
+ while ( ++$ix <= $ix_max ) {
+ my $lhash = $rlines->[$ix];
+ my $ltype = $lhash->{_line_type};
+ if ( $ltype eq 'HERE' ) {
+ $here_text .= $lhash->{_line_text};
+ next;
+ }
+ elsif ( $ltype eq 'HERE_END' ) {
+ $ix_HERE_END = $ix;
+ last;
+ }
+ else {
+ DEVEL_MODE
+ && Fault("line_type=$ltype should be HERE..\n");
+ $ix_HERE_END = $ix;
+ last;
+ }
+ } ## end while ( ++$ix <= $ix_max )
+ return ( $ix_HERE_END, $here_text );
+} ## end sub get_here_text
+
+sub is_trailing_comma {
+ my ( $self, $KK ) = @_;
+
+ # Given:
+ # $KK - index of a comma in token list
+ # Return:
+ # true if the comma at index $KK is a trailing comma
+ # false if not
+
+ my $rLL = $self->[_rLL_];
+ my $type_KK = $rLL->[$KK]->[_TYPE_];
+ if ( $type_KK ne ',' ) {
+ DEVEL_MODE
+ && Fault("Bad call: expected type ',' but received '$type_KK'\n");
+ return;
+ }
+ my $Knnb = $self->K_next_nonblank($KK);
+ if ( defined($Knnb) ) {
+ my $type_sequence = $rLL->[$Knnb]->[_TYPE_SEQUENCE_];
+ my $type_Knnb = $rLL->[$Knnb]->[_TYPE_];
+ if ( $type_sequence && $is_closing_type{$type_Knnb} ) {
+ return 1;
}
}
- foreach my $key ( keys %keep_break_after_type ) {
- if ( defined( $want_break_before{$key} )
- && $want_break_before{$key}
- && $is_soft_keep_break_type{$key} )
+ return;
+} ## end sub is_trailing_comma
+
+sub cumulative_length_before_K {
+ my ( $self, $KK ) = @_;
+
+ # Returns the cumulative character length from the first token to
+ # token before the token at index $KK.
+ my $rLL = $self->[_rLL_];
+ return ( $KK <= 0 ) ? 0 : $rLL->[ $KK - 1 ]->[_CUMULATIVE_LENGTH_];
+} ## end sub cumulative_length_before_K
+
+###########################################
+# CODE SECTION 3: Check and process options
+###########################################
+
+sub check_options {
+
+ # This routine is called to check the user-supplied run parameters
+ # and to configure the control hashes to them.
+ ( $rOpts, my $wvt_in_args, my $num_files, my $line_range_clipped ) = @_;
+
+ initialize_whitespace_hashes();
+
+ if ( $rOpts->{'dump-want-left-space'} ) {
+ dump_want_left_space(*STDOUT);
+ Exit(0);
+ }
+
+ if ( $rOpts->{'dump-want-right-space'} ) {
+ dump_want_right_space(*STDOUT);
+ Exit(0);
+ }
+
+ initialize_bond_strength_hashes();
+
+ # This function must be called early to get hashes with grep initialized
+ initialize_grep_and_friends();
+
+ # Make needed regex patterns for matching text.
+ # NOTE: sub_matching_patterns must be made first because later patterns use
+ # them; see RT #133130.
+ make_sub_matching_pattern(); # MUST BE FIRST pattern made
+ make_static_block_comment_pattern();
+ make_static_side_comment_pattern();
+ $format_skipping_pattern_begin =
+ make_format_skipping_pattern( 'format-skipping-begin', '#<<<' );
+ $format_skipping_pattern_end =
+ make_format_skipping_pattern( 'format-skipping-end', '#>>>' );
+ make_non_indenting_brace_pattern();
+
+ initialize_closing_side_comments();
+
+ initialize_missing_else_comment();
+
+ initialize_call_paren_style();
+
+ initialize_warn_variable_types( $wvt_in_args, $num_files,
+ $line_range_clipped );
+
+ initialize_warn_mismatched();
+
+ make_bli_pattern();
+
+ make_bl_pattern();
+
+ make_block_brace_vertical_tightness_pattern();
+
+ make_blank_line_pattern();
+
+ make_keyword_group_list_pattern();
+
+ prepare_cuddled_block_types();
+
+ if ( $rOpts->{'dump-cuddled-block-list'} ) {
+ dump_cuddled_block_list(*STDOUT);
+ Exit(0);
+ }
+
+ # --indent-only skips the call to sub respace_tokens, which defines
+ # some essential data structures needed by some dump routines,
+ # or might be in the future. Since there is an immediate exit after a
+ # dump, we can turn off indent-only to get these structures for a -dump.
+ if ( $rOpts->{'indent-only'} ) {
+
+ if ( $rOpts->{'dump-mismatched-args'}
+ || $rOpts->{'dump-mismatched-returns'} )
{
- $keep_break_before_type{$key} = $keep_break_after_type{$key};
- delete $keep_break_after_type{$key};
+ $rOpts->{'indent-only'} = 0;
+ }
+
+ if ( $rOpts->{'dump-block-summary'} ) {
+ $rOpts->{'indent-only'} = 0;
}
}
- $controlled_comma_style ||= $keep_break_before_type{','};
- $controlled_comma_style ||= $keep_break_after_type{','};
+ initialize_line_up_parentheses();
+
+ check_tabs();
+
+ # We should put an upper bound on any -sil=n value. Otherwise enormous
+ # files could be created by mistake.
+ for ( $rOpts->{'starting-indentation-level'} ) {
+ if ( $_ && $_ > 100 ) {
+ Warn(<<EOM);
+The value --starting-indentation-level=$_ is very large; a mistake? resetting to 0;
+EOM
+ $_ = 0;
+ }
+ }
+
+ # Require -msp > 0 to avoid future parsing problems (issue c147)
+ for ( $rOpts->{'minimum-space-to-comment'} ) {
+ if ( !$_ || $_ <= 0 ) { $_ = 1 }
+ }
+
+ initialize_outdent_keyword();
+
+ initialize_keyword_paren_inner_tightness();
+
+ initialize_space_after_keyword();
+
+ initialize_extended_block_tightness_list();
+
+ # The flag '$controlled_comma_style' will be set if the user
+ # entered any of -wbb=',' -wba=',' -kbb=',' -kba=','
+ # see sub 'initialize_token_break_preferences',
+ # and sub 'initialize_old_breakpoint_controls'
+ $controlled_comma_style = 0;
+ initialize_token_break_preferences();
+ initialize_old_breakpoint_controls();
+
+ initialize_container_indentation_options();
+
+ # make -l=0 equal to -l=infinite
+ if ( !$rOpts->{'maximum-line-length'} ) {
+ $rOpts->{'maximum-line-length'} = 1_000_000;
+ }
+
+ # make -lbl=0 equal to -lbl=infinite
+ if ( !$rOpts->{'long-block-line-count'} ) {
+ $rOpts->{'long-block-line-count'} = 1_000_000;
+ }
+
+ initialize_tightness_vars();
+
+ initialize_multiple_token_tightness();
initialize_global_option_vars();
initialize_line_length_vars(); # after 'initialize_global_option_vars'
+ initialize_trailing_comma_break_rules();
+
initialize_trailing_comma_rules(); # after 'initialize_line_length_vars'
+ # and '_trailing_comma_break_rules'
+
+ initialize_interbracket_arrow_style();
initialize_weld_nested_exclusion_rules();
initialize_weld_fat_comma_rules();
- %line_up_parentheses_control_hash = ();
- $line_up_parentheses_control_is_lxpl = 1;
- my $lpxl = $rOpts->{'line-up-parentheses-exclusion-list'};
- my $lpil = $rOpts->{'line-up-parentheses-inclusion-list'};
- if ( $lpxl && $lpil ) {
- Warn( <<EOM );
-You entered values for both -lpxl=s and -lpil=s; the -lpil list will be ignored
-EOM
- }
- if ($lpxl) {
- $line_up_parentheses_control_is_lxpl = 1;
- initialize_line_up_parentheses_control_hash(
- $rOpts->{'line-up-parentheses-exclusion-list'}, 'lpxl' );
- }
- elsif ($lpil) {
- $line_up_parentheses_control_is_lxpl = 0;
- initialize_line_up_parentheses_control_hash(
- $rOpts->{'line-up-parentheses-inclusion-list'}, 'lpil' );
- }
+ initialize_lpxl_lpil();
return;
} ## end sub check_options
# re-initialize the hashes ... this is critical!
%is_sort_map_grep = ();
- my @q = qw(sort map grep);
+ my @q = qw( sort map grep );
@is_sort_map_grep{@q} = (1) x scalar(@q);
my $olbxl = $rOpts->{'one-line-block-exclusion-list'};
# Note that it is essential to always re-initialize the hash here:
%want_one_line_block = ();
if ( !$is_olb_exclusion_word{'*'} ) {
- foreach (qw(sort map grep eval)) {
+ foreach (qw( sort map grep eval )) {
if ( !$is_olb_exclusion_word{$_} ) { $want_one_line_block{$_} = 1 }
}
}
if (@grep_aliases) {
- @{is_sort_map_grep}{@grep_aliases} = (1) x scalar(@grep_aliases);
+ @is_sort_map_grep{@grep_aliases} = (1) x scalar(@grep_aliases);
if ( $want_one_line_block{'grep'} ) {
- @{want_one_line_block}{@grep_aliases} = (1) x scalar(@grep_aliases);
+ @want_one_line_block{@grep_aliases} = (1) x scalar(@grep_aliases);
}
}
- ##@q = qw(sort map grep eval);
%is_sort_map_grep_eval = %is_sort_map_grep;
$is_sort_map_grep_eval{'eval'} = 1;
- ##@q = qw(sort map grep eval do);
%is_sort_map_grep_eval_do = %is_sort_map_grep_eval;
$is_sort_map_grep_eval_do{'do'} = 1;
# we could remove sub and use ASUB pattern to also handle a
# prototype/signature. But that would slow things down and would probably
# never be useful.
- ##@q = qw( do sub eval sort map grep );
%is_block_with_ci = %is_sort_map_grep_eval_do;
$is_block_with_ci{'sub'} = 1;
- %is_keyword_returning_list = ();
- @q = qw(
- grep
- keys
- map
- reverse
- sort
- split
- );
+ @q = qw( grep keys map reverse sort split );
push @q, @grep_aliases;
+ %is_keyword_returning_list = ();
@is_keyword_returning_list{@q} = (1) x scalar(@q);
# This code enables vertical alignment of grep aliases for testing. It has
my $opt_name = 'weld-nested-exclusion-list';
my $str = $rOpts->{$opt_name};
+
+ # let a '0' be the same as not defined
return unless ($str);
$str =~ s/^\s+//;
$str =~ s/\s+$//;
my $msg2;
foreach my $item (@items) {
my $item_save = $item;
- my $tok = chop($item);
+ my $tok = chop $item;
my $key = $token_keys{$tok};
if ( !defined($key) ) {
$msg1 .= " '$item_save'";
return;
} ## end sub initialize_weld_fat_comma_rules
-sub initialize_line_up_parentheses_control_hash {
- my ( $str, $opt_name ) = @_;
- return unless ($str);
- $str =~ s/^\s+//;
- $str =~ s/\s+$//;
- return unless ($str);
+sub initialize_lpxl_lpil {
- # The format is space separated items, where each item must consist of a
- # string with a token type preceded by an optional text token and followed
- # by an integer:
- # For example:
+ %line_up_parentheses_control_hash = ();
+ $line_up_parentheses_control_is_lpxl = 1;
+ my $lpxl = $rOpts->{'line-up-parentheses-exclusion-list'};
+ my $lpil = $rOpts->{'line-up-parentheses-inclusion-list'};
+ if ( $lpxl && $lpil ) {
+ Warn(<<EOM);
+You entered values for both -lpxl=s and -lpil=s; the -lpil list will be ignored
+EOM
+ }
+ if ($lpxl) {
+ $line_up_parentheses_control_is_lpxl = 1;
+ initialize_line_up_parentheses_control_hash(
+ $rOpts->{'line-up-parentheses-exclusion-list'}, 'lpxl' );
+ }
+ elsif ($lpil) {
+ $line_up_parentheses_control_is_lpxl = 0;
+ initialize_line_up_parentheses_control_hash(
+ $rOpts->{'line-up-parentheses-inclusion-list'}, 'lpil' );
+ }
+ else {
+ # neither -lpxl nor -lpil specified
+ }
+ return;
+} ## end sub initialize_lpxl_lpil
+
+sub initialize_line_up_parentheses_control_hash {
+ my ( $str, $opt_name ) = @_;
+
+ # let a 0 be the same as not defined
+ return unless ($str);
+ $str =~ s/^\s+//;
+ $str =~ s/\s+$//;
+ return unless ($str);
+
+ # The format is space separated items, where each item must consist of a
+ # string with a token type preceded by an optional text token and followed
+ # by an integer:
+ # For example:
# W(1
# = (flag1)(key)(flag2), where
# flag1 = 'W'
foreach my $item (@items) {
my $item_save = $item;
my ( $flag1, $key, $flag2 );
- if ( $item =~ /^([^\(\]\{]*)?([\(\{\[])(\d)?$/ ) {
+ if ( $item =~ /^ ([^\(\[\{]*)? ([\(\{\[]) (\d)? $/x ) {
+ ## $flag1 $key $flag2
$flag1 = $1 if $1;
$key = $2 if $2;
- $flag2 = $3 if $3;
+ $flag2 = $3 if defined($3);
}
else {
$msg1 .= " '$item_save'";
}
# Check for valid flag1
- if ( !defined($flag1) ) { $flag1 = '*' }
- elsif ( $flag1 !~ /^[kKfFwW\*]$/ ) {
+ if ( !defined($flag1) ) { $flag1 = '*' }
+
+ if ( $flag1 !~ /^[kKfFwW\*]$/ ) {
$msg1 .= " '$item_save'";
next;
}
# 1 all containers with sublists match
# 2 all containers with sublists, code blocks or ternary operators match
# ... this could be extended in the future
- if ( !defined($flag2) ) { $flag2 = 0 }
- elsif ( $flag2 !~ /^[012]$/ ) {
+ if ( !defined($flag2) ) { $flag2 = 0 }
+
+ if ( $flag2 !~ /^[012]$/ ) {
$msg1 .= " '$item_save'";
next;
}
}
# Speedup: we can turn off -lp if it is not actually used
- if ($line_up_parentheses_control_is_lxpl) {
+ if ($line_up_parentheses_control_is_lpxl) {
my $all_off = 1;
foreach my $key (qw# ( { [ #) {
my $rflags = $line_up_parentheses_control_hash{$key};
sub initialize_space_after_keyword {
- # default keywords for which space is introduced before an opening paren
+ # Default keywords for which space is introduced before an opening paren:
# (at present, including them messes up vertical alignment)
- my @sak = qw(my local our and or xor err eq ne if else elsif until
- unless while for foreach return switch case given when catch);
+ my @sak = qw( my local our state and or xor err eq ne if else elsif until
+ unless while for foreach return switch case given when catch );
%space_after_keyword = map { $_ => 1 } @sak;
# first remove any or all of these if desired
if ( my @q = split_words( $rOpts->{'nospace-after-keyword'} ) ) {
# -nsak='*' selects all the above keywords
- if ( @q == 1 && $q[0] eq '*' ) { @q = keys(%space_after_keyword) }
+ if ( @q == 1 && $q[0] eq '*' ) { @q = keys %space_after_keyword }
@space_after_keyword{@q} = (0) x scalar(@q);
}
return;
} ## end sub initialize_space_after_keyword
+sub initialize_outdent_keyword {
+
+ # Implement outdenting preferences for keywords
+ %outdent_keyword = ();
+ my @okw = split_words( $rOpts->{'outdent-keyword-list'} );
+ if ( !@okw ) {
+ @okw = qw( next last redo goto return ); # defaults
+ }
+
+ # FUTURE: if not a keyword, assume that it is an identifier
+ foreach (@okw) {
+ if ( Perl::Tidy::Tokenizer::is_keyword($_) ) {
+ $outdent_keyword{$_} = 1;
+ }
+ else {
+ Warn("ignoring '$_' in -okwl list; not a perl keyword");
+ }
+ }
+ return;
+} ## end sub initialize_outdent_keyword
+
+sub initialize_keyword_paren_inner_tightness {
+
+ # Setup hash for -kpit option
+ %keyword_paren_inner_tightness = ();
+ my $kpit_value = $rOpts->{'keyword-paren-inner-tightness'};
+ if ( defined($kpit_value) && $kpit_value != 1 ) {
+ my @kpit =
+ split_words( $rOpts->{'keyword-paren-inner-tightness-list'} );
+ if ( !@kpit ) {
+ @kpit = qw( if elsif unless while until for foreach ); # defaults
+ }
+
+ # we will allow keywords and user-defined identifiers
+ foreach (@kpit) {
+ $keyword_paren_inner_tightness{$_} = $kpit_value;
+ }
+ }
+ return;
+} ## end sub initialize_keyword_paren_inner_tightness
+
+sub initialize_extended_block_tightness_list {
+
+ # Setup the control hash for --extended-block-tightness
+
+ # keywords taking indirect objects:
+ my @k_list = keys %is_indirect_object_taker;
+
+ # type symbols which may precede an opening block brace
+ my @t_list = qw( $ @ % & * );
+ push @t_list, '$#';
+
+ my @all = ( @k_list, @t_list );
+
+ # We will build the selection in %hash
+ # By default the option is 'on' for keywords only (-xbtl='k')
+ my %hash;
+ @hash{@k_list} = (1) x scalar(@k_list);
+ @hash{@t_list} = (0) x scalar(@t_list);
+
+ # This can be overridden with -xbtl="..."
+ my $long_name = 'extended-block-tightness-list';
+ if ( $rOpts->{$long_name} ) {
+ my @words = split_words( $rOpts->{$long_name} );
+ my @unknown;
+
+ # Turn everything off
+ @hash{@all} = (0) x scalar(@all);
+
+ # Then turn on selections
+ foreach my $word (@words) {
+
+ # 'print' etc turns on a specific word or symbol
+ if ( defined( $hash{$word} ) ) { $hash{$word} = 1; }
+
+ # 'k' turns on all keywords
+ elsif ( $word eq 'k' ) {
+ @hash{@k_list} = (1) x scalar(@k_list);
+ }
+
+ # 't' turns on all symbols
+ elsif ( $word eq 't' ) {
+ @hash{@t_list} = (1) x scalar(@t_list);
+ }
+
+ # 'kt' same as 'k' and 't' for convenience
+ elsif ( $word eq 'kt' ) {
+ @hash{@all} = (1) x scalar(@all);
+ }
+
+ # Anything else is an error
+ else { push @unknown, $word }
+ }
+ if (@unknown) {
+ my $num = @unknown;
+ local $LIST_SEPARATOR = SPACE;
+ Warn(<<EOM);
+$num unrecognized keyword(s) were input with --$long_name :
+@unknown
+EOM
+ }
+ }
+
+ # Transfer the result to the global hash
+ %extended_block_tightness_list = %hash;
+
+ return;
+} ## end sub initialize_extended_block_tightness_list
+
sub initialize_token_break_preferences {
- # implement user break preferences
+ # Initialize these global hashes defining break preferences:
+ # %want_break_before
+ # %break_before_container_types
+
my $break_after = sub {
my @toks = @_;
foreach my $tok (@toks) {
}
}
return;
- };
+ }; ## end $break_after = sub
my $break_before = sub {
my @toks = @_;
}
}
return;
- };
+ }; ## end $break_before = sub
$break_after->(@all_operators) if ( $rOpts->{'break-after-all-operators'} );
$break_before->(@all_operators)
$break_after->( split_words( $rOpts->{'want-break-after'} ) );
$break_before->( split_words( $rOpts->{'want-break-before'} ) );
- # make note if breaks are before certain key types
+ # Make note if breaks are before certain key types
+ # Added '->' for git #171.
%want_break_before = ();
- foreach my $tok ( @all_operators, ',' ) {
+ foreach my $tok ( @all_operators, ',', '->' ) {
$want_break_before{$tok} =
$left_bond_strength{$tok} < $right_bond_strength{$tok};
}
for ( $rOpts->{'break-before-paren'} ) {
$break_before_container_types{'('} = $_ if $_ && $_ > 0;
}
+
+ # Note: a fix for b1266 previously here is now covered by the
+ # updates for b1470, b1474, so it has been removed.
+
return;
} ## end sub initialize_token_break_preferences
-use constant DEBUG_KB => 0;
+sub initialize_line_up_parentheses {
-sub initialize_keep_old_breakpoints {
- my ( $str, $short_name, $rkeep_break_hash ) = @_;
- return unless $str;
+ # -xlp implies -lp
+ if ( $rOpts->{'extended-line-up-parentheses'} ) {
+ $rOpts->{'line-up-parentheses'} ||= 1;
+ }
- my %flags = ();
- my @list = split_words($str);
- if ( DEBUG_KB && @list ) {
- local $LIST_SEPARATOR = SPACE;
- print <<EOM;
-DEBUG_KB entering for '$short_name' with str=$str\n";
-list is: @list;
+ if ( $rOpts->{'line-up-parentheses'} ) {
+
+ if ( $rOpts->{'indent-only'}
+ || !$rOpts->{'add-newlines'}
+ || !$rOpts->{'delete-old-newlines'} )
+ {
+ Warn(<<EOM);
+-----------------------------------------------------------------------
+Conflict: -lp conflicts with -io, -fnl, -nanl, or -ndnl; ignoring -lp
+
+The -lp indentation logic requires that perltidy be able to coordinate
+arbitrarily large numbers of line breakpoints. This isn't possible
+with these flags.
+-----------------------------------------------------------------------
EOM
- }
+ $rOpts->{'line-up-parentheses'} = 0;
+ $rOpts->{'extended-line-up-parentheses'} = 0;
+ }
- # Ignore kbb='(' and '[' and '{': can cause unstable math formatting
- # (issues b1346, b1347, b1348) and likewise ignore kba=')' and ']' and '}'
- # Also always ignore ? and : (b1440 and b1433-b1439)
- if ( $short_name eq 'kbb' ) {
- @list = grep { !m/[\(\[\{\?\:]/ } @list;
- }
- elsif ( $short_name eq 'kba' ) {
- @list = grep { !m/[\)\]\}\?\:]/ } @list;
+ if ( $rOpts->{'whitespace-cycle'} ) {
+ Warn(<<EOM);
+Conflict: -wc cannot currently be used with the -lp option; ignoring -wc
+EOM
+ $rOpts->{'whitespace-cycle'} = 0;
+ }
}
- # pull out any any leading container code, like f( or *{
- # For example: 'f(' becomes flags hash entry '(' => 'f'
- foreach my $item (@list) {
- if ( $item =~ /^( [ \w\* ] )( [ \{\(\[\}\)\] ] )$/x ) {
- $item = $2;
- $flags{$2} = $1;
- }
+ #-----------------------------------------------------------
+ # The combination -lp -vmll can be unstable if -ci<2 (b1267)
+ #-----------------------------------------------------------
+ # The -vmll and -lp parameters do not really work well together.
+ # This is a very crude fix for an unusual parameter combination.
+ if ( $rOpts->{'variable-maximum-line-length'}
+ && $rOpts->{'line-up-parentheses'}
+ && $rOpts->{'continuation-indentation'} < 2 )
+ {
+ $rOpts->{'continuation-indentation'} = 2;
+ ##Warn("Increased -ci=n to n=2 for stability with -lp and -vmll\n");
}
- my @unknown_types;
- foreach my $type (@list) {
- if ( !Perl::Tidy::Tokenizer::is_valid_token_type($type) ) {
- push @unknown_types, $type;
- }
+ #-----------------------------------------------------------
+ # The combination -lp -vmll -atc -dtc can be unstable
+ #-----------------------------------------------------------
+ # This fixes b1386 b1387 b1388 which had -wtc='b'
+ # Updated to to include any -wtc to fix b1426
+ if ( $rOpts->{'variable-maximum-line-length'}
+ && $rOpts->{'line-up-parentheses'}
+ && $rOpts->{'add-trailing-commas'}
+ && $rOpts->{'delete-trailing-commas'}
+ && $rOpts->{'want-trailing-commas'} )
+ {
+ $rOpts->{'delete-trailing-commas'} = 0;
+## Issuing a warning message causes trouble with test cases, and this combo is
+## so rare that it is unlikely to not occur in practice. So skip warning.
+## Warn(
+##"The combination -vmll -lp -atc -dtc can be unstable; turning off -dtc\n"
+## );
}
- if (@unknown_types) {
- my $num = @unknown_types;
- local $LIST_SEPARATOR = SPACE;
- Warn(<<EOM);
-$num unrecognized token types were input with --$short_name :
-@unknown_types
-EOM
+ #-----------------------------------------------------------
+ # The combination -xlp -xci and ci>i can be unstable (b1466)
+ #-----------------------------------------------------------
+ # Deactivated: the fix for b1501 also fixed b1466 in a simpler way.
+ # So this block can eventually be removed.
+ if ( 0
+ && $rOpts->{'extended-line-up-parentheses'}
+ && $rOpts->{'extended-continuation-indentation'}
+ && $rOpts->{'continuation-indentation'} > $rOpts->{'indent-columns'}
+ && $rOpts->{'indent-columns'} > 1 )
+ {
+ $rOpts->{'continuation-indentation'} = $rOpts->{'indent-columns'};
+ ## This combination is only likely to occur during random testing, so
+ ## skip the warning.
+ ##Warn("The combination -xlp -xci -ci>-i can be unstable; reducing ci\n");
}
- @{$rkeep_break_hash}{@list} = (1) x scalar(@list);
+ return;
+} ## end sub initialize_line_up_parentheses
- foreach my $key ( keys %flags ) {
- my $flag = $flags{$key};
+sub check_tabs {
- if ( length($flag) != 1 ) {
- Warn(<<EOM);
-Multiple entries given for '$key' in '$short_name'
-EOM
- }
- elsif ( ( $key eq '(' || $key eq ')' ) && $flag !~ /^[kKfFwW\*]$/ ) {
- Warn(<<EOM);
-Unknown flag '$flag' given for '$key' in '$short_name'
-EOM
- }
- elsif ( ( $key eq '}' || $key eq '}' ) && $flag !~ /^[bB\*]$/ ) {
- Warn(<<EOM);
-Unknown flag '$flag' given for '$key' in '$short_name'
+ # At present, tabs are not compatible with the line-up-parentheses style
+ # (it would be possible to entab the total leading whitespace
+ # just prior to writing the line, if desired).
+ if ( $rOpts->{'line-up-parentheses'} && $rOpts->{'tabs'} ) {
+ Warn(<<EOM);
+Conflict: -t (tabs) cannot be used with the -lp option; ignoring -t; see -et.
EOM
- }
-
- $rkeep_break_hash->{$key} = $flag;
+ $rOpts->{'tabs'} = 0;
}
- if ( DEBUG_KB && @list ) {
- my @tmp = %flags;
- local $LIST_SEPARATOR = SPACE;
- print <<EOM;
-
-DEBUG_KB -$short_name flag: $str
-final keys: @list
-special flags: @tmp
+ # tabs are not compatible with outdenting..
+ if ( $rOpts->{'outdent-keywords'} && $rOpts->{'tabs'} ) {
+ Warn(<<EOM);
+Conflict: -t (tabs) cannot be used with the -okw options; ignoring -t; see -et.
EOM
+ $rOpts->{'tabs'} = 0;
+ }
+ if ( $rOpts->{'outdent-labels'} && $rOpts->{'tabs'} ) {
+ Warn(<<EOM);
+Conflict: -t (tabs) cannot be used with the -ola option; ignoring -t; see -et.
+EOM
+ $rOpts->{'tabs'} = 0;
}
return;
+} ## end sub check_tabs
-} ## end sub initialize_keep_old_breakpoints
-
-sub initialize_global_option_vars {
+sub initialize_container_indentation_options {
- #------------------------------------------------------------
- # Make global vars for frequently used options for efficiency
- #------------------------------------------------------------
+ %container_indentation_options = ();
+ foreach my $pair (
+ [ 'break-before-hash-brace-and-indent', '{' ],
+ [ 'break-before-square-bracket-and-indent', '[' ],
+ [ 'break-before-paren-and-indent', '(' ],
+ )
+ {
+ my ( $key, $tok ) = @{$pair};
+ my $opt = $rOpts->{$key};
+ if ( defined($opt) && $opt > 0 && $break_before_container_types{$tok} )
+ {
- $rOpts_add_newlines = $rOpts->{'add-newlines'};
- $rOpts_add_trailing_commas = $rOpts->{'add-trailing-commas'};
- $rOpts_add_whitespace = $rOpts->{'add-whitespace'};
- $rOpts_blank_lines_after_opening_block =
- $rOpts->{'blank-lines-after-opening-block'};
+ # (1) -lp is not compatible with opt=2, silently set to opt=0
+ # (2) opt=0 and 2 give same result if -i=-ci; but opt=0 is faster
+ # (3) set opt=0 if -i < -ci (can be unstable, case b1355)
+ if ( $opt == 2 ) {
+ if (
+ $rOpts->{'line-up-parentheses'}
+ || ( $rOpts->{'indent-columns'} <=
+ $rOpts->{'continuation-indentation'} )
+ )
+ {
+ $opt = 0;
+ }
+ }
+ $container_indentation_options{$tok} = $opt;
+ }
+ }
+ return;
+} ## end sub initialize_container_indentation_options
+
+sub initialize_old_breakpoint_controls {
+
+ if ( $rOpts->{'ignore-old-breakpoints'} ) {
+
+ my @conflicts;
+ if ( $rOpts->{'break-at-old-method-breakpoints'} ) {
+ $rOpts->{'break-at-old-method-breakpoints'} = 0;
+ push @conflicts, '--break-at-old-method-breakpoints (-bom)';
+ }
+ if ( $rOpts->{'break-at-old-comma-breakpoints'} ) {
+ $rOpts->{'break-at-old-comma-breakpoints'} = 0;
+ push @conflicts, '--break-at-old-comma-breakpoints (-boc)';
+ }
+ if ( $rOpts->{'break-at-old-semicolon-breakpoints'} ) {
+ $rOpts->{'break-at-old-semicolon-breakpoints'} = 0;
+ push @conflicts, '--break-at-old-semicolon-breakpoints (-bos)';
+ }
+ if ( $rOpts->{'keep-old-breakpoints-before'} ) {
+ $rOpts->{'keep-old-breakpoints-before'} = EMPTY_STRING;
+ push @conflicts, '--keep-old-breakpoints-before (-kbb)';
+ }
+ if ( $rOpts->{'keep-old-breakpoints-after'} ) {
+ $rOpts->{'keep-old-breakpoints-after'} = EMPTY_STRING;
+ push @conflicts, '--keep-old-breakpoints-after (-kba)';
+ }
+
+ if (@conflicts) {
+ my $msg = join( "\n ",
+" Conflict: These conflicts with --ignore-old-breakponts (-iob) will be turned off:",
+ @conflicts ) . "\n";
+ Warn($msg);
+ }
+
+ # Note: These additional parameters are made inactive by -iob.
+ # They are silently turned off here because they are on by default.
+ # We would generate unexpected warnings if we issued a warning.
+ $rOpts->{'break-at-old-keyword-breakpoints'} = 0;
+ $rOpts->{'break-at-old-logical-breakpoints'} = 0;
+ $rOpts->{'break-at-old-ternary-breakpoints'} = 0;
+ $rOpts->{'break-at-old-attribute-breakpoints'} = 0;
+ }
+
+ %keep_break_before_type = ();
+ initialize_keep_old_breakpoints( $rOpts->{'keep-old-breakpoints-before'},
+ 'kbb', \%keep_break_before_type );
+
+ %keep_break_after_type = ();
+ initialize_keep_old_breakpoints( $rOpts->{'keep-old-breakpoints-after'},
+ 'kba', \%keep_break_after_type );
+
+ # Modify %keep_break_before and %keep_break_after to avoid conflicts
+ # with %want_break_before; fixes b1436.
+ # This became necessary after breaks for some tokens were converted
+ # from hard to soft (see b1433).
+ # We could do this for all tokens, but to minimize changes to existing
+ # code we currently only do this for the soft break tokens.
+ foreach my $key ( keys %keep_break_before_type ) {
+ if ( defined( $want_break_before{$key} )
+ && !$want_break_before{$key}
+ && $is_soft_keep_break_type{$key} )
+ {
+ $keep_break_after_type{$key} = $keep_break_before_type{$key};
+ delete $keep_break_before_type{$key};
+ }
+ }
+ foreach my $key ( keys %keep_break_after_type ) {
+ if ( defined( $want_break_before{$key} )
+ && $want_break_before{$key}
+ && $is_soft_keep_break_type{$key} )
+ {
+ $keep_break_before_type{$key} = $keep_break_after_type{$key};
+ delete $keep_break_after_type{$key};
+ }
+ }
+
+ $controlled_comma_style ||= $keep_break_before_type{','};
+ $controlled_comma_style ||= $keep_break_after_type{','};
+
+ return;
+} ## end sub initialize_old_breakpoint_controls
+
+use constant DEBUG_KB => 0;
+
+sub initialize_keep_old_breakpoints {
+ my ( $str, $short_name, $rkeep_break_hash ) = @_;
+
+ # 0 will be treated same as not defined
+ return unless $str;
+
+ my %flags = ();
+ my @list = split_words($str);
+ if ( DEBUG_KB && @list ) {
+ local $LIST_SEPARATOR = SPACE;
+ print <<EOM;
+DEBUG_KB entering for '$short_name' with str=$str\n";
+list is: @list;
+EOM
+ }
+
+ # Ignore kbb='(' and '[' and '{': can cause unstable math formatting
+ # (issues b1346, b1347, b1348) and likewise ignore kba=')' and ']' and '}'
+ # Also always ignore ? and : (b1440 and b1433-b1439)
+ if ( $short_name eq 'kbb' ) {
+ @list = grep { !m/[\(\[\{\?\:]/ } @list;
+ }
+ elsif ( $short_name eq 'kba' ) {
+ @list = grep { !m/[\)\]\}\?\:]/ } @list;
+ }
+ else {
+ Fault(<<EOM);
+Bad call arg - received short name '$short_name' but expecting 'kbb' or 'kba'
+EOM
+ }
+
+ # pull out any any leading container code, like f( or *{
+ # For example: 'f(' becomes flags hash entry '(' => 'f'
+ foreach my $item (@list) {
+ if ( $item =~ /^( [ \w\* ] )( [ \{\(\[\}\)\] ] )$/x ) {
+ $item = $2;
+ $flags{$2} = $1;
+ }
+ }
+
+ my @unknown_types;
+ foreach my $type (@list) {
+ if ( !Perl::Tidy::Tokenizer::is_valid_token_type($type) ) {
+ push @unknown_types, $type;
+ }
+ }
+
+ if (@unknown_types) {
+ my $num = @unknown_types;
+ local $LIST_SEPARATOR = SPACE;
+ Warn(<<EOM);
+$num unrecognized token types were input with --$short_name :
+@unknown_types
+EOM
+ }
+
+ @{$rkeep_break_hash}{@list} = (1) x scalar(@list);
+
+ foreach my $key ( keys %flags ) {
+ my $flag = $flags{$key};
+
+ if ( length($flag) != 1 ) {
+ Warn(<<EOM);
+Multiple entries given for '$key' in '$short_name'
+EOM
+ }
+ elsif ( ( $key eq '(' || $key eq ')' ) && $flag !~ /^[kKfFwW\*]$/ ) {
+ Warn(<<EOM);
+Unknown flag '$flag' given for '$key' in '$short_name'
+EOM
+ }
+ elsif ( ( $key eq '}' || $key eq '}' ) && $flag !~ /^[bB\*]$/ ) {
+ Warn(<<EOM);
+Unknown flag '$flag' given for '$key' in '$short_name'
+EOM
+ }
+ else {
+ # no error seen
+ }
+
+ $rkeep_break_hash->{$key} = $flag;
+ }
+
+ if ( DEBUG_KB && @list ) {
+ my @tmp = %flags;
+ local $LIST_SEPARATOR = SPACE;
+ print <<EOM;
+
+DEBUG_KB -$short_name flag: $str
+final keys: @list
+special flags: @tmp
+EOM
+
+ }
+
+ return;
+
+} ## end sub initialize_keep_old_breakpoints
+
+sub initialize_tightness_vars {
+
+ # hashes used to simplify setting whitespace
+ %tightness = (
+ '{' => $rOpts->{'brace-tightness'},
+ '}' => $rOpts->{'brace-tightness'},
+ '(' => $rOpts->{'paren-tightness'},
+ ')' => $rOpts->{'paren-tightness'},
+ '[' => $rOpts->{'square-bracket-tightness'},
+ ']' => $rOpts->{'square-bracket-tightness'},
+ );
+
+ return;
+} ## end sub initialize_tightness_vars
+
+sub initialize_multiple_token_tightness {
+
+ # Initialization for --multiple-token-tightness
+ %multiple_token_tightness = ();
+
+ my $opt_name = 'multiple-token-tightness';
+ my $opt = $rOpts->{$opt_name};
+
+ # The default is to add spaces for the double diamond
+ if ( !$opt ) {
+ $multiple_token_tightness{'<<>>'} = 1;
+ return;
+ }
+
+ # These are valid input words for perltidy token types
+ # Note that 'qw' will be translated into the actual token type 'q'
+ my %is_type_option;
+ my @type_options = qw( <<>> qw Q h );
+ @is_type_option{@type_options} = (1) x scalar(@type_options);
+
+ # These are valid input words subtypes of token type 'Q'.
+ # Note qw must be treated specially and is in the previous list.
+ my %is_Q_subtype_option;
+ my @Q_subtype_options = qw( q qq qx qr s y tr m );
+ @is_Q_subtype_option{@Q_subtype_options} =
+ (1) x scalar(@Q_subtype_options);
+
+ my %is_valid_term = ( %is_type_option, %is_Q_subtype_option );
+
+ # Words can be negated by prefixing with the following character:
+ my $neg_char = '^';
+
+ # Scan the input
+ my %positive_input;
+ my %negative_input;
+ my $error_string = EMPTY_STRING;
+ if ( defined($opt) ) {
+ my @list = split_words($opt);
+ foreach my $word (@list) {
+
+ # The special word 'q*' means all of the Q_subtypes plus 'qw'
+ if ( $word eq 'q*' ) {
+ foreach (@Q_subtype_options) { $positive_input{$_} = 1 }
+ $positive_input{'qw'} = 1;
+ }
+ elsif ( $word eq $neg_char . 'q*' ) {
+ foreach (@Q_subtype_options) { $negative_input{$_} = 1 }
+ $negative_input{'qw'} = 1;
+ }
+ elsif ( $is_valid_term{$word} ) {
+ $positive_input{$word} = 1;
+ }
+ elsif ( substr( $word, 0, 1 ) eq $neg_char
+ && $is_valid_term{ substr( $word, 1 ) } )
+ {
+ $negative_input{ substr( $word, 1 ) } = 1;
+ }
+ else {
+ $error_string .= "$word ";
+ }
+ }
+ }
+
+ if ($error_string) {
+ $error_string =~ s/\s+$//;
+ Warn(<<EOM);
+Ignoring these unknown terms in --$opt_name: '$error_string'
+EOM
+ }
+
+ # The token '<<>>' is always a default unless rejected
+ if ( !$negative_input{'<<>>'} ) {
+ $positive_input{'<<>>'} = 1;
+ }
+
+ # Now construct the control hash
+ my @Q_subtype_list;
+ foreach my $word ( keys %positive_input ) {
+
+ # negative has priority over positive
+ next if ( $negative_input{$word} );
+
+ if ( $is_type_option{$word} ) {
+ if ( $word eq 'qw' ) { $word = 'q' }
+ $multiple_token_tightness{$word} = 1;
+ }
+ elsif ( $is_Q_subtype_option{$word} ) {
+ push @Q_subtype_list, $word;
+ }
+ else {
+ # something is wrong; previous checks should prevent arriving here
+ DEVEL_MODE
+ && Fault(
+ "unexpected word '$word' while initializing -mutt=$opt\n");
+ %multiple_token_tightness = ();
+ return;
+ }
+ }
+
+ # Construct a regex for the selected Q subtypes, in the form
+ # ^(?:qq|qx|qr|q|s|y|tr|m)\b
+ if (@Q_subtype_list) {
+ my $regex = q{^(?:} . join( '|', @Q_subtype_list ) . q{)\b};
+ if ( bad_pattern($regex) ) {
+
+ # shouldn't happen; there must be a coding error
+ my $msg =
+ "ERROR: the --$opt_name input caused an invalid regex '$regex'\n";
+ DEVEL_MODE && Fault($msg);
+ Warn($msg);
+ %multiple_token_tightness = ();
+ return;
+ }
+ $multiple_token_tightness{'Q'} = $regex;
+ }
+ return;
+} ## end sub initialize_multiple_token_tightness
+
+sub initialize_global_option_vars {
+
+ #------------------------------------------------------------
+ # Make global vars for frequently used options for efficiency
+ #------------------------------------------------------------
+
+ $rOpts_add_newlines = $rOpts->{'add-newlines'};
+ $rOpts_add_trailing_commas = $rOpts->{'add-trailing-commas'};
+ $rOpts_add_lone_trailing_commas = $rOpts->{'add-lone-trailing-commas'};
+ $rOpts_add_whitespace = $rOpts->{'add-whitespace'};
+ $rOpts_blank_lines_after_opening_block =
+ $rOpts->{'blank-lines-after-opening-block'};
$rOpts_block_brace_tightness = $rOpts->{'block-brace-tightness'};
$rOpts_block_brace_vertical_tightness =
$rOpts->{'block-brace-vertical-tightness'};
$rOpts->{'extended-continuation-indentation'};
$rOpts_delete_side_comments = $rOpts->{'delete-side-comments'};
$rOpts_delete_trailing_commas = $rOpts->{'delete-trailing-commas'};
+ $rOpts_delete_lone_trailing_commas =
+ $rOpts->{'delete-lone-trailing-commas'};
$rOpts_delete_weld_interfering_commas =
$rOpts->{'delete-weld-interfering-commas'};
$rOpts_format_skipping = $rOpts->{'format-skipping'};
$rOpts_ignore_old_breakpoints = $rOpts->{'ignore-old-breakpoints'};
$rOpts_ignore_side_comment_lengths =
$rOpts->{'ignore-side-comment-lengths'};
- $rOpts_indent_closing_brace = $rOpts->{'indent-closing-brace'};
- $rOpts_indent_columns = $rOpts->{'indent-columns'};
- $rOpts_indent_only = $rOpts->{'indent-only'};
- $rOpts_keep_interior_semicolons = $rOpts->{'keep-interior-semicolons'};
- $rOpts_line_up_parentheses = $rOpts->{'line-up-parentheses'};
+ $rOpts_ignore_perlcritic_comments = $rOpts->{'ignore-perlcritic-comments'};
+ $rOpts_indent_closing_brace = $rOpts->{'indent-closing-brace'};
+ $rOpts_indent_columns = $rOpts->{'indent-columns'};
+ $rOpts_indent_leading_semicolon = $rOpts->{'indent-leading-semicolon'};
+ $rOpts_indent_only = $rOpts->{'indent-only'};
+ $rOpts_keep_interior_semicolons = $rOpts->{'keep-interior-semicolons'};
+ $rOpts_line_up_parentheses = $rOpts->{'line-up-parentheses'};
+ $rOpts_extended_block_tightness = $rOpts->{'extended-block-tightness'};
$rOpts_extended_line_up_parentheses =
$rOpts->{'extended-line-up-parentheses'};
$rOpts_logical_padding = $rOpts->{'logical-padding'};
$rOpts_maximum_consecutive_blank_lines =
$rOpts->{'maximum-consecutive-blank-lines'};
- $rOpts_maximum_fields_per_table = $rOpts->{'maximum-fields-per-table'};
- $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
+ $rOpts_maximum_fields_per_table = $rOpts->{'maximum-fields-per-table'};
+ $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
+ $rOpts_minimize_continuation_indentation =
+ $rOpts->{'minimize-continuation-indentation'};
$rOpts_one_line_block_semicolons = $rOpts->{'one-line-block-semicolons'};
$rOpts_opening_brace_always_on_right =
$rOpts->{'opening-brace-always-on-right'};
$rOpts_outdent_long_quotes = $rOpts->{'outdent-long-quotes'};
$rOpts_outdent_static_block_comments =
$rOpts->{'outdent-static-block-comments'};
- $rOpts_recombine = $rOpts->{'recombine'};
+ $rOpts_recombine = $rOpts->{'recombine'};
+ $rOpts_qw_as_function = $rOpts->{'qw-as-function'};
$rOpts_short_concatenation_item_length =
$rOpts->{'short-concatenation-item-length'};
$rOpts_space_prototype_paren = $rOpts->{'space-prototype-paren'};
+ $rOpts_space_signature_paren = $rOpts->{'space-signature-paren'};
$rOpts_stack_closing_block_brace = $rOpts->{'stack-closing-block-brace'};
$rOpts_static_block_comments = $rOpts->{'static-block-comments'};
+ $rOpts_add_missing_else = $rOpts->{'add-missing-else'};
+ $rOpts_warn_missing_else = $rOpts->{'warn-missing-else'};
$rOpts_tee_block_comments = $rOpts->{'tee-block-comments'};
$rOpts_tee_pod = $rOpts->{'tee-pod'};
$rOpts_tee_side_comments = $rOpts->{'tee-side-comments'};
$rOpts_valign_code = $rOpts->{'valign-code'};
$rOpts_valign_side_comments = $rOpts->{'valign-side-comments'};
+ $rOpts_valign_if_unless = $rOpts->{'valign-if-unless'};
+ $rOpts_valign_wide_equals = $rOpts->{'valign-wide-equals'};
$rOpts_variable_maximum_line_length =
$rOpts->{'variable-maximum-line-length'};
- # Note that both opening and closing tokens can access the opening
- # and closing flags of their container types.
- %opening_vertical_tightness = (
- '(' => $rOpts->{'paren-vertical-tightness'},
- '{' => $rOpts->{'brace-vertical-tightness'},
- '[' => $rOpts->{'square-bracket-vertical-tightness'},
- ')' => $rOpts->{'paren-vertical-tightness'},
- '}' => $rOpts->{'brace-vertical-tightness'},
- ']' => $rOpts->{'square-bracket-vertical-tightness'},
- );
+ # Note that both opening and closing tokens can access the opening
+ # and closing flags of their container types.
+ %opening_vertical_tightness = (
+ '(' => $rOpts->{'paren-vertical-tightness'},
+ '{' => $rOpts->{'brace-vertical-tightness'},
+ '[' => $rOpts->{'square-bracket-vertical-tightness'},
+ ')' => $rOpts->{'paren-vertical-tightness'},
+ '}' => $rOpts->{'brace-vertical-tightness'},
+ ']' => $rOpts->{'square-bracket-vertical-tightness'},
+ );
+
+ %closing_vertical_tightness = (
+ '(' => $rOpts->{'paren-vertical-tightness-closing'},
+ '{' => $rOpts->{'brace-vertical-tightness-closing'},
+ '[' => $rOpts->{'square-bracket-vertical-tightness-closing'},
+ ')' => $rOpts->{'paren-vertical-tightness-closing'},
+ '}' => $rOpts->{'brace-vertical-tightness-closing'},
+ ']' => $rOpts->{'square-bracket-vertical-tightness-closing'},
+ );
+
+ # assume flag for '>' same as ')' for closing qw quotes
+ %closing_token_indentation = (
+ ')' => $rOpts->{'closing-paren-indentation'},
+ '}' => $rOpts->{'closing-brace-indentation'},
+ ']' => $rOpts->{'closing-square-bracket-indentation'},
+ '>' => $rOpts->{'closing-paren-indentation'},
+ );
+
+ # flag indicating if any closing tokens are indented
+ $some_closing_token_indentation =
+ $rOpts->{'closing-paren-indentation'}
+ || $rOpts->{'closing-brace-indentation'}
+ || $rOpts->{'closing-square-bracket-indentation'}
+ || $rOpts->{'indent-closing-brace'};
+
+ %opening_token_right = (
+ '(' => $rOpts->{'opening-paren-right'},
+ '{' => $rOpts->{'opening-hash-brace-right'},
+ '[' => $rOpts->{'opening-square-bracket-right'},
+ );
+
+ %stack_opening_token = (
+ '(' => $rOpts->{'stack-opening-paren'},
+ '{' => $rOpts->{'stack-opening-hash-brace'},
+ '[' => $rOpts->{'stack-opening-square-bracket'},
+ );
+
+ %stack_closing_token = (
+ ')' => $rOpts->{'stack-closing-paren'},
+ '}' => $rOpts->{'stack-closing-hash-brace'},
+ ']' => $rOpts->{'stack-closing-square-bracket'},
+ );
+ return;
+} ## end sub initialize_global_option_vars
+
+sub initialize_line_length_vars {
+
+ # Create a table of maximum line length vs level for later efficient use.
+ # We will make the tables very long to be sure it will not be exceeded.
+ # But we have to choose a fixed length. A check will be made at the start
+ # of sub 'finish_formatting' to be sure it is not exceeded. Note, some of
+ # my standard test problems have indentation levels of about 150, so this
+ # should be fairly large. If the choice of a maximum level ever becomes
+ # an issue then these table values could be returned in a sub with a simple
+ # memoization scheme.
+
+ # Also create a table of the maximum spaces available for text due to the
+ # level only. If a line has continuation indentation, then that space must
+ # be subtracted from the table value. This table is used for preliminary
+ # estimates in welding, extended_ci, BBX, and marking short blocks.
+ use constant LEVEL_TABLE_MAX => 1000;
+
+ # The basic scheme:
+ foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
+ my $indent = $level * $rOpts_indent_columns;
+ $maximum_line_length_at_level[$level] = $rOpts_maximum_line_length;
+ $maximum_text_length_at_level[$level] =
+ $rOpts_maximum_line_length - $indent;
+ }
+
+ # Correct the maximum_text_length table if the -wc=n flag is used
+ $rOpts_whitespace_cycle = $rOpts->{'whitespace-cycle'};
+ if ($rOpts_whitespace_cycle) {
+ if ( $rOpts_whitespace_cycle > 0 ) {
+ foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
+ my $level_mod = $level % $rOpts_whitespace_cycle;
+ my $indent = $level_mod * $rOpts_indent_columns;
+ $maximum_text_length_at_level[$level] =
+ $rOpts_maximum_line_length - $indent;
+ }
+ }
+ else {
+ $rOpts_whitespace_cycle = $rOpts->{'whitespace-cycle'} = 0;
+ }
+ }
+
+ # Correct the tables if the -vmll flag is used. These values override the
+ # previous values.
+ if ($rOpts_variable_maximum_line_length) {
+ foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
+ $maximum_text_length_at_level[$level] = $rOpts_maximum_line_length;
+ $maximum_line_length_at_level[$level] =
+ $rOpts_maximum_line_length + $level * $rOpts_indent_columns;
+ }
+ }
+
+ # Define two measures of indentation level, alpha and beta, at which some
+ # formatting features come under stress and need to start shutting down.
+ # Some combination of the two will be used to shut down different
+ # formatting features.
+ # Put a reasonable upper limit on stress level (say 100) in case the
+ # whitespace-cycle variable is used.
+ my $stress_level_limit = min( 100, LEVEL_TABLE_MAX );
+
+ # Find stress_level_alpha, targeted at very short maximum line lengths.
+ $stress_level_alpha = $stress_level_limit + 1;
+ foreach my $level_test ( 0 .. $stress_level_limit ) {
+ my $max_len = $maximum_text_length_at_level[ $level_test + 1 ];
+ my $excess_inside_space =
+ $max_len -
+ $rOpts_continuation_indentation -
+ $rOpts_indent_columns - 8;
+ if ( $excess_inside_space <= 0 ) {
+ $stress_level_alpha = $level_test;
+ last;
+ }
+ }
+
+ # Find stress level beta, a stress level targeted at formatting
+ # at deep levels near the maximum line length. We start increasing
+ # from zero and stop at the first level which shows no more space.
+
+ # 'const' is a fixed number of spaces for a typical variable.
+ # Cases b1197-b1204 work ok with const=12 but not with const=8
+ my $const = 16;
+ my $denom = max( 1, $rOpts_indent_columns );
+ $stress_level_beta = 0;
+ foreach my $level ( 0 .. $stress_level_limit ) {
+ my $remaining_cycles = max(
+ 0,
+ (
+ $maximum_text_length_at_level[$level] -
+ $rOpts_continuation_indentation - $const
+ ) / $denom
+ );
+ last if ( $remaining_cycles <= 3 ); # 2 does not work
+ $stress_level_beta = $level;
+ }
+
+ # This is a combined level which works well for turning off formatting
+ # features in most cases:
+ $high_stress_level = min( $stress_level_alpha, $stress_level_beta + 2 );
+
+ return;
+} ## end sub initialize_line_length_vars
+
+sub initialize_trailing_comma_break_rules {
+
+ # Setup control hash for making trailing comma breaks. Update c416.
+ # This sub is similar to 'sub initialize_trailing_comma_rules' but
+ # simpler.
+
+ # -btct=s, where s
+ #
+ # =" " none
+ # =0 : none
+ # =1 or * : all
+ # =m : break at trailing commas in multiline lists
+ # =b : break at bare trailing commas
+
+ %trailing_comma_break_rules = ();
+
+ my $rvalid_flags = [qw( 0 1 * m b )];
+
+ # Note that the hash keys are the CLOSING tokens but the input
+ # uses OPENING tokens.
+ my @all_keys = qw< ) ] } >;
+
+ my $option = $rOpts->{'break-at-trailing-comma-types'};
+
+ if ($option) {
+ $option =~ s/^\s+//;
+ $option =~ s/\s+$//;
+ }
+
+ # We need to use length() here because '0' is a possible option
+ if ( defined($option) && length($option) ) {
+ my $error_message;
+ my %rule_hash;
+ my @q = @{$rvalid_flags};
+ my %is_valid_flag;
+ @is_valid_flag{@q} = (1) x scalar(@q);
+
+ # handle the common case of a single control character, like -btct='b'
+ if ( length($option) == 1 ) {
+
+ # skip 0
+ if ($option) {
+ foreach my $key (@all_keys) {
+ $rule_hash{$key} = [ $option, EMPTY_STRING ];
+ }
+ }
+ }
+
+ # handle multi-character control(s), such as -btct='[m' or -btct='k(m'
+ else {
+ my @parts = split /\s+/, $option;
+ foreach my $part (@parts) {
+ my $part_input = $part;
+
+ # examples: b -b [b 0 * +f(b
+
+ # the letter value is the rightmost character
+ my $val = substr( $part, -1, 1 );
+
+ # skip 0
+ next unless ($val);
+ $part = substr( $part, 0, -1 );
+ if ( $val && !$is_valid_flag{$val} ) {
+ my $valid_str = join( SPACE, @{$rvalid_flags} );
+ $error_message .=
+"In '$part_input': unexpected value '$val'; must be one of: $valid_str\n";
+ next;
+ }
+
+ # set defaults for this item
+ my @keys = @all_keys;
+ my $paren_flag = EMPTY_STRING;
+
+ # look for opening container bracket
+ my $is_paren;
+ if ( length($part) ) {
+ my $token = substr( $part, -1, 1 );
+ if ( $is_opening_token{$token} ) {
+
+ # note that the hash key is the closing token
+ my $key = $matching_token{$token};
+ @keys = ($key);
+ $part = substr( $part, 0, -1 );
+ $is_paren = $token eq '(';
+ }
+ }
+
+ # anything left must be a paren modifier
+ if ( length($part) ) {
+ $paren_flag = substr( $part, -1, 1 );
+ $part = substr( $part, 0, -1 );
+ if ( $paren_flag !~ /^[kKfFwW]$/ ) {
+ $error_message .=
+"In '$part_input': Unexpected paren flag '$paren_flag'; must be one of: k K f F w W\n";
+ next;
+ }
+ if ( !$is_paren ) {
+ $error_message .=
+"In '$part_input': paren flag '$paren_flag' is only allowed before a '('\n";
+ next;
+ }
+ }
+
+ if ( length($part) ) {
+ $error_message .= "Unrecognized term: '$part_input'\n";
+ next;
+ }
+
+ my $duplicate;
+ foreach my $key (@keys) {
+ if ( defined( $rule_hash{$key} ) ) {
+ $duplicate = 1;
+ }
+ $rule_hash{$key} = [ $val, $paren_flag ];
+ }
+ if ($duplicate) {
+ $error_message .=
+ "This term overlaps a previous term: '$part_input'\n";
+ }
+ }
+ }
+
+ # check for conflicting signed options
+ if ($error_message) {
+ Warn(<<EOM);
+Error parsing --want-trailing-commas='$option':
+$error_message
+EOM
+ }
+
+ # Set the control hash if no errors
+ else {
+ %trailing_comma_break_rules = %rule_hash;
+ }
+ }
+
+ return;
+} ## end sub initialize_trailing_comma_break_rules
+
+sub initialize_trailing_comma_rules {
+
+ # Setup control hash for trailing commas
+
+ # -wtc=s defines desired trailing comma policy:
+ #
+ # =" " stable
+ # [ both -atc and -dtc ignored ]
+ # =0 : none
+ # [requires -dtc; -atc ignored]
+ # =1 or * : all
+ # [requires -atc; -dtc ignored]
+ # =m : multiline lists require trailing comma
+ # if -atc set => will add missing multiline trailing commas
+ # if -dtc set => will delete trailing single line commas
+ # =b or 'bare' (multiline) lists require trailing comma
+ # if -atc set => will add missing bare trailing commas
+ # if -dtc set => will delete non-bare trailing commas
+ # =h or 'hash': single column stable bare lists require trailing comma
+ # if -atc set will add these
+ # if -dtc set will delete other trailing commas
+
+ #-------------------------------------------------------------------
+ # Important:
+ # - This routine must be called after the alpha and beta stress levels
+ # have been defined in sub 'initialize_line_length_vars'.
+ # - and it must be called after sub 'initialize_trailing_comma_break_rules'
+ #-------------------------------------------------------------------
+
+ %trailing_comma_rules = ();
+
+ my $rvalid_flags = [qw( 0 1 * m b h i )];
+
+ # This hash shows i.e. that 'm' includes all 'b' includes all 'i' ...etc
+ # It is used to check for overlap when both + and - signs are used to
+ # cause adding and deleting of different types of trailing commas.
+ my %match_order = (
+ '1' => 0,
+ '*' => 0,
+ 'm' => 1,
+ 'b' => 2,
+ 'i' => 3,
+ 'h' => 4,
+ '0' => 5,
+ );
+
+ # Note that the hash keys are the CLOSING tokens but the input
+ # uses OPENING tokens.
+ my @all_keys = qw< ) ] } >;
+
+ my $option = $rOpts->{'want-trailing-commas'};
+
+ if ($option) {
+ $option =~ s/^\s+//;
+ $option =~ s/\s+$//;
+ }
+
+ # Pull out -btct paren flag for use in checking stability in marginal cases
+ my ( $tc_letter, $tc_paren_flag );
+ my $tc_paren_rule = $trailing_comma_break_rules{')'};
+ if ( defined($tc_paren_rule) ) {
+ ( $tc_letter, $tc_paren_flag ) = @{$tc_paren_rule};
+ }
+
+ # We need to use length() here because '0' is a possible option
+ if ( defined($option) && length($option) ) {
+ my $error_message;
+ my %rule_hash;
+ my @q = @{$rvalid_flags};
+ my %is_valid_flag;
+ @is_valid_flag{@q} = (1) x scalar(@q);
+
+ # handle the common case of a single control character, like -wtc='b'
+ if ( length($option) == 1 ) {
+ foreach my $key (@all_keys) {
+ my $paren_flag = EMPTY_STRING;
+ my $stable = defined( $trailing_comma_break_rules{$key} );
+ if ( $key eq ')' ) { $stable &&= $paren_flag eq $tc_paren_flag }
+ $rule_hash{add}->{$key} = [ $option, $paren_flag, $stable ];
+ $rule_hash{delete}->{$key} = [ $option, $paren_flag, $stable ];
+ }
+ }
+
+ # handle multi-character control(s), such as -wtc='[m' or -wtc='k(m'
+ else {
+ my @parts = split /\s+/, $option;
+ foreach my $part (@parts) {
+ my $part_input = $part;
+
+ # examples: b -b [b 0 * +f(b
+
+ # the letter value is the rightmost character
+ my $val = substr( $part, -1, 1 );
+ $part = substr( $part, 0, -1 );
+ if ( $val && !$is_valid_flag{$val} ) {
+ my $valid_str = join( SPACE, @{$rvalid_flags} );
+ $error_message .=
+"In '$part_input': unexpected value '$val'; must be one of: $valid_str\n";
+ next;
+ }
+
+ # set defaults for this item
+ my @signs = qw( add delete );
+ my @keys = @all_keys;
+ my $paren_flag = EMPTY_STRING;
+
+ # look for opening container bracket
+ my $is_paren;
+ if ( length($part) ) {
+ my $token = substr( $part, -1, 1 );
+ if ( $is_opening_token{$token} ) {
+
+ # note that the hash key is the closing token
+ my $key = $matching_token{$token};
+ @keys = ($key);
+ $part = substr( $part, 0, -1 );
+ $is_paren = $token eq '(';
+ }
+ }
+
+ # look for a leading sign, + or -
+ if ( length($part) ) {
+ my $sign = substr( $part, 0, 1 );
+ if ( $sign eq '+' ) {
+ @signs = qw(add);
+ $part = substr( $part, 1 );
+ }
+ elsif ( $sign eq '-' ) {
+ @signs = qw(delete);
+ $part = substr( $part, 1 );
+ }
+ else {
+ ## keep defaults
+ }
+ }
+
+ # anything left must be a paren modifier
+ if ( length($part) ) {
+ $paren_flag = substr( $part, -1, 1 );
+ $part = substr( $part, 0, -1 );
+ if ( $paren_flag !~ /^[kKfFwW]$/ ) {
+ $error_message .=
+"In '$part_input': Unexpected paren flag '$paren_flag'; must be one of: k K f F w W\n";
+ next;
+ }
+ if ( !$is_paren ) {
+ $error_message .=
+"In '$part_input': paren flag '$paren_flag' is only allowed before a '('\n";
+ next;
+ }
+ }
+
+ if ( length($part) ) {
+ $error_message .= "Unrecognized term: '$part_input'\n";
+ next;
+ }
+
+ my $duplicate;
+ foreach my $sign (@signs) {
+ foreach my $key (@keys) {
+
+ # New bare commas are stable if -bctc is set, and
+ # also paren flags do not disagree
+ my $stable =
+ defined( $trailing_comma_break_rules{$key} );
+ if ( $key eq ')' ) {
+ $stable &&= $paren_flag eq $tc_paren_flag;
+ }
+
+ if ( defined( $rule_hash{$sign}->{$key} ) ) {
+ $duplicate &&= 1;
+ }
+ $rule_hash{$sign}->{$key} =
+ [ $val, $paren_flag, $stable ];
+ }
+ }
+
+ if ($duplicate) {
+ $error_message .=
+ "This term overlaps a previous term: '$part_input'\n";
+ }
+ }
+ }
+
+ # check for conflicting signed options
+ if ( !$error_message ) {
+ my $radd = $rule_hash{add};
+ my $rdelete = $rule_hash{delete};
+ if ( defined($radd) && defined($rdelete) ) {
+ foreach my $key (@all_keys) {
+ my $radd_info = $radd->{$key};
+ my $rdelete_info = $rdelete->{$key};
+ if ( defined($radd_info) && defined($rdelete_info) ) {
+ my $add_val = $radd_info->[0];
+ my $delete_val = $rdelete_info->[0];
+ next if ( $add_val eq $delete_val );
+ my $add_order = $match_order{$add_val};
+ my $delete_order = $match_order{$delete_val};
+ if ( !defined($add_order) ) {
+ ## should have been caught earlier
+ DEVEL_MODE
+ && Fault("unexpected + value $add_val\n");
+ next;
+ }
+ if ( !defined($delete_order) ) {
+ ## should have been caught earlier
+ DEVEL_MODE
+ && Fault("unexpected - value $delete_val\n");
+ next;
+ }
+ if ( $add_order <= $delete_order ) {
+ my $token = $matching_token{$key};
+ $error_message .=
+"For token '$token': the range for '+$add_val' overlaps the range for '-$delete_val'\n";
+ }
+ }
+ }
+ }
+ }
+
+ if ($error_message) {
+ Warn(<<EOM);
+Error parsing --want-trailing-commas='$option':
+$error_message
+EOM
+ }
+
+ # Set the control hash if no errors
+ else {
+ %trailing_comma_rules = %rule_hash;
+ }
+ }
+
+ # Both adding and deleting commas can lead to instability in extreme cases
+ if ( $rOpts_add_trailing_commas && $rOpts_delete_trailing_commas ) {
+
+ # If the possible instability is significant, then we can turn off
+ # -dtc as a defensive measure to prevent it.
+
+ # We must turn off -dtc for very small values of --whitespace-cycle
+ # to avoid instability. A minimum value of -wc=3 fixes b1393, but a
+ # value of 4 is used here for safety. This parameter is seldom used,
+ # and much larger than this when used, so the cutoff value is not
+ # critical.
+ if ( $rOpts_whitespace_cycle && $rOpts_whitespace_cycle <= 4 ) {
+ $rOpts_delete_trailing_commas = 0;
+ }
+ }
+
+ return;
+} ## end sub initialize_trailing_comma_rules
+
+sub initialize_interbracket_arrow_style {
+
+ # Setup hash for desired arrow style
+ %interbracket_arrow_style = ();
+
+ # and check other parameters for conflicts
+ my $name_add = 'add-interbracket-arrows';
+ my $name_delete = 'delete-interbracket-arrows';
+ my $name_style = 'interbracket-arrow-style';
+
+ my $opt_add = $rOpts->{$name_add};
+ my $opt_delete = $rOpts->{$name_delete};
+ my $opt_style = $rOpts->{$name_style};
+
+ if ( $opt_add && $opt_delete && !$opt_style ) {
+ Die(<<EOM);
+Cannot use both --$name_add and --$name_delete
+ unless --$name_style is defined
+EOM
+ }
+
+ return unless defined($opt_style);
+ $opt_style =~ tr/,/ /;
+ $opt_style =~ s/^\s+//;
+ $opt_style =~ s/\s+$//;
+ return unless length($opt_style);
+
+ if ( $opt_style eq '0' ) { $opt_style = '] [ ] { } [ } {' }
+ elsif ( $opt_style eq '1' ) { $opt_style = ']->[ ]->{ }->[ }->{' }
+ elsif ( $opt_style eq '*' ) { $opt_style = ']->[ ]->{ }->[ }->{' }
+ else { }
+
+ # We are walking along a string such as
+ # $opt_style=" ][ ]->{ }->[ }{ ";
+ # ignoring spaces and looking for bracket pairs with optional
+ # arrow like: '][' or ]->{ or }->[ or }{
+ # The two bracket characters are the hash key and the hash value
+ # is 1 for an arrow and -1 for no arrow.
+
+ # $ch1 will hold most recent closing bracket
+ # $ch2 will hold a '->' if seen
+ my %rule_hash;
+ my ( $ch1, $ch2 );
+ my $err_msg;
+ my $pos_last;
+ while (1) {
+ $pos_last = pos($opt_style);
+ if (
+ $opt_style =~ m{
+ \G(?: # fix git #142
+ (\s+) # 1. whitespace
+ | ([\}\]]) # 2. closing bracket
+ | (->) # 3. arrow
+ | ([\[\{]) # 4. opening bracket
+ | (.*) # 5. something else, error
+ )
+ }gcx
+ )
+ {
+ if ($1) { next }
+ if ($2) {
+ if ( !$ch1 ) { $ch1 = $2 }
+ else { $err_msg = "unexpected '$2'"; last }
+ next;
+ }
+ if ($3) {
+ if ($ch1) { $ch2 = $3 }
+ else { $err_msg = "unexpected '$3'"; last }
+ next;
+ }
+ if ($4) {
+ if ( $ch1 || $ch2 ) {
+ my $key = $ch1 . $4;
+ if ( !defined( $rule_hash{$key} ) ) {
+ $rule_hash{$key} = $ch2 ? 1 : -1;
+ }
+ else { $err_msg = "multiple copies for '$key'"; last; }
+ $ch1 = $ch2 = undef;
+ }
+ else { $err_msg = "unexpected '$4'"; last }
+ next;
+ }
+ if ($5) {
+ my $bad = $5;
+ if ( length($bad) > 10 ) {
+ $bad = substr( $bad, 0, 10 ) . '...';
+ }
+ $err_msg = "confused at: '$bad'\n";
+ last;
+ }
+ }
+
+ # that's all..
+ else {
+ last;
+ }
+ } ## end while (1)
+
+ if ($err_msg) {
+ my $msg;
+ if ( $pos_last && length($opt_style) < 20 ) {
+ $msg = $opt_style . "\n" . SPACE x $pos_last . '^' . "\n";
+ }
+ $msg .= "Error parsing --$name_style: $err_msg";
+ Die($msg);
+ }
+
+ # Copy the rule hash, converting braces to token types
+ foreach my $key ( keys %rule_hash ) {
+ my $key_fix = $key;
+ $key_fix =~ tr/{}/LR/;
+ $interbracket_arrow_style{$key_fix} = $rule_hash{$key};
+ }
+
+ return;
+} ## end sub initialize_interbracket_arrow_style
+
+sub initialize_whitespace_hashes {
+
+ # This is called once before formatting begins to initialize these global
+ # hashes, which control the use of whitespace around tokens:
+ #
+ # %binary_ws_rules
+ # %want_left_space
+ # %want_right_space
+ # %space_after_keyword
+ #
+ # Many token types are identical to the tokens themselves.
+ # See the tokenizer for a complete list. Here are some special types:
+ # k = perl keyword
+ # f = semicolon in for statement
+ # m = unary minus
+ # p = unary plus
+ # Note that :: is excluded since it should be contained in an identifier
+ # Note that '->' is excluded because it never gets space
+ # parentheses and brackets are excluded since they are handled specially
+ # curly braces are included but may be overridden by logic, such as
+ # newline logic.
+
+ # NEW_TOKENS: create a whitespace rule here. This can be as
+ # simple as adding your new letter to @spaces_both_sides, for
+ # example.
+
+ # fix for c250: added space rules new package type 'P' and sub type 'S'
+ my @spaces_both_sides = qw#
+ + - * / % ? = . : x < > | & ^ .. << >> ** && .. || // => += -=
+ .= %= x= &= |= ^= *= <> <= >= == =~ !~ /= != ... <<= >>= ~~ !~~
+ **= &&= ||= //= <=> A k f w F n C Y U G v P S ^^
+ #;
+
+ my @spaces_left_side = qw< t ! ~ m p { \ h pp mm Z j >;
+ push( @spaces_left_side, '#' ); # avoids warning message
+
+ # c349: moved **= from @spaces_right_side to @spaces_both_sides
+ my @spaces_right_side = qw< ; } ) ] R J ++ -- >;
+ push( @spaces_right_side, ',' ); # avoids warning message
+
+ %want_left_space = ();
+ %want_right_space = ();
+ %binary_ws_rules = ();
+
+ # Note that we setting defaults here. Later in processing
+ # the values of %want_left_space and %want_right_space
+ # may be overridden by any user settings specified by the
+ # -wls and -wrs parameters. However the binary_whitespace_rules
+ # are hardwired and have priority.
+ @want_left_space{@spaces_both_sides} =
+ (1) x scalar(@spaces_both_sides);
+ @want_right_space{@spaces_both_sides} =
+ (1) x scalar(@spaces_both_sides);
+ @want_left_space{@spaces_left_side} =
+ (1) x scalar(@spaces_left_side);
+ @want_right_space{@spaces_left_side} =
+ (-1) x scalar(@spaces_left_side);
+ @want_left_space{@spaces_right_side} =
+ (-1) x scalar(@spaces_right_side);
+ @want_right_space{@spaces_right_side} =
+ (1) x scalar(@spaces_right_side);
+ $want_left_space{'->'} = WS_NO;
+ $want_right_space{'->'} = WS_NO;
+ $want_left_space{'**'} = WS_NO;
+ $want_right_space{'**'} = WS_NO;
+ $want_right_space{'CORE::'} = WS_NO;
+
+ # These binary_ws_rules are hardwired and have priority over the above
+ # settings. It would be nice to allow adjustment by the user,
+ # but it would be complicated to specify.
+ #
+ # hash type information must stay tightly bound
+ # as in : ${xxxx}
+ $binary_ws_rules{'i'}{'L'} = WS_NO;
+ $binary_ws_rules{'i'}{'{'} = WS_YES;
+ $binary_ws_rules{'k'}{'{'} = WS_YES;
+ $binary_ws_rules{'U'}{'{'} = WS_YES;
+ $binary_ws_rules{'i'}{'['} = WS_NO;
+ $binary_ws_rules{'R'}{'L'} = WS_NO;
+ $binary_ws_rules{'R'}{'{'} = WS_NO;
+ $binary_ws_rules{'t'}{'L'} = WS_NO;
+ $binary_ws_rules{'t'}{'{'} = WS_NO;
+ $binary_ws_rules{'t'}{'='} = WS_OPTIONAL; # for signatures; fixes b1123
+ $binary_ws_rules{'}'}{'L'} = WS_NO;
+ $binary_ws_rules{'}'}{'{'} = WS_OPTIONAL; # RT#129850; was WS_NO
+ $binary_ws_rules{'$'}{'L'} = WS_NO;
+ $binary_ws_rules{'$'}{'{'} = WS_NO;
+ $binary_ws_rules{'@'}{'L'} = WS_NO;
+ $binary_ws_rules{'@'}{'{'} = WS_NO;
+ $binary_ws_rules{'='}{'L'} = WS_YES;
+ $binary_ws_rules{'J'}{'J'} = WS_YES;
+
+ # the following includes ') {'
+ # as in : if ( xxx ) { yyy }
+ $binary_ws_rules{']'}{'L'} = WS_NO;
+ $binary_ws_rules{']'}{'{'} = WS_NO;
+ $binary_ws_rules{')'}{'{'} = WS_YES;
+ $binary_ws_rules{')'}{'['} = WS_NO;
+ $binary_ws_rules{']'}{'['} = WS_NO;
+ $binary_ws_rules{']'}{'{'} = WS_NO;
+ $binary_ws_rules{'}'}{'['} = WS_NO;
+ $binary_ws_rules{'R'}{'['} = WS_NO;
+
+ $binary_ws_rules{']'}{'++'} = WS_NO;
+ $binary_ws_rules{']'}{'--'} = WS_NO;
+ $binary_ws_rules{')'}{'++'} = WS_NO;
+ $binary_ws_rules{')'}{'--'} = WS_NO;
+
+ $binary_ws_rules{'R'}{'++'} = WS_NO;
+ $binary_ws_rules{'R'}{'--'} = WS_NO;
+
+ $binary_ws_rules{'i'}{'Q'} = WS_YES;
+ $binary_ws_rules{'n'}{'('} = WS_YES; # occurs in 'use package n ()'
+
+ $binary_ws_rules{'i'}{'('} = WS_NO;
+
+ $binary_ws_rules{'w'}{'('} = WS_NO;
+ $binary_ws_rules{'w'}{'{'} = WS_YES;
+
+ # user controls
+ if ( !$rOpts->{'space-for-semicolon'} ) {
+ $want_left_space{'f'} = -1;
+ }
+
+ if ( $rOpts->{'space-terminal-semicolon'} ) {
+ $want_left_space{';'} = 1;
+ }
+
+ # implement user whitespace preferences
+ if ( my @q = split_words( $rOpts->{'want-left-space'} ) ) {
+ @want_left_space{@q} = (1) x scalar(@q);
+ }
+
+ if ( my @q = split_words( $rOpts->{'want-right-space'} ) ) {
+ @want_right_space{@q} = (1) x scalar(@q);
+ }
+
+ if ( my @q = split_words( $rOpts->{'nowant-left-space'} ) ) {
+ @want_left_space{@q} = (-1) x scalar(@q);
+ }
+
+ if ( my @q = split_words( $rOpts->{'nowant-right-space'} ) ) {
+ @want_right_space{@q} = (-1) x scalar(@q);
+ }
+
+ return;
+
+} ## end sub initialize_whitespace_hashes
+
+{ #<<< begin closure set_whitespace_flags
+
+my %is_special_ws_type;
+my %is_wCUG;
+my %is_wi;
+
+BEGIN {
+
+ # The following hash is used to skip over needless if tests.
+ # Be sure to update it when adding new checks in its block.
+ my @q = qw( k w C m - Q );
+ push @q, '#';
+ @is_special_ws_type{@q} = (1) x scalar(@q);
+
+ # These hashes replace slower regex tests
+ @q = qw( w C U G );
+ @is_wCUG{@q} = (1) x scalar(@q);
+
+ @q = qw( w i );
+ @is_wi{@q} = (1) x scalar(@q);
+
+} ## end BEGIN
+
+use constant DEBUG_WHITE => 0;
+
+# Hashes to set spaces around container tokens according to their
+# sequence numbers. These are set as keywords are examined.
+# They are controlled by the -kpit and -kpitl flags.
+my %opening_container_inside_ws;
+my %closing_container_inside_ws;
+
+sub set_whitespace_flags {
+
+ my $self = shift;
+
+ # This routine is called once per file to set whitespace flags for that
+ # file. This routine examines each pair of nonblank tokens and sets a flag
+ # indicating if they should be separated by white space.
+ #
+ # $rwhitespace_flags->[$j] is a flag indicating whether a white space
+ # BEFORE token $j is needed, with the following values:
+ #
+ # WS_NO = -1 do not want a space BEFORE token $j
+ # WS_OPTIONAL= 0 optional space or $j is a whitespace
+ # WS_YES = 1 want a space BEFORE token $j
+ #
+
+ my $j_tight_closing_paren = -1;
+ my $rLL = $self->[_rLL_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $jmax = @{$rLL} - 1;
+
+ %opening_container_inside_ws = ();
+ %closing_container_inside_ws = ();
+
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+
+ my $rOpts_space_keyword_paren = $rOpts->{'space-keyword-paren'};
+ my $rOpts_space_backslash_quote = $rOpts->{'space-backslash-quote'};
+ my $rOpts_space_function_paren = $rOpts->{'space-function-paren'};
+
+ my $rwhitespace_flags = [];
+ my $ris_function_call_paren = {};
+
+ return $rwhitespace_flags if ( $jmax < 0 );
+
+ # function to return $ws for a signature paren following a sub
+ my $ws_signature_paren = sub {
+ my ($jj) = @_;
+ my $ws;
+ if ( $rOpts_space_signature_paren == 1 ) {
+
+ # is the previous token a blank?
+ my $have_blank = $rLL->[ $jj - 1 ]->[_TYPE_] eq 'b';
+
+ # or a newline?
+ $have_blank ||=
+ $rLL->[$jj]->[_LINE_INDEX_] != $rLL->[ $jj - 1 ]->[_LINE_INDEX_];
+
+ $ws = $have_blank ? WS_YES : WS_NO;
+ }
+ else {
+ $ws = $rOpts_space_signature_paren == 0 ? WS_NO : WS_YES;
+ }
+ return $ws;
+ }; ## end $ws_signature_paren = sub
+
+ my $last_token = SPACE;
+ my $last_type = 'b';
+
+ my $last_token_dbg = SPACE;
+ my $last_type_dbg = 'b';
+
+ my $rtokh_last = [ @{ $rLL->[0] } ];
+ $rtokh_last->[_TOKEN_] = $last_token;
+ $rtokh_last->[_TYPE_] = $last_type;
+ $rtokh_last->[_TYPE_SEQUENCE_] = EMPTY_STRING;
+ $rtokh_last->[_LINE_INDEX_] = 0;
+
+ my $rtokh_last_last = $rtokh_last;
+
+ # This will identify braces to be treated as blocks for the -xbt flag
+ my %block_type_for_tightness;
+
+ my ( $ws_1, $ws_2, $ws_3, $ws_4 );
+
+ # main loop over all tokens to define the whitespace flags
+ my $last_type_is_opening;
+ my $j = -1;
+ my $type;
+ foreach my $rtokh ( @{$rLL} ) {
+
+ $j++;
+
+ if ( ( $type = $rtokh->[_TYPE_] ) eq 'b' ) {
+ $rwhitespace_flags->[$j] = WS_OPTIONAL;
+ next;
+ }
+
+ my $token = $rtokh->[_TOKEN_];
+
+ my $ws;
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 1:
+ # Handle space on the inside of opening braces.
+ #---------------------------------------------------------------
+
+ if ($last_type_is_opening) {
+
+ $last_type_is_opening = 0;
+
+ my $seqno = $rtokh->[_TYPE_SEQUENCE_];
+ my $block_type = $rblock_type_of_seqno->{$seqno};
+ my $last_seqno = $rtokh_last->[_TYPE_SEQUENCE_];
+ my $last_block_type = $rblock_type_of_seqno->{$last_seqno}
+ || $block_type_for_tightness{$last_seqno};
+
+ $j_tight_closing_paren = -1;
+
+ # let us keep empty matched braces together: () {} []
+ # except for BLOCKS
+ if ( $token eq $matching_token{$last_token} ) {
+ if ($block_type) {
+ $ws = WS_YES;
+ }
+ else {
+ $ws = WS_NO;
+ }
+ }
+ else {
+
+ # we're considering the right of an opening brace
+ # tightness = 0 means always pad inside with space
+ # tightness = 1 means pad inside if "complex"
+ # tightness = 2 means never pad inside with space
+
+ my $tightness_here;
+ if ( $last_block_type && $last_token eq '{' ) {
+ $tightness_here = $rOpts_block_brace_tightness;
+ }
+ else { $tightness_here = $tightness{$last_token} }
+
+ #=============================================================
+ # Patch for test problem <<snippets/fabrice_bug.in>>
+ # We must always avoid spaces around a bare word beginning
+ # with ^ as in:
+ # my $before = ${^PREMATCH};
+ # Because all of the following cause an error in perl:
+ # my $before = ${ ^PREMATCH };
+ # my $before = ${ ^PREMATCH};
+ # my $before = ${^PREMATCH };
+ # So if brace tightness flag is -bt=0 we must temporarily reset
+ # to bt=1. Note that here we must set tightness=1 and not 2 so
+ # that the closing space is also avoided
+ # (via the $j_tight_closing_paren flag in coding)
+ if ( $type eq 'w' && substr( $token, 0, 1 ) eq '^' ) {
+ $tightness_here = 1;
+ }
+
+ #=============================================================
+
+ if ( $tightness_here <= 0 ) {
+ $ws = WS_YES;
+ }
+ elsif ( $tightness_here > 1 ) {
+ $ws = WS_NO;
+ }
+
+ # Default (tightness = 1) depends on the container token count
+ else {
+
+ # Find the index of the closing token
+ my $j_closing = $K_closing_container->{$last_seqno};
+
+ # Certain token types can be counted as multiple tokens for
+ # the default tightness. The meaning of hash values is:
+ # 1 => match this token type
+ # otherwise it is a regex; match if token matches regex
+ my $regex = $multiple_token_tightness{$type};
+ if ( $regex
+ && ( length($regex) == 1 || $token =~ /$regex/ ) )
+ {
+ $ws = WS_YES;
+ }
+
+ # If the closing token is less than five characters ahead
+ # we must take a closer look
+ elsif ( defined($j_closing)
+ && $j_closing - $j < 5
+ && $rLL->[$j_closing]->[_TYPE_SEQUENCE_] eq
+ $last_seqno )
+ {
+ # quick check
+ if ( $j + 1 >= $j_closing ) {
+ $ws = WS_NO;
+ $j_tight_closing_paren = $j_closing;
+ }
+
+ # slow check
+ else {
+ $ws =
+ ws_in_container( $j, $j_closing, $rLL, $type,
+ $token, $last_token );
+ if ( $ws == WS_NO ) {
+ $j_tight_closing_paren = $j_closing;
+ }
+ }
+ }
+ else {
+ $ws = WS_YES;
+ }
+ }
+ }
+
+ # check for special cases which override the above rules
+ if ( %opening_container_inside_ws && $last_seqno ) {
+ my $ws_override = $opening_container_inside_ws{$last_seqno};
+ if ($ws_override) { $ws = $ws_override }
+ }
+
+ $ws_4 = $ws_3 = $ws_2 = $ws_1 = $ws
+ if DEBUG_WHITE;
+
+ } ## end setting space flag inside opening tokens
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 2:
+ # Special checks for certain types ...
+ #---------------------------------------------------------------
+ # The hash '%is_special_ws_type' significantly speeds up this routine,
+ # but be sure to update it if a new check is added.
+ # Currently has types: qw(k w C m - Q #)
+ if ( $is_special_ws_type{$type} ) {
+
+ if ( $type eq 'k' ) {
+
+ # Keywords 'for', 'foreach' are special cases for -kpit since
+ # the opening paren does not always immediately follow the
+ # keyword. So we have to search forward for the paren in this
+ # case. I have limited the search to 10 tokens ahead, just in
+ # case somebody has a big file and no opening paren. This
+ # should be enough for all normal code. Added the level check
+ # to fix b1236.
+ if ( $is_for_foreach{$token}
+ && %keyword_paren_inner_tightness
+ && defined( $keyword_paren_inner_tightness{$token} )
+ && $j < $jmax )
+ {
+ my $level = $rLL->[$j]->[_LEVEL_];
+ ## NOTE: we might use the KNEXT variable to avoid this loop
+ ## but profiling shows that little would be saved
+ foreach my $jp ( $j + 1 .. $j + 9 ) {
+ last if ( $jp > $jmax );
+ last if ( $rLL->[$jp]->[_LEVEL_] != $level ); # b1236
+ next unless ( $rLL->[$jp]->[_TOKEN_] eq '(' );
+ my $seqno_p = $rLL->[$jp]->[_TYPE_SEQUENCE_];
+ set_container_ws_by_keyword( $token, $seqno_p );
+ last;
+ }
+ }
+ }
+
+ # handle a comment
+ elsif ( $type eq '#' ) {
+
+ # newline before block comment ($j==0), and
+ # space before side comment ($j>0), so ..
+ $ws = WS_YES;
+
+ #---------------------------------
+ # Nothing more to do for a comment
+ #---------------------------------
+ $rwhitespace_flags->[$j] = $ws;
+ next;
+ }
+
+ # space_backslash_quote; RT #123774 <<snippets/rt123774.in>>
+ # allow a space between a backslash and single or double quote
+ # to avoid fooling html formatters
+ elsif ( $type eq 'Q' ) {
+ if ( $last_type eq '\\' && $token =~ /^[\"\']/ ) {
+ $ws =
+ !$rOpts_space_backslash_quote ? WS_NO
+ : $rOpts_space_backslash_quote == 1 ? WS_OPTIONAL
+ : $rOpts_space_backslash_quote == 2 ? WS_YES
+ : WS_YES;
+ }
+ }
+
+ # retain any space between '-' and bare word
+ elsif ( $type eq 'w' || $type eq 'C' ) {
+ $ws = WS_OPTIONAL if $last_type eq '-';
+ }
+
+ # retain any space between '-' and bare word; for example
+ # avoid space between 'USER' and '-' here: <<snippets/space2.in>>
+ # $myhash{USER-NAME}='steve';
+ elsif ( $type eq 'm' || $type eq '-' ) {
+ $ws = WS_OPTIONAL if ( $last_type eq 'w' );
+ }
+
+ else {
+ # A type $type was entered in %is_special_ws_type but
+ # there is no code block to handle it. Either remove it
+ # from the hash or add a code block to handle it.
+ DEVEL_MODE && Fault("no code to handle type $type\n");
+ }
+ } ## end elsif ( $is_special_ws_type{$type} ...
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 3:
+ # Handle space on inside of closing brace pairs.
+ #---------------------------------------------------------------
+
+ # /[\}\)\]R]/
+ elsif ( $is_closing_type{$type} ) {
+
+ my $seqno = $rtokh->[_TYPE_SEQUENCE_];
+ if ( $j == $j_tight_closing_paren ) {
+
+ $j_tight_closing_paren = -1;
+ $ws = WS_NO;
+ }
+ else {
+
+ if ( !defined($ws) ) {
+
+ my $tightness_here;
+ my $block_type = $rblock_type_of_seqno->{$seqno}
+ || $block_type_for_tightness{$seqno};
+
+ if ( $block_type && $token eq '}' ) {
+ $tightness_here = $rOpts_block_brace_tightness;
+ }
+ else { $tightness_here = $tightness{$token} }
+
+ $ws = ( $tightness_here > 1 ) ? WS_NO : WS_YES;
+ }
+ }
+
+ # check for special cases which override the above rules
+ if ( %closing_container_inside_ws && $seqno ) {
+ my $ws_override = $closing_container_inside_ws{$seqno};
+ if ($ws_override) { $ws = $ws_override }
+ }
+
+ $ws_4 = $ws_3 = $ws_2 = $ws
+ if DEBUG_WHITE;
+ } ## end setting space flag inside closing tokens
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 4:
+ #---------------------------------------------------------------
+ elsif ( $is_opening_type{$type} ) {
+
+ $last_type_is_opening = 1;
+
+ if ( $token eq '(' ) {
+
+ my $seqno = $rtokh->[_TYPE_SEQUENCE_];
+
+ # This will have to be tweaked as tokenization changes.
+ # We usually want a space at '} (', for example:
+ # <<snippets/space1.in>>
+ # map { 1 * $_; } ( $y, $M, $w, $d, $h, $m, $s );
+ #
+ # But not others:
+ # &{ $_->[1] }( delete $_[$#_]{ $_->[0] } );
+ # At present, the above & block is marked as type L/R so this
+ # case won't go through here.
+ if ( $last_type eq '}' && $last_token ne ')' ) { $ws = WS_YES }
+
+ # NOTE: some older versions of Perl had occasional problems if
+ # spaces are introduced between keywords or functions and
+ # opening parens. So the default is not to do this except is
+ # certain cases. The current Perl seems to tolerate spaces.
+
+ # Space between keyword and '('
+ elsif ( $last_type eq 'k' ) {
+
+ if ( $last_token eq 'sub' ) {
+ $ws = $ws_signature_paren->($j);
+ }
+ else {
+ $ws = WS_NO
+ unless ( $rOpts_space_keyword_paren
+ || $space_after_keyword{$last_token} );
+
+ # Set inside space flag if requested
+ set_container_ws_by_keyword( $last_token, $seqno );
+ }
+ }
+
+ # Space between function and '('
+ # -----------------------------------------------------
+ # 'w' and 'i' checks for something like:
+ # myfun( &myfun( ->myfun(
+ # -----------------------------------------------------
+
+ # Note that at this point an identifier may still have a
+ # leading arrow, but the arrow will be split off during token
+ # respacing. After that, the token may become a bare word
+ # without leading arrow. The point is, it is best to mark
+ # function call parens right here before that happens.
+ # Patch: added 'C' to prevent blinker, case b934, i.e. 'pi()'
+ # NOTE: this would be the place to allow spaces between
+ # repeated parens, like () () (), as in case c017, but I
+ # decided that would not be a good idea.
+
+ # Updated to allow detached '->' from tokenizer (issue c140)
+ elsif (
+
+ # /^[wCUG]$/
+ $is_wCUG{$last_type}
+
+ || (
+
+ # /^[wi]$/
+ $is_wi{$last_type}
+
+ && (
+
+ # with prefix '->' or '&'
+ $last_token =~ /^([\&]|->)/
+
+ # or preceding token '->' (see b1337; c140)
+ || $rtokh_last_last->[_TYPE_] eq '->'
+
+ # or preceding sub call operator token '&'
+ || ( $rtokh_last_last->[_TYPE_] eq 't'
+ && $rtokh_last_last->[_TOKEN_] =~ /^\&\s*$/ )
+ )
+ )
+ )
+ {
+ $ws =
+ $rOpts_space_function_paren
+ ? $self->ws_space_function_paren( $rtokh_last,
+ $rtokh_last_last )
+ : WS_NO;
+
+ # Note that this does not include functions called
+ # with '->(', so that case has to be handled separately
+ set_container_ws_by_keyword( $last_token, $seqno );
+ $ris_function_call_paren->{$seqno} = 1;
+ }
+
+ # space between something like $i and ( in 'snippets/space2.in'
+ # for $i ( 0 .. 20 ) {
+ elsif ( $last_type eq 'i' && $last_token =~ /^[\$\%\@]/ ) {
+ $ws = WS_YES;
+ }
+
+ # allow constant function followed by '()' to retain no space
+ elsif ($last_type eq 'C'
+ && $rLL->[ $j + 1 ]->[_TOKEN_] eq ')' )
+ {
+ $ws = WS_NO;
+ }
+
+ # a paren after a sub definition starts signature
+ elsif ( $last_type eq 'S' ) {
+ $ws = $ws_signature_paren->($j);
+ }
+
+ else {
+ # no special rule for this opening paren type
+ }
+ }
+
+ # patch for SWITCH/CASE: make space at ']{' optional
+ # since the '{' might begin a case or when block
+ elsif ( ( $token eq '{' && $type ne 'L' ) && $last_token eq ']' ) {
+ $ws = WS_OPTIONAL;
+ }
+ else {
+ # opening type not covered by a special rule
+ }
+
+ # keep space between 'sub' and '{' for anonymous sub definition,
+ # be sure type = 'k' (added for c140)
+ if ( $type eq '{' ) {
+ if ( $last_token eq 'sub'
+ && $last_type eq 'k'
+ && $token ne '(' )
+ {
+ $ws = WS_YES;
+ }
+
+ # this is needed to avoid no space in '){'
+ if ( $last_token eq ')' && $token eq '{' ) { $ws = WS_YES }
+
+ # avoid any space before the brace or bracket in something like
+ # @opts{'a','b',...}
+ if ( $last_type eq 'i' && $last_token =~ /^\@/ ) {
+ $ws = WS_NO;
+ }
+ }
+
+ # The --extended-block-tightness option allows certain braces
+ # to be treated as blocks just for setting inner whitespace
+ if ( $rOpts_extended_block_tightness && $token eq '{' ) {
+ my $seqno = $rtokh->[_TYPE_SEQUENCE_];
+ if ( !$rblock_type_of_seqno->{$seqno}
+ && $extended_block_tightness_list{$last_token} )
+ {
+
+ # Ok - make this brace a block type for tightness only
+ $block_type_for_tightness{$seqno} = $last_token;
+ }
+ }
+ } ## end elsif ( $is_opening_type{$type} ) {
+
+ else {
+ # $type not opening, closing, or covered by a special rule
+ }
+
+ # always preserve whatever space was used after a possible
+ # filehandle (except _)
+ if ( $last_type eq 'Z' && $last_token ne '_' ) {
+
+ # no space for '$ {' even if '$' is marked as type 'Z', issue c221
+ # note: redundant check on type 'h' here removed for c419 part 2b
+ if ( $last_type eq 'Z' && $last_token eq '$' && $token eq '{' ) {
+ $ws = WS_NO;
+ }
+ else {
+ $ws = WS_OPTIONAL;
+ }
+ }
+
+ $ws_4 = $ws_3 = $ws
+ if DEBUG_WHITE;
+
+ if ( !defined($ws) ) {
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 4:
+ # Use the binary rule table.
+ #---------------------------------------------------------------
+ if ( defined( $binary_ws_rules{$last_type}{$type} ) ) {
+ $ws = $binary_ws_rules{$last_type}{$type};
+ $ws_4 = $ws if DEBUG_WHITE;
+ }
+
+ #---------------------------------------------------------------
+ # Whitespace Rules Section 5:
+ # Apply default rules not covered above.
+ #---------------------------------------------------------------
+
+ # If we fall through to here, look at the pre-defined hash tables
+ # for the two tokens, and:
+ # if (they are equal) use the common value
+ # if (either is zero or undef) use the other
+ # if (either is -1) use it
+ # That is,
+ # left vs right
+ # 1 vs 1 --> 1
+ # 0 vs 0 --> 0
+ # -1 vs -1 --> -1
+ #
+ # 0 vs -1 --> -1
+ # 0 vs 1 --> 1
+ # 1 vs 0 --> 1
+ # -1 vs 0 --> -1
+ #
+ # -1 vs 1 --> -1
+ # 1 vs -1 --> -1
+ else {
+ my $wl = $want_left_space{$type};
+ my $wr = $want_right_space{$last_type};
+ if ( !defined($wl) ) {
+ $ws = defined($wr) ? $wr : 0;
+ }
+ elsif ( !defined($wr) ) {
+ $ws = $wl;
+ }
+ else {
+ $ws =
+ ( ( $wl == $wr ) || ( $wl == -1 ) || !$wr ) ? $wl : $wr;
+ }
+ }
+ }
+
+ # Treat newline as a whitespace. Otherwise, we might combine
+ # 'Send' and '-recipients' here according to the above rules:
+ # <<snippets/space3.in>>
+ # my $msg = new Fax::Send
+ # -recipients => $to,
+ # -data => $data;
+ if ( !$ws
+ && $rtokh->[_LINE_INDEX_] != $rtokh_last->[_LINE_INDEX_] )
+ {
+ $ws = WS_YES;
+ }
+
+ # -qwaf phantom commas require space before type 'Q'
+ # See similar patch in sub is_essential_whitespace
+ if ( !$last_token
+ && $last_type eq ','
+ && $type eq 'Q'
+ && $rOpts_qw_as_function )
+ {
+ $ws = 1;
+ }
+
+ $rwhitespace_flags->[$j] = $ws;
+
+ # remember non-blank, non-comment tokens
+ $last_token = $token;
+ $last_type = $type;
+ $rtokh_last_last = $rtokh_last;
+ $rtokh_last = $rtokh;
+
+ # Programming note: for some reason, it is very much faster to 'next'
+ # out of this loop here than to put the DEBUG coding in a block.
+ # But note that the debug code must then update its own copies
+ # of $last_token and $last_type.
+ next if ( !DEBUG_WHITE );
+
+ my $str = substr( $last_token_dbg, 0, 15 );
+ $str .= SPACE x ( 16 - length($str) );
+ if ( !defined($ws_1) ) { $ws_1 = "*" }
+ if ( !defined($ws_2) ) { $ws_2 = "*" }
+ if ( !defined($ws_3) ) { $ws_3 = "*" }
+ if ( !defined($ws_4) ) { $ws_4 = "*" }
+ print {*STDOUT}
+"NEW WHITE: i=$j $str $last_type_dbg $type $ws_1 : $ws_2 : $ws_3 : $ws_4 : $ws \n";
+
+ # reset for next pass
+ $ws_1 = $ws_2 = $ws_3 = $ws_4 = undef;
+
+ $last_token_dbg = $token;
+ $last_type_dbg = $type;
+
+ } ## end main loop
+
+ if ( $rOpts->{'tight-secret-operators'} ) {
+ $self->new_secret_operator_whitespace($rwhitespace_flags);
+ }
+ $self->[_ris_function_call_paren_] = $ris_function_call_paren;
+ return $rwhitespace_flags;
+
+} ## end sub set_whitespace_flags
+
+sub set_container_ws_by_keyword {
+
+ my ( $word, $sequence_number ) = @_;
+ return unless (%keyword_paren_inner_tightness);
+
+ # We just saw a keyword (or other function name) followed by an opening
+ # paren. Now check to see if the following paren should have special
+ # treatment for its inside space. If so we set a hash value using the
+ # sequence number as key.
+ if ( $word && $sequence_number ) {
+ my $tightness_here = $keyword_paren_inner_tightness{$word};
+ if ( defined($tightness_here) && $tightness_here != 1 ) {
+ my $ws_flag = $tightness_here == 0 ? WS_YES : WS_NO;
+ $opening_container_inside_ws{$sequence_number} = $ws_flag;
+ $closing_container_inside_ws{$sequence_number} = $ws_flag;
+ }
+ }
+ else {
+ DEVEL_MODE
+ && Fault("unexpected token='$word' and seqno='$sequence_number'\n");
+ }
+ return;
+} ## end sub set_container_ws_by_keyword
+
+sub ws_in_container {
+
+ my ( $j, $j_closing, $rLL, $type, $token, $last_token ) = @_;
+
+ # Given:
+ # $j = index of token following an opening container token
+ # $type, $token = the type and token at index $j
+ # $j_closing = closing token of the container
+ # $last_token = the opening token of the container
+ # Return:
+ # WS_NO if there is just one token in the container (with exceptions)
+ # WS_YES otherwise
+
+ # quick check
+ if ( $j + 1 >= $j_closing ) { return WS_NO }
+
+ # special cases...
+
+ # Count '-foo' as single token so that each of
+ # $a{-foo} and $a{foo} and $a{'foo'}
+ # do not get spaces with default formatting.
+ my $j_here = $j;
+ ++$j_here
+ if ( $token eq '-'
+ && $last_token eq '{'
+ && $rLL->[ $j + 1 ]->[_TYPE_] eq 'w' );
+
+ # Count a sign separated from a number as a single token, as in the
+ # following line. Otherwise, it takes two steps to converge:
+ # deg2rad(- 0.5)
+ if ( ( $type eq 'm' || $type eq 'p' )
+ && $j < $j_closing + 1
+ && $rLL->[ $j + 1 ]->[_TYPE_] eq 'b'
+ && $rLL->[ $j + 2 ]->[_TYPE_] eq 'n'
+ && $rLL->[ $j + 2 ]->[_TOKEN_] =~ /^\d/ )
+ {
+ $j_here = $j + 2;
+ }
+
+ # recheck..
+ if ( $j_here + 1 >= $j_closing ) { return WS_NO }
+
+ # check for a blank after the first token
+ my $j_next =
+ ( $rLL->[ $j_here + 1 ]->[_TYPE_] eq 'b' )
+ ? $j_here + 2
+ : $j_here + 1;
+
+ return $j_next == $j_closing ? WS_NO : WS_YES;
+
+} ## end sub ws_in_container
+
+sub ws_space_function_paren {
+
+ my ( $self, $rtokh_last, $rtokh_last_last ) = @_;
+
+ # Called if --space-function-paren is set to see if it might cause
+ # a problem. The manual warns the user about potential problems with
+ # this flag. Here we just try to catch one common problem.
+
+ # Given:
+ # $j = index of '(' after function name
+ # Return:
+ # WS_NO if no space
+ # WS_YES otherwise
+
+ # This was added to fix for issue c166. Ignore -sfp at a possible indirect
+ # object location. For example, do not convert this:
+ # print header() ...
+ # to this:
+ # print header () ...
+ # because in this latter form, header may be taken to be a file handle
+ # instead of a function call.
+
+ # Start with the normal value for -sfp:
+ my $ws = WS_YES;
+
+ # now check to be sure we don't cause a problem:
+ my $type_ll = $rtokh_last_last->[_TYPE_];
+ my $tok_ll = $rtokh_last_last->[_TOKEN_];
+
+ # NOTE: this is just a minimal check. For example, we might also check
+ # for something like this:
+ # print ( header ( ..
+ if ( $type_ll eq 'k' && $is_indirect_object_taker{$tok_ll} ) {
+ $ws = WS_NO;
+ }
+
+ # do not let -sfp add space for qw's converted to functions by -qwaf
+ if ( $rOpts_qw_as_function
+ && $rtokh_last->[_TYPE_] eq 'U'
+ && $rtokh_last->[_TOKEN_] eq 'qw' )
+ {
+ $ws = WS_NO;
+ }
+
+ return $ws;
+
+} ## end sub ws_space_function_paren
+
+} ## end closure set_whitespace_flags
+
+sub dump_want_left_space {
+ my $fh = shift;
+ local $LIST_SEPARATOR = "\n";
+ $fh->print(<<EOM);
+These values are the main control of whitespace to the left of a token type;
+They may be altered with the -wls parameter.
+For a list of token types, use perltidy --dump-token-types (-dtt)
+ 1 means the token wants a space to its left
+-1 means the token does not want a space to its left
+------------------------------------------------------------------------
+EOM
+ foreach my $key ( sort keys %want_left_space ) {
+ $fh->print("$key\t$want_left_space{$key}\n");
+ }
+ return;
+} ## end sub dump_want_left_space
+
+sub dump_want_right_space {
+ my $fh = shift;
+ local $LIST_SEPARATOR = "\n";
+ $fh->print(<<EOM);
+These values are the main control of whitespace to the right of a token type;
+They may be altered with the -wrs parameter.
+For a list of token types, use perltidy --dump-token-types (-dtt)
+ 1 means the token wants a space to its right
+-1 means the token does not want a space to its right
+------------------------------------------------------------------------
+EOM
+ foreach my $key ( sort keys %want_right_space ) {
+ $fh->print("$key\t$want_right_space{$key}\n");
+ }
+ return;
+} ## end sub dump_want_right_space
+
+{ ## begin closure is_essential_whitespace
+
+ my %is_sort_grep_map;
+ my %is_digraph;
+ my %is_trigraph;
+ my %essential_whitespace_filter_l1;
+ my %essential_whitespace_filter_r1;
+ my %essential_whitespace_filter_l2;
+ my %essential_whitespace_filter_r2;
+ my %is_type_with_space_before_bareword;
+ my %is_special_variable_char;
+ my %is_digit_char;
+
+ BEGIN {
+
+ my @q;
+
+ # NOTE: This hash is like the global %is_sort_map_grep, but it ignores
+ # grep aliases on purpose, since here we are looking parens, not braces
+ @q = qw( sort grep map );
+ @is_sort_grep_map{@q} = (1) x scalar(@q);
+
+ @q = qw{
+ .. :: << >> ** && || // -> => += -=
+ .= %= &= |= ^= *= <> <= >= == =~ !~
+ != ++ -- /= x= ~~ ~. |. &. ^. ^^
+ };
+ @is_digraph{@q} = (1) x scalar(@q);
+
+ @q = qw( ... **= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~ );
+ @is_trigraph{@q} = (1) x scalar(@q);
+
+ # These are used as a speedup filters for sub is_essential_whitespace.
+
+ # Filter 1:
+ # These left side token types USUALLY do not require a space:
+ @q = qw( ; { } [ ] L R );
+ push @q, ',';
+ push @q, ')';
+ push @q, '(';
+ @essential_whitespace_filter_l1{@q} = (1) x scalar(@q);
+
+ # BUT some might if followed by these right token types
+ @q = qw( pp mm << <<= h );
+ @essential_whitespace_filter_r1{@q} = (1) x scalar(@q);
+
+ # Filter 2:
+ # These right side filters usually do not require a space
+ @q = qw( ; ] R } );
+ push @q, ',';
+ push @q, ')';
+ @essential_whitespace_filter_r2{@q} = (1) x scalar(@q);
+
+ # BUT some might if followed by these left token types
+ @q = qw( h Z );
+ @essential_whitespace_filter_l2{@q} = (1) x scalar(@q);
+
+ # Keep a space between certain types and any bareword:
+ # Q: keep a space between a quote and a bareword to prevent the
+ # bareword from becoming a quote modifier.
+ # &: do not remove space between an '&' and a bare word because
+ # it may turn into a function evaluation, like here
+ # between '&' and 'O_ACCMODE', producing a syntax error [File.pm]
+ # $opts{rdonly} = (($opts{mode} & O_ACCMODE) == O_RDONLY);
+ @q = qw( Q & );
+ @is_type_with_space_before_bareword{@q} = (1) x scalar(@q);
+
+ # These are the only characters which can (currently) form special
+ # variables, like $^W: (issue c066, c068).
+ @q =
+ qw{ ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ };
+ @is_special_variable_char{@q} = (1) x scalar(@q);
+
+ @q = qw( 0 1 2 3 4 5 6 7 8 9 );
+ @is_digit_char{@q} = (1) x scalar(@q);
+
+ } ## end BEGIN
+
+ sub is_essential_whitespace {
+
+ my ( $tokenll, $typell, $tokenl, $typel, $tokenr, $typer ) = @_;
+
+ # Essential whitespace means whitespace which cannot be safely deleted
+ # without risking the introduction of a syntax error.
+
+ # Given: three tokens and their types:
+ # ($tokenll, $typell) = previous nonblank token to the left of $tokenl
+ # ($tokenl, $typel) = the token to the left of the space in question
+ # ($tokenr, $typer) = the token to the right of the space in question
+
+ # Return:
+ # true if whitespace is needed
+ # false if whitespace may be deleted
+ #
+ # Note1: This routine should almost never need to be changed. It is
+ # for avoiding syntax problems rather than for formatting.
+
+ # Note2: The -mangle option causes large numbers of calls to this
+ # routine and therefore is a good test. So if a change is made, be sure
+ # to use nytprof to profile with both old and revised coding using the
+ # -mangle option and check differences.
+
+ # This is potentially a very slow routine but the following quick
+ # filters typically catch and handle over 90% of the calls.
+
+ # -qwaf phantom commas require space before type 'Q'
+ # See similar patch in sub set_whitespace_flags
+ if ( !$tokenl
+ && $typel eq ','
+ && $typer eq 'Q'
+ && $rOpts_qw_as_function )
+ {
+ return 1;
+ }
+
+ # Filter 1: usually no space required after common types ; , [ ] { } ( )
+ return
+ if ( $essential_whitespace_filter_l1{$typel}
+ && !$essential_whitespace_filter_r1{$typer} );
+
+ # Filter 2: usually no space before common types ; ,
+ return
+ if ( $essential_whitespace_filter_r2{$typer}
+ && !$essential_whitespace_filter_l2{$typel} );
+
+ # Filter 3: Handle side comments: a space is only essential if the left
+ # token ends in '$' For example, we do not want to create $#foo below:
+
+ # sub t086
+ # ( #foo)))
+ # $ #foo)))
+ # a #foo)))
+ # ) #foo)))
+ # { ... }
+
+ # Also, I prefer not to put a ? and # together because ? used to be
+ # a pattern delimiter and spacing was used if guessing was needed.
+
+ if ( $typer eq '#' ) {
+
+ return 1
+ if ( $tokenl
+ && ( $typel eq '?' || substr( $tokenl, -1 ) eq '$' ) );
+ return;
+ }
+
+ my $tokenr_leading_ch = substr( $tokenr, 0, 1 );
+ my $tokenr_leading_ch2 = substr( $tokenr, 0, 2 );
+ my $tokenr_is_open_paren = $tokenr eq '(';
+ my $token_joined = $tokenl . $tokenr;
+ my $tokenl_is_dash = $tokenl eq '-';
+ my $tokenr_is_bareword = ord($tokenr_leading_ch) > ORD_PRINTABLE_MAX
+
+ # always correct but slow
+ ? $tokenr =~ /^[^\d\W]/
+
+ # fast but ascii only
+ : ( $tokenr_leading_ch =~ tr/a-zA-Z_/a-zA-Z_/ );
+
+ #-------------------
+ # Must do full check
+ #-------------------
+
+ # This long logical expression gives the result
+ my $result =
+
+ # never combine two bare words or numbers
+ # examples: and ::ok(1)
+ # return ::spw(...)
+ # for bla::bla:: abc
+ # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
+ # $input eq"quit" to make $inputeq"quit"
+ # my $size=-s::SINK if $file; <==OK but we won't do it
+ # don't join something like: for bla::bla:: abc
+ # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
+ (
+ (
+ ## ( $tokenr =~ /^([\'\w]|\:\:)/ )
+ $tokenr_is_bareword
+ || $is_digit_char{$tokenr_leading_ch}
+ || $tokenr_leading_ch eq "'"
+ || $tokenr_leading_ch2 eq '::'
+ )
+
+ && ( $tokenl =~ /([\'\w]|\:\:)$/ && $typel ne 'CORE::' )
+ )
+
+ # do not combine a number with a concatenation dot
+ # example: pom.caputo:
+ # $vt100_compatible ? "\e[0;0H" : ('-' x 78 . "\n");
+ || $typel eq 'n' && $tokenr eq '.'
+ || $typer eq 'n' && $tokenl eq '.'
+
+ # cases of a space before a bareword...
+ || (
+ $tokenr_is_bareword && (
+
+ # do not join a minus with a bare word, because you might form
+ # a file test operator. Example from Complex.pm:
+ # if (CORE::abs($z - i) < $eps);
+ # "z-i" would be taken as a file test.
+ $tokenl_is_dash && length($tokenr) == 1
+
+ # and something like this could become ambiguous without space
+ # after the '-':
+ # use constant III=>1;
+ # $a = $b - III;
+ # and even this:
+ # $a = - III;
+ || $tokenl_is_dash && $typer =~ /^[wC]$/
+
+ # keep space between types Q & and a bareword
+ || $is_type_with_space_before_bareword{$typel}
+
+ # +-: binary plus and minus before a bareword could get
+ # converted into unary plus and minus on next pass through the
+ # tokenizer. This can lead to blinkers: cases b660 b670 b780
+ # b781 b787 b788 b790 So we keep a space unless the +/- clearly
+ # follows an operator
+ || ( ( $typel eq '+' || $typel eq '-' )
+ && $typell !~ /^[niC\)\}\]R]$/ )
+
+ # keep a space between a token ending in '$' and any word;
+ # this caused trouble: "die @$ if $@"
+ || $typel eq 'i' && substr( $tokenl, -1, 1 ) eq '$'
+
+ # don't combine $$ or $# with any alphanumeric
+ # (testfile mangle.t with --mangle)
+ || $tokenl eq '$$'
+ || $tokenl eq '$#'
+
+ )
+ ) ## end $tokenr_is_bareword
+
+ # OLD, not used
+ # '= -' should not become =- or you will get a warning
+ # about reversed -=
+ # || ($tokenr eq '-')
+
+ # do not join a bare word with a minus, like between 'Send' and
+ # '-recipients' here <<snippets/space3.in>>
+ # my $msg = new Fax::Send
+ # -recipients => $to,
+ # -data => $data;
+ # This is the safest thing to do. If we had the token to the right of
+ # the minus we could do a better check.
+ #
+ # And do not combine a bareword and a quote, like this:
+ # oops "Your login, $Bad_Login, is not valid";
+ # It can cause a syntax error if oops is a sub
+ || $typel eq 'w' && ( $tokenr eq '-' || $typer eq 'Q' )
+
+ # perl is very fussy about spaces before <<; c419 part 1
+ || $tokenr_leading_ch2 eq '<<' && $typel ne '{' && $typel ne ','
+
+ # avoid combining tokens to create new meanings. Example:
+ # $a+ +$b must not become $a++$b
+ || ( $is_digraph{$token_joined} )
+ || $is_trigraph{$token_joined}
+
+ # another example: do not combine these two &'s:
+ # allow_options & &OPT_EXECCGI
+ || $is_digraph{ $tokenl . $tokenr_leading_ch }
+
+ # retain any space after possible filehandle
+ # (testfiles prnterr1.t with --extrude and mangle.t with --mangle)
+ # but no space for '$ {' even if '$' is marked as type 'Z', issue c221
+ || ( $typel eq 'Z' && !( $tokenl eq '$' && $tokenr eq '{' ) )
+
+ # Added 'Y' here 16 Jan 2021 to prevent -mangle option from removing
+ # space after type Y. Otherwise, it will get parsed as type 'Z' later
+ # and any space would have to be added back manually if desired.
+ || $typel eq 'Y'
+
+ # Perl is sensitive to whitespace after the + here:
+ # $b = xvals $a + 0.1 * yvals $a;
+ || $typell eq 'Z' && $typel =~ /^[\/\?\+\-\*]$/
+
+ || (
+ $tokenr_is_open_paren && (
+
+ # keep paren separate in 'use Foo::Bar ()'
+ ( $typel eq 'w' && $typell eq 'k' && $tokenll eq 'use' )
+
+ # OLD: keep any space between filehandle and paren:
+ # file mangle.t with --mangle:
+ # NEW: this test is no longer necessary here (moved above)
+ ## || $typel eq 'Y'
+
+ # must have space between grep and left paren; "grep(" will fail
+ || $is_sort_grep_map{$tokenl}
+
+ # don't stick numbers next to left parens, as in:
+ #use Mail::Internet 1.28 (); (see Entity.pm, Head.pm, Test.pm)
+ || $typel eq 'n'
+ )
+ ) ## end $tokenr_is_open_paren
+
+ # retain any space after here doc operator ( see hereerr.t)
+ # c419, part 2a: unless followed by '}' or ','. See also part 2b.
+ # or ; (git174)
+ || $typel eq 'h' && $typer ne '}' && $typer ne ',' && $typer ne ';'
+
+ # Be careful with a space around ++ and --, to avoid ambiguity as to
+ # which token it applies
+ || ( $typer eq 'pp' || $typer eq 'mm' ) && $tokenl !~ /^[\;\{\(\[]/
+ || ( $typel eq '++' || $typel eq '--' )
+ && $tokenr !~ /^[\;\}\)\]]/
+
+ # Need space after 'for my' or 'foreach my';
+ # for example, this will fail in older versions of Perl:
+ # foreach my$ft(@filetypes)...
+ || ( $tokenl eq 'my'
+ && $tokenr_leading_ch eq '$'
+ && $is_for_foreach{$tokenll} )
+
+ # Keep space after $^ if needed to avoid forming a different
+ # special variable (issue c068). For example:
+ # my $aa = $^ ? "none" : "ok";
+ # The problem is that '$^?' is a valid special variable
+ || ( $typel eq 'i'
+ && length($tokenl) == 2
+ && substr( $tokenl, 1, 1 ) eq '^'
+ && $is_special_variable_char{$tokenr_leading_ch} )
+
+ # We must be sure that a space between a ? and a quoted string
+ # remains if the space before the ? remains. [Loca.pm, lockarea]
+ # ie,
+ # $b=join $comma ? ',' : ':', @_; # ok
+ # $b=join $comma?',' : ':', @_; # ok!
+ # $b=join $comma ?',' : ':', @_; # error!
+ # Not really required:
+ ## || ( ( $typel eq '?' ) && ( $typer eq 'Q' ) )
+
+ # Space stacked labels...
+ # Not really required: Perl seems to accept non-spaced labels.
+ ## || $typel eq 'J' && $typer eq 'J'
+
+ ; # the value of this long logic sequence is the result we want
+ return $result;
+ } ## end sub is_essential_whitespace
+} ## end closure is_essential_whitespace
+
+{ ## begin closure new_secret_operator_whitespace
+
+ my %secret_operators;
+ my %is_leading_secret_token;
+
+ BEGIN {
+
+ # token lists for perl secret operators as compiled by Philippe Bruhat
+ # at: https://metacpan.org/module/perlsecret
+ %secret_operators = (
+ 'Goatse' => [qw#= ( ) =#], #=( )=
+ 'Venus1' => [qw#0 +#], # 0+
+ 'Venus2' => [qw#+ 0#], # +0
+ 'Enterprise' => [qw#) x ! !#], # ()x!!
+ 'Kite1' => [qw#~ ~ <>#], # ~~<>
+ 'Kite2' => [qw#~~ <>#], # ~~<>
+ 'Winking Fat Comma' => [ ( ',', '=>' ) ], # ,=>
+ 'Bang bang' => [qw#! !#], # !!
+ );
+
+ # The following operators and constants are not included because they
+ # are normally kept tight by perltidy:
+ # ~~ <~>
+ #
+
+ # Make a lookup table indexed by the first token of each operator:
+ # first token => [list, list, ...]
+ foreach my $value ( values(%secret_operators) ) {
+ my $tok = $value->[0];
+ push @{ $is_leading_secret_token{$tok} }, $value;
+ }
+ } ## end BEGIN
+
+ sub new_secret_operator_whitespace {
+
+ my ( $self, $rwhitespace_flags ) = @_;
+
+ # Implement --tight-secret-operators
+ # Given:
+ # $rwhitespace_flags = whitespase flags, to be updated
+
+ # Loop over all tokens in this line
+ my $rLL = $self->[_rLL_];
+ my $jmax = @{$rLL} - 1;
+ foreach my $j ( 0 .. $jmax ) {
+
+ # Skip unless this token might start a secret operator
+ my $type = $rLL->[$j]->[_TYPE_];
+ next if ( $type eq 'b' );
+
+ my $token = $rLL->[$j]->[_TOKEN_];
+ next unless ( $is_leading_secret_token{$token} );
+
+ # Loop over all secret operators with this leading token
+ foreach my $rpattern ( @{ $is_leading_secret_token{$token} } ) {
+ my $jend = $j - 1;
+ foreach my $tok ( @{$rpattern} ) {
+ $jend++;
+
+ $jend++
+ if ( $jend <= $jmax
+ && $rLL->[$jend]->[_TYPE_] eq 'b' );
+
+ if ( $jend > $jmax
+ || $tok ne $rLL->[$jend]->[_TOKEN_] )
+ {
+ $jend = undef;
+ last;
+ }
+ }
+
+ if ($jend) {
+
+ # set flags to prevent spaces within this operator
+ foreach my $jj ( $j + 1 .. $jend ) {
+ $rwhitespace_flags->[$jj] = WS_NO;
+ }
+ $j = $jend;
+ last;
+ }
+ } ## End Loop over all operators
+ } ## End loop over all tokens
+ return;
+ } ## end sub new_secret_operator_whitespace
+} ## end closure new_secret_operator_whitespace
+
+{ ## begin closure set_bond_strengths
+
+ # These routines and variables are involved in deciding where to break very
+ # long lines.
+
+ # NEW_TOKENS must add bond strength rules
+
+ my %is_good_keyword_breakpoint;
+ my %is_container_token;
+
+ my %binary_bond_strength_nospace;
+ my %binary_bond_strength;
+ my %nobreak_lhs;
+ my %nobreak_rhs;
+
+ my @bias_tokens;
+ my %bias_hash;
+ my %bias;
+ my $delta_bias;
+
+ sub initialize_bond_strength_hashes {
+
+ my @q;
+ @q = qw( if unless while until for foreach );
+ @is_good_keyword_breakpoint{@q} = (1) x scalar(@q);
+
+ @q = qw/ ( [ { } ] ) /;
+ @is_container_token{@q} = (1) x scalar(@q);
+
+ # The decision about where to break a line depends upon a "bond
+ # strength" between tokens. The LOWER the bond strength, the MORE
+ # likely a break. A bond strength may be any value but to simplify
+ # things there are several pre-defined strength levels:
+
+ # NO_BREAK => 10000;
+ # VERY_STRONG => 100;
+ # STRONG => 2.1;
+ # NOMINAL => 1.1;
+ # WEAK => 0.8;
+ # VERY_WEAK => 0.55;
+
+ # The strength values are based on trial-and-error, and need to be
+ # tweaked occasionally to get desired results. Some comments:
+ #
+ # 1. Only relative strengths are important. small differences
+ # in strengths can make big formatting differences.
+ # 2. Each indentation level adds one unit of bond strength.
+ # 3. A value of NO_BREAK makes an unbreakable bond
+ # 4. A value of VERY_WEAK is the strength of a ','
+ # 5. Values below NOMINAL are considered ok break points.
+ # 6. Values above NOMINAL are considered poor break points.
+ #
+ # The bond strengths should roughly follow precedence order where
+ # possible. If you make changes, please check the results very
+ # carefully on a variety of scripts. Testing with the -extrude
+ # options is particularly helpful in exercising all of the rules.
+
+ # Wherever possible, bond strengths are defined in the following
+ # tables. There are two main stages to setting bond strengths and
+ # two types of tables:
+ #
+ # The first stage involves looking at each token individually and
+ # defining left and right bond strengths, according to if we want
+ # to break to the left or right side, and how good a break point it
+ # is. For example tokens like =, ||, && make good break points and
+ # will have low strengths, but one might want to break on either
+ # side to put them at the end of one line or beginning of the next.
+ #
+ # The second stage involves looking at certain pairs of tokens and
+ # defining a bond strength for that particular pair. This second
+ # stage has priority.
+
+ #---------------------------------------------------------------
+ # Bond Strength BEGIN Section 1.
+ # Set left and right bond strengths of individual tokens.
+ #---------------------------------------------------------------
+
+ # NOTE: NO_BREAK's set in this section first are HINTS which will
+ # probably not be honored. Essential NO_BREAKS's should be set in
+ # BEGIN Section 2 or hardwired in the NO_BREAK coding near the end
+ # of this subroutine.
+
+ # Note that we are setting defaults in this section. The user
+ # cannot change bond strengths but can cause the left and right
+ # bond strengths of any token type to be swapped through the use of
+ # the -wba and -wbb flags. In this way the user can determine if a
+ # breakpoint token should appear at the end of one line or the
+ # beginning of the next line.
+
+ %right_bond_strength = ();
+ %left_bond_strength = ();
+ %binary_bond_strength_nospace = ();
+ %binary_bond_strength = ();
+ %nobreak_lhs = ();
+ %nobreak_rhs = ();
+
+ # The hash keys in this section are token types, plus the text of
+ # certain keywords like 'or', 'and'.
+
+ # no break around possible filehandle
+ $left_bond_strength{'Z'} = NO_BREAK;
+ $right_bond_strength{'Z'} = NO_BREAK;
+
+ # never put a bare word on a new line:
+ # example print (STDERR, "bla"); will fail with break after (
+ $left_bond_strength{'w'} = NO_BREAK;
+
+ # blanks always have infinite strength to force breaks after
+ # real tokens
+ $right_bond_strength{'b'} = NO_BREAK;
+
+ # try not to break on exponentiation
+ @q = qw# ** .. ... <=> #;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} = (STRONG) x scalar(@q);
+
+ # The comma-arrow has very low precedence but not a good break point
+ $left_bond_strength{'=>'} = NO_BREAK;
+ $right_bond_strength{'=>'} = NOMINAL;
+
+ # ok to break after label
+ $left_bond_strength{'J'} = NO_BREAK;
+ $right_bond_strength{'J'} = NOMINAL;
+ $left_bond_strength{'j'} = STRONG;
+ $right_bond_strength{'j'} = STRONG;
+ $left_bond_strength{'A'} = STRONG;
+ $right_bond_strength{'A'} = STRONG;
+
+ $left_bond_strength{'->'} = STRONG;
+ $right_bond_strength{'->'} = VERY_STRONG;
+
+ $left_bond_strength{'CORE::'} = NOMINAL;
+ $right_bond_strength{'CORE::'} = NO_BREAK;
+
+ # Fix for c250: added strengths for new type 'P'
+ # Note: these are working okay, but may eventually need to be
+ # adjusted or even removed.
+ $left_bond_strength{'P'} = NOMINAL;
+ $right_bond_strength{'P'} = NOMINAL;
+
+ # breaking AFTER modulus operator is ok:
+ @q = qw< % >;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} =
+ ( 0.1 * NOMINAL + 0.9 * STRONG ) x scalar(@q);
+
+ # Break AFTER math operators * and /
+ @q = qw< * / x >;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} = (NOMINAL) x scalar(@q);
+
+ # Break AFTER weakest math operators + and -
+ # Make them weaker than * but a bit stronger than '.'
+ @q = qw< + - >;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} =
+ ( 0.91 * NOMINAL + 0.09 * WEAK ) x scalar(@q);
+
+ # Define left strength of unary plus and minus (fixes case b511)
+ $left_bond_strength{p} = $left_bond_strength{'+'};
+ $left_bond_strength{m} = $left_bond_strength{'-'};
+
+ # And make right strength of unary plus and minus very high.
+ # Fixes cases b670 b790
+ $right_bond_strength{p} = NO_BREAK;
+ $right_bond_strength{m} = NO_BREAK;
+
+ # breaking BEFORE these is just ok:
+ @q = qw# >> << #;
+ @right_bond_strength{@q} = (STRONG) x scalar(@q);
+ @left_bond_strength{@q} = (NOMINAL) x scalar(@q);
+
+ # breaking before the string concatenation operator seems best
+ # because it can be hard to see at the end of a line
+ $right_bond_strength{'.'} = STRONG;
+ $left_bond_strength{'.'} = 0.9 * NOMINAL + 0.1 * WEAK;
+
+ @q = qw< } ] ) R >;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} = (NOMINAL) x scalar(@q);
+
+ # make these a little weaker than nominal so that they get
+ # favored for end-of-line characters
+ @q = qw< != == =~ !~ ~~ !~~ >;
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} =
+ ( 0.9 * NOMINAL + 0.1 * WEAK ) x scalar(@q);
+
+ # break AFTER these
+ @q = qw# < > | & >= <= #;
+ @left_bond_strength{@q} = (VERY_STRONG) x scalar(@q);
+ @right_bond_strength{@q} =
+ ( 0.8 * NOMINAL + 0.2 * WEAK ) x scalar(@q);
+
+ # breaking either before or after a quote is ok
+ # but bias for breaking before a quote
+ $left_bond_strength{'Q'} = NOMINAL;
+ $right_bond_strength{'Q'} = NOMINAL + 0.02;
+ $left_bond_strength{'q'} = NOMINAL;
+ $right_bond_strength{'q'} = NOMINAL;
+
+ # starting a line with a keyword is usually ok
+ $left_bond_strength{'k'} = NOMINAL;
+
+ # we usually want to bond a keyword strongly to what immediately
+ # follows, rather than leaving it stranded at the end of a line
+ $right_bond_strength{'k'} = STRONG;
+
+ $left_bond_strength{'G'} = NOMINAL;
+ $right_bond_strength{'G'} = STRONG;
+
+ # assignment operators
+ @q = qw( = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x= );
+
+ # Default is to break AFTER various assignment operators
+ @left_bond_strength{@q} = (STRONG) x scalar(@q);
+ @right_bond_strength{@q} =
+ ( 0.4 * WEAK + 0.6 * VERY_WEAK ) x scalar(@q);
+
+ # Default is to break BEFORE '&&' and '||' and '//'
+ # set strength of '||' to same as '=' so that chains like
+ # $a = $b || $c || $d will break before the first '||'
+ $right_bond_strength{'||'} = NOMINAL;
+ $left_bond_strength{'||'} = $right_bond_strength{'='};
+
+ # same thing for '//'
+ $right_bond_strength{'//'} = NOMINAL;
+ $left_bond_strength{'//'} = $right_bond_strength{'='};
+
+ # set strength of && a little higher than ||
+ $right_bond_strength{'&&'} = NOMINAL;
+ $left_bond_strength{'&&'} = $left_bond_strength{'||'} + 0.1;
+
+ # set strength of ^^ between && and ||. See git157.
+ # "1 || 0 ^^ 0 || 1" = true, so ^^ is stronger than ||
+ # "1 ^^ 1 && 0" = true, so && is stronger than ^^
+ $right_bond_strength{'^^'} = NOMINAL;
+ $left_bond_strength{'^^'} = $left_bond_strength{'||'} + 0.05;
+
+ $left_bond_strength{';'} = VERY_STRONG;
+ $right_bond_strength{';'} = VERY_WEAK;
+ $left_bond_strength{'f'} = VERY_STRONG;
+
+ # make right strength of for ';' a little less than '='
+ # to make for contents break after the ';' to avoid this:
+ # for ( $j = $number_of_fields - 1 ; $j < $item_count ; $j +=
+ # $number_of_fields )
+ # and make it weaker than ',' and 'and' too
+ $right_bond_strength{'f'} = VERY_WEAK - 0.03;
+
+ # The strengths of ?/: should be somewhere between
+ # an '=' and a quote (NOMINAL),
+ # make strength of ':' slightly less than '?' to help
+ # break long chains of ? : after the colons
+ $left_bond_strength{':'} = 0.4 * WEAK + 0.6 * NOMINAL;
+ $right_bond_strength{':'} = NO_BREAK;
+ $left_bond_strength{'?'} = $left_bond_strength{':'} + 0.01;
+ $right_bond_strength{'?'} = NO_BREAK;
+
+ $left_bond_strength{','} = VERY_STRONG;
+ $right_bond_strength{','} = VERY_WEAK;
+
+ # remaining digraphs and trigraphs not defined above
+ @q = qw( :: <> ++ -- );
+ @left_bond_strength{@q} = (WEAK) x scalar(@q);
+ @right_bond_strength{@q} = (STRONG) x scalar(@q);
+
+ # Set bond strengths of certain keywords
+ # make 'or', 'err', 'and' slightly weaker than a ','
+ $left_bond_strength{'and'} = VERY_WEAK - 0.01;
+ $left_bond_strength{'or'} = VERY_WEAK - 0.02;
+ $left_bond_strength{'err'} = VERY_WEAK - 0.02;
+ $left_bond_strength{'xor'} = VERY_WEAK - 0.01;
+
+ @q = qw( ne eq );
+ @left_bond_strength{@q} = (NOMINAL) x scalar(@q);
+
+ @q = qw( lt gt le ge );
+ @left_bond_strength{@q} = ( 0.9 * NOMINAL + 0.1 * STRONG ) x scalar(@q);
+
+ @q = qw( and or err xor ne eq );
+ @right_bond_strength{@q} = (NOMINAL) x scalar(@q);
+
+ $right_bond_strength{'{'} = WEAK;
+ $left_bond_strength{'{'} = VERY_STRONG;
+
+ #---------------------------------------------------------------
+ # Bond Strength BEGIN Section 2.
+ # Set binary rules for bond strengths between certain token types.
+ #---------------------------------------------------------------
+
+ # We have a little problem making tables which apply to the
+ # container tokens. Here is a list of container tokens and
+ # their types:
+ #
+ # type tokens // meaning
+ # { {, [, ( // indent
+ # } }, ], ) // outdent
+ # [ [ // left non-structural [ (enclosing an array index)
+ # ] ] // right non-structural square bracket
+ # ( ( // left non-structural paren
+ # ) ) // right non-structural paren
+ # L { // left non-structural curly brace (enclosing a key)
+ # R } // right non-structural curly brace
+ #
+ # Some rules apply to token types and some to just the token
+ # itself. We solve the problem by combining type and token into a
+ # new hash key for the container types.
+ #
+ # If a rule applies to a token 'type' then we need to make rules
+ # for each of these 'type.token' combinations:
+ # Type Type.Token
+ # { {{, {[, {(
+ # [ [[
+ # ( ((
+ # L L{
+ # } }}, }], })
+ # ] ]]
+ # ) ))
+ # R R}
+ #
+ # If a rule applies to a token then we need to make rules for
+ # these 'type.token' combinations:
+ # Token Type.Token
+ # { {{, L{
+ # [ {[, [[
+ # ( {(, ((
+ # } }}, R}
+ # ] }], ]]
+ # ) }), ))
+
+ # allow long lines before final { in an if statement, as in:
+ # if (..........
+ # ..........)
+ # {
+ #
+ # Otherwise, the line before the { tends to be too short.
+
+ $binary_bond_strength{'))'}{'{{'} = VERY_WEAK + 0.03;
+ $binary_bond_strength{'(('}{'{{'} = NOMINAL;
+
+ # break on something like '} (', but keep this stronger than a ','
+ # example is in 'howe.pl'
+ $binary_bond_strength{'R}'}{'(('} = 0.8 * VERY_WEAK + 0.2 * WEAK;
+ $binary_bond_strength{'}}'}{'(('} = 0.8 * VERY_WEAK + 0.2 * WEAK;
+
+ # keep matrix and hash indices together
+ # but make them a little below STRONG to allow breaking open
+ # something like {'some-word'}{'some-very-long-word'} at the }{
+ # (bracebrk.t)
+ $binary_bond_strength{']]'}{'[['} = 0.9 * STRONG + 0.1 * NOMINAL;
+ $binary_bond_strength{']]'}{'L{'} = 0.9 * STRONG + 0.1 * NOMINAL;
+ $binary_bond_strength{'R}'}{'[['} = 0.9 * STRONG + 0.1 * NOMINAL;
+ $binary_bond_strength{'R}'}{'L{'} = 0.9 * STRONG + 0.1 * NOMINAL;
+
+ # increase strength to the point where a break in the following
+ # will be after the opening paren rather than at the arrow:
+ # $a->$b($c);
+ $binary_bond_strength{'i'}{'->'} = 1.45 * STRONG;
+
+ # Added for c140 to make 'w ->' and 'i ->' behave the same
+ $binary_bond_strength{'w'}{'->'} = 1.45 * STRONG;
+
+ # Note that the following alternative strength would make the break at
+ # the '->' rather than opening the '('. Both have advantages and
+ # disadvantages.
+ # $binary_bond_strength{'i'}{'->'} = 0.5*STRONG + 0.5 * NOMINAL; #
+
+ $binary_bond_strength{'))'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ $binary_bond_strength{']]'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ $binary_bond_strength{'})'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ $binary_bond_strength{'}]'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ $binary_bond_strength{'}}'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ $binary_bond_strength{'R}'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+
+ $binary_bond_strength{'))'}{'[['} = 0.2 * STRONG + 0.8 * NOMINAL;
+ $binary_bond_strength{'})'}{'[['} = 0.2 * STRONG + 0.8 * NOMINAL;
+ $binary_bond_strength{'))'}{'{['} = 0.2 * STRONG + 0.8 * NOMINAL;
+ $binary_bond_strength{'})'}{'{['} = 0.2 * STRONG + 0.8 * NOMINAL;
+
+ #---------------------------------------------------------------
+ # Binary NO_BREAK rules
+ #---------------------------------------------------------------
+
+ # use strict requires that bare word and => not be separated
+ $binary_bond_strength{'C'}{'=>'} = NO_BREAK;
+ $binary_bond_strength{'U'}{'=>'} = NO_BREAK;
+
+ # Never break between a bareword and a following paren because
+ # perl may give an error. For example, if a break is placed
+ # between 'to_filehandle' and its '(' the following line will
+ # give a syntax error [Carp.pm]: my( $no) =fileno(
+ # to_filehandle( $in)) ;
+ $binary_bond_strength{'C'}{'(('} = NO_BREAK;
+ $binary_bond_strength{'C'}{'{('} = NO_BREAK;
+ $binary_bond_strength{'U'}{'(('} = NO_BREAK;
+ $binary_bond_strength{'U'}{'{('} = NO_BREAK;
+
+ # use strict requires that bare word within braces not start new
+ # line
+ $binary_bond_strength{'L{'}{'w'} = NO_BREAK;
+
+ $binary_bond_strength{'w'}{'R}'} = NO_BREAK;
+
+ # The following two rules prevent a syntax error caused by breaking up
+ # a construction like '{-y}'. The '-' quotes the 'y' and prevents
+ # it from being taken as a transliteration. We have to keep
+ # token types 'L m w' together to prevent this error.
+ $binary_bond_strength{'L{'}{'m'} = NO_BREAK;
+ $binary_bond_strength_nospace{'m'}{'w'} = NO_BREAK;
+
+ # keep 'bareword-' together, but only if there is no space between
+ # the word and dash. Do not keep together if there is a space.
+ # example 'use perl6-alpha'
+ $binary_bond_strength_nospace{'w'}{'m'} = NO_BREAK;
+
+ # use strict requires that bare word and => not be separated
+ $binary_bond_strength{'w'}{'=>'} = NO_BREAK;
+
+ # use strict does not allow separating type info from trailing { }
+ # testfile is readmail.pl
+ $binary_bond_strength{'t'}{'L{'} = NO_BREAK;
+ $binary_bond_strength{'i'}{'L{'} = NO_BREAK;
+
+ # Fix for c250: set strength for new 'S' to be same as 'i'
+ # testfile is test11/Hub.pm
+ $binary_bond_strength{'S'}{'L{'} = NO_BREAK;
+
+ # As a defensive measure, do not break between a '(' and a
+ # filehandle. In some cases, this can cause an error. For
+ # example, the following program works:
+ # my $msg="hi!\n";
+ # print
+ # ( STDOUT
+ # $msg
+ # );
+ #
+ # But this program fails:
+ # my $msg="hi!\n";
+ # print
+ # (
+ # STDOUT
+ # $msg
+ # );
+ #
+ # This is normally only a problem with the 'extrude' option
+ $binary_bond_strength{'(('}{'Y'} = NO_BREAK;
+ $binary_bond_strength{'{('}{'Y'} = NO_BREAK;
+
+ # never break between sub name and opening paren
+ $binary_bond_strength{'w'}{'(('} = NO_BREAK;
+ $binary_bond_strength{'w'}{'{('} = NO_BREAK;
+
+ # keep '}' together with ';'
+ $binary_bond_strength{'}}'}{';'} = NO_BREAK;
+
+ # Breaking before a ++ can cause perl to guess wrong. For
+ # example the following line will cause a syntax error
+ # with -extrude if we break between '$i' and '++' [fixstyle2]
+ # print( ( $i++ & 1 ) ? $_ : ( $change{$_} || $_ ) );
+ $nobreak_lhs{'++'} = NO_BREAK;
+
+ # Do not break before a possible file handle
+ $nobreak_lhs{'Z'} = NO_BREAK;
+
+ # use strict hates bare words on any new line. For
+ # example, a break before the underscore here provokes the
+ # wrath of use strict:
+ # if ( -r $fn && ( -s _ || $AllowZeroFilesize)) {
+ $nobreak_rhs{'F'} = NO_BREAK;
+ $nobreak_rhs{'CORE::'} = NO_BREAK;
+
+ # To prevent the tokenizer from switching between types 'w' and 'G' we
+ # need to avoid breaking between type 'G' and the following code block
+ # brace. Fixes case b929.
+ $nobreak_rhs{G} = NO_BREAK;
+
+ #---------------------------------------------------------------
+ # Bond Strength BEGIN Section 3.
+ # Define tables and values for applying a small bias to the above
+ # values.
+ #---------------------------------------------------------------
+ # Adding a small 'bias' to strengths is a simple way to make a line
+ # break at the first of a sequence of identical terms. For
+ # example, to force long string of conditional operators to break
+ # with each line ending in a ':', we can add a small number to the
+ # bond strength of each ':' (colon.t)
+ @bias_tokens = qw( : && || f and or . ); # tokens which get bias
+ %bias_hash = map { $_ => 0 } @bias_tokens;
+ $delta_bias = 0.0001; # a very small strength level
+ return;
+
+ } ## end sub initialize_bond_strength_hashes
+
+ use constant DEBUG_BOND => 0;
+
+ sub set_bond_strengths {
+
+ my ($self) = @_;
+
+ # Define a 'bond strength' for each token pair in an output batch.
+ # See comments above for definition of bond strength.
+
+ my $rbond_strength_to_go = [];
+
+ my $rLL = $self->[_rLL_];
+ my $rK_weld_right = $self->[_rK_weld_right_];
+ my $rK_weld_left = $self->[_rK_weld_left_];
+ my $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
+
+ # patch-its always ok to break at end of line
+ $nobreak_to_go[$max_index_to_go] = 0;
+
+ # we start a new set of bias values for each line
+ %bias = %bias_hash;
+
+ my $code_bias = -.01; # bias for closing block braces
+
+ my $type = 'b';
+ my $token = SPACE;
+ my $token_length = 1;
+ my $last_type;
+ my $last_nonblank_type = $type;
+ my $last_nonblank_token = $token;
+ my $list_str = $left_bond_strength{'?'};
+
+ my ( $bond_str_1, $bond_str_2, $bond_str_3, $bond_str_4 );
+
+ my ( $block_type, $i_next, $i_next_nonblank, $next_nonblank_token,
+ $next_nonblank_type, $next_token, $next_type,
+ $total_nesting_depth );
+
+ # main loop to compute bond strengths between each pair of tokens
+ foreach my $i ( 0 .. $max_index_to_go ) {
+ $last_type = $type;
+ if ( $type ne 'b' ) {
+ $last_nonblank_type = $type;
+ $last_nonblank_token = $token;
+ }
+ $type = $types_to_go[$i];
+
+ # strength on both sides of a blank is the same
+ if ( $type eq 'b' && $last_type ne 'b' ) {
+ $rbond_strength_to_go->[$i] = $rbond_strength_to_go->[ $i - 1 ];
+ $nobreak_to_go[$i] ||= $nobreak_to_go[ $i - 1 ]; # fix for b1257
+ next;
+ }
+
+ $token = $tokens_to_go[$i];
+ $token_length = $token_lengths_to_go[$i];
+ $block_type = $block_type_to_go[$i];
+ $i_next = $i + 1;
+ $next_type = $types_to_go[$i_next];
+ $next_token = $tokens_to_go[$i_next];
+ $total_nesting_depth = $nesting_depth_to_go[$i_next];
+ $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
+ $next_nonblank_type = $types_to_go[$i_next_nonblank];
+ $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
+
+ my $seqno = $type_sequence_to_go[$i];
+ my $next_nonblank_seqno = $type_sequence_to_go[$i_next_nonblank];
+
+ # We are computing the strength of the bond between the current
+ # token and the NEXT token.
+
+ #---------------------------------------------------------------
+ # Bond Strength Section 1:
+ # First Approximation.
+ # Use minimum of individual left and right tabulated bond
+ # strengths.
+ #---------------------------------------------------------------
+ my $bsr = $right_bond_strength{$type};
+ my $bsl = $left_bond_strength{$next_nonblank_type};
+
+ # define right bond strengths of certain keywords
+ if ( $type eq 'k' ) {
+ if ( defined( $right_bond_strength{$token} ) ) {
+ $bsr = $right_bond_strength{$token};
+ }
+ }
+
+ # set terminal bond strength to the nominal value
+ # this will cause good preceding breaks to be retained
+ if ( $i_next_nonblank > $max_index_to_go ) {
+ $bsl = NOMINAL;
+
+ # But weaken the bond at a 'missing terminal comma'. If an
+ # optional comma is missing at the end of a broken list, use
+ # the strength of a comma anyway to make formatting the same as
+ # if it were there. Fixes issue c133.
+ if ( !defined($bsr) || $bsr > VERY_WEAK ) {
+ my $seqno_px = $parent_seqno_to_go[$max_index_to_go];
+ if ( $ris_list_by_seqno->{$seqno_px} ) {
+ my $KK = $K_to_go[$max_index_to_go];
+ my $Kn = $self->K_next_nonblank($KK);
+ my $seqno_n = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ if ( $seqno_n && $seqno_n eq $seqno_px ) {
+ $bsl = VERY_WEAK;
+ }
+ }
+ }
+ }
+
+ # define left bond strengths of certain keywords
+ if ( $next_nonblank_type eq 'k' ) {
+ if ( defined( $left_bond_strength{$next_nonblank_token} ) ) {
+ $bsl = $left_bond_strength{$next_nonblank_token};
+ }
+ }
+
+ # Use the minimum of the left and right strengths. Note: it might
+ # seem that we would want to keep a NO_BREAK if either token has
+ # this value. This didn't work, for example because in an arrow
+ # list, it prevents the comma from separating from the following
+ # bare word (which is probably quoted by its arrow). So necessary
+ # NO_BREAK's have to be handled as special cases in the final
+ # section.
+ if ( !defined($bsr) ) { $bsr = VERY_STRONG }
+ if ( !defined($bsl) ) { $bsl = VERY_STRONG }
+ my $bond_str = ( $bsr < $bsl ) ? $bsr : $bsl;
+ $bond_str_1 = $bond_str if (DEBUG_BOND);
+
+ #---------------------------------------------------------------
+ # Bond Strength Section 2:
+ # Apply hardwired rules..
+ #---------------------------------------------------------------
+
+ # Patch to put terminal or clauses on a new line: Weaken the bond
+ # at an || followed by die or similar keyword to make the terminal
+ # or clause fall on a new line, like this:
+ #
+ # my $class = shift
+ # || die "Cannot add broadcast: No class identifier found";
+ #
+ # Otherwise the break will be at the previous '=' since the || and
+ # = have the same starting strength and the or is biased, like
+ # this:
+ #
+ # my $class =
+ # shift || die "Cannot add broadcast: No class identifier found";
+ #
+ # In any case if the user places a break at either the = or the ||
+ # it should remain there.
+ if ( $type eq '||' || $type eq 'k' && $token eq 'or' ) {
+
+ # /^(die|confess|croak|warn)$/
+ if ( $is_die_confess_croak_warn{$next_nonblank_token} ) {
+ if ( $want_break_before{$token} && $i > 0 ) {
+ $rbond_strength_to_go->[ $i - 1 ] -= $delta_bias;
+
+ # keep bond strength of a token and its following blank
+ # the same
+ if ( $types_to_go[ $i - 1 ] eq 'b' && $i > 2 ) {
+ $rbond_strength_to_go->[ $i - 2 ] -= $delta_bias;
+ }
+ }
+ else {
+ $bond_str -= $delta_bias;
+ }
+ }
+ }
+
+ # good to break after end of code blocks
+ if ( $type eq '}' && $block_type && $next_nonblank_type ne ';' ) {
+
+ $bond_str = 0.5 * WEAK + 0.5 * VERY_WEAK + $code_bias;
+ $code_bias += $delta_bias;
+ }
+
+ if ( $type eq 'k' ) {
+
+ # allow certain control keywords to stand out
+ if ( $next_nonblank_type eq 'k'
+ && $is_last_next_redo_return{$token} )
+ {
+ $bond_str = 0.45 * WEAK + 0.55 * VERY_WEAK;
+ }
+
+ # Don't break after keyword my. This is a quick fix for a
+ # rare problem with perl. An example is this line from file
+ # Container.pm:
+
+ # foreach my $question( Debian::DebConf::ConfigDb::gettree(
+ # $this->{'question'} ) )
+
+ if ( $token eq 'my' ) {
+ $bond_str = NO_BREAK;
+ }
+
+ }
+
+ if ( $next_nonblank_type eq 'k' && $type ne 'CORE::' ) {
+
+ if ( $is_keyword_returning_list{$next_nonblank_token} ) {
+ $bond_str = $list_str if ( $bond_str > $list_str );
+ }
+
+ # keywords like 'unless', 'if', etc, within statements
+ # make good breaks
+ if ( $is_good_keyword_breakpoint{$next_nonblank_token} ) {
+ $bond_str = VERY_WEAK / 1.05;
+ }
+ }
+
+ # try not to break before a comma-arrow
+ elsif ( $next_nonblank_type eq '=>' ) {
+ if ( $bond_str < STRONG ) { $bond_str = STRONG }
+ }
+ else {
+ # no applicable hardwired change
+ }
+
+ #---------------------------------------------------------------
+ # Additional hardwired NOBREAK rules
+ #---------------------------------------------------------------
+
+ # map1.t -- correct for a quirk in perl
+ if ( $token eq '('
+ && $next_nonblank_type eq 'i'
+ && $last_nonblank_type eq 'k'
+ && $is_sort_map_grep{$last_nonblank_token} )
+
+ # /^(sort|map|grep)$/ )
+ {
+ $bond_str = NO_BREAK;
+ }
+
+ # extrude.t: do not break before paren at:
+ # -l pid_filename(
+ if ( $last_nonblank_type eq 'F' && $next_nonblank_token eq '(' ) {
+ $bond_str = NO_BREAK;
+ }
+
+ # OLD COMMENT: In older version of perl, use strict can cause
+ # problems with breaks before bare words following opening parens.
+ # For example, this will fail under older versions if a break is
+ # made between '(' and 'MAIL':
+
+ # use strict; open( MAIL, "a long filename or command"); close MAIL;
+
+ # NEW COMMENT: Third fix for b1213:
+ # This option does not seem to be needed any longer, and it can
+ # cause instabilities. It can be turned off, but to minimize
+ # changes to existing formatting it is retained only in the case
+ # where the previous token was 'open' and there was no line break.
+ # Even this could eventually be removed if it causes instability.
+ if ( $type eq '{' ) {
+
+ if ( $token eq '('
+ && $next_nonblank_type eq 'w'
+ && $last_nonblank_type eq 'k'
+ && $last_nonblank_token eq 'open'
+ && !$old_breakpoint_to_go[$i] )
+ {
+ $bond_str = NO_BREAK;
+ }
+ }
+
+ # Do not break between a possible filehandle and a ? or / and do
+ # not introduce a break after it if there is no blank
+ # (extrude.t)
+ elsif ( $type eq 'Z' ) {
+
+ # don't break..
+ if (
+
+ # if there is no blank and we do not want one. Examples:
+ # print $x++ # do not break after $x
+ # print HTML"HELLO" # break ok after HTML
+ (
+ $next_type ne 'b'
+ && defined( $want_left_space{$next_type} )
+ && $want_left_space{$next_type} == WS_NO
+ )
+
+ # or we might be followed by the start of a quote,
+ # and this is not an existing breakpoint; fixes c039.
+ || !$old_breakpoint_to_go[$i]
+ && substr( $next_nonblank_token, 0, 1 ) eq '/'
+
+ )
+ {
+ $bond_str = NO_BREAK;
+ }
+ }
+
+ # Fix for c039
+ elsif ( $type eq 'w' ) {
+ $bond_str = NO_BREAK
+ if ( !$old_breakpoint_to_go[$i]
+ && substr( $next_nonblank_token, 0, 1 ) eq '/'
+ && $next_nonblank_type ne '//' );
+ }
+ else {
+ # no hardwired rule applies
+ }
+
+ # Breaking before a ? before a quote can cause trouble if
+ # they are not separated by a blank.
+ # Example: a syntax error occurs if you break before the ? here
+ # my$logic=join$all?' && ':' || ',@regexps;
+ # From: Professional_Perl_Programming_Code/multifind.pl
+ if ( $next_nonblank_type eq '?' ) {
+ $bond_str = NO_BREAK
+ if ( $types_to_go[ $i_next_nonblank + 1 ] eq 'Q' );
+ }
+
+ # Breaking before a . followed by a number
+ # can cause trouble if there is no intervening space
+ # Example: a syntax error occurs if you break before the .2 here
+ # $str .= pack($endian.2, ensurrogate($ord));
+ # From: perl58/Unicode.pm
+ elsif ( $next_nonblank_type eq '.' ) {
+ $bond_str = NO_BREAK
+ if ( $types_to_go[ $i_next_nonblank + 1 ] eq 'n' );
+ }
+
+ # Do not break before a phantom comma because it will confuse
+ # the convergence test (STRANGE message is emitted)
+ elsif ( $next_nonblank_type eq ',' ) {
+ if ( !length($next_nonblank_token) ) {
+ $bond_str = NO_BREAK;
+ }
+ }
+ else {
+ # no special NO_BREAK rule applies
+ }
+
+ $bond_str_2 = $bond_str if (DEBUG_BOND);
+
+ #---------------------------------------------------------------
+ # End of hardwired rules
+ #---------------------------------------------------------------
+
+ #---------------------------------------------------------------
+ # Bond Strength Section 3:
+ # Apply table rules. These have priority over the above
+ # hardwired rules.
+ #---------------------------------------------------------------
+
+ my $tabulated_bond_str;
+ my $ltype = $type;
+ my $rtype = $next_nonblank_type;
+ if ( $seqno && $is_container_token{$token} ) {
+ $ltype = $type . $token;
+ }
+
+ if ( $next_nonblank_seqno
+ && $is_container_token{$next_nonblank_token} )
+ {
+ $rtype = $next_nonblank_type . $next_nonblank_token;
+
+ # Alternate Fix #1 for issue b1299. This version makes the
+ # decision as soon as possible. See Alternate Fix #2 also.
+ # Do not separate a bareword identifier from its paren: b1299
+ # This is currently needed for stability because if the bareword
+ # gets separated from a preceding '->' and following '(' then
+ # the tokenizer may switch from type 'i' to type 'w'. This
+ # patch will prevent this by keeping it adjacent to its '('.
+## if ( $next_nonblank_token eq '('
+## && $ltype eq 'i'
+## && substr( $token, 0, 1 ) =~ /^\w$/ )
+## {
+## $ltype = 'w';
+## }
+ }
+
+ # apply binary rules which apply regardless of space between tokens
+ if ( $binary_bond_strength{$ltype}{$rtype} ) {
+ $bond_str = $binary_bond_strength{$ltype}{$rtype};
+ $tabulated_bond_str = $bond_str;
+ }
+
+ # apply binary rules which apply only if no space between tokens
+ if ( $binary_bond_strength_nospace{$ltype}{$next_type} ) {
+ $bond_str = $binary_bond_strength{$ltype}{$next_type};
+ $tabulated_bond_str = $bond_str;
+ }
+
+ if ( $nobreak_rhs{$ltype} || $nobreak_lhs{$rtype} ) {
+ $bond_str = NO_BREAK;
+ $tabulated_bond_str = $bond_str;
+ }
+
+ $bond_str_3 = $bond_str if (DEBUG_BOND);
+
+ # If the hardwired rules conflict with the tabulated bond
+ # strength then there is an inconsistency that should be fixed
+ DEBUG_BOND
+ && $tabulated_bond_str
+ && $bond_str_1
+ && $bond_str_1 != $bond_str_2
+ && $bond_str_2 != $tabulated_bond_str
+ && do {
+ print {*STDOUT}
+"BOND_TABLES: ltype=$ltype rtype=$rtype $bond_str_1->$bond_str_2->$bond_str_3\n";
+ };
+
+ #-----------------------------------------------------------------
+ # Bond Strength Section 4:
+ # Modify strengths of certain tokens which often occur in sequence
+ # by adding a small bias to each one in turn so that the breaks
+ # occur from left to right.
+ #
+ # Note that we only changing strengths by small amounts here,
+ # and usually increasing, so we should not be altering any NO_BREAKs.
+ # Other routines which check for NO_BREAKs will use a tolerance
+ # of one to avoid any problem.
+ #-----------------------------------------------------------------
+
+ # The bias tables use special keys:
+ # $type - if not keyword
+ # $token - if keyword, but map some keywords together
+ my $left_key =
+ $type eq 'k' ? $token eq 'err' ? 'or' : $token : $type;
+ my $right_key =
+ $next_nonblank_type eq 'k'
+ ? $next_nonblank_token eq 'err'
+ ? 'or'
+ : $next_nonblank_token
+ : $next_nonblank_type;
+
+ # bias left token
+ if ( defined( $bias{$left_key} ) ) {
+ if ( !$want_break_before{$left_key} ) {
+ $bias{$left_key} += $delta_bias;
+ $bond_str += $bias{$left_key};
+ }
+ }
+
+ # bias right token
+ if ( defined( $bias{$right_key} ) ) {
+ if ( $want_break_before{$right_key} ) {
+
+ # for leading '.' align all but 'short' quotes; the idea
+ # is to not place something like "\n" on a single line.
+ if ( $right_key eq '.' ) {
+
+ my $is_short_quote = $last_nonblank_type eq '.'
+ && ( $token_length <=
+ $rOpts_short_concatenation_item_length )
+ && !$is_closing_token{$token};
+
+ if ( !$is_short_quote ) {
+ $bias{$right_key} += $delta_bias;
+ }
+ }
+ else {
+ $bias{$right_key} += $delta_bias;
+ }
+ $bond_str += $bias{$right_key};
+ }
+ }
+
+ $bond_str_4 = $bond_str if (DEBUG_BOND);
+
+ #---------------------------------------------------------------
+ # Bond Strength Section 5:
+ # Fifth Approximation.
+ # Take nesting depth into account by adding the nesting depth
+ # to the bond strength.
+ #---------------------------------------------------------------
+ my $strength;
+
+ if ( defined($bond_str) && !$nobreak_to_go[$i] ) {
+ if ( $total_nesting_depth > 0 ) {
+ $strength = $bond_str + $total_nesting_depth;
+ }
+ else {
+ $strength = $bond_str;
+ }
+ }
+ else {
+ $strength = NO_BREAK;
+
+ # For critical code such as lines with here targets we must
+ # be absolutely sure that we do not allow a break. So for
+ # these the nobreak flag exceeds 1 as a signal. Otherwise we
+ # can run into trouble when small tolerances are added.
+ $strength += 1
+ if ( $nobreak_to_go[$i] && $nobreak_to_go[$i] > 1 );
+ }
+
+ #---------------------------------------------------------------
+ # Bond Strength Section 6:
+ # Sixth Approximation. Welds.
+ #---------------------------------------------------------------
+
+ # Do not allow a break within welds
+ if ( $total_weld_count && $seqno ) {
+ my $KK = $K_to_go[$i];
+ if ( $rK_weld_right->{$KK} ) {
+ $strength = NO_BREAK;
+ }
+
+ # But encourage breaking after opening welded tokens
+ elsif ($rK_weld_left->{$KK}
+ && $is_opening_token{$token} )
+ {
+ $strength -= 1;
+ }
+ else {
+ # not welded left or right
+ }
+ }
+
+ # always break after side comment
+ if ( $type eq '#' ) { $strength = 0 }
+
+ $rbond_strength_to_go->[$i] = $strength;
+
+ # Fix for case c001: be sure NO_BREAK's are enforced by later
+ # routines, except at a '?' because '?' as quote delimiter is
+ # deprecated.
+ if ( $strength >= NO_BREAK && $next_nonblank_type ne '?' ) {
+ $nobreak_to_go[$i] ||= 1;
+ }
+
+ DEBUG_BOND && do {
+ my $str = substr( $token, 0, 15 );
+ $str .= SPACE x ( 16 - length($str) );
+ print {*STDOUT}
+"BOND: i=$i $str $type $next_nonblank_type depth=$total_nesting_depth strength=$bond_str_1 -> $bond_str_2 -> $bond_str_3 -> $bond_str_4 $bond_str -> $strength \n";
+
+ # reset for next pass
+ $bond_str_1 = $bond_str_2 = $bond_str_3 = $bond_str_4 = undef;
+ };
+
+ } ## end main loop
+ return $rbond_strength_to_go;
+ } ## end sub set_bond_strengths
+} ## end closure set_bond_strengths
+
+sub bad_pattern {
+ my ($pattern) = @_;
+
+ # Return true if a regex pattern has an error
+ # Note: Tokenizer.pm also has a copy of this
+ my $regex_uu = eval { qr/$pattern/ };
+ return $EVAL_ERROR;
+} ## end sub bad_pattern
+
+{ ## begin closure prepare_cuddled_block_types
+
+ my %no_cuddle;
+
+ # Add keywords here which really should not be cuddled
+ BEGIN {
+ my @q = qw( if unless for foreach while );
+ @no_cuddle{@q} = (1) x scalar(@q);
+ }
+
+ sub prepare_cuddled_block_types {
+
+ # Construct a hash needed by the cuddled-else style
+
+ my $cuddled_string = EMPTY_STRING;
+ if ( $rOpts->{'cuddled-else'} ) {
+
+ # set the default
+ $cuddled_string = 'elsif else continue catch finally'
+ unless ( $rOpts->{'cuddled-block-list-exclusive'} );
+
+ # This is the old equivalent but more complex version
+ # $cuddled_string = 'if-elsif-else unless-elsif-else -continue ';
+
+ # Add users other blocks to be cuddled
+ my $cuddled_block_list = $rOpts->{'cuddled-block-list'};
+ if ($cuddled_block_list) {
+ $cuddled_string .= SPACE . $cuddled_block_list;
+ }
+ }
+
+ # If we have a cuddled string of the form
+ # 'try-catch-finally'
+
+ # we want to prepare a hash of the form
+
+ # $rcuddled_block_types = {
+ # 'try' => {
+ # 'catch' => 1,
+ # 'finally' => 1
+ # },
+ # };
+
+ # use -dcbl to dump this hash
+
+ # Multiple such strings are input as a space or comma separated list
+
+ # If we get two lists with the same leading type, such as
+ # -cbl = "-try-catch-finally -try-catch-otherwise"
+ # then they will get merged as follows:
+ # $rcuddled_block_types = {
+ # 'try' => {
+ # 'catch' => 1,
+ # 'finally' => 2,
+ # 'otherwise' => 1,
+ # },
+ # };
+ # This will allow either type of chain to be followed.
+
+ $cuddled_string =~ s/,/ /g; # allow space or comma separated lists
+ my @cuddled_strings = split /\s+/, $cuddled_string;
+
+ $rcuddled_block_types = {};
+
+ # process each dash-separated string...
+ my $string_count = 0;
+ foreach my $string (@cuddled_strings) {
+ next unless $string;
+ my @words = split /-+/, $string; # allow multiple dashes
+
+ # we could look for and report possible errors here...
+ next if ( @words <= 0 );
+
+ # allow either '-continue' or *-continue' for arbitrary starting type
+ my $start = '*';
+
+ # a single word without dashes is a secondary block type
+ if ( @words > 1 ) {
+ $start = shift @words;
+ }
+
+ # always make an entry for the leading word. If none follow, this
+ # will still prevent a wildcard from matching this word.
+ if ( !defined( $rcuddled_block_types->{$start} ) ) {
+ $rcuddled_block_types->{$start} = {};
+ }
+
+ # The count gives the original word order in case we ever want it.
+ $string_count++;
+ my $word_count = 0;
+ foreach my $word (@words) {
+ next unless $word;
+ if ( $no_cuddle{$word} ) {
+ Warn(
+"## Ignoring keyword '$word' in -cbl; does not seem right\n"
+ );
+ next;
+ }
+ $word_count++;
+ $rcuddled_block_types->{$start}->{$word} =
+ 1; #"$string_count.$word_count";
+
+ # git#9: Remove this word from the list of desired one-line
+ # blocks
+ $want_one_line_block{$word} = 0;
+ }
+ }
+ return;
+ } ## end sub prepare_cuddled_block_types
+} ## end closure prepare_cuddled_block_types
+
+sub dump_cuddled_block_list {
+ my ($fh) = @_;
+
+ # ORIGINAL METHOD: Here is the format of the cuddled block type hash
+ # which controls this routine
+ # my $rcuddled_block_types = {
+ # 'if' => {
+ # 'else' => 1,
+ # 'elsif' => 1
+ # },
+ # 'try' => {
+ # 'catch' => 1,
+ # 'finally' => 1
+ # },
+ # };
+
+ # SIMPLIFIED METHOD: the simplified method uses a wildcard for
+ # the starting block type and puts all cuddled blocks together:
+ # my $rcuddled_block_types = {
+ # '*' => {
+ # 'else' => 1,
+ # 'elsif' => 1
+ # 'catch' => 1,
+ # 'finally' => 1
+ # },
+ # };
+
+ # Both methods work, but the simplified method has proven to be adequate and
+ # easier to manage.
+
+ my $cuddled_string = $rOpts->{'cuddled-block-list'};
+ $cuddled_string = EMPTY_STRING unless $cuddled_string;
+
+ my $flags = EMPTY_STRING;
+ $flags .= "-ce" if ( $rOpts->{'cuddled-else'} );
+ $flags .= " -cbl='$cuddled_string'";
+
+ if ( !$rOpts->{'cuddled-else'} ) {
+ $flags .= "\nNote: You must specify -ce to generate a cuddled hash";
+ }
+
+ $fh->print(<<EOM);
+------------------------------------------------------------------------
+Hash of cuddled block types prepared for a run with these parameters:
+ $flags
+------------------------------------------------------------------------
+EOM
+
+ use Data::Dumper;
+ $fh->print( Dumper($rcuddled_block_types) );
+
+ $fh->print(<<EOM);
+------------------------------------------------------------------------
+EOM
+ return;
+} ## end sub dump_cuddled_block_list
+
+sub make_static_block_comment_pattern {
+
+ # create the pattern used to identify static block comments
+ $static_block_comment_pattern = '^\s*##';
+
+ # allow the user to change it
+ if ( $rOpts->{'static-block-comment-prefix'} ) {
+ my $prefix = $rOpts->{'static-block-comment-prefix'};
+ $prefix =~ s/^\s+//;
+ my $pattern = $prefix;
+
+ # user may give leading caret to force matching left comments only
+ if ( $prefix !~ /^\^#/ ) {
+ if ( $prefix !~ /^#/ ) {
+ Die(
+"ERROR: the -sbcp prefix is '$prefix' but must begin with '#' or '^#'\n"
+ );
+ }
+ $pattern = '^\s*' . $prefix;
+ }
+ if ( bad_pattern($pattern) ) {
+ Die(
+"ERROR: the -sbc prefix '$prefix' causes the invalid regex '$pattern'\n"
+ );
+ }
+ $static_block_comment_pattern = $pattern;
+ }
+ return;
+} ## end sub make_static_block_comment_pattern
+
+sub make_format_skipping_pattern {
+ my ( $opt_name, $default ) = @_;
+ my $param = $rOpts->{$opt_name};
+ if ( !$param ) { $param = $default }
+ $param =~ s/^\s+//;
+ if ( $param !~ /^#/ ) {
+ Die("ERROR: the $opt_name parameter '$param' must begin with '#'\n");
+ }
+ my $pattern = '^' . $param . '\s';
+ if ( bad_pattern($pattern) ) {
+ Die(
+"ERROR: the $opt_name parameter '$param' causes the invalid regex '$pattern'\n"
+ );
+ }
+ return $pattern;
+} ## end sub make_format_skipping_pattern
+
+sub make_non_indenting_brace_pattern {
+
+ # Create the pattern used to identify static side comments.
+ # Note that we are ending the pattern in a \s. This will allow
+ # the pattern to be followed by a space and some text, or a newline.
+ # The pattern is used in sub 'non_indenting_braces'
+ $non_indenting_brace_pattern = '^#<<<\s';
+
+ # allow the user to change it
+ if ( $rOpts->{'non-indenting-brace-prefix'} ) {
+ my $prefix = $rOpts->{'non-indenting-brace-prefix'};
+ $prefix =~ s/^\s+//;
+ if ( $prefix !~ /^#/ ) {
+ Die("ERROR: the -nibp parameter '$prefix' must begin with '#'\n");
+ }
+ my $pattern = '^' . $prefix . '\s';
+ if ( bad_pattern($pattern) ) {
+ Die(
+"ERROR: the -nibp prefix '$prefix' causes the invalid regex '$pattern'\n"
+ );
+ }
+ $non_indenting_brace_pattern = $pattern;
+ }
+ return;
+} ## end sub make_non_indenting_brace_pattern
+
+sub make_closing_side_comment_list_pattern {
+
+ # turn any input list into a regex for recognizing selected block types
+ $closing_side_comment_list_pattern = '^\w+';
+
+ # '1' is an impossible block name
+ $closing_side_comment_exclusion_pattern = '^1';
+
+ # Need a separate flag for anonymous subs because they are the only
+ # types where the side comment might follow a ';'
+ $closing_side_comment_want_asub = 1;
+
+ my $cscl = $rOpts->{'closing-side-comment-list'};
+ if ( defined($cscl) && $cscl ) {
+ $closing_side_comment_list_pattern =
+ make_block_pattern( '-cscl', $cscl );
+ $closing_side_comment_want_asub = $cscl =~ /\basub\b/;
+ }
+
+ my $cscxl = $rOpts->{'closing-side-comment-exclusion-list'};
+ if ( defined($cscxl) && $cscxl ) {
+ $closing_side_comment_exclusion_pattern =
+ make_block_pattern( '-cscxl', $cscxl );
+ if ( $cscxl =~ /\basub\b/ ) {
+ $closing_side_comment_want_asub = 0;
+ }
+ }
+ return;
+} ## end sub make_closing_side_comment_list_pattern
+
+sub initialize_closing_side_comments {
+
+ make_closing_side_comment_prefix();
+ make_closing_side_comment_list_pattern();
+
+ # If closing side comments ARE selected, then we can safely
+ # delete old closing side comments unless closing side comment
+ # warnings are requested. This is a good idea because it will
+ # eliminate any old csc's which fall below the line count threshold.
+ # We cannot do this if warnings are turned on, though, because we
+ # might delete some text which has been added. So that must
+ # be handled when comments are created. And we cannot do this
+ # with -io because -csc will be skipped altogether.
+ if ( $rOpts->{'closing-side-comments'} ) {
+ if ( !$rOpts->{'closing-side-comment-warnings'}
+ && !$rOpts->{'indent-only'} )
+ {
+ $rOpts->{'delete-closing-side-comments'} = 1;
+ }
+ }
+
+ # If closing side comments ARE NOT selected, but warnings ARE
+ # selected and we ARE DELETING csc's, then we will pretend to be
+ # adding with a huge interval. This will force the comments to be
+ # generated for comparison with the old comments, but not added.
+ elsif ( $rOpts->{'closing-side-comment-warnings'} ) {
+ if ( $rOpts->{'delete-closing-side-comments'} ) {
+ $rOpts->{'delete-closing-side-comments'} = 0;
+ $rOpts->{'closing-side-comments'} = 1;
+ $rOpts->{'closing-side-comment-interval'} = 100_000_000;
+ }
+ }
+ else {
+ # no -csc flags
+ }
+
+ return;
+} ## end sub initialize_closing_side_comments
+
+sub initialize_missing_else_comment {
+
+ my $comment = $rOpts->{'add-missing-else-comment'};
+ if ( !$comment ) {
+ $comment = '##FIX' . 'ME - added with perltidy -ame';
+ }
+ else {
+ $comment = substr( $comment, 0, 60 );
+ $comment =~ s/^\s+//;
+ $comment =~ s/\s+$//;
+ $comment =~ s/\n/ /g;
+ if ( substr( $comment, 0, 1 ) ne '#' ) {
+ $comment = '#' . $comment;
+ }
+ }
+ $rOpts->{'add-missing-else-comment'} = $comment;
+
+ return;
+} ## end sub initialize_missing_else_comment
+
+sub make_sub_matching_pattern {
+
+ # Patterns for standardizing matches to block types for regular subs and
+ # anonymous subs. Examples
+ # 'sub process' is a named sub
+ # 'sub ::m' is a named sub
+ # 'sub' is an anonymous sub
+ # 'sub:' is a label, not a sub
+ # 'sub :' is a label, not a sub ( block type will be <sub:> )
+ # sub'_ is a named sub ( block type will be <sub '_> )
+ # 'substr' is a keyword
+ # So note that named subs always have a space after 'sub'
+ $SUB_PATTERN = '^sub\s'; # match normal sub
+ $ASUB_PATTERN = '^sub$'; # match anonymous sub
+ %matches_ASUB = ( 'sub' => 1 );
+
+ # Fix the patterns to include any sub aliases:
+ # Note that any 'sub-alias-list' has been preprocessed to
+ # be a trimmed, space-separated list which includes 'sub'
+ # for example, it might be 'sub method fun'
+ my @words;
+ my $sub_alias_list = $rOpts->{'sub-alias-list'};
+ if ($sub_alias_list) {
+ @words = split /\s+/, $sub_alias_list;
+ }
+ else {
+ push @words, 'sub';
+ }
+
+ # add 'method' unless use-feature='noclass' is set.
+ if ( !defined( $rOpts->{'use-feature'} )
+ || $rOpts->{'use-feature'} !~ /\bnoclass\b/ )
+ {
+ push @words, 'method';
+ }
+
+ # Note (see also RT #133130): These patterns are used by
+ # sub make_block_pattern, which is used for making most patterns.
+ # So this sub needs to be called before other pattern-making routines.
+ if ( @words > 1 ) {
+
+ # Two ways are provided to match an anonymous sub:
+ # $ASUB_PATTERN - with a regex (old method, slow)
+ # %matches_ASUB - with a hash lookup (new method, faster)
+
+ @matches_ASUB{@words} = (1) x scalar(@words);
+ my $alias_list = join '|', keys %matches_ASUB;
+ $SUB_PATTERN =~ s/sub/\($alias_list\)/;
+ $ASUB_PATTERN =~ s/sub/\($alias_list\)/;
+ }
+ return;
+} ## end sub make_sub_matching_pattern
+
+sub make_bl_pattern {
+
+ # Set defaults lists to retain historical default behavior for -bl:
+ my $bl_list_string = '*';
+ my $bl_exclusion_list_string = 'sort map grep eval asub';
+
+ my $bl_long_name = 'opening-brace-on-new-line';
+ my $bll_long_name = 'brace-left-list';
+ my $blxl_long_name = 'brace-left-exclusion-list';
+ my $sbl_long_name = 'opening-sub-brace-on-new-line';
+ my $asbl_long_name = 'opening-anonymous-sub-brace-on-new-line';
+
+ if ( defined( $rOpts->{$bll_long_name} ) && $rOpts->{$bll_long_name} ) {
+ $bl_list_string = $rOpts->{$bll_long_name};
+ }
+ if ( $bl_list_string =~ /\bsub\b/ ) {
+ $rOpts->{$sbl_long_name} ||= $rOpts->{$bl_long_name};
+ }
+ if ( $bl_list_string =~ /\basub\b/ ) {
+ $rOpts->{$asbl_long_name} ||= $rOpts->{$bl_long_name};
+ }
+
+ $bl_pattern = make_block_pattern( '-bll', $bl_list_string );
+
+ # for -bl, a list with '*' turns on -sbl and -asbl
+ if ( $bl_pattern =~ /\.\*/ ) {
+
+ if ( !defined( $rOpts->{$sbl_long_name} ) ) {
+ $rOpts->{$sbl_long_name} = $rOpts->{$bl_long_name};
+ }
+
+ if ( !defined( $rOpts->{$asbl_long_name} )
+ && defined( $rOpts->{$bll_long_name} ) )
+ {
+ $rOpts->{$asbl_long_name} = $rOpts->{$bl_long_name};
+ }
+ }
+
+ if ( defined( $rOpts->{$blxl_long_name} )
+ && $rOpts->{$blxl_long_name} )
+ {
+ $bl_exclusion_list_string = $rOpts->{$blxl_long_name};
+ if ( $bl_exclusion_list_string =~ /\bsub\b/ ) {
+ $rOpts->{$sbl_long_name} = 0;
+ }
+ if ( $bl_exclusion_list_string =~ /\basub\b/ ) {
+ $rOpts->{$asbl_long_name} = 0;
+ }
+ }
+
+ $bl_exclusion_pattern =
+ make_block_pattern( '-blxl', $bl_exclusion_list_string );
+ return;
+} ## end sub make_bl_pattern
+
+sub make_bli_pattern {
+
+ # Default list of block types for which -bli would apply:
+ my $bli_list_string = 'if else elsif unless while for foreach do : sub';
+ my $bli_exclusion_list_string = SPACE;
+
+ if ( defined( $rOpts->{'brace-left-and-indent-list'} )
+ && $rOpts->{'brace-left-and-indent-list'} )
+ {
+ $bli_list_string = $rOpts->{'brace-left-and-indent-list'};
+ }
+
+ $bli_pattern = make_block_pattern( '-blil', $bli_list_string );
+
+ if ( defined( $rOpts->{'brace-left-and-indent-exclusion-list'} )
+ && $rOpts->{'brace-left-and-indent-exclusion-list'} )
+ {
+ $bli_exclusion_list_string =
+ $rOpts->{'brace-left-and-indent-exclusion-list'};
+ }
+ $bli_exclusion_pattern =
+ make_block_pattern( '-blixl', $bli_exclusion_list_string );
+ return;
+} ## end sub make_bli_pattern
+
+sub make_keyword_group_list_pattern {
+
+ # Turn any input list into a regex for recognizing selected block types.
+ # Here are the defaults:
+ $keyword_group_list_pattern = '^(our|local|my|use|require|)$';
+ $keyword_group_list_comment_pattern = EMPTY_STRING;
+ if ( defined( $rOpts->{'keyword-group-blanks-list'} )
+ && $rOpts->{'keyword-group-blanks-list'} )
+ {
+ my @words = split /\s+/, $rOpts->{'keyword-group-blanks-list'};
+ my @keyword_list;
+ my @comment_list;
+ foreach my $word (@words) {
+ if ( $word eq 'BC' || $word eq 'SBC' ) {
+ push @comment_list, $word;
+ if ( $word eq 'SBC' ) { push @comment_list, 'SBCX' }
+ }
+ else {
+ push @keyword_list, $word;
+ }
+ }
+ $keyword_group_list_pattern =
+ make_block_pattern( '-kgbl', $rOpts->{'keyword-group-blanks-list'} );
+ $keyword_group_list_comment_pattern =
+ make_block_pattern( '-kgbl', join( SPACE, @comment_list ) );
+ }
+ return;
+} ## end sub make_keyword_group_list_pattern
+
+sub make_block_brace_vertical_tightness_pattern {
+
+ # Turn any input list into a regex for recognizing selected block types
+ $block_brace_vertical_tightness_pattern =
+ '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
+ if ( defined( $rOpts->{'block-brace-vertical-tightness-list'} )
+ && $rOpts->{'block-brace-vertical-tightness-list'} )
+ {
+ $block_brace_vertical_tightness_pattern =
+ make_block_pattern( '-bbvtl',
+ $rOpts->{'block-brace-vertical-tightness-list'} );
+ }
+ return;
+} ## end sub make_block_brace_vertical_tightness_pattern
+
+sub make_blank_line_pattern {
+
+ $blank_lines_before_closing_block_pattern = $SUB_PATTERN;
+ my $key = 'blank-lines-before-closing-block-list';
+ if ( defined( $rOpts->{$key} ) && $rOpts->{$key} ) {
+ $blank_lines_before_closing_block_pattern =
+ make_block_pattern( '-blbcl', $rOpts->{$key} );
+ }
+
+ $blank_lines_after_opening_block_pattern = $SUB_PATTERN;
+ $key = 'blank-lines-after-opening-block-list';
+ if ( defined( $rOpts->{$key} ) && $rOpts->{$key} ) {
+ $blank_lines_after_opening_block_pattern =
+ make_block_pattern( '-blaol', $rOpts->{$key} );
+ }
+ return;
+} ## end sub make_blank_line_pattern
+
+sub make_block_pattern {
+
+ # Given a string of block-type keywords, return a regex to match them
+ # The only tricky part is that labels are indicated with a single ':'
+ # and the 'sub' token text may have additional text after it (name of
+ # sub).
+ #
+ # Example:
+ #
+ # input string: "if else elsif unless while for foreach do : sub";
+ # pattern: '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
+
+ # Minor Update:
+ #
+ # To distinguish between anonymous subs and named subs, use 'sub' to
+ # indicate a named sub, and 'asub' to indicate an anonymous sub
+
+ my ( $abbrev, $string ) = @_;
+ my @list = split_words($string);
+ my @words = ();
+ my %seen;
+ for my $i (@list) {
+ if ( $i eq '*' ) { my $pattern = '^.*'; return $pattern }
+ next if $seen{$i};
+ $seen{$i} = 1;
+ if ( $i eq 'sub' ) {
+ }
+ elsif ( $i eq 'asub' ) {
+ }
+ elsif ( $i eq ';' ) {
+ push @words, ';';
+ }
+ elsif ( $i eq '{' ) {
+ push @words, '\{';
+ }
+ elsif ( $i eq ':' ) {
+ push @words, '\w+:';
+ }
+ elsif ( $i =~ /^\w/ ) {
+ push @words, $i;
+ }
+ else {
+ Warn("unrecognized block type $i after $abbrev, ignoring\n");
+ }
+ }
+
+ # Fix 2 for c091, prevent the pattern from matching an empty string
+ # '1 ' is an impossible block name.
+ if ( !@words ) { push @words, "1 " }
+
+ my $pattern = '(' . join( '|', @words ) . ')$';
+ my $sub_patterns = EMPTY_STRING;
+ if ( $seen{'sub'} ) {
+ $sub_patterns .= '|' . $SUB_PATTERN;
+ }
+ if ( $seen{'asub'} ) {
+ $sub_patterns .= '|' . $ASUB_PATTERN;
+ }
+ if ($sub_patterns) {
+ $pattern = '(' . $pattern . $sub_patterns . ')';
+ }
+ $pattern = '^' . $pattern;
+ return $pattern;
+} ## end sub make_block_pattern
+
+sub make_static_side_comment_pattern {
+
+ # Create the pattern used to identify static side comments
+ $static_side_comment_pattern = '^##';
+
+ # allow the user to change it
+ if ( $rOpts->{'static-side-comment-prefix'} ) {
+ my $prefix = $rOpts->{'static-side-comment-prefix'};
+ $prefix =~ s/^\s+//;
+ my $pattern = '^' . $prefix;
+ if ( bad_pattern($pattern) ) {
+ Die(
+"ERROR: the -sscp prefix '$prefix' causes the invalid regex '$pattern'\n"
+ );
+ }
+ $static_side_comment_pattern = $pattern;
+ }
+ return;
+} ## end sub make_static_side_comment_pattern
+
+sub make_closing_side_comment_prefix {
+
+ # Be sure we have a valid closing side comment prefix
+ my $csc_prefix = $rOpts->{'closing-side-comment-prefix'};
+ my $csc_prefix_pattern;
+ if ( !defined($csc_prefix) ) {
+ $csc_prefix = '## end';
+ $csc_prefix_pattern = '^##\s+end';
+ }
+ else {
+ my $test_csc_prefix = $csc_prefix;
+ if ( $test_csc_prefix !~ /^#/ ) {
+ $test_csc_prefix = '#' . $test_csc_prefix;
+ }
+
+ # make a regex to recognize the prefix
+ my $test_csc_prefix_pattern = $test_csc_prefix;
+
+ # escape any special characters
+ $test_csc_prefix_pattern =~ s/([^#\s\w])/\\$1/g;
+
+ $test_csc_prefix_pattern = '^' . $test_csc_prefix_pattern;
+
+ # allow exact number of intermediate spaces to vary
+ $test_csc_prefix_pattern =~ s/\s+/\\s\+/g;
+
+ # make sure we have a good pattern
+ # if we fail this we probably have an error in escaping
+ # characters.
+
+ if ( bad_pattern($test_csc_prefix_pattern) ) {
+
+ # shouldn't happen..must have screwed up escaping, above
+ if (DEVEL_MODE) {
+ Fault(<<EOM);
+Program Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'
+EOM
+ }
+
+ # just warn and keep going with defaults
+ Warn(
+"Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'\n"
+ );
+ Warn("Please consider using a simpler -cscp prefix\n");
+ Warn("Using default -cscp instead; please check output\n");
+ }
+ else {
+ $csc_prefix = $test_csc_prefix;
+ $csc_prefix_pattern = $test_csc_prefix_pattern;
+ }
+ }
+ $rOpts->{'closing-side-comment-prefix'} = $csc_prefix;
+ $closing_side_comment_prefix_pattern = $csc_prefix_pattern;
+ return;
+} ## end sub make_closing_side_comment_prefix
+
+##################################################
+# CODE SECTION 4: receive lines from the tokenizer
+##################################################
+
+{ ## begin closure write_line
+
+ my $nesting_depth;
+
+ # Variables used by sub check_sequence_numbers:
+ my $initial_seqno;
+ my $last_seqno;
+ my %saw_opening_seqno;
+ my %saw_closing_seqno;
+
+ # variables for the -qwaf option
+ my $in_qw_seqno;
+ my $in_qw_comma_count;
+ my $last_new_seqno;
+ my %new_seqno_from_old_seqno;
+ my $last_ending_in_quote;
+ my $added_seqno_count;
+
+ sub initialize_write_line {
+
+ $nesting_depth = undef;
+
+ $initial_seqno = undef;
+ $last_seqno = SEQ_ROOT;
+ $last_new_seqno = SEQ_ROOT;
+ %saw_opening_seqno = ();
+ %saw_closing_seqno = ();
+
+ $in_qw_seqno = 0;
+ $in_qw_comma_count = 0; # b1491
+ %new_seqno_from_old_seqno = ();
+ $last_ending_in_quote = 0;
+ $added_seqno_count = 0;
+
+ return;
+ } ## end sub initialize_write_line
+
+ sub check_sequence_numbers {
+
+ # Routine for checking sequence numbers. This only needs to be
+ # done occasionally in DEVEL_MODE to be sure everything is working
+ # correctly.
+ my ( $rtokens, $rtoken_type, $rtype_sequence, $input_line_no ) = @_;
+ my $jmax = @{$rtokens} - 1;
+ return if ( $jmax < 0 );
+ foreach my $j ( 0 .. $jmax ) {
+ my $seqno = $rtype_sequence->[$j];
+ my $token = $rtokens->[$j];
+ my $type = $rtoken_type->[$j];
+ $seqno = EMPTY_STRING unless ( defined($seqno) );
+ my $err_msg =
+"Error at j=$j, line number $input_line_no, seqno='$seqno', type='$type', tok='$token':\n";
+
+ if ( !$seqno ) {
+
+ # Sequence numbers are generated for opening tokens, so every opening
+ # token should be sequenced. Closing tokens will be unsequenced
+ # if they do not have a matching opening token.
+ if ( $is_opening_sequence_token{$token}
+ && $type ne 'q'
+ && $type ne 'Q' )
+ {
+ Fault(
+ <<EOM
+$err_msg Unexpected opening token without sequence number
+EOM
+ );
+ }
+ }
+ else {
+
+ # Save starting seqno to identify sequence method:
+ # New method starts with 2 and has continuous numbering
+ # Old method (NOT USED) starts with >2 and may have gaps
+ if ( !defined($initial_seqno) ) {
+ $initial_seqno = $seqno;
+
+ # Be sure that sequence numbers start with 2. If not,
+ # there is a programming error in the tokenizer.
+ if ( $initial_seqno != 2 ) {
+ Fault(<<EOM);
+Expecting initial sequence number of 2 but got '$initial_seqno'
+EOM
+ }
+
+ # Be sure the root sequence number is 1. This is set
+ # as a constant at the top of this module.
+ if ( SEQ_ROOT != 1 ) {
+ my $SEQ_ROOT = SEQ_ROOT;
+ Fault(<<EOM);
+The constant SEQ_ROOT has been changed from 1 to '$SEQ_ROOT'.
+EOM
+ }
+ }
+
+ if ( $is_opening_sequence_token{$token} ) {
+
+ # New method should have continuous numbering
+ if ( $initial_seqno == 2 && $seqno != $last_seqno + 1 ) {
+ Fault(
+ <<EOM
+$err_msg Unexpected opening sequence number: previous seqno=$last_seqno, but seqno= $seqno
+EOM
+ );
+ }
+ $last_seqno = $seqno;
+
+ # Numbers must be unique
+ if ( $saw_opening_seqno{$seqno} ) {
+ my $lno = $saw_opening_seqno{$seqno};
+ Fault(
+ <<EOM
+$err_msg Already saw an opening tokens at line $lno with this sequence number
+EOM
+ );
+ }
+ $saw_opening_seqno{$seqno} = $input_line_no;
+ }
+
+ # only one closing item per seqno
+ elsif ( $is_closing_sequence_token{$token} ) {
+ if ( $saw_closing_seqno{$seqno} ) {
+ my $lno = $saw_closing_seqno{$seqno};
+ Fault(
+ <<EOM
+$err_msg Already saw a closing token with this seqno at line $lno
+EOM
+ );
+ }
+ $saw_closing_seqno{$seqno} = $input_line_no;
+
+ # Every closing seqno must have an opening seqno
+ if ( !$saw_opening_seqno{$seqno} ) {
+ Fault(
+ <<EOM
+$err_msg Saw a closing token but no opening token with this seqno
+EOM
+ );
+ }
+ }
+
+ # Sequenced items must be opening or closing
+ else {
+ Fault(
+ <<EOM
+$err_msg Unexpected token type with a sequence number
+EOM
+ );
+ }
+ }
+ }
+ return;
+ } ## end sub check_sequence_numbers
+
+ # hash keys which are common to old and new line_of_tokens
+ my @common_keys;
+
+ BEGIN {
+ @common_keys = qw(
+ _curly_brace_depth _ending_in_quote
+ _guessed_indentation_level _line_number
+ _line_text _line_type
+ _paren_depth _square_bracket_depth
+ _starting_in_quote
+ );
+ }
+
+ sub write_line {
+
+ my ( $self, $line_of_tokens_input ) = @_;
+
+ # This routine receives lines one-by-one from the tokenizer and stores
+ # them in a format suitable for further processing. After the last
+ # line has been sent, the tokenizer will call sub 'finish_formatting'
+ # to do the actual formatting.
+
+ # Given:
+ # $line_of_tokens_input = hash ref of one line from the tokenizer
+
+ my $rLL = $self->[_rLL_];
+ my $line_of_tokens = {};
+
+ # copy common hash key values
+ @{$line_of_tokens}{@common_keys} =
+ @{$line_of_tokens_input}{@common_keys};
+
+ my $line_type = $line_of_tokens_input->{_line_type};
+ my $tee_output;
+
+ my $Klimit = $self->[_Klimit_];
+ my ( $Kfirst, $Klast );
+
+ # Handle line of non-code
+ if ( $line_type ne 'CODE' ) {
+ $tee_output ||= $rOpts_tee_pod
+ && substr( $line_type, 0, 3 ) eq 'POD';
+
+ $line_of_tokens->{_level_0} = 0;
+ $line_of_tokens->{_ci_level_0} = 0;
+ $line_of_tokens->{_nesting_blocks_0} = EMPTY_STRING;
+ $line_of_tokens->{_nesting_tokens_0} = EMPTY_STRING;
+ $line_of_tokens->{_ended_in_blank_token} = undef;
+ }
+
+ # Handle line of code
+ else {
+
+ my $rtokens = $line_of_tokens_input->{_rtokens};
+ my $jmax = @{$rtokens} - 1;
+
+ if ( $jmax >= 0 ) {
+
+ $Kfirst = defined($Klimit) ? $Klimit + 1 : 0;
+
+ #----------------------------
+ # get the tokens on this line
+ #----------------------------
+ $self->write_line_inner_loop( $line_of_tokens_input,
+ $line_of_tokens );
+
+ # update Klimit for added tokens
+ $Klimit = @{$rLL} - 1;
+ $Klast = $Klimit;
+
+ } ## end if ( $jmax >= 0 )
+ else {
+
+ # blank line
+ $line_of_tokens->{_level_0} = 0;
+ $line_of_tokens->{_ci_level_0} = 0;
+ $line_of_tokens->{_nesting_blocks_0} = EMPTY_STRING;
+ $line_of_tokens->{_nesting_tokens_0} = EMPTY_STRING;
+ $line_of_tokens->{_ended_in_blank_token} = undef;
+
+ }
+
+ $tee_output ||=
+ $rOpts_tee_block_comments
+ && $jmax == 0
+ && $rLL->[$Kfirst]->[_TYPE_] eq '#';
+
+ $tee_output ||=
+ $rOpts_tee_side_comments
+ && defined($Kfirst)
+ && $Klimit > $Kfirst
+ && $rLL->[$Klimit]->[_TYPE_] eq '#';
+
+ } ## end if ( $line_type eq 'CODE')
+
+ # Finish storing line variables
+ $line_of_tokens->{_rK_range} = [ $Kfirst, $Klast ];
+ $self->[_Klimit_] = $Klimit;
+ my $rlines = $self->[_rlines_];
+ push @{$rlines}, $line_of_tokens;
+
+ if ($tee_output) {
+ my $fh_tee = $self->[_fh_tee_];
+ my $line_text = $line_of_tokens_input->{_line_text};
+ $fh_tee->print($line_text) if ($fh_tee);
+ }
+
+ # We must use the old line because the qw logic may change this flag
+ $last_ending_in_quote = $line_of_tokens_input->{_ending_in_quote};
+
+ return;
+ } ## end sub write_line
+
+ sub qw_to_function {
+ my ( $self, $line_of_tokens, $is_ending_token ) = @_;
+
+ # This sub implements the -qwaf option:
+ # It is called for every type 'q' token which is part of a 'qw(' list.
+ # Essentially all of the coding for the '-qwaf' option is in this sub.
+
+ # Input parameters:
+ # $line_of_tokens = information hash for this line from the tokenizer,
+ # $is_ending_token = true if this qw does not extend to the next line
+
+ # Method:
+ # This qw token has already been pushed onto the output token stack, so
+ # we will pop it off and push on a sequence of tokens created by
+ # breaking it into an opening, a sequence of comma-separated quote
+ # items, and a closing paren. For multi-line qw quotes, there will be
+ # one call per input line until the end of the qw text is reached
+ # and processed.
+
+ # Note 1: A critical issue is to correctly generate and insert a new
+ # sequence number for the new parens into the sequence number stream.
+ # The new sequence number is the closure variable '$in_qw_seqno'. It
+ # is defined when the leading 'qw(' is seen, and is undefined when the
+ # closing ')' is output.
+
+ # Note 2: So far, no reason has been found to coordinate this logic
+ # with the logic which adds and deletes commas. We are adding trailing
+ # phantom commas here, except for a single list item, so no additional
+ # trailing comma should be added. And if a phantom trailing comma gets
+ # deleted, it should not matter because it does not get displayed.
+
+ my $rLL = $self->[_rLL_];
+ my $rSS = $self->[_rSS_];
+ my $rdepth_of_opening_seqno = $self->[_rdepth_of_opening_seqno_];
+
+ # Does this qw text spill over onto another line?
+ my $is_continued =
+ ( $is_ending_token && $line_of_tokens->{_ending_in_quote} );
+
+ my $qw_text = $rLL->[-1]->[_TOKEN_];
+ my $qw_type = $rLL->[-1]->[_TYPE_];
+ my $qw_level = $rLL->[-1]->[_LEVEL_];
+ my $qw_text_start = $qw_text;
+ my $opening = EMPTY_STRING;
+ my $closing = EMPTY_STRING;
+ my $has_opening_space;
+ my $has_closing_space;
+
+ # the new word tokens are 1 level deeper than the original 'q' token
+ my $level_words = $qw_level + 1;
+
+ if ( $qw_type ne 'q' ) {
+
+ # This should never happen because the calling sub should have just
+ # pushed a token of type 'q' onto the token list.
+ my $lno = $line_of_tokens->{_line_number};
+ Fault("$lno: expecting type 'q' but got $qw_type");
+ return;
+ }
+
+ if ( !length($qw_text) ) {
+
+ # This seems to be an empty type 'q' token. A blank line within a
+ # qw quote is marked as a blank line rather than a blank 'q' token.
+ # So this should never happen.
+ my $lno = $line_of_tokens->{_line_number};
+ DEVEL_MODE && Fault("$lno: received empty type 'q' text\n");
+ return;
+ }
+
+ # remove leading 'qw(' if we are starting a new qw
+ if ( !$in_qw_seqno ) {
+ $opening = substr( $qw_text, 0, 3 );
+ if ( $opening ne 'qw(' ) {
+
+ # Caller should have checked this before calling
+ my $lno = $line_of_tokens->{_line_number};
+ DEVEL_MODE && Fault("$lno: unexpected qw opening: $opening\n");
+ return;
+ }
+ $qw_text = substr( $qw_text, 3 );
+ $has_opening_space = $qw_text =~ s/^\s+//;
+
+ # Do not use -qwaf under high stress (b1482,b1483,b1484,b1485,1486)
+ # Note: so far all known cases of stress instability have had -naws
+ # set, so this is included for now. It may eventually need to be
+ # removed.
+ # NOTE: The update for b1491 also fixes cases b1482-6 in a
+ # more general way, so this test can be deactivated.
+ if ( 0
+ && !$rOpts_add_whitespace
+ && $level_words >= $high_stress_level )
+ {
+ return;
+ }
+ }
+
+ # Look for and remove any closing ')'
+ if ( !$is_continued ) {
+ if ( length($qw_text) > 0 && substr( $qw_text, -1, 1 ) eq ')' ) {
+ $closing = substr( $qw_text, -1, 1 );
+ $qw_text = substr( $qw_text, 0, -1 );
+ $qw_text =~ s/\s+$//;
+ $has_closing_space = $qw_text =~ s/^\s+//;
+ }
+ else {
+
+ # We are at the end of a 'qw(' quote according to the
+ # tokenizer flag '_ending_in_quote', but there is no
+ # ending ')'. The '$is_continued' flag seems to be wrong.
+ my $lno = $line_of_tokens->{_line_number};
+ Fault(<<EOM);
+qwaf inconsistency at input line $lno:
+closing token is '$closing'
+is_continued = $is_continued
+EOM
+ return;
+ }
+ }
+
+ # Get any quoted words
+ my @words;
+ if ( length($qw_text) ) {
+ @words = split /\s+/, $qw_text;
+ }
+
+ # Be sure we have something left to output
+ if ( !$opening && !$closing && !@words ) {
+ my $lno = $line_of_tokens->{_line_number};
+ DEVEL_MODE && Fault(<<EOM);
+Error parsing the following qw string at line $lno:
+$qw_text_start
+EOM
+ return;
+ }
+
+ # The combination -naws -lp can currently be unstable for multi-line qw
+ # (b1487, b1488).
+ # NOTE: this instability has been fixed by following the input
+ # whitespace within parens, but keep this code for a while in case the
+ # issue arises in the future (b1487).
+ if ( 0
+ && !$rOpts_add_whitespace
+ && $rOpts_line_up_parentheses
+ && ( !$opening || !$closing ) )
+ {
+ return;
+ }
+
+ # c414 and c424: do not join a '\' and a closing ')' like here:
+ # my @clock_chars = qw( | / - \ | / - \ );
+ if (
+ @words
+ && $closing
+ && substr( $words[-1], -1, 1 ) eq BACKSLASH
+ && (
+ !$rOpts_add_whitespace
+ || ( $tightness{')'} == 2
+ || $tightness{')'} == 1 && @words == 1 )
+ )
+ )
+ {
+ # fix by including a space after the \
+ $words[-1] .= SPACE;
+
+ # and for symmetry, before the first word if the '(' is on this line
+ if ( $opening && $rOpts_add_whitespace ) {
+ $words[0] = SPACE . $words[0];
+ }
+ }
+
+ #---------------------------------------------------------------------
+ # This is the point of no return if the transformation has not started
+ #---------------------------------------------------------------------
+
+ # pop old type q token
+ my $rtoken_q = pop @{$rLL};
+
+ # now push on the replacement tokens
+ my $comma_count = 0;
+
+ if ($opening) {
+
+ # generate a new sequence number, one greater than the previous,
+ # and update a count for synchronization with the calling sub.
+ $in_qw_seqno = ++$last_new_seqno;
+ $added_seqno_count++;
+ my $seqno = $in_qw_seqno;
+
+ # update relevant seqno hashes
+ $self->[_K_opening_container_]->{$seqno} = @{$rLL};
+ $rdepth_of_opening_seqno->[$seqno] = $nesting_depth;
+ $nesting_depth++;
+ $self->[_rI_opening_]->[$seqno] = @{$rSS};
+
+ if ( $level_words > $self->[_maximum_level_] ) {
+ my $input_line_no = $line_of_tokens->{_line_number};
+ $self->[_maximum_level_] = $level_words;
+ $self->[_maximum_level_at_line_] = $input_line_no;
+ }
+ push @{$rSS}, $seqno;
+
+ # make and push the 'qw' token
+ my $rtoken_qw = copy_token_as_type( $rtoken_q, 'U', 'qw' );
+ push @{$rLL}, $rtoken_qw;
+
+ # make and push the '(' with the new sequence number
+ my $rtoken_opening = copy_token_as_type( $rtoken_q, '{', '(' );
+ $rtoken_opening->[_TYPE_SEQUENCE_] = $seqno;
+ push @{$rLL}, $rtoken_opening;
+ }
+
+ # All words must be followed by a comma except for an intact
+ # structure with a single word, like 'qw(hello)'
+ my $commas_needed =
+ !( ( $opening || !$in_qw_comma_count ) && $closing && @words == 1 );
+
+ # Make and push each word as a type 'Q' quote followed by a phantom
+ # comma. The phantom comma is type ',' and is processed
+ # exactly like any other comma, but it has an empty string as the token
+ # text, so the line will display as a regular qw quote.
+ if (@words) {
+
+ foreach my $word (@words) {
+
+ # always space after a comma; follow input spacing after '('
+ if ( $comma_count || $has_opening_space ) {
+ my $rtoken_space =
+ copy_token_as_type( $rtoken_q, 'b', SPACE );
+ $rtoken_space->[_LEVEL_] = $level_words;
+ push @{$rLL}, $rtoken_space;
+ }
+
+ # this quoted text
+ my $rtoken_word = copy_token_as_type( $rtoken_q, 'Q', $word );
+ $rtoken_word->[_LEVEL_] = $level_words;
+ push @{$rLL}, $rtoken_word;
+
+ # Add a comma if needed. NOTE on trailing commas:
+ # - For multiple words: Trailing commas must be added.
+ # Otherwise, -atc might put a comma in a qw list.
+ # - For single words: Trailing commas are not required, and
+ # are best avoided. This is because:
+ # - atc will not add commas to a list which has no commas
+ # - This will make the single-item spacing rule work as
+ # expected.
+ # - This will reduce the chance of instability (b1491)
+ if ($commas_needed) {
+ my $rtoken_comma =
+ copy_token_as_type( $rtoken_q, ',', EMPTY_STRING );
+ $rtoken_comma->[_LEVEL_] = $level_words;
+ push @{$rLL}, $rtoken_comma;
+ $comma_count++;
+ }
+ }
+ }
+
+ # make and push closing sequenced item ')'
+ if ($closing) {
+
+ # OPTIONAL: remove a previous comma if it is the only one. This can
+ # happen if this closing paren starts a new line and there was just
+ # one word in the qw list. The reason for doing this would be
+ # to avoid possible instability, but none is currently known. b1491.
+ # This has been tested but is currently inactive because it has not
+ # been found to be necessary.
+ if ( 0
+ && !@words
+ && $in_qw_comma_count == 1
+ && $rLL->[-1]->[_TYPE_] eq ',' )
+ {
+
+ # It is simpler to convert it to a blank; otherwise it would
+ # be necessary to change the range [Kfirst,Klast] of the
+ # previous line and the current line.
+ $rLL->[-1]->[_TYPE_] = 'b';
+ }
+
+ # follow input spacing before ')'
+ if ($has_closing_space) {
+ my $rtoken_space = copy_token_as_type( $rtoken_q, 'b', SPACE );
+ $rtoken_space->[_LEVEL_] = $level_words;
+ push @{$rLL}, $rtoken_space;
+ }
+
+ my $seqno = $in_qw_seqno;
+ $self->[_K_closing_container_]->{$seqno} = @{$rLL};
+ $nesting_depth = $rdepth_of_opening_seqno->[$seqno];
+ $self->[_rI_closing_]->[$seqno] = @{$rSS};
+ push @{$rSS}, -1 * $seqno;
+
+ # make the ')'
+ my $rtoken_closing = copy_token_as_type( $rtoken_q, '}', ')' );
+ $rtoken_closing->[_TYPE_SEQUENCE_] = $in_qw_seqno;
+ push @{$rLL}, $rtoken_closing;
+
+ # all done with this qw list
+ $in_qw_seqno = 0;
+ $in_qw_comma_count = 0;
+ }
+ else {
+ $in_qw_comma_count += $comma_count;
+ }
+
+ # The '_ending_in_quote' flag for this line is no longer needed
+ if ($is_continued) { $line_of_tokens->{_ending_in_quote} = 0 }
+
+ return;
+ } ## end sub qw_to_function
+
+ sub write_line_inner_loop {
+ my ( $self, $line_of_tokens_old, $line_of_tokens ) = @_;
+
+ # Copy the tokens on one line received from the tokenizer to their new
+ # storage locations.
+
+ # Input parameters:
+ # $line_of_tokens_old = line received from tokenizer
+ # $line_of_tokens = line of tokens being formed for formatter
+
+ my $rtokens = $line_of_tokens_old->{_rtokens};
+ my $jmax = @{$rtokens} - 1;
+ if ( $jmax < 0 ) {
+
+ # safety check; shouldn't happen
+ my $lno = $line_of_tokens->{_line_number};
+ DEVEL_MODE && Fault("$lno: unexpected jmax=$jmax\n");
+ return;
+ }
+
+ my $line_index = $line_of_tokens_old->{_line_number} - 1;
+ my $rtoken_type = $line_of_tokens_old->{_rtoken_type};
+ my $rblock_type = $line_of_tokens_old->{_rblock_type};
+ my $rtype_sequence = $line_of_tokens_old->{_rtype_sequence};
+ my $rlevels = $line_of_tokens_old->{_rlevels};
+
+ my $rLL = $self->[_rLL_];
+ my $rSS = $self->[_rSS_];
+ my $rdepth_of_opening_seqno = $self->[_rdepth_of_opening_seqno_];
+
+ DEVEL_MODE
+ && check_sequence_numbers( $rtokens, $rtoken_type,
+ $rtype_sequence, $line_index + 1 );
+
+ # Find the starting nesting depth ...
+ # It must be the value of variable 'level' of the first token
+ # because the nesting depth is used as a token tag in the
+ # vertical aligner and is compared to actual levels.
+ # So vertical alignment problems will occur with any other
+ # starting value.
+ if ( !defined($nesting_depth) ) {
+ $nesting_depth = $rlevels->[0];
+ $nesting_depth = 0 if ( $nesting_depth < 0 );
+ $rdepth_of_opening_seqno->[SEQ_ROOT] = $nesting_depth - 1;
+ }
+
+ # error check for -qwaf:
+ if ($in_qw_seqno) {
+ if ( $rtoken_type->[0] ne 'q' ) {
+
+ # -qwaf is expecting another 'q' token for multiline -qw
+ # based on the {_ending_in_quote} flag from the tokenizer
+ # of the previous line, but a 'q' didn't arrive.
+ my $lno = $line_index + 1;
+ Fault(
+"$lno: -qwaf expecting qw continuation line but saw type '$rtoken_type->[0]'\n"
+ );
+ }
+ }
+
+ my $j = -1;
+
+ # NOTE: coding efficiency is critical in this loop over all tokens
+ foreach my $token ( @{$rtokens} ) {
+
+ # NOTE: Do not clip the 'level' variable yet if it is negative. We
+ # will do that later, in sub 'store_token_to_go'. The reason is
+ # that in files with level errors, the logic in 'weld_cuddled_else'
+ # uses a stack logic that will give bad welds if we clip levels
+ # here. (A recent update will probably not even allow negative
+ # levels to arrive here any longer).
+
+ my @tokary;
+
+ # Handle tokens with sequence numbers ...
+ # note the ++ increment hidden here for efficiency
+ if ( $rtype_sequence->[ ++$j ] ) {
+ my $seqno_old = $rtype_sequence->[$j];
+ my $seqno = $seqno_old;
+
+ my $sign = 1;
+ if ( $is_opening_token{$token} ) {
+ if ($added_seqno_count) {
+ $seqno += $added_seqno_count;
+ $new_seqno_from_old_seqno{$seqno_old} = $seqno;
+ }
+ if ( DEVEL_MODE && $seqno != $last_new_seqno + 1 ) {
+ my $lno = $line_index + 1;
+ Fault("$lno: seqno=$seqno last=$last_new_seqno\n");
+ }
+ $last_new_seqno = $seqno;
+ $self->[_K_opening_container_]->{$seqno} = @{$rLL};
+ $rdepth_of_opening_seqno->[$seqno] = $nesting_depth;
+ $nesting_depth++;
+
+ # Save a sequenced block type at its opening token.
+ # Note that unsequenced block types can occur in
+ # unbalanced code with errors but are ignored here.
+ if ( $rblock_type->[$j] ) {
+ my $block_type = $rblock_type->[$j];
+
+ # Store the block type with sequence number as hash key
+ $self->[_rblock_type_of_seqno_]->{$seqno} = $block_type;
+
+ # and save anynymous subs and named subs in separate
+ # hashes to avoid future pattern tests
+ if ( $matches_ASUB{$block_type} ) {
+ $self->[_ris_asub_block_]->{$seqno} = 1;
+ }
+
+ # The pre-check on space speeds up this test:
+ elsif ($block_type =~ /\s/
+ && $block_type =~ /$SUB_PATTERN/ )
+ {
+ $self->[_ris_sub_block_]->{$seqno} = 1;
+ }
+ else {
+ # not a sub type
+ }
+ }
+ }
+ elsif ( $is_closing_token{$token} ) {
+
+ if ($added_seqno_count) {
+ $seqno =
+ $new_seqno_from_old_seqno{$seqno_old} || $seqno_old;
+ }
+
+ # The opening depth should always be defined, and
+ # it should equal $nesting_depth-1. To protect
+ # against unforeseen error conditions, however, we
+ # will check this and fix things if necessary. For
+ # a test case see issue c055.
+ my $opening_depth = $rdepth_of_opening_seqno->[$seqno];
+ if ( !defined($opening_depth) ) {
+ $opening_depth = $nesting_depth - 1;
+ $opening_depth = 0 if ( $opening_depth < 0 );
+ $rdepth_of_opening_seqno->[$seqno] = $opening_depth;
+
+ # This is not fatal but should not happen. The
+ # tokenizer generates sequence numbers
+ # incrementally upon encountering each new
+ # opening token, so every positive sequence
+ # number should correspond to an opening token.
+ my $lno = $line_index + 1;
+ DEVEL_MODE && Fault(<<EOM);
+$lno: No opening token seen for closing token = '$token' at seq=$seqno at depth=$opening_depth
+EOM
+ }
+ $self->[_K_closing_container_]->{$seqno} = @{$rLL};
+ $nesting_depth = $opening_depth;
+ $sign = -1;
+ }
+ elsif ( $token eq '?' ) {
+ if ($added_seqno_count) {
+ $seqno += $added_seqno_count;
+ $new_seqno_from_old_seqno{$seqno_old} = $seqno;
+ }
+ if ( DEVEL_MODE && $seqno != $last_new_seqno + 1 ) {
+ my $lno = $line_index + 1;
+ Fault("$lno: seqno=$seqno last=$last_new_seqno\n");
+ }
+ $last_new_seqno = $seqno;
+ $self->[_K_opening_ternary_]->{$seqno} = @{$rLL};
+ }
+ elsif ( $token eq ':' ) {
+ if ($added_seqno_count) {
+ $seqno =
+ $new_seqno_from_old_seqno{$seqno_old} || $seqno_old;
+ }
+ $sign = -1;
+ $self->[_K_closing_ternary_]->{$seqno} = @{$rLL};
+ }
+
+ # The only sequenced types output by the tokenizer are
+ # the opening & closing containers and the ternary
+ # types. So we would only get here if the tokenizer has
+ # been changed to mark some other tokens with sequence
+ # numbers, or if an error has been introduced in a
+ # hash such as %is_opening_container
+ else {
+ my $lno = $line_index + 1;
+ DEVEL_MODE && Fault(<<EOM);
+$lno: Unexpected sequenced token '$token' of type '$rtoken_type->[$j]', sequence=$seqno arrived from tokenizer.
+Expecting only opening or closing container tokens or ternary tokens with sequence numbers.
+EOM
+ }
+
+ if ( $sign > 0 ) {
+ $self->[_rI_opening_]->[$seqno] = @{$rSS};
+
+ # For efficiency, we find the maximum level of
+ # opening tokens of any type. The actual maximum
+ # level will be that of their contents which is 1
+ # greater. That will be fixed in sub
+ # 'finish_formatting'.
+ my $level = $rlevels->[$j];
+ if ( $level > $self->[_maximum_level_] ) {
+ $self->[_maximum_level_] = $level;
+ $self->[_maximum_level_at_line_] = $line_index + 1;
+ }
+ }
+ else { $self->[_rI_closing_]->[$seqno] = @{$rSS} }
+ push @{$rSS}, $sign * $seqno;
+ $tokary[_TYPE_SEQUENCE_] = $seqno;
+ }
+ else {
+ $tokary[_TYPE_SEQUENCE_] = EMPTY_STRING;
+ }
+
+ # Here we are storing the first five variables per token. The
+ # remaining token variables will be added later as follows:
+ # _TOKEN_LENGTH_ is added by sub store_token
+ # _CUMULATIVE_LENGTH_ is added by sub store_token
+ # _CI_LEVEL_ is added by sub set_ci
+ # So all token variables are available for use after sub set_ci.
+
+ $tokary[_TOKEN_] = $token;
+ $tokary[_TYPE_] = $rtoken_type->[$j];
+ $tokary[_LEVEL_] = $rlevels->[$j];
+ $tokary[_LINE_INDEX_] = $line_index;
+
+ push @{$rLL}, \@tokary;
+
+ # handle -qwaf option for converting a qw quote (type = 'q') to
+ # function call
+ if (
+ $rOpts_qw_as_function
+ && $rtoken_type->[$j] eq 'q'
+ && (
+
+ # continuing in a qw?
+ $in_qw_seqno
+
+ # starting a new qw?
+ || ( ( $j > 0 || !$last_ending_in_quote )
+ && substr( $token, 0, 3 ) eq 'qw(' )
+ )
+ )
+ {
+ $self->qw_to_function( $line_of_tokens, $j == $jmax );
+ }
+
+ } ## end token loop
+
+ # Need to remember if we can trim the input line
+ $line_of_tokens->{_ended_in_blank_token} = $rtoken_type->[$jmax] eq 'b';
+
+ # Values needed by Logger if a logfile is saved:
+ if ( $self->[_save_logfile_] ) {
+ $line_of_tokens->{_level_0} = $rlevels->[0];
+ $line_of_tokens->{_ci_level_0} = 0; # fix later
+ $line_of_tokens->{_nesting_blocks_0} =
+ $line_of_tokens_old->{_nesting_blocks_0};
+ $line_of_tokens->{_nesting_tokens_0} =
+ $line_of_tokens_old->{_nesting_tokens_0};
+ }
+
+ return;
+
+ } ## end sub write_line_inner_loop
+
+} ## end closure write_line
+
+#############################################
+# CODE SECTION 5: Pre-process the entire file
+#############################################
+
+sub finish_formatting {
+
+ my ( $self, $severe_error ) = @_;
+
+ # The file has been tokenized and is ready to be formatted.
+ # All of the relevant data is stored in $self, ready to go.
+
+ # Given:
+ # $severe_error = true if a severe error was encountered
+
+ # Returns:
+ # true if input file was copied verbatim due to errors
+ # false otherwise
+
+ # Some of the code in sub break_lists is not robust enough to process code
+ # with arbitrary brace errors. The simplest fix is to just return the file
+ # verbatim if there are brace errors. This fixes issue c160.
+ $severe_error ||= get_saw_brace_error();
+
+ # Check the maximum level. If it is extremely large we will give up and
+ # output the file verbatim. Note that the actual maximum level is 1
+ # greater than the saved value, so we fix that here.
+ $self->[_maximum_level_] += 1;
+ my $maximum_level = $self->[_maximum_level_];
+ my $maximum_table_index = $#maximum_line_length_at_level;
+ if ( !$severe_error && $maximum_level >= $maximum_table_index ) {
+ $severe_error ||= 1;
+ Warn(<<EOM);
+The maximum indentation level, $maximum_level, exceeds the builtin limit of $maximum_table_index.
+Something may be wrong; formatting will be skipped.
+EOM
+ }
+
+ #----------------------------------------------------------------
+ # Output file verbatim if severe error or no formatting requested
+ #----------------------------------------------------------------
+ if ( $severe_error || $rOpts->{notidy} ) {
+ $self->dump_verbatim();
+ $self->wrapup($severe_error);
+ return 1;
+ }
+
+ {
+ my $rix_side_comments = $self->set_CODE_type();
+
+ $self->find_non_indenting_braces($rix_side_comments);
+
+ # Handle any requested side comment deletions. It is easier to get
+ # this done here rather than farther down the pipeline because IO
+ # lines take a different route, and because lines with deleted HSC
+ # become BL lines. We have already handled any tee requests in sub
+ # getline, so it is safe to delete side comments now.
+ $self->delete_side_comments($rix_side_comments)
+ if ( $rOpts_delete_side_comments
+ || $rOpts_delete_closing_side_comments );
+ }
+
+ # Verify that the line hash does not have any unknown keys.
+ $self->check_line_hashes() if (DEVEL_MODE);
+
+ $self->interbracket_arrow_check();
+
+ {
+ # Make a pass through all tokens, adding or deleting any whitespace as
+ # required. Also make any other changes, such as adding semicolons.
+ # All token changes must be made here so that the token data structure
+ # remains fixed for the rest of this iteration.
+ my ( $error, $rqw_lines ) = $self->respace_tokens();
+ if ($error) {
+ $self->dump_verbatim();
+ $self->wrapup();
+ return 1;
+ }
+
+ # sub 'set_ci' is called after sub respace to allow use of type counts
+ # Token variable _CI_LEVEL_ is only defined after this call
+ $self->set_ci();
+
+ $self->find_multiline_qw($rqw_lines);
+ }
+
+ # Dump unique hash keys
+ if ( $rOpts->{'dump-unique-keys'} ) {
+ $self->dump_unique_keys();
+ Exit(0);
+ }
+
+ # Dump any requested block summary data
+ if ( $rOpts->{'dump-block-summary'} ) {
+ $self->dump_block_summary();
+ Exit(0);
+ }
+
+ # Dump variable usage info if requested
+ if ( $rOpts->{'dump-unusual-variables'} ) {
+ $self->dump_unusual_variables();
+ Exit(0);
+ }
+
+ # Act on -warn-variable-types if requested and the logger is available
+ # (the logger is deactivated during iterations)
+ $self->warn_variable_types()
+ if ( %{$rwarn_variable_types}
+ && $self->[_logger_object_] );
+
+ if ( $rOpts->{'warn-mismatched-args'}
+ || $rOpts->{'warn-mismatched-returns'} )
+ {
+ $self->warn_mismatched()
+ if ( $self->[_logger_object_] );
+ }
+
+ if ( $rOpts->{'dump-mismatched-args'} ) {
+ $self->dump_mismatched_args();
+ Exit(0);
+ }
+
+ if ( $rOpts->{'dump-mismatched-returns'} ) {
+ $self->dump_mismatched_returns();
+ Exit(0);
+ }
+
+ if ( $rOpts->{'dump-mixed-call-parens'} ) {
+ $self->dump_mixed_call_parens();
+ Exit(0);
+ }
+
+ # Act on -want-call-parens and --nowant-call-parens requested and the
+ # logger is available (the logger is deactivated during iterations)
+ $self->scan_call_parens()
+ if ( %call_paren_style
+ && $self->[_logger_object_] );
+
+ $self->examine_vertical_tightness_flags();
+
+ $self->set_excluded_lp_containers();
+
+ $self->keep_old_line_breaks();
+
+ # Implement any welding needed for the -wn or -cb options
+ $self->weld_containers();
+
+ # Collect info needed to implement the -xlp style
+ $self->xlp_collapsed_lengths()
+ if ( $rOpts_line_up_parentheses && $rOpts_extended_line_up_parentheses );
+
+ # Locate small nested blocks which should not be broken
+ $self->mark_short_nested_blocks();
+
+ $self->special_indentation_adjustments();
+
+ # Verify that the main token array looks OK. If this ever causes a fault
+ # then place similar checks before the sub calls above to localize the
+ # problem.
+ $self->check_rLL("Before 'process_all_lines'") if (DEVEL_MODE);
+
+ # Finishes formatting and write the result to the line sink.
+ # Eventually this call should just change the 'rlines' data according to the
+ # new line breaks and then return so that we can do an internal iteration
+ # before continuing with the next stages of formatting.
+ $self->process_all_lines();
+
+ # A final routine to tie up any loose ends
+ $self->wrapup();
+ return;
+} ## end sub finish_formatting
+
+my %is_loop_type;
+
+BEGIN {
+ my @q = qw( for foreach while do until );
+ @is_loop_type{@q} = (1) x scalar(@q);
+}
+
+sub find_level_info {
+
+ my ($self) = @_;
+
+ # Find level ranges and total variations of all code blocks in this file.
+
+ # Returns:
+ # ref to hash with block info, with seqno as key (see below)
+
+ # The array _rSS_ has the complete container tree for this file.
+ my $rSS = $self->[_rSS_];
+
+ # We will be ignoring everything except code block containers
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+
+ my @stack;
+ my %level_info;
+
+ # TREE_LOOP:
+ foreach my $sseq ( @{$rSS} ) {
+ my $stack_depth = @stack;
+ my $seq_next = $sseq > 0 ? $sseq : -$sseq;
+
+ next if ( !$rblock_type_of_seqno->{$seq_next} );
+ if ( $sseq > 0 ) {
+
+ # STACK_LOOP:
+ my $item;
+ foreach my $seq (@stack) {
+ $item = $level_info{$seq};
+ if ( $item->{maximum_depth} < $stack_depth ) {
+ $item->{maximum_depth} = $stack_depth;
+ }
+ $item->{block_count}++;
+ } ## end STACK LOOP
+
+ push @stack, $seq_next;
+ my $block_type = $rblock_type_of_seqno->{$seq_next};
+
+ # If this block is a loop nested within a loop, then we
+ # will mark it as an 'inner_loop'. This is a useful
+ # complexity measure.
+ my $is_inner_loop = 0;
+ if ( $is_loop_type{$block_type} && defined($item) ) {
+ $is_inner_loop = $is_loop_type{ $item->{block_type} };
+ }
+
+ $level_info{$seq_next} = {
+ starting_depth => $stack_depth,
+ maximum_depth => $stack_depth,
+ block_count => 1,
+ block_type => $block_type,
+ is_inner_loop => $is_inner_loop,
+ };
+ }
+ else {
+ my $seq_test = pop @stack;
+
+ # error check
+ if ( $seq_test != $seq_next ) {
+
+ # Shouldn't happen - the $rSS array must have an error
+ DEVEL_MODE && Fault("stack error finding total depths\n");
+
+ %level_info = ();
+ last;
+ }
+ }
+ } ## end TREE_LOOP
+
+ return \%level_info;
+} ## end sub find_level_info
+
+sub find_loop_label {
+
+ my ( $self, $seqno ) = @_;
+
+ # Given:
+ # $seqno = sequence number of a block of code for a loop
+ # Return:
+ # $label = the loop label text, if any, or an empty string
+
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $K_opening_container = $self->[_K_opening_container_];
+
+ my $label = EMPTY_STRING;
+ my $K_opening = $K_opening_container->{$seqno};
+
+ # backup to the line with the opening paren, if any, in case the
+ # keyword is on a different line
+ my $Kp = $self->K_previous_code($K_opening);
+ return $label unless ( defined($Kp) );
+ if ( $rLL->[$Kp]->[_TOKEN_] eq ')' ) {
+ $seqno = $rLL->[$Kp]->[_TYPE_SEQUENCE_];
+ $K_opening = $K_opening_container->{$seqno};
+ }
+
+ return $label unless ( defined($K_opening) );
+ my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
+
+ # look for a label within a few lines; allow a couple of blank lines
+ foreach my $lx ( reverse( $lx_open - 3 .. $lx_open ) ) {
+ last if ( $lx < 0 );
+ my $line_of_tokens = $rlines->[$lx];
+ my $line_type = $line_of_tokens->{_line_type};
+
+ # stop search on a non-code line
+ last if ( $line_type ne 'CODE' );
+
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast_uu ) = @{$rK_range};
+
+ # skip a blank line
+ next if ( !defined($Kfirst) );
+
+ # check for a label
+ if ( $rLL->[$Kfirst]->[_TYPE_] eq 'J' ) {
+ $label = $rLL->[$Kfirst]->[_TOKEN_];
+ last;
+ }
+
+ # quit the search if we are above the starting line
+ last if ( $lx < $lx_open );
+ }
+
+ return $label;
+} ## end sub find_loop_label
+
+{ ## closure find_mccabe_count
+ my %is_mccabe_logic_keyword;
+ my %is_mccabe_logic_operator;
+
+ BEGIN {
+ my @q = (qw( && || ||= &&= ? <<= >>= ));
+ @is_mccabe_logic_operator{@q} = (1) x scalar(@q);
+
+ @q = (qw( and or xor if else elsif unless until while for foreach ));
+ @is_mccabe_logic_keyword{@q} = (1) x scalar(@q);
+ } ## end BEGIN
+
+ sub find_mccabe_count {
+ my ($self) = @_;
+
+ # Find the cumulative mccabe count to each token
+ # Return '$rmccabe_count_sum' = ref to array with cumulative
+ # mccabe count to each token $K
+
+ # NOTE: This sub currently follows the definitions in Perl::Critic
+
+ my $rmccabe_count_sum;
+ my $rLL = $self->[_rLL_];
+ my $count = 0;
+ my $Klimit = $self->[_Klimit_];
+ foreach my $KK ( 0 .. $Klimit ) {
+ $rmccabe_count_sum->{$KK} = $count;
+ my $type = $rLL->[$KK]->[_TYPE_];
+ if ( $type eq 'k' ) {
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ if ( $is_mccabe_logic_keyword{$token} ) { $count++ }
+ }
+ else {
+ if ( $is_mccabe_logic_operator{$type} ) {
+ $count++;
+ }
+ }
+ }
+ $rmccabe_count_sum->{ $Klimit + 1 } = $count;
+ return $rmccabe_count_sum;
+ } ## end sub find_mccabe_count
+} ## end closure find_mccabe_count
+
+sub find_code_line_count {
+ my ($self) = @_;
+
+ # Find the cumulative number of lines of code, excluding blanks,
+ # comments and pod.
+ # Return '$rcode_line_count' = ref to array with cumulative
+ # code line count for each input line number.
+
+ my $rcode_line_count;
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $ix_line = -1;
+ my $code_line_count = 0;
+
+ # loop over all lines
+ foreach my $line_of_tokens ( @{$rlines} ) {
+ $ix_line++;
+
+ # what type of line?
+ my $line_type = $line_of_tokens->{_line_type};
+
+ # if 'CODE' it must be non-blank and non-comment
+ if ( $line_type eq 'CODE' ) {
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast ) = @{$rK_range};
+
+ if ( defined($Kfirst) ) {
+
+ # it is non-blank
+ my $jmax = defined($Kfirst) ? $Klast - $Kfirst : -1;
+ if ( $jmax > 0 || $rLL->[$Klast]->[_TYPE_] ne '#' ) {
+
+ # ok, it is a non-comment
+ $code_line_count++;
+ }
+ }
+ }
+
+ # Count all other special line types except pod;
+ # For a list of line types see sub 'process_all_lines'
+ else {
+ if ( $line_type !~ /^POD/ ) { $code_line_count++ }
+ }
+
+ # Store the cumulative count using the input line index
+ $rcode_line_count->[$ix_line] = $code_line_count;
+ }
+ return $rcode_line_count;
+} ## end sub find_code_line_count
+
+sub find_selected_packages {
+
+ my ( $self, $rdump_block_types ) = @_;
+
+ # Returns a list of all selected package statements in a file for use
+ # in dumping block information.
+ if ( !$rdump_block_types->{'*'}
+ && !$rdump_block_types->{'package'}
+ && !$rdump_block_types->{'class'} )
+ {
+ return [];
+ }
+
+ # Find all 'package' tokens in the file
+ my $rLL = $self->[_rLL_];
+ my @K_package_list;
+ foreach my $KK ( 0 .. @{$rLL} - 1 ) {
+ next if ( $rLL->[$KK]->[_TYPE_] ne 'P' );
+ push @K_package_list, $KK;
+ }
+
+ # Get the information needed for the block dump
+ my $rpackage_lists = $self->package_info_maker( \@K_package_list );
+ my $rpackage_info_list = $rpackage_lists->{'rpackage_info_list'};
+
+ # Remove the first item in the info list, which is a dummy package main
+ shift @{$rpackage_info_list};
+
+ # Remove BLOCK format packages since they get reported as blocks separately
+ my @filtered_list = grep { !$_->{is_block} } @{$rpackage_info_list};
+
+ return \@filtered_list;
+} ## end sub find_selected_packages
+
+sub find_selected_blocks {
+
+ my ( $self, $rdump_block_types, $rlevel_info ) = @_;
+
+ # Find blocks needed for --dump-block-summary
+ # Given:
+ # $rdump_block_types = hash of user selected block types
+ # $rlevel_info = info on max depth of blocks
+ # Returns:
+ # $rslected_blocks = ref to a list of information on the selected blocks
+
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $ris_asub_block = $self->[_ris_asub_block_];
+ my $ris_sub_block = $self->[_ris_sub_block_];
+
+ my $dump_all_types = $rdump_block_types->{'*'};
+
+ my @selected_blocks;
+
+ #---------------------------------------------------
+ # BEGIN loop over all blocks to find selected blocks
+ #---------------------------------------------------
+ foreach my $seqno ( keys %{$rblock_type_of_seqno} ) {
+
+ my $type;
+ my $name = EMPTY_STRING;
+ my $block_type = $rblock_type_of_seqno->{$seqno};
+ my $K_opening = $K_opening_container->{$seqno};
+ my $K_closing = $K_closing_container->{$seqno};
+ my $level = $rLL->[$K_opening]->[_LEVEL_];
+
+ my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
+ my $line_of_tokens = $rlines->[$lx_open];
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast ) = @{$rK_range};
+ if ( !defined($Kfirst) || !defined($Klast) || $Kfirst > $K_opening ) {
+ my $line_type = $line_of_tokens->{_line_type};
+
+ # shouldn't happen
+ my $CODE_type = $line_of_tokens->{_code_type};
+ DEVEL_MODE && Fault(<<EOM);
+unexpected line_type=$line_type at line $lx_open, code type=$CODE_type
+EOM
+ next;
+ }
+
+ my ( $max_change, $block_count, $inner_loop_plus ) =
+ ( 0, 0, EMPTY_STRING );
+ my $item = $rlevel_info->{$seqno};
+ if ( defined($item) ) {
+ my $starting_depth = $item->{starting_depth};
+ my $maximum_depth = $item->{maximum_depth};
+ $block_count = $item->{block_count};
+ $max_change = $maximum_depth - $starting_depth + 1;
+
+ # this is a '+' character if this block is an inner loops
+ $inner_loop_plus = $item->{is_inner_loop} ? '+' : EMPTY_STRING;
+ }
+
+ # Skip closures unless type 'closure' is explicitly requested
+ if ( ( $block_type eq '}' || $block_type eq ';' )
+ && $rdump_block_types->{'closure'} )
+ {
+ $type = 'closure';
+ }
+
+ # Both 'sub' and 'asub' select an anonymous sub.
+ # This allows anonymous subs to be explicitly selected
+ elsif (
+ $ris_asub_block->{$seqno}
+ && ( $dump_all_types
+ || $rdump_block_types->{'sub'}
+ || $rdump_block_types->{'asub'} )
+ )
+ {
+ $type = 'asub';
+
+ # Look back to try to find some kind of name, such as
+ # my $var = sub { - var is type 'i'
+ # var => sub { - var is type 'w'
+ # -var => sub { - var is type 'w'
+ # 'var' => sub { - var is type 'Q'
+ my ( $saw_equals, $saw_fat_comma, $blank_count );
+ foreach my $KK ( reverse( $Kfirst .. $K_opening - 1 ) ) {
+ my $token_type = $rLL->[$KK]->[_TYPE_];
+ if ( $token_type eq 'b' ) { $blank_count++; next }
+ if ( $token_type eq '=>' ) { $saw_fat_comma++; next }
+ if ( $token_type eq '=' ) { $saw_equals++; next }
+ if ( $token_type eq 'i' && $saw_equals
+ || ( $token_type eq 'w' || $token_type eq 'Q' )
+ && $saw_fat_comma )
+ {
+ $name = $rLL->[$KK]->[_TOKEN_];
+ last;
+ }
+ }
+ my $rarg = { seqno => $seqno };
+ $self->count_sub_input_args($rarg);
+ my $count = $rarg->{shift_count_min};
+ if ( !defined($count) ) { $count = '*' }
+
+ $type .= '(' . $count . ')';
+ }
+ elsif ( $ris_sub_block->{$seqno}
+ && ( $dump_all_types || $rdump_block_types->{'sub'} ) )
+ {
+ $type = 'sub';
+
+ # what we want:
+ # $block_type $name
+ # 'sub setidentifier($)' => 'setidentifier'
+ # 'method setidentifier($)' => 'setidentifier'
+ my @parts = split /\s+/, $block_type;
+ $name = $parts[1];
+ $name =~ s/\(.*$//;
+
+ my $rarg = { seqno => $seqno };
+ $self->count_sub_input_args($rarg);
+ my $count = $rarg->{shift_count_min};
+ if ( !defined($count) ) { $count = '*' }
+
+ $type .= '(' . $count . ')';
+ }
+ elsif (
+ $block_type =~ /^(package|class)\b/
+ && ( $dump_all_types
+ || $rdump_block_types->{'package'}
+ || $rdump_block_types->{'class'} )
+ )
+ {
+ $type = 'class';
+ my @parts = split /\s+/, $block_type;
+ $name = $parts[1];
+ $name =~ s/\(.*$//;
+ }
+ elsif (
+ $is_loop_type{$block_type}
+ && ( $dump_all_types
+ || $rdump_block_types->{$block_type}
+ || $rdump_block_types->{ $block_type . $inner_loop_plus }
+ || $rdump_block_types->{$inner_loop_plus} )
+ )
+ {
+ $type = $block_type . $inner_loop_plus;
+ }
+ elsif ( $dump_all_types || $rdump_block_types->{$block_type} ) {
+ if ( $is_loop_type{$block_type} ) {
+ $name = $self->find_loop_label($seqno);
+ }
+ $type = $block_type;
+ }
+ else {
+ next;
+ }
+
+ push @selected_blocks,
+ {
+ K_opening => $K_opening,
+ K_closing => $K_closing,
+ line_start => $lx_open + 1,
+ name => $name,
+ type => $type,
+ level => $level,
+ max_change => $max_change,
+ block_count => $block_count,
+ };
+ } ## END loop to get info for selected blocks
+ return \@selected_blocks;
+} ## end sub find_selected_blocks
- %closing_vertical_tightness = (
- '(' => $rOpts->{'paren-vertical-tightness-closing'},
- '{' => $rOpts->{'brace-vertical-tightness-closing'},
- '[' => $rOpts->{'square-bracket-vertical-tightness-closing'},
- ')' => $rOpts->{'paren-vertical-tightness-closing'},
- '}' => $rOpts->{'brace-vertical-tightness-closing'},
- ']' => $rOpts->{'square-bracket-vertical-tightness-closing'},
- );
+sub find_if_chains {
- # assume flag for '>' same as ')' for closing qw quotes
- %closing_token_indentation = (
- ')' => $rOpts->{'closing-paren-indentation'},
- '}' => $rOpts->{'closing-brace-indentation'},
- ']' => $rOpts->{'closing-square-bracket-indentation'},
- '>' => $rOpts->{'closing-paren-indentation'},
- );
+ my ( $self, $rdump_block_types, $rlevel_info ) = @_;
- # flag indicating if any closing tokens are indented
- $some_closing_token_indentation =
- $rOpts->{'closing-paren-indentation'}
- || $rOpts->{'closing-brace-indentation'}
- || $rOpts->{'closing-square-bracket-indentation'}
- || $rOpts->{'indent-closing-brace'};
+ # Find if-chains for --dump-block-summary
- %opening_token_right = (
- '(' => $rOpts->{'opening-paren-right'},
- '{' => $rOpts->{'opening-hash-brace-right'},
- '[' => $rOpts->{'opening-square-bracket-right'},
- );
+ # Given:
+ # $rdump_block_types = ref to hash with user block type selections
+ # $rlevel_info = info on max depth of blocks
+ # Returns:
+ # $rslected_blocks = ref to a list of information on the selected blocks
- %stack_opening_token = (
- '(' => $rOpts->{'stack-opening-paren'},
- '{' => $rOpts->{'stack-opening-hash-brace'},
- '[' => $rOpts->{'stack-opening-square-bracket'},
- );
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
- %stack_closing_token = (
- ')' => $rOpts->{'stack-closing-paren'},
- '}' => $rOpts->{'stack-closing-hash-brace'},
- ']' => $rOpts->{'stack-closing-square-bracket'},
- );
- return;
-} ## end sub initialize_global_option_vars
+ # For example, 'elsif4' means all if-chains with 4 or more 'elsif's
+ my @selected_blocks;
-sub initialize_line_length_vars {
+ # See if user requested any if-chains
+ # allow 'elsif3' or 'elsif+3'
+ my @elsif_d = grep { /^elsif\+?\d+$/ } keys %{$rdump_block_types};
+ if ( !@elsif_d ) { return \@selected_blocks }
- # Create a table of maximum line length vs level for later efficient use.
- # We will make the tables very long to be sure it will not be exceeded.
- # But we have to choose a fixed length. A check will be made at the start
- # of sub 'finish_formatting' to be sure it is not exceeded. Note, some of
- # my standard test problems have indentation levels of about 150, so this
- # should be fairly large. If the choice of a maximum level ever becomes
- # an issue then these table values could be returned in a sub with a simple
- # memoization scheme.
+ # In case of multiple selections, use the minimum
+ my $elsif_count_min;
+ foreach my $word (@elsif_d) {
+ if ( $word =~ /(\d+)$/ ) {
+ my $num = $1;
+ if ( !defined($elsif_count_min) || $elsif_count_min > $num ) {
+ $elsif_count_min = $num;
+ }
+ }
+ }
- # Also create a table of the maximum spaces available for text due to the
- # level only. If a line has continuation indentation, then that space must
- # be subtracted from the table value. This table is used for preliminary
- # estimates in welding, extended_ci, BBX, and marking short blocks.
- use constant LEVEL_TABLE_MAX => 1000;
+ # Loop over blocks
+ foreach my $seqno ( keys %{$rblock_type_of_seqno} ) {
- # The basic scheme:
- foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
- my $indent = $level * $rOpts_indent_columns;
- $maximum_line_length_at_level[$level] = $rOpts_maximum_line_length;
- $maximum_text_length_at_level[$level] =
- $rOpts_maximum_line_length - $indent;
+ my $block_type = $rblock_type_of_seqno->{$seqno};
+
+ # Must be 'if' or 'unless'
+ next unless ( $block_type eq 'if' || $block_type eq 'unless' );
+
+ # Collect info for this if-chain
+ my $rif_chain =
+ $self->follow_if_chain( $seqno, $rlevel_info, $elsif_count_min );
+ next unless ($rif_chain);
+
+ push @selected_blocks, $rif_chain;
}
+ return \@selected_blocks;
+} ## end sub find_if_chains
- # Correct the maximum_text_length table if the -wc=n flag is used
- $rOpts_whitespace_cycle = $rOpts->{'whitespace-cycle'};
- if ($rOpts_whitespace_cycle) {
- if ( $rOpts_whitespace_cycle > 0 ) {
- foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
- my $level_mod = $level % $rOpts_whitespace_cycle;
- my $indent = $level_mod * $rOpts_indent_columns;
- $maximum_text_length_at_level[$level] =
- $rOpts_maximum_line_length - $indent;
+sub follow_if_chain {
+ my ( $self, $seqno_if, $rlevel_info, $elsif_count_min ) = @_;
+
+ # Follow a chain of if-elsif-elsif-...-else blocks.
+
+ # Given:
+ # $seqno_if = sequence number of an 'if' block
+ # $rlevel_info = hash of block level information
+ # $elsif_min_count = minimum number of 'elsif' blocks wanted
+ # Return:
+ # nothing if number of 'elsif' blocks is less than $elsif_count_min
+ # ref to block info hash otherwise
+
+ my $rLL = $self->[_rLL_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+
+ # Verify that seqno is an 'if' or 'unless'
+ my $block_type = $rblock_type_of_seqno->{$seqno_if};
+ if ( $block_type ne 'if' && $block_type ne 'unless' ) {
+ Fault(
+"Bad call: expecting block type 'if' or 'unless' but got '$block_type' for seqno=$seqno_if\n"
+ );
+ return;
+ }
+
+ # save sequence numbers in the chain for debugging
+ my @seqno_list;
+
+ # Loop to follow the chain
+ my $max_change = 0;
+ my $block_count = 0;
+ my $elsif_count = 0;
+
+ # we are tracing the sequence numbers of consecutive if/elsif/else blocks
+ my $seqno = $seqno_if;
+ while ($seqno) {
+ push @seqno_list, $seqno;
+
+ # Update info for this block
+ $block_type = $rblock_type_of_seqno->{$seqno};
+ if ( $block_type eq 'elsif' ) { $elsif_count++ }
+ my $item = $rlevel_info->{$seqno};
+ if ( defined($item) ) {
+ my $starting_depth = $item->{starting_depth};
+ my $maximum_depth = $item->{maximum_depth};
+ $block_count += $item->{block_count};
+ my $mxc = $maximum_depth - $starting_depth + 1;
+ if ( $mxc > $max_change ) { $max_change = $mxc }
+ }
+
+ # Chain ends if this is an 'else' block
+ last if ( $block_type eq 'else' );
+
+ # Look at the token following the closing brace
+ my $Kc = $K_closing_container->{$seqno};
+ my $K_k = $self->K_next_code($Kc);
+ last unless defined($K_k);
+ my $type_k = $rLL->[$K_k]->[_TYPE_];
+ my $token_k = $rLL->[$K_k]->[_TOKEN_];
+
+ # Chain ends unless we arrive at keyword 'elsif' or 'else'
+ last
+ unless ( $type_k eq 'k'
+ && ( $token_k eq 'elsif' || $token_k eq 'else' ) );
+
+ # Handle keyword 'else' : next token be the opening block brace
+ if ( $token_k eq 'else' ) {
+
+ # } else {
+ # ^ ^ ^
+ # Kc | |
+ # K_k Ko
+
+ my $Ko = $self->K_next_code($K_k);
+ last unless defined($Ko);
+ $seqno = $rLL->[$Ko]->[_TYPE_SEQUENCE_];
+ if ( $seqno && $rblock_type_of_seqno->{$seqno} eq 'else' ) {
+ next;
}
+
+ # Shouldn't happen unless file has an error
+ last;
}
- else {
- $rOpts_whitespace_cycle = $rOpts->{'whitespace-cycle'} = 0;
- }
- }
- # Correct the tables if the -vmll flag is used. These values override the
- # previous values.
- if ($rOpts_variable_maximum_line_length) {
- foreach my $level ( 0 .. LEVEL_TABLE_MAX ) {
- $maximum_text_length_at_level[$level] = $rOpts_maximum_line_length;
- $maximum_line_length_at_level[$level] =
- $rOpts_maximum_line_length + $level * $rOpts_indent_columns;
+ # Handle keyword 'elsif':
+
+ # } elsif ( $something ) {
+ # ^ ^ ^ ^ ^
+ # Kc | | | |
+ # K_k Kpo Kpc Ko
+
+ # hop over the elsif parens
+ my $kpo = $self->K_next_code($K_k);
+ last unless defined($kpo);
+ my $seqno_p = $rLL->[$kpo]->[_TYPE_SEQUENCE_];
+ last unless ( $seqno_p && $rLL->[$kpo]->[_TOKEN_] eq '(' );
+ my $Kpc = $K_closing_container->{$seqno_p};
+ last unless defined($Kpc);
+
+ # should be at the opening elsif brace
+ my $Ko = $self->K_next_code($Kpc);
+ last unless defined($Ko);
+ $seqno = $rLL->[$Ko]->[_TYPE_SEQUENCE_];
+ if ( $seqno && $rblock_type_of_seqno->{$seqno} eq 'elsif' ) {
+ next;
}
- }
- # Define two measures of indentation level, alpha and beta, at which some
- # formatting features come under stress and need to start shutting down.
- # Some combination of the two will be used to shut down different
- # formatting features.
- # Put a reasonable upper limit on stress level (say 100) in case the
- # whitespace-cycle variable is used.
- my $stress_level_limit = min( 100, LEVEL_TABLE_MAX );
+ # Shouldn't happen unless file has an error
+ last;
+ } ## end while ($seqno)
+
+ # check count
+ return if ( $elsif_count < $elsif_count_min );
+
+ # Store the chain
+ my $K_opening = $K_opening_container->{$seqno_if};
+ my $K_closing = $K_closing_container->{$seqno};
+ my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
+ my $level = $rLL->[$K_opening]->[_LEVEL_];
+
+ my $rchain = {
+ K_opening => $K_opening,
+ K_closing => $K_closing,
+ line_start => $lx_open + 1,
+ name => "elsif+$elsif_count",
+ type => "if-chain",
+ level => $level,
+ max_change => $max_change,
+ block_count => $block_count,
+ };
- # Find stress_level_alpha, targeted at very short maximum line lengths.
- $stress_level_alpha = $stress_level_limit + 1;
- foreach my $level_test ( 0 .. $stress_level_limit ) {
- my $max_len = $maximum_text_length_at_level[ $level_test + 1 ];
- my $excess_inside_space =
- $max_len -
- $rOpts_continuation_indentation -
- $rOpts_indent_columns - 8;
- if ( $excess_inside_space <= 0 ) {
- $stress_level_alpha = $level_test;
- last;
+ return $rchain;
+} ## end sub follow_if_chain
+
+sub dump_unique_keys {
+ my ($self) = @_;
+
+ # Implement --dump-unique-keys, -duk
+ # Dump a list of hash keys used just one time
+
+ my $rLL = $self->[_rLL_];
+ my $Klimit = $self->[_Klimit_];
+ my $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+
+ # stack holds [$seqno, $KK, $KK_last_nb]
+ my @stack;
+
+ my $KK = -1;
+ my $KK_last_nb;
+ my $KK_this_nb = 0;
+
+ my $K_end_skip = -1;
+
+ #----------------------------------------------
+ # Main loop to examine all hash keys and quotes
+ #----------------------------------------------
+ my @Q_list;
+ my @K_start_qw_list;
+ my $rwords = {};
+
+ # Table of some known keys
+ my %is_known_key = (
+ ALRM => { '$SIG' => 1 },
+ TERM => { '$SIG' => 1 },
+ INT => { '$SIG' => 1 },
+ __DIE__ => { '$SIG' => 1 },
+ __WARN__ => { '$SIG' => 1 },
+ HOME => { '$ENV' => 1 },
+ USER => { '$ENV' => 1 },
+ LOGNAME => { '$ENV' => 1 },
+ PATH => { '$ENV' => 1 },
+ SHELL => { '$ENV' => 1 },
+ PERL5LIB => { '$ENV' => 1 },
+ PERLLIB => { '$ENV' => 1 },
+ );
+
+ my $add_known_keys = sub {
+ my ( $rhash, $name ) = @_;
+ foreach my $key ( keys %{$rhash} ) {
+ if ( !defined( $is_known_key{$key} ) ) {
+ $is_known_key{$key} = { $name => 1 };
+ }
+ else {
+ $is_known_key{$key}->{$name} = 1;
+ }
}
- }
+ }; ## end $add_known_keys = sub
- # Find stress level beta, a stress level targeted at formatting
- # at deep levels near the maximum line length. We start increasing
- # from zero and stop at the first level which shows no more space.
+ # Add keys which may be unique to this environment.
+ $add_known_keys->( \%SIG, '$SIG' );
+ $add_known_keys->( \%ENV, '$ENV' );
+ $add_known_keys->( \%ERRNO, '$!' );
- # 'const' is a fixed number of spaces for a typical variable.
- # Cases b1197-b1204 work ok with const=12 but not with const=8
- my $const = 16;
- my $denom = max( 1, $rOpts_indent_columns );
- $stress_level_beta = 0;
- foreach my $level ( 0 .. $stress_level_limit ) {
- my $remaining_cycles = max(
- 0,
- (
- $maximum_text_length_at_level[$level] -
- $rOpts_continuation_indentation - $const
- ) / $denom
- );
- last if ( $remaining_cycles <= 3 ); # 2 does not work
- $stress_level_beta = $level;
- }
+ my $is_known_hash = sub {
+ my ($key) = @_;
- # This is a combined level which works well for turning off formatting
- # features in most cases:
- $high_stress_level = min( $stress_level_alpha, $stress_level_beta + 2 );
+ # Given a hash key '$key',
+ # Return:
+ # true if it is known and should be excluded
+ # false if it is not known
+
+ my $rhash_names = $is_known_key{$key};
+ return if ( !$rhash_names );
+
+ # The key is known, now see if its hash name is known
+ return if ( !@stack );
+ my $Kbrace = $stack[-1]->[1];
+ my $Khash = $stack[-1]->[2];
+ return if ( !defined($Kbrace) );
+ return if ( !defined($Khash) );
+ return if ( $rLL->[$Kbrace]->[_TYPE_] ne 'L' );
+ my $hash_name = $rLL->[$Khash]->[_TOKEN_];
+ return if ( !$rhash_names->{$hash_name} );
+ return 1;
+ }; ## end $is_known_hash = sub
- return;
-} ## end sub initialize_line_length_vars
+ my $push_KK_last_nb = sub {
-sub initialize_trailing_comma_rules {
+ # if the previous nonblank token was a hash key of type
+ # 'Q' or 'w', then update its count
- # Setup control hash for trailing commas
+ # We are ignoring constant definitions
+ if ( $KK < $K_end_skip ) { return }
- # -wtc=s defines desired trailing comma policy:
- #
- # =" " stable
- # [ both -atc and -dtc ignored ]
- # =0 : none
- # [requires -dtc; -atc ignored]
- # =1 or * : all
- # [requires -atc; -dtc ignored]
- # =m : multiline lists require trailing comma
- # if -atc set => will add missing multiline trailing commas
- # if -dtc set => will delete trailing single line commas
- # =b or 'bare' (multiline) lists require trailing comma
- # if -atc set => will add missing bare trailing commas
- # if -dtc set => will delete non-bare trailing commas
- # =h or 'hash': single column stable bare lists require trailing comma
- # if -atc set will add these
- # if -dtc set will delete other trailing commas
+ my $type_last = $rLL->[$KK_last_nb]->[_TYPE_];
+ my $token_last = $rLL->[$KK_last_nb]->[_TOKEN_];
+ my $word;
+ if ( $type_last eq 'w' ) {
+ $word = $token_last;
+ }
+ elsif ( $type_last eq 'Q' ) {
+ $word = substr( $token_last, 1, -1 );
+
+ # Ignore text with interpolated values
+ my $ch0 = substr( $token_last, 0, 1 );
+ if ( $ch0 eq '"' ) {
+ foreach my $sigil ( '$', '@' ) {
+ my $pos = index( $word, $sigil );
+ next if ( $pos < 0 );
+ return if ( $pos == 0 );
+ my $ch_test = substr( $word, $pos - 1, 1 );
+ return if ( $ch_test ne '\\' );
+ }
+ }
+
+ pop @Q_list;
+ }
+ else {
+ # not a quote - possibly identifier
+ return;
+ }
+ return unless ($word);
+
+ # Bump count of known keys by 1 so that they will not appear as unique
+ my $one = 1;
+ if ( $is_known_key{$word} && $is_known_hash->($word) ) { $one++ }
+
+ if ( !defined( $rwords->{$word} ) ) {
+ $rwords->{$word} = [ $one, $KK_last_nb ];
+ }
+ else {
+ $rwords->{$word}->[0]++;
+ }
+ return;
+ }; ## end $push_KK_last_nb = sub
- #-------------------------------------------------------------------
- # This routine must be called after the alpha and beta stress levels
- # have been defined in sub 'initialize_line_length_vars'.
- #-------------------------------------------------------------------
+ #--------------------------
+ # Main loop over all tokens
+ #--------------------------
+ while ( ++$KK <= $Klimit ) {
- %trailing_comma_rules = ();
+ my $type = $rLL->[$KK]->[_TYPE_];
+ next if ( $type eq 'b' );
+ next if ( $type eq '#' );
+ $KK_last_nb = $KK_this_nb;
+ $KK_this_nb = $KK;
+ my $seqno = $rLL->[$KK]->[_TYPE_SEQUENCE_];
+ if ($seqno) {
+ if ( $is_opening_type{$type} ) {
- my $rvalid_flags = [qw(0 1 * m b h i)];
+ if ( $type eq 'L' ) {
- my $option = $rOpts->{'want-trailing-commas'};
+ # Skip past something like ${word}
+ if ( $KK_last_nb && $rLL->[$KK_last_nb]->[_TYPE_] eq 't' ) {
+ my $Kc = $K_closing_container->{$seqno};
+ my $Kn = $self->K_next_code($KK);
+ $Kn = $self->K_next_code($Kn);
+ if ( $Kn && $Kc && $Kn == $Kc && $Kc > $K_end_skip ) {
+ $K_end_skip = $Kc;
+ }
+ }
+ }
+ push @stack, [ $seqno, $KK, $KK_last_nb ];
+ }
+ elsif ( $is_closing_type{$type} ) {
- if ($option) {
- $option =~ s/^\s+//;
- $option =~ s/\s+$//;
- }
+ if ( $type eq 'R' ) {
- # We need to use length() here because '0' is a possible option
- if ( defined($option) && length($option) ) {
- my $error_message;
- my %rule_hash;
- my @q = @{$rvalid_flags};
- my %is_valid_flag;
- @is_valid_flag{@q} = (1) x scalar(@q);
+ # require a single item within the hash braces
+ my $Ko = $K_opening_container->{$seqno};
+ my $Kn = $self->K_next_code($Ko);
+ if ( defined($Kn) && $Kn == $KK_last_nb ) {
+ $push_KK_last_nb->();
+ }
+ }
- # handle single character control, such as -wtc='b'
- if ( length($option) == 1 ) {
- foreach (qw< ) ] } >) {
- $rule_hash{$_} = [ $option, EMPTY_STRING ];
+ my $item = pop @stack;
+ if ( !$item || $item->[0] != $seqno ) {
+ if (DEVEL_MODE) {
+
+ # shouldn't happen for a balanced file
+ my $num = @stack;
+ my $got = $num ? $item->[0] : 'undef';
+ my $lno = $rLL->[$KK]->[_LINE_INDEX_];
+ Fault <<EOM;
+stack error at seqno=$seqno type=$type num=$num got seqno=$got lno=$lno
+EOM
+ }
+ }
+ }
+ else {
+ ## ternary
}
}
-
- # handle multi-character control(s), such as -wtc='[m' or -wtc='k(m'
else {
- my @parts = split /\s+/, $option;
- foreach my $part (@parts) {
- if ( length($part) >= 2 && length($part) <= 3 ) {
- my $val = substr( $part, -1, 1 );
- my $key_o = substr( $part, -2, 1 );
- if ( $is_opening_token{$key_o} ) {
- my $paren_flag = EMPTY_STRING;
- if ( length($part) == 3 ) {
- $paren_flag = substr( $part, 0, 1 );
+ if ( $type eq '=>' ) {
+ my $parent_seqno = $self->parent_seqno_by_K($KK);
+ if ( $parent_seqno && $ris_list_by_seqno->{$parent_seqno} ) {
+ $push_KK_last_nb->();
+ }
+ }
+ elsif ( $type eq 'Q' ) {
+ push @Q_list, $KK;
+ }
+ elsif ( $type eq 'q' ) {
+ if ( !defined($KK_last_nb)
+ || $rLL->[$KK_last_nb]->[_TYPE_] ne 'q' )
+ {
+ push @K_start_qw_list, $KK;
+ }
+ }
+ elsif ( $type eq 'k' ) {
+
+ # Look for 'use constant' and define its ending token
+ if ( $rLL->[$KK]->[_TOKEN_] eq 'use' ) {
+ my $Kn = $self->K_next_code($KK);
+ next if ( !defined($Kn) );
+ next if ( $rLL->[$Kn]->[_TOKEN_] ne 'constant' );
+ $Kn = $self->K_next_code($Kn);
+ next if ( !defined($Kn) );
+ my $seqno_n = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ if ($seqno_n) {
+
+ # skip a block of constant definitions
+ my $token_n = $rLL->[$Kn]->[_TOKEN_];
+ if ( $token_n eq '{' ) {
+ $K_end_skip = $K_closing_container->{$seqno_n};
+ }
+ else {
+ ## unexpected format, skip
}
- my $key = $matching_token{$key_o};
- $rule_hash{$key} = [ $val, $paren_flag ];
}
else {
- $error_message .= "Unrecognized term: '$part'\n";
+
+ # skip a single constant definition
+ $K_end_skip = $Kn + 1;
}
}
- else {
- $error_message .= "Unrecognized term: '$part'\n";
- }
+ }
+ else {
+ # continue search
}
}
+ } ## end while ( ++$KK <= $Klimit )
- # check for valid control characters
- if ( !$error_message ) {
- foreach my $key ( keys %rule_hash ) {
- my $item = $rule_hash{$key};
- my ( $val, $paren_flag ) = @{$item};
- if ( $val && !$is_valid_flag{$val} ) {
- my $valid_str = join( SPACE, @{$rvalid_flags} );
- $error_message .=
- "Unexpected value '$val'; must be one of: $valid_str\n";
- last;
- }
- if ($paren_flag) {
- if ( $paren_flag !~ /^[kKfFwW]$/ ) {
- $error_message .=
-"Unexpected paren flag '$paren_flag'; must be one of: k K f F w W\n";
- last;
- }
- if ( $key ne ')' ) {
- $error_message .=
-"paren flag '$paren_flag' is only allowed before a '('\n";
- last;
- }
+ # find hash keys seen just one time
+ my %unique_words;
+ foreach my $key ( keys %{$rwords} ) {
+ my ( $count, $K ) = @{ $rwords->{$key} };
+ next if ( $count != 1 );
+ $unique_words{$key} = $K;
+ }
+
+ return if ( !%unique_words );
+
+ # check each unique word against the list of type Q tokens
+ if (@Q_list) {
+ my $imax = $#Q_list;
+ foreach my $i ( 0 .. $imax ) {
+
+ # Ignore multiline quotes
+ my $K = $Q_list[$i];
+ if ( ( $i == 0 || $Q_list[ $i - 1 ] + 1 != $K )
+ && ( $i == $imax || $Q_list[ $i + 1 ] != $K + 1 ) )
+ {
+
+ # remove quotes
+ my $word = substr( $rLL->[$K]->[_TOKEN_], 1, -1 );
+
+ if ( $unique_words{$word} ) {
+ delete $unique_words{$word};
}
}
}
+ }
- if ($error_message) {
- Warn(<<EOM);
-Error parsing --want-trailing-commas='$option':
-$error_message
-EOM
- }
+ return if ( !%unique_words );
- # Set the control hash if no errors
- else {
- %trailing_comma_rules = %rule_hash;
+ # Remove any keys which are also in a qw list
+ foreach my $Kqw (@K_start_qw_list) {
+ my ( $K_last_q_uu, $rlist ) = $self->get_qw_list($Kqw);
+ foreach my $word ( @{$rlist} ) {
+ if ( $unique_words{$word} ) {
+ delete $unique_words{$word};
+ }
}
}
- # Both adding and deleting commas can lead to instability in extreme cases
- if ( $rOpts_add_trailing_commas && $rOpts_delete_trailing_commas ) {
-
- # If the possible instability is significant, then we can turn off
- # -dtc as a defensive measure to prevent it.
+ return if ( !%unique_words );
- # We must turn off -dtc for very small values of --whitespace-cycle
- # to avoid instability. A minimum value of -wc=3 fixes b1393, but a
- # value of 4 is used here for safety. This parameter is seldom used,
- # and much larger than this when used, so the cutoff value is not
- # critical.
- if ( $rOpts_whitespace_cycle && $rOpts_whitespace_cycle <= 4 ) {
- $rOpts_delete_trailing_commas = 0;
- }
+ # report unique words
+ my $output_string = EMPTY_STRING;
+ my @list;
+ foreach my $word ( keys %unique_words ) {
+ my $K = $unique_words{$word};
+ my $lno = $rLL->[$K]->[_LINE_INDEX_] + 1;
+ push @list, [ $word, $lno ];
+ }
+ @list = sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } @list;
+ foreach my $item (@list) {
+ my ( $word, $lno ) = @{$item};
+ $output_string .= "$word,$lno\n";
+ }
+ if ($output_string) {
+ my $input_stream_name = get_input_stream_name();
+ chomp $output_string;
+ print {*STDOUT} <<EOM;
+==> $input_stream_name <==
+$output_string
+EOM
}
return;
-} ## end sub initialize_trailing_comma_rules
-sub initialize_whitespace_hashes {
+} ## end sub dump_unique_keys
- # This is called once before formatting begins to initialize these global
- # hashes, which control the use of whitespace around tokens:
- #
- # %binary_ws_rules
- # %want_left_space
- # %want_right_space
- # %space_after_keyword
- #
- # Many token types are identical to the tokens themselves.
- # See the tokenizer for a complete list. Here are some special types:
- # k = perl keyword
- # f = semicolon in for statement
- # m = unary minus
- # p = unary plus
- # Note that :: is excluded since it should be contained in an identifier
- # Note that '->' is excluded because it never gets space
- # parentheses and brackets are excluded since they are handled specially
- # curly braces are included but may be overridden by logic, such as
- # newline logic.
+sub dump_block_summary {
+ my ($self) = @_;
- # NEW_TOKENS: create a whitespace rule here. This can be as
- # simple as adding your new letter to @spaces_both_sides, for
- # example.
+ # Dump information about selected code blocks to STDOUT
+ # This sub is called when
+ # --dump-block-summary (-dbs) is set.
- my @spaces_both_sides = qw#
- + - * / % ? = . : x < > | & ^ .. << >> ** && .. || // => += -=
- .= %= x= &= |= ^= *= <> <= >= == =~ !~ /= != ... <<= >>= ~~ !~~
- &&= ||= //= <=> A k f w F n C Y U G v
- #;
+ # The following controls are available:
+ # --dump-block-types=s (-dbt=s), where s is a list of block types
+ # (if else elsif for foreach while do ... sub) ; default is 'sub'
+ # --dump-block-minimum-lines=n (-dbml=n), where n is the minimum
+ # number of lines for a block to be included; default is 20.
- my @spaces_left_side = qw<
- t ! ~ m p { \ h pp mm Z j
- >;
- push( @spaces_left_side, '#' ); # avoids warning message
+ my $rOpts_dump_block_types = $rOpts->{'dump-block-types'};
+ if ( !defined($rOpts_dump_block_types) ) { $rOpts_dump_block_types = 'sub' }
+ $rOpts_dump_block_types =~ s/^\s+//;
+ $rOpts_dump_block_types =~ s/\s+$//;
+ my @list = split /\s+/, $rOpts_dump_block_types;
+ my %dump_block_types;
+ @dump_block_types{@list} = (1) x scalar(@list);
- my @spaces_right_side = qw<
- ; } ) ] R J ++ -- **=
- >;
- push( @spaces_right_side, ',' ); # avoids warning message
+ # Get level variation info for code blocks
+ my $rlevel_info = $self->find_level_info();
- %want_left_space = ();
- %want_right_space = ();
- %binary_ws_rules = ();
+ # Get block info
+ my $rselected_blocks =
+ $self->find_selected_blocks( \%dump_block_types, $rlevel_info );
- # Note that we setting defaults here. Later in processing
- # the values of %want_left_space and %want_right_space
- # may be overridden by any user settings specified by the
- # -wls and -wrs parameters. However the binary_whitespace_rules
- # are hardwired and have priority.
- @want_left_space{@spaces_both_sides} =
- (1) x scalar(@spaces_both_sides);
- @want_right_space{@spaces_both_sides} =
- (1) x scalar(@spaces_both_sides);
- @want_left_space{@spaces_left_side} =
- (1) x scalar(@spaces_left_side);
- @want_right_space{@spaces_left_side} =
- (-1) x scalar(@spaces_left_side);
- @want_left_space{@spaces_right_side} =
- (-1) x scalar(@spaces_right_side);
- @want_right_space{@spaces_right_side} =
- (1) x scalar(@spaces_right_side);
- $want_left_space{'->'} = WS_NO;
- $want_right_space{'->'} = WS_NO;
- $want_left_space{'**'} = WS_NO;
- $want_right_space{'**'} = WS_NO;
- $want_right_space{'CORE::'} = WS_NO;
+ # Get if-chains
+ my $rselected_if_chains =
+ $self->find_if_chains( \%dump_block_types, $rlevel_info );
- # These binary_ws_rules are hardwired and have priority over the above
- # settings. It would be nice to allow adjustment by the user,
- # but it would be complicated to specify.
- #
- # hash type information must stay tightly bound
- # as in : ${xxxx}
- $binary_ws_rules{'i'}{'L'} = WS_NO;
- $binary_ws_rules{'i'}{'{'} = WS_YES;
- $binary_ws_rules{'k'}{'{'} = WS_YES;
- $binary_ws_rules{'U'}{'{'} = WS_YES;
- $binary_ws_rules{'i'}{'['} = WS_NO;
- $binary_ws_rules{'R'}{'L'} = WS_NO;
- $binary_ws_rules{'R'}{'{'} = WS_NO;
- $binary_ws_rules{'t'}{'L'} = WS_NO;
- $binary_ws_rules{'t'}{'{'} = WS_NO;
- $binary_ws_rules{'t'}{'='} = WS_OPTIONAL; # for signatures; fixes b1123
- $binary_ws_rules{'}'}{'L'} = WS_NO;
- $binary_ws_rules{'}'}{'{'} = WS_OPTIONAL; # RT#129850; was WS_NO
- $binary_ws_rules{'$'}{'L'} = WS_NO;
- $binary_ws_rules{'$'}{'{'} = WS_NO;
- $binary_ws_rules{'@'}{'L'} = WS_NO;
- $binary_ws_rules{'@'}{'{'} = WS_NO;
- $binary_ws_rules{'='}{'L'} = WS_YES;
- $binary_ws_rules{'J'}{'J'} = WS_YES;
+ # Get package info
+ my $rpackages = $self->find_selected_packages( \%dump_block_types );
- # the following includes ') {'
- # as in : if ( xxx ) { yyy }
- $binary_ws_rules{']'}{'L'} = WS_NO;
- $binary_ws_rules{']'}{'{'} = WS_NO;
- $binary_ws_rules{')'}{'{'} = WS_YES;
- $binary_ws_rules{')'}{'['} = WS_NO;
- $binary_ws_rules{']'}{'['} = WS_NO;
- $binary_ws_rules{']'}{'{'} = WS_NO;
- $binary_ws_rules{'}'}{'['} = WS_NO;
- $binary_ws_rules{'R'}{'['} = WS_NO;
+ # merge
+ my @all_blocks =
+ ( @{$rselected_blocks}, @{$rselected_if_chains}, @{$rpackages} );
+
+ return unless (@all_blocks);
+
+ my $input_stream_name = get_input_stream_name();
+
+ # Get code line count
+ my $rcode_line_count = $self->find_code_line_count();
+
+ # Get mccabe count
+ my $rmccabe_count_sum = $self->find_mccabe_count();
+
+ my $rOpts_dump_block_minimum_lines = $rOpts->{'dump-block-minimum-lines'};
+ if ( !defined($rOpts_dump_block_minimum_lines) ) {
+ $rOpts_dump_block_minimum_lines = 20;
+ }
+
+ my $rLL = $self->[_rLL_];
+
+ # add various counts, filter and print to STDOUT
+ my $routput_lines = [];
+ foreach my $item (@all_blocks) {
+
+ my $K_opening = $item->{K_opening};
+ my $K_closing = $item->{K_closing};
+
+ # define total number of lines
+ my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
+ my $lx_close = $rLL->[$K_closing]->[_LINE_INDEX_];
+ my $line_count = $lx_close - $lx_open + 1;
+
+ # define total number of lines of code excluding blanks, comments, pod
+ my $code_lines_open = $rcode_line_count->[$lx_open];
+ my $code_lines_close = $rcode_line_count->[$lx_close];
+ my $code_lines = 0;
+ if ( defined($code_lines_open) && defined($code_lines_close) ) {
+ $code_lines = $code_lines_close - $code_lines_open + 1;
+ }
+
+ # filter out blocks below the selected code line limit
+ if ( $code_lines < $rOpts_dump_block_minimum_lines ) {
+ next;
+ }
+
+ # add mccabe_count for this block
+ my $mccabe_closing = $rmccabe_count_sum->{ $K_closing + 1 };
+ my $mccabe_opening = $rmccabe_count_sum->{$K_opening};
+ my $mccabe_count = 1; # add 1 to match Perl::Critic
+ if ( defined($mccabe_opening) && defined($mccabe_closing) ) {
+ $mccabe_count += $mccabe_closing - $mccabe_opening;
+ }
- $binary_ws_rules{']'}{'++'} = WS_NO;
- $binary_ws_rules{']'}{'--'} = WS_NO;
- $binary_ws_rules{')'}{'++'} = WS_NO;
- $binary_ws_rules{')'}{'--'} = WS_NO;
+ # Store the final set of print variables
+ # Note: K_opening is added for sorting but deleted before printing
+ push @{$routput_lines}, [
- $binary_ws_rules{'R'}{'++'} = WS_NO;
- $binary_ws_rules{'R'}{'--'} = WS_NO;
+ $input_stream_name,
+ $item->{line_start},
+ $line_count,
+ $code_lines,
+ $item->{type},
+ $item->{name},
+ $item->{level},
+ $item->{max_change},
+ $item->{block_count},
+ $mccabe_count,
+ $K_opening,
- $binary_ws_rules{'i'}{'Q'} = WS_YES;
- $binary_ws_rules{'n'}{'('} = WS_YES; # occurs in 'use package n ()'
+ ];
+ }
- $binary_ws_rules{'i'}{'('} = WS_NO;
+ return unless @{$routput_lines};
- $binary_ws_rules{'w'}{'('} = WS_NO;
- $binary_ws_rules{'w'}{'{'} = WS_YES;
- return;
+ # Sort blocks and packages on starting line number
+ my @sorted_lines = sort { $a->[-1] <=> $b->[-1] } @{$routput_lines};
-} ## end sub initialize_whitespace_hashes
+ print {*STDOUT}
+"file,line,line_count,code_lines,type,name,level,max_change,block_count,mccabe_count\n";
-{ #<<< begin closure set_whitespace_flags
+ foreach my $rline_vars (@sorted_lines) {
-my %is_special_ws_type;
-my %is_wCUG;
-my %is_wi;
+ # remove K_opening which was added for stable sorting
+ pop @{$rline_vars};
+ my $line = join( ",", @{$rline_vars} ) . "\n";
+ print {*STDOUT} $line;
+ }
+ return;
+} ## end sub dump_block_summary
-BEGIN {
+sub set_ci {
- # The following hash is used to skip over needless if tests.
- # Be sure to update it when adding new checks in its block.
- my @q = qw(k w C m - Q);
- push @q, '#';
- @is_special_ws_type{@q} = (1) x scalar(@q);
+ my ($self) = @_;
- # These hashes replace slower regex tests
- @q = qw( w C U G );
- @is_wCUG{@q} = (1) x scalar(@q);
+ # Set the basic continuation indentation (ci) for all tokens.
+ # This is a replacement for the values previously computed in
+ # sub Perl::Tidy::Tokenizer::tokenizer_wrapup. In most cases it
+ # produces identical results, but in a few cases it is an improvement.
+
+ use constant DEBUG_SET_CI => 0;
+
+ # This turns on an optional piece of logic which makes the new and
+ # old computations of ci agree. It has almost no effect on actual
+ # programs but is useful for testing.
+ use constant SET_CI_OPTION_0 => 1;
+
+ # This is slightly different from the hash in in break_lists
+ # with a similar name (removed '?' and ':' to fix t007 and others)
+ my %is_logical_container_for_ci;
+ my @q = qw# if elsif unless while and or err not && | || ! #;
+ @is_logical_container_for_ci{@q} = (1) x scalar(@q);
+
+ # This is slightly different from a tokenizer hash with a similar name:
+ my %is_container_label_type_for_ci;
+ @q = qw# k && | || ? : ! #;
+ @is_container_label_type_for_ci{@q} = (1) x scalar(@q);
+
+ # Undo ci of closing list paren followed by these binary operators:
+ # - initially defined for issue t027, then
+ # - added '=' for t015
+ # - added '=~' for 'locale.in'
+ # - added '<=>' for 'corelist.in'
+ # Note:
+ # See @value_requestor_type for more that might be included
+ # See also @is_binary_type
+ my %bin_op_type;
+ @q = qw# . ** -> + - / * = != ^ < > % >= <= =~ !~ <=> x #;
+ @bin_op_type{@q} = (1) x scalar(@q);
+
+ my %is_list_end_type;
+ @q = qw( ; { } );
+ push @q, ',';
+ @is_list_end_type{@q} = (1) x scalar(@q);
- @q = qw( w i );
- @is_wi{@q} = (1) x scalar(@q);
-} ## end BEGIN
+ my $rLL = $self->[_rLL_];
+ my $Klimit = $self->[_Klimit_];
+ return unless defined($Klimit);
+
+ my $token = ';';
+ my $type = ';';
+ my $last_token = $token;
+ my $last_type = $type;
+ my $ci_last = 0;
+ my $ci_next = 0;
+ my $ci_next_next = 1;
+ my $rstack = [];
+
+ my $seq_root = SEQ_ROOT;
+ my $rparent = {
+ _seqno => $seq_root,
+ _ci_open => 0,
+ _ci_open_next => 0,
+ _ci_close => 0,
+ _ci_close_next => 0,
+ _container_type => 'Block',
+ _ci_next_next => $ci_next_next,
+ _comma_count => 0,
+ _semicolon_count => 0,
+ _Kc => undef,
+ };
-use constant DEBUG_WHITE => 0;
+ # Debug stuff
+ my @debug_lines;
+ my %saw_ci_diff;
-# Hashes to set spaces around container tokens according to their
-# sequence numbers. These are set as keywords are examined.
-# They are controlled by the -kpit and -kpitl flags.
-my %opening_container_inside_ws;
-my %closing_container_inside_ws;
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $ris_sub_block = $self->[_ris_sub_block_];
+ my $ris_asub_block = $self->[_ris_asub_block_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $K_closing_ternary = $self->[_K_closing_ternary_];
+ my $rlines = $self->[_rlines_];
+ my $rtype_count_by_seqno = $self->[_rtype_count_by_seqno_];
-sub set_whitespace_flags {
+ my $want_break_before_comma = $want_break_before{','};
- # This routine is called once per file to set whitespace flags for that
- # file. This routine examines each pair of nonblank tokens and sets a flag
- # indicating if white space is needed.
- #
- # $rwhitespace_flags->[$j] is a flag indicating whether a white space
- # BEFORE token $j is needed, with the following values:
- #
- # WS_NO = -1 do not want a space BEFORE token $j
- # WS_OPTIONAL= 0 optional space or $j is a whitespace
- # WS_YES = 1 want a space BEFORE token $j
- #
+ my $map_block_follows = sub {
- my $self = shift;
+ # return true if a sort/map/etc block follows the closing brace
+ # of container $seqno
+ my ($seqno) = @_;
+ my $Kc = $K_closing_container->{$seqno};
+ return unless defined($Kc);
+
+ # Skip past keyword
+ my $Kcn = $self->K_next_code($Kc);
+ return unless defined($Kcn);
+ my $seqno_n = $rLL->[$Kcn]->[_TYPE_SEQUENCE_];
+ return if ($seqno_n);
+
+ # Look for opening block brace
+ my $Knn = $self->K_next_code($Kcn);
+ return unless defined($Knn);
+ my $seqno_nn = $rLL->[$Knn]->[_TYPE_SEQUENCE_];
+ return unless ($seqno_nn);
+ my $K_nno = $K_opening_container->{$seqno_nn};
+ return unless ( $K_nno && $K_nno == $Knn );
+ my $block_type = $rblock_type_of_seqno->{$seqno_nn};
+
+ if ($block_type) {
+ return $is_block_with_ci{$block_type};
+ }
+ return;
+ }; ## end $map_block_follows = sub
- my $j_tight_closing_paren = -1;
- my $rLL = $self->[_rLL_];
- my $jmax = @{$rLL} - 1;
+ my $redo_preceding_comment_ci = sub {
- %opening_container_inside_ws = ();
- %closing_container_inside_ws = ();
+ # We need to reset the ci of the previous comment(s)
+ my ( $K, $ci ) = @_;
+ my $Km = $self->K_previous_code($K);
+ return if ( !defined($Km) );
+ foreach my $Kt ( $Km + 1 .. $K - 1 ) {
+ if ( $rLL->[$Kt]->[_TYPE_] eq '#' ) {
+ $rLL->[$Kt]->[_CI_LEVEL_] = $ci;
+ }
+ }
+ return;
+ }; ## end $redo_preceding_comment_ci = sub
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ # Definitions of the sequence of ci_values being maintained:
+ # $ci_last = the ci value of the previous non-blank, non-comment token
+ # $ci_this = the ci value to be stored for this token at index $KK
+ # $ci_next = the normal ci for the next token, set by the previous tok
+ # $ci_next_next = the normal next value of $ci_next in this container
- my $rOpts_space_keyword_paren = $rOpts->{'space-keyword-paren'};
- my $rOpts_space_backslash_quote = $rOpts->{'space-backslash-quote'};
- my $rOpts_space_function_paren = $rOpts->{'space-function-paren'};
+ #--------------------------
+ # Main loop over all tokens
+ #--------------------------
+ my $KK = -1;
+ foreach my $rtoken_K ( @{$rLL} ) {
- my $rwhitespace_flags = [];
- my $ris_function_call_paren = {};
+ $KK++;
- return $rwhitespace_flags if ( $jmax < 0 );
+ #------------------
+ # Section 1. Blanks
+ #------------------
+ if ( ( $type = $rtoken_K->[_TYPE_] ) eq 'b' ) {
- my %is_for_foreach = ( 'for' => 1, 'foreach' => 1 );
+ $rtoken_K->[_CI_LEVEL_] = $ci_next;
- my $last_token = SPACE;
- my $last_type = 'b';
+ # 'next' to avoid saving last_ values for blanks and commas
+ next;
+ }
- my $rtokh_last = [ @{ $rLL->[0] } ];
- $rtokh_last->[_TOKEN_] = $last_token;
- $rtokh_last->[_TYPE_] = $last_type;
- $rtokh_last->[_TYPE_SEQUENCE_] = EMPTY_STRING;
- $rtokh_last->[_LINE_INDEX_] = 0;
+ #--------------------
+ # Section 2. Comments
+ #--------------------
+ if ( $type eq '#' ) {
+
+ my $ci_this = $ci_next;
+
+ # If at '#' in ternary before a ? or :, use that level to make
+ # the comment line up with the next ? or : line. (see c202/t052)
+ # i.e. if a nested ? follows, we increase the '#' level by 1, and
+ # if a nested : follows, we decrease the '#' level by 1.
+ # This is the only place where this sub changes a _LEVEL_ value.
+ my $Kn;
+ my $parent_container_type = $rparent->{_container_type};
+ if ( $parent_container_type eq 'Ternary' ) {
+ $Kn = $self->K_next_code($KK);
+ if ($Kn) {
+ my $type_kn = $rLL->[$Kn]->[_TYPE_];
+ if ( $is_ternary{$type_kn} ) {
+ $rLL->[$KK]->[_LEVEL_] = $rLL->[$Kn]->[_LEVEL_];
+
+ # and use the ci of a terminating ':'
+ if ( $Kn == $rparent->{_Kc} ) {
+ $ci_this = $rparent->{_ci_close};
+ }
+ }
+ }
+ }
- my $rtokh_last_last = $rtokh_last;
+ # Undo ci for a block comment followed by a closing token or , or ;
+ # provided that the parent container:
+ # - ends without ci, or
+ # - starts ci=0 and is a comma list or this follows a closing type
+ # - has a level jump
+ if (
+ $ci_this
+ && (
+ !$rparent->{_ci_close}
+ || (
+ !$rparent->{_ci_open_next}
+ && ( ( $rparent->{_comma_count} || $last_type eq ',' )
+ || $is_closing_type{$last_type} )
+ )
+ )
+ )
+ {
+ # Be sure this is a block comment
+ my $lx = $rtoken_K->[_LINE_INDEX_];
+ my $rK_range = $rlines->[$lx]->{_rK_range};
+ my $Kfirst;
+ if ($rK_range) { $Kfirst = $rK_range->[0] }
+ if ( defined($Kfirst) && $Kfirst == $KK ) {
+
+ # Look for trailing closing token
+ # [ and possibly ',' or ';' ]
+ $Kn = $self->K_next_code($KK) if ( !$Kn );
+ my $Kc = $rparent->{_Kc};
+ if (
+ $Kn
+ && $Kc
+ && (
+ $Kn == $Kc
+
+ # only look for comma if -wbb=',' is set
+ # to minimize changes to existing formatting
+ || ( $rLL->[$Kn]->[_TYPE_] eq ','
+ && $want_break_before_comma
+ && $parent_container_type eq 'List' )
+
+ # do not look ahead for a bare ';' because
+ # it changes old formatting with little benefit.
+## || ( $rLL->[$Kn]->[_TYPE_] eq ';'
+## && $parent_container_type eq 'Block' )
+ )
+ )
+ {
- my ( $ws_1, $ws_2, $ws_3, $ws_4 );
+ # Be sure container has a level jump
+ my $level_KK = $rLL->[$KK]->[_LEVEL_];
+ my $level_Kc = $rLL->[$Kc]->[_LEVEL_];
- # main loop over all tokens to define the whitespace flags
- my $last_type_is_opening;
- my ( $token, $type );
- my $j = -1;
- foreach my $rtokh ( @{$rLL} ) {
+ # And be sure this is not a hanging side comment
+ my $CODE_type = $rlines->[$lx]->{_code_type};
+ my $is_HSC = $CODE_type && $CODE_type eq 'HSC';
- $j++;
+ if ( $level_Kc < $level_KK && !$is_HSC ) {
+ $ci_this = 0;
+ }
+ }
+ }
+ }
- $type = $rtokh->[_TYPE_];
- if ( $type eq 'b' ) {
- $rwhitespace_flags->[$j] = WS_OPTIONAL;
+ $ci_next = $ci_this;
+ $rtoken_K->[_CI_LEVEL_] = $ci_this;
+
+ # 'next' to avoid saving last_ values for blanks and commas
next;
}
- $token = $rtokh->[_TOKEN_];
-
- my $ws;
+ #------------------------------------------------------------
+ # Section 3. Continuing with non-blank and non-comment tokens
+ #------------------------------------------------------------
- #---------------------------------------------------------------
- # Whitespace Rules Section 1:
- # Handle space on the inside of opening braces.
- #---------------------------------------------------------------
+ $token = $rtoken_K->[_TOKEN_];
- # /^[L\{\(\[]$/
- if ($last_type_is_opening) {
+ # Set ci values appropriate for most tokens:
+ my $ci_this = $ci_next;
+ $ci_next = $ci_next_next;
- $last_type_is_opening = 0;
+ # Now change these ci values as necessary for special cases...
- my $seqno = $rtokh->[_TYPE_SEQUENCE_];
- my $block_type = $rblock_type_of_seqno->{$seqno};
- my $last_seqno = $rtokh_last->[_TYPE_SEQUENCE_];
- my $last_block_type = $rblock_type_of_seqno->{$last_seqno};
+ #----------------------------
+ # Section 4. Container tokens
+ #----------------------------
+ if ( $rtoken_K->[_TYPE_SEQUENCE_] ) {
- $j_tight_closing_paren = -1;
+ my $seqno = $rtoken_K->[_TYPE_SEQUENCE_];
- # let us keep empty matched braces together: () {} []
- # except for BLOCKS
- if ( $token eq $matching_token{$last_token} ) {
- if ($block_type) {
- $ws = WS_YES;
+ #-------------------------------------
+ # Section 4.1 Opening container tokens
+ #-------------------------------------
+ if ( $is_opening_sequence_token{$token} ) {
+
+ my $level = $rtoken_K->[_LEVEL_];
+
+ # Default ci values for the closing token, to be modified
+ # as necessary:
+ my $ci_close = $ci_next;
+ my $ci_close_next = $ci_next_next;
+
+ my $Kc =
+ $type eq '?'
+ ? $K_closing_ternary->{$seqno}
+ : $K_closing_container->{$seqno};
+
+ # $Kn = $self->K_next_nonblank($KK);
+ my $Kn;
+ if ( $KK < $Klimit ) {
+ $Kn = $KK + 1;
+ if ( $rLL->[$Kn]->[_TYPE_] eq 'b' && $Kn < $Klimit ) {
+ $Kn += 1;
+ }
}
- else {
- $ws = WS_NO;
+
+ # $Kcn = $self->K_next_code($Kc);
+ my $Kcn;
+ if ( $Kc && $Kc < $Klimit ) {
+ $Kcn = $Kc + 1;
+ if ( $rLL->[$Kcn]->[_TYPE_] eq 'b' && $Kcn < $Klimit ) {
+ $Kcn += 1;
+ }
+ if ( $rLL->[$Kcn]->[_TYPE_] eq '#' ) {
+ $Kcn = $self->K_next_code($Kcn);
+ }
}
- }
- else {
- # we're considering the right of an opening brace
- # tightness = 0 means always pad inside with space
- # tightness = 1 means pad inside if "complex"
- # tightness = 2 means never pad inside with space
+ my $opening_level_jump =
+ $Kn ? $rLL->[$Kn]->[_LEVEL_] - $level : 0;
- my $tightness;
- if ( $last_type eq '{'
- && $last_token eq '{'
- && $last_block_type )
- {
- $tightness = $rOpts_block_brace_tightness;
+ # initialize ci_next_next to its standard value
+ $ci_next_next = 1;
+
+ # Default: ci of first item of list with level jump is same as
+ # ci of first item of container
+ if ( $opening_level_jump > 0 ) {
+ $ci_next = $rparent->{_ci_open_next};
}
- else { $tightness = $tightness{$last_token} }
- #=============================================================
- # Patch for test problem <<snippets/fabrice_bug.in>>
- # We must always avoid spaces around a bare word beginning
- # with ^ as in:
- # my $before = ${^PREMATCH};
- # Because all of the following cause an error in perl:
- # my $before = ${ ^PREMATCH };
- # my $before = ${ ^PREMATCH};
- # my $before = ${^PREMATCH };
- # So if brace tightness flag is -bt=0 we must temporarily reset
- # to bt=1. Note that here we must set tightness=1 and not 2 so
- # that the closing space is also avoided
- # (via the $j_tight_closing_paren flag in coding)
- if ( $type eq 'w' && $token =~ /^\^/ ) { $tightness = 1 }
+ my ( $comma_count, $semicolon_count );
+ my $rtype_count = $rtype_count_by_seqno->{$seqno};
+ if ($rtype_count) {
+ $comma_count = $rtype_count->{','};
+ $semicolon_count = $rtype_count->{';'};
+
+ # Do not include a terminal semicolon in the count (the
+ # comma_count has already been corrected by respace_tokens)
+ # We only need to know if there are semicolons or not, so
+ # for speed we can just do this test if the count is 1.
+ if ( $semicolon_count && $semicolon_count == 1 ) {
+ my $Kcm = $self->K_previous_code($Kc);
+ if ( $rLL->[$Kcm]->[_TYPE_] eq ';' ) {
+ $semicolon_count--;
+ }
+ }
+ }
- #=============================================================
+ my $container_type;
- if ( $tightness <= 0 ) {
- $ws = WS_YES;
+ #-------------------------
+ # Section 4.1.1 Code Block
+ #-------------------------
+ my $block_type = $rblock_type_of_seqno->{$seqno};
+ if ($block_type) {
+ $container_type = 'Block';
+
+ # set default depending on block type
+ $ci_close = 0;
+
+ my $no_semicolon =
+ $is_block_without_semicolon{$block_type}
+ || $ris_sub_block->{$seqno}
+ || $last_type eq 'J';
+
+ if ( !$no_semicolon ) {
+
+ # Optional fix for block types sort/map/etc which use
+ # zero ci at terminal brace if previous keyword had
+ # zero ci. This will cause sort/map/grep filter blocks
+ # to line up. Note that sub 'undo_ci' will also try to
+ # do this, so this is not a critical operation.
+ if ( $is_block_with_ci{$block_type} ) {
+ my $parent_seqno = $rparent->{_seqno};
+ if (
+
+ # only do this within containers
+ $parent_seqno != SEQ_ROOT
+
+ # only in containers without ',' and ';'
+ && !$rparent->{_comma_count}
+ && !$rparent->{_semicolon_count}
+
+ && $map_block_follows->($seqno)
+ )
+ {
+ if ($ci_last) {
+ $ci_close = $ci_this;
+ }
+ }
+ else {
+ $ci_close = $ci_this;
+ }
+ }
+
+ # keep ci if certain operators follow (fix c202/t024)
+ if ( !$ci_close && $Kcn ) {
+ my $type_kcn = $rLL->[$Kcn]->[_TYPE_];
+ my $token_kcn = $rLL->[$Kcn]->[_TOKEN_];
+ if ( $type_kcn =~ /^(\.|\&\&|\|\|)$/
+ || $type_kcn eq 'k' && $is_and_or{$token_kcn} )
+ {
+ $ci_close = $ci_this;
+ }
+ }
+ }
+
+ if ( $rparent->{_container_type} ne 'Ternary' ) {
+ $ci_this = 0;
+ }
+ $ci_next = 0;
+ $ci_close_next = $ci_close;
}
- elsif ( $tightness > 1 ) {
- $ws = WS_NO;
+
+ #----------------------
+ # Section 4.1.2 Ternary
+ #----------------------
+ elsif ( $type eq '?' ) {
+ $container_type = 'Ternary';
+ if ( $rparent->{_container_type} eq 'List'
+ && !$rparent->{_ci_open_next} )
+ {
+ $ci_this = 0;
+ $ci_close = 0;
+ }
+
+ # redo ci of any preceding comments if necessary
+ # at an outermost ? (which has no level jump)
+ if ( !$opening_level_jump ) {
+ $redo_preceding_comment_ci->( $KK, $ci_this );
+ }
}
+
+ #-------------------------------
+ # Section 4.1.3 Logical or List?
+ #-------------------------------
else {
+ my $is_logical = $is_container_label_type_for_ci{$last_type}
+ && $is_logical_container_for_ci{$last_token}
- # find the index of the closing token
- my $j_closing =
- $self->[_K_closing_container_]->{$last_seqno};
+ # Part 1 of optional patch to get agreement with previous
+ # ci This makes almost no difference in a typical program
+ # because we will seldom break within an array index.
+ || $type eq '[' && SET_CI_OPTION_0;
- # If the closing token is less than five characters ahead
- # we must take a closer look
- if ( defined($j_closing)
- && $j_closing - $j < 5
- && $rLL->[$j_closing]->[_TYPE_SEQUENCE_] eq
- $last_seqno )
- {
- $ws =
- ws_in_container( $j, $j_closing, $rLL, $type, $token,
- $last_token );
- if ( $ws == WS_NO ) {
- $j_tight_closing_paren = $j_closing;
+ if ( !$is_logical && $token eq '(' ) {
+
+ # 'foreach' and 'for' paren contents are treated as
+ # logical except for C-style 'for'
+ if ( $last_type eq 'k' ) {
+ $is_logical ||= $last_token eq 'foreach';
+
+ # C-style 'for' container will be type 'List'
+ if ( $last_token eq 'for' ) {
+ $is_logical =
+ !( $rtype_count && $rtype_count->{'f'} );
+ }
+ }
+
+ # Check for 'for' and 'foreach' loops with iterators
+ elsif ( $last_type eq 'i' && defined($Kcn) ) {
+ my $seqno_kcn = $rLL->[$Kcn]->[_TYPE_SEQUENCE_];
+ my $type_kcn = $rLL->[$Kcn]->[_TOKEN_];
+ if ( $seqno_kcn && $type_kcn eq '{' ) {
+ my $block_type_kcn =
+ $rblock_type_of_seqno->{$seqno_kcn};
+ $is_logical ||= $block_type_kcn
+ && ( $block_type_kcn eq 'for'
+ || $block_type_kcn eq 'foreach' );
+ }
+
+ # Search backwards for 'for'/'foreach' with
+ # iterator in case user is running from an editor
+ # and did not include the block (fixes case
+ # 'xci.in').
+ my $Km = $self->K_previous_code($KK);
+ foreach ( 0 .. 2 ) {
+ $Km = $self->K_previous_code($Km);
+ last unless defined($Km);
+ last unless ( $rLL->[$Km]->[_TYPE_] eq 'k' );
+ my $tok = $rLL->[$Km]->[_TOKEN_];
+ next if ( $tok eq 'my' );
+ $is_logical ||=
+ ( $tok eq 'for' || $tok eq 'foreach' );
+ last;
+ }
+ }
+ elsif ( $last_token eq '(' ) {
+ $is_logical ||=
+ $rparent->{_container_type} eq 'Logical';
+ }
+ else {
+ # does not look like a logical paren
+ }
+ }
+
+ #------------------------
+ # Section 4.1.3.1 Logical
+ #------------------------
+ if ($is_logical) {
+ $container_type = 'Logical';
+
+ # Pass ci though an '!'
+ if ( $last_type eq '!' ) { $ci_this = $ci_last }
+
+ $ci_next_next = 0;
+ $ci_close_next = $ci_this;
+
+ # Part 2 of optional patch to get agreement with
+ # previous ci
+ if ( $type eq '[' && SET_CI_OPTION_0 ) {
+
+ $ci_next_next = $ci_this;
+
+ # Undo ci at a chain of indexes or hash keys
+ if ( $last_type eq '}' ) {
+ $ci_this = $ci_last;
+ }
+ }
+
+ if ($opening_level_jump) {
+ $ci_next = 0;
}
}
+
+ #---------------------
+ # Section 4.1.3.2 List
+ #---------------------
else {
- $ws = WS_YES;
- }
- }
- }
- # check for special cases which override the above rules
- if ( %opening_container_inside_ws && $last_seqno ) {
- my $ws_override = $opening_container_inside_ws{$last_seqno};
- if ($ws_override) { $ws = $ws_override }
- }
+ # Here 'List' is a catchall for none of the above types
+ $container_type = 'List';
- $ws_4 = $ws_3 = $ws_2 = $ws_1 = $ws
- if DEBUG_WHITE;
+ # lists in blocks ...
+ if ( $rparent->{_container_type} eq 'Block' ) {
- } ## end setting space flag inside opening tokens
+ # undo ci if another closing token follows
+ if ( defined($Kcn) ) {
+ my $closing_level_jump =
+ $rLL->[$Kcn]->[_LEVEL_] - $level;
+ if ( $closing_level_jump < 0 ) {
+ $ci_close = $ci_this;
+ }
+ }
+ }
- #---------------------------------------------------------------
- # Whitespace Rules Section 2:
- # Special checks for certain types ...
- #---------------------------------------------------------------
- # The hash '%is_special_ws_type' significantly speeds up this routine,
- # but be sure to update it if a new check is added.
- # Currently has types: qw(k w C m - Q #)
- if ( $is_special_ws_type{$type} ) {
+ # lists not in blocks ...
+ else {
- if ( $type eq 'k' ) {
+ if ( !$rparent->{_comma_count} ) {
- # Keywords 'for', 'foreach' are special cases for -kpit since
- # the opening paren does not always immediately follow the
- # keyword. So we have to search forward for the paren in this
- # case. I have limited the search to 10 tokens ahead, just in
- # case somebody has a big file and no opening paren. This
- # should be enough for all normal code. Added the level check
- # to fix b1236.
- if ( $is_for_foreach{$token}
- && %keyword_paren_inner_tightness
- && defined( $keyword_paren_inner_tightness{$token} )
- && $j < $jmax )
- {
- my $level = $rLL->[$j]->[_LEVEL_];
- my $jp = $j;
- ## NOTE: we might use the KNEXT variable to avoid this loop
- ## but profiling shows that little would be saved
- foreach my $inc ( 1 .. 9 ) {
- $jp++;
- last if ( $jp > $jmax );
- last if ( $rLL->[$jp]->[_LEVEL_] != $level ); # b1236
- next unless ( $rLL->[$jp]->[_TOKEN_] eq '(' );
- my $seqno_p = $rLL->[$jp]->[_TYPE_SEQUENCE_];
- set_container_ws_by_keyword( $token, $seqno_p );
- last;
+ $ci_close = $ci_this;
+
+ # undo ci at binary op after right paren if no
+ # commas in container; fixes t027, t028
+ if ( $ci_close_next != $ci_close
+ && defined($Kcn)
+ && $bin_op_type{ $rLL->[$Kcn]->[_TYPE_] } )
+ {
+ $ci_close_next = $ci_close;
+ }
+ }
+
+ if ( $rparent->{_container_type} eq 'Ternary' ) {
+ $ci_next = 0;
+ }
+ }
+
+ # Undo ci at a chain of indexes or hash keys
+ if ( $token ne '(' && $last_type eq '}' ) {
+ $ci_this = $ci_close = $ci_last;
+ }
}
}
- }
- # handle a comment
- elsif ( $type eq '#' ) {
+ #---------------------------------------
+ # Section 4.1.4 Store opening token info
+ #---------------------------------------
- # newline before block comment ($j==0), and
- # space before side comment ($j>0), so ..
- $ws = WS_YES;
+ # Most closing tokens should align with their opening tokens.
+ if (
+ $type eq '{'
+ && $token ne '('
+ && $is_list_end_type{$last_type}
- #---------------------------------
- # Nothing more to do for a comment
- #---------------------------------
- $rwhitespace_flags->[$j] = $ws;
- next;
+ # avoid asub blocks, which may have prototypes ending in '}'
+ && !$ris_asub_block->{$seqno}
+ )
+ {
+ $ci_close = $ci_this;
+ }
+
+ # Closing ci must never be less than opening
+ if ( $ci_close < $ci_this ) { $ci_close = $ci_this }
+
+ push @{$rstack}, $rparent;
+ $rparent = {
+ _seqno => $seqno,
+ _container_type => $container_type,
+ _ci_next_next => $ci_next_next,
+ _ci_open => $ci_this,
+ _ci_open_next => $ci_next,
+ _ci_close => $ci_close,
+ _ci_close_next => $ci_close_next,
+ _comma_count => $comma_count,
+ _semicolon_count => $semicolon_count,
+ _Kc => $Kc,
+ };
}
- # retain any space between '-' and bare word
- elsif ( $type eq 'w' || $type eq 'C' ) {
- $ws = WS_OPTIONAL if $last_type eq '-';
- }
+ #-------------------------------------
+ # Section 4.2 Closing container tokens
+ #-------------------------------------
+ else {
- # retain any space between '-' and bare word; for example
- # avoid space between 'USER' and '-' here: <<snippets/space2.in>>
- # $myhash{USER-NAME}='steve';
- elsif ( $type eq 'm' || $type eq '-' ) {
- $ws = WS_OPTIONAL if ( $last_type eq 'w' );
- }
+ my $seqno_test = $rparent->{_seqno};
+ if ( $seqno_test ne $seqno ) {
- # space_backslash_quote; RT #123774 <<snippets/rt123774.in>>
- # allow a space between a backslash and single or double quote
- # to avoid fooling html formatters
- elsif ( $last_type eq '\\' && $type eq 'Q' && $token =~ /^[\"\']/ )
- {
- if ($rOpts_space_backslash_quote) {
- if ( $rOpts_space_backslash_quote == 1 ) {
- $ws = WS_OPTIONAL;
- }
- elsif ( $rOpts_space_backslash_quote == 2 ) { $ws = WS_YES }
- else { } # shouldnt happen
+ # Shouldn't happen if we are processing balanced text.
+ # (Unbalanced text should go out verbatim)
+ DEVEL_MODE
+ && Fault("stack error: $seqno_test != $seqno\n");
+ }
+
+ # Use ci_this, ci_next values set by the matching opening token:
+ $ci_this = $rparent->{_ci_close};
+ $ci_next = $rparent->{_ci_close_next};
+ my $ci_open_old = $rparent->{_ci_open};
+
+ # Then pop the stack and use the parent ci_next_next value:
+ if ( @{$rstack} ) {
+ $rparent = pop @{$rstack};
+ $ci_next_next = $rparent->{_ci_next_next};
}
else {
- $ws = WS_NO;
+
+ # Shouldn't happen if we are processing balanced text.
+ DEVEL_MODE && Fault("empty stack - shouldn't happen\n");
+ }
+
+ # Fix: undo ci at a closing token followed by a closing token.
+ # Goal is to keep formatting independent of the existence of a
+ # trailing comma or semicolon.
+ if ( $ci_this > 0 && !$ci_open_old && !$rparent->{_ci_close} ) {
+ my $Kc = $rparent->{_Kc};
+ my $Kn = $self->K_next_code($KK);
+ if ( $Kc && $Kn && $Kc == $Kn ) {
+ $ci_this = $ci_next = 0;
+ }
}
}
- } ## end elsif ( $is_special_ws_type{$type} ...
+ }
- #---------------------------------------------------------------
- # Whitespace Rules Section 3:
- # Handle space on inside of closing brace pairs.
- #---------------------------------------------------------------
+ #---------------------------------
+ # Section 5. Semicolons and Labels
+ #---------------------------------
+ # The next token after a ';' and label (type 'J') starts a new stmt
+ # The ci after a C-style for ';' (type 'f') is handled similarly.
+ elsif ( $type eq ';' || $type eq 'J' || $type eq 'f' ) {
+ $ci_next = 0;
+ if ( $is_closing_type{$last_type} ) { $ci_this = $ci_last }
+ }
- # /[\}\)\]R]/
- elsif ( $is_closing_type{$type} ) {
+ #--------------------
+ # Section 6. Keywords
+ #--------------------
+ # Undo ci after a format statement
+ elsif ( $type eq 'k' ) {
+ if ( substr( $token, 0, 6 ) eq 'format' ) {
+ $ci_next = 0;
+ }
+ }
- my $seqno = $rtokh->[_TYPE_SEQUENCE_];
- if ( $j == $j_tight_closing_paren ) {
+ #------------------
+ # Section 7. Commas
+ #------------------
+ # A comma and the subsequent item normally have ci undone
+ # unless ci has been set at a lower level
+ elsif ( $type eq ',' ) {
- $j_tight_closing_paren = -1;
- $ws = WS_NO;
+ if ( $rparent->{_container_type} eq 'List' ) {
+ $ci_this = $ci_next = $rparent->{_ci_open_next};
}
- else {
+ }
- if ( !defined($ws) ) {
+ else {
+ # not a special ci type
+ }
+
+ # Save debug info if requested
+ DEBUG_SET_CI && do {
+
+ my $seqno = $rtoken_K->[_TYPE_SEQUENCE_];
+ my $level = $rtoken_K->[_LEVEL_];
+ my $ci = $rtoken_K->[_CI_LEVEL_];
+ if ( $ci > 1 ) { $ci = 1 }
+
+ my $tok = $token;
+ my $last_tok = $last_token;
+ $tok =~ s/\t//g;
+ $last_tok =~ s/\t//g;
+ $tok = length($tok) > 3 ? substr( $tok, 0, 8 ) : $tok;
+ $last_tok =
+ length($last_tok) > 3 ? substr( $last_tok, 0, 8 ) : $last_tok;
+ $tok =~ s/["']//g;
+ $last_tok =~ s/["']//g;
+ my $block_type;
+ $block_type = $rblock_type_of_seqno->{$seqno} if ($seqno);
+ $block_type = EMPTY_STRING unless ($block_type);
+ my $ptype = $rparent->{_container_type};
+ my $pname = $ptype;
- my $tightness;
- my $block_type = $rblock_type_of_seqno->{$seqno};
- if ( $type eq '}' && $token eq '}' && $block_type ) {
- $tightness = $rOpts_block_brace_tightness;
- }
- else { $tightness = $tightness{$token} }
+ my $error =
+ $ci_this == $ci ? EMPTY_STRING : $type eq 'b' ? "error" : "ERROR";
+ if ($error) { $saw_ci_diff{$KK} = 1 }
+
+ my $lno = $rtoken_K->[_LINE_INDEX_] + 1;
+ $debug_lines[$KK] = <<EOM;
+$lno\t$ci\t$ci_this\t$ci_next\t$last_type\t$last_tok\t$type\t$tok\t$seqno\t$level\t$pname\t$block_type\t$error
+EOM
+ };
- $ws = ( $tightness > 1 ) ? WS_NO : WS_YES;
+ #----------------------------------
+ # Store the ci value for this token
+ #----------------------------------
+ $rtoken_K->[_CI_LEVEL_] = $ci_this;
+
+ # Remember last nonblank, non-comment token info for the next pass
+ $ci_last = $ci_this;
+ $last_token = $token;
+ $last_type = $type;
+
+ } ## End main loop over tokens
+
+ #----------------------
+ # Post-loop operations:
+ #----------------------
+
+ if (DEBUG_SET_CI) {
+ my @output_lines;
+ foreach my $Kd ( 0 .. $Klimit ) {
+ my $line = $debug_lines[$Kd];
+ if ($line) {
+ my $Kp = $self->K_previous_code($Kd);
+ my $Kn = $self->K_next_code($Kd);
+ if ( DEBUG_SET_CI > 1
+ || $Kp && $saw_ci_diff{$Kp}
+ || $saw_ci_diff{$Kd}
+ || $Kn && $saw_ci_diff{$Kn} )
+ {
+ push @output_lines, $line;
}
}
-
- # check for special cases which override the above rules
- if ( %closing_container_inside_ws && $seqno ) {
- my $ws_override = $closing_container_inside_ws{$seqno};
- if ($ws_override) { $ws = $ws_override }
+ }
+ if (@output_lines) {
+ unshift @output_lines, <<EOM;
+lno\tci\tci_this\tci_next\tlast_type\tlast_tok\ttype\ttok\tseqno\tlevel\tpname\tblock_type\terror?
+EOM
+ foreach my $line (@output_lines) {
+ chomp $line;
+ print {*STDOUT} $line, "\n";
}
+ }
+ }
- $ws_4 = $ws_3 = $ws_2 = $ws
- if DEBUG_WHITE;
- } ## end setting space flag inside closing tokens
-
- #---------------------------------------------------------------
- # Whitespace Rules Section 4:
- #---------------------------------------------------------------
- # /^[L\{\(\[]$/
- elsif ( $is_opening_type{$type} ) {
-
- $last_type_is_opening = 1;
+ return;
+} ## end sub set_ci
- if ( $token eq '(' ) {
+sub set_CODE_type {
+ my ($self) = @_;
- my $seqno = $rtokh->[_TYPE_SEQUENCE_];
+ # Examine each line of code and set a flag '$CODE_type' to describe it.
+ # Also return a list of lines with side comments.
- # This will have to be tweaked as tokenization changes.
- # We usually want a space at '} (', for example:
- # <<snippets/space1.in>>
- # map { 1 * $_; } ( $y, $M, $w, $d, $h, $m, $s );
- #
- # But not others:
- # &{ $_->[1] }( delete $_[$#_]{ $_->[0] } );
- # At present, the above & block is marked as type L/R so this
- # case won't go through here.
- if ( $last_type eq '}' && $last_token ne ')' ) { $ws = WS_YES }
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
- # NOTE: some older versions of Perl had occasional problems if
- # spaces are introduced between keywords or functions and
- # opening parens. So the default is not to do this except is
- # certain cases. The current Perl seems to tolerate spaces.
+ my $rOpts_format_skipping_begin = $rOpts->{'format-skipping-begin'};
+ my $rOpts_format_skipping_end = $rOpts->{'format-skipping-end'};
+ my $rOpts_static_block_comment_prefix =
+ $rOpts->{'static-block-comment-prefix'};
- # Space between keyword and '('
- elsif ( $last_type eq 'k' ) {
- $ws = WS_NO
- unless ( $rOpts_space_keyword_paren
- || $space_after_keyword{$last_token} );
+ # Remember indexes of lines with side comments
+ my @ix_side_comments;
- # Set inside space flag if requested
- set_container_ws_by_keyword( $last_token, $seqno );
- }
+ my $In_format_skipping_section = 0;
+ my $Saw_VERSION_in_this_file = 0;
+ my $has_side_comment = 0;
+ my $last_line_had_side_comment = 0;
+ my ( $Kfirst, $Klast );
+ my $CODE_type;
- # Space between function and '('
- # -----------------------------------------------------
- # 'w' and 'i' checks for something like:
- # myfun( &myfun( ->myfun(
- # -----------------------------------------------------
+ # Loop to set CODE_type
- # Note that at this point an identifier may still have a
- # leading arrow, but the arrow will be split off during token
- # respacing. After that, the token may become a bare word
- # without leading arrow. The point is, it is best to mark
- # function call parens right here before that happens.
- # Patch: added 'C' to prevent blinker, case b934, i.e. 'pi()'
- # NOTE: this would be the place to allow spaces between
- # repeated parens, like () () (), as in case c017, but I
- # decided that would not be a good idea.
+ # Possible CODE_types
+ # 'VB' = Verbatim - line goes out verbatim (a quote)
+ # 'FS' = Format Skipping - line goes out verbatim
+ # 'BL' = Blank Line
+ # 'HSC' = Hanging Side Comment - fix this hanging side comment
+ # 'SBCX'= Static Block Comment Without Leading Space
+ # 'SBC' = Static Block Comment
+ # 'BC' = Block Comment - an ordinary full line comment
+ # 'IO' = Indent Only - line goes out unchanged except for indentation
+ # 'NIN' = No Internal Newlines - line does not get broken
+ # 'VER' = VERSION statement
+ # '' = ordinary line of code with no restrictions
- # Updated to allow detached '->' from tokenizer (issue c140)
- elsif (
+ #--------------------
+ # Loop over all lines
+ #--------------------
+ my $ix_line = -1;
+ foreach my $line_of_tokens ( @{$rlines} ) {
+ $ix_line++;
+ my $line_type = $line_of_tokens->{_line_type};
- # /^[wCUG]$/
- $is_wCUG{$last_type}
+ my $last_CODE_type = $CODE_type;
- || (
+ # Set default to be ordinary code
+ $CODE_type = EMPTY_STRING;
- # /^[wi]$/
- $is_wi{$last_type}
+ #-------------------------------------
+ # This is only for lines marked 'CODE'
+ #-------------------------------------
+ if ( $line_type ne 'CODE' ) {
+ next;
+ }
- && (
+ my $input_line = $line_of_tokens->{_line_text};
- # with prefix '->' or '&'
- $last_token =~ /^([\&]|->)/
+ my $Klast_prev = $Klast;
+ ( $Kfirst, $Klast ) = @{ $line_of_tokens->{_rK_range} };
+ my $jmax = defined($Kfirst) ? $Klast - $Kfirst : -1;
- # or preceding token '->' (see b1337; c140)
- || $rtokh_last_last->[_TYPE_] eq '->'
+ my $is_block_comment;
+ if ( $jmax >= 0 && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
+ if ( $jmax == 0 ) { $is_block_comment = 1; }
+ else { $has_side_comment = 1 }
+ }
- # or preceding sub call operator token '&'
- || ( $rtokh_last_last->[_TYPE_] eq 't'
- && $rtokh_last_last->[_TOKEN_] =~ /^\&\s*$/ )
- )
- )
- )
- {
- $ws =
- $rOpts_space_function_paren
- ? $self->ws_space_function_paren( $j, $rtokh_last_last )
- : WS_NO;
+ #-----------------------------------------------------------
+ # Write line verbatim if we are in a formatting skip section
+ #-----------------------------------------------------------
+ if ($In_format_skipping_section) {
- set_container_ws_by_keyword( $last_token, $seqno );
- $ris_function_call_paren->{$seqno} = 1;
- }
+ # Note: extra space appended to comment simplifies pattern matching
+ if (
+ $is_block_comment
- # space between something like $i and ( in 'snippets/space2.in'
- # for $i ( 0 .. 20 ) {
- elsif ( $last_type eq 'i' && $last_token =~ /^[\$\%\@]/ ) {
- $ws = WS_YES;
- }
+ # optional fast pre-check
+ && ( substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 4 ) eq '#>>>'
+ || $rOpts_format_skipping_end )
- # allow constant function followed by '()' to retain no space
- elsif ($last_type eq 'C'
- && $rLL->[ $j + 1 ]->[_TOKEN_] eq ')' )
- {
- $ws = WS_NO;
- }
+ && ( $rLL->[$Kfirst]->[_TOKEN_] . SPACE ) =~
+ /$format_skipping_pattern_end/
+ )
+ {
+ $In_format_skipping_section = 0;
+ my $input_line_no = $line_of_tokens->{_line_number};
+ write_logfile_entry(
+ "Line $input_line_no: Exiting format-skipping section\n");
}
+ elsif (
+ $is_block_comment
- # patch for SWITCH/CASE: make space at ']{' optional
- # since the '{' might begin a case or when block
- elsif ( ( $token eq '{' && $type ne 'L' ) && $last_token eq ']' ) {
- $ws = WS_OPTIONAL;
- }
+ # optional fast pre-check
+ && ( substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 4 ) eq '#<<<'
+ || $rOpts_format_skipping_begin )
- # keep space between 'sub' and '{' for anonymous sub definition,
- # be sure type = 'k' (added for c140)
- if ( $type eq '{' ) {
- if ( $last_token eq 'sub' && $last_type eq 'k' ) {
- $ws = WS_YES;
- }
+ && $rOpts_format_skipping
+ && ( $rLL->[$Kfirst]->[_TOKEN_] . SPACE ) =~
+ /$format_skipping_pattern_begin/
+ )
+ {
+ # warn of duplicate starting comment lines, git #118
+ my $input_line_no = $line_of_tokens->{_line_number};
+ warning(
+"Already in format-skipping section which started at line $In_format_skipping_section\n",
+ $input_line_no
+ );
+ }
+ else {
+ # not at a format skipping control line
+ }
+ $CODE_type = 'FS';
+ next;
+ }
- # this is needed to avoid no space in '){'
- if ( $last_token eq ')' && $token eq '{' ) { $ws = WS_YES }
+ #----------------------------
+ # Check for a continued quote
+ #----------------------------
+ if ( $line_of_tokens->{_starting_in_quote} ) {
- # avoid any space before the brace or bracket in something like
- # @opts{'a','b',...}
- if ( $last_type eq 'i' && $last_token =~ /^\@/ ) {
- $ws = WS_NO;
+ # A line which is entirely a quote or pattern must go out
+ # verbatim. Note: the \n is contained in $input_line.
+ if ( $jmax <= 0 ) {
+ if ( $self->[_save_logfile_] && $input_line =~ /\t/ ) {
+ my $input_line_number = $line_of_tokens->{_line_number};
+ $self->note_embedded_tab($input_line_number);
}
+ $CODE_type = 'VB';
+ next;
}
- } ## end if ( $is_opening_type{$type} ) {
+ }
- # always preserve whatever space was used after a possible
- # filehandle (except _) or here doc operator
+ #-------------------------------------------------
+ # See if we are entering a formatting skip section
+ #-------------------------------------------------
if (
- (
- ( $last_type eq 'Z' && $last_token ne '_' )
- || $last_type eq 'h'
- )
- && $type ne '#' # no longer required due to early exit for '#' above
+ $is_block_comment
+
+ # optional fast pre-check
+ && ( substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 4 ) eq '#<<<'
+ || $rOpts_format_skipping_begin )
+
+ && $rOpts_format_skipping
+ && ( $rLL->[$Kfirst]->[_TOKEN_] . SPACE ) =~
+ /$format_skipping_pattern_begin/
)
{
- $ws = WS_OPTIONAL;
+ my $input_line_no = $line_of_tokens->{_line_number};
+ $In_format_skipping_section = $input_line_no;
+ write_logfile_entry(
+ "Line $input_line_no: Entering format-skipping section\n");
+ $CODE_type = 'FS';
+ next;
}
- $ws_4 = $ws_3 = $ws
- if DEBUG_WHITE;
+ # ignore trailing blank tokens (they will get deleted later)
+ if ( $jmax > 0 && $rLL->[$Klast]->[_TYPE_] eq 'b' ) {
+ $jmax--;
+ }
- if ( !defined($ws) ) {
+ #-----------
+ # Blank line
+ #-----------
+ if ( $jmax < 0 ) {
+ $CODE_type = 'BL';
+ next;
+ }
- #---------------------------------------------------------------
- # Whitespace Rules Section 4:
- # Use the binary rule table.
- #---------------------------------------------------------------
- if ( defined( $binary_ws_rules{$last_type}{$type} ) ) {
- $ws = $binary_ws_rules{$last_type}{$type};
- $ws_4 = $ws if DEBUG_WHITE;
- }
+ #---------
+ # Comments
+ #---------
+ if ($is_block_comment) {
- #---------------------------------------------------------------
- # Whitespace Rules Section 5:
- # Apply default rules not covered above.
- #---------------------------------------------------------------
+ # see if this is a static block comment (starts with ## by default)
+ my $is_static_block_comment = 0;
+ my $no_leading_space = substr( $input_line, 0, 1 ) eq '#';
+ if (
- # If we fall through to here, look at the pre-defined hash tables
- # for the two tokens, and:
- # if (they are equal) use the common value
- # if (either is zero or undef) use the other
- # if (either is -1) use it
- # That is,
- # left vs right
- # 1 vs 1 --> 1
- # 0 vs 0 --> 0
- # -1 vs -1 --> -1
- #
- # 0 vs -1 --> -1
- # 0 vs 1 --> 1
- # 1 vs 0 --> 1
- # -1 vs 0 --> -1
- #
- # -1 vs 1 --> -1
- # 1 vs -1 --> -1
- else {
- my $wl = $want_left_space{$type};
- my $wr = $want_right_space{$last_type};
- if ( !defined($wl) ) {
- $ws = defined($wr) ? $wr : 0;
- }
- elsif ( !defined($wr) ) {
- $ws = $wl;
- }
- else {
- $ws =
- ( ( $wl == $wr ) || ( $wl == -1 ) || !$wr ) ? $wl : $wr;
- }
+ # optional fast pre-check
+ (
+ substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 2 ) eq '##'
+ || $rOpts_static_block_comment_prefix
+ )
+
+ && $rOpts_static_block_comments
+ && $input_line =~ /$static_block_comment_pattern/
+ )
+ {
+ $is_static_block_comment = 1;
}
- }
- # Treat newline as a whitespace. Otherwise, we might combine
- # 'Send' and '-recipients' here according to the above rules:
- # <<snippets/space3.in>>
- # my $msg = new Fax::Send
- # -recipients => $to,
- # -data => $data;
- if ( !$ws
- && $rtokh->[_LINE_INDEX_] != $rtokh_last->[_LINE_INDEX_] )
- {
- $ws = WS_YES;
- }
+ # Check for comments which are line directives
+ # Treat exactly as static block comments without leading space
+ # reference: perlsyn, near end, section Plain Old Comments (Not!)
+ # example: '# line 42 "new_filename.plx"'
+ if (
+ $no_leading_space
+ && $input_line =~ m{^\# \s*
+ line \s+ (\d+) \s*
+ (?:\s("?)([^"]+)\2)? \s*
+ $}x
+ )
+ {
+ $is_static_block_comment = 1;
+ }
- $rwhitespace_flags->[$j] = $ws;
+ # look for hanging side comment ...
+ if (
+ $last_line_had_side_comment # this follows as side comment
+ && !$no_leading_space # with some leading space, and
+ && !$is_static_block_comment # this is not a static comment
+ )
+ {
- # remember non-blank, non-comment tokens
- $last_token = $token;
- $last_type = $type;
- $rtokh_last_last = $rtokh_last;
- $rtokh_last = $rtokh;
+ # continuing an existing HSC chain?
+ if ( $last_CODE_type eq 'HSC' ) {
+ $has_side_comment = 1;
+ $CODE_type = 'HSC';
+ next;
+ }
- next if ( !DEBUG_WHITE );
+ # starting a new HSC chain?
+ if (
- my $str = substr( $last_token, 0, 15 );
- $str .= SPACE x ( 16 - length($str) );
- if ( !defined($ws_1) ) { $ws_1 = "*" }
- if ( !defined($ws_2) ) { $ws_2 = "*" }
- if ( !defined($ws_3) ) { $ws_3 = "*" }
- if ( !defined($ws_4) ) { $ws_4 = "*" }
- print STDOUT
-"NEW WHITE: i=$j $str $last_type $type $ws_1 : $ws_2 : $ws_3 : $ws_4 : $ws \n";
+ $rOpts->{'hanging-side-comments'} # user is allowing
+ # hanging side comments
+ # like this
- # reset for next pass
- $ws_1 = $ws_2 = $ws_3 = $ws_4 = undef;
+ && ( defined($Klast_prev) && $Klast_prev > 1 )
- } ## end main loop
+ # and the previous side comment was not static (issue c070)
+ && !(
+ $rOpts->{'static-side-comments'}
+ && $rLL->[$Klast_prev]->[_TOKEN_] =~
+ /$static_side_comment_pattern/
+ )
- if ( $rOpts->{'tight-secret-operators'} ) {
- new_secret_operator_whitespace( $rLL, $rwhitespace_flags );
- }
- $self->[_ris_function_call_paren_] = $ris_function_call_paren;
- return $rwhitespace_flags;
+ )
+ {
-} ## end sub set_whitespace_flags
+ # and it is not a closing side comment (issue c070).
+ my $K_penult = $Klast_prev - 1;
+ $K_penult -= 1 if ( $rLL->[$K_penult]->[_TYPE_] eq 'b' );
+ my $follows_csc =
+ ( $rLL->[$K_penult]->[_TOKEN_] eq '}'
+ && $rLL->[$K_penult]->[_TYPE_] eq '}'
+ && $rLL->[$Klast_prev]->[_TOKEN_] =~
+ /$closing_side_comment_prefix_pattern/ );
-sub set_container_ws_by_keyword {
+ if ( !$follows_csc ) {
+ $has_side_comment = 1;
+ $CODE_type = 'HSC';
+ next;
+ }
+ }
+ }
- my ( $word, $sequence_number ) = @_;
- return unless (%keyword_paren_inner_tightness);
+ if ($is_static_block_comment) {
+ $CODE_type = $no_leading_space ? 'SBCX' : 'SBC';
+ next;
+ }
+ elsif ( $last_line_had_side_comment
+ && !$rOpts_maximum_consecutive_blank_lines
+ && $rLL->[$Kfirst]->[_LEVEL_] > 0 )
+ {
+ # Emergency fix to keep a block comment from becoming a hanging
+ # side comment. This fix is for the case that blank lines
+ # cannot be inserted. There is related code in sub
+ # 'process_line_of_CODE'
+ $CODE_type = 'SBCX';
+ next;
+ }
+ else {
+ $CODE_type = 'BC';
+ next;
+ }
+ }
- # We just saw a keyword (or other function name) followed by an opening
- # paren. Now check to see if the following paren should have special
- # treatment for its inside space. If so we set a hash value using the
- # sequence number as key.
- if ( $word && $sequence_number ) {
- my $tightness = $keyword_paren_inner_tightness{$word};
- if ( defined($tightness) && $tightness != 1 ) {
- my $ws_flag = $tightness == 0 ? WS_YES : WS_NO;
- $opening_container_inside_ws{$sequence_number} = $ws_flag;
- $closing_container_inside_ws{$sequence_number} = $ws_flag;
+ #-------------------------
+ # Other special code types
+ #-------------------------
+ if ($rOpts_indent_only) {
+ $CODE_type = 'IO';
+ next;
}
- }
- return;
-} ## end sub set_container_ws_by_keyword
-sub ws_in_container {
+ if ( !$rOpts_add_newlines ) {
+ $CODE_type = 'NIN';
+ next;
+ }
- my ( $j, $j_closing, $rLL, $type, $token, $last_token ) = @_;
+ # Patch needed for MakeMaker. Do not break a statement
+ # in which $VERSION may be calculated. See MakeMaker.pm;
+ # this is based on the coding in it.
+ # The first line of a file that matches this will be eval'd:
+ # /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
+ # Examples:
+ # *VERSION = \'1.01';
+ # ( $VERSION ) = '$Revision: 1.74 $ ' =~ /\$Revision:\s+([^\s]+)/;
+ # We will pass such a line straight through without breaking
+ # it unless -npvl is used.
- # Given:
- # $j = index of token following an opening container token
- # $type, $token = the type and token at index $j
- # $j_closing = closing token of the container
- # $last_token = the opening token of the container
- # Return:
- # WS_NO if there is just one token in the container (with exceptions)
- # WS_YES otherwise
+ # Patch for problem reported in RT #81866, where files
+ # had been flattened into a single line and couldn't be
+ # tidied without -npvl. There are two parts to this patch:
+ # First, it is not done for a really long line (80 tokens for now).
+ # Second, we will only allow up to one semicolon
+ # before the VERSION. We need to allow at least one semicolon
+ # for statements like this:
+ # require Exporter; our $VERSION = $Exporter::VERSION;
+ # where both statements must be on a single line for MakeMaker
- #------------------------------------
- # Look forward for the closing token;
- #------------------------------------
- if ( $j + 1 > $j_closing ) { return WS_NO }
+ if ( !$Saw_VERSION_in_this_file
+ && $jmax < 80
+ && $input_line =~
+ /^[^;]*;?[^;]*([\$*])(([\w\:\']*)\bVERSION)\b.*\=/ )
+ {
+ $Saw_VERSION_in_this_file = 1;
+ write_logfile_entry("passing VERSION line; -npvl deactivates\n");
- # Patch to count '-foo' as single token so that
- # each of $a{-foo} and $a{foo} and $a{'foo'} do
- # not get spaces with default formatting.
- my $j_here = $j;
- ++$j_here
- if ( $token eq '-'
- && $last_token eq '{'
- && $rLL->[ $j + 1 ]->[_TYPE_] eq 'w' );
+ # This code type has lower priority than others
+ $CODE_type = 'VER';
+ next;
+ }
+ }
+ continue {
+ $line_of_tokens->{_code_type} = $CODE_type;
- # Patch to count a sign separated from a number as a single token, as
- # in the following line. Otherwise, it takes two steps to converge:
- # deg2rad(- 0.5)
- if ( ( $type eq 'm' || $type eq 'p' )
- && $j < $j_closing + 1
- && $rLL->[ $j + 1 ]->[_TYPE_] eq 'b'
- && $rLL->[ $j + 2 ]->[_TYPE_] eq 'n'
- && $rLL->[ $j + 2 ]->[_TOKEN_] =~ /^\d/ )
- {
- $j_here = $j + 2;
+ $last_line_had_side_comment = $has_side_comment;
+ if ($has_side_comment) {
+ push @ix_side_comments, $ix_line;
+ $has_side_comment = 0;
+ }
}
- # $j_next is where a closing token should be if the container has
- # just a "single" token
- if ( $j_here + 1 > $j_closing ) { return WS_NO }
- my $j_next =
- ( $rLL->[ $j_here + 1 ]->[_TYPE_] eq 'b' )
- ? $j_here + 2
- : $j_here + 1;
+ return \@ix_side_comments;
+} ## end sub set_CODE_type
- #-----------------------------------------------------------------
- # Now decide: if we get to the closing token we will keep it tight
- #-----------------------------------------------------------------
- if (
- $j_next == $j_closing
+sub block_seqno_of_paren_keyword {
- # OLD PROBLEM: but watch out for this: [ [ ] (misc.t)
- # No longer necessary because of the previous check on sequence numbers
- ##&& $last_token ne $token
+ my ( $self, $KK ) = @_;
- # double diamond is usually spaced
- && $token ne '<<>>'
+ # Find brace at '){' after keyword such as for, foreach, ...
+ # SEE ALSO: sub block_seqno_of_paren_seqno
- )
- {
- return WS_NO;
- }
+ # Given:
+ # $KK = index of a keyword followed by parens and block '... ( ) {'
+ # such as 'for', 'foreach', 'while', 'if', 'elsif' ..
+ # Return:
+ # $seqno of the opening block brace for this keyword, if any
+ # $K_end_iterator = index of the last token of an iterator, if any
+ # or
+ # nothing if not found
- return WS_YES;
+ # 'for my $var (..) { ... }'
+ # ^ ^
+ # | |
+ # --$KK --$seqno of brace that we want
-} ## end sub ws_in_container
+ my $rLL = $self->[_rLL_];
-sub ws_space_function_paren {
+ # look ahead for an opening paren
+ my $K_paren = $self->[_rK_next_seqno_by_K_]->[$KK];
+ return unless defined($K_paren);
+ my $token_paren = $rLL->[$K_paren]->[_TOKEN_];
+ return unless ( $token_paren eq '(' );
- my ( $self, $j, $rtokh_last_last ) = @_;
+ # found a paren, but does it belong to this keyword?
+ my $seqno_paren = $rLL->[$K_paren]->[_TYPE_SEQUENCE_];
- # Called if --space-function-paren is set to see if it might cause
- # a problem. The manual warns the user about potential problems with
- # this flag. Here we just try to catch one common problem.
+ # see if this opening paren immediately follows the keyword
+ my $K_n = $self->K_next_code($KK);
+ return unless $K_n;
- # Given:
- # $j = index of '(' after function name
- # Return:
- # WS_NO if no space
- # WS_YES otherwise
+ # is it the next token? this is the common case
+ my $K_end_iterator;
+ my $saw_my;
+ my $token_KK = $rLL->[$KK]->[_TOKEN_];
+ if ( $K_n != $K_paren ) {
- # This was added to fix for issue c166. Ignore -sfp at a possible indirect
- # object location. For example, do not convert this:
- # print header() ...
- # to this:
- # print header () ...
- # because in this latter form, header may be taken to be a file handle
- # instead of a function call.
+ # look for 'for $var (', 'for my $var (', 'for my (', 'for $var ('
+ if ( $is_for_foreach{$token_KK} ) {
+ my $type_K_n = $rLL->[$K_n]->[_TYPE_];
+ my $token_K_n = $rLL->[$K_n]->[_TOKEN_];
- # Start with the normal value for -sfp:
- my $ws = WS_YES;
+ # skip past a 'my'
+ if ( $type_K_n eq 'k' ) {
+ if ( $is_my_state_our{$token_K_n} ) {
+ $K_n = $self->K_next_code($K_n);
+ $saw_my = 1;
+ }
+ else { return }
+ }
- # now check to be sure we don't cause a problem:
- my $type_ll = $rtokh_last_last->[_TYPE_];
- my $tok_ll = $rtokh_last_last->[_TOKEN_];
+ # skip an identifier
+ if ( $K_n && $K_n != $K_paren && $rLL->[$K_n]->[_TYPE_] eq 'i' ) {
+ $K_n = $self->K_next_code($K_n);
- # NOTE: this is just a minimal check. For example, we might also check
- # for something like this:
- # print ( header ( ..
- if ( $type_ll eq 'k' && $is_indirect_object_taker{$tok_ll} ) {
- $ws = WS_NO;
+ # force this iterator to be entered as new lexical
+ $K_end_iterator = $K_paren;
+ }
+ }
}
- return $ws;
-
-} ## end sub ws_space_function_paren
+ # we must be at the paren
+ return unless ( $K_n && $K_n == $K_paren );
-} ## end closure set_whitespace_flags
+ # now jump to the closing paren
+ $K_paren = $self->[_K_closing_container_]->{$seqno_paren};
-sub dump_want_left_space {
- my $fh = shift;
- local $LIST_SEPARATOR = "\n";
- $fh->print(<<EOM);
-These values are the main control of whitespace to the left of a token type;
-They may be altered with the -wls parameter.
-For a list of token types, use perltidy --dump-token-types (-dtt)
- 1 means the token wants a space to its left
--1 means the token does not want a space to its left
-------------------------------------------------------------------------
-EOM
- foreach my $key ( sort keys %want_left_space ) {
- $fh->print("$key\t$want_left_space{$key}\n");
- }
- return;
-} ## end sub dump_want_left_space
+ # then look for the opening brace immediately after it
+ my $K_brace = $self->K_next_code($K_paren);
+ return unless ($K_brace);
-sub dump_want_right_space {
- my $fh = shift;
- local $LIST_SEPARATOR = "\n";
- $fh->print(<<EOM);
-These values are the main control of whitespace to the right of a token type;
-They may be altered with the -wrs parameter.
-For a list of token types, use perltidy --dump-token-types (-dtt)
- 1 means the token wants a space to its right
--1 means the token does not want a space to its right
-------------------------------------------------------------------------
+ # check for experimental 'for list': for my ( $a, $b) (@list) {
+ # ^
+ if ( $rLL->[$K_brace]->[_TOKEN_] eq '('
+ && !$K_end_iterator
+ && $is_for_foreach{$token_KK} )
+ {
+ if ( !$saw_my ) { $K_end_iterator = $K_brace }
+ my $seqno_test = $rLL->[$K_brace]->[_TYPE_SEQUENCE_];
+ my $K_test = $self->[_K_closing_container_]->{$seqno_test};
+ return unless $K_test;
+ $K_brace = $self->K_next_code($K_test);
+ return unless ($K_brace);
+ }
+
+ return unless ( $rLL->[$K_brace]->[_TOKEN_] eq '{' );
+ my $seqno_brace = $rLL->[$K_brace]->[_TYPE_SEQUENCE_];
+ return unless ($seqno_brace);
+ my $block_type = $self->[_rblock_type_of_seqno_]->{$seqno_brace};
+
+ # Verify that this is the correct brace
+ if ( $block_type ne $token_KK ) {
+
+ # If not, this is unexpected and should be investigated
+ # (the block type may have been mis-marked)
+ my $lno = $rLL->[$KK]->[_LINE_INDEX_] + 1;
+ DEVEL_MODE && Fault(<<EOM);
+at line $lno: found block type $block_type: expecting $token_KK - please check
EOM
- foreach my $key ( sort keys %want_right_space ) {
- $fh->print("$key\t$want_right_space{$key}\n");
+ return;
}
- return;
-} ## end sub dump_want_right_space
-
-{ ## begin closure is_essential_whitespace
- my %is_sort_grep_map;
- my %is_for_foreach;
- my %is_digraph;
- my %is_trigraph;
- my %essential_whitespace_filter_l1;
- my %essential_whitespace_filter_r1;
- my %essential_whitespace_filter_l2;
- my %essential_whitespace_filter_r2;
- my %is_type_with_space_before_bareword;
- my %is_special_variable_char;
+ return ( $seqno_brace, $K_end_iterator );
+} ## end sub block_seqno_of_paren_keyword
- BEGIN {
+sub has_complete_package {
+ my ($self) = @_;
- my @q;
+ # return true if this file appears to contain at least one complete package
- # NOTE: This hash is like the global %is_sort_map_grep, but it ignores
- # grep aliases on purpose, since here we are looking parens, not braces
- @q = qw(sort grep map);
- @is_sort_grep_map{@q} = (1) x scalar(@q);
+ my $Klast = $self->K_last_code();
+ return unless defined($Klast);
- @q = qw(for foreach);
- @is_for_foreach{@q} = (1) x scalar(@q);
+ my $rLL = $self->[_rLL_];
- @q = qw(
- .. :: << >> ** && || // -> => += -= .= %= &= |= ^= *= <>
- <= >= == =~ !~ != ++ -- /= x= ~~ ~. |. &. ^.
- );
- @is_digraph{@q} = (1) x scalar(@q);
+ my $rK_package_list = $self->[_rK_package_list_];
+ return unless ( defined($rK_package_list) && @{$rK_package_list} );
- @q = qw( ... **= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~);
- @is_trigraph{@q} = (1) x scalar(@q);
+ # look for a file like this:
+ # package A::B
+ # ...
+ # 1;
- # These are used as a speedup filters for sub is_essential_whitespace.
+ my $KK = $rK_package_list->[0];
+ my $item = $rLL->[$KK];
+ my $type = $item->[_TYPE_];
- # Filter 1:
- # These left side token types USUALLY do not require a space:
- @q = qw( ; { } [ ] L R );
- push @q, ',';
- push @q, ')';
- push @q, '(';
- @essential_whitespace_filter_l1{@q} = (1) x scalar(@q);
+ # Stored K values may be off by 1 due to an added blank
+ if ( $type eq 'b' ) {
+ $KK += 1;
+ $item = $rLL->[$KK];
+ $type = $item->[_TYPE_];
+ }
- # BUT some might if followed by these right token types
- @q = qw( pp mm << <<= h );
- @essential_whitespace_filter_r1{@q} = (1) x scalar(@q);
+ # safety check - shouldn't happen
+ if ( $type ne 'P' ) {
+ DEVEL_MODE && Fault("Expecting type 'P' but found '$type'");
+ return;
+ }
+ my $level = $item->[_LEVEL_];
+ return unless ( $level == 0 );
- # Filter 2:
- # These right side filters usually do not require a space
- @q = qw( ; ] R } );
- push @q, ',';
- push @q, ')';
- @essential_whitespace_filter_r2{@q} = (1) x scalar(@q);
+ # Look for '1;' at next package, if any, and at end of file
+ my @K_semicolon_test = ($Klast);
+ if ( @{$rK_package_list} > 1 ) {
+ my $K_package = $rK_package_list->[1];
+ my $Ktest = $self->K_previous_code($K_package);
+ push @K_semicolon_test, $Ktest;
+ }
- # BUT some might if followed by these left token types
- @q = qw( h Z );
- @essential_whitespace_filter_l2{@q} = (1) x scalar(@q);
+ foreach my $Ktest (@K_semicolon_test) {
+ if ( $rLL->[$Ktest]->[_TYPE_] eq 'b' ) { $Ktest -= 1 }
+ if ( $Ktest > $KK && $Ktest && $rLL->[$Ktest]->[_TYPE_] eq ';' ) {
+ my $K1 = $self->K_previous_code($Ktest);
+ if ( $K1 && $rLL->[$K1]->[_TOKEN_] eq '1' ) {
+ return 1;
+ }
+ }
+ }
+ return;
+} ## end sub has_complete_package
- # Keep a space between certain types and any bareword:
- # Q: keep a space between a quote and a bareword to prevent the
- # bareword from becoming a quote modifier.
- # &: do not remove space between an '&' and a bare word because
- # it may turn into a function evaluation, like here
- # between '&' and 'O_ACCMODE', producing a syntax error [File.pm]
- # $opts{rdonly} = (($opts{mode} & O_ACCMODE) == O_RDONLY);
- @q = qw( Q & );
- @is_type_with_space_before_bareword{@q} = (1) x scalar(@q);
+sub is_complete_script {
+ my ( $self, $rline_type_count, $rkeyword_count ) = @_;
- # These are the only characters which can (currently) form special
- # variables, like $^W: (issue c066, c068).
- @q =
- qw{ ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ };
- @{is_special_variable_char}{@q} = (1) x scalar(@q);
+ # Guess if we are formatting a complete script
+ # Given:
+ # $rline_type_count = hash ref of count of line types
+ # $rkeyword_count = hash ref of count of keywords
+ # Return: true or false
+
+ # Goal: help decide if we should skip certain warning checks when
+ # operating on just part of a script (such as from an editor).
+
+ #----------------------------------------------------------------
+ # TEST 1: Assume a file with known extension is a complete script
+ #----------------------------------------------------------------
+ my %is_standard_file_extension = (
+ 'pm' => 1,
+ 'pl' => 1,
+ 'plx' => 1,
+ 't' => 1,
+ );
+ my $input_stream_name = get_input_stream_name();
- } ## end BEGIN
+ # look for a file extension
+ my $pos_dot = rindex( $input_stream_name, '.' );
+ my $file_extension = EMPTY_STRING;
+ if ( $pos_dot > 0 ) {
+ $file_extension = substr( $input_stream_name, $pos_dot + 1 );
- sub is_essential_whitespace {
+ # allow additional digits, like .pm.0, .pm.1 etc
+ if ( defined($file_extension)
+ && length($file_extension)
+ && $file_extension =~ /^\d+$/ )
+ {
+ my $str = substr( $input_stream_name, 0, $pos_dot );
+ $pos_dot = rindex( $str, '.' );
+ if ( $pos_dot > 0 ) {
+ $file_extension = substr( $str, $pos_dot + 1 );
+ }
+ }
- # Essential whitespace means whitespace which cannot be safely deleted
- # without risking the introduction of a syntax error.
- # We are given three tokens and their types:
- # ($tokenl, $typel) is the token to the left of the space in question
- # ($tokenr, $typer) is the token to the right of the space in question
- # ($tokenll, $typell) is previous nonblank token to the left of $tokenl
- #
- # Note1: This routine should almost never need to be changed. It is
- # for avoiding syntax problems rather than for formatting.
+ return 1 if $is_standard_file_extension{ lc($file_extension) };
+ }
- # Note2: The -mangle option causes large numbers of calls to this
- # routine and therefore is a good test. So if a change is made, be sure
- # to use nytprof to profile with both old and reviesed coding using the
- # -mangle option and check differences.
+ #-------------------------------------------------------------
+ # TEST 2: a positive starting level implies an incomplete script
+ #-------------------------------------------------------------
+ my $rLL = $self->[_rLL_];
+ return unless ( @{$rLL} );
+ my $sil = $rLL->[0]->[_LEVEL_];
+ return if ($sil);
- my ( $tokenll, $typell, $tokenl, $typel, $tokenr, $typer ) = @_;
+ #------------------------------------
+ # TEST 3: look for a complete package
+ #------------------------------------
+ return 1 if $self->has_complete_package();
- # This is potentially a very slow routine but the following quick
- # filters typically catch and handle over 90% of the calls.
+ #----------------------------
+ # TEST 4: examine other clues
+ #----------------------------
+ my $rlines = $self->[_rlines_];
+ my $line_count = @{$rlines};
+ return unless ($line_count);
+
+ my $input_line = $rlines->[0]->{_line_text};
+ my $saw_hash_bang = substr( $input_line, 0, 2 ) eq '#!'
+ && $input_line =~ /^\#\!.*perl\b/;
+
+ my $rK_package_list = $self->[_rK_package_list_];
+ my $saw_package = defined($rK_package_list) && @{$rK_package_list};
+ my $sub_count = +keys %{ $self->[_ris_sub_block_] };
+ my $use_count = 0;
+ $use_count += $rkeyword_count->{use} if $rkeyword_count->{use};
+ $use_count += $rkeyword_count->{require} if $rkeyword_count->{require};
+
+ # Make a guess using the available clues. No single clue is conclusive.
+ my $score = 0;
+
+ # starting indicators
+ $score += 50
+ if ( $saw_hash_bang
+ || $self->[_saw_use_strict_]
+ || $saw_package );
+
+ $score +=
+ $use_count > 1 ? 50
+ : $use_count > 0 ? 25
+ : 0;
+
+ # interior indicators
+ $score +=
+ $line_count > 50 ? 50
+ : $line_count > 25 ? 25
+ : 0;
+ $score +=
+ $sub_count > 1 ? 50
+ : $sub_count > 0 ? 25
+ : 0;
+
+ # common filter keywords
+ foreach (qw( exit print printf open system exec die )) {
+ if ( $rkeyword_count->{$_} ) { $score += 50; last; }
+ }
+
+ $score += 50 if $rline_type_count->{POD};
+
+ # ending indicator
+ $score += 50 if $self->[_saw_END_or_DATA_];
+
+ if ( $score >= 100 ) { return 1 }
+ return;
+} ## end sub is_complete_script
- # Filter 1: usually no space required after common types ; , [ ] { } ( )
- return
- if ( $essential_whitespace_filter_l1{$typel}
- && !$essential_whitespace_filter_r1{$typer} );
+use constant DEBUG_USE_CONSTANT => 0;
- # Filter 2: usually no space before common types ; ,
- return
- if ( $essential_whitespace_filter_r2{$typer}
- && !$essential_whitespace_filter_l2{$typel} );
+sub get_qw_list {
+ my ( $self, $Kn ) = @_;
- # Filter 3: Handle side comments: a space is only essential if the left
- # token ends in '$' For example, we do not want to create $#foo below:
+ # Given:
+ # $Kn = index of start of a qw quote
+ # Return:
+ # ($K_last_q, \@list) to list of words, or
+ # nothing if error
- # sub t086
- # ( #foo)))
- # $ #foo)))
- # a #foo)))
- # ) #foo)))
- # { ... }
+ my $rLL = $self->[_rLL_];
+ return unless defined($Kn);
+ my $type_n = $rLL->[$Kn]->[_TYPE_];
+ return unless ( $type_n eq 'q' );
+ my $token_n = $rLL->[$Kn]->[_TOKEN_];
+ my $K_last_q = $Kn;
+
+ # collect a multi-line qw
+ my $string = $token_n;
+ foreach my $Knn ( $Kn + 1 .. @{$rLL} - 1 ) {
+ my $type_nn = $rLL->[$Knn]->[_TYPE_];
+ next if ( $type_nn eq 'b' );
+ last if ( $type_nn ne 'q' );
+ $string .= SPACE . $rLL->[$Knn]->[_TOKEN_];
+ $K_last_q = $Knn;
+ }
+
+ $string = substr( $string, 2 ); # remove qw
+ $string =~ s/^\s*//; # trim left
+ $string = substr( $string, 1 ); # remove opening mark char
+ $string = substr( $string, 0, -1 ); # remove closing mark char
+ $string =~ s/^\s*//; # trim left
+ $string =~ s/\s*$//; # trim right
+
+ my @list = split /\s+/, $string;
+ return ( $K_last_q, \@list );
+} ## end sub get_qw_list
+
+sub expand_quoted_word_list {
+ my ( $self, $Kbeg ) = @_;
+
+ # Expand a list quoted words
+ # Given:
+ # $Kbeg = index of the start of a list of quoted words
+ # Returns:
+ # ref to list if found words
+ # undef if not successful, or non-constant list item encountered
+ my $rLL = $self->[_rLL_];
+ return unless defined($Kbeg);
+ my $Klimit = @{$rLL} - 1;
+ my @list;
+ my $Kn = $Kbeg - 1;
+ while ( ++$Kn <= $Klimit ) {
+
+ my $type = $rLL->[$Kn]->[_TYPE_];
+ my $token = $rLL->[$Kn]->[_TOKEN_];
+
+ next if ( $type eq 'b' );
+ next if ( $type eq '#' );
+ next if ( $token eq '(' );
+ next if ( $token eq ')' );
+ next if ( $token eq ',' );
+ last if ( $type eq ';' );
+ last if ( $token eq '}' );
+
+ if ( $type eq 'q' ) {
+
+ # qw list
+ my ( $K_last_q, $rlist ) = $self->get_qw_list($Kn);
+ return if ( !defined($K_last_q) );
+ if ( $K_last_q > $Kn ) { $Kn = $K_last_q }
+ push @list, @{$rlist};
+ }
+ elsif ( $type eq 'Q' ) {
- # Also, I prefer not to put a ? and # together because ? used to be
- # a pattern delimiter and spacing was used if guessing was needed.
+ # single quoted word
+ next if ( length($token) < 3 );
+ my $name = substr( $token, 1, -1 );
+ push @list, $name;
+ }
- if ( $typer eq '#' ) {
+ else {
- return 1
- if ( $tokenl
- && ( $typel eq '?' || substr( $tokenl, -1 ) eq '$' ) );
+ # Give up on anything else..
+ # some examples where we have to quit:
+ # @EXPORT = Archive::Tar::Constant->_list_consts( __PACKAGE__ );
+ # @EXPORT = ( @CONSTANTS, qw( %ALL_CODESETS));
+ # @EXPORT = ( @{$EXPORT_TAGS{standard}}, ..
return;
}
+ } ## end while ( ++$Kn <= $Klimit )
+ return \@list;
- my $tokenr_is_bareword = $tokenr =~ /^\w/ && $tokenr !~ /^\d/;
- my $tokenr_is_open_paren = $tokenr eq '(';
- my $token_joined = $tokenl . $tokenr;
- my $tokenl_is_dash = $tokenl eq '-';
+} ## end sub expand_quoted_word_list
- my $result =
+sub expand_EXPORT_list {
+ my ( $self, $KK, $rhash ) = @_;
- # never combine two bare words or numbers
- # examples: and ::ok(1)
- # return ::spw(...)
- # for bla::bla:: abc
- # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
- # $input eq"quit" to make $inputeq"quit"
- # my $size=-s::SINK if $file; <==OK but we won't do it
- # don't join something like: for bla::bla:: abc
- # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
- ( ( $tokenl =~ /([\'\w]|\:\:)$/ && $typel ne 'CORE::' )
- && ( $tokenr =~ /^([\'\w]|\:\:)/ ) )
+ # Given:
+ # $KK = index of variable @EXPORT or @EXPORT_OK
+ # $rhash = a hash to fill
+ # Task:
+ # Update $rhash with any quoted words which follow any subsequent '='
- # do not combine a number with a concatenation dot
- # example: pom.caputo:
- # $vt100_compatible ? "\e[0;0H" : ('-' x 78 . "\n");
- || $typel eq 'n' && $tokenr eq '.'
- || $typer eq 'n' && $tokenl eq '.'
+ my $rLL = $self->[_rLL_];
+ my $Kn = $self->K_next_code($KK);
- # cases of a space before a bareword...
- || (
- $tokenr_is_bareword && (
+ # Require a following '='
+ return unless ( $Kn && $rLL->[$Kn]->[_TYPE_] eq '=' );
- # do not join a minus with a bare word, because you might form
- # a file test operator. Example from Complex.pm:
- # if (CORE::abs($z - i) < $eps);
- # "z-i" would be taken as a file test.
- $tokenl_is_dash && length($tokenr) == 1
+ # Move to the next token
+ $Kn = $self->K_next_code($Kn);
+ return unless ($Kn);
- # and something like this could become ambiguous without space
- # after the '-':
- # use constant III=>1;
- # $a = $b - III;
- # and even this:
- # $a = - III;
- || $tokenl_is_dash && $typer =~ /^[wC]$/
+ # Get any list
+ my $rlist = $self->expand_quoted_word_list($Kn);
+ return unless ($rlist);
- # keep space between types Q & and a bareword
- || $is_type_with_space_before_bareword{$typel}
+ # Include the listed words in the hash
+ foreach ( @{$rlist} ) { $rhash->{$_} = 1 }
+ return;
+} ## end sub expand_EXPORT_list
- # +-: binary plus and minus before a bareword could get
- # converted into unary plus and minus on next pass through the
- # tokenizer. This can lead to blinkers: cases b660 b670 b780
- # b781 b787 b788 b790 So we keep a space unless the +/- clearly
- # follows an operator
- || ( ( $typel eq '+' || $typel eq '-' )
- && $typell !~ /^[niC\)\}\]R]$/ )
+sub scan_variable_usage {
- # keep a space between a token ending in '$' and any word;
- # this caused trouble: "die @$ if $@"
- || $typel eq 'i' && substr( $tokenl, -1, 1 ) eq '$'
+ my ( $self, ($roption) ) = @_;
- # don't combine $$ or $# with any alphanumeric
- # (testfile mangle.t with --mangle)
- || $tokenl eq '$$'
- || $tokenl eq '$#'
+ # Scan for unused and reused lexical variables in a single sweep.
- )
- ) ## end $tokenr_is_bareword
+ # Given:
+ # $roption = an optional set of types of checks,
+ # all checks are made if not given
+ # Return:
+ # - nothing if no errors found
+ # - ref to a list of 'warnings', one per variable, in line order.
+ # Each list item is a hash of values describing the issue. These
+ # are stored in a list of hash refs, as follows:
+ # push @warnings,
+ # {
+ # name => $name, # name, such as '$var', '%data'
+ # line_number => $line_number, # line number where defined
+ # K => $KK, # index of token $name
+ # keyword => $keyword, # 'my', 'state', 'for', 'foreach'
+ # letter => $letter, # one of: r s p u
+ # note => $note, # additional text info
+ # see_line => $see_line, # line referenced in note
+ # };
- # OLD, not used
- # '= -' should not become =- or you will get a warning
- # about reversed -=
- # || ($tokenr eq '-')
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $ris_sub_block = $self->[_ris_sub_block_];
+ my $K_closing_container = $self->[_K_closing_container_];
- # do not join a bare word with a minus, like between 'Send' and
- # '-recipients' here <<snippets/space3.in>>
- # my $msg = new Fax::Send
- # -recipients => $to,
- # -data => $data;
- # This is the safest thing to do. If we had the token to the right of
- # the minus we could do a better check.
- #
- # And do not combine a bareword and a quote, like this:
- # oops "Your login, $Bad_Login, is not valid";
- # It can cause a syntax error if oops is a sub
- || $typel eq 'w' && ( $tokenr eq '-' || $typer eq 'Q' )
+ # check for file without code (could be all pod or comments)
+ return unless defined( $self->K_first_code() );
- # perl is very fussy about spaces before <<
- || substr( $tokenr, 0, 2 ) eq '<<'
+ # issues are indicated by these names:
+ my %unusual_variable_issue_note = (
+ c => "unused constant",
+ p => "package crossing",
+ r => "reused",
+ s => "multi-sigil",
+ u => "unused lexical",
+ );
- # avoid combining tokens to create new meanings. Example:
- # $a+ +$b must not become $a++$b
- || ( $is_digraph{$token_joined} )
- || $is_trigraph{$token_joined}
+ # Default is to do all checks if no control hash received (dump mode)
+ if ( !defined($roption) ) {
+ foreach my $key ( keys %unusual_variable_issue_note ) {
+ $roption->{$key} = 1;
+ }
+ }
+
+ my $issue_type_string = "Issue types are";
+ foreach my $letter ( reverse sort keys %unusual_variable_issue_note ) {
+ next if ( !$roption->{$letter} );
+ my $txt = $unusual_variable_issue_note{$letter};
+ $issue_type_string .= " '$letter'=$txt";
+ }
+
+ # Unpack the control hash
+ my $check_sigil = $roption->{'s'};
+ my $check_cross_package = $roption->{'p'};
+ my $check_unused = $roption->{'u'};
+ my $check_reused = $roption->{'r'};
+ my $check_constant = $roption->{'c'};
+
+ my %is_valid_sigil = ( '$' => 1, '@' => 1, '%' => 1 );
+
+ # Variables defining current state:
+ my $current_package = 'main';
+
+ # The basic idea of this routine is straightforward:
+ # - We create a stack of block braces
+ # - We walk through the tokens in the file
+ # - At an opening block brace, we push a new stack entry
+ # - At a closing block brace, we pop the stack,
+ # and check the count of any 'my' vars (issue 'u')
+ # - At an identifier, like '$var':
+ # - if it follows a 'my' we enter it on the stack with starting count 0
+ # check conflicts with any other vars on the stack (issues 'r' and 's')
+ # - otherwise, we see if the variable is in the stack, and if so,
+ # update the count
+ # - At a package, we see if it has access to existing 'my' vars (issue 'p')
+
+ # There are lots of details, but that's the main idea. A difficulty is
+ # when 'my' vars are created in the control section of blocks such as
+ # for, foreach, if, unless, .. these follow special rules. The
+ # way it is done here is to propagate such vars in a special control
+ # layer stack entry which is pushed on just before these blocks.
+
+ my $rblock_stack = [];
+ my $rconstant_hash = {};
+ my $ruse_vars_hash = {};
+ my $rEXPORT_hash = {};
+
+ #---------------------------------------
+ # sub to push a block brace on the stack
+ #---------------------------------------
+ my $push_block_stack = sub {
+ my ( $seqno, $rvars ) = @_;
+
+ # push an entry for a new block onto the block stack:
+ # Given:
+ # $seqno = the sequence number of the code block
+ # $rvars = hash of initial identifiers for the block, if given
+ # will be empty hash ref if not given
+ if ( !defined($rvars) ) { $rvars = {} }
- # another example: do not combine these two &'s:
- # allow_options & &OPT_EXECCGI
- || $is_digraph{ $tokenl . substr( $tokenr, 0, 1 ) }
+ push @{$rblock_stack},
+ { seqno => $seqno, package => $current_package, rvars => $rvars };
+ return;
+ }; ## end $push_block_stack = sub
+
+ $push_block_stack->(SEQ_ROOT);
+
+ # $rhash holds all lexecal variables defined within a given block:
+ # $rhash->{
+ # $name => {
+ # count => $count,
+ # line_index => $line_index,
+ # keyword => $keyword,
+ # package => $package,
+ # K => $KK
+ # }
+ # };
+ # $name = the variable name, such as '$data', '@list', '%vars',
+ # $count = number of uses
+ # $line_index = index of the line where it is defined
+ # $keyword = 'my' or 'state' or 'for' or 'foreach'
+ # $package = what package was in effect when it was defined
+ # $KK = token index (for sorting)
+
+ # Variables for a batch of lexical vars being collected:
+ my $my_keyword; # 'state' or 'my' keyword for this set
+ my $K_end_my = -1; # max token index of this set
+ my $in_signature_seqno = 0; # true while scanning a signature
+ my $my_starting_count = 0; # the initial token count for this set
+
+ # Variables for warning messages:
+ my @warnings; # array of warning messages
+ my %package_warnings; # warning messages for package cross-over
+ my %sub_count_by_package; # how many subs defined in a package
+
+ # Variables for scanning interpolated quotes:
+ my $ix_HERE_END = -1; # the line index of the last here target read
+ my $in_interpolated_quote; # in multiline quote with interpolation?
- # retain any space after possible filehandle
- # (testfiles prnterr1.t with --extrude and mangle.t with --mangle)
- || $typel eq 'Z'
+ #-------------------------------------------------------
+ # sub to check for overlapping usage, issues 'r' and 's'
+ #-------------------------------------------------------
+ my $check_for_overlapping_variables = sub {
- # Added 'Y' here 16 Jan 2021 to prevent -mangle option from removing
- # space after type Y. Otherwise, it will get parsed as type 'Z' later
- # and any space would have to be added back manually if desired.
- || $typel eq 'Y'
+ my ( $name, $KK ) = @_;
- # Perl is sensitive to whitespace after the + here:
- # $b = xvals $a + 0.1 * yvals $a;
- || $typell eq 'Z' && $typel =~ /^[\/\?\+\-\*]$/
+ # Given:
+ # $name = a variable with sigil, such as '$var', '%var', '@var';
+ # $KK = index associated with this variable
+ # $line_index = index of line where this name first appears
+ # Task:
+ # Create a warning if this overlaps a previously defined variable
+ # Returns:
+ # true if error, variable is not of expected form with sigil
+ # false if no error
- || (
- $tokenr_is_open_paren && (
+ my $sigil = EMPTY_STRING;
+ my $word = EMPTY_STRING;
+ if ( $name =~ /^(\W+)(\w+)$/ ) {
+ $sigil = $1;
+ $word = $2;
+ }
+ else {
- # keep paren separate in 'use Foo::Bar ()'
- ( $typel eq 'w' && $typell eq 'k' && $tokenll eq 'use' )
+ # give up, flag as error
+ # could be something like '$' or '@' in a signature, or
+ # for $Storable::downgrade_restricted (0, 1, ...
+ return 1;
+ }
- # OLD: keep any space between filehandle and paren:
- # file mangle.t with --mangle:
- # NEW: this test is no longer necessary here (moved above)
- ## || $typel eq 'Y'
+ # Perform checks for reused names
+ my @sigils_to_test;
+ if ($check_sigil) {
+ if ($check_reused) {
+ @sigils_to_test = (qw( $ @ % ));
+ }
+ else {
+ foreach my $sig (qw( $ @ % )) {
+ if ( $sig ne $sigil ) { push @sigils_to_test, $sig; }
+ }
+ }
+ }
+ elsif ($check_reused) {
+ push @sigils_to_test, $sigil;
+ }
+ else {
+ # neither
+ }
- # must have space between grep and left paren; "grep(" will fail
- || $is_sort_grep_map{$tokenl}
+ # See if this name has been seen, possibly with a different sigil
+ if (@sigils_to_test) {
- # don't stick numbers next to left parens, as in:
- #use Mail::Internet 1.28 (); (see Entity.pm, Head.pm, Test.pm)
- || $typel eq 'n'
- )
- ) ## end $tokenr_is_open_paren
+ # Look at stack and 'use vars' hash
+ foreach
+ my $item ( @{$rblock_stack}, $ruse_vars_hash->{$current_package} )
+ {
- # retain any space after here doc operator ( hereerr.t)
- || $typel eq 'h'
+ # distinguish between stack item and use vars item
+ my $rhash = defined( $item->{seqno} ) ? $item->{rvars} : $item;
+
+ foreach my $sig (@sigils_to_test) {
+ my $test_name = $sig . $word;
+
+ next unless ( $rhash->{$test_name} );
+ my $first_line = $rhash->{$test_name}->{line_index} + 1;
+ my $letter;
+ my $note;
+ my $see_line = 0;
+ if ( $sig eq $sigil ) {
+ my $as_iterator =
+ defined($my_keyword)
+ && ( $my_keyword eq 'for'
+ || $my_keyword eq 'foreach' )
+ ? ' as iterator'
+ : EMPTY_STRING;
+ $note = "reused$as_iterator - see line $first_line";
+ $letter = 'r';
+ }
+ else {
+ $see_line = $first_line;
+ $note =
+ "overlaps $test_name in scope - see line $see_line";
+ $letter = 's';
+ }
- # be careful with a space around ++ and --, to avoid ambiguity as to
- # which token it applies
- || ( $typer eq 'pp' || $typer eq 'mm' ) && $tokenl !~ /^[\;\{\(\[]/
- || ( $typel eq '++' || $typel eq '--' )
- && $tokenr !~ /^[\;\}\)\]]/
+ my $line_index = $rLL->[$KK]->[_LINE_INDEX_];
+ push @warnings,
+ {
+ name => $name,
+ keyword => $my_keyword,
+ note => $note,
+ see_line => $see_line,
+ line_number => $line_index + 1,
+ letter => $letter,
+ K => $KK,
+ };
+ last;
+ }
+ }
+ }
+ return;
+ }; ## end $check_for_overlapping_variables = sub
- # need space after foreach my; for example, this will fail in
- # older versions of Perl:
- # foreach my$ft(@filetypes)...
- || (
- $tokenl eq 'my'
+ #--------------------------------
+ # sub to checkin a new identifier
+ #--------------------------------
+ my $checkin_new_lexical = sub {
+ my ($KK) = @_;
- && substr( $tokenr, 0, 1 ) eq '$'
+ # Store the new identifier at index $KK
- # /^(for|foreach)$/
- && $is_for_foreach{$tokenll}
- )
+ my $name = $rLL->[$KK]->[_TOKEN_];
+ my $line_index = $rLL->[$KK]->[_LINE_INDEX_];
- # Keep space after like $^ if needed to avoid forming a different
- # special variable (issue c068). For example:
- # my $aa = $^ ? "none" : "ok";
- || ( $typel eq 'i'
- && length($tokenl) == 2
- && substr( $tokenl, 1, 1 ) eq '^'
- && $is_special_variable_char{ substr( $tokenr, 0, 1 ) } )
+ # Special checks for signature variables
+ if ($in_signature_seqno) {
- # We must be sure that a space between a ? and a quoted string
- # remains if the space before the ? remains. [Loca.pm, lockarea]
- # ie,
- # $b=join $comma ? ',' : ':', @_; # ok
- # $b=join $comma?',' : ':', @_; # ok!
- # $b=join $comma ?',' : ':', @_; # error!
- # Not really required:
- ## || ( ( $typel eq '?' ) && ( $typer eq 'Q' ) )
+ # must be in top signature layer
+ my $parent = $self->parent_seqno_by_K($KK);
+ return if ( $parent != $in_signature_seqno );
- # Space stacked labels...
- # Not really required: Perl seems to accept non-spaced labels.
- ## || $typel eq 'J' && $typer eq 'J'
+ # must be preceded by a comma or opening paren
+ my $Kp = $self->K_previous_code($KK);
+ return if ( !$Kp );
+ my $token_p = $rLL->[$Kp]->[_TOKEN_];
+ my $type_p = $rLL->[$Kp]->[_TYPE_];
+ return if ( $type_p ne ',' && $token_p ne '(' );
+ }
- ; # the value of this long logic sequence is the result we want
- return $result;
- } ## end sub is_essential_whitespace
-} ## end closure is_essential_whitespace
+ my $bad_name = $check_for_overlapping_variables->( $name, $KK );
+ return if ($bad_name);
-{ ## begin closure new_secret_operator_whitespace
+ # Store this lexical variable
+ my $rhash = $rblock_stack->[-1]->{rvars};
+ $rhash->{$name} = {
+ count => $my_starting_count,
+ line_index => $line_index,
+ keyword => $my_keyword,
+ package => $current_package,
+ K => $KK,
+ };
+ return;
+ }; ## end $checkin_new_lexical = sub
- my %secret_operators;
- my %is_leading_secret_token;
+ #--------------------------------------------------
+ # sub to update counts for a list of variable names
+ #--------------------------------------------------
+ my $update_use_count = sub {
+ my ( $sigil_string, $word, $bracket ) = @_;
- BEGIN {
+ # Given:
+ # $sigil_string = a string of leading sigils, like '$', '$$', '@$$'
+ # $word = the following bareword
+ # $bracket = a following array or hash bracket or brace, if any
+ # (token types '[' and 'L')
+ # Note: any braces around the bareword must have been stripped
+ # by the caller
+ # Task:
+ # Form the hash key ($word, @word, or %word) and update the count
+
+ return unless ($check_unused);
+
+ return unless ( defined($sigil_string) && defined($word) );
+
+ my $sigil = substr( $sigil_string, -1, 1 );
+ return unless ( $is_valid_sigil{$sigil} );
+
+ # Examples:
+ # input => key
+ # $var $var
+ # @var @var
+ # $var[ @var
+ # $var{ %var
+ # @$var $var
+ # ${var} $var (caller must remove the braces)
+ # @$var[0..2] $var
+ # @var[0..2] @var array slice
+ # @var{w1 w2} %var hash slice
+ # %var{w1 w2} %var hash slice
+
+ my $name;
+ if ( $bracket && length($sigil_string) == 1 ) {
+ if ( $bracket eq '{' ) { $sigil = '%' }
+ elsif ( $bracket eq '[' ) { $sigil = '@' }
+ else { }
+ }
+ $name = $sigil . $word;
+
+ foreach my $layer ( reverse( @{$rblock_stack} ) ) {
+ my $rvars = $layer->{rvars};
+ if ( $rvars->{$name} ) {
+ $rvars->{$name}->{count}++;
+ last;
+ }
+ }
+ return;
+ }; ## end $update_use_count = sub
+
+ my $checkin_new_constant = sub {
+ my ( $KK, $word ) = @_;
+ my $line_index = $rLL->[$KK]->[_LINE_INDEX_];
+ my $rvars = {
+ count => 0,
+ line_index => $line_index,
+ package => $current_package,
+ K => $KK,
+ };
+ $rconstant_hash->{$current_package}->{$word} = $rvars;
+ return;
+ }; ## end $checkin_new_constant = sub
- # token lists for perl secret operators as compiled by Philippe Bruhat
- # at: https://metacpan.org/module/perlsecret
- %secret_operators = (
- 'Goatse' => [qw#= ( ) =#], #=( )=
- 'Venus1' => [qw#0 +#], # 0+
- 'Venus2' => [qw#+ 0#], # +0
- 'Enterprise' => [qw#) x ! !#], # ()x!!
- 'Kite1' => [qw#~ ~ <>#], # ~~<>
- 'Kite2' => [qw#~~ <>#], # ~~<>
- 'Winking Fat Comma' => [ ( ',', '=>' ) ], # ,=>
- 'Bang bang ' => [qw#! !#], # !!
- );
+ my $push_new_EXPORT = sub {
+ my ( $KK, $package ) = @_;
- # The following operators and constants are not included because they
- # are normally kept tight by perltidy:
- # ~~ <~>
- #
+ # Save index of any @EXPORT and @EXPORT_OK lists
+ $package = $current_package unless ($package);
+ push @{ $rEXPORT_hash->{$package} }, $KK;
+ return;
+ }; ## end $push_new_EXPORT = sub
+
+ my $scan_use_vars = sub {
+ my ($KK) = @_;
+ my $Kn = $self->K_next_code($KK);
+ return unless ($Kn);
+ my $rlist = $self->expand_quoted_word_list($Kn);
+ return unless ($rlist);
+ my $line_index = $rLL->[$KK]->[_LINE_INDEX_];
+ $my_keyword = 'use vars';
+ foreach my $name ( @{$rlist} ) {
+ my $bad_name = $check_for_overlapping_variables->( $name, $KK );
+ next if ($bad_name);
+ my $rvars = {
+ line_index => $line_index,
+ package => $current_package,
+ K => $KK,
+ };
+ $ruse_vars_hash->{$current_package}->{$name} = $rvars;
+ }
+ return;
+ }; ## end $scan_use_vars = sub
+
+ my $scan_use_constant = sub {
+ my ($KK) = @_;
+ my $Kn = $self->K_next_code($KK);
+ return unless ($Kn);
+ my $type_n = $rLL->[$Kn]->[_TYPE_];
+ my $token_n = $rLL->[$Kn]->[_TOKEN_];
+
+ # version?
+ if ( $type_n eq 'n' || $type_n eq 'v' ) {
+ $Kn = $self->K_next_code($Kn);
+ $type_n = $rLL->[$Kn]->[_TYPE_];
+ $token_n = $rLL->[$Kn]->[_TOKEN_];
+ }
+
+ if ( $token_n eq '(' ) {
+ $Kn = $self->K_next_code($Kn);
+ $type_n = $rLL->[$Kn]->[_TYPE_];
+ $token_n = $rLL->[$Kn]->[_TOKEN_];
+ }
+
+ # use constant _meth1_=>1;
+ if ( $type_n eq 'w' ) {
+ $checkin_new_constant->( $Kn, $token_n );
+ }
+
+ # use constant '_meth1_',1;
+ elsif ( $type_n eq 'Q' ) {
+
+ # don't try to handle anything strange
+ if ( length($token_n) < 3 ) { return }
+ my $name = substr( $token_n, 1, -1 );
+ $checkin_new_constant->( $Kn, $name );
+ }
+
+ # use constant qw(_meth2_ 2);
+ elsif ( $type_n eq 'q' ) {
+ my $name;
+ if ( $token_n =~ /qw\s*.(\w+)/ ) {
+ $name = $1;
+ $checkin_new_constant->( $Kn, $name );
+ }
+ }
+
+ # A hash ref with multiple definitions:
+ # use constant { _meth3_=>3, _meth4_=>4};
+ # use constant { '_meth3_',3, '_meth4_',4};
+ elsif ( $type_n eq '{' && $token_n eq '{' ) {
+ my $seqno_n = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ return unless $seqno_n;
+ my $Kc = $self->[_K_closing_container_]->{$seqno_n};
+ return unless $Kc;
+
+ # loop to collect constants in hash ref
+ my $Knn = $self->K_next_code($Kn);
+ my $total_comma_count = 0;
+ my $last_type = ',';
+ my $level_start = $rLL->[$Knn]->[_LEVEL_];
+
+ foreach my $Kx ( $Knn .. $Kc - 1 ) {
+ my $type = $rLL->[$Kx]->[_TYPE_];
+ my $token = $rLL->[$Kx]->[_TOKEN_];
+ next if ( $type eq 'b' || $type eq '#' );
+ my $level = $rLL->[$Kx]->[_LEVEL_];
+ next if ( $level > $level_start );
+ if ( $level < $level_start ) {
+ ## shouldn't happen
+ my $lno = $rLL->[$Kx]->[_LINE_INDEX_] + 1;
+ DEBUG_USE_CONSTANT
+ && Fault("$lno: level=$level > start=$level_start\n");
+ return;
+ }
+ if ( $last_type eq ',' && !( $total_comma_count % 2 ) ) {
+ if ( $type eq 'w' ) {
+ $checkin_new_constant->( $Kx, $token );
+ }
+ elsif ( $type eq 'Q' ) {
+ if ( length($token) < 3 ) { return }
+ my $name = substr( $token, 1, -1 );
+ $checkin_new_constant->( $Kx, $name );
+ }
+ else {
+ my $lno = $rLL->[$Kx]->[_LINE_INDEX_] + 1;
+ DEBUG_USE_CONSTANT
+ && Fault(
+ "$lno: unexpected type: type=$type token=$token\n");
+ return;
+ }
+ }
+ else {
+ if ( $type eq ',' || $type eq '=>' ) {
+ $total_comma_count++;
+ }
+ }
+ $last_type = $type;
+ }
+ }
- # Make a lookup table indexed by the first token of each operator:
- # first token => [list, list, ...]
- foreach my $value ( values(%secret_operators) ) {
- my $tok = $value->[0];
- push @{ $is_leading_secret_token{$tok} }, $value;
+ elsif ( $type_n eq ';' ) {
+
+ }
+
+ else {
+ my $ln = $rLL->[$KK]->[_LINE_INDEX_] + 1;
+ DEBUG_USE_CONSTANT && Fault("$ln: unknown use constant syntax\n");
}
- } ## end BEGIN
+ return;
+ }; ## end $scan_use_constant = sub
+
+ my $update_constant_count = sub {
+ my ( $KK, $word ) = @_;
+ if ( !defined($word) ) { $word = $rLL->[$KK]->[_TOKEN_] }
+ my $package = $current_package;
+ my $pos = rindex( $word, '::' );
+ if ( $pos >= 0 ) {
+ $package = $pos > 0 ? substr( $word, 0, $pos ) : 'main';
+ $word = substr( $word, $pos + 2 );
+ }
+ return if ( !defined( $rconstant_hash->{$package} ) );
+ my $rvars = $rconstant_hash->{$package}->{$word};
+ return if ( !defined($rvars) );
+ return if ( $KK <= $rvars->{K} );
+ $rvars->{count}++;
+ return;
+ }; ## end $update_constant_count = sub
- sub new_secret_operator_whitespace {
+ #-----------------------------------------------
+ # sub to check for zero counts when stack closes
+ #-----------------------------------------------
+ my $check_for_unused_names = sub {
+ my ($rhash) = @_;
+ foreach my $name ( keys %{$rhash} ) {
+ my $entry = $rhash->{$name};
+ my $count = $entry->{count};
+ my $keyword = $entry->{keyword};
+
+ if ( !$count ) {
+
+ # typically global vars are for external access so we
+ # do not report them as type 'u' (unused)
+ next if ( $keyword eq 'our' || $keyword eq 'use vars' );
+
+ push @warnings,
+ {
+ name => $name,
+ keyword => $entry->{keyword},
+ note => EMPTY_STRING,
+ see_line => 0,
+ line_number => $entry->{line_index} + 1,
+ letter => 'u',
+ K => $entry->{K},
+ };
+ }
+ }
+ return;
+ }; ## end $check_for_unused_names = sub
+
+ #---------------------------------------
+ # sub to scan interpolated text for vars
+ #---------------------------------------
+ my $scan_quoted_text = sub {
+ my ($text) = @_;
+ return unless ($check_unused);
+
+ # Looking for something like $word, @word, $word[, $$word, ${word}, ..
+ while ( $text =~ / ([\$\@] [\$]*) \{?(\w+)\}? ([\[\{]?) /gcx ) {
+ ## ------1------ -2- ---3---
+ my $sigil_string = $1;
+ my $word = $2;
+ my $brace = $3;
+ $update_use_count->( $sigil_string, $word, $brace );
+ } ## end while ( $text =~ ...)
+ return;
+ }; ## end $scan_quoted_text = sub
- my ( $rlong_array, $rwhitespace_flags ) = @_;
+ #-------------------------------------------------------------
+ # sub to find the next opening brace seqno of an if-elsif- chain
+ #-------------------------------------------------------------
+ my $push_next_if_chain = sub {
+ my ( $KK, $rpopped_vars ) = @_;
- # Loop over all tokens in this line
- my ( $token, $type );
- my $jmax = @{$rlong_array} - 1;
- foreach my $j ( 0 .. $jmax ) {
+ # Given:
+ # $KK = index of a closing block brace of if/unless/elsif chain
+ # $rpopped_vars = values just popped off the stack
+ # Task:
+ # - do nothing if chain ends, or
+ # - push $rpopped_vars onto the next block in the chain
+
+ # $seqno_block = sequence number of next opening block in the chain,
+ my $seqno_block;
+ my $K_n = $self->K_next_code($KK);
+ return unless ($K_n);
+ return unless ( $rLL->[$K_n]->[_TYPE_] eq 'k' );
+
+ # For an 'elsif' the brace will be after the closing paren
+ # 'elsif (..) { ... }'
+ # ^ ^
+ # | |
+ # --$KK --$seqno of brace that we want
+ #
+ if ( $rLL->[$K_n]->[_TOKEN_] eq 'elsif' ) {
+ ( $seqno_block, my $K_last_iterator_uu ) =
+ $self->block_seqno_of_paren_keyword($K_n);
+ }
- $token = $rlong_array->[$j]->[_TOKEN_];
- $type = $rlong_array->[$j]->[_TYPE_];
+ # For an 'else' the brace will be the next token
+ # 'else { ... }'
+ # ^ ^
+ # | |
+ # --$KK --$seqno of brace that we want
+ #
+ elsif ( $rLL->[$K_n]->[_TOKEN_] eq 'else' ) {
+ my $K_nn = $self->K_next_code($K_n);
+ if ( $K_nn
+ && $is_opening_token{ $rLL->[$K_nn]->[_TOKEN_] } )
+ {
+ $seqno_block = $rLL->[$K_nn]->[_TYPE_SEQUENCE_];
+ }
+ }
- # Skip unless this token might start a secret operator
- next if ( $type eq 'b' );
- next unless ( $is_leading_secret_token{$token} );
+ else {
+ # chain ends if no elsif/else block
+ }
- # Loop over all secret operators with this leading token
- foreach my $rpattern ( @{ $is_leading_secret_token{$token} } ) {
- my $jend = $j - 1;
- foreach my $tok ( @{$rpattern} ) {
- $jend++;
- $jend++
+ if ( $seqno_block
+ && $rblock_type_of_seqno->{$seqno_block} )
+ {
+ $push_block_stack->( $seqno_block, $rpopped_vars );
+ }
+ return;
+ }; ## end $push_next_if_chain = sub
- if ( $jend <= $jmax
- && $rlong_array->[$jend]->[_TYPE_] eq 'b' );
- if ( $jend > $jmax
- || $tok ne $rlong_array->[$jend]->[_TOKEN_] )
- {
- $jend = undef;
- last;
- }
- }
+ my $scan_braced_id = sub {
+ my ($KK) = @_;
- if ($jend) {
+ # We are at an opening brace and looking for something like this:
+ # @{word}[@var]
+ # ${word}
+ # ^
+ # |
+ # -- $KK
- # set flags to prevent spaces within this operator
- foreach my $jj ( $j + 1 .. $jend ) {
- $rwhitespace_flags->[$jj] = WS_NO;
- }
- $j = $jend;
- last;
- }
- } ## End Loop over all operators
- } ## End loop over all tokens
- return;
- } ## end sub new_secret_operator_whitespace
-} ## end closure new_secret_operator_whitespace
+ return unless ($check_unused);
-{ ## begin closure set_bond_strengths
+ # Look back for the sigil
+ my $Kp = $self->K_previous_code($KK);
- # These routines and variables are involved in deciding where to break very
- # long lines.
+ return unless ( defined($Kp) && $rLL->[$Kp]->[_TYPE_] eq 't' );
+ my $sigil_string = $rLL->[$Kp]->[_TOKEN_];
- my %is_good_keyword_breakpoint;
- my %is_lt_gt_le_ge;
- my %is_container_token;
+ # Look forward for the bareword
+ my $Kn = $self->K_next_code($KK);
+ return unless ( defined($Kn) && $rLL->[$Kn]->[_TYPE_] eq 'w' );
+ my $word = $rLL->[$Kn]->[_TOKEN_];
- my %binary_bond_strength_nospace;
- my %binary_bond_strength;
- my %nobreak_lhs;
- my %nobreak_rhs;
+ # Look forward for the closing brace
+ my $Knn = $self->K_next_code($Kn);
+ return unless ( defined($Knn) && $rLL->[$Knn]->[_TYPE_] eq 'R' );
- my @bias_tokens;
- my %bias_hash;
- my %bias;
- my $delta_bias;
+ # Look forward for a possible { or [
+ my $bracket;
+ my $Knnn = $self->K_next_code($Knn);
+ if ( defined($Knnn) ) {
+ my $next_type = $rLL->[$Knnn]->[_TYPE_];
+ if ( $next_type eq 'L' || $next_type eq '[' ) {
+ $bracket = $rLL->[$Knnn]->[_TOKEN_];
+ }
+ }
+ $update_use_count->( $sigil_string, $word, $bracket );
+ return;
+ }; ## end $scan_braced_id = sub
+
+ my $check_sub_signature = sub {
+ my ($KK) = @_;
+
+ # looking for a sub signature
+ # sub xxx (...) {
+ # -------
+ # | | | |
+ # $KK $Kn | |
+ # $K_opening_brace
+
+ # Note: this version cannot handle signatures within signatures.
+ # Inner signatures are currently ignored. For example, only the
+ # outermost $a below will be checked in this line:
+
+ # sub xyz ($a = sub ($a) { $a."z" }) { $a->("a")."y" }
+
+ # What happens is that variable $K_end_my is set by the first
+ # signature, and the second signature is within it and so does
+ # not get activated. A stack scheme would be necessary to handle
+ # this, but does not seem necessary because this probably only
+ # occurs in test code, and the only downside is that we limit
+ # some checking.
+
+ my $Kn = $self->K_next_code($KK);
+ return unless ( $rLL->[$Kn]->[_TOKEN_] eq '(' );
+ my $seqno_paren = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ return unless ($seqno_paren);
+ my $K_closing_paren = $self->[_K_closing_container_]->{$seqno_paren};
+ my $K_opening_brace = $self->K_next_code($K_closing_paren);
+ return unless ($K_opening_brace);
+ my $seqno_brace = $rLL->[$K_opening_brace]->[_TYPE_SEQUENCE_];
+ my $token_brace = $rLL->[$K_opening_brace]->[_TOKEN_];
+ return unless ( $seqno_brace && $token_brace eq '{' );
+
+ # Treat signature variables like my variables
+ # Create special block on the stack..see note above for
+ # $is_if_unless
+ if ( $K_opening_brace > $K_end_my ) {
+ $K_end_my = $K_opening_brace;
+ $my_keyword = 'sub signature';
+ $in_signature_seqno = $seqno_paren;
+ $push_block_stack->($seqno_brace);
+ }
+ return;
+ }; ## end $check_sub_signature = sub
- sub initialize_bond_strength_hashes {
+ my $rkeyword_count = {};
+ my $rline_type_count = {};
- my @q;
- @q = qw(if unless while until for foreach);
- @is_good_keyword_breakpoint{@q} = (1) x scalar(@q);
+ #--------------------
+ # Loop over all lines
+ #--------------------
+ my $ix_line = -1;
+ foreach my $line_of_tokens ( @{$rlines} ) {
+ $ix_line++;
+ my $line_type = $line_of_tokens->{_line_type};
+ if ( $line_type ne 'CODE' ) {
+ $rline_type_count->{$line_type}++;
+ next;
+ }
- @q = qw(lt gt le ge);
- @is_lt_gt_le_ge{@q} = (1) x scalar(@q);
+ my ( $Kfirst, $Klast ) = @{ $line_of_tokens->{_rK_range} };
+ next unless defined($Kfirst);
- @q = qw/ ( [ { } ] ) /;
- @is_container_token{@q} = (1) x scalar(@q);
+ #----------------------------------
+ # Loop over all tokens on this line
+ #----------------------------------
+ foreach my $KK ( $Kfirst .. $Klast ) {
+ my $type = $rLL->[$KK]->[_TYPE_];
+ next if ( $type eq 'b' || $type eq '#' );
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ my $seqno = $rLL->[$KK]->[_TYPE_SEQUENCE_];
- # The decision about where to break a line depends upon a "bond
- # strength" between tokens. The LOWER the bond strength, the MORE
- # likely a break. A bond strength may be any value but to simplify
- # things there are several pre-defined strength levels:
+ if ($seqno) {
+ my $block_type;
+ $block_type = $rblock_type_of_seqno->{$seqno} if ($seqno);
- # NO_BREAK => 10000;
- # VERY_STRONG => 100;
- # STRONG => 2.1;
- # NOMINAL => 1.1;
- # WEAK => 0.8;
- # VERY_WEAK => 0.55;
+ my $is_on_stack = ( $seqno == $rblock_stack->[-1]->{seqno} );
- # The strength values are based on trial-and-error, and need to be
- # tweaked occasionally to get desired results. Some comments:
- #
- # 1. Only relative strengths are important. small differences
- # in strengths can make big formatting differences.
- # 2. Each indentation level adds one unit of bond strength.
- # 3. A value of NO_BREAK makes an unbreakable bond
- # 4. A value of VERY_WEAK is the strength of a ','
- # 5. Values below NOMINAL are considered ok break points.
- # 6. Values above NOMINAL are considered poor break points.
- #
- # The bond strengths should roughly follow precedence order where
- # possible. If you make changes, please check the results very
- # carefully on a variety of scripts. Testing with the -extrude
- # options is particularly helpful in exercising all of the rules.
+ if ( $is_opening_token{$token} ) {
- # Wherever possible, bond strengths are defined in the following
- # tables. There are two main stages to setting bond strengths and
- # two types of tables:
- #
- # The first stage involves looking at each token individually and
- # defining left and right bond strengths, according to if we want
- # to break to the left or right side, and how good a break point it
- # is. For example tokens like =, ||, && make good break points and
- # will have low strengths, but one might want to break on either
- # side to put them at the end of one line or beginning of the next.
- #
- # The second stage involves looking at certain pairs of tokens and
- # defining a bond strength for that particular pair. This second
- # stage has priority.
+ # always push a block
+ if ($block_type) {
- #---------------------------------------------------------------
- # Bond Strength BEGIN Section 1.
- # Set left and right bond strengths of individual tokens.
- #---------------------------------------------------------------
+ # exit signature if we will push a duplicate block
+ if ( $in_signature_seqno
+ && @{$rblock_stack}
+ && $seqno == $rblock_stack->[-1]->{seqno} )
+ {
+ $in_signature_seqno = 0;
+ }
- # NOTE: NO_BREAK's set in this section first are HINTS which will
- # probably not be honored. Essential NO_BREAKS's should be set in
- # BEGIN Section 2 or hardwired in the NO_BREAK coding near the end
- # of this subroutine.
+ $push_block_stack->($seqno);
- # Note that we are setting defaults in this section. The user
- # cannot change bond strengths but can cause the left and right
- # bond strengths of any token type to be swapped through the use of
- # the -wba and -wbb flags. In this way the user can determine if a
- # breakpoint token should appear at the end of one line or the
- # beginning of the next line.
+ # update sub count for cross-package checks
+ if ( $ris_sub_block->{$seqno} ) {
+ $sub_count_by_package{$current_package}++;
+ }
+ }
- %right_bond_strength = ();
- %left_bond_strength = ();
- %binary_bond_strength_nospace = ();
- %binary_bond_strength = ();
- %nobreak_lhs = ();
- %nobreak_rhs = ();
+ # look for something like @{word} etc
+ if ( $type eq 'L' ) {
+ $scan_braced_id->($KK);
+ }
+ }
- # The hash keys in this section are token types, plus the text of
- # certain keywords like 'or', 'and'.
+ elsif ( $is_closing_token{$token} ) {
- # no break around possible filehandle
- $left_bond_strength{'Z'} = NO_BREAK;
- $right_bond_strength{'Z'} = NO_BREAK;
+ # always pop the stack if this token is on the stack
+ if ($is_on_stack) {
+ my $stack_item = pop @{$rblock_stack};
+ my $rpopped_vars = $stack_item->{rvars};
- # never put a bare word on a new line:
- # example print (STDERR, "bla"); will fail with break after (
- $left_bond_strength{'w'} = NO_BREAK;
+ # if we popped a block token
+ if ($block_type) {
- # blanks always have infinite strength to force breaks after
- # real tokens
- $right_bond_strength{'b'} = NO_BREAK;
+ # the current package gets updated at a block end
+ $current_package = $stack_item->{package};
- # try not to break on exponentiation
- @q = qw# ** .. ... <=> #;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} = (STRONG) x scalar(@q);
+ # Check for unused vars if requested
+ if ( $check_unused && $rpopped_vars ) {
+ $check_for_unused_names->($rpopped_vars);
+ }
- # The comma-arrow has very low precedence but not a good break point
- $left_bond_strength{'=>'} = NO_BREAK;
- $right_bond_strength{'=>'} = NOMINAL;
+ # Check for and propagate an if-chain control layer,
+ # which will have the same seqno.
+ if ( @{$rblock_stack}
+ && $seqno == $rblock_stack->[-1]->{seqno} )
+ {
- # ok to break after label
- $left_bond_strength{'J'} = NO_BREAK;
- $right_bond_strength{'J'} = NOMINAL;
- $left_bond_strength{'j'} = STRONG;
- $right_bond_strength{'j'} = STRONG;
- $left_bond_strength{'A'} = STRONG;
- $right_bond_strength{'A'} = STRONG;
+ # pop again
+ $stack_item = pop @{$rblock_stack};
+ $rpopped_vars = $stack_item->{rvars};
- $left_bond_strength{'->'} = STRONG;
- $right_bond_strength{'->'} = VERY_STRONG;
+ # Check unused vars
+ # - except for vars in an if-chain control layer
+ # because they are involved in logic
+ if ( $check_unused
+ && $rpopped_vars
+ && !$is_if_unless_elsif_else{$block_type} )
+ {
+ $check_for_unused_names->($rpopped_vars);
+ }
- $left_bond_strength{'CORE::'} = NOMINAL;
- $right_bond_strength{'CORE::'} = NO_BREAK;
+ # propagate control layer along if chain
+ if ( $is_if_unless_elsif{$block_type} ) {
+ $push_next_if_chain->( $KK, $rpopped_vars );
+ }
+ }
+ }
- # breaking AFTER modulus operator is ok:
- @q = qw< % >;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} =
- ( 0.1 * NOMINAL + 0.9 * STRONG ) x scalar(@q);
+ # error if we just popped a non-block token:
+ else {
+ my $K_n = $self->K_next_code($KK);
+ my $token_n = $rLL->[$K_n]->[_TOKEN_];
+ my $lno = $ix_line + 1;
+ DEVEL_MODE && Fault(<<EOM);
+Non-block closing token '$token' on stack followed by token $token_n at line $lno
+Expecting to find an opening token here.
+EOM
+ }
+ }
- # Break AFTER math operators * and /
- @q = qw< * / x >;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} = (NOMINAL) x scalar(@q);
+ # if not on the stack: error if this is a block
+ elsif ($block_type) {
+ my $lno = $ix_line + 1;
+ my $stack_seqno = $rblock_stack->[-1]->{seqno};
+ DEVEL_MODE
+ && Fault(
+"stack error: seqno=$seqno ne $stack_seqno near line $lno\n"
+ );
- # Break AFTER weakest math operators + and -
- # Make them weaker than * but a bit stronger than '.'
- @q = qw< + - >;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} =
- ( 0.91 * NOMINAL + 0.09 * WEAK ) x scalar(@q);
+ # give up - file may be unbalanced
+ return;
+ }
+ else {
+ # not a block, not on stack: nothing to do
+ }
+ }
+ else {
+ # ternary
+ }
+ }
- # Define left strength of unary plus and minus (fixes case b511)
- $left_bond_strength{p} = $left_bond_strength{'+'};
- $left_bond_strength{m} = $left_bond_strength{'-'};
+ #----------
+ # a keyword
+ #----------
+ elsif ( $type eq 'k' ) {
- # And make right strength of unary plus and minus very high.
- # Fixes cases b670 b790
- $right_bond_strength{p} = NO_BREAK;
- $right_bond_strength{m} = NO_BREAK;
+ #----------------------------------------------
+ # look for lexical keyword 'my', 'state', 'our'
+ #----------------------------------------------
+ if ( $is_my_state_our{$token} ) {
+ $my_keyword = $token;
+
+ # Set '$K_end_my' to be the last $K index of the variables
+ # controlled by this 'my' keyword
+ my $Kn = $self->K_next_code($KK);
+ $K_end_my = $Kn;
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] eq '(' ) {
+ my $seqno_next = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ $K_end_my = $K_closing_container->{$seqno_next};
+ }
- # breaking BEFORE these is just ok:
- @q = qw# >> << #;
- @right_bond_strength{@q} = (STRONG) x scalar(@q);
- @left_bond_strength{@q} = (NOMINAL) x scalar(@q);
+ # Get initial count
+ $my_starting_count = 0;
+ my $K_last_code = $self->K_previous_code($KK);
+ if ( defined($K_last_code) ) {
+ my $last_type = $rLL->[$K_last_code]->[_TYPE_];
- # breaking before the string concatenation operator seems best
- # because it can be hard to see at the end of a line
- $right_bond_strength{'.'} = STRONG;
- $left_bond_strength{'.'} = 0.9 * NOMINAL + 0.1 * WEAK;
+ # A preceding \ implies that this memory can be used
+ # even if the variable name does not appear again.
+ # For example: return \my $string_buf;
+ if ( $last_type eq '\\' ) { $my_starting_count = 1 }
+ }
+ }
- @q = qw< } ] ) R >;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} = (NOMINAL) x scalar(@q);
+ #--------------------------------------------------
+ # look for certain keywords which introduce blocks:
+ # such as 'for my $var (..) { ... }'
+ #--------------------------------------------------
+ elsif ( $is_if_unless_while_until_for_foreach{$token} ) {
+ my ( $seqno_brace, $K_end_iterator ) =
+ $self->block_seqno_of_paren_keyword($KK);
+ if ($seqno_brace) {
+
+ # Found the brace. Mark an iterator as a new lexical
+ # variable in order to catch something like:
+ # my $i;
+ # foreach $i(...) { }
+ # where the iterator $i is not the same as the first
+ # $i, We should be beyond any existing $K_end_my, but
+ # check anyway:
+ if ( $K_end_iterator && $K_end_iterator > $K_end_my ) {
+ $K_end_my = $K_end_iterator;
+ $my_keyword = $token;
+ }
- # make these a little weaker than nominal so that they get
- # favored for end-of-line characters
- @q = qw< != == =~ !~ ~~ !~~ >;
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} =
- ( 0.9 * NOMINAL + 0.1 * WEAK ) x scalar(@q);
+ # Variables created between these keywords and their
+ # opening brace have special scope rules. We will
+ # create a special 'control layer' stack entry for them
+ # here, with the same block sequence number. When the
+ # closing block brace arrives, it will look for a
+ # duplicate stack entry and either close it or,
+ # for if-elsif-else chain, propagate it onward.
+ $push_block_stack->($seqno_brace);
+ }
+ }
+ elsif ( $token eq 'sub' ) {
+ $check_sub_signature->($KK);
+ }
+ else {
+ $rkeyword_count->{$token}++;
+ }
+ }
- # break AFTER these
- @q = qw# < > | & >= <= #;
- @left_bond_strength{@q} = (VERY_STRONG) x scalar(@q);
- @right_bond_strength{@q} =
- ( 0.8 * NOMINAL + 0.2 * WEAK ) x scalar(@q);
+ #--------------
+ # an identifier
+ #--------------
+ elsif ( $type eq 'i' || $type eq 'Z' ) {
- # breaking either before or after a quote is ok
- # but bias for breaking before a quote
- $left_bond_strength{'Q'} = NOMINAL;
- $right_bond_strength{'Q'} = NOMINAL + 0.02;
- $left_bond_strength{'q'} = NOMINAL;
- $right_bond_strength{'q'} = NOMINAL;
+ # Still collecting 'my' vars?
+ if ( $KK <= $K_end_my ) {
+ $checkin_new_lexical->($KK);
+ }
- # starting a line with a keyword is usually ok
- $left_bond_strength{'k'} = NOMINAL;
+ # Not collecting 'my' vars - update counts
+ elsif ( $check_unused || $check_constant ) {
- # we usually want to bond a keyword strongly to what immediately
- # follows, rather than leaving it stranded at the end of a line
- $right_bond_strength{'k'} = STRONG;
+ my $sigil_string = EMPTY_STRING;
+ my $word = EMPTY_STRING;
- $left_bond_strength{'G'} = NOMINAL;
- $right_bond_strength{'G'} = STRONG;
+ # The regex below will match numbers, like '$34x', but that
+ # should not be a problem because it will not match a hash
+ # key.
+ if ( $token =~ /^(\W+)?(\w.*)$/ ) {
+ $sigil_string = $1 if ($1);
+ $word = $2;
- # assignment operators
- @q = qw(
- = **= += *= &= <<= &&=
- -= /= |= >>= ||= //=
- .= %= ^=
- x=
- );
+ if ( $check_constant && $word ) {
- # Default is to break AFTER various assignment operators
- @left_bond_strength{@q} = (STRONG) x scalar(@q);
- @right_bond_strength{@q} =
- ( 0.4 * WEAK + 0.6 * VERY_WEAK ) x scalar(@q);
+ # look for constant invoked like '&ORD' or '->ORD'
+ if ( !$sigil_string || $sigil_string eq '&' ) {
+ $update_constant_count->( $KK, $word );
+ }
+ elsif ( $sigil_string eq '@'
+ && index( $word, 'EXPORT' ) >= 0 )
+ {
+ # Looking for stuff like:
+ # @EXPORT_OK
+ # @ALPHA::BETA::EXPORT
+ my $package = $current_package;
+ my $name = $word;
+ my $pos = rindex( $word, '::' );
+ if ( $pos >= 0 ) {
+ $package = substr( $word, 0, $pos );
+ $name = substr( $word, $pos + 2 );
+ }
+ if ( $name eq 'EXPORT' || $name eq 'EXPORT_OK' )
+ {
+ $push_new_EXPORT->( $KK, $package );
+ }
+ }
+ else { }
+ }
- # Default is to break BEFORE '&&' and '||' and '//'
- # set strength of '||' to same as '=' so that chains like
- # $a = $b || $c || $d will break before the first '||'
- $right_bond_strength{'||'} = NOMINAL;
- $left_bond_strength{'||'} = $right_bond_strength{'='};
+ if ($sigil_string) {
+ my $sigil = substr( $sigil_string, -1, 1 );
+ if ( !$is_valid_sigil{$sigil} ) {
+ $sigil_string = EMPTY_STRING;
+ $word = EMPTY_STRING;
+ }
+ }
+ }
- # same thing for '//'
- $right_bond_strength{'//'} = NOMINAL;
- $left_bond_strength{'//'} = $right_bond_strength{'='};
+ if ( $check_unused
+ && $sigil_string
+ && $word
+ && $word =~ /\w+/ )
+ {
- # set strength of && a little higher than ||
- $right_bond_strength{'&&'} = NOMINAL;
- $left_bond_strength{'&&'} = $left_bond_strength{'||'} + 0.1;
+ my $Kn = $self->K_next_code($KK);
+ my $bracket;
+ if ( defined($Kn) ) {
+ my $next_type = $rLL->[$Kn]->[_TYPE_];
+ if ( $next_type eq '[' || $next_type eq 'L' ) {
+ $bracket = $rLL->[$Kn]->[_TOKEN_];
+ }
+ }
+ $update_use_count->( $sigil_string, $word, $bracket );
+ }
+ }
+ else {
+ # ignore variable if not collecting 'my' or counts
+ }
+ }
- $left_bond_strength{';'} = VERY_STRONG;
- $right_bond_strength{';'} = VERY_WEAK;
- $left_bond_strength{'f'} = VERY_STRONG;
+ #----------------
+ # a sub statement
+ #----------------
+ elsif ( $type eq 'S' ) {
+ $check_sub_signature->($KK);
+ }
- # make right strength of for ';' a little less than '='
- # to make for contents break after the ';' to avoid this:
- # for ( $j = $number_of_fields - 1 ; $j < $item_count ; $j +=
- # $number_of_fields )
- # and make it weaker than ',' and 'and' too
- $right_bond_strength{'f'} = VERY_WEAK - 0.03;
+ #--------------------
+ # a package statement
+ #--------------------
+ elsif ( $type eq 'P' ) {
+ my ( $keyword, $package ) = split /\s+/, $token, 2;
- # The strengths of ?/: should be somewhere between
- # an '=' and a quote (NOMINAL),
- # make strength of ':' slightly less than '?' to help
- # break long chains of ? : after the colons
- $left_bond_strength{':'} = 0.4 * WEAK + 0.6 * NOMINAL;
- $right_bond_strength{':'} = NO_BREAK;
- $left_bond_strength{'?'} = $left_bond_strength{':'} + 0.01;
- $right_bond_strength{'?'} = NO_BREAK;
+ # keyword 'package' may be on a previous line
+ if ( !$package ) { $package = $keyword }
- $left_bond_strength{','} = VERY_STRONG;
- $right_bond_strength{','} = VERY_WEAK;
+ if ( $package ne $current_package ) {
+ $current_package = $package;
- # remaining digraphs and trigraphs not defined above
- @q = qw( :: <> ++ --);
- @left_bond_strength{@q} = (WEAK) x scalar(@q);
- @right_bond_strength{@q} = (STRONG) x scalar(@q);
+ # Look for lexical vars declared in other packages which
+ # will be accessible in this package. We will limit
+ # this check to new package statements at the top level
+ # in order to filter out some common cases.
+ if ( $check_cross_package && @{$rblock_stack} == 1 ) {
+ my $rpackage_warnings = $package_warnings{$package};
+ if ( !defined($rpackage_warnings) ) {
+ $rpackage_warnings = [];
+ $package_warnings{$package} = $rpackage_warnings;
+ }
+ foreach my $item ( @{$rblock_stack} ) {
+ my $rhash = $item->{rvars};
+ foreach my $name ( keys %{$rhash} ) {
+ my $entry = $rhash->{$name};
+ my $pkg = $entry->{package};
+ if ( $pkg ne $package ) {
+ my $lno = $ix_line + 1;
+ my $see_line = $lno;
+ my $note =
+"is accessible in later packages, see line $see_line";
+ push @{$rpackage_warnings},
+ {
+ name => $name,
+ keyword => $entry->{keyword},
+ note => $note,
+ see_line => $see_line,
+ line_number => $entry->{line_index} + 1,
+ letter => 'p',
+ K => $entry->{K},
+ };
+ }
+ }
+ }
+ }
+ }
+ }
- # Set bond strengths of certain keywords
- # make 'or', 'err', 'and' slightly weaker than a ','
- $left_bond_strength{'and'} = VERY_WEAK - 0.01;
- $left_bond_strength{'or'} = VERY_WEAK - 0.02;
- $left_bond_strength{'err'} = VERY_WEAK - 0.02;
- $left_bond_strength{'xor'} = VERY_WEAK - 0.01;
- $right_bond_strength{'and'} = NOMINAL;
- $right_bond_strength{'or'} = NOMINAL;
- $right_bond_strength{'err'} = NOMINAL;
- $right_bond_strength{'xor'} = NOMINAL;
+ #-----------
+ # a here doc
+ #-----------
+ elsif ( $type eq 'h' ) {
- #---------------------------------------------------------------
- # Bond Strength BEGIN Section 2.
- # Set binary rules for bond strengths between certain token types.
- #---------------------------------------------------------------
+ # scan here-doc if it is interpolated
+ if ( $check_unused && is_interpolated_here_doc($token) ) {
+ my $ix_HERE = max( $ix_HERE_END, $ix_line );
- # We have a little problem making tables which apply to the
- # container tokens. Here is a list of container tokens and
- # their types:
- #
- # type tokens // meaning
- # { {, [, ( // indent
- # } }, ], ) // outdent
- # [ [ // left non-structural [ (enclosing an array index)
- # ] ] // right non-structural square bracket
- # ( ( // left non-structural paren
- # ) ) // right non-structural paren
- # L { // left non-structural curly brace (enclosing a key)
- # R } // right non-structural curly brace
- #
- # Some rules apply to token types and some to just the token
- # itself. We solve the problem by combining type and token into a
- # new hash key for the container types.
- #
- # If a rule applies to a token 'type' then we need to make rules
- # for each of these 'type.token' combinations:
- # Type Type.Token
- # { {{, {[, {(
- # [ [[
- # ( ((
- # L L{
- # } }}, }], })
- # ] ]]
- # ) ))
- # R R}
- #
- # If a rule applies to a token then we need to make rules for
- # these 'type.token' combinations:
- # Token Type.Token
- # { {{, L{
- # [ {[, [[
- # ( {(, ((
- # } }}, R}
- # ] }], ]]
- # ) }), ))
+ # collect the here doc text
+ ( $ix_HERE_END, my $here_text ) =
+ $self->get_here_text($ix_HERE);
- # allow long lines before final { in an if statement, as in:
- # if (..........
- # ..........)
- # {
- #
- # Otherwise, the line before the { tends to be too short.
+ # scan the here-doc text
+ $scan_quoted_text->($here_text);
+ }
+ }
- $binary_bond_strength{'))'}{'{{'} = VERY_WEAK + 0.03;
- $binary_bond_strength{'(('}{'{{'} = NOMINAL;
+ #---------------------
+ # a quote of some type
+ #---------------------
+ elsif ( $type eq 'Q' ) {
- # break on something like '} (', but keep this stronger than a ','
- # example is in 'howe.pl'
- $binary_bond_strength{'R}'}{'(('} = 0.8 * VERY_WEAK + 0.2 * WEAK;
- $binary_bond_strength{'}}'}{'(('} = 0.8 * VERY_WEAK + 0.2 * WEAK;
+ # is this an interpolated quote?
+ my $interpolated;
+ if ( $KK == $Kfirst && $line_of_tokens->{_starting_in_quote} ) {
+ $interpolated = $in_interpolated_quote;
+ }
+ else {
- # keep matrix and hash indices together
- # but make them a little below STRONG to allow breaking open
- # something like {'some-word'}{'some-very-long-word'} at the }{
- # (bracebrk.t)
- $binary_bond_strength{']]'}{'[['} = 0.9 * STRONG + 0.1 * NOMINAL;
- $binary_bond_strength{']]'}{'L{'} = 0.9 * STRONG + 0.1 * NOMINAL;
- $binary_bond_strength{'R}'}{'[['} = 0.9 * STRONG + 0.1 * NOMINAL;
- $binary_bond_strength{'R}'}{'L{'} = 0.9 * STRONG + 0.1 * NOMINAL;
+ # is interpolated if it follow a match operator =~ or !~
+ my $K_last_code = $self->K_previous_code($KK);
+ if ( $K_last_code
+ && $is_re_match_op{ $rLL->[$K_last_code]->[_TYPE_] } )
+ {
+ $interpolated = 1;
+ }
- # increase strength to the point where a break in the following
- # will be after the opening paren rather than at the arrow:
- # $a->$b($c);
- $binary_bond_strength{'i'}{'->'} = 1.45 * STRONG;
+ # is not interpolated for leading operators: qw q tr y '
+ elsif ( $token =~ /^(qw | q[^qrx] | tr | [y\'] )/x ) {
+ $interpolated = 0;
+ }
- # Added for c140 to make 'w ->' and 'i ->' behave the same
- $binary_bond_strength{'w'}{'->'} = 1.45 * STRONG;
+ # is interpolated for everything else
+ else {
+ $interpolated = 1;
+ }
+ }
- # Note that the following alternative strength would make the break at the
- # '->' rather than opening the '('. Both have advantages and disadvantages.
- # $binary_bond_strength{'i'}{'->'} = 0.5*STRONG + 0.5 * NOMINAL; #
+ if ($interpolated) {
+ $scan_quoted_text->($token);
+ }
- $binary_bond_strength{'))'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
- $binary_bond_strength{']]'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
- $binary_bond_strength{'})'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
- $binary_bond_strength{'}]'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
- $binary_bond_strength{'}}'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
- $binary_bond_strength{'R}'}{'->'} = 0.1 * STRONG + 0.9 * NOMINAL;
+ if ( $KK == $Klast && $line_of_tokens->{_ending_in_quote} ) {
+ $in_interpolated_quote = $interpolated;
+ }
+ else {
+ $in_interpolated_quote = 0;
+ }
+ }
+ elsif ( $type eq 'w' ) {
+ if ( $token eq 'vars' ) {
+ my $Kp = $self->K_previous_code($KK);
+ if ( defined($Kp)
+ && $rLL->[$Kp]->[_TOKEN_] eq 'use'
+ && $rLL->[$Kp]->[_TYPE_] eq 'k' )
+ {
+ $scan_use_vars->($KK);
+ }
+ }
+ if ($check_constant) {
+ if ( $token eq 'constant' ) {
+ my $Kp = $self->K_previous_code($KK);
+ if ( defined($Kp)
+ && $rLL->[$Kp]->[_TOKEN_] eq 'use'
+ && $rLL->[$Kp]->[_TYPE_] eq 'k' )
+ {
+ $scan_use_constant->($KK);
+ }
+ else {
+ $update_constant_count->($KK);
+ }
+ }
+ else {
+ $update_constant_count->($KK);
+ }
+ }
+ }
+ elsif ( $type eq 'C' ) {
+ if ($check_constant) {
+ $update_constant_count->($KK);
+ }
+ }
+ elsif ( $type eq 'U' ) {
+ if ($check_constant) {
+ $update_constant_count->($KK);
+ }
+ }
+ else {
+ # skip all other token types
+ }
+ }
+ }
- $binary_bond_strength{'))'}{'[['} = 0.2 * STRONG + 0.8 * NOMINAL;
- $binary_bond_strength{'})'}{'[['} = 0.2 * STRONG + 0.8 * NOMINAL;
- $binary_bond_strength{'))'}{'{['} = 0.2 * STRONG + 0.8 * NOMINAL;
- $binary_bond_strength{'})'}{'{['} = 0.2 * STRONG + 0.8 * NOMINAL;
+ #----------
+ # Finish up
+ #----------
- #---------------------------------------------------------------
- # Binary NO_BREAK rules
- #---------------------------------------------------------------
+ # skip final 'c' and 'u' output if this appears to be a snippet
+ my $is_possible_snippet = $roption->{is_possible_snippet};
+ my $more_u_checks =
+ $check_unused
+ && @{$rblock_stack} == 1
+ && keys %{ $rblock_stack->[0]->{rvars} };
+ my $more_c_checks = $check_constant && keys %{$rconstant_hash};
- # use strict requires that bare word and => not be separated
- $binary_bond_strength{'C'}{'=>'} = NO_BREAK;
- $binary_bond_strength{'U'}{'=>'} = NO_BREAK;
+ if ( $is_possible_snippet
+ && ( $more_u_checks || $more_c_checks ) )
+ {
- # Never break between a bareword and a following paren because
- # perl may give an error. For example, if a break is placed
- # between 'to_filehandle' and its '(' the following line will
- # give a syntax error [Carp.pm]: my( $no) =fileno(
- # to_filehandle( $in)) ;
- $binary_bond_strength{'C'}{'(('} = NO_BREAK;
- $binary_bond_strength{'C'}{'{('} = NO_BREAK;
- $binary_bond_strength{'U'}{'(('} = NO_BREAK;
- $binary_bond_strength{'U'}{'{('} = NO_BREAK;
+ # the flag $is_possible_snippet = 0:No 1:Uncertain 2:Yes
+ if ( $is_possible_snippet == 1
+ && $self->is_complete_script( $rline_type_count, $rkeyword_count ) )
+ {
+ # not a snippet
+ }
- # use strict requires that bare word within braces not start new
- # line
- $binary_bond_strength{'L{'}{'w'} = NO_BREAK;
+ # is possible snippet: deactivate 'c' and 'u
+ else {
+ $check_unused = 0;
+ $check_constant = 0;
+ }
+ }
- $binary_bond_strength{'w'}{'R}'} = NO_BREAK;
+ if ( @{$rblock_stack} != 1 ) {
- # The following two rules prevent a syntax error caused by breaking up
- # a construction like '{-y}'. The '-' quotes the 'y' and prevents
- # it from being taken as a transliteration. We have to keep
- # token types 'L m w' together to prevent this error.
- $binary_bond_strength{'L{'}{'m'} = NO_BREAK;
- $binary_bond_strength_nospace{'m'}{'w'} = NO_BREAK;
+ # shouldn't happen for a balanced input file
+ DEVEL_MODE && Fault("stack error at end of scan\n");
+ }
+ else {
+ if ($check_unused) {
+ foreach my $item ( @{$rblock_stack} ) {
+ my $rhash = $item->{rvars};
+ $check_for_unused_names->($rhash);
+ }
+ }
+ }
- # keep 'bareword-' together, but only if there is no space between
- # the word and dash. Do not keep together if there is a space.
- # example 'use perl6-alpha'
- $binary_bond_strength_nospace{'w'}{'m'} = NO_BREAK;
+ if ($check_constant) {
+ my @warnings_c;
+ my %packages_with_warnings;
+ foreach my $package ( keys %{$rconstant_hash} ) {
+ my $rhash = $rconstant_hash->{$package};
+ next if ( !defined($rhash) );
+ foreach my $name ( keys %{$rhash} ) {
+ my $entry = $rconstant_hash->{$package}->{$name};
+ next if ( $entry->{count} );
+ push @warnings_c,
+ {
+ name => $name,
+ keyword => 'use constant',
+ see_line => EMPTY_STRING,
+ note => "appears unused in package $package",
+ line_number => $entry->{line_index} + 1,
+ letter => 'c',
+ package => $package,
+ K => $entry->{K},
+ };
+ $packages_with_warnings{$package} = 1;
+ }
+ }
- # use strict requires that bare word and => not be separated
- $binary_bond_strength{'w'}{'=>'} = NO_BREAK;
+ # filter out constants found in @EXPORT and @EXPORT_OK
+ if (@warnings_c) {
- # use strict does not allow separating type info from trailing { }
- # testfile is readmail.pl
- $binary_bond_strength{'t'}{'L{'} = NO_BREAK;
- $binary_bond_strength{'i'}{'L{'} = NO_BREAK;
+ # expand relevant EXPORT lists
+ my $rEXPORT_words_by_package = {};
+ foreach my $package ( keys %packages_with_warnings ) {
+ my $rKlist = $rEXPORT_hash->{$package};
+ next unless ($rKlist);
+ $rEXPORT_words_by_package->{$package} = {};
+ foreach my $KK ( @{$rKlist} ) {
+ $self->expand_EXPORT_list( $KK,
+ $rEXPORT_words_by_package->{$package} );
+ }
+ }
- # As a defensive measure, do not break between a '(' and a
- # filehandle. In some cases, this can cause an error. For
- # example, the following program works:
- # my $msg="hi!\n";
- # print
- # ( STDOUT
- # $msg
- # );
- #
- # But this program fails:
- # my $msg="hi!\n";
- # print
- # (
- # STDOUT
- # $msg
- # );
- #
- # This is normally only a problem with the 'extrude' option
- $binary_bond_strength{'(('}{'Y'} = NO_BREAK;
- $binary_bond_strength{'{('}{'Y'} = NO_BREAK;
+ # remove warnings in EXPORT lists
+ foreach my $rwarning (@warnings_c) {
+ my $package = $rwarning->{package};
+ my $name = $rwarning->{name};
+ my $rhash = $rEXPORT_words_by_package->{$package};
+ next if ( $rhash && $rhash->{$name} );
+ push @warnings, $rwarning;
+ }
+ }
+ }
- # never break between sub name and opening paren
- $binary_bond_strength{'w'}{'(('} = NO_BREAK;
- $binary_bond_strength{'w'}{'{('} = NO_BREAK;
+ # Merge package issues...
+ # Only include cross-package warnings for packages which created subs.
+ # This will limit this type of warning to significant package changes.
+ my @p_warnings;
+ foreach my $key ( keys %package_warnings ) {
+ next if ( !$sub_count_by_package{$key} );
+ push @p_warnings, @{ $package_warnings{$key} };
+ }
- # keep '}' together with ';'
- $binary_bond_strength{'}}'}{';'} = NO_BREAK;
+ # Remove duplicate package warnings for the same initial line, which can
+ # happen if there were multiple packages.
+ if (@p_warnings) {
+ my %seen;
- # Breaking before a ++ can cause perl to guess wrong. For
- # example the following line will cause a syntax error
- # with -extrude if we break between '$i' and '++' [fixstyle2]
- # print( ( $i++ & 1 ) ? $_ : ( $change{$_} || $_ ) );
- $nobreak_lhs{'++'} = NO_BREAK;
+ # sort on package warning line order
+ @p_warnings = sort { $a->{see_line} <=> $b->{see_line} } @p_warnings;
- # Do not break before a possible file handle
- $nobreak_lhs{'Z'} = NO_BREAK;
+ # use first package warning for a given variable
+ foreach my $item (@p_warnings) {
+ my $key = $item->{line_number} . ':' . $item->{name};
+ next if ( $seen{$key}++ );
+ push @warnings, $item;
+ }
+ }
- # use strict hates bare words on any new line. For
- # example, a break before the underscore here provokes the
- # wrath of use strict:
- # if ( -r $fn && ( -s _ || $AllowZeroFilesize)) {
- $nobreak_rhs{'F'} = NO_BREAK;
- $nobreak_rhs{'CORE::'} = NO_BREAK;
+ if (@warnings) {
- # To prevent the tokenizer from switching between types 'w' and 'G' we
- # need to avoid breaking between type 'G' and the following code block
- # brace. Fixes case b929.
- $nobreak_rhs{G} = NO_BREAK;
+ # filter out certain common 'our' variables from all warnings
+ # because they are common and difficult to fix, and
+ # sort on token index and issue type
- #---------------------------------------------------------------
- # Bond Strength BEGIN Section 3.
- # Define tables and values for applying a small bias to the above
- # values.
- #---------------------------------------------------------------
- # Adding a small 'bias' to strengths is a simple way to make a line
- # break at the first of a sequence of identical terms. For
- # example, to force long string of conditional operators to break
- # with each line ending in a ':', we can add a small number to the
- # bond strength of each ':' (colon.t)
- @bias_tokens = qw( : && || f and or . ); # tokens which get bias
- %bias_hash = map { $_ => 0 } @bias_tokens;
- $delta_bias = 0.0001; # a very small strength level
- return;
+ my %is_exempted_global_name;
+ my @q = qw( $VERSION @EXPORT @EXPORT_OK %EXPORT_TAGS @ISA $AUTOLOAD );
+ @is_exempted_global_name{@q} = (1) x scalar(@q);
- } ## end sub initialize_bond_strength_hashes
+ @warnings =
+ sort { $a->{K} <=> $b->{K} || $a->{letter} cmp $b->{letter} }
- use constant DEBUG_BOND => 0;
+ # FIXME: this limitation may eventually just be for 'our' vars
+ # after 'use vars' coding is finalized
+ grep {
+ ( $_->{keyword} ne 'our' && $_->{keyword} ne 'use vars' )
+ || !$is_exempted_global_name{ $_->{name} }
+ } @warnings;
+ }
- sub set_bond_strengths {
+ return ( \@warnings, $issue_type_string );
+} ## end sub scan_variable_usage
- my ($self) = @_;
+sub dump_unusual_variables {
+ my ($self) = @_;
- #-----------------------------------------------------------------
- # Define a 'bond strength' for each token pair in an output batch.
- # See comments above for definition of bond strength.
- #-----------------------------------------------------------------
+ # process a --dump-unusual-variables(-duv) command
- my $rbond_strength_to_go = [];
+ my ( $rlines, $issue_type_string ) = $self->scan_variable_usage();
+ return unless ( $rlines && @{$rlines} );
- my $rLL = $self->[_rLL_];
- my $rK_weld_right = $self->[_rK_weld_right_];
- my $rK_weld_left = $self->[_rK_weld_left_];
- my $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
+ my $input_stream_name = get_input_stream_name();
- # patch-its always ok to break at end of line
- $nobreak_to_go[$max_index_to_go] = 0;
+ # output for multiple types
+ my $output_string = <<EOM;
+$input_stream_name: output for --dump-unusual-variables
+$issue_type_string
+Line:Issue: Var: note
+EOM
+ foreach my $item ( @{$rlines} ) {
+ my $lno = $item->{line_number};
+ my $letter = $item->{letter};
+ my $keyword = $item->{keyword};
+ my $name = $item->{name};
+ my $note = $item->{note};
+ if ($note) { $note = ": $note" }
+ $output_string .= "$lno:$letter: $keyword $name$note\n";
+ }
+ print {*STDOUT} $output_string;
- # we start a new set of bias values for each line
- %bias = %bias_hash;
+ return;
+} ## end sub dump_unusual_variables
- my $code_bias = -.01; # bias for closing block braces
+sub initialize_warn_hash {
+ my ( $long_name, $default, $rall_opts ) = @_;
- my $type = 'b';
- my $token = SPACE;
- my $token_length = 1;
- my $last_type;
- my $last_nonblank_type = $type;
- my $last_nonblank_token = $token;
- my $list_str = $left_bond_strength{'?'};
+ # Given:
+ # $long_name = full option name
+ # $default = default value
+ # $rall_opts = all possible options
+ # Return the corresponding option hash
- my ( $bond_str_1, $bond_str_2, $bond_str_3, $bond_str_4 );
+ # Example of all possible options for --warn-variable-types=s
+ # r - reused scope
+ # s - reused sigil
+ # p - package boundaries crossed by lexical variables
+ # u - unused lexical variable defined by my, state, our
+ # c - unused constant defined by use constant
- my ( $block_type, $i_next, $i_next_nonblank, $next_nonblank_token,
- $next_nonblank_type, $next_token, $next_type,
- $total_nesting_depth, );
+ # Other warn options use different letters
- # main loop to compute bond strengths between each pair of tokens
- foreach my $i ( 0 .. $max_index_to_go ) {
- $last_type = $type;
- if ( $type ne 'b' ) {
- $last_nonblank_type = $type;
- $last_nonblank_token = $token;
- }
- $type = $types_to_go[$i];
+ # Other controls:
+ # 0 - none of the above
+ # 1 - all of the above
+ # * - all of the above
- # strength on both sides of a blank is the same
- if ( $type eq 'b' && $last_type ne 'b' ) {
- $rbond_strength_to_go->[$i] = $rbond_strength_to_go->[ $i - 1 ];
- $nobreak_to_go[$i] ||= $nobreak_to_go[ $i - 1 ]; # fix for b1257
- next;
- }
+ # Example:
+ # -wvt='s r' : do check types 's' and 'r'
- $token = $tokens_to_go[$i];
- $token_length = $token_lengths_to_go[$i];
- $block_type = $block_type_to_go[$i];
- $i_next = $i + 1;
- $next_type = $types_to_go[$i_next];
- $next_token = $tokens_to_go[$i_next];
- $total_nesting_depth = $nesting_depth_to_go[$i_next];
- $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
- $next_nonblank_type = $types_to_go[$i_next_nonblank];
- $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
+ my $rwarn_hash = {};
- my $seqno = $type_sequence_to_go[$i];
- my $next_nonblank_seqno = $type_sequence_to_go[$i_next_nonblank];
+ if ( !$rall_opts || !@{$rall_opts} ) {
+ Fault("all_options is empty for call with option $long_name\n");
+ return $rwarn_hash;
+ }
- # We are computing the strength of the bond between the current
- # token and the NEXT token.
+ my $user_option_string = $rOpts->{$long_name};
+ if ( !defined($user_option_string) ) { $user_option_string = $default }
+ return $rwarn_hash unless ($user_option_string);
- #---------------------------------------------------------------
- # Bond Strength Section 1:
- # First Approximation.
- # Use minimum of individual left and right tabulated bond
- # strengths.
- #---------------------------------------------------------------
- my $bsr = $right_bond_strength{$type};
- my $bsl = $left_bond_strength{$next_nonblank_type};
+ my %is_valid_option;
+ @is_valid_option{ @{$rall_opts} } = (1) x scalar( @{$rall_opts} );
- # define right bond strengths of certain keywords
- if ( $type eq 'k' && defined( $right_bond_strength{$token} ) ) {
- $bsr = $right_bond_strength{$token};
- }
- elsif ( $token eq 'ne' or $token eq 'eq' ) {
- $bsr = NOMINAL;
- }
+ # allow comma separators
+ $user_option_string =~ s/,/ /g;
- # set terminal bond strength to the nominal value
- # this will cause good preceding breaks to be retained
- if ( $i_next_nonblank > $max_index_to_go ) {
- $bsl = NOMINAL;
+ my @opts = split_words($user_option_string);
+ return $rwarn_hash unless (@opts);
- # But weaken the bond at a 'missing terminal comma'. If an
- # optional comma is missing at the end of a broken list, use
- # the strength of a comma anyway to make formatting the same as
- # if it were there. Fixes issue c133.
- if ( !defined($bsr) || $bsr > VERY_WEAK ) {
- my $seqno_px = $parent_seqno_to_go[$max_index_to_go];
- if ( $ris_list_by_seqno->{$seqno_px} ) {
- my $KK = $K_to_go[$max_index_to_go];
- my $Kn = $self->K_next_nonblank($KK);
- my $seqno_n = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
- if ( $seqno_n && $seqno_n eq $seqno_px ) {
- $bsl = VERY_WEAK;
- }
- }
- }
- }
+ # check a single item
+ if ( @opts == 1 ) {
+ my $opt = $opts[0];
- # define right bond strengths of certain keywords
- if ( $next_nonblank_type eq 'k'
- && defined( $left_bond_strength{$next_nonblank_token} ) )
- {
- $bsl = $left_bond_strength{$next_nonblank_token};
- }
- elsif ($next_nonblank_token eq 'ne'
- or $next_nonblank_token eq 'eq' )
- {
- $bsl = NOMINAL;
- }
- elsif ( $is_lt_gt_le_ge{$next_nonblank_token} ) {
- $bsl = 0.9 * NOMINAL + 0.1 * STRONG;
- }
+ # Split a single option of bundled letters like 'rsp' into 'r s p'
+ # but give a warning because this may not be allowed in the future
+ if ( length($opt) > 1 ) {
+ @opts = split //, $opt;
+ Warn("Please use space-separated letters in --$long_name\n");
+ }
+ elsif ( $opt eq '*' || $opt eq '1' ) {
+ @opts = keys %is_valid_option;
+ }
+ elsif ( $opt eq '0' ) {
+ return $rwarn_hash;
+ }
+ else {
+ # should be one of the allowed letters - catch any error below
+ }
+ }
- # Use the minimum of the left and right strengths. Note: it might
- # seem that we would want to keep a NO_BREAK if either token has
- # this value. This didn't work, for example because in an arrow
- # list, it prevents the comma from separating from the following
- # bare word (which is probably quoted by its arrow). So necessary
- # NO_BREAK's have to be handled as special cases in the final
- # section.
- if ( !defined($bsr) ) { $bsr = VERY_STRONG }
- if ( !defined($bsl) ) { $bsl = VERY_STRONG }
- my $bond_str = ( $bsr < $bsl ) ? $bsr : $bsl;
- $bond_str_1 = $bond_str if (DEBUG_BOND);
+ my $msg = EMPTY_STRING;
+ foreach my $opt (@opts) {
+ if ( $is_valid_option{$opt} ) {
+ $rwarn_hash->{$opt} = 1;
+ next;
+ }
- #---------------------------------------------------------------
- # Bond Strength Section 2:
- # Apply hardwired rules..
- #---------------------------------------------------------------
+ # invalid option..
+ if ( $opt =~ /^[01\*]$/ ) {
+ $msg .=
+ "--$long_name cannot contain $opt mixed with other options\n";
+ }
+ else {
+ $msg .= "--$long_name has unexpected symbol: '$opt'\n";
+ }
+ }
+ if ($msg) { Die($msg) }
+ return $rwarn_hash;
+} ## end sub initialize_warn_hash
- # Patch to put terminal or clauses on a new line: Weaken the bond
- # at an || followed by die or similar keyword to make the terminal
- # or clause fall on a new line, like this:
- #
- # my $class = shift
- # || die "Cannot add broadcast: No class identifier found";
- #
- # Otherwise the break will be at the previous '=' since the || and
- # = have the same starting strength and the or is biased, like
- # this:
- #
- # my $class =
- # shift || die "Cannot add broadcast: No class identifier found";
- #
- # In any case if the user places a break at either the = or the ||
- # it should remain there.
- if ( $type eq '||' || $type eq 'k' && $token eq 'or' ) {
+sub make_excluded_name_hash {
+ my ($option_name) = @_;
- # /^(die|confess|croak|warn)$/
- if ( $is_die_confess_croak_warn{$next_nonblank_token} ) {
- if ( $want_break_before{$token} && $i > 0 ) {
- $rbond_strength_to_go->[ $i - 1 ] -= $delta_bias;
+ # Convert a list of words into a hash ref for an input option
+ # Given:
+ # $option_name = the name of an input option
+ # example: 'warn-variable-exclusion-list'
+ my $rexcluded_name_hash = {};
+ my $excluded_names = $rOpts->{$option_name};
+ if ($excluded_names) {
+ $excluded_names =~ s/,/ /g;
+ my @xl = split_words($excluded_names);
+ my $err_msg = EMPTY_STRING;
+ foreach my $name (@xl) {
+ if ( $name =~ /^([\$\@\%\*])?(\w+)?(\*)?$/ ) {
+ my $left_star = $1;
+ my $key = $2;
+ my $right_star = $3;
+ if ( defined($left_star) ) {
+ if ( $left_star ne '*' ) {
+ if ( defined($key) ) {
+
+ # append sigil to the bareword
+ $key = $left_star . $key;
+ }
+ else {
- # keep bond strength of a token and its following blank
- # the same
- if ( $types_to_go[ $i - 1 ] eq 'b' && $i > 2 ) {
- $rbond_strength_to_go->[ $i - 2 ] -= $delta_bias;
+ # word not given: '$*' is ok but just '$' is not
+ if ($right_star) { $key = $left_star }
}
+ $left_star = EMPTY_STRING;
}
- else {
- $bond_str -= $delta_bias;
- }
}
+
+ # Wildcard matching codes:
+ # 1 = no stars
+ # 2 = left star only
+ # 3 = right star only
+ # 4 = both left and right stars
+ my $code = 1;
+ $code += 1 if ($left_star);
+ $code += 2 if ($right_star);
+ if ( !defined($key) ) {
+ $err_msg .= "--$option_name has unexpected name: '$name'\n";
+ }
+ else {
+ $rexcluded_name_hash->{$key} = $code;
+ }
+ }
+ else {
+ $err_msg .= "--$option_name has unexpected name: '$name'\n";
}
+ }
+ if ($err_msg) { Die($err_msg) }
+ }
+ return $rexcluded_name_hash;
+} ## end sub make_excluded_name_hash
- # good to break after end of code blocks
- if ( $type eq '}' && $block_type && $next_nonblank_type ne ';' ) {
+sub wildcard_match {
- $bond_str = 0.5 * WEAK + 0.5 * VERY_WEAK + $code_bias;
- $code_bias += $delta_bias;
- }
+ my ( $name, $rwildcard_match_list ) = @_;
- if ( $type eq 'k' ) {
+ # Given:
+ # $name = a string to test for a match
+ # $rwildcard_match_list = a list of [key,code] pairs:
+ # key = a string to match
+ # code = 2, 3, or 4 is match type (see comments below)
+ # Return:
+ # true for a match
+ # false for no match
- # allow certain control keywords to stand out
- if ( $next_nonblank_type eq 'k'
- && $is_last_next_redo_return{$token} )
- {
- $bond_str = 0.45 * WEAK + 0.55 * VERY_WEAK;
- }
+ # For example, key='$pack' with code=3 is short for '$pack*'
+ # which will match '$package', '$packer', etc
- # Don't break after keyword my. This is a quick fix for a
- # rare problem with perl. An example is this line from file
- # Container.pm:
+ # Loop over all possible matches
+ foreach ( @{$rwildcard_match_list} ) {
+ my ( $key, $code ) = @{$_};
+ my $len_key = length($key);
+ my $len_name = length($name);
+ next if ( $len_name < $len_key );
- # foreach my $question( Debian::DebConf::ConfigDb::gettree(
- # $this->{'question'} ) )
+ # code 2 = left star only
+ if ( $code == 2 ) {
+ if ( substr( $name, -$len_key, $len_key ) eq $key ) { return 1 }
+ }
- if ( $token eq 'my' ) {
- $bond_str = NO_BREAK;
- }
+ # code 3 = right star only
+ elsif ( $code == 3 ) {
+ if ( substr( $name, 0, $len_key ) eq $key ) { return 1 }
+ }
- }
+ # code 4 = both left and right stars
+ elsif ( $code == 4 ) {
+ if ( index( $name, $key, 0 ) >= 0 ) { return 1 }
+ }
+ else {
+ DEVEL_MODE && Fault("unexpected code '$code' for '$name'\n");
+ }
+ }
+ return;
+} ## end sub wildcard_match
- if ( $next_nonblank_type eq 'k' && $type ne 'CORE::' ) {
+sub initialize_warn_variable_types {
- if ( $is_keyword_returning_list{$next_nonblank_token} ) {
- $bond_str = $list_str if ( $bond_str > $list_str );
- }
+ my ( $wvt_in_args, $num_files, $line_range_clipped ) = @_;
- # keywords like 'unless', 'if', etc, within statements
- # make good breaks
- if ( $is_good_keyword_breakpoint{$next_nonblank_token} ) {
- $bond_str = VERY_WEAK / 1.05;
- }
- }
+ # Initialization for:
+ # --warn-variable-types=s and
+ # --warn-variable-exclusion-list=s
+ # Given:
+ # $wvt_in_args = true if the -wvt parameter was on the command line
+ # $num_files = number of files on the command line
+ # $line_range_clipped = true if only part of a file is being formatted
- # try not to break before a comma-arrow
- elsif ( $next_nonblank_type eq '=>' ) {
- if ( $bond_str < STRONG ) { $bond_str = STRONG }
- }
+ my @all_opts = qw( r s p u c );
+ $rwarn_variable_types =
+ initialize_warn_hash( 'warn-variable-types', 0, \@all_opts );
- #---------------------------------------------------------------
- # Additional hardwired NOBREAK rules
- #---------------------------------------------------------------
+ # Check for issues 'u' or 'c' cannot be fully made if we are working
+ # on a partial file (snippet), so we save info about that.
+ if ( $rwarn_variable_types->{u} || $rwarn_variable_types->{c} ) {
- # map1.t -- correct for a quirk in perl
- if ( $token eq '('
- && $next_nonblank_type eq 'i'
- && $last_nonblank_type eq 'k'
- && $is_sort_map_grep{$last_nonblank_token} )
+ # Three value switch: 0=NO, 1=MAYBE 2=DEFINITELY
+ my $is_possible_snippet = 1;
- # /^(sort|map|grep)$/ )
- {
- $bond_str = NO_BREAK;
- }
+ # assume snippet if incomplete line range is being formatted
+ if ($line_range_clipped) {
+ $is_possible_snippet = 2;
+ }
- # extrude.t: do not break before paren at:
- # -l pid_filename(
- if ( $last_nonblank_type eq 'F' && $next_nonblank_token eq '(' ) {
- $bond_str = NO_BREAK;
+ # assume complete script if operating on multiple files or if
+ # operating on one file and -wvt came in on the command line
+ if ( $is_possible_snippet == 1 && $num_files ) {
+ if ( $num_files > 1 || $wvt_in_args && $num_files ) {
+ $is_possible_snippet = 0;
}
+ }
- # OLD COMMENT: In older version of perl, use strict can cause
- # problems with breaks before bare words following opening parens.
- # For example, this will fail under older versions if a break is
- # made between '(' and 'MAIL':
+ $rwarn_variable_types->{is_possible_snippet} = $is_possible_snippet;
+ }
- # use strict; open( MAIL, "a long filename or command"); close MAIL;
+ $ris_warn_variable_excluded_name =
+ make_excluded_name_hash('warn-variable-exclusion-list');
+ return;
+} ## end sub initialize_warn_variable_types
- # NEW COMMENT: Third fix for b1213:
- # This option does not seem to be needed any longer, and it can
- # cause instabilities. It can be turned off, but to minimize
- # changes to existing formatting it is retained only in the case
- # where the previous token was 'open' and there was no line break.
- # Even this could eventually be removed if it causes instability.
- if ( $type eq '{' ) {
+sub filter_excluded_names {
- if ( $token eq '('
- && $next_nonblank_type eq 'w'
- && $last_nonblank_type eq 'k'
- && $last_nonblank_token eq 'open'
- && !$old_breakpoint_to_go[$i] )
- {
- $bond_str = NO_BREAK;
- }
- }
+ my ( $rwarnings, $rexcluded_name_hash ) = @_;
- # Do not break between a possible filehandle and a ? or / and do
- # not introduce a break after it if there is no blank
- # (extrude.t)
- elsif ( $type eq 'Z' ) {
+ # Remove warnings for variable names excluded by user request
+ # for an operation like --warn-variable-types
- # don't break..
- if (
+ # Given:
+ # $rwarnigns = ref to list of warning info hashes
+ # $rexcluded_name_hash = ref to hash with excluded names
+ # Return updated $rwarnings with excluded names removed
+ if ( @{$rwarnings} && $rexcluded_name_hash ) {
- # if there is no blank and we do not want one. Examples:
- # print $x++ # do not break after $x
- # print HTML"HELLO" # break ok after HTML
- (
- $next_type ne 'b'
- && defined( $want_left_space{$next_type} )
- && $want_left_space{$next_type} == WS_NO
- )
+ # Check for exact matches
+ $rwarnings =
+ [ grep { !$rexcluded_name_hash->{ $_->{name} } } @{$rwarnings} ];
- # or we might be followed by the start of a quote,
- # and this is not an existing breakpoint; fixes c039.
- || !$old_breakpoint_to_go[$i]
- && substr( $next_nonblank_token, 0, 1 ) eq '/'
+ # See if there are any wildcard names
+ my @excluded_wildcards;
+ foreach my $key ( keys %{$rexcluded_name_hash} ) {
+ my $code = $rexcluded_name_hash->{$key};
+ if ( $code != 1 ) {
+ push @excluded_wildcards, [ $key, $code ];
+ }
+ }
- )
- {
- $bond_str = NO_BREAK;
+ if (@excluded_wildcards) {
+ my @tmp;
+ foreach my $item ( @{$rwarnings} ) {
+ my $name = $item->{name};
+ if ( wildcard_match( $name, \@excluded_wildcards ) ) {
+ next;
}
+ push @tmp, $item;
}
+ $rwarnings = \@tmp;
+ }
+ }
+ return $rwarnings;
+} ## end sub filter_excluded_names
- # Breaking before a ? before a quote can cause trouble if
- # they are not separated by a blank.
- # Example: a syntax error occurs if you break before the ? here
- # my$logic=join$all?' && ':' || ',@regexps;
- # From: Professional_Perl_Programming_Code/multifind.pl
- if ( $next_nonblank_type eq '?' ) {
- $bond_str = NO_BREAK
- if ( $types_to_go[ $i_next_nonblank + 1 ] eq 'Q' );
- }
+sub warn_variable_types {
+ my ($self) = @_;
- # Breaking before a . followed by a number
- # can cause trouble if there is no intervening space
- # Example: a syntax error occurs if you break before the .2 here
- # $str .= pack($endian.2, ensurrogate($ord));
- # From: perl58/Unicode.pm
- elsif ( $next_nonblank_type eq '.' ) {
- $bond_str = NO_BREAK
- if ( $types_to_go[ $i_next_nonblank + 1 ] eq 'n' );
- }
+ # process a --warn-variable-types command
- # Fix for c039
- elsif ( $type eq 'w' ) {
- $bond_str = NO_BREAK
- if ( !$old_breakpoint_to_go[$i]
- && substr( $next_nonblank_token, 0, 1 ) eq '/'
- && $next_nonblank_type ne '//' );
- }
+ my $wv_key = 'warn-variable-types';
+ my $wv_option = $rOpts->{$wv_key};
+ return unless ( %{$rwarn_variable_types} );
- $bond_str_2 = $bond_str if (DEBUG_BOND);
+ my ( $rwarnings, $issue_type_string ) =
+ $self->scan_variable_usage($rwarn_variable_types);
+ return unless ( $rwarnings && @{$rwarnings} );
- #---------------------------------------------------------------
- # End of hardwired rules
- #---------------------------------------------------------------
+ $rwarnings =
+ filter_excluded_names( $rwarnings, $ris_warn_variable_excluded_name );
- #---------------------------------------------------------------
- # Bond Strength Section 3:
- # Apply table rules. These have priority over the above
- # hardwired rules.
- #---------------------------------------------------------------
+ # loop to form error messages
+ my $message_middle = EMPTY_STRING;
+ foreach my $item ( @{$rwarnings} ) {
+ my $name = $item->{name};
+ my $lno = $item->{line_number};
+ my $letter = $item->{letter};
+ my $keyword = $item->{keyword};
+ my $note = $item->{note};
+ if ($note) { $note = ": $note" }
+ $message_middle .= "$lno:$letter: $keyword $name$note\n";
+ }
- my $tabulated_bond_str;
- my $ltype = $type;
- my $rtype = $next_nonblank_type;
- if ( $seqno && $is_container_token{$token} ) {
- $ltype = $type . $token;
- }
+ if ($message_middle) {
+ my $message = "Begin scan for --$wv_key=$wv_option\n";
+ $message .= <<EOM;
+$issue_type_string
+Line:Issue: Var: note
+EOM
+ $message .= $message_middle;
+ $message .= "End scan for --$wv_key=$wv_option:\n";
+ warning($message);
+ }
+ return;
+} ## end sub warn_variable_types
- if ( $next_nonblank_seqno
- && $is_container_token{$next_nonblank_token} )
- {
- $rtype = $next_nonblank_type . $next_nonblank_token;
+sub block_seqno_of_paren_seqno {
- # Alternate Fix #1 for issue b1299. This version makes the
- # decision as soon as possible. See Alternate Fix #2 also.
- # Do not separate a bareword identifier from its paren: b1299
- # This is currently needed for stability because if the bareword
- # gets separated from a preceding '->' and following '(' then
- # the tokenizer may switch from type 'i' to type 'w'. This
- # patch will prevent this by keeping it adjacent to its '('.
-## if ( $next_nonblank_token eq '('
-## && $ltype eq 'i'
-## && substr( $token, 0, 1 ) =~ /^\w$/ )
-## {
-## $ltype = 'w';
-## }
- }
+ my ( $self, $seqno_paren ) = @_;
- # apply binary rules which apply regardless of space between tokens
- if ( $binary_bond_strength{$ltype}{$rtype} ) {
- $bond_str = $binary_bond_strength{$ltype}{$rtype};
- $tabulated_bond_str = $bond_str;
- }
+ # Find brace at '){' after paren of keyword such as for, foreach, ...
+ # SEE ALSO: sub block_seqno_of_paren_keyword
- # apply binary rules which apply only if no space between tokens
- if ( $binary_bond_strength_nospace{$ltype}{$next_type} ) {
- $bond_str = $binary_bond_strength{$ltype}{$next_type};
- $tabulated_bond_str = $bond_str;
- }
+ # Given:
+ # $seqno_paren = sequence number of the paren following a keyword which
+ # may either introduce a block or be a trailing statement modifier,
+ # such as 'if',
+ # Return:
+ # - the sequence number of the block, if any, or
+ # - nothing
+
+ # if (...) { ...
+ # ^ ^ ^
+ # | | |
+ # | | K_opening_brace => return sequno of this brace
+ # | K_closing_paren
+ # $seqno_paren = seqno of this paren pair
+
+ return unless $seqno_paren;
+ my $K_closing_paren = $self->[_K_closing_container_]->{$seqno_paren};
+ return unless ($K_closing_paren);
+ my $K_opening_brace = $self->K_next_code($K_closing_paren);
+ return unless ($K_opening_brace);
+ my $rLL = $self->[_rLL_];
+ my $seqno_block = $rLL->[$K_opening_brace]->[_TYPE_SEQUENCE_];
+ return
+ unless ( $seqno_block
+ && $rLL->[$K_opening_brace]->[_TOKEN_] eq '{'
+ && $self->[_rblock_type_of_seqno_]->{$seqno_block} );
+ return $seqno_block;
+} ## end sub block_seqno_of_paren_seqno
- if ( $nobreak_rhs{$ltype} || $nobreak_lhs{$rtype} ) {
- $bond_str = NO_BREAK;
- $tabulated_bond_str = $bond_str;
- }
+sub dump_mixed_call_parens {
+ my ($self) = @_;
- $bond_str_3 = $bond_str if (DEBUG_BOND);
+ # Implent --dump-mixed-call-parens
- # If the hardwired rules conflict with the tabulated bond
- # strength then there is an inconsistency that should be fixed
- DEBUG_BOND
- && $tabulated_bond_str
- && $bond_str_1
- && $bond_str_1 != $bond_str_2
- && $bond_str_2 != $tabulated_bond_str
- && do {
- print STDERR
-"BOND_TABLES: ltype=$ltype rtype=$rtype $bond_str_1->$bond_str_2->$bond_str_3\n";
- };
+ my $opt_name = 'dump-mixed-call-parens';
+ return unless $rOpts->{$opt_name};
- #-----------------------------------------------------------------
- # Bond Strength Section 4:
- # Modify strengths of certain tokens which often occur in sequence
- # by adding a small bias to each one in turn so that the breaks
- # occur from left to right.
- #
- # Note that we only changing strengths by small amounts here,
- # and usually increasing, so we should not be altering any NO_BREAKs.
- # Other routines which check for NO_BREAKs will use a tolerance
- # of one to avoid any problem.
- #-----------------------------------------------------------------
+ my $rLL = $self->[_rLL_];
- # The bias tables use special keys:
- # $type - if not keyword
- # $token - if keyword, but map some keywords together
- my $left_key =
- $type eq 'k' ? $token eq 'err' ? 'or' : $token : $type;
- my $right_key =
- $next_nonblank_type eq 'k'
- ? $next_nonblank_token eq 'err'
- ? 'or'
- : $next_nonblank_token
- : $next_nonblank_type;
+ my %skip_keywords;
+ my @q = qw( my our local state
+ and cmp continue do else elsif eq ge gt le lt ne not or xor );
+ @skip_keywords{@q} = (1) x scalar(@q);
- # bias left token
- if ( defined( $bias{$left_key} ) ) {
- if ( !$want_break_before{$left_key} ) {
- $bias{$left_key} += $delta_bias;
- $bond_str += $bias{$left_key};
- }
- }
+ my %call_counts;
+ foreach my $KK ( 0 .. @{$rLL} - 1 ) {
- # bias right token
- if ( defined( $bias{$right_key} ) ) {
- if ( $want_break_before{$right_key} ) {
+ # Types which will be checked:
+ # 'k'=builtin keyword, 'U'=user defined sub, 'w'=unknown bareword
+ next unless ( $is_kwU{ $rLL->[$KK]->[_TYPE_] } );
- # for leading '.' align all but 'short' quotes; the idea
- # is to not place something like "\n" on a single line.
- if ( $right_key eq '.' ) {
- unless (
- $last_nonblank_type eq '.'
- && ( $token_length <=
- $rOpts_short_concatenation_item_length )
- && ( !$is_closing_token{$token} )
- )
- {
- $bias{$right_key} += $delta_bias;
- }
- }
- else {
- $bias{$right_key} += $delta_bias;
- }
- $bond_str += $bias{$right_key};
- }
- }
+ my $type = $rLL->[$KK]->[_TYPE_];
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ if ( $type eq 'k' && $skip_keywords{$token} ) { next }
+
+ my $Kn = $self->K_next_code($KK);
+ next unless defined($Kn);
+ my $token_Kn = $rLL->[$Kn]->[_TOKEN_];
+ my $have_paren;
+ if ( $token_Kn eq '=>' ) { next }
+ elsif ( $token_Kn eq '->' ) { next }
+ elsif ( $token_Kn eq '(' ) { $have_paren = 1 }
+ else { $have_paren = 0 }
+
+ # return if this is the block form of 'if', 'unless', ..
+ if ( $have_paren
+ && $is_if_unless_while_until_for_foreach{$token} )
+ {
+ my $seqno = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ next if ( $self->block_seqno_of_paren_seqno($seqno) );
+ }
- $bond_str_4 = $bond_str if (DEBUG_BOND);
+ if ( !defined( $call_counts{$token} ) ) {
+ $call_counts{$token} = [ 0, 0, $type ];
+ }
+ $call_counts{$token}->[$have_paren]++;
+ }
+ my @mixed_counts;
+ foreach my $key ( keys %call_counts ) {
+ my ( $no_count, $yes_count, $type ) = @{ $call_counts{$key} };
+ next unless ( $no_count && $yes_count );
- #---------------------------------------------------------------
- # Bond Strength Section 5:
- # Fifth Approximation.
- # Take nesting depth into account by adding the nesting depth
- # to the bond strength.
- #---------------------------------------------------------------
- my $strength;
+ push @mixed_counts,
+ {
+ name => $key,
+ type => $type,
+ no_count => $no_count,
+ yes_count => $yes_count,
+ };
+ }
+ return unless (@mixed_counts);
+
+ # sort on lc of type so that user sub type 'U' will come after 'k'
+ my @sorted =
+ sort { lc $a->{type} cmp lc $b->{type} || $a->{name} cmp $b->{name} }
+ @mixed_counts;
+
+ my $input_stream_name = get_input_stream_name();
+ my $output_string = <<EOM;
+$input_stream_name: output for --dump-mixed-call-parens
+use -wcp=s and/or nwcp=s to find line numbers, where s is a string of words
+types are 'k'=builtin keyword 'U'=user sub 'w'=other word
+type:word:+count:-count
+EOM
+ foreach my $item (@sorted) {
+ my $type = $item->{type};
+ my $name = $item->{name};
+ my $no_count = $item->{no_count};
+ my $yes_count = $item->{yes_count};
+ $output_string .= "$type:$name:$yes_count:$no_count\n";
+ }
+ print {*STDOUT} $output_string;
+ return;
+} ## end sub dump_mixed_call_parens
- if ( defined($bond_str) && !$nobreak_to_go[$i] ) {
- if ( $total_nesting_depth > 0 ) {
- $strength = $bond_str + $total_nesting_depth;
- }
- else {
- $strength = $bond_str;
- }
- }
- else {
- $strength = NO_BREAK;
+sub initialize_call_paren_style {
- # For critical code such as lines with here targets we must
- # be absolutely sure that we do not allow a break. So for
- # these the nobreak flag exceeds 1 as a signal. Otherwise we
- # can run into trouble when small tolerances are added.
- $strength += 1
- if ( $nobreak_to_go[$i] && $nobreak_to_go[$i] > 1 );
- }
+ # parse --want-call-parens=s and --nowant-call-parens=s
+ # and store results in this global hash:
+ %call_paren_style = ();
+ my $iter = -1;
+ foreach my $opt_name ( 'nowant-call-parens', 'want-call-parens' ) {
+ $iter++;
+ my $opt = $rOpts->{$opt_name};
+ next unless defined($opt);
- #---------------------------------------------------------------
- # Bond Strength Section 6:
- # Sixth Approximation. Welds.
- #---------------------------------------------------------------
+ # allow comma separators
+ $opt =~ s/,/ /g;
+ if ( my @q = split_words($opt) ) {
+ foreach my $word (@q) {
- # Do not allow a break within welds
- if ( $total_weld_count && $seqno ) {
- my $KK = $K_to_go[$i];
- if ( $rK_weld_right->{$KK} ) {
- $strength = NO_BREAK;
+ # words must be simple identifiers, or '&'
+ if ( $word !~ /^(?:\&|\w+)$/ || $word =~ /^\d/ ) {
+ Die("Unexpected word in --$opt_name: '$word'\n");
}
-
- # But encourage breaking after opening welded tokens
- elsif ($rK_weld_left->{$KK}
- && $is_opening_token{$token} )
- {
- $strength -= 1;
+ if ( $iter && defined( $call_paren_style{$word} ) ) {
+ Warn("'$word' occurs in both -nwcp and -wcp, using -wcp\n");
}
}
+ @call_paren_style{@q} = ($iter) x scalar(@q);
+ }
+ }
+ return;
+} ## end sub initialize_call_paren_style
- # always break after side comment
- if ( $type eq '#' ) { $strength = 0 }
+sub scan_call_parens {
+ my ($self) = @_;
- $rbond_strength_to_go->[$i] = $strength;
+ # Perform a scan requested by --want-call-parens
+ # We search for selected functions or keywords and for a following paren.
+ # A warning is issued if the paren existence is not what is wanted
+ # according to the setting --want-call-parens.
- # Fix for case c001: be sure NO_BREAK's are enforced by later
- # routines, except at a '?' because '?' as quote delimiter is
- # deprecated.
- if ( $strength >= NO_BREAK && $next_nonblank_type ne '?' ) {
- $nobreak_to_go[$i] ||= 1;
- }
+ # This routine does not attempt to add or remove parens, it merely
+ # issues a warning so that the user can make a change if desired.
+ # It is risky to add or delete parens automatically; see git #128.
- DEBUG_BOND && do {
- my $str = substr( $token, 0, 15 );
- $str .= SPACE x ( 16 - length($str) );
- print STDOUT
-"BOND: i=$i $str $type $next_nonblank_type depth=$total_nesting_depth strength=$bond_str_1 -> $bond_str_2 -> $bond_str_3 -> $bond_str_4 $bond_str -> $strength \n";
+ return unless (%call_paren_style);
+ my $opt_name = 'want-call-parens';
- # reset for next pass
- $bond_str_1 = $bond_str_2 = $bond_str_3 = $bond_str_4 = undef;
- };
+ my $rwarnings = [];
- } ## end main loop
- return $rbond_strength_to_go;
- } ## end sub set_bond_strengths
-} ## end closure set_bond_strengths
+ #---------------------
+ # Loop over all tokens
+ #---------------------
+ my $rLL = $self->[_rLL_];
+ foreach my $KK ( 0 .. @{$rLL} - 1 ) {
-sub bad_pattern {
+ # Types which will be checked:
+ # 'k'=builtin keyword, 'U'=user defined sub, 'w'=unknown bareword
+ next unless ( $is_kwU{ $rLL->[$KK]->[_TYPE_] } );
+
+ # Are we looking for this word?
+ my $type = $rLL->[$KK]->[_TYPE_];
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ my $want_paren = $call_paren_style{$token};
+
+ # Only user-defined subs (type 'U') have defaults.
+ if ( !defined($want_paren) ) {
+ $want_paren =
+ $type eq 'k' ? undef
+ : $type eq 'U' ? $call_paren_style{'&'}
+ : undef;
+ }
+ next unless defined($want_paren);
+
+ # This is a selected word. Look for a '(' at the next token.
+ my $Kn = $self->K_next_code($KK);
+ next unless defined($Kn);
+
+ my $token_Kn = $rLL->[$Kn]->[_TOKEN_];
+ if ( $token_Kn eq '=>' ) { next }
+ elsif ( $token_Kn eq '->' ) { next }
+ elsif ( $token_Kn eq '(' ) { next if ($want_paren) }
+ else { next if ( !$want_paren ) }
+
+ # return if this is the block form of 'if', 'unless', ..
+ if ( $token_Kn eq '('
+ && $is_if_unless_while_until_for_foreach{$token} )
+ {
+ my $seqno = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ next if ( $self->block_seqno_of_paren_seqno($seqno) );
+ }
+
+ # This disagrees with the wanted style; issue a warning.
+ my $note = $want_paren ? "no call parens" : "has call parens";
+ my $rwarning = {
+ token => $token,
+ token_next => $token_Kn,
+ want => $want_paren,
+ note => $note,
+ line_number => $rLL->[$KK]->[_LINE_INDEX_] + 1,
+ KK => $KK,
+ Kn => $Kn,
+ };
+ push @{$rwarnings}, $rwarning;
+ }
- # See if a pattern will compile. We have to use a string eval here,
- # but it should be safe because the pattern has been constructed
- # by this program.
- my ($pattern) = @_;
- my $ok = eval "'##'=~/$pattern/";
- return !defined($ok) || $EVAL_ERROR;
-} ## end sub bad_pattern
+ # Report any warnings
+ if ( @{$rwarnings} ) {
+ my $message = "Begin scan for --$opt_name\n";
+ $message .= <<EOM;
+Line:text:
+EOM
+ foreach my $item ( @{$rwarnings} ) {
+ my $token = $item->{token};
+ my $token_next = $item->{token_next};
+ my $note = $item->{note};
+ my $lno = $item->{line_number};
-{ ## begin closure prepare_cuddled_block_types
+ # trim long tokens for the output line
+ if ( length($token_next) > 23 ) {
+ $token_next = substr( $token_next, 0, 20 ) . '...';
+ }
- my %no_cuddle;
+ # stop before a ':' to allow use of ':' as spreadsheet col separator
+ my $ii = index( $token_next, ':' );
+ if ( $ii >= 0 ) { $token_next = substr( $token_next, 0, $ii ) }
- # Add keywords here which really should not be cuddled
- BEGIN {
- my @q = qw(if unless for foreach while);
- @no_cuddle{@q} = (1) x scalar(@q);
+ $message .= "$lno:$token $token_next: $note\n";
+ }
+ $message .= "End scan for --$opt_name\n";
+
+ # Note that this is sent in a single call to warning() in order
+ # to avoid triggering a stop on large warning count
+ warning($message);
}
+ return;
+} ## end sub scan_call_parens
- sub prepare_cuddled_block_types {
+sub find_non_indenting_braces {
- # the cuddled-else style, if used, is controlled by a hash that
- # we construct here
+ my ( $self, $rix_side_comments ) = @_;
- # Include keywords here which should not be cuddled
+ # Find and mark all non-indenting braces in this file.
- my $cuddled_string = EMPTY_STRING;
- if ( $rOpts->{'cuddled-else'} ) {
+ # Given:
+ # $rix_side_comments = index of lines which have side comments
+ # Find and save the line indexes of these special side comments in:
+ # $self->[_rseqno_non_indenting_brace_by_ix_];
- # set the default
- $cuddled_string = 'elsif else continue catch finally'
- unless ( $rOpts->{'cuddled-block-list-exclusive'} );
+ # Non-indenting braces are opening braces of the form
+ # { #<<< ...
+ # which do not cause an increase in indentation level.
+ # They are enabled with the --non-indenting-braces, or -nib, flag.
- # This is the old equivalent but more complex version
- # $cuddled_string = 'if-elsif-else unless-elsif-else -continue ';
+ return unless ( $rOpts->{'non-indenting-braces'} );
+ my $rLL = $self->[_rLL_];
+ return unless ( defined($rLL) && @{$rLL} );
+ my $rlines = $self->[_rlines_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $rseqno_non_indenting_brace_by_ix =
+ $self->[_rseqno_non_indenting_brace_by_ix_];
- # Add users other blocks to be cuddled
- my $cuddled_block_list = $rOpts->{'cuddled-block-list'};
- if ($cuddled_block_list) {
- $cuddled_string .= SPACE . $cuddled_block_list;
- }
+ foreach my $ix ( @{$rix_side_comments} ) {
+ my $line_of_tokens = $rlines->[$ix];
+ my $line_type = $line_of_tokens->{_line_type};
+ if ( $line_type ne 'CODE' ) {
+ # shouldn't happen
+ DEVEL_MODE && Fault("unexpected line_type=$line_type\n");
+ next;
}
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast ) = @{$rK_range};
+ if ( !defined($Kfirst) || $rLL->[$Klast]->[_TYPE_] ne '#' ) {
- # If we have a cuddled string of the form
- # 'try-catch-finally'
+ # shouldn't happen
+ DEVEL_MODE && Fault("did not get a comment\n");
+ next;
+ }
+ next if ( $Klast <= $Kfirst ); # maybe HSC
+ my $token_sc = $rLL->[$Klast]->[_TOKEN_];
+ my $K_m = $Klast - 1;
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+ if ( $type_m eq 'b' && $K_m > $Kfirst ) {
+ $K_m--;
+ $type_m = $rLL->[$K_m]->[_TYPE_];
+ }
+ my $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
+ if ($seqno_m) {
+ my $block_type_m = $rblock_type_of_seqno->{$seqno_m};
- # we want to prepare a hash of the form
+ # The pattern ends in \s but we have removed the newline, so
+ # we added it back for the match. That way we require an exact
+ # match to the special string and also allow additional text.
+ $token_sc .= "\n";
+ if ( $block_type_m
+ && $is_opening_type{$type_m}
+ && $token_sc =~ /$non_indenting_brace_pattern/ )
+ {
+ $rseqno_non_indenting_brace_by_ix->{$ix} = $seqno_m;
+ }
+ }
+ }
+ return;
+} ## end sub find_non_indenting_braces
- # $rcuddled_block_types = {
- # 'try' => {
- # 'catch' => 1,
- # 'finally' => 1
- # },
- # };
+sub interbracket_arrow_check {
- # use -dcbl to dump this hash
+ my ($self) = @_;
- # Multiple such strings are input as a space or comma separated list
+ # Implement the options to add or delete optional arrows between brackets
+ my $rOpts_add = $rOpts->{'add-interbracket-arrows'};
+ my $rOpts_del = $rOpts->{'delete-interbracket-arrows'};
+ my $rOpts_warn = $rOpts->{'warn-interbracket-arrows'};
+ my $rOpts_warn_and_style = $rOpts_warn && %interbracket_arrow_style;
- # If we get two lists with the same leading type, such as
- # -cbl = "-try-catch-finally -try-catch-otherwise"
- # then they will get merged as follows:
- # $rcuddled_block_types = {
- # 'try' => {
- # 'catch' => 1,
- # 'finally' => 2,
- # 'otherwise' => 1,
- # },
- # };
- # This will allow either type of chain to be followed.
+ return
+ unless ( $rOpts_add || $rOpts_del || $rOpts_warn_and_style );
- $cuddled_string =~ s/,/ /g; # allow space or comma separated lists
- my @cuddled_strings = split /\s+/, $cuddled_string;
+ # Method:
+ # Loop over all opening brackets and look back for a possible arrow
+ # and closing bracket. If the location between brackets allows an
+ # optional arrow, then see if one should be added or deleted.
+ # Set a flag for sub respace_tokens which will make the change.
+
+ # Deleting examples:
+ # $variables->{'a'}->{'b'} $variables->{'a'}{'b'}
+ # $variables{'a'}->{'b'} $variables{'a'}->{'b'}
+ # $items[1]->[4]->{red} $items[1][4]{red}
+ # $items{blue}->[4]->{red} $items{blue}[4]{red}
+
+ # Adding examples:
+ # $variables->{'a'}{'b'} $variables->{'a'}->{'b'}
+ # $variables{'a'}->{'b'} $variables{'a'}->{'b'}
+ # $items[1][4]{red} $items[1]->[4]->{red}
+ # $items{blue}[4]{red} $items{blue}->[4]->{red}
+
+ # bracket chain ] { } [ ] [
+ # | | |
+ # arrow ok? ? ? ?
+
+ # The following chain rule is used to locate optional arrow locations:
+ # Scanning left to right:
+ # -arrows can begin once we see an opening token preceded by:
+ # - an ->, or
+ # - a simple scalar identifier like '$href{' or '$aryref['
+ # - Once arrows begin they may continue to the end of the bracket chain.
+
+ # To illustrate why we just can't add and remove arrows between
+ # ']' and '[', for example, consider
+ # my $v1 = [ 1, 2, [ 3, 4 ] ]->[2]->[0]; # ok
+ # my $v2 = [ 1, 2, [ 3, 4 ] ]->[2][0]; # ok, keep required arrow
+ # my $v3 = [ 1, 2, [ 3, 4 ] ][2][0]; # Error
+
+ # Note that an arrow does not get placed between '}' and '[' here:
+ # my $val = ${$x}[1];
+ # Perltidy marks the '$' as type 't', and since the logic below checks
+ # for identifiers of type 'i', it will work ok.
+
+ # We will maintain the flag for this check in the following hash:
+ my %trailing_arrow_ok_by_seqno;
- $rcuddled_block_types = {};
+ my $rLL = $self->[_rLL_];
+ return unless ( defined($rLL) && @{$rLL} );
+ my $K_opening_container = $self->[_K_opening_container_];
- # process each dash-separated string...
- my $string_count = 0;
- foreach my $string (@cuddled_strings) {
- next unless $string;
- my @words = split /-+/, $string; # allow multiple dashes
+ my @lno_del;
+ my @lno_add;
- # we could look for and report possible errors here...
- next unless ( @words > 0 );
+ my $warn = sub {
- # allow either '-continue' or *-continue' for arbitrary starting type
- my $start = '*';
+ # write a warning on changes made or needed if -wia is set
+ my ( $rlno_list, $first_word ) = @_;
+ my $str;
+ my $num_changes = @{$rlno_list};
+ my @unique_lno = do {
+ my %seen;
+ grep { !$seen{$_}++ } @{$rlno_list};
+ };
+ my $num_lno = @unique_lno;
+ my $num_lim = 10;
+ if ( $num_lno <= $num_lim ) {
+ $str = join( SPACE, @unique_lno );
+ }
+ else {
+ $str = join( SPACE, @unique_lno[ 0 .. $num_lim - 1 ] ) . " ...";
+ }
+ my $ess1 = $num_changes == 1 ? EMPTY_STRING : 's';
+ my $ess2 = $num_lno == 1 ? EMPTY_STRING : 's';
+ my $msg = "$first_word $num_changes '->'$ess1 at line$ess2 $str\n";
+ warning($msg);
+ return;
+ }; ## end $warn = sub
+
+ # Complexity control flag:
+ # =0 left container must just contain a single token
+ # =1 left container must not contain other containers [DEFAULT]
+ # =2 no complexity constraints
+ my $complexity = $rOpts->{'interbracket-arrow-complexity'};
+ if ( !defined($complexity) ) { $complexity = 1 }
+
+ #--------------------------------------------
+ # Main loop over all opening container tokens
+ #--------------------------------------------
+ foreach my $seqno ( sort { $a <=> $b } keys %{$K_opening_container} ) {
+
+ # We just want opening token types 'L" or '['
+ # Note: the tokenizer marks hash braces '{' and '}' as 'L' and 'R'
+ # but we have to be careful because small block braces can also
+ # get marked 'L' and 'R' for formatting purposes.
+ my $Ko = $K_opening_container->{$seqno};
+ my $type = $rLL->[$Ko]->[_TYPE_];
+ next if ( $type ne 'L' && $type ne '[' );
+
+ # Now find the previous nonblank token
+ my $K_m = $Ko - 1;
+ next if ( $K_m < 0 );
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+ if ( $type_m eq 'b' && $K_m > 0 ) {
+ $K_m -= 1;
+ $type_m = $rLL->[$K_m]->[_TYPE_];
+ }
- # a single word without dashes is a secondary block type
- if ( @words > 1 ) {
- $start = shift @words;
+ # These vars will hold the previous closing bracket, if any;
+ # initialized to this token but will be moved if it is an arrow
+ my $K_mm = $K_m;
+ my $type_mm = $type_m;
+
+ # Decide if an inter-bracket arrow could follow the closing token
+ # of this container..
+
+ # preceded by scalar identifier (such as '$array[' or '$hash{') ?
+ if ( $type_m eq 'i' || $type_m eq 'Z' ) {
+
+ my $token_m = $rLL->[$K_m]->[_TOKEN_];
+ if ( substr( $token_m, 0, 1 ) eq '$' ) {
+
+ # arrows can follow the CLOSING bracket of this container
+ $trailing_arrow_ok_by_seqno{$seqno} = 1;
}
+ }
- # always make an entry for the leading word. If none follow, this
- # will still prevent a wildcard from matching this word.
- if ( !defined( $rcuddled_block_types->{$start} ) ) {
- $rcuddled_block_types->{$start} = {};
+ # or a closing bracket or hash brace
+ elsif ( $type_m eq ']' || $type_m eq 'R' ) {
+ my $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
+
+ # propagate the arrow status flag
+ $trailing_arrow_ok_by_seqno{$seqno} =
+ $trailing_arrow_ok_by_seqno{$seqno_m};
+ }
+
+ # check a pointer and if found, back up one more token
+ elsif ( $type_m eq '->' ) {
+
+ # arrows can follow the CLOSING bracket of this container
+ $trailing_arrow_ok_by_seqno{$seqno} = 1;
+
+ # back up one token before the arrow
+ $K_mm = $K_m - 1;
+ next if ( $K_mm <= 0 );
+ $type_mm = $rLL->[$K_mm]->[_TYPE_];
+ if ( $type_mm eq 'b' && $K_mm > 0 ) {
+ $K_mm -= 1;
+ $type_mm = $rLL->[$K_mm]->[_TYPE_];
}
+ }
+ else {
+ # something else
+ }
- # The count gives the original word order in case we ever want it.
- $string_count++;
- my $word_count = 0;
- foreach my $word (@words) {
- next unless $word;
- if ( $no_cuddle{$word} ) {
- Warn(
-"## Ignoring keyword '$word' in -cbl; does not seem right\n"
- );
- next;
- }
- $word_count++;
- $rcuddled_block_types->{$start}->{$word} =
- 1; #"$string_count.$word_count";
+ # now check for a preceding closing bracket or hash brace
+ next if ( $type_mm ne ']' && $type_mm ne 'R' );
+ my $seqno_mm = $rLL->[$K_mm]->[_TYPE_SEQUENCE_];
+ next if ( !$seqno_mm );
- # git#9: Remove this word from the list of desired one-line
- # blocks
- $want_one_line_block{$word} = 0;
+ $trailing_arrow_ok_by_seqno{$seqno} = 1;
+
+ # We are between brackets with these two or three sequential tokens,
+ # indexes _mm and _m are identical if there is no arrow.
+ # $type_mm $type_m $type
+ # R or ] ->? [ or L
+
+ # Can an inter-bracket arrow be here?
+ next unless ( $trailing_arrow_ok_by_seqno{$seqno_mm} );
+
+ # If the user defined a style, only continue if this requires
+ # adding or deleting an '->' to match the style
+ if (%interbracket_arrow_style) {
+ my $style = $interbracket_arrow_style{ $type_mm . $type };
+ next if ( !$style );
+ next
+ if ( $style == -1 && $type_m ne '->'
+ || $style == 1 && $type_m eq '->' );
+ }
+
+ next if ( $type_m eq '->' && !$rOpts_del && !$rOpts_warn );
+ next if ( $type_m ne '->' && !$rOpts_add && !$rOpts_warn );
+
+ # Do not continue if the left container is too complex..
+ # complexity flag = 0: only one nonblank token in the brackets
+ if ( !$complexity ) {
+ my $count = 0;
+ my $Ko_mm = $K_opening_container->{$seqno_mm};
+ next unless defined($Ko_mm);
+ foreach my $KK ( $Ko_mm + 1 .. $K_mm - 2 ) {
+ next if ( $rLL->[$KK]->[_TYPE_] eq 'b' );
+ $count++;
+ last if ( $count > 1 );
}
+ next if ( $count > 1 );
}
- return;
- } ## end sub prepare_cuddled_block_types
-} ## end closure prepare_cuddled_block_types
-sub dump_cuddled_block_list {
- my ($fh) = @_;
+ # complexity flag = 1: no interior container tokens
+ elsif ( $complexity == 1 ) {
- # ORIGINAL METHOD: Here is the format of the cuddled block type hash
- # which controls this routine
- # my $rcuddled_block_types = {
- # 'if' => {
- # 'else' => 1,
- # 'elsif' => 1
- # },
- # 'try' => {
- # 'catch' => 1,
- # 'finally' => 1
- # },
- # };
+ if ( $seqno_mm ne $seqno - 1 ) {
+ next;
+ }
+ }
+ else {
+ # complexity flag >1 => no restriction
+ }
+
+ # set a flag telling sub respace_tokens to actually make the change
+ my $lno = 1 + $rLL->[$Ko]->[_LINE_INDEX_];
+ if ( $type_m eq '->' ) {
+ if ($rOpts_del) {
+ $self->[_rwant_arrow_before_seqno_]->{$seqno} = -1;
+ }
+ if ( $rOpts_del || $rOpts_warn_and_style ) { push @lno_del, $lno }
+ }
+ else {
+ if ($rOpts_add) {
+ $self->[_rwant_arrow_before_seqno_]->{$seqno} = 1;
+ }
+ if ( $rOpts_add || $rOpts_warn_and_style ) { push @lno_add, $lno }
+ }
+ }
- # SIMPLIFIED METHOD: the simplified method uses a wildcard for
- # the starting block type and puts all cuddled blocks together:
- # my $rcuddled_block_types = {
- # '*' => {
- # 'else' => 1,
- # 'elsif' => 1
- # 'catch' => 1,
- # 'finally' => 1
- # },
- # };
+ if ($rOpts_warn) {
+ my $wia = '--warn-interbracket-arrows report:';
+ $warn->( \@lno_add, $rOpts_add ? "$wia added" : "$wia: missing" )
+ if (@lno_add);
+ $warn->( \@lno_del, $rOpts_del ? "$wia deleted " : "$wia: unwanted " )
+ if (@lno_del);
+ }
+ return;
+} ## end sub interbracket_arrow_check
- # Both methods work, but the simplified method has proven to be adequate and
- # easier to manage.
+sub delete_side_comments {
+ my ( $self, $rix_side_comments ) = @_;
- my $cuddled_string = $rOpts->{'cuddled-block-list'};
- $cuddled_string = EMPTY_STRING unless $cuddled_string;
+ # Handle any requested side comment deletions.
+ # Given:
+ # $rix_side_comments = ref to list of indexes of lines with side comments
- my $flags = EMPTY_STRING;
- $flags .= "-ce" if ( $rOpts->{'cuddled-else'} );
- $flags .= " -cbl='$cuddled_string'";
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $rseqno_non_indenting_brace_by_ix =
+ $self->[_rseqno_non_indenting_brace_by_ix_];
- unless ( $rOpts->{'cuddled-else'} ) {
- $flags .= "\nNote: You must specify -ce to generate a cuddled hash";
- }
+ foreach my $ix ( @{$rix_side_comments} ) {
+ my $line_of_tokens = $rlines->[$ix];
+ my $line_type = $line_of_tokens->{_line_type};
- $fh->print(<<EOM);
-------------------------------------------------------------------------
-Hash of cuddled block types prepared for a run with these parameters:
- $flags
-------------------------------------------------------------------------
+ # This fault shouldn't happen because we only saved CODE lines with
+ # side comments in the TASK 1 loop above.
+ if ( $line_type ne 'CODE' ) {
+ if (DEVEL_MODE) {
+ my $lno = $ix + 1;
+ Fault(<<EOM);
+Hit unexpected line_type = '$line_type' near line $lno while deleting side comments, should be 'CODE'
EOM
+ }
+ next;
+ }
- use Data::Dumper;
- $fh->print( Dumper($rcuddled_block_types) );
+ my $CODE_type = $line_of_tokens->{_code_type};
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast ) = @{$rK_range};
- $fh->print(<<EOM);
-------------------------------------------------------------------------
+ if ( !defined($Kfirst) || $rLL->[$Klast]->[_TYPE_] ne '#' ) {
+ if (DEVEL_MODE) {
+ my $lno = $ix + 1;
+ Fault(<<EOM);
+Did not find side comment near line $lno while deleting side comments
EOM
- return;
-} ## end sub dump_cuddled_block_list
+ }
+ next;
+ }
-sub make_static_block_comment_pattern {
+ my $delete_side_comment =
+ $rOpts_delete_side_comments
+ && ( $Klast > $Kfirst || $CODE_type eq 'HSC' )
+ && (!$CODE_type
+ || $CODE_type eq 'HSC'
+ || $CODE_type eq 'IO'
+ || $CODE_type eq 'NIN' );
- # create the pattern used to identify static block comments
- $static_block_comment_pattern = '^\s*##';
+ # Do not delete special control side comments
+ if ( $rseqno_non_indenting_brace_by_ix->{$ix} ) {
+ $delete_side_comment = 0;
+ }
- # allow the user to change it
- if ( $rOpts->{'static-block-comment-prefix'} ) {
- my $prefix = $rOpts->{'static-block-comment-prefix'};
- $prefix =~ s/^\s*//;
- my $pattern = $prefix;
+ if (
+ $rOpts_delete_closing_side_comments
+ && !$delete_side_comment
+ && $Klast > $Kfirst
+ && ( !$CODE_type
+ || $CODE_type eq 'HSC'
+ || $CODE_type eq 'IO'
+ || $CODE_type eq 'NIN' )
+ )
+ {
+ my $token = $rLL->[$Klast]->[_TOKEN_];
+ my $K_m = $Klast - 1;
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+ if ( $type_m eq 'b' && $K_m > $Kfirst ) { $K_m-- }
+ my $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
- # user may give leading caret to force matching left comments only
- if ( $prefix !~ /^\^#/ ) {
- if ( $prefix !~ /^#/ ) {
- Die(
-"ERROR: the -sbcp prefix is '$prefix' but must begin with '#' or '^#'\n"
- );
+ # patch to delete asub csc's (c380)
+ if ( !$seqno_m && $K_m && $rLL->[$K_m]->[_TYPE_] eq ';' ) {
+ $K_m = $K_m - 1;
+ $type_m = $rLL->[$K_m]->[_TYPE_];
+ if ( $type_m eq 'b' && $K_m > $Kfirst ) { $K_m-- }
+ if ( $K_m == $Kfirst ) {
+ $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
+ }
}
- $pattern = '^\s*' . $prefix;
- }
- if ( bad_pattern($pattern) ) {
- Die(
-"ERROR: the -sbc prefix '$prefix' causes the invalid regex '$pattern'\n"
- );
- }
- $static_block_comment_pattern = $pattern;
- }
- return;
-} ## end sub make_static_block_comment_pattern
-sub make_format_skipping_pattern {
- my ( $opt_name, $default ) = @_;
- my $param = $rOpts->{$opt_name};
- unless ($param) { $param = $default }
- $param =~ s/^\s*//;
- if ( $param !~ /^#/ ) {
- Die("ERROR: the $opt_name parameter '$param' must begin with '#'\n");
- }
- my $pattern = '^' . $param . '\s';
- if ( bad_pattern($pattern) ) {
- Die(
-"ERROR: the $opt_name parameter '$param' causes the invalid regex '$pattern'\n"
- );
- }
- return $pattern;
-} ## end sub make_format_skipping_pattern
+ if ($seqno_m) {
+ my $block_type_m = $rblock_type_of_seqno->{$seqno_m};
+ if ( $block_type_m
+ && $token =~ /$closing_side_comment_prefix_pattern/
+ && $block_type_m =~ /$closing_side_comment_list_pattern/
+ && $block_type_m !~
+ /$closing_side_comment_exclusion_pattern/ )
+ {
+ $delete_side_comment = 1;
+ }
+ }
+ } ## end if ( $rOpts_delete_closing_side_comments...)
-sub make_non_indenting_brace_pattern {
+ if ($delete_side_comment) {
- # Create the pattern used to identify static side comments.
- # Note that we are ending the pattern in a \s. This will allow
- # the pattern to be followed by a space and some text, or a newline.
- # The pattern is used in sub 'non_indenting_braces'
- $non_indenting_brace_pattern = '^#<<<\s';
+ # We are actually just changing the side comment to a blank.
+ # This may produce multiple blanks in a row, but sub respace_tokens
+ # will check for this and fix it.
+ $rLL->[$Klast]->[_TYPE_] = 'b';
+ $rLL->[$Klast]->[_TOKEN_] = SPACE;
- # allow the user to change it
- if ( $rOpts->{'non-indenting-brace-prefix'} ) {
- my $prefix = $rOpts->{'non-indenting-brace-prefix'};
- $prefix =~ s/^\s*//;
- if ( $prefix !~ /^#/ ) {
- Die("ERROR: the -nibp parameter '$prefix' must begin with '#'\n");
- }
- my $pattern = '^' . $prefix . '\s';
- if ( bad_pattern($pattern) ) {
- Die(
-"ERROR: the -nibp prefix '$prefix' causes the invalid regex '$pattern'\n"
- );
+ # The -io option outputs the line text, so we have to update
+ # the line text so that the comment does not reappear.
+ if ( $CODE_type eq 'IO' ) {
+ my $line = EMPTY_STRING;
+ foreach my $KK ( $Kfirst .. $Klast - 1 ) {
+ $line .= $rLL->[$KK]->[_TOKEN_];
+ }
+ $line =~ s/\s+$//;
+ $line_of_tokens->{_line_text} = $line . "\n";
+ }
+
+ # If we delete a hanging side comment the line becomes blank.
+ if ( $CODE_type eq 'HSC' ) { $line_of_tokens->{_code_type} = 'BL' }
}
- $non_indenting_brace_pattern = $pattern;
}
return;
-} ## end sub make_non_indenting_brace_pattern
+} ## end sub delete_side_comments
-sub make_closing_side_comment_list_pattern {
+my %wU;
+my %wiq;
+my %is_wit;
+my %is_sigil;
+my %is_nonlist_keyword;
+my %is_nonlist_type;
+my %is_s_y_m_slash;
+my %is_unexpected_equals;
+my %is_ascii_type;
- # turn any input list into a regex for recognizing selected block types
- $closing_side_comment_list_pattern = '^\w+';
- if ( defined( $rOpts->{'closing-side-comment-list'} )
- && $rOpts->{'closing-side-comment-list'} )
- {
- $closing_side_comment_list_pattern =
- make_block_pattern( '-cscl', $rOpts->{'closing-side-comment-list'} );
- }
- return;
-} ## end sub make_closing_side_comment_list_pattern
+BEGIN {
-sub make_sub_matching_pattern {
+ # added 'U' to fix cases b1125 b1126 b1127
+ my @q = qw( w U );
+ @wU{@q} = (1) x scalar(@q);
- # Patterns for standardizing matches to block types for regular subs and
- # anonymous subs. Examples
- # 'sub process' is a named sub
- # 'sub ::m' is a named sub
- # 'sub' is an anonymous sub
- # 'sub:' is a label, not a sub
- # 'sub :' is a label, not a sub ( block type will be <sub:> )
- # sub'_ is a named sub ( block type will be <sub '_> )
- # 'substr' is a keyword
- # So note that named subs always have a space after 'sub'
- $SUB_PATTERN = '^sub\s'; # match normal sub
- $ASUB_PATTERN = '^sub$'; # match anonymous sub
+ @q = qw( w i q Q G C Z );
+ @wiq{@q} = (1) x scalar(@q);
- # Note (see also RT #133130): These patterns are used by
- # sub make_block_pattern, which is used for making most patterns.
- # So this sub needs to be called before other pattern-making routines.
+ @q = qw( w i t ); # for c250: added new types 'P', 'S', formerly 'i'
+ @is_wit{@q} = (1) x scalar(@q);
+
+ @q = qw( $ & % * @ );
+ @is_sigil{@q} = (1) x scalar(@q);
+
+ # Parens following these keywords will not be marked as lists. Note that
+ # 'for' is not included and is handled separately, by including 'f' in the
+ # hash %is_counted_type, since it may or may not be a c-style for loop.
+ @q = qw( if elsif unless and or );
+ @is_nonlist_keyword{@q} = (1) x scalar(@q);
+
+ # Parens following these types will not be marked as lists
+ @q = qw( && || );
+ @is_nonlist_type{@q} = (1) x scalar(@q);
+
+ @q = qw( s y m / );
+ @is_s_y_m_slash{@q} = (1) x scalar(@q);
+
+ @q = qw( = == != );
+ @is_unexpected_equals{@q} = (1) x scalar(@q);
+
+ # We can always skip expensive length_function->() calls for these
+ # ascii token types
+ @q = qw#
+ b k L R ; ( { [ ? : ] } ) f t n v F p m pp mm
+ .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
+ ( ) <= >= == =~ !~ != ++ -- /= x=
+ ... **= <<= >>= &&= ||= //= <=>
+ + - / * | % ! x ~ = \ ? : . < > ^ &
+ #;
+ push @q, ',';
+ @is_ascii_type{@q} = (1) x scalar(@q);
+
+} ## end BEGIN
+
+{ #<<< begin closure respace_tokens
- if ( $rOpts->{'sub-alias-list'} ) {
+my $rLL_new; # This will be the new array of tokens
+
+# These are variables in $self
+my $rLL;
+my $length_function;
+
+my $K_closing_ternary;
+my $K_opening_ternary;
+my $rchildren_of_seqno;
+my $rhas_broken_code_block;
+my $rhas_broken_list;
+my $rhas_broken_list_with_lec;
+my $rhas_code_block;
+my $rhas_list;
+my $rhas_ternary;
+my $ris_assigned_structure;
+my $ris_broken_container;
+my $ris_excluded_lp_container;
+my $ris_list_by_seqno;
+my $ris_permanently_broken;
+my $rlec_count_by_seqno;
+my $roverride_cab3;
+my $rparent_of_seqno;
+my $rtype_count_by_seqno;
+my $rblock_type_of_seqno;
+my $rwant_arrow_before_seqno;
+my $ris_sub_block;
+my $ris_asub_block;
+my $rseqno_arrow_call_chain_start;
+my $rarrow_call_chain;
- # Note that any 'sub-alias-list' has been preprocessed to
- # be a trimmed, space-separated list which includes 'sub'
- # for example, it might be 'sub method fun'
- my $sub_alias_list = $rOpts->{'sub-alias-list'};
- $sub_alias_list =~ s/\s+/\|/g;
- $SUB_PATTERN =~ s/sub/\($sub_alias_list\)/;
- $ASUB_PATTERN =~ s/sub/\($sub_alias_list\)/;
- }
- return;
-} ## end sub make_sub_matching_pattern
+my $K_opening_container;
+my $K_closing_container;
+my @K_sequenced_token_list;
+my @seqno_paren_arrow;
-sub make_bl_pattern {
+my %K_first_here_doc_by_seqno;
- # Set defaults lists to retain historical default behavior for -bl:
- my $bl_list_string = '*';
- my $bl_exclusion_list_string = 'sort map grep eval asub';
+my $last_nonblank_code_type;
+my $last_nonblank_code_token;
+my $last_nonblank_block_type;
+my $last_last_nonblank_code_type;
+my $last_last_nonblank_code_token;
+my $K_last_S;
+my $K_last_S_is_my;
- if ( defined( $rOpts->{'brace-left-list'} )
- && $rOpts->{'brace-left-list'} )
- {
- $bl_list_string = $rOpts->{'brace-left-list'};
- }
- if ( $bl_list_string =~ /\bsub\b/ ) {
- $rOpts->{'opening-sub-brace-on-new-line'} ||=
- $rOpts->{'opening-brace-on-new-line'};
- }
- if ( $bl_list_string =~ /\basub\b/ ) {
- $rOpts->{'opening-anonymous-sub-brace-on-new-line'} ||=
- $rOpts->{'opening-brace-on-new-line'};
- }
+my %seqno_stack;
+my %K_old_opening_by_seqno;
+my $depth_next;
+my $depth_next_max;
+my @sub_seqno_stack;
+my $current_sub_seqno;
- $bl_pattern = make_block_pattern( '-bll', $bl_list_string );
+my $cumulative_length;
- # for -bl, a list with '*' turns on -sbl and -asbl
- if ( $bl_pattern =~ /\.\*/ ) {
- $rOpts->{'opening-sub-brace-on-new-line'} ||=
- $rOpts->{'opening-brace-on-new-line'};
- $rOpts->{'opening-anonymous-sub-brace-on-new-line'} ||=
- $rOpts->{'opening-anonymous-brace-on-new-line'};
- }
+# Variables holding the current line info
+my $Ktoken_vars;
+my $Kfirst_old;
+my $Klast_old;
+my $Klast_old_code;
+my $CODE_type;
- if ( defined( $rOpts->{'brace-left-exclusion-list'} )
- && $rOpts->{'brace-left-exclusion-list'} )
- {
- $bl_exclusion_list_string = $rOpts->{'brace-left-exclusion-list'};
- if ( $bl_exclusion_list_string =~ /\bsub\b/ ) {
- $rOpts->{'opening-sub-brace-on-new-line'} = 0;
- }
- if ( $bl_exclusion_list_string =~ /\basub\b/ ) {
- $rOpts->{'opening-anonymous-sub-brace-on-new-line'} = 0;
- }
- }
+my $rwhitespace_flags;
- $bl_exclusion_pattern =
- make_block_pattern( '-blxl', $bl_exclusion_list_string );
- return;
-} ## end sub make_bl_pattern
+# new index K of package or class statements
+my $rK_package_list;
-sub make_bli_pattern {
+# new index K of @_ tokens
+my $rK_AT_underscore_by_sub_seqno;
- # default list of block types for which -bli would apply
- my $bli_list_string = 'if else elsif unless while for foreach do : sub';
- my $bli_exclusion_list_string = SPACE;
+# new index K of first $self tokens for each sub
+my $rK_first_self_by_sub_seqno;
- if ( defined( $rOpts->{'brace-left-and-indent-list'} )
- && $rOpts->{'brace-left-and-indent-list'} )
- {
- $bli_list_string = $rOpts->{'brace-left-and-indent-list'};
- }
+# new index K of first 'bless' for each sub
+my $rK_bless_by_sub_seqno;
- $bli_pattern = make_block_pattern( '-blil', $bli_list_string );
+# new index K of 'return' for each sub
+my $rK_return_by_sub_seqno;
- if ( defined( $rOpts->{'brace-left-and-indent-exclusion-list'} )
- && $rOpts->{'brace-left-and-indent-exclusion-list'} )
- {
- $bli_exclusion_list_string =
- $rOpts->{'brace-left-and-indent-exclusion-list'};
- }
- $bli_exclusion_pattern =
- make_block_pattern( '-blixl', $bli_exclusion_list_string );
- return;
-} ## end sub make_bli_pattern
+# new index K of 'wantarray' for each sub
+my $rK_wantarray_by_sub_seqno;
-sub make_keyword_group_list_pattern {
+# info about list of sub call args
+my $rsub_call_paren_info_by_seqno;
+my $rDOLLAR_underscore_by_sub_seqno;
- # turn any input list into a regex for recognizing selected block types.
- # Here are the defaults:
- $keyword_group_list_pattern = '^(our|local|my|use|require|)$';
- $keyword_group_list_comment_pattern = EMPTY_STRING;
- if ( defined( $rOpts->{'keyword-group-blanks-list'} )
- && $rOpts->{'keyword-group-blanks-list'} )
- {
- my @words = split /\s+/, $rOpts->{'keyword-group-blanks-list'};
- my @keyword_list;
- my @comment_list;
- foreach my $word (@words) {
- if ( $word eq 'BC' || $word eq 'SBC' ) {
- push @comment_list, $word;
- if ( $word eq 'SBC' ) { push @comment_list, 'SBCX' }
- }
- else {
- push @keyword_list, $word;
- }
- }
- $keyword_group_list_pattern =
- make_block_pattern( '-kgbl', $rOpts->{'keyword-group-blanks-list'} );
- $keyword_group_list_comment_pattern =
- make_block_pattern( '-kgbl', join( SPACE, @comment_list ) );
- }
- return;
-} ## end sub make_keyword_group_list_pattern
+# index K of the preceding 'S' token for a sub
+my $rK_sub_by_seqno;
-sub make_block_brace_vertical_tightness_pattern {
+# true for a 'my' sub
+my $ris_my_sub_by_seqno;
- # turn any input list into a regex for recognizing selected block types
- $block_brace_vertical_tightness_pattern =
- '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
- if ( defined( $rOpts->{'block-brace-vertical-tightness-list'} )
- && $rOpts->{'block-brace-vertical-tightness-list'} )
- {
- $block_brace_vertical_tightness_pattern =
- make_block_pattern( '-bbvtl',
- $rOpts->{'block-brace-vertical-tightness-list'} );
- }
- return;
-} ## end sub make_block_brace_vertical_tightness_pattern
+sub initialize_respace_tokens_closure {
-sub make_blank_line_pattern {
+ my ($self) = @_;
- $blank_lines_before_closing_block_pattern = $SUB_PATTERN;
- my $key = 'blank-lines-before-closing-block-list';
- if ( defined( $rOpts->{$key} ) && $rOpts->{$key} ) {
- $blank_lines_before_closing_block_pattern =
- make_block_pattern( '-blbcl', $rOpts->{$key} );
- }
+ $rLL_new = []; # This is the new array
- $blank_lines_after_opening_block_pattern = $SUB_PATTERN;
- $key = 'blank-lines-after-opening-block-list';
- if ( defined( $rOpts->{$key} ) && $rOpts->{$key} ) {
- $blank_lines_after_opening_block_pattern =
- make_block_pattern( '-blaol', $rOpts->{$key} );
- }
- return;
-} ## end sub make_blank_line_pattern
+ $rLL = $self->[_rLL_];
-sub make_block_pattern {
+ $length_function = $self->[_length_function_];
+ $K_closing_ternary = $self->[_K_closing_ternary_];
+ $K_opening_ternary = $self->[_K_opening_ternary_];
+ $rchildren_of_seqno = $self->[_rchildren_of_seqno_];
+ $rhas_broken_code_block = $self->[_rhas_broken_code_block_];
+ $rhas_broken_list = $self->[_rhas_broken_list_];
+ $rhas_broken_list_with_lec = $self->[_rhas_broken_list_with_lec_];
+ $rhas_code_block = $self->[_rhas_code_block_];
+ $rhas_list = $self->[_rhas_list_];
+ $rhas_ternary = $self->[_rhas_ternary_];
+ $ris_assigned_structure = $self->[_ris_assigned_structure_];
+ $ris_broken_container = $self->[_ris_broken_container_];
+ $ris_excluded_lp_container = $self->[_ris_excluded_lp_container_];
+ $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
+ $ris_permanently_broken = $self->[_ris_permanently_broken_];
+ $rlec_count_by_seqno = $self->[_rlec_count_by_seqno_];
+ $roverride_cab3 = $self->[_roverride_cab3_];
+ $rparent_of_seqno = $self->[_rparent_of_seqno_];
+ $rtype_count_by_seqno = $self->[_rtype_count_by_seqno_];
+ $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ $rwant_arrow_before_seqno = $self->[_rwant_arrow_before_seqno_];
+ $ris_sub_block = $self->[_ris_sub_block_];
+ $ris_asub_block = $self->[_ris_asub_block_];
+
+ $rK_package_list = $self->[_rK_package_list_];
+ $rK_AT_underscore_by_sub_seqno = $self->[_rK_AT_underscore_by_sub_seqno_];
+ $rK_first_self_by_sub_seqno = $self->[_rK_first_self_by_sub_seqno_];
+ $rK_bless_by_sub_seqno = $self->[_rK_bless_by_sub_seqno_];
+ $rK_return_by_sub_seqno = $self->[_rK_return_by_sub_seqno_];
+ $rK_wantarray_by_sub_seqno = $self->[_rK_wantarray_by_sub_seqno_];
+ $rsub_call_paren_info_by_seqno = $self->[_rsub_call_paren_info_by_seqno_];
+ $rseqno_arrow_call_chain_start = $self->[_rseqno_arrow_call_chain_start_];
+ $rarrow_call_chain = $self->[_rarrow_call_chain_];
+ $rDOLLAR_underscore_by_sub_seqno =
+ $self->[_rDOLLAR_underscore_by_sub_seqno_];
+ $rK_sub_by_seqno = $self->[_rK_sub_by_seqno_];
+ $ris_my_sub_by_seqno = $self->[_ris_my_sub_by_seqno_];
- # given a string of block-type keywords, return a regex to match them
- # The only tricky part is that labels are indicated with a single ':'
- # and the 'sub' token text may have additional text after it (name of
- # sub).
- #
- # Example:
- #
- # input string: "if else elsif unless while for foreach do : sub";
- # pattern: '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
+ %K_first_here_doc_by_seqno = ();
- # Minor Update:
- #
- # To distinguish between anonymous subs and named subs, use 'sub' to
- # indicate a named sub, and 'asub' to indicate an anonymous sub
+ $last_nonblank_code_type = ';';
+ $last_nonblank_code_token = ';';
+ $last_nonblank_block_type = EMPTY_STRING;
+ $last_last_nonblank_code_type = ';';
+ $last_last_nonblank_code_token = ';';
+ $K_last_S = 1;
+ $K_last_S_is_my = undef;
- my ( $abbrev, $string ) = @_;
- my @list = split_words($string);
- my @words = ();
- my %seen;
- for my $i (@list) {
- if ( $i eq '*' ) { my $pattern = '^.*'; return $pattern }
- next if $seen{$i};
- $seen{$i} = 1;
- if ( $i eq 'sub' ) {
- }
- elsif ( $i eq 'asub' ) {
- }
- elsif ( $i eq ';' ) {
- push @words, ';';
- }
- elsif ( $i eq '{' ) {
- push @words, '\{';
- }
- elsif ( $i eq ':' ) {
- push @words, '\w+:';
- }
- elsif ( $i =~ /^\w/ ) {
- push @words, $i;
- }
- else {
- Warn("unrecognized block type $i after $abbrev, ignoring\n");
- }
- }
+ %seqno_stack = ();
+ %K_old_opening_by_seqno = (); # Note: old K index
+ $depth_next = 0;
+ $depth_next_max = 0;
- # Fix 2 for c091, prevent the pattern from matching an empty string
- # '1 ' is an impossible block name.
- if ( !@words ) { push @words, "1 " }
+ @sub_seqno_stack = ();
+ $current_sub_seqno = 0;
- my $pattern = '(' . join( '|', @words ) . ')$';
- my $sub_patterns = EMPTY_STRING;
- if ( $seen{'sub'} ) {
- $sub_patterns .= '|' . $SUB_PATTERN;
- }
- if ( $seen{'asub'} ) {
- $sub_patterns .= '|' . $ASUB_PATTERN;
- }
- if ($sub_patterns) {
- $pattern = '(' . $pattern . $sub_patterns . ')';
- }
- $pattern = '^' . $pattern;
- return $pattern;
-} ## end sub make_block_pattern
+ # we will be setting token lengths as we go
+ $cumulative_length = 0;
-sub make_static_side_comment_pattern {
+ $Ktoken_vars = undef; # the old K value of $rtoken_vars
+ $Kfirst_old = undef; # min K of old line
+ $Klast_old = undef; # max K of old line
+ $Klast_old_code = undef; # K of last token if side comment
+ $CODE_type = EMPTY_STRING;
- # create the pattern used to identify static side comments
- $static_side_comment_pattern = '^##';
+ # Set the whitespace flags, which indicate the token spacing preference.
+ $rwhitespace_flags = $self->set_whitespace_flags();
- # allow the user to change it
- if ( $rOpts->{'static-side-comment-prefix'} ) {
- my $prefix = $rOpts->{'static-side-comment-prefix'};
- $prefix =~ s/^\s*//;
- my $pattern = '^' . $prefix;
- if ( bad_pattern($pattern) ) {
- Die(
-"ERROR: the -sscp prefix '$prefix' causes the invalid regex '$pattern'\n"
- );
- }
- $static_side_comment_pattern = $pattern;
- }
- return;
-} ## end sub make_static_side_comment_pattern
+ # Note that $K_opening_container and $K_closing_container have values
+ # defined in sub get_line() for the previous K indexes. They were needed
+ # in case option 'indent-only' was set, and we didn't get here. We no
+ # longer need those and will eliminate them now to avoid any possible
+ # mixing of old and new values. This must be done AFTER the call to
+ # set_whitespace_flags, which needs these.
+ $K_opening_container = $self->[_K_opening_container_] = {};
+ $K_closing_container = $self->[_K_closing_container_] = {};
-sub make_closing_side_comment_prefix {
+ @K_sequenced_token_list = ();
- # Be sure we have a valid closing side comment prefix
- my $csc_prefix = $rOpts->{'closing-side-comment-prefix'};
- my $csc_prefix_pattern;
- if ( !defined($csc_prefix) ) {
- $csc_prefix = '## end';
- $csc_prefix_pattern = '^##\s+end';
- }
- else {
- my $test_csc_prefix = $csc_prefix;
- if ( $test_csc_prefix !~ /^#/ ) {
- $test_csc_prefix = '#' . $test_csc_prefix;
- }
+ # array for saving seqno's of ')->' for possible line breaks, git #171
+ @seqno_paren_arrow = ();
- # make a regex to recognize the prefix
- my $test_csc_prefix_pattern = $test_csc_prefix;
+ return;
- # escape any special characters
- $test_csc_prefix_pattern =~ s/([^#\s\w])/\\$1/g;
+} ## end sub initialize_respace_tokens_closure
- $test_csc_prefix_pattern = '^' . $test_csc_prefix_pattern;
+sub respace_tokens {
- # allow exact number of intermediate spaces to vary
- $test_csc_prefix_pattern =~ s/\s+/\\s\+/g;
+ my $self = shift;
- # make sure we have a good pattern
- # if we fail this we probably have an error in escaping
- # characters.
+ # This routine is called once per file to do as much formatting as possible
+ # before new line breaks are set.
- if ( bad_pattern($test_csc_prefix_pattern) ) {
+ # Returns:
+ # $severe_error = true if processing must terminate immediately
+ # $rqw_lines = ref to list of lines with qw quotes (for -qwaf)
+ my ( $severe_error, $rqw_lines );
- # shouldn't happen..must have screwed up escaping, above
- if (DEVEL_MODE) {
- Fault(<<EOM);
-Program Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'
-EOM
- }
+ # We do not change any spaces in --indent-only mode
+ if ( $rOpts->{'indent-only'} ) {
- # just warn and keep going with defaults
- Warn(
-"Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'\n"
- );
- Warn("Please consider using a simpler -cscp prefix\n");
- Warn("Using default -cscp instead; please check output\n");
- }
- else {
- $csc_prefix = $test_csc_prefix;
- $csc_prefix_pattern = $test_csc_prefix_pattern;
+ # We need to define lengths for -indent-only to avoid undefs, even
+ # though these values are not actually needed for option --indent-only.
+
+ $rLL = $self->[_rLL_];
+ $cumulative_length = 0;
+
+ foreach my $item ( @{$rLL} ) {
+ my $token = $item->[_TOKEN_];
+ my $token_length =
+ $length_function ? $length_function->($token) : length($token);
+ $cumulative_length += $token_length;
+ $item->[_TOKEN_LENGTH_] = $token_length;
+ $item->[_CUMULATIVE_LENGTH_] = $cumulative_length;
}
+
+ return ( $severe_error, $rqw_lines );
}
- $rOpts->{'closing-side-comment-prefix'} = $csc_prefix;
- $closing_side_comment_prefix_pattern = $csc_prefix_pattern;
- return;
-} ## end sub make_closing_side_comment_prefix
-##################################################
-# CODE SECTION 4: receive lines from the tokenizer
-##################################################
+ # This routine makes all necessary and possible changes to the tokenization
+ # after the initial tokenization of the file. This is a tedious routine,
+ # but basically it consists of inserting and deleting whitespace between
+ # nonblank tokens according to the selected parameters. In a few cases
+ # non-space characters are added, deleted or modified.
-{ ## begin closure write_line
+ # The goal of this routine is to create a new token array which only needs
+ # the definition of new line breaks and padding to complete formatting. In
+ # a few cases we have to cheat a little to achieve this goal. In
+ # particular, we may not know if a semicolon will be needed, because it
+ # depends on how the line breaks go. To handle this, we include the
+ # semicolon as a 'phantom' which can be displayed as normal or as an empty
+ # string.
- my $nesting_depth;
+ # Method: The old tokens are copied one-by-one, with changes, from the old
+ # linear storage array $rLL to a new array $rLL_new.
- # Variables used by sub check_sequence_numbers:
- my $last_seqno;
- my %saw_opening_seqno;
- my %saw_closing_seqno;
- my $initial_seqno;
+ # (re-)initialize closure variables for this problem
+ $self->initialize_respace_tokens_closure();
- sub initialize_write_line {
+ #--------------------------------
+ # Main over all lines of the file
+ #--------------------------------
+ my $rlines = $self->[_rlines_];
+ my $line_type = EMPTY_STRING;
+ my $last_K_out;
- $nesting_depth = undef;
+ foreach my $line_of_tokens ( @{$rlines} ) {
- $last_seqno = SEQ_ROOT;
- %saw_opening_seqno = ();
- %saw_closing_seqno = ();
+ my $input_line_number = $line_of_tokens->{_line_number};
+ my $last_line_type = $line_type;
+ $line_type = $line_of_tokens->{_line_type};
+ next unless ( $line_type eq 'CODE' );
+ $CODE_type = $line_of_tokens->{_code_type};
- return;
- } ## end sub initialize_write_line
+ if ( $CODE_type eq 'BL' ) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno) ) {
+ $self->[_rblank_and_comment_count_]->{$seqno} += 1;
+ if ( !$ris_permanently_broken->{$seqno}
+ && $rOpts_maximum_consecutive_blank_lines )
+ {
+ $ris_permanently_broken->{$seqno} = 1;
+ $self->mark_parent_containers( $seqno,
+ $ris_permanently_broken );
+ }
+ }
+ }
- sub check_sequence_numbers {
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast ) = @{$rK_range};
+ next unless defined($Kfirst);
+ ( $Kfirst_old, $Klast_old ) = ( $Kfirst, $Klast );
+ $Klast_old_code = $Klast_old;
- # Routine for checking sequence numbers. This only needs to be
- # done occasionally in DEVEL_MODE to be sure everything is working
- # correctly.
- my ( $rtokens, $rtoken_type, $rtype_sequence, $input_line_no ) = @_;
- my $jmax = @{$rtokens} - 1;
- return unless ( $jmax >= 0 );
- foreach my $j ( 0 .. $jmax ) {
- my $seqno = $rtype_sequence->[$j];
- my $token = $rtokens->[$j];
- my $type = $rtoken_type->[$j];
- $seqno = EMPTY_STRING unless ( defined($seqno) );
- my $err_msg =
-"Error at j=$j, line number $input_line_no, seqno='$seqno', type='$type', tok='$token':\n";
+ # Be sure an old K value is defined for sub store_token
+ $Ktoken_vars = $Kfirst;
- if ( !$seqno ) {
+ # Check for correct sequence of token indexes...
+ # An error here means that sub write_line() did not correctly
+ # package the tokenized lines as it received them. If we
+ # get a fault here it has not output a continuous sequence
+ # of K values. Or a line of CODE may have been mis-marked as
+ # something else. There is no good way to continue after such an
+ # error.
+ if ( defined($last_K_out) ) {
+ if ( $Kfirst != $last_K_out + 1 ) {
+ Fault_Warn(
+ "Program Bug: last K out was $last_K_out but Kfirst=$Kfirst"
+ );
+ $severe_error = 1;
+ return ( $severe_error, $rqw_lines );
+ }
+ }
+ else {
- # Sequence numbers are generated for opening tokens, so every opening
- # token should be sequenced. Closing tokens will be unsequenced
- # if they do not have a matching opening token.
- if ( $is_opening_sequence_token{$token}
- && $type ne 'q'
- && $type ne 'Q' )
- {
- Fault(
- <<EOM
-$err_msg Unexpected opening token without sequence number
-EOM
- );
- }
+ # The first token should always have been given index 0 by sub
+ # write_line()
+ if ( $Kfirst != 0 ) {
+ Fault("Program Bug: first K is $Kfirst but should be 0");
}
- else {
+ }
+ $last_K_out = $Klast;
- # Save starting seqno to identify sequence method:
- # New method starts with 2 and has continuous numbering
- # Old method starts with >2 and may have gaps
- if ( !defined($initial_seqno) ) { $initial_seqno = $seqno }
+ # Handle special lines of code
+ if ( $CODE_type && $CODE_type ne 'NIN' && $CODE_type ne 'VER' ) {
- if ( $is_opening_sequence_token{$token} ) {
+ # CODE_types are as follows.
+ # 'BL' = Blank Line
+ # 'VB' = Verbatim - line goes out verbatim
+ # 'FS' = Format Skipping - line goes out verbatim, no blanks
+ # 'IO' = Indent Only - only indentation may be changed
+ # 'NIN' = No Internal Newlines - line does not get broken
+ # 'HSC'=Hanging Side Comment - fix this hanging side comment
+ # 'BC'=Block Comment - an ordinary full line comment
+ # 'SBC'=Static Block Comment - a block comment which does not get
+ # indented
+ # 'SBCX'=Static Block Comment Without Leading Space
+ # 'VER'=VERSION statement
+ # '' or (undefined) - no restrictions
- # New method should have continuous numbering
- if ( $initial_seqno == 2 && $seqno != $last_seqno + 1 ) {
- Fault(
- <<EOM
-$err_msg Unexpected opening sequence number: previous seqno=$last_seqno, but seqno= $seqno
-EOM
- );
- }
- $last_seqno = $seqno;
+ # Copy tokens unchanged
+ foreach my $KK ( $Kfirst .. $Klast ) {
+ $Ktoken_vars = $KK;
+ $self->store_token( $rLL->[$KK] );
+ }
+ next;
+ }
- # Numbers must be unique
- if ( $saw_opening_seqno{$seqno} ) {
- my $lno = $saw_opening_seqno{$seqno};
- Fault(
- <<EOM
-$err_msg Already saw an opening tokens at line $lno with this sequence number
-EOM
- );
- }
- $saw_opening_seqno{$seqno} = $input_line_no;
- }
+ # Handle normal line..
- # only one closing item per seqno
- elsif ( $is_closing_sequence_token{$token} ) {
- if ( $saw_closing_seqno{$seqno} ) {
- my $lno = $saw_closing_seqno{$seqno};
- Fault(
- <<EOM
-$err_msg Already saw a closing token with this seqno at line $lno
-EOM
- );
- }
- $saw_closing_seqno{$seqno} = $input_line_no;
+ # Define index of last token before any side comment for comma counts
+ my $type_end = $rLL->[$Klast_old_code]->[_TYPE_];
+ if ( ( $type_end eq '#' || $type_end eq 'b' )
+ && $Klast_old_code > $Kfirst_old )
+ {
+ $Klast_old_code--;
+ if ( $rLL->[$Klast_old_code]->[_TYPE_] eq 'b'
+ && $Klast_old_code > $Kfirst_old )
+ {
+ $Klast_old_code--;
+ }
+ }
- # Every closing seqno must have an opening seqno
- if ( !$saw_opening_seqno{$seqno} ) {
- Fault(
- <<EOM
-$err_msg Saw a closing token but no opening token with this seqno
-EOM
- );
- }
- }
+ # Insert any essential whitespace between lines
+ # if last line was normal CODE.
+ # Patch for rt #125012: use K_previous_code rather than '_nonblank'
+ # because comments may disappear.
+ # Note that we must do this even if --noadd-whitespace is set
+ if ( $last_line_type eq 'CODE' ) {
+ if (
+ is_essential_whitespace(
+ $last_last_nonblank_code_token,
+ $last_last_nonblank_code_type,
+ $last_nonblank_code_token,
+ $last_nonblank_code_type,
+ $rLL->[$Kfirst]->[_TOKEN_],
+ $rLL->[$Kfirst]->[_TYPE_],
+ )
+ )
+ {
+ $self->store_token();
+ }
+ }
- # Sequenced items must be opening or closing
- else {
- Fault(
- <<EOM
-$err_msg Unexpected token type with a sequence number
-EOM
- );
- }
+ #-----------------------------------------------
+ # Inner loop to respace tokens on a line of code
+ #-----------------------------------------------
+
+ # The inner loop is in a separate sub for clarity
+ $self->respace_tokens_inner_loop( $Kfirst, $Klast, $input_line_number );
+
+ if ( $line_of_tokens->{_ending_in_quote} ) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno) ) {
+ $ris_permanently_broken->{$seqno} = 1;
+ $self->mark_parent_containers( $seqno,
+ $ris_permanently_broken );
}
}
- return;
- } ## end sub check_sequence_numbers
+ } # End line loop
- sub store_block_type {
- my ( $self, $block_type, $seqno ) = @_;
+ # finalize data structures
+ $self->respace_post_loop_ops();
- return if ( !$block_type );
+ # Reset memory to be the new array
+ $self->[_rLL_] = $rLL_new;
+ my $Klimit;
+ if ( @{$rLL_new} ) { $Klimit = @{$rLL_new} - 1 }
+ $self->[_Klimit_] = $Klimit;
- $self->[_rblock_type_of_seqno_]->{$seqno} = $block_type;
+ # During development, verify that the new array still looks okay.
+ DEVEL_MODE && $self->check_token_array();
- if ( $block_type =~ /$ASUB_PATTERN/ ) {
- $self->[_ris_asub_block_]->{$seqno} = 1;
- }
- elsif ( $block_type =~ /$SUB_PATTERN/ ) {
- $self->[_ris_sub_block_]->{$seqno} = 1;
- }
- return;
- } ## end sub store_block_type
+ # update the token limits of each line
+ ( $severe_error, $rqw_lines ) = $self->resync_lines_and_tokens();
- sub write_line {
+ return ( $severe_error, $rqw_lines );
+} ## end sub respace_tokens
- # This routine receives lines one-by-one from the tokenizer and stores
- # them in a format suitable for further processing. After the last
- # line has been sent, the tokenizer will call sub 'finish_formatting'
- # to do the actual formatting.
+sub respace_tokens_inner_loop {
- my ( $self, $line_of_tokens_old ) = @_;
+ my ( $self, $Kfirst, $Klast, $input_line_number ) = @_;
- my $rLL = $self->[_rLL_];
- my $line_of_tokens = {};
- foreach (
- qw(
- _curly_brace_depth
- _ending_in_quote
- _guessed_indentation_level
- _line_number
- _line_text
- _line_type
- _paren_depth
- _quote_character
- _square_bracket_depth
- _starting_in_quote
- )
- )
- {
- $line_of_tokens->{$_} = $line_of_tokens_old->{$_};
- }
+ # Loop to copy all tokens on one line, making any spacing changes,
+ # while also collecting information needed by later subs.
- my $line_type = $line_of_tokens_old->{_line_type};
- my $tee_output;
+ # Given:
+ # $Kfirst = index of first token on this line
+ # $Klast = index of last token on this line
+ # $input_line_number = number of this line in input stream
- my $Klimit = $self->[_Klimit_];
- my $Kfirst;
+ my $type;
+ foreach my $KK ( $Kfirst .. $Klast ) {
- # Handle line of non-code
- if ( $line_type ne 'CODE' ) {
- $tee_output ||= $rOpts_tee_pod
- && substr( $line_type, 0, 3 ) eq 'POD';
+ # Update closure variable needed by sub store_token
+ $Ktoken_vars = $KK;
- $line_of_tokens->{_level_0} = 0;
- $line_of_tokens->{_ci_level_0} = 0;
- $line_of_tokens->{_nesting_blocks_0} = EMPTY_STRING;
- $line_of_tokens->{_nesting_tokens_0} = EMPTY_STRING;
- $line_of_tokens->{_ended_in_blank_token} = undef;
+ my $rtoken_vars = $rLL->[$KK];
+
+ # Handle a blank space ...
+ if ( ( $type = $rtoken_vars->[_TYPE_] ) eq 'b' ) {
+
+ # Delete it if not wanted by whitespace rules
+ # or we are deleting all whitespace
+ # Note that whitespace flag is a flag indicating whether a
+ # white space BEFORE the token is needed
+ next if ( $KK >= $Klast ); # skip terminal blank
+ my $Knext = $KK + 1;
+
+ if ($rOpts_freeze_whitespace) {
+ $self->store_token($rtoken_vars);
+ next;
+ }
+
+ my $ws = $rwhitespace_flags->[$Knext];
+ if ( $ws == WS_NO
+ || $rOpts_delete_old_whitespace )
+ {
+
+ my $token_next = $rLL->[$Knext]->[_TOKEN_];
+ my $type_next = $rLL->[$Knext]->[_TYPE_];
+
+ my $do_not_delete = is_essential_whitespace(
+ $last_last_nonblank_code_token,
+ $last_last_nonblank_code_type,
+ $last_nonblank_code_token,
+ $last_nonblank_code_type,
+ $token_next,
+ $type_next,
+ );
+
+ # Note that repeated blanks will get filtered out here
+ next unless ($do_not_delete);
+ }
+ # make it just one character
+ $rtoken_vars->[_TOKEN_] = SPACE;
+ $self->store_token($rtoken_vars);
+ next;
}
- # Handle line of code
- else {
+ my $token = $rtoken_vars->[_TOKEN_];
- my $rtokens = $line_of_tokens_old->{_rtokens};
- my $jmax = @{$rtokens} - 1;
+ # Handle a sequenced token ... i.e. one of ( ) { } [ ] ? :
+ if ( $rtoken_vars->[_TYPE_SEQUENCE_] ) {
- if ( $jmax >= 0 ) {
+ # One of ) ] } ...
+ if ( $is_closing_token{$token} ) {
- $Kfirst = defined($Klimit) ? $Klimit + 1 : 0;
+ my $type_sequence = $rtoken_vars->[_TYPE_SEQUENCE_];
+ my $block_type = $rblock_type_of_seqno->{$type_sequence};
- #----------------------------
- # get the tokens on this line
- #----------------------------
- $self->write_line_inner_loop( $line_of_tokens_old,
- $line_of_tokens );
+ #---------------------------------------------
+ # check for semicolon addition in a code block
+ #---------------------------------------------
+ if ($block_type) {
- # update Klimit for added tokens
- $Klimit = @{$rLL} - 1;
+ # if not preceded by a ';' ..
+ if ( $last_nonblank_code_type ne ';' ) {
- } ## end if ( $jmax >= 0 )
- else {
+ # tentatively insert a semicolon if appropriate
+ $self->add_phantom_semicolon($KK)
+ if $rOpts->{'add-semicolons'};
+ }
- # blank line
- $line_of_tokens->{_level_0} = 0;
- $line_of_tokens->{_ci_level_0} = 0;
- $line_of_tokens->{_nesting_blocks_0} = EMPTY_STRING;
- $line_of_tokens->{_nesting_tokens_0} = EMPTY_STRING;
- $line_of_tokens->{_ended_in_blank_token} = undef;
+ if ( $ris_sub_block->{$type_sequence}
+ || $ris_asub_block->{$type_sequence} )
+ {
+ $current_sub_seqno = pop @sub_seqno_stack;
+ }
+ }
- }
+ #----------------------------------------------------------
+ # check for addition/deletion of a trailing comma in a list
+ #----------------------------------------------------------
+ else {
- $tee_output ||=
- $rOpts_tee_block_comments
- && $jmax == 0
- && $rLL->[$Kfirst]->[_TYPE_] eq '#';
+ # if this looks like a list ..
+ my $rtype_count = $rtype_count_by_seqno->{$type_sequence};
+ if ( !$rtype_count
+ || !$rtype_count->{';'} && !$rtype_count->{'f'} )
+ {
- $tee_output ||=
- $rOpts_tee_side_comments
- && defined($Kfirst)
- && $Klimit > $Kfirst
- && $rLL->[$Klimit]->[_TYPE_] eq '#';
+ # if NOT preceded by a comma..
+ if ( $last_nonblank_code_type ne ',' ) {
- } ## end if ( $line_type eq 'CODE')
+ # insert a comma if requested
+ if (
+ $rOpts_add_trailing_commas
+ && %trailing_comma_rules
- # Finish storing line variables
- $line_of_tokens->{_rK_range} = [ $Kfirst, $Klimit ];
- $self->[_Klimit_] = $Klimit;
- my $rlines = $self->[_rlines_];
- push @{$rlines}, $line_of_tokens;
+ # and...
+ && (
- if ($tee_output) {
- my $fh_tee = $self->[_fh_tee_];
- my $line_text = $line_of_tokens_old->{_line_text};
- $fh_tee->print($line_text) if ($fh_tee);
- }
+ # ... there is a comma or fat_comma
+ $rtype_count
+ && ( $rtype_count->{','}
+ || $rtype_count->{'=>'} )
+
+ # ... or exception for nested container
+ || (
+ $rOpts_add_lone_trailing_commas
+ && $is_closing_type{
+ $last_nonblank_code_type}
+ )
+ )
- return;
- } ## end sub write_line
+ # and not preceded by '=>'
+ # (unusual but can occur in test files)
+ && $last_nonblank_code_type ne '=>'
+ )
+ {
+ my $rule = $trailing_comma_rules{add};
+ if ( $rule && $rule->{$token} ) {
+ $self->add_trailing_comma( $KK, $Kfirst,
+ $rule->{$token} );
+ }
+ }
+ }
- sub write_line_inner_loop {
- my ( $self, $line_of_tokens_old, $line_of_tokens ) = @_;
+ # if preceded by a comma ..
+ else {
- #---------------------------------------------------------------------
- # Copy the tokens on one line received from the tokenizer to their new
- # storage locations.
- #---------------------------------------------------------------------
+ # delete a trailing comma if requested
+ my $deleted;
+ if (
+ $rOpts_delete_trailing_commas
+ && %trailing_comma_rules
+ && $rtype_count
+ && $rtype_count->{','}
+ && ( $rOpts_delete_lone_trailing_commas
+ || $rtype_count->{','} > 1
+ || $rtype_count->{'=>'} )
+
+ # ignore zero-size qw commas
+ && $last_nonblank_code_token
+ )
+ {
+ my $rule = $trailing_comma_rules{delete};
+ if ( $rule && $rule->{$token} ) {
+ $deleted =
+ $self->delete_trailing_comma( $KK,
+ $Kfirst, $rule->{$token} );
+ }
+ }
- # Input parameters:
- # $line_of_tokens_old = line received from tokenizer
- # $line_of_tokens = line of tokens being formed for formatter
+ # delete a weld-interfering comma if requested
+ if ( !$deleted
+ && $rOpts_delete_weld_interfering_commas
+ && $is_closing_type{
+ $last_last_nonblank_code_type} )
+ {
+ $self->delete_weld_interfering_comma($KK);
+ }
+ }
+ }
+ }
+ }
- my $rtokens = $line_of_tokens_old->{_rtokens};
- my $jmax = @{$rtokens} - 1;
- if ( $jmax < 0 ) {
+ # Opening container
+ else {
+ my $type_sequence = $rtoken_vars->[_TYPE_SEQUENCE_];
+ if ( $rwant_arrow_before_seqno->{$type_sequence} ) {
- # safety check; shouldn't happen
- DEVEL_MODE && Fault("unexpected jmax=$jmax\n");
- return;
- }
+ # +1 means add -1 means delete previous arrow
+ if ( $rwant_arrow_before_seqno->{$type_sequence} > 0 ) {
+ $self->add_interbracket_arrow();
+ }
+ else {
+ $self->delete_interbracket_arrow();
+ $rwhitespace_flags->[$KK] = WS_NO;
+ }
+ }
- my $line_index = $line_of_tokens_old->{_line_number} - 1;
- my $rtoken_type = $line_of_tokens_old->{_rtoken_type};
- my $rblock_type = $line_of_tokens_old->{_rblock_type};
- my $rtype_sequence = $line_of_tokens_old->{_rtype_sequence};
- my $rlevels = $line_of_tokens_old->{_rlevels};
- my $rci_levels = $line_of_tokens_old->{_rci_levels};
+ # Save info for sub call arg count check
+ if ( $token eq '(' ) {
+ if (
- my $rLL = $self->[_rLL_];
- my $rSS = $self->[_rSS_];
- my $rdepth_of_opening_seqno = $self->[_rdepth_of_opening_seqno_];
+ # function(
+ $last_nonblank_code_type eq 'U'
+ || $last_nonblank_code_type eq 'w'
- DEVEL_MODE
- && check_sequence_numbers( $rtokens, $rtoken_type,
- $rtype_sequence, $line_index + 1 );
+ # ->function(
+ || ( $last_nonblank_code_type eq 'i'
+ && $last_last_nonblank_code_type eq '->' )
- # Find the starting nesting depth ...
- # It must be the value of variable 'level' of the first token
- # because the nesting depth is used as a token tag in the
- # vertical aligner and is compared to actual levels.
- # So vertical alignment problems will occur with any other
- # starting value.
- if ( !defined($nesting_depth) ) {
- $nesting_depth = $rlevels->[0];
- $nesting_depth = 0 if ( $nesting_depth < 0 );
- $rdepth_of_opening_seqno->[SEQ_ROOT] = $nesting_depth - 1;
- }
+ # &function(
+ || ( $last_nonblank_code_type eq 'i'
+ && substr( $last_nonblank_code_token, 0, 1 ) eq
+ '&' )
+ )
+ {
+ $rsub_call_paren_info_by_seqno->{$type_sequence} = {
+ type_mm => $last_last_nonblank_code_type,
+ token_m => $last_nonblank_code_token,
+ };
+ }
+ }
+
+ # At a sub block, save info to cross check arg counts
+ elsif ( $ris_sub_block->{$type_sequence} ) {
+ $rK_sub_by_seqno->{$type_sequence} = $K_last_S;
+ if ($K_last_S_is_my) {
+ $ris_my_sub_by_seqno->{$type_sequence} = 1;
+ }
+ push @sub_seqno_stack, $current_sub_seqno;
+ $current_sub_seqno = $type_sequence;
+ }
+ elsif ( $ris_asub_block->{$type_sequence} ) {
+ push @sub_seqno_stack, $current_sub_seqno;
+ $current_sub_seqno = $type_sequence;
+ }
- my $j = -1;
+ # Look for '$_[' for mismatched arg checks
+ elsif ($token eq '['
+ && $last_nonblank_code_token eq '$_'
+ && $current_sub_seqno )
+ {
+ push
+ @{ $rDOLLAR_underscore_by_sub_seqno->{$current_sub_seqno}
+ },
+ $type_sequence;
+ }
+ else {
+ ## not a special opening token
+ }
+ }
+ }
- # NOTE: coding efficiency is critical in this loop over all tokens
- foreach my $token ( @{$rtokens} ) {
+ # Modify certain tokens here for whitespace
+ # The following is not yet done, but could be:
+ # sub (x x x)
+ # ( $type =~ /^[wit]$/ )
+ elsif ( $is_wit{$type} ) {
- # Do not clip the 'level' variable yet. We will do this
- # later, in sub 'store_token_to_go'. The reason is that in
- # files with level errors, the logic in 'weld_cuddled_else'
- # uses a stack logic that will give bad welds if we clip
- # levels here.
- ## $j++;
- ## if ( $rlevels->[$j] < 0 ) { $rlevels->[$j] = 0 }
+ # index() is several times faster than a regex test with \s here
+ ## $token =~ /\s/
+ if ( index( $token, SPACE ) > 0 || index( $token, "\t" ) > 0 ) {
- my $seqno = EMPTY_STRING;
+ # change '$ var' to '$var' etc
+ # change '@ ' to '@'
+ # Examples: <<snippets/space1.in>>
+ my $ord = ord( substr( $token, 1, 1 ) );
+ if (
- # Handle tokens with sequence numbers ...
- # note the ++ increment hidden here for efficiency
- if ( $rtype_sequence->[ ++$j ] ) {
- $seqno = $rtype_sequence->[$j];
- my $sign = 1;
- if ( $is_opening_token{$token} ) {
- $self->[_K_opening_container_]->{$seqno} = @{$rLL};
- $rdepth_of_opening_seqno->[$seqno] = $nesting_depth;
- $nesting_depth++;
+ # quick test for possible blank at second char
+ $ord > 0 && ( $ord < ORD_PRINTABLE_MIN
+ || $ord > ORD_PRINTABLE_MAX )
+ )
+ {
+ my ( $sigil, $word ) = split /\s+/, $token, 2;
- # Save a sequenced block type at its opening token.
- # Note that unsequenced block types can occur in
- # unbalanced code with errors but are ignored here.
- $self->store_block_type( $rblock_type->[$j], $seqno )
- if ( $rblock_type->[$j] );
+ # $sigil =~ /^[\$\&\%\*\@]$/ )
+ if ( $is_sigil{$sigil} ) {
+ $token = $sigil;
+ $token .= $word if ( defined($word) ); # fix c104
+ $rtoken_vars->[_TOKEN_] = $token;
+ }
}
- elsif ( $is_closing_token{$token} ) {
- # The opening depth should always be defined, and
- # it should equal $nesting_depth-1. To protect
- # against unforseen error conditions, however, we
- # will check this and fix things if necessary. For
- # a test case see issue c055.
- my $opening_depth = $rdepth_of_opening_seqno->[$seqno];
- if ( !defined($opening_depth) ) {
- $opening_depth = $nesting_depth - 1;
- $opening_depth = 0 if ( $opening_depth < 0 );
- $rdepth_of_opening_seqno->[$seqno] = $opening_depth;
+ # trim identifiers of trailing blanks which can occur
+ # under some unusual circumstances, such as if the
+ # identifier 'witch' has trailing blanks on input here:
+ #
+ # sub
+ # witch
+ # () # prototype may be on new line ...
+ # ...
+ my $ord_ch = ord( substr( $token, -1, 1 ) );
+ if (
- # This is not fatal but should not happen. The
- # tokenizer generates sequence numbers
- # incrementally upon encountering each new
- # opening token, so every positive sequence
- # number should correspond to an opening token.
- DEVEL_MODE && Fault(<<EOM);
-No opening token seen for closing token = '$token' at seq=$seqno at depth=$opening_depth
-EOM
- }
- $self->[_K_closing_container_]->{$seqno} = @{$rLL};
- $nesting_depth = $opening_depth;
- $sign = -1;
- }
- elsif ( $token eq '?' ) {
+ # quick check for possible ending space
+ $ord_ch > 0 && ( $ord_ch < ORD_PRINTABLE_MIN
+ || $ord_ch > ORD_PRINTABLE_MAX )
+ )
+ {
+ $token =~ s/\s+$//g;
+ $rtoken_vars->[_TOKEN_] = $token;
}
- elsif ( $token eq ':' ) {
- $sign = -1;
+ }
+ if ( $type eq 'i' ) {
+ if ( $token eq '@_' && $current_sub_seqno ) {
+
+ # remember the new K of this @_; this may be
+ # off by 1 if a blank gets inserted before it
+ push
+ @{ $rK_AT_underscore_by_sub_seqno->{$current_sub_seqno} },
+ scalar( @{$rLL_new} );
}
- # The only sequenced types output by the tokenizer are
- # the opening & closing containers and the ternary
- # types. So we would only get here if the tokenizer has
- # been changed to mark some other tokens with sequence
- # numbers, or if an error has been introduced in a
- # hash such as %is_opening_container
- else {
- DEVEL_MODE && Fault(<<EOM);
-Unexpected sequenced token '$token' of type '$rtoken_type->[$j]', sequence=$seqno arrived from tokenizer.
-Expecting only opening or closing container tokens or ternary tokens with sequence numbers.
-EOM
+ # Remember new K of the first '$self' in a sub for -dma option
+ if ( $token eq '$self' && $current_sub_seqno ) {
+ $rK_first_self_by_sub_seqno->{$current_sub_seqno} ||=
+ scalar( @{$rLL_new} );
}
- if ( $sign > 0 ) {
- $self->[_Iss_opening_]->[$seqno] = @{$rSS};
+ # Remember new K and name of blessed objects for -dma option
+ if (
+ (
+ $last_nonblank_code_token eq 'bless'
+ && $last_nonblank_code_type eq 'k'
+ )
+ || (
+ $last_last_nonblank_code_token eq 'bless'
+ && $last_last_nonblank_code_type eq 'k'
+ && (
- # For efficiency, we find the maximum level of
- # opening tokens of any type. The actual maximum
- # level will be that of their contents which is 1
- # greater. That will be fixed in sub
- # 'finish_formatting'.
- my $level = $rlevels->[$j];
- if ( $level > $self->[_maximum_level_] ) {
- $self->[_maximum_level_] = $level;
- $self->[_maximum_level_at_line_] = $line_index + 1;
- }
+ $last_nonblank_code_token eq 'my'
+ || $last_nonblank_code_token eq '('
+ )
+ )
+ )
+ {
+ push @{ $rK_bless_by_sub_seqno->{$current_sub_seqno} },
+ [ scalar( @{$rLL_new} ), $token ];
}
- else { $self->[_Iss_closing_]->[$seqno] = @{$rSS} }
- push @{$rSS}, $sign * $seqno;
+ }
+ elsif ( $type eq 'w' ) {
+ if ( $last_nonblank_code_token eq 'use'
+ && $last_nonblank_code_type eq 'k' )
+ {
+ if ( $token eq 'strict' ) { $self->[_saw_use_strict_] = 1 }
+ }
+ }
+ else {
+ # Could be something like '* STDERR' or '$ debug'
+ }
+ }
+
+ # handle keywords
+ elsif ( $type eq 'k' ) {
+ if ( $token eq 'return' ) {
+ # remember the new K of this 'return; this may be
+ # off by 1 if a blank gets inserted before it
+ push
+ @{ $rK_return_by_sub_seqno->{$current_sub_seqno} },
+ scalar( @{$rLL_new} );
}
+ if ( $token eq 'wantarray' ) {
+ push
+ @{ $rK_wantarray_by_sub_seqno->{$current_sub_seqno} },
+ scalar( @{$rLL_new} );
+ }
+ }
- my @tokary;
- @tokary[
+ # handle semicolons
+ elsif ( $type eq ';' ) {
+
+ # Remove unnecessary semicolons, but not after bare
+ # blocks, where it could be unsafe if the brace is
+ # mis-tokenized.
+ if (
+ $rOpts->{'delete-semicolons'}
+ && (
+ (
+ $last_nonblank_block_type
+ && $last_nonblank_code_type eq '}'
+ && (
+ $is_block_without_semicolon{
+ $last_nonblank_block_type}
+ || $last_nonblank_block_type =~ /$SUB_PATTERN/
+ || $last_nonblank_block_type =~ /^\w+:$/
+ )
+ )
+ || $last_nonblank_code_type eq ';'
+ )
+ )
+ {
- _TOKEN_,
- _TYPE_,
- _TYPE_SEQUENCE_,
- _LEVEL_,
- _CI_LEVEL_,
- _LINE_INDEX_,
+ # This looks like a deletable semicolon, but even if a
+ # semicolon can be deleted it is not necessarily best to do
+ # so. We apply these additional rules for deletion:
+ # - Always ok to delete a ';' at the end of a line
+ # - Never delete a ';' before a '#' because it would
+ # promote it to a block comment.
+ # - If a semicolon is not at the end of line, then only
+ # delete if it is followed by another semicolon or closing
+ # token. This includes the comment rule. It may take
+ # two passes to get to a final state, but it is a little
+ # safer. For example, keep the first semicolon here:
+ # eval { sub bubba { ok(0) }; ok(0) } || ok(1);
+ # It is not required but adds some clarity.
+ my $ok_to_delete = 1;
+ if ( $KK < $Klast ) {
+ my $Kn = $self->K_next_nonblank($KK);
+ if ( defined($Kn) && $Kn <= $Klast ) {
+ my $next_nonblank_token_type = $rLL->[$Kn]->[_TYPE_];
+ $ok_to_delete = $next_nonblank_token_type eq ';'
+ || $next_nonblank_token_type eq '}';
+ }
+ }
- ] = (
+ # do not delete only nonblank token in a file
+ else {
+ my $Kp = $self->K_previous_code( undef, $rLL_new );
+ my $Kn = $self->K_next_nonblank($KK);
+ $ok_to_delete = defined($Kn) || defined($Kp);
+ }
- $token,
- $rtoken_type->[$j],
- $seqno,
- $rlevels->[$j],
- $rci_levels->[$j],
- $line_index,
+ if ($ok_to_delete) {
+ $self->note_deleted_semicolon($input_line_number);
+ next;
+ }
+ else {
+ write_logfile_entry("Extra ';'\n");
+ }
+ }
+ }
- );
- push @{$rLL}, \@tokary;
- } ## end token loop
+ elsif ( $type eq '->' ) {
+ if ( $last_nonblank_code_token eq ')' ) {
- # Need to remember if we can trim the input line
- $line_of_tokens->{_ended_in_blank_token} = $rtoken_type->[$jmax] eq 'b';
+ # save seqno of closing paren with arrow, ')->', git #171
+ # (the paren seqno is still on the stack)
+ my $seqno_paren = $seqno_stack{$depth_next};
+ if ($seqno_paren) { push @seqno_paren_arrow, $seqno_paren }
+ }
+ }
- # Values needed by Logger
- $line_of_tokens->{_level_0} = $rlevels->[0];
- $line_of_tokens->{_ci_level_0} = $rci_levels->[0];
- $line_of_tokens->{_nesting_blocks_0} =
- $line_of_tokens_old->{_nesting_blocks_0};
- $line_of_tokens->{_nesting_tokens_0} =
- $line_of_tokens_old->{_nesting_tokens_0};
+ # delete repeated commas if requested
+ elsif ( $type eq ',' ) {
+ if ( $last_nonblank_code_type eq ','
+ && $rOpts->{'delete-repeated-commas'} )
+ {
- return;
+ # Do not delete the leading comma of a line with a side
+ # comment. This could promote the side comment to a block
+ # comment. See test 'mangle4.in'
+ my $lno = 1 + $rLL->[$KK]->[_LINE_INDEX_];
+ if ( $KK eq $Kfirst && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
+ complain(
+ "repeated comma before side comment, not deleted\n",
+ $lno );
+ }
+ else {
+ complain( "deleted repeated ','\n", $lno );
+ next;
+ }
+ }
+ elsif ($last_nonblank_code_type eq '=>'
+ && $rOpts->{'delete-repeated-commas'} )
+ {
+ my $lno = 1 + $rLL->[$KK]->[_LINE_INDEX_];
+ complain( "found '=>,' ... error?\n", $lno );
+ }
+ else {
+ # not a repeated comma type
+ }
- } ## end sub write_line_inner_loop
+ # remember input line index of first comma if -wtc is used
+ if (%trailing_comma_rules) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno)
+ && !defined( $self->[_rfirst_comma_line_index_]->{$seqno} )
+ )
+ {
+ $self->[_rfirst_comma_line_index_]->{$seqno} =
+ $rtoken_vars->[_LINE_INDEX_];
+ }
+ }
+ }
-} ## end closure write_line
+ # check a quote for problems
+ elsif ( $type eq 'Q' ) {
+ $self->check_Q( $KK, $Kfirst, $input_line_number )
+ if ( $self->[_save_logfile_] );
+ }
-#############################################
-# CODE SECTION 5: Pre-process the entire file
-#############################################
+ # Old patch to add space to something like "x10".
+ # Note: This is now done in the Tokenizer, but this code remains
+ # for reference.
+ elsif ( $type eq 'n' ) {
+ if ( substr( $token, 0, 1 ) eq 'x' && $token =~ /^x\d+/ ) {
+ $token =~ s/x/x /;
+ $rtoken_vars->[_TOKEN_] = $token;
+ if (DEVEL_MODE) {
+ Fault(<<EOM);
+Near line $input_line_number, Unexpected need to split a token '$token' - this should now be done by the Tokenizer
+EOM
+ }
+ }
+ }
-sub finish_formatting {
+ elsif ( $type eq '=>' ) {
+ if ( $last_nonblank_code_type eq '=>'
+ && $rOpts->{'delete-repeated-commas'} )
+ {
- my ( $self, $severe_error ) = @_;
+ # Check for repeated '=>'s
+ # Note that ',=>' is useful and called a winking fat comma
- # The file has been tokenized and is ready to be formatted.
- # All of the relevant data is stored in $self, ready to go.
+ # Do not delete the leading fat comma of a line with a side
+ # comment. This could promote the side comment to a block
+ # comment. See test 'mangle4.in'
+ my $lno = 1 + $rLL->[$KK]->[_LINE_INDEX_];
+ if ( $KK eq $Kfirst && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
+ complain(
+ "-repeated '=>' before side comment, not deleted\n",
+ $lno );
+ }
+ else {
+ complain( "deleted repeated '=>'\n", $lno );
+ next;
+ }
+ }
+
+ # remember input line index of first '=>' if -wtc is used
+ if (%trailing_comma_rules) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno)
+ && !defined( $self->[_rfirst_comma_line_index_]->{$seqno} )
+ )
+ {
+ $self->[_rfirst_comma_line_index_]->{$seqno} =
+ $rtoken_vars->[_LINE_INDEX_];
+ }
+ }
+ }
- # Returns:
- # true if input file was copied verbatim due to errors
- # false otherwise
+ # check for a qw quote
+ elsif ( $type eq 'q' ) {
- # Some of the code in sub break_lists is not robust enough to process code
- # with arbitrary brace errors. The simplest fix is to just return the file
- # verbatim if there are brace errors. This fixes issue c160.
- $severe_error ||= get_saw_brace_error();
+ # Trim spaces from right of qw quotes. Also trim from the left for
+ # safety (the tokenizer should have done this).
+ # To avoid trimming qw quotes use -ntqw; this causes the
+ # tokenizer to set them as type 'Q' instead of 'q'.
+ $token =~ s/^ \s+ | \s+ $//gx;
+ $rtoken_vars->[_TOKEN_] = $token;
+ if ( $self->[_save_logfile_] && $token =~ /\t/ ) {
+ $self->note_embedded_tab($input_line_number);
+ }
+ if ( $rwhitespace_flags->[$KK] == WS_YES
+ && @{$rLL_new}
+ && $rLL_new->[-1]->[_TYPE_] ne 'b'
+ && $rOpts_add_whitespace )
+ {
+ $self->store_token();
+ }
+ $self->store_token($rtoken_vars);
+ next;
+ }
- # Check the maximum level. If it is extremely large we will give up and
- # output the file verbatim. Note that the actual maximum level is 1
- # greater than the saved value, so we fix that here.
- $self->[_maximum_level_] += 1;
- my $maximum_level = $self->[_maximum_level_];
- my $maximum_table_index = $#maximum_line_length_at_level;
- if ( !$severe_error && $maximum_level >= $maximum_table_index ) {
- $severe_error ||= 1;
- Warn(<<EOM);
-The maximum indentation level, $maximum_level, exceeds the builtin limit of $maximum_table_index.
-Something may be wrong; formatting will be skipped.
-EOM
- }
+ # Remove space after '<<'. Note that perl may use a space after
+ # '<<' to guess tokenization for numeric targets. See git #174.
+ elsif ( $type eq 'h' ) {
+ if ( index( $token, SPACE ) > 0 || index( $token, "\t" ) > 0 ) {
+ if ( $token =~ /^ (\<\<\~?) \s+ ([^\d].*) $/x ) {
+ $token = $1 . $2;
+ $rtoken_vars->[_TOKEN_] = $token;
+ }
+ }
+ }
+ elsif ( $type eq 'S' ) {
- # Dump any requested block summary data
- if ( $rOpts->{'dump-block-summary'} ) {
- if ($severe_error) { Exit(1) }
- $self->dump_block_summary();
- Exit(0);
- }
+ # Trim spaces in sub definitions
- # output file verbatim if severe error or no formatting requested
- if ( $severe_error || $rOpts->{notidy} ) {
- $self->dump_verbatim();
- $self->wrapup($severe_error);
- return 1;
- }
+ # save the NEW index of this token which will normally
+ # be @{$rLL_new} plus 1 because a blank is usually inserted
+ # ahead of it. The user routine will back up if necessary.
+ # Note that an isolated prototype starting on new line will
+ # be marked as 'S' but start with '(' and must be skipped.
+ if ( substr( $token, 0, 1 ) ne '(' ) {
- # Update the 'save_logfile' flag based to include any tokenization errors.
- # We can save time by skipping logfile calls if it is not going to be saved.
- my $logger_object = $self->[_logger_object_];
- if ($logger_object) {
- my $save_logfile = $logger_object->get_save_logfile();
- $self->[_save_logfile_] = $save_logfile;
- my $file_writer_object = $self->[_file_writer_object_];
- $file_writer_object->set_save_logfile($save_logfile);
- }
+ $K_last_S = @{$rLL_new} + 1;
- {
- my $rix_side_comments = $self->set_CODE_type();
+ # also, remember if this is a 'my' sub
+ $K_last_S_is_my = $last_nonblank_code_type eq 'k'
+ && (
+ $last_nonblank_code_token eq 'my'
+ || ( $last_nonblank_code_token eq 'sub'
+ && $last_last_nonblank_code_type eq 'k'
+ && $last_last_nonblank_code_token eq 'my' )
+ );
+ }
- $self->find_non_indenting_braces($rix_side_comments);
+ # Note: an asub with prototype like this will come this way
+ # and be partially treated as a named sub
+ # sub () {
- # Handle any requested side comment deletions. It is easier to get
- # this done here rather than farther down the pipeline because IO
- # lines take a different route, and because lines with deleted HSC
- # become BL lines. We have already handled any tee requests in sub
- # getline, so it is safe to delete side comments now.
- $self->delete_side_comments($rix_side_comments)
- if ( $rOpts_delete_side_comments
- || $rOpts_delete_closing_side_comments );
- }
+ # -spp = 0 : no space before opening prototype paren
+ # -spp = 1 : stable (follow input spacing)
+ # -spp = 2 : always space before opening prototype paren
+ if ( !defined($rOpts_space_prototype_paren)
+ || $rOpts_space_prototype_paren == 1 )
+ {
+ ## default: stable
+ }
+ elsif ( $rOpts_space_prototype_paren == 0 ) {
+ $token =~ s/\s+\(/\(/;
+ }
+ elsif ( $rOpts_space_prototype_paren == 2 ) {
+ $token =~ s/\(/ (/;
+ }
+ else {
+ ## should have been caught with the integer range check
+ ## continue with the default
+ DEVEL_MODE && Fault(<<EOM);
+unexpected integer value space-prototype-paren=$rOpts_space_prototype_paren
+EOM
+ }
- # Verify that the line hash does not have any unknown keys.
- $self->check_line_hashes() if (DEVEL_MODE);
+ # one space max, and no tabs
+ $token =~ s/\s+/ /g;
+ $rtoken_vars->[_TOKEN_] = $token;
- {
- # Make a pass through all tokens, adding or deleting any whitespace as
- # required. Also make any other changes, such as adding semicolons.
- # All token changes must be made here so that the token data structure
- # remains fixed for the rest of this iteration.
- my ( $error, $rqw_lines ) = $self->respace_tokens();
- if ($error) {
- $self->dump_verbatim();
- $self->wrapup();
- return 1;
+ $self->[_ris_special_identifier_token_]->{$token} = 'sub';
}
- $self->find_multiline_qw($rqw_lines);
- }
+ # and trim spaces in package statements (added for c250)
+ elsif ( $type eq 'P' ) {
- $self->examine_vertical_tightness_flags();
+ # clean up spaces in package identifiers, like
+ # "package Bob::Dog;"
+ if ( $token =~ s/\s+/ /g ) {
+ $rtoken_vars->[_TOKEN_] = $token;
+ $self->[_ris_special_identifier_token_]->{$token} = 'package';
+ }
- $self->set_excluded_lp_containers();
+ # remember the new K of this package; this may be
+ # off by 1 if a blank gets inserted before it
+ push @{$rK_package_list}, scalar( @{$rLL_new} );
+ }
- $self->keep_old_line_breaks();
+ # change 'LABEL :' to 'LABEL:'
+ elsif ( $type eq 'J' ) {
+ $token =~ s/\s+//g;
+ $rtoken_vars->[_TOKEN_] = $token;
+ }
- # Implement any welding needed for the -wn or -cb options
- $self->weld_containers();
+ else {
+ # no special processing for this token type
+ }
- # Collect info needed to implement the -xlp style
- $self->xlp_collapsed_lengths()
- if ( $rOpts_line_up_parentheses && $rOpts_extended_line_up_parentheses );
+ # Store this token with possible previous blank
+ if ( $rwhitespace_flags->[$KK] == WS_YES
+ && @{$rLL_new}
+ && $rLL_new->[-1]->[_TYPE_] ne 'b'
+ && $rOpts_add_whitespace )
+ {
+ $self->store_token();
+ }
+ $self->store_token($rtoken_vars);
- # Locate small nested blocks which should not be broken
- $self->mark_short_nested_blocks();
+ } # End token loop
- $self->special_indentation_adjustments();
+ return;
+} ## end sub respace_tokens_inner_loop
- # Verify that the main token array looks OK. If this ever causes a fault
- # then place similar checks before the sub calls above to localize the
- # problem.
- $self->check_rLL("Before 'process_all_lines'") if (DEVEL_MODE);
+sub respace_post_loop_ops {
- # Finishes formatting and write the result to the line sink.
- # Eventually this call should just change the 'rlines' data according to the
- # new line breaks and then return so that we can do an internal iteration
- # before continuing with the next stages of formatting.
- $self->process_all_lines();
+ my ($self) = @_;
- # A final routine to tie up any loose ends
- $self->wrapup();
- return;
-} ## end sub finish_formatting
+ # We have just completed the 'respace' operation, in which we have made
+ # a pass through all tokens and set the whitespace between tokens to be
+ # according to user settings. The new tokens have been placed in the new
+ # token list '$rLL_new'. Now we have to go through this new list and
+ # define some indexes which allow quick access into it.
-my %is_loop_type;
+ return unless ( @{$rLL_new} );
-BEGIN {
- my @q = qw( for foreach while do until );
- @{is_loop_type}{@q} = (1) x scalar(@q);
-}
+ # Setup array for finding the next sequence number after any token
+ my @K_next_seqno_by_K;
+ my $K_last = 0;
+ foreach my $K (@K_sequenced_token_list) {
+ push @K_next_seqno_by_K, ($K) x ( $K - $K_last );
+ $K_last = $K;
+ }
-sub find_level_info {
+ # Note: here is the slow way to do the above loop (100 ms)
+ ## foreach my $KK ( $K_last .. $K - 1 ) {
+ ## $K_next_seqno_by_K[$KK] = $K;
+ ## }
- # Find level ranges and total variations of all code blocks in this file.
+ # This is faster (63 ms)
+ ## my @q = ( $K_last .. $K - 1 );
+ ## @K_next_seqno_by_K[@q] = ($K) x scalar(@q);
- # Returns:
- # ref to hash with block info, with seqno as key (see below)
+ # The push method above is fastest, at 37 ms in my benchmark.
- my ($self) = @_;
+ $self->[_rK_next_seqno_by_K_] = \@K_next_seqno_by_K;
+ $self->[_rK_sequenced_token_list_] = \@K_sequenced_token_list;
- # The array _rSS_ has the complete container tree for this file.
- my $rSS = $self->[_rSS_];
+ # Verify that arrays @K_sequenced_token_list and @{$rSS} are parallel
+ # arrays, meaning that they have a common array index 'I'. This index maybe
+ # be found by seqno with rI_container and rI_closing.
+ if (DEVEL_MODE) {
+ my $num_rSS = @{ $self->[_rSS_] };
+ my $num_Kseq = @K_sequenced_token_list;
- # We will be ignoring everything except code block containers
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ # If this error occurs, we have gained or lost one or more of the
+ # sequenced tokens received from the tokenizer. This should never
+ # happen.
+ if ( $num_rSS != $num_Kseq ) {
+ Fault(<<EOM);
+num_rSS= $num_rSS != num_Kseq=$num_Kseq
+EOM
+ }
+ }
- my @stack;
- my %level_info;
+ # Find and remember lists by sequence number
+ foreach my $seqno ( keys %{$K_opening_container} ) {
+ my $K_opening = $K_opening_container->{$seqno};
+ next unless defined($K_opening);
- # TREE_LOOP:
- foreach my $sseq ( @{$rSS} ) {
- my $stack_depth = @stack;
- my $seq_next = $sseq > 0 ? $sseq : -$sseq;
+ # code errors may leave undefined closing tokens
+ my $K_closing = $K_closing_container->{$seqno};
+ next unless defined($K_closing);
- next if ( !$rblock_type_of_seqno->{$seq_next} );
- if ( $sseq > 0 ) {
+ my $lx_open = $rLL_new->[$K_opening]->[_LINE_INDEX_];
+ my $lx_close = $rLL_new->[$K_closing]->[_LINE_INDEX_];
+ my $line_diff = $lx_close - $lx_open;
+ $ris_broken_container->{$seqno} = $line_diff;
- # STACK_LOOP:
- my $item;
- foreach my $seq (@stack) {
- $item = $level_info{$seq};
- if ( $item->{maximum_depth} < $stack_depth ) {
- $item->{maximum_depth} = $stack_depth;
- }
- $item->{block_count}++;
- } ## end STACK LOOP
+ # See if this is a list
+ my $is_list;
+ my $rtype_count = $rtype_count_by_seqno->{$seqno};
+ if ($rtype_count) {
+ my $comma_count = $rtype_count->{','};
+ my $fat_comma_count = $rtype_count->{'=>'};
+ my $semicolon_count = $rtype_count->{';'};
+ if ( $rtype_count->{'f'} ) {
+ $semicolon_count += $rtype_count->{'f'};
+ }
- push @stack, $seq_next;
- my $block_type = $rblock_type_of_seqno->{$seq_next};
+ # We will define a list to be a container with one or more commas
+ # and no semicolons. Note that we have included the semicolons
+ # in a 'for' container in the semicolon count to keep c-style for
+ # statements from being formatted as lists.
+ if ( ( $comma_count || $fat_comma_count ) && !$semicolon_count ) {
+ $is_list = 1;
- # If this block is a loop nested within a loop, then we
- # will mark it as an 'inner_loop'. This is a useful
- # complexity measure.
- my $is_inner_loop = 0;
- if ( $is_loop_type{$block_type} && defined($item) ) {
- $is_inner_loop = $is_loop_type{ $item->{block_type} };
+ # We need to do one more check for a parenthesized list:
+ # At an opening paren following certain tokens, such as 'if',
+ # we do not want to format the contents as a list.
+ if ( $rLL_new->[$K_opening]->[_TOKEN_] eq '(' ) {
+ my $Kp = $self->K_previous_code( $K_opening, $rLL_new );
+ if ( defined($Kp) ) {
+ my $type_p = $rLL_new->[$Kp]->[_TYPE_];
+ my $token_p = $rLL_new->[$Kp]->[_TOKEN_];
+ $is_list =
+ $type_p eq 'k'
+ ? !$is_nonlist_keyword{$token_p}
+ : !$is_nonlist_type{$type_p};
+ }
+ }
}
-
- $level_info{$seq_next} = {
- starting_depth => $stack_depth,
- maximum_depth => $stack_depth,
- block_count => 1,
- block_type => $block_type,
- is_inner_loop => $is_inner_loop,
- };
}
- else {
- my $seq_test = pop @stack;
- # error check
- if ( $seq_test != $seq_next ) {
+ # Look for a block brace marked as uncertain. If the tokenizer thinks
+ # its guess is uncertain for the type of a brace following an unknown
+ # bareword then it adds a trailing space as a signal. We can fix the
+ # type here now that we have had a better look at the contents of the
+ # container. This fixes case b1085. To find the corresponding code in
+ # Tokenizer.pm search for 'b1085' with an editor.
+ my $block_type = $rblock_type_of_seqno->{$seqno};
+ if ( $block_type && substr( $block_type, -1, 1 ) eq SPACE ) {
+
+ # Always remove the trailing space
+ $block_type =~ s/\s+$//;
+
+ # Try to filter out parenless sub calls
+ my $Knn1 = $self->K_next_nonblank( $K_opening, $rLL_new );
+ my $Knn2;
+ if ( defined($Knn1) ) {
+ $Knn2 = $self->K_next_nonblank( $Knn1, $rLL_new );
+ }
+ my $type_nn1 = defined($Knn1) ? $rLL_new->[$Knn1]->[_TYPE_] : 'b';
+ my $type_nn2 = defined($Knn2) ? $rLL_new->[$Knn2]->[_TYPE_] : 'b';
- # Shouldn't happen - the $rSS array must have an error
- DEVEL_MODE && Fault("stack error finding total depths\n");
+ # if ( $type_nn1 =~ /^[wU]$/ && $type_nn2 =~ /^[wiqQGCZ]$/ ) {
+ if ( $wU{$type_nn1} && $wiq{$type_nn2} ) {
+ $is_list = 0;
+ }
- %level_info = ();
- last;
+ # Convert to a hash brace if it looks like it holds a list
+ if ($is_list) {
+ $block_type = EMPTY_STRING;
}
+
+ $rblock_type_of_seqno->{$seqno} = $block_type;
}
- } ## end TREE_LOOP
- return \%level_info;
-} ## end sub find_level_info
-sub find_loop_label {
+ # Handle a list container
+ if ( $is_list && !$block_type ) {
+ $ris_list_by_seqno->{$seqno} = $seqno;
- my ( $self, $seqno ) = @_;
+ # Update parent container properties
+ my $depth = 0;
+ my $rparent_seqno_list = $self->get_parent_containers($seqno);
+ foreach my $seqno_parent ( @{$rparent_seqno_list} ) {
+ $depth++;
- # Given:
- # $seqno = sequence number of a block of code for a loop
- # Return:
- # $label = the loop label text, if any, or an empty string
+ # for $rhas_list we need to save the minimum depth
+ if ( !$rhas_list->{$seqno_parent}
+ || $rhas_list->{$seqno_parent} > $depth )
+ {
+ $rhas_list->{$seqno_parent} = $depth;
+ }
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
- my $K_opening_container = $self->[_K_opening_container_];
+ if ($line_diff) {
+ $rhas_broken_list->{$seqno_parent} = 1;
- my $label = EMPTY_STRING;
- my $K_opening = $K_opening_container->{$seqno};
+ # Patch1: We need to mark broken lists with non-terminal
+ # line-ending commas for the -bbx=2 parameter. This insures
+ # that the list will stay broken. Otherwise the flag
+ # -bbx=2 can be unstable. This fixes case b789 and b938.
- # backup to the line with the opening paren, if any, in case the
- # keyword is on a different line
- my $Kp = $self->K_previous_code($K_opening);
- return $label unless ( defined($Kp) );
- if ( $rLL->[$Kp]->[_TOKEN_] eq ')' ) {
- $seqno = $rLL->[$Kp]->[_TYPE_SEQUENCE_];
- $K_opening = $K_opening_container->{$seqno};
+ # Patch2: Updated to also require either one fat comma or
+ # one more line-ending comma. Fixes cases b1069 b1070
+ # b1072 b1076.
+ if (
+ $rlec_count_by_seqno->{$seqno}
+ && ( $rlec_count_by_seqno->{$seqno} > 1
+ || $rtype_count_by_seqno->{$seqno}->{'=>'} )
+ )
+ {
+ $rhas_broken_list_with_lec->{$seqno_parent} = 1;
+ }
+ }
+ }
+ }
+
+ # Handle code blocks ...
+ # The -lp option needs to know if a container holds a code block
+ elsif ( $block_type && $rOpts_line_up_parentheses ) {
+
+ # Update parent container properties
+ my $rparent_seqno_list = $self->get_parent_containers($seqno);
+ foreach my $seqno_parent ( @{$rparent_seqno_list} ) {
+ $rhas_code_block->{$seqno_parent} = 1;
+ $rhas_broken_code_block->{$seqno_parent} = $line_diff;
+ }
+ }
+ else {
+ # nothing special to do for this container token
+ }
}
- return $label unless ( defined($K_opening) );
- my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
+ # Find containers with ternaries, needed for -lp formatting.
+ foreach my $seqno ( keys %{$K_opening_ternary} ) {
- # look for a lable within a few lines; allow a couple of blank lines
- foreach my $lx ( reverse( $lx_open - 3 .. $lx_open ) ) {
- last if ( $lx < 0 );
- my $line_of_tokens = $rlines->[$lx];
- my $line_type = $line_of_tokens->{_line_type};
+ # Update parent container properties
+ $self->mark_parent_containers( $seqno, $rhas_ternary );
+ }
- # stop search on a non-code line
- last if ( $line_type ne 'CODE' );
+ # Turn off -lp for containers with here-docs with text within a container,
+ # since they have their own fixed indentation. Fixes case b1081.
+ if ($rOpts_line_up_parentheses) {
+ foreach my $seqno ( keys %K_first_here_doc_by_seqno ) {
+ my $Kh = $K_first_here_doc_by_seqno{$seqno};
+ my $Kc = $K_closing_container->{$seqno};
+ my $line_Kh = $rLL_new->[$Kh]->[_LINE_INDEX_];
+ my $line_Kc = $rLL_new->[$Kc]->[_LINE_INDEX_];
+ next if ( $line_Kh == $line_Kc );
+ $ris_excluded_lp_container->{$seqno} = 1;
+ }
+ }
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ # Set a flag to turn off -cab=3 in complex structures. Otherwise,
+ # instability can occur. When it is overridden the behavior of the closest
+ # match, -cab=2, will be used instead. This fixes cases b1096 b1113.
+ if ( $rOpts_comma_arrow_breakpoints == 3 ) {
+ foreach my $seqno ( keys %{$K_opening_container} ) {
- # skip a blank line
- next if ( !defined($Kfirst) );
+ my $rtype_count = $rtype_count_by_seqno->{$seqno};
+ next unless ( $rtype_count && $rtype_count->{'=>'} );
- # check for a lable
- if ( $rLL->[$Kfirst]->[_TYPE_] eq 'J' ) {
- $label = $rLL->[$Kfirst]->[_TOKEN_];
- last;
- }
+ # override -cab=3 if this contains a sub-list
+ if ( !defined( $roverride_cab3->{$seqno} ) ) {
+ if ( $rhas_list->{$seqno} ) {
+ $roverride_cab3->{$seqno} = 2;
+ }
- # quit the search if we are above the starting line
- last if ( $lx < $lx_open );
+ # or if this is a sub-list of its parent container
+ else {
+ my $seqno_parent = $rparent_of_seqno->{$seqno};
+ if ( defined($seqno_parent)
+ && $ris_list_by_seqno->{$seqno_parent} )
+ {
+ $roverride_cab3->{$seqno} = 2;
+ }
+ }
+ }
+ }
}
- return $label;
-} ## end sub find_loop_label
+ # Search for chains of method calls of the form (git #171)
+ # )->xxx( )->xxx( )->
+ # We have previously saved the seqno of all ')->' combinations
+ my $in_chain_seqno = 0;
+ while ( my $seqno = shift @seqno_paren_arrow ) {
-{ ## closure find_mccabe_count
- my %is_mccabe_logic_keyword;
- my %is_mccabe_logic_operator;
+ # ) -> func (
+ # ) -> func (
+ # $Kc--^ ^--$K_test
- BEGIN {
- my @q = (qw( && || ||= &&= ? <<= >>= ));
- @is_mccabe_logic_operator{@q} = (1) x scalar(@q);
+ my $Kc = $K_closing_container->{$seqno};
+ my $K_arrow = $self->K_next_nonblank( $Kc, $rLL_new );
+ my $K_func = $self->K_next_nonblank( $K_arrow, $rLL_new );
+ my $K_test = $self->K_next_nonblank( $K_func, $rLL_new );
- @q = (qw( and or xor if else elsif unless until while for foreach ));
- @is_mccabe_logic_keyword{@q} = (1) x scalar(@q);
- } ## end BEGIN
+ last if ( !defined($K_test) );
- sub find_mccabe_count {
- my ($self) = @_;
+ # ignore index operation like ')->{' or ')->[' and end any chain
+ my $tok = $rLL_new->[$K_func]->[_TOKEN_];
+ if ( $tok eq '[' || $tok eq '{' ) { $in_chain_seqno = 0; next }
- # Find the cumulative mccabe count to each token
- # Return '$rmccabe_count_sum' = ref to array with cumulative
- # mccabe count to each token $K
+ # mark seqno of parens which are part of a call chain
+ my $seqno_start = $in_chain_seqno ? $in_chain_seqno : $seqno;
+ $rseqno_arrow_call_chain_start->{$seqno} = $seqno_start;
- # NOTE: This sub currently follows the definitions in Perl::Critic
+ # save a list of the arrows, needed to set line breaks
+ push @{ $rarrow_call_chain->{$seqno_start} }, $K_arrow;
- my $rmccabe_count_sum;
- my $rLL = $self->[_rLL_];
- my $count = 0;
- my $Klimit = $self->[_Klimit_];
- foreach my $KK ( 0 .. $Klimit ) {
- $rmccabe_count_sum->{$KK} = $count;
- my $type = $rLL->[$KK]->[_TYPE_];
- if ( $type eq 'k' ) {
- my $token = $rLL->[$KK]->[_TOKEN_];
- if ( $is_mccabe_logic_keyword{$token} ) { $count++ }
- }
- elsif ( $is_mccabe_logic_operator{$type} ) {
- $count++;
- }
+ # See if this chain continues
+ if ( @seqno_paren_arrow
+ && defined($K_test)
+ && $rLL_new->[$K_test]->[_TOKEN_] eq '('
+ && $rLL_new->[$K_test]->[_TYPE_SEQUENCE_] eq $seqno_paren_arrow[0] )
+ {
+ $in_chain_seqno ||= $seqno;
}
- $rmccabe_count_sum->{ $Klimit + 1 } = $count;
- return $rmccabe_count_sum;
- } ## end sub find_mccabe_count
-} ## end closure find_mccabe_count
+ else { $in_chain_seqno = 0 }
+ } ## end while ( my $seqno = shift...)
-sub find_code_line_count {
- my ($self) = @_;
+ # For efficiency, remove chains with length < 2
+ foreach my $seqno ( keys %{$rseqno_arrow_call_chain_start} ) {
+ my $seqno_start = $rseqno_arrow_call_chain_start->{$seqno};
+ if ( @{ $rarrow_call_chain->{$seqno_start} } < 2 ) {
+ delete $rseqno_arrow_call_chain_start->{$seqno};
+ delete $rarrow_call_chain->{$seqno_start};
+ }
+ }
- # Find the cumulative number of lines of code, excluding blanks,
- # comments and pod.
- # Return '$rcode_line_count' = ref to array with cumulative
- # code line count for each input line number.
+ return;
+} ## end sub respace_post_loop_ops
- my $rcode_line_count;
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
- my $ix_line = -1;
- my $code_line_count = 0;
+sub store_token {
- # loop over all lines
- foreach my $line_of_tokens ( @{$rlines} ) {
- $ix_line++;
+ my ( $self, ($item) ) = @_;
- # what type of line?
- my $line_type = $line_of_tokens->{_line_type};
+ # Store one token during respace operations
- # if 'CODE' it must be non-blank and non-comment
- if ( $line_type eq 'CODE' ) {
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ # Given:
+ # $item =
+ # if defined => reference to a token to be stored
+ # if not defined => make and store a blank space
- if ( defined($Kfirst) ) {
+ # NOTE: this sub is called once per token so coding efficiency is critical.
- # it is non-blank
- my $jmax = defined($Kfirst) ? $Klast - $Kfirst : -1;
- if ( $jmax > 0 || $rLL->[$Klast]->[_TYPE_] ne '#' ) {
+ # If no arg, then make and store a blank space
+ if ( !$item ) {
- # ok, it is a non-comment
- $code_line_count++;
- }
- }
+ # - Never start the array with a space, and
+ # - Never store two consecutive spaces
+ if ( @{$rLL_new} && $rLL_new->[-1]->[_TYPE_] ne 'b' ) {
+
+ # Note that the level and ci_level of newly created spaces should
+ # be the same as the previous token. Otherwise the coding for the
+ # -lp option can create a blinking state in some rare cases.
+ # (see b1109, b1110).
+ $item = [];
+ $item->[_TYPE_] = 'b';
+ $item->[_TOKEN_] = SPACE;
+ $item->[_TYPE_SEQUENCE_] = EMPTY_STRING;
+ $item->[_LINE_INDEX_] = $rLL_new->[-1]->[_LINE_INDEX_];
+ $item->[_LEVEL_] = $rLL_new->[-1]->[_LEVEL_];
}
+ else { return }
+ }
- # Count all other special line types except pod;
- # For a list of line types see sub 'process_all_lines'
- elsif ( $line_type !~ /^POD/ ) { $code_line_count++ }
+ # The next multiple assignment statements are significantly faster than
+ # doing them one-by-one.
+ my (
- # Store the cumulative count using the input line index
- $rcode_line_count->[$ix_line] = $code_line_count;
- }
- return $rcode_line_count;
-} ## end sub find_code_line_count
+ $type,
+ $token,
+ $type_sequence,
-sub find_selected_packages {
+ ) = @{$item}[
- my ( $self, $rdump_block_types ) = @_;
+ _TYPE_,
+ _TOKEN_,
+ _TYPE_SEQUENCE_,
+
+ ];
- # returns a list of all package statements in a file if requested
+ # Set the token length. Later it may be adjusted again if phantom or
+ # ignoring side comment lengths. It is always okay to calculate the length
+ # with $length_function->() if it is defined, but it is extremely slow so
+ # we avoid it and use the builtin length() for printable ascii tokens.
+ # Note: non-printable ascii characters (like tab) may get different lengths
+ # by the two methods, so we have to use $length_function for them.
+ my $token_length =
+ ( $length_function
+ && !$is_ascii_type{$type}
+ && $token =~ /[[:^ascii:][:^print:]]/ )
+ ? $length_function->($token)
+ : length($token);
- unless ( $rdump_block_types->{'*'}
- || $rdump_block_types->{'package'}
- || $rdump_block_types->{'class'} )
- {
- return;
+ # handle blanks
+ if ( $type eq 'b' ) {
+
+ # Do not output consecutive blanks. This situation should have been
+ # prevented earlier, but it is worth checking because later routines
+ # make this assumption.
+ if ( @{$rLL_new} && $rLL_new->[-1]->[_TYPE_] eq 'b' ) {
+ return;
+ }
}
- my $rLL = $self->[_rLL_];
- my $Klimit = $self->[_Klimit_];
- my $rlines = $self->[_rlines_];
+ # handle comments
+ elsif ( $type eq '#' ) {
- my $K_closing_container = $self->[_K_closing_container_];
- my @package_list;
- my @package_sweep;
- foreach my $KK ( 0 .. $Klimit ) {
- my $item = $rLL->[$KK];
- my $type = $item->[_TYPE_];
- if ( $type ne 'i' ) {
- next;
+ # trim comments if necessary
+ my $ord = ord( substr( $token, -1, 1 ) );
+ if (
+ $ord > 0
+ && ( $ord < ORD_PRINTABLE_MIN
+ || $ord > ORD_PRINTABLE_MAX )
+ && $token =~ s/\s+$//
+ )
+ {
+ $token_length =
+ $length_function ? $length_function->($token) : length($token);
+ $item->[_TOKEN_] = $token;
}
- my $token = $item->[_TOKEN_];
- if ( substr( $token, 0, 7 ) eq 'package' && $token =~ /^package\s/
- || substr( $token, 0, 5 ) eq 'class' && $token =~ /^class\s/ )
+
+ my $ignore_sc_length = $rOpts_ignore_side_comment_lengths;
+
+ # Ignore length of '## no critic' comments even if -iscl is not set
+ if ( !$ignore_sc_length
+ && !$rOpts_ignore_perlcritic_comments
+ && $token_length > 10
+ && substr( $token, 1, 1 ) eq '#'
+ && $token =~ /^##\s*no\s+critic\b/ )
{
- $token =~ s/\s+/ /g;
- my ( $keyword, $name ) = split /\s+/, $token, 2;
+ # Is it a side comment or a block comment?
+ if ( $Ktoken_vars > $Kfirst_old ) {
- my $lx_start = $item->[_LINE_INDEX_];
- my $level = $item->[_LEVEL_];
- my $parent_seqno = $self->parent_seqno_by_K($KK);
+ # This is a side comment. If we do not ignore its length, and
+ # -iscl has not been set, then the line could be broken and
+ # perlcritic will complain. So this is essential:
+ $ignore_sc_length ||= 1;
- # Skip a class BLOCK because it will be handled as a block
- if ( $keyword eq 'class' ) {
- my $line_of_tokens = $rlines->[$lx_start];
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $K_first, $K_last ) = @{$rK_range};
- if ( $rLL->[$K_last]->[_TYPE_] eq '#' ) {
- $K_last = $self->K_previous_code($K_last);
- }
- if ( defined($K_last) ) {
- my $seqno_class = $rLL->[$K_last]->[_TYPE_SEQUENCE_];
- my $block_type_next =
- $self->[_rblock_type_of_seqno_]->{$seqno_class};
-
- # these block types are currently marked 'package'
- # but may be 'class' in the future, so allow both.
- if ( defined($block_type_next)
- && $block_type_next =~ /^(class|package)\b/ )
- {
- next;
- }
- }
+ # It would be a good idea to also make this behave like a
+ # static side comment, but this is not essential and would
+ # change existing formatting. So we will leave it to the user
+ # to set -ssc if desired.
}
+ else {
- my $K_closing = $Klimit;
- if ( $parent_seqno != SEQ_ROOT ) {
- my $Kc = $K_closing_container->{$parent_seqno};
- if ( defined($Kc) ) {
- $K_closing = $Kc;
- }
+ # This is a full-line (block) comment.
+ # It would be a good idea to make this behave like a static
+ # block comment, but this is not essential and would change
+ # existing formatting. So we will leave it to the user to
+ # set -sbc if desired
}
+ }
- # This package ends any previous package at this level
- if ( defined( my $ix = $package_sweep[$level] ) ) {
- my $rpk = $package_list[$ix];
- my $Kc = $rpk->{K_closing};
- if ( $Kc > $KK ) {
- $rpk->{K_closing} = $KK - 1;
- }
- }
- $package_sweep[$level] = @package_list;
+ # Set length of ignored side comments as just 1
+ if ( $ignore_sc_length && ( !$CODE_type || $CODE_type eq 'HSC' ) ) {
+ $token_length = 1;
+ }
- # max_change and block_count are not currently reported 'package'
- push @package_list,
- {
- line_start => $lx_start + 1,
- K_opening => $KK,
- K_closing => $Klimit,
- name => $name,
- type => $keyword,
- level => $level,
- max_change => 0,
- block_count => 0,
- };
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno) ) {
+ $self->[_rblank_and_comment_count_]->{$seqno} += 1
+ if ( $CODE_type eq 'BC' );
+ if ( !$ris_permanently_broken->{$seqno} ) {
+ $ris_permanently_broken->{$seqno} = 1;
+ $self->mark_parent_containers( $seqno,
+ $ris_permanently_broken );
+ }
}
}
- return \@package_list;
-} ## end sub find_selected_packages
+ # handle non-blanks and non-comments
+ else {
-sub find_selected_blocks {
+ my $block_type;
- my ( $self, $rdump_block_types ) = @_;
+ # check for a sequenced item (i.e., container or ?/:)
+ if ($type_sequence) {
- # Find blocks needed for --dump-block-summary
- # Returns:
- # $rslected_blocks = ref to a list of information on the selected blocks
+ # This will be the index of this item in the new array
+ my $KK_new = @{$rLL_new};
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
- my $K_opening_container = $self->[_K_opening_container_];
- my $K_closing_container = $self->[_K_closing_container_];
- my $ris_asub_block = $self->[_ris_asub_block_];
- my $ris_sub_block = $self->[_ris_sub_block_];
+ # remember new K of sequence tokens
+ push @K_sequenced_token_list, $KK_new;
- my $dump_all_types = $rdump_block_types->{'*'};
+ if ( $is_opening_token{$token} ) {
- # Get level variation info for code blocks
- my $rlevel_info = $self->find_level_info();
+ $K_opening_container->{$type_sequence} = $KK_new;
+ $block_type = $rblock_type_of_seqno->{$type_sequence};
- my @selected_blocks;
+ # Fix for case b1100: Count a line ending in ', [' as having
+ # a line-ending comma. Otherwise, these commas can be hidden
+ # with something like --opening-square-bracket-right
+ if ( $last_nonblank_code_type eq ','
+ && $Ktoken_vars == $Klast_old_code
+ && $Ktoken_vars > $Kfirst_old )
+ {
+ $rlec_count_by_seqno->{$type_sequence}++;
+ }
- #---------------------------------------------------
- # BEGIN loop over all blocks to find selected blocks
- #---------------------------------------------------
- foreach my $seqno ( keys %{$rblock_type_of_seqno} ) {
+ if ( $last_nonblank_code_type eq '='
+ || $last_nonblank_code_type eq '=>' )
+ {
+ $ris_assigned_structure->{$type_sequence} =
+ $last_nonblank_code_type;
+ }
- my $type;
- my $name = EMPTY_STRING;
- my $block_type = $rblock_type_of_seqno->{$seqno};
- my $K_opening = $K_opening_container->{$seqno};
- my $K_closing = $K_closing_container->{$seqno};
- my $level = $rLL->[$K_opening]->[_LEVEL_];
+ my $seqno_parent = $seqno_stack{ $depth_next - 1 };
+ $seqno_parent = SEQ_ROOT unless defined($seqno_parent);
+ push @{ $rchildren_of_seqno->{$seqno_parent} }, $type_sequence;
+ $rparent_of_seqno->{$type_sequence} = $seqno_parent;
+ $seqno_stack{$depth_next} = $type_sequence;
+ $K_old_opening_by_seqno{$type_sequence} = $Ktoken_vars;
+ $depth_next++;
- my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
- my $line_of_tokens = $rlines->[$lx_open];
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
- if ( !defined($Kfirst) || !defined($Klast) || $Kfirst > $K_opening ) {
- my $line_type = $line_of_tokens->{_line_type};
+ if ( $depth_next > $depth_next_max ) {
+ $depth_next_max = $depth_next;
+ }
+ }
+ elsif ( $is_closing_token{$token} ) {
- # shouldn't happen
- my $CODE_type = $line_of_tokens->{_code_type};
- DEVEL_MODE && Fault(<<EOM);
-unexpected line_type=$line_type at line $lx_open, code type=$CODE_type
+ $K_closing_container->{$type_sequence} = $KK_new;
+ $block_type = $rblock_type_of_seqno->{$type_sequence};
+
+ # Do not include terminal commas in counts
+ if ( $last_nonblank_code_type eq ','
+ || $last_nonblank_code_type eq '=>' )
+ {
+ $rtype_count_by_seqno->{$type_sequence}
+ ->{$last_nonblank_code_type}--;
+
+ if ( $Ktoken_vars == $Kfirst_old
+ && $last_nonblank_code_type eq ','
+ && $rlec_count_by_seqno->{$type_sequence} )
+ {
+ $rlec_count_by_seqno->{$type_sequence}--;
+ }
+
+ # set flag to retain trailing comma breaks (b1493, c416)
+ # length check needed to ignore phantom commas (b1496)
+ if ( $last_nonblank_code_type eq ','
+ && $trailing_comma_break_rules{$token}
+ && length($last_nonblank_code_token) )
+ {
+
+ my $rule = $trailing_comma_break_rules{$token};
+ my ( $letter, $paren_flag ) = @{$rule};
+ my $match;
+ if ( $letter eq 'b' ) {
+ $match = $Ktoken_vars == $Kfirst_old;
+ }
+ elsif ( $letter eq 'm' ) {
+ $match = $K_old_opening_by_seqno{$type_sequence} <
+ $Kfirst_old;
+ }
+ elsif ( $letter eq '1' || $letter eq '*' ) {
+ $match = 1;
+ }
+ else {
+ ## shouldn't happen - treat as 'b' for now
+ DEVEL_MODE && Fault(<<EOM);
+unexpected option '$letter' for --trailing-comma-break-flag at token '$token'
EOM
- next;
- }
+ $match = $Ktoken_vars == $Kfirst_old;
+ }
- my ( $max_change, $block_count, $inner_loop_plus ) =
- ( 0, 0, EMPTY_STRING );
- my $item = $rlevel_info->{$seqno};
- if ( defined($item) ) {
- my $starting_depth = $item->{starting_depth};
- my $maximum_depth = $item->{maximum_depth};
- $block_count = $item->{block_count};
- $max_change = $maximum_depth - $starting_depth + 1;
+ if ( $match && $paren_flag && $token eq ')' ) {
+ $match &&=
+ $self->match_paren_control_flag( $type_sequence,
+ $paren_flag, $rLL_new );
+ }
- # this is a '+' character if this block is an inner loops
- $inner_loop_plus = $item->{is_inner_loop} ? '+' : EMPTY_STRING;
- }
+ if ($match) {
+ $self->[_rbreak_container_]->{$type_sequence} = 1;
+ }
+ }
+ }
- # Skip closures unless type 'closure' is explicitely requested
- if ( ( $block_type eq '}' || $block_type eq ';' )
- && $rdump_block_types->{'closure'} )
- {
- $type = 'closure';
- }
+ # Update the stack...
+ $depth_next--;
+ }
+ else {
- # Both 'sub' and 'asub' select an anonymous sub.
- # This allows anonymous subs to be explicitely selected
- elsif (
- $ris_asub_block->{$seqno}
- && ( $dump_all_types
- || $rdump_block_types->{'sub'}
- || $rdump_block_types->{'asub'} )
- )
- {
- $type = 'asub';
+ # For ternary, note parent but do not include as child
+ my $seqno_parent = $seqno_stack{ $depth_next - 1 };
+ $seqno_parent = SEQ_ROOT unless defined($seqno_parent);
+ $rparent_of_seqno->{$type_sequence} = $seqno_parent;
- # Look back to try to find some kind of name, such as
- # my $var = sub { - var is type 'i'
- # var => sub { - var is type 'w'
- # -var => sub { - var is type 'w'
- # 'var' => sub { - var is type 'Q'
- my ( $saw_equals, $saw_fat_comma, $blank_count );
- foreach my $KK ( reverse( $Kfirst .. $K_opening - 1 ) ) {
- my $token_type = $rLL->[$KK]->[_TYPE_];
- if ( $token_type eq 'b' ) { $blank_count++; next }
- if ( $token_type eq '=>' ) { $saw_fat_comma++; next }
- if ( $token_type eq '=' ) { $saw_equals++; next }
- if ( $token_type eq 'i' && $saw_equals
- || ( $token_type eq 'w' || $token_type eq 'Q' )
- && $saw_fat_comma )
- {
- $name = $rLL->[$KK]->[_TOKEN_];
- last;
+ # These are not yet used but could be useful
+ if ( $token eq '?' ) {
+ $K_opening_ternary->{$type_sequence} = $KK_new;
+ }
+ elsif ( $token eq ':' ) {
+ $K_closing_ternary->{$type_sequence} = $KK_new;
+ }
+ else {
+
+ # We really shouldn't arrive here, just being cautious:
+ # The only sequenced types output by the tokenizer are the
+ # opening & closing containers and the ternary types. Each
+ # of those was checked above. So we would only get here
+ # if the tokenizer has been changed to mark some other
+ # tokens with sequence numbers.
+ if (DEVEL_MODE) {
+ Fault(
+"Unexpected token type with sequence number: type='$type', seqno='$type_sequence'"
+ );
+ }
}
}
}
- elsif ( $ris_sub_block->{$seqno}
- && ( $dump_all_types || $rdump_block_types->{'sub'} ) )
- {
- $type = 'sub';
- # what we want:
- # $block_type $name
- # 'sub setidentifier($)' => 'setidentifier'
- # 'method setidentifier($)' => 'setidentifier'
- my @parts = split /\s+/, $block_type;
- $name = $parts[1];
- $name =~ s/\(.*$//;
- }
- elsif (
- $block_type =~ /^(package|class)\b/
- && ( $dump_all_types
- || $rdump_block_types->{'package'}
- || $rdump_block_types->{'class'} )
- )
- {
- $type = 'class';
- my @parts = split /\s+/, $block_type;
- $name = $parts[1];
- $name =~ s/\(.*$//;
- }
- elsif (
- $is_loop_type{$block_type}
- && ( $dump_all_types
- || $rdump_block_types->{$block_type}
- || $rdump_block_types->{ $block_type . $inner_loop_plus }
- || $rdump_block_types->{$inner_loop_plus} )
- )
- {
- $type = $block_type . $inner_loop_plus;
- }
- elsif ( $dump_all_types || $rdump_block_types->{$block_type} ) {
- if ( $is_loop_type{$block_type} ) {
- $name = $self->find_loop_label($seqno);
+ # Remember the most recent two non-blank, non-comment tokens.
+ # NOTE: the phantom semicolon code may change the output stack
+ # without updating these values. Phantom semicolons are considered
+ # the same as blanks for now, but future needs might change that.
+ # See the related note in sub 'add_phantom_semicolon'.
+ $last_last_nonblank_code_type = $last_nonblank_code_type;
+ $last_last_nonblank_code_token = $last_nonblank_code_token;
+
+ $last_nonblank_code_type = $type;
+ $last_nonblank_code_token = $token;
+ $last_nonblank_block_type = $block_type;
+
+ # count selected types
+ if ( $is_counted_type{$type} ) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ( defined($seqno) ) {
+ $rtype_count_by_seqno->{$seqno}->{$type}++;
+
+ # Count line-ending commas for -bbx
+ if ( $type eq ',' && $Ktoken_vars == $Klast_old_code ) {
+ $rlec_count_by_seqno->{$seqno}++;
+ }
+
+ # Remember index of first here doc target
+ if ( $type eq 'h' && !$K_first_here_doc_by_seqno{$seqno} ) {
+ my $KK_new = @{$rLL_new};
+ $K_first_here_doc_by_seqno{$seqno} = $KK_new;
+
+ # the here doc which follows makes the container broken
+ if ( !$ris_permanently_broken->{$seqno} ) {
+ $ris_permanently_broken->{$seqno} = 1;
+ $self->mark_parent_containers( $seqno,
+ $ris_permanently_broken );
+ }
+ }
}
- $type = $block_type;
- }
- else {
- next;
}
+ }
- push @selected_blocks,
- {
- K_opening => $K_opening,
- K_closing => $K_closing,
- line_start => $lx_open + 1,
- name => $name,
- type => $type,
- level => $level,
- max_change => $max_change,
- block_count => $block_count,
- };
- } ## END loop to get info for selected blocks
- return \@selected_blocks;
-} ## end sub find_selected_blocks
+ # cumulative length is the length sum including this token
+ $cumulative_length += $token_length;
-sub dump_block_summary {
- my ($self) = @_;
+ $item->[_CUMULATIVE_LENGTH_] = $cumulative_length;
+ $item->[_TOKEN_LENGTH_] = $token_length;
- # Dump information about selected code blocks to STDOUT
- # This sub is called when
- # --dump-block-summary (-dbs) is set.
+ # For reference, here is how to get the parent sequence number.
+ # This is not used because it is slower than finding it on the fly
+ # in sub parent_seqno_by_K:
- # The following controls are available:
- # --dump-block-types=s (-dbt=s), where s is a list of block types
- # (if else elsif for foreach while do ... sub) ; default is 'sub'
- # --dump-block-minimum-lines=n (-dbml=n), where n is the minimum
- # number of lines for a block to be included; default is 20.
+ # my $seqno_parent =
+ # $type_sequence && $is_opening_token{$token}
+ # ? $seqno_stack{ $depth_next - 2 }
+ # : $seqno_stack{ $depth_next - 1 };
+ # my $KK = @{$rLL_new};
+ # $rseqno_of_parent_by_K->{$KK} = $seqno_parent;
- my $rOpts_dump_block_types = $rOpts->{'dump-block-types'};
- if ( !defined($rOpts_dump_block_types) ) { $rOpts_dump_block_types = 'sub' }
- $rOpts_dump_block_types =~ s/^\s+//;
- $rOpts_dump_block_types =~ s/\s+$//;
- my @list = split /\s+/, $rOpts_dump_block_types;
- my %dump_block_types;
- @{dump_block_types}{@list} = (1) x scalar(@list);
+ # and finally, add this item to the new array
+ push @{$rLL_new}, $item;
+ return;
+} ## end sub store_token
- # Get block info
- my $rselected_blocks = $self->find_selected_blocks( \%dump_block_types );
+sub add_phantom_semicolon {
- # Get package info
- my $rpackage_list = $self->find_selected_packages( \%dump_block_types );
+ my ( $self, $KK ) = @_;
- return if ( !@{$rselected_blocks} && !@{$rpackage_list} );
+ # The token at old index $KK is a closing block brace, and not preceded
+ # by a semicolon. Before we push it onto the new token list, we may
+ # want to add a phantom semicolon which can be activated if the the
+ # block is broken on output.
- my $input_stream_name = get_input_stream_name();
+ # We are only adding semicolons for certain block types
+ my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
+ return unless ($type_sequence);
+ my $block_type = $rblock_type_of_seqno->{$type_sequence};
+ return unless ($block_type);
+ return
+ unless ( $ok_to_add_semicolon_for_block_type{$block_type}
+ || $block_type =~ /^(sub|package)/
+ || $block_type =~ /^\w+\:$/ );
- # Get code line count
- my $rcode_line_count = $self->find_code_line_count();
+ # Find the most recent token in the new token list
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) ); # shouldn't happen except for bad input
- # Get mccabe count
- my $rmccabe_count_sum = $self->find_mccabe_count();
+ my $type_p = $rLL_new->[$Kp]->[_TYPE_];
+ my $token_p = $rLL_new->[$Kp]->[_TOKEN_];
+ my $type_sequence_p = $rLL_new->[$Kp]->[_TYPE_SEQUENCE_];
- my $rOpts_dump_block_minimum_lines = $rOpts->{'dump-block-minimum-lines'};
- if ( !defined($rOpts_dump_block_minimum_lines) ) {
- $rOpts_dump_block_minimum_lines = 20;
- }
+ # Do not add a semicolon if...
+ return
+ if (
- my $rLL = $self->[_rLL_];
+ # it would follow a comment (and be isolated)
+ $type_p eq '#'
- # merge blocks and packages, add various counts, filter and print to STDOUT
- my $routput_lines = [];
- foreach my $item ( @{$rselected_blocks}, @{$rpackage_list} ) {
+ # it follows a code block ( because they are not always wanted
+ # there and may add clutter)
+ || $type_sequence_p && $rblock_type_of_seqno->{$type_sequence_p}
- my $K_opening = $item->{K_opening};
- my $K_closing = $item->{K_closing};
+ # it would follow a label
+ || $type_p eq 'J'
- # define total number of lines
- my $lx_open = $rLL->[$K_opening]->[_LINE_INDEX_];
- my $lx_close = $rLL->[$K_closing]->[_LINE_INDEX_];
- my $line_count = $lx_close - $lx_open + 1;
+ # it would be inside a 'format' statement (and cause syntax error)
+ || ( $type_p eq 'k'
+ && $token_p =~ /format/ )
- # define total number of lines of code excluding blanks, comments, pod
- my $code_lines_open = $rcode_line_count->[$lx_open];
- my $code_lines_close = $rcode_line_count->[$lx_close];
- my $code_lines = 0;
- if ( defined($code_lines_open) && defined($code_lines_close) ) {
- $code_lines = $code_lines_close - $code_lines_open + 1;
- }
+ );
- # filter out blocks below the selected code line limit
- if ( $code_lines < $rOpts_dump_block_minimum_lines ) {
- next;
- }
+ # Do not add a semicolon if it would impede a weld with an immediately
+ # following closing token...like this
+ # { ( some code ) }
+ # ^--No semicolon can go here
- # add mccabe_count for this block
- my $mccabe_closing = $rmccabe_count_sum->{ $K_closing + 1 };
- my $mccabe_opening = $rmccabe_count_sum->{$K_opening};
- my $mccabe_count = 1; # add 1 to match Perl::Critic
- if ( defined($mccabe_opening) && defined($mccabe_closing) ) {
- $mccabe_count += $mccabe_closing - $mccabe_opening;
- }
+ # look at the previous token... note use of the _NEW rLL array here,
+ # but sequence numbers are invariant.
+ my $seqno_inner = $rLL_new->[$Kp]->[_TYPE_SEQUENCE_];
- # Store the final set of print variables
- push @{$routput_lines}, [
+ # If it is also a CLOSING token we have to look closer...
+ if (
+ $seqno_inner
+ && $is_closing_token{$token_p}
- $input_stream_name,
- $item->{line_start},
- $line_count,
- $code_lines,
- $item->{type},
- $item->{name},
- $item->{level},
- $item->{max_change},
- $item->{block_count},
- $mccabe_count,
+ # we only need to look if there is just one inner container..
+ && defined( $rchildren_of_seqno->{$type_sequence} )
+ && @{ $rchildren_of_seqno->{$type_sequence} } == 1
+ )
+ {
- ];
+ # Go back and see if the corresponding two OPENING tokens are also
+ # together. Note that we are using the OLD K indexing here:
+ my $K_outer_opening = $K_old_opening_by_seqno{$type_sequence};
+ if ( defined($K_outer_opening) ) {
+ my $K_nxt = $self->K_next_nonblank($K_outer_opening);
+ if ( defined($K_nxt) ) {
+ my $seqno_nxt = $rLL->[$K_nxt]->[_TYPE_SEQUENCE_];
+
+ # Is the next token after the outer opening the same as
+ # our inner closing (i.e. same sequence number)?
+ # If so, do not insert a semicolon here.
+ return if ( $seqno_nxt && $seqno_nxt == $seqno_inner );
+ }
+ }
}
- return unless @{$routput_lines};
+ # We will insert an empty semicolon here as a placeholder. Later, if
+ # it becomes the last token on a line, we will bring it to life. The
+ # advantage of doing this is that (1) we just have to check line
+ # endings, and (2) the phantom semicolon has zero width and therefore
+ # won't cause needless breaks of one-line blocks.
+ my $Ktop = -1;
+ if ( $rLL_new->[$Ktop]->[_TYPE_] eq 'b'
+ && $want_left_space{';'} == WS_NO )
+ {
- # Sort blocks and packages on starting line number
- my @sorted_lines = sort { $a->[1] <=> $b->[1] } @{$routput_lines};
+ # convert the blank into a semicolon..
+ # be careful: we are working on the new stack top
+ # on a token which has been stored.
+ my $rcopy = copy_token_as_type( $rLL_new->[$Ktop], 'b', SPACE );
- print STDOUT
-"file,line,line_count,code_lines,type,name,level,max_change,block_count,mccabe_count\n";
+ # Convert the existing blank to:
+ # a phantom semicolon for one_line_block option = 0 or 1
+ # a real semicolon for one_line_block option = 2
+ my $tok = EMPTY_STRING;
+ my $len_tok = 0;
+ if ( $rOpts_one_line_block_semicolons == 2 ) {
+ $tok = ';';
+ $len_tok = 1;
+ }
- foreach my $rline_vars (@sorted_lines) {
- my $line = join( ",", @{$rline_vars} ) . "\n";
- print STDOUT $line;
+ $rLL_new->[$Ktop]->[_TOKEN_] = $tok;
+ $rLL_new->[$Ktop]->[_TOKEN_LENGTH_] = $len_tok;
+ $rLL_new->[$Ktop]->[_TYPE_] = ';';
+
+ $self->[_rtype_count_by_seqno_]->{$type_sequence}->{';'}++;
+
+ # NOTE: we are changing the output stack without updating variables
+ # $last_nonblank_code_type, etc. Future needs might require that
+ # those variables be updated here. For now, it seems ok to skip
+ # this.
+
+ # Then store a new blank
+ $self->store_token($rcopy);
+ }
+ else {
+
+ # Patch for issue c078: keep line indexes in order. If the top
+ # token is a space that we are keeping (due to '-wls=';') then
+ # we have to check that old line indexes stay in order.
+ # In very rare
+ # instances in which side comments have been deleted and converted
+ # into blanks, we may have filtered down multiple blanks into just
+ # one. In that case the top blank may have a higher line number
+ # than the previous nonblank token. Although the line indexes of
+ # blanks are not really significant, we need to keep them in order
+ # in order to pass error checks.
+ if ( $rLL_new->[$Ktop]->[_TYPE_] eq 'b' ) {
+ my $old_top_ix = $rLL_new->[$Ktop]->[_LINE_INDEX_];
+ my $new_top_ix = $rLL_new->[$Kp]->[_LINE_INDEX_];
+ if ( $new_top_ix < $old_top_ix ) {
+ $rLL_new->[$Ktop]->[_LINE_INDEX_] = $new_top_ix;
+ }
+ }
+
+ my $rcopy = copy_token_as_type( $rLL_new->[$Kp], ';', EMPTY_STRING );
+ $self->store_token($rcopy);
}
return;
-} ## end sub dump_block_summary
+} ## end sub add_phantom_semicolon
-sub set_CODE_type {
- my ($self) = @_;
+sub delay_trailing_comma_op {
- # Examine each line of code and set a flag '$CODE_type' to describe it.
- # Also return a list of lines with side comments.
+ my ( $self, $if_add, $stable_flag ) = @_;
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
+ # Given:
+ # $if_add = true for add comma operation, false for delete
+ # $stable_flag = true if -btct setting makes this stable
- my $rOpts_format_skipping_begin = $rOpts->{'format-skipping-begin'};
- my $rOpts_format_skipping_end = $rOpts->{'format-skipping-end'};
- my $rOpts_static_block_comment_prefix =
- $rOpts->{'static-block-comment-prefix'};
+ # Returns:
+ # true if a trailing comma operation should be skipped
+ # false otherwise
+
+ # This can prevent unwanted path-dependent formatting when both
+ # line breaks are changing and we are only adding or deleting
+ # commas, but not both. See git #156
- # Remember indexes of lines with side comments
- my @ix_side_comments;
+ # Get user setting, if any
+ my $delay = $rOpts->{'delay-trailing-comma-operations'};
- my $In_format_skipping_section = 0;
- my $Saw_VERSION_in_this_file = 0;
- my $has_side_comment = 0;
- my ( $Kfirst, $Klast );
- my $CODE_type;
+ # Set default if not defined:
+ # - if deleting: delay always ok
+ # - if adding: delay ok unless breaks will be stabilized by -btct setting
+ # Explanation:
+ # - deleting can be irreversible, so it is safest to delay
+ # - adding, along with -btct, can save original line breaks which would
+ # be lost otherwise, so it may be best not to delay.
+ if ( !defined($delay) ) {
+ $delay = $if_add ? !$stable_flag : 1;
+ }
- # Loop to set CODE_type
+ return if ( !$delay );
- # Possible CODE_types
- # 'VB' = Verbatim - line goes out verbatim (a quote)
- # 'FS' = Format Skipping - line goes out verbatim
- # 'BL' = Blank Line
- # 'HSC' = Hanging Side Comment - fix this hanging side comment
- # 'SBCX'= Static Block Comment Without Leading Space
- # 'SBC' = Static Block Comment
- # 'BC' = Block Comment - an ordinary full line comment
- # 'IO' = Indent Only - line goes out unchanged except for indentation
- # 'NIN' = No Internal Newlines - line does not get broken
- # 'VER' = VERSION statement
- # '' = ordinary line of code with no restrictions
+ # We must be at the first of multiple iterations for a delay
+ my $it = Perl::Tidy::get_iteration_count();
+ my $max_iterations = $rOpts->{'iterations'};
+ if ( $it == 1 && $max_iterations > 1 ) {
- my $ix_line = -1;
- foreach my $line_of_tokens ( @{$rlines} ) {
- $ix_line++;
- my $line_type = $line_of_tokens->{_line_type};
+ # if so, set flag to request another iteration
+ $self->[_want_second_iteration_] = 1;
+ return 1;
+ }
+ return;
+} ## end sub delay_trailing_comma_op
- my $Last_line_had_side_comment = $has_side_comment;
- if ($has_side_comment) {
- push @ix_side_comments, $ix_line - 1;
- $has_side_comment = 0;
- }
+my %is_b_i_h;
- my $last_CODE_type = $CODE_type;
- $CODE_type = EMPTY_STRING;
+BEGIN {
+ my @q = qw( b i h );
+ @is_b_i_h{@q} = (1) x scalar(@q);
+}
- if ( $line_type ne 'CODE' ) {
- next;
- }
+sub add_trailing_comma {
- my $Klast_prev = $Klast;
+ # Implement the --add-trailing-commas flag to the line end before index $KK:
- my $rK_range = $line_of_tokens->{_rK_range};
- ( $Kfirst, $Klast ) = @{$rK_range};
+ my ( $self, $KK, $Kfirst, $trailing_comma_add_rule ) = @_;
- my $input_line = $line_of_tokens->{_line_text};
- my $jmax = defined($Kfirst) ? $Klast - $Kfirst : -1;
+ # Input parameter:
+ # $KK = index of closing token in old ($rLL) token list
+ # which starts a new line and is not preceded by a comma
+ # $Kfirst = index of first token on the current line of input tokens
+ # $trailing_comma_add_rule = user control flags for adding trailng commas
- my $is_block_comment = 0;
- if ( $jmax >= 0 && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
- if ( $jmax == 0 ) { $is_block_comment = 1; }
- else { $has_side_comment = 1 }
- }
+ # For example, we might want to add a comma here:
- # Write line verbatim if we are in a formatting skip section
- if ($In_format_skipping_section) {
+ # bless {
+ # _name => $name,
+ # _price => $price,
+ # _rebate => $rebate <------ location of possible bare comma
+ # }, $pkg;
+ # ^-------------------closing token at index $KK on new line
- # Note: extra space appended to comment simplifies pattern matching
- if (
- $is_block_comment
+ # Do not add a comma if it would follow a comment
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
+ my $type_p = $rLL_new->[$Kp]->[_TYPE_];
+ return if ( $type_p eq '#' );
- # optional fast pre-check
- && ( substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 4 ) eq '#>>>'
- || $rOpts_format_skipping_end )
+ return unless ($trailing_comma_add_rule);
+ my ( $trailing_comma_style, $paren_flag, $stable_flag ) =
+ @{$trailing_comma_add_rule};
- && ( $rLL->[$Kfirst]->[_TOKEN_] . SPACE ) =~
- /$format_skipping_pattern_end/
- )
- {
- $In_format_skipping_section = 0;
- my $input_line_no = $line_of_tokens->{_line_number};
- write_logfile_entry(
- "Line $input_line_no: Exiting format-skipping section\n");
- }
- $CODE_type = 'FS';
- next;
- }
+ # see if the user wants a trailing comma here
+ my $match =
+ $self->match_trailing_comma_rule( $KK, $Kfirst, $Kp,
+ $trailing_comma_style, $paren_flag, $stable_flag, 1 );
- # Check for a continued quote..
- if ( $line_of_tokens->{_starting_in_quote} ) {
+ # Do not add if this would cause excess line length and possible
+ # instability. This is b1458 fix method 1. This is more general than fix
+ # method 2, below, which also worked. So this is not needed for b1458 but
+ # re-activated and updated for b1495.
+ if ( $match
+ && $rOpts_delete_trailing_commas
+ && $KK > 0 )
+ {
+ if ( !$stable_flag && $is_b_i_h{$trailing_comma_style} ) {
+ my $line_index = $rLL->[ $KK - 1 ]->[_LINE_INDEX_];
+ my $rlines = $self->[_rlines_];
+ my $line_of_tokens = $rlines->[$line_index];
+ my $input_line = $line_of_tokens->{_line_text};
+ my $len =
+ $length_function
+ ? $length_function->($input_line) - 1
+ : length($input_line) - 1;
+ my $new_len = $want_left_space{','} ? $len + 2 : $len + 1;
+ my $level = $rLL->[$Kfirst]->[_LEVEL_];
+ my $max_len = $maximum_line_length_at_level[$level];
- # A line which is entirely a quote or pattern must go out
- # verbatim. Note: the \n is contained in $input_line.
- if ( $jmax <= 0 ) {
- if ( $self->[_save_logfile_] && $input_line =~ /\t/ ) {
- my $input_line_number = $line_of_tokens->{_line_number};
- $self->note_embedded_tab($input_line_number);
- }
- $CODE_type = 'VB';
- next;
+ if ( $new_len > $max_len ) {
+ $match = 0;
}
}
+ }
- # See if we are entering a formatting skip section
- if (
- $is_block_comment
+ # If so, and not delayed, add a comma
+ if ( $match && !$self->delay_trailing_comma_op( 1, $stable_flag ) ) {
- # optional fast pre-check
- && ( substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 4 ) eq '#<<<'
- || $rOpts_format_skipping_begin )
+ # any blank after the comma will be added before the closing paren,
+ # below
+ $self->store_new_token( ',', ',', $Kp );
+ }
+ return;
- && $rOpts_format_skipping
- && ( $rLL->[$Kfirst]->[_TOKEN_] . SPACE ) =~
- /$format_skipping_pattern_begin/
- )
- {
- $In_format_skipping_section = 1;
- my $input_line_no = $line_of_tokens->{_line_number};
- write_logfile_entry(
- "Line $input_line_no: Entering format-skipping section\n");
- $CODE_type = 'FS';
- next;
- }
+} ## end sub add_trailing_comma
- # ignore trailing blank tokens (they will get deleted later)
- if ( $jmax > 0 && $rLL->[$Klast]->[_TYPE_] eq 'b' ) {
- $jmax--;
- }
+sub delete_trailing_comma {
- # blank line..
- if ( $jmax < 0 ) {
- $CODE_type = 'BL';
- next;
- }
+ my ( $self, $KK, $Kfirst, $trailing_comma_delete_rule ) = @_;
- # Handle comments
- if ($is_block_comment) {
+ # Apply the --delete-trailing-commas flag to the comma before index $KK
- # see if this is a static block comment (starts with ## by default)
- my $is_static_block_comment = 0;
- my $no_leading_space = substr( $input_line, 0, 1 ) eq '#';
- if (
+ # Input parameter:
+ # $KK = index of a closing token in OLD ($rLL) token list
+ # which is preceded by a comma on the same line.
+ # $Kfirst = index of first token on the current line of input tokens
+ # $delete_option = user control flag
- # optional fast pre-check
- (
- substr( $rLL->[$Kfirst]->[_TOKEN_], 0, 2 ) eq '##'
- || $rOpts_static_block_comment_prefix
- )
+ # Returns true if the comma was deleted
- && $rOpts_static_block_comments
- && $input_line =~ /$static_block_comment_pattern/
- )
- {
- $is_static_block_comment = 1;
- }
+ # For example, we might want to delete this comma:
+ # my @asset = ("FASMX", "FASGX", "FASIX",);
+ # | |^--------token at index $KK
+ # | ^------comma of interest
+ # ^-------------token at $Kfirst
- # Check for comments which are line directives
- # Treat exactly as static block comments without leading space
- # reference: perlsyn, near end, section Plain Old Comments (Not!)
- # example: '# line 42 "new_filename.plx"'
- if (
- $no_leading_space
- && $input_line =~ /^\# \s*
- line \s+ (\d+) \s*
- (?:\s("?)([^"]+)\2)? \s*
- $/x
- )
- {
- $is_static_block_comment = 1;
- }
+ # Verify that the previous token is a comma. Note that we are working in
+ # the new token list $rLL_new.
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
+ if ( $rLL_new->[$Kp]->[_TYPE_] ne ',' ) {
- # look for hanging side comment ...
- if (
- $Last_line_had_side_comment # last line had side comment
- && !$no_leading_space # there is some leading space
- && !
- $is_static_block_comment # do not make static comment hanging
- )
- {
+ # there must be a '#' between the ',' and closing token; give up.
+ return;
+ }
- # continuing an existing HSC chain?
- if ( $last_CODE_type eq 'HSC' ) {
- $has_side_comment = 1;
- $CODE_type = 'HSC';
- next;
- }
+ # Do not delete commas when formatting under stress to avoid instability.
+ # This fixes b1389, b1390, b1391, b1392. The $high_stress_level has
+ # been found to work well for trailing commas.
+ if ( $rLL_new->[$Kp]->[_LEVEL_] >= $high_stress_level ) {
+ return;
+ }
- # starting a new HSC chain?
- elsif (
+ return unless ($trailing_comma_delete_rule);
+ my ( $trailing_comma_style, $paren_flag, $stable_flag ) =
+ @{$trailing_comma_delete_rule};
- $rOpts->{'hanging-side-comments'} # user is allowing
- # hanging side comments
- # like this
+ # See if the user wants this trailing comma
+ my $match =
+ $self->match_trailing_comma_rule( $KK, $Kfirst, $Kp,
+ $trailing_comma_style, $paren_flag, $stable_flag, 0 );
- && ( defined($Klast_prev) && $Klast_prev > 1 )
+ # Patch: the --noadd-whitespace flag can cause instability in complex
+ # structures. In this case do not delete the comma. Fixes b1409.
+ if ( !$match && !$rOpts_add_whitespace ) {
+ my $Kn = $self->K_next_nonblank($KK);
+ if ( defined($Kn) ) {
+ my $type_n = $rLL->[$Kn]->[_TYPE_];
+ if ( $type_n ne ';' && $type_n ne '#' ) { return }
+ }
+ }
- # and the previous side comment was not static (issue c070)
- && !(
- $rOpts->{'static-side-comments'}
- && $rLL->[$Klast_prev]->[_TOKEN_] =~
- /$static_side_comment_pattern/
- )
+ # b1458 fix method 2: do not remove a comma after a leading brace type 'R'
+ # since it is under stress and could become unstable. This is a more
+ # specific fix but the logic is cleaner than method 1.
+ if ( !$match
+ && $rOpts_add_trailing_commas
+ && $rLL->[$Kfirst]->[_TYPE_] eq 'R' )
+ {
- )
- {
+ # previous old token should be the comma..
+ my $Kp_old = $self->K_previous_nonblank( $KK, $rLL );
+ if ( defined($Kp_old)
+ && $Kp_old > $Kfirst
+ && $rLL->[$Kp_old]->[_TYPE_] eq ',' )
+ {
- # and it is not a closing side comment (issue c070).
- my $K_penult = $Klast_prev - 1;
- $K_penult -= 1 if ( $rLL->[$K_penult]->[_TYPE_] eq 'b' );
- my $follows_csc =
- ( $rLL->[$K_penult]->[_TOKEN_] eq '}'
- && $rLL->[$K_penult]->[_TYPE_] eq '}'
- && $rLL->[$Klast_prev]->[_TOKEN_] =~
- /$closing_side_comment_prefix_pattern/ );
+ # if the comma follows the first token of the line ..
+ my $Kpp_old = $self->K_previous_nonblank( $Kp_old, $rLL );
+ if ( defined($Kpp_old) && $Kpp_old eq $Kfirst ) {
- if ( !$follows_csc ) {
- $has_side_comment = 1;
- $CODE_type = 'HSC';
- next;
- }
- }
+ # do not delete it
+ $match = 1;
}
+ }
+ }
+
+ # If no match and not delayed
+ if ( !$match && !$self->delay_trailing_comma_op( 0, $stable_flag ) ) {
+
+ # delete it
+ return $self->unstore_last_nonblank_token(',');
+ }
+ return;
+
+} ## end sub delete_trailing_comma
+
+sub delete_weld_interfering_comma {
+
+ my ( $self, $KK ) = @_;
+
+ # Apply the flag '--delete-weld-interfering-commas' to the comma
+ # before index $KK
+
+ # Input parameter:
+ # $KK = index of a closing token in OLD ($rLL) token list
+ # which is preceded by a comma on the same line.
- if ($is_static_block_comment) {
- $CODE_type = $no_leading_space ? 'SBCX' : 'SBC';
- next;
- }
- elsif ($Last_line_had_side_comment
- && !$rOpts_maximum_consecutive_blank_lines
- && $rLL->[$Kfirst]->[_LEVEL_] > 0 )
- {
- # Emergency fix to keep a block comment from becoming a hanging
- # side comment. This fix is for the case that blank lines
- # cannot be inserted. There is related code in sub
- # 'process_line_of_CODE'
- $CODE_type = 'SBCX';
- next;
- }
- else {
- $CODE_type = 'BC';
- next;
- }
- }
+ # Returns true if the comma was deleted
- # End of comments. Handle a line of normal code:
+ # For example, we might want to delete this comma:
- if ($rOpts_indent_only) {
- $CODE_type = 'IO';
- next;
- }
+ # my $tmpl = { foo => {no_override => 1, default => 42}, };
+ # || ^------$KK
+ # |^---$Kp
+ # $Kpp---^
+ #
+ # Note that:
+ # index $KK is in the old $rLL array, but
+ # indexes $Kp and $Kpp are in the new $rLL_new array.
- if ( !$rOpts_add_newlines ) {
- $CODE_type = 'NIN';
- next;
- }
+ my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
+ return unless ($type_sequence);
- # Patch needed for MakeMaker. Do not break a statement
- # in which $VERSION may be calculated. See MakeMaker.pm;
- # this is based on the coding in it.
- # The first line of a file that matches this will be eval'd:
- # /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
- # Examples:
- # *VERSION = \'1.01';
- # ( $VERSION ) = '$Revision: 1.74 $ ' =~ /\$Revision:\s+([^\s]+)/;
- # We will pass such a line straight through without breaking
- # it unless -npvl is used.
+ # Find the previous token and verify that it is a comma.
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
+ if ( $rLL_new->[$Kp]->[_TYPE_] ne ',' ) {
- # Patch for problem reported in RT #81866, where files
- # had been flattened into a single line and couldn't be
- # tidied without -npvl. There are two parts to this patch:
- # First, it is not done for a really long line (80 tokens for now).
- # Second, we will only allow up to one semicolon
- # before the VERSION. We need to allow at least one semicolon
- # for statements like this:
- # require Exporter; our $VERSION = $Exporter::VERSION;
- # where both statements must be on a single line for MakeMaker
+ # it is not a comma, so give up ( it is probably a '#' )
+ return;
+ }
- if ( !$Saw_VERSION_in_this_file
- && $jmax < 80
- && $input_line =~
- /^[^;]*;?[^;]*([\$*])(([\w\:\']*)\bVERSION)\b.*\=/ )
- {
- $Saw_VERSION_in_this_file = 1;
- write_logfile_entry("passing VERSION line; -npvl deactivates\n");
+ # This must be the only comma in this list
+ my $rtype_count = $self->[_rtype_count_by_seqno_]->{$type_sequence};
+ return
+ unless ( defined($rtype_count)
+ && $rtype_count->{','}
+ && $rtype_count->{','} == 1 );
- # This code type has lower priority than others
- $CODE_type = 'VER';
- next;
+ # Back up to the previous closing token
+ my $Kpp = $self->K_previous_nonblank( $Kp, $rLL_new );
+ return unless ( defined($Kpp) );
+ my $seqno_pp = $rLL_new->[$Kpp]->[_TYPE_SEQUENCE_];
+ my $type_pp = $rLL_new->[$Kpp]->[_TYPE_];
+
+ # The containers must be nesting (i.e., sequence numbers must differ by 1 )
+ if ( $seqno_pp && $is_closing_type{$type_pp} ) {
+ if ( $seqno_pp == $type_sequence + 1 ) {
+
+ # remove the ',' from the top of the new token list
+ return $self->unstore_last_nonblank_token(',');
}
}
- continue {
- $line_of_tokens->{_code_type} = $CODE_type;
- }
+ return;
- if ($has_side_comment) {
- push @ix_side_comments, $ix_line;
- }
+} ## end sub delete_weld_interfering_comma
- return \@ix_side_comments;
-} ## end sub set_CODE_type
+sub add_interbracket_arrow {
+ my ($self) = @_;
-sub find_non_indenting_braces {
+ # Add a new '->' after the last token on the stack
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
- my ( $self, $rix_side_comments ) = @_;
- return unless ( $rOpts->{'non-indenting-braces'} );
- my $rLL = $self->[_rLL_];
- return unless ( defined($rLL) && @{$rLL} );
- my $rlines = $self->[_rlines_];
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
- my $rseqno_non_indenting_brace_by_ix =
- $self->[_rseqno_non_indenting_brace_by_ix_];
+ # verify that we are adding after a } or ]
+ my $type_p = $rLL_new->[$Kp]->[_TYPE_];
+ if ( $type_p ne 'R' && $type_p ne ']' ) {
+ DEVEL_MODE && Fault("trying to store new arrow after type $type_p");
+ return;
+ }
- foreach my $ix ( @{$rix_side_comments} ) {
- my $line_of_tokens = $rlines->[$ix];
- my $line_type = $line_of_tokens->{_line_type};
- if ( $line_type ne 'CODE' ) {
+ $self->store_new_token( '->', '->', $Kp );
+ if ( $want_right_space{'->'} == WS_YES ) { $self->store_token() }
- # shouldn't happen
- DEVEL_MODE && Fault("unexpected line_type=$line_type\n");
- next;
- }
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
- unless ( defined($Kfirst) && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
+ return;
+} ## end sub add_interbracket_arrow
- # shouldn't happen
- DEVEL_MODE && Fault("did not get a comment\n");
- next;
- }
- next unless ( $Klast > $Kfirst ); # maybe HSC
- my $token_sc = $rLL->[$Klast]->[_TOKEN_];
- my $K_m = $Klast - 1;
- my $type_m = $rLL->[$K_m]->[_TYPE_];
- if ( $type_m eq 'b' && $K_m > $Kfirst ) {
- $K_m--;
- $type_m = $rLL->[$K_m]->[_TYPE_];
- }
- my $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
- if ($seqno_m) {
- my $block_type_m = $rblock_type_of_seqno->{$seqno_m};
+sub delete_interbracket_arrow {
+ my ($self) = @_;
- # The pattern ends in \s but we have removed the newline, so
- # we added it back for the match. That way we require an exact
- # match to the special string and also allow additional text.
- $token_sc .= "\n";
- if ( $block_type_m
- && $is_opening_type{$type_m}
- && $token_sc =~ /$non_indenting_brace_pattern/ )
- {
- $rseqno_non_indenting_brace_by_ix->{$ix} = $seqno_m;
- }
- }
+ # Delete the last nonblank token on the stack which is an '->'
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
+
+ # verify that we are deleting an '->'
+ my $type_p = $rLL_new->[$Kp]->[_TYPE_];
+ if ( $type_p ne '->' ) {
+ DEVEL_MODE && Fault("trying to delete arrow but type $type_p");
+ return;
}
+
+ $self->unstore_last_nonblank_token( '->', -1 );
+
return;
-} ## end sub find_non_indenting_braces
+} ## end sub delete_interbracket_arrow
-sub delete_side_comments {
- my ( $self, $rix_side_comments ) = @_;
+sub unstore_last_nonblank_token {
- # Given a list of indexes of lines with side comments, handle any
- # requested side comment deletions.
+ my ( $self, $type, ($want_space) ) = @_;
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
- my $rseqno_non_indenting_brace_by_ix =
- $self->[_rseqno_non_indenting_brace_by_ix_];
+ # remove the most recent nonblank token from the new token list
+ # Input parameter:
+ # $type = type to be removed (for safety check)
+ # $want_space = telling if a space should remain
+ # 1 => always
+ # 0 or undef => only if there was one (used for ',')
+ # -1 => never (used for '->')
- foreach my $ix ( @{$rix_side_comments} ) {
- my $line_of_tokens = $rlines->[$ix];
- my $line_type = $line_of_tokens->{_line_type};
+ # Returns true if success
+ # false if error
- # This fault shouldn't happen because we only saved CODE lines with
- # side comments in the TASK 1 loop above.
- if ( $line_type ne 'CODE' ) {
- if (DEVEL_MODE) {
- my $lno = $ix + 1;
- Fault(<<EOM);
-Hit unexpected line_type = '$line_type' near line $lno while deleting side comments, should be 'CODE'
-EOM
- }
- next;
- }
+ # This was written and is used for removing commas, but might
+ # be useful for other tokens. If it is ever used for other tokens
+ # then the issue of what to do about the other variables, such
+ # as token counts and the '$last...' vars needs to be considered.
- my $CODE_type = $line_of_tokens->{_code_type};
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ # Safety check, shouldn't happen
+ if ( @{$rLL_new} < 3 ) {
+ DEVEL_MODE && Fault("not enough tokens on stack to remove '$type'\n");
+ return;
+ }
- unless ( defined($Kfirst) && $rLL->[$Klast]->[_TYPE_] eq '#' ) {
- if (DEVEL_MODE) {
- my $lno = $ix + 1;
- Fault(<<EOM);
-Did not find side comment near line $lno while deleting side comments
-EOM
- }
- next;
- }
+ if ( !defined($want_space) ) { $want_space = 0 }
- my $delete_side_comment =
- $rOpts_delete_side_comments
- && ( $Klast > $Kfirst || $CODE_type eq 'HSC' )
- && (!$CODE_type
- || $CODE_type eq 'HSC'
- || $CODE_type eq 'IO'
- || $CODE_type eq 'NIN' );
+ my ( $rcomma, $rblank );
- # Do not delete special control side comments
- if ( $rseqno_non_indenting_brace_by_ix->{$ix} ) {
- $delete_side_comment = 0;
- }
+ # Note: originally just for ',' but now also for '->'
- if (
- $rOpts_delete_closing_side_comments
- && !$delete_side_comment
- && $Klast > $Kfirst
- && ( !$CODE_type
- || $CODE_type eq 'HSC'
- || $CODE_type eq 'IO'
- || $CODE_type eq 'NIN' )
- )
- {
- my $token = $rLL->[$Klast]->[_TOKEN_];
- my $K_m = $Klast - 1;
- my $type_m = $rLL->[$K_m]->[_TYPE_];
- if ( $type_m eq 'b' && $K_m > $Kfirst ) { $K_m-- }
- my $seqno_m = $rLL->[$K_m]->[_TYPE_SEQUENCE_];
- if ($seqno_m) {
- my $block_type_m = $rblock_type_of_seqno->{$seqno_m};
- if ( $block_type_m
- && $token =~ /$closing_side_comment_prefix_pattern/
- && $block_type_m =~ /$closing_side_comment_list_pattern/ )
- {
- $delete_side_comment = 1;
- }
- }
- } ## end if ( $rOpts_delete_closing_side_comments...)
+ # case 1: pop comma from top of stack
+ if ( $rLL_new->[-1]->[_TYPE_] eq $type ) {
+ $rcomma = pop @{$rLL_new};
+ }
+
+ # case 2: pop blank and then comma from top of stack
+ elsif ($rLL_new->[-1]->[_TYPE_] eq 'b'
+ && $rLL_new->[-2]->[_TYPE_] eq $type )
+ {
+ $rblank = pop @{$rLL_new};
+ $rcomma = pop @{$rLL_new};
+ }
- if ($delete_side_comment) {
+ # case 3: error, shouldn't happen unless bad call
+ else {
+ DEVEL_MODE && Fault("Could not find token type '$type' to remove\n");
+ return;
+ }
- # We are actually just changing the side comment to a blank.
- # This may produce multiple blanks in a row, but sub respace_tokens
- # will check for this and fix it.
- $rLL->[$Klast]->[_TYPE_] = 'b';
- $rLL->[$Klast]->[_TOKEN_] = SPACE;
+ # A note on updating vars set by sub store_token for this comma: If we
+ # reduce the comma count by 1 then we also have to change the variable
+ # $last_nonblank_code_type to be $last_last_nonblank_code_type because
+ # otherwise sub store_token is going to ALSO reduce the comma count.
+ # Alternatively, we can leave the count alone and the
+ # $last_nonblank_code_type alone. Then sub store_token will produce
+ # the correct result. This is simpler and is done here.
- # The -io option outputs the line text, so we have to update
- # the line text so that the comment does not reappear.
- if ( $CODE_type eq 'IO' ) {
- my $line = EMPTY_STRING;
- foreach my $KK ( $Kfirst .. $Klast - 1 ) {
- $line .= $rLL->[$KK]->[_TOKEN_];
- }
- $line =~ s/\s+$//;
- $line_of_tokens->{_line_text} = $line . "\n";
- }
+ # remove a remaining blank if requested
+ if ( $rLL_new->[-1]->[_TYPE_] eq 'b' ) {
- # If we delete a hanging side comment the line becomes blank.
- if ( $CODE_type eq 'HSC' ) { $line_of_tokens->{_code_type} = 'BL' }
+ # current method for deleted '->'
+ if ( $want_space == -1 ) {
+ pop @{$rLL_new};
}
}
- return;
-} ## end sub delete_side_comments
-sub dump_verbatim {
- my $self = shift;
- my $rlines = $self->[_rlines_];
- foreach my $line ( @{$rlines} ) {
- my $input_line = $line->{_line_text};
- $self->write_unindented_line($input_line);
+ # add a blank if requested
+ else {
+ if ( $want_space == 1 ) {
+ $self->store_token();
+ }
+ elsif ( !$want_space ) {
+
+ # add one if there was one (current method for commas)
+ if ( defined($rblank) ) {
+ my $len = length($type);
+ $rblank->[_CUMULATIVE_LENGTH_] -= $len; # fix for deleted comma
+ push @{$rLL_new}, $rblank;
+ }
+ }
+ else {
+ # want_space=-1 so do not add a blank
+ }
}
- return;
-} ## end sub dump_verbatim
-my %wU;
-my %wiq;
-my %is_wit;
-my %is_sigil;
-my %is_nonlist_keyword;
-my %is_nonlist_type;
-my %is_s_y_m_slash;
-my %is_unexpected_equals;
+ return 1;
+} ## end sub unstore_last_nonblank_token
-BEGIN {
+sub is_list_assignment {
+ my ( $self, $K_opening ) = @_;
- # added 'U' to fix cases b1125 b1126 b1127
- my @q = qw(w U);
- @{wU}{@q} = (1) x scalar(@q);
+ # Given:
+ # $K_opening = index in $rLL_new of an opening paren
+ # Return:
+ # true if this is a list assignment of the form '@xxx = ('
+ # false otherwise
- @q = qw(w i q Q G C Z);
- @{wiq}{@q} = (1) x scalar(@q);
+ return unless defined($K_opening);
+ my $Km = $self->K_previous_nonblank( $K_opening, $rLL_new );
+ return unless defined($Km);
+ my $type_m = $rLL_new->[$Km]->[_TYPE_];
+
+ # Look for list assignment like '@list = (' or '@{$ref} = ('
+ # or '%hash = ('
+ if ( $type_m eq '=' ) {
+ my $token_m = $rLL_new->[$Km]->[_TOKEN_];
+ $Km = $self->K_previous_nonblank( $Km, $rLL_new );
+ return unless defined($Km);
+ $type_m = $rLL_new->[$Km]->[_TYPE_];
+ $token_m = $rLL_new->[$Km]->[_TOKEN_];
+
+ # backup past a braced item
+ if ( $token_m eq '}' ) {
+ my $seqno_m = $rLL_new->[$Km]->[_TYPE_SEQUENCE_];
+ return unless ($seqno_m);
+ my $K_opening_m = $self->[_K_opening_container_]->{$seqno_m};
+ return unless defined($K_opening_m);
+ $Km = $self->K_previous_nonblank( $K_opening_m, $rLL_new );
+ return unless defined($Km);
+ $type_m = $rLL_new->[$Km]->[_TYPE_];
+ $token_m = $rLL_new->[$Km]->[_TOKEN_];
+ }
+
+ if ( $type_m eq 'i' || $type_m eq 't' ) {
+ my $sigil = substr( $token_m, 0, 1 );
+ if ( $sigil eq '@' ) {
+ return 1;
+ }
+ }
+ }
+ return;
+} ## end sub is_list_assignment
- @q = qw(w i t);
- @{is_wit}{@q} = (1) x scalar(@q);
+my %is_not_list_paren;
- @q = qw($ & % * @);
- @{is_sigil}{@q} = (1) x scalar(@q);
+BEGIN {
+ ## trailing comma logic ignores opening parens preceded by these tokens
+ my @q = qw# if elsif unless while and or err not && | || ? : ! . #;
+ @is_not_list_paren{@q} = (1) x scalar(@q);
+}
- # Parens following these keywords will not be marked as lists. Note that
- # 'for' is not included and is handled separately, by including 'f' in the
- # hash %is_counted_type, since it may or may not be a c-style for loop.
- @q = qw( if elsif unless and or );
- @is_nonlist_keyword{@q} = (1) x scalar(@q);
+sub match_trailing_comma_rule {
- # Parens following these types will not be marked as lists
- @q = qw( && || );
- @is_nonlist_type{@q} = (1) x scalar(@q);
+ my ( $self, $KK, $Kfirst, $Kp, $trailing_comma_style, $paren_flag,
+ $stable_flag, $if_add )
+ = @_;
- @q = qw( s y m / );
- @is_s_y_m_slash{@q} = (1) x scalar(@q);
+ # Decide if a trailing comma rule is matched.
- @q = qw( = == != );
- @is_unexpected_equals{@q} = (1) x scalar(@q);
+ # Input parameter:
+ # $KK = index of closing token in old ($rLL) token list which follows
+ # the location of a possible trailing comma. See diagram below.
+ # $Kfirst = (old) index of first token on the current line of input tokens
+ # $Kp = index of previous nonblank token in new ($rLL_new) array
+ # $trailing_comma_rule = packed user control flags
+ # $if_add = true if adding comma, false if deleting comma
-} ## end BEGIN
+ # Returns:
+ # false if no match
+ # true if match
+ # !$if_add to keep the current state unchanged
-{ #<<< begin clousure respace_tokens
+ # For example, we might be checking for addition of a comma here:
-my $rLL_new; # This will be the new array of tokens
+ # bless {
+ # _name => $name,
+ # _price => $price,
+ # _rebate => $rebate <------ location of possible trailing comma
+ # }, $pkg;
+ # ^-------------------closing token at index $KK
-# These are variables in $self
-my $rLL;
-my $length_function;
-my $is_encoded_data;
+ # List of $trailing_comma_style values:
+ # undef stable: do not change
+ # '1' or '*' : every list should have a trailing comma
+ # 'm' a multi-line list should have a trailing commas
+ # 'b' trailing commas should be 'bare' (comma followed by newline)
+ # 'i' same as s=h but also include any list with no more than about one
+ # comma per line
+ # 'h' lists of key=>value pairs with a bare trailing comma
+ # '0' : no list should have a trailing comma
+ # ' ' or -wtc not defined : leave trailing commas unchanged [DEFAULT].
-my $K_closing_ternary;
-my $K_opening_ternary;
-my $rchildren_of_seqno;
-my $rhas_broken_code_block;
-my $rhas_broken_list;
-my $rhas_broken_list_with_lec;
-my $rhas_code_block;
-my $rhas_list;
-my $rhas_ternary;
-my $ris_assigned_structure;
-my $ris_broken_container;
-my $ris_excluded_lp_container;
-my $ris_list_by_seqno;
-my $ris_permanently_broken;
-my $rlec_count_by_seqno;
-my $roverride_cab3;
-my $rparent_of_seqno;
-my $rtype_count_by_seqno;
-my $rblock_type_of_seqno;
+ # Note the hierarchy:
+ # '1' includes all 'm' includes all 'b' includes all 'i' includes all 'h'
-my $K_opening_container;
-my $K_closing_container;
+ # Note: an interesting generalization would be to let an upper case
+ # letter denote the negation of styles 'm', 'b', 'h', 'i'. This might
+ # be useful for undoing operations. It would be implemented as a wrapper
+ # around this routine.
-my %K_first_here_doc_by_seqno;
+ # Return !$if_add to keep the current state unchanged
+ my $no_change = !$if_add;
-my $last_nonblank_code_type;
-my $last_nonblank_code_token;
-my $last_nonblank_block_type;
-my $last_last_nonblank_code_type;
-my $last_last_nonblank_code_token;
+ # If no style defined : do not add or delete
+ if ( !defined($trailing_comma_style) ) { return $no_change }
-my %seqno_stack;
-my %K_old_opening_by_seqno;
-my $depth_next;
-my $depth_next_max;
+ #----------------------------------------
+ # Set some flags describing this location
+ #----------------------------------------
+ my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
+ return $no_change unless ($type_sequence);
+ my $closing_token = $rLL->[$KK]->[_TOKEN_];
-my $cumulative_length;
+ # factors which force stability
+ my $is_permanently_broken =
+ $self->[_ris_permanently_broken_]->{$type_sequence};
+ $is_permanently_broken ||= $rOpts_break_at_old_comma_breakpoints
+ && !$rOpts_ignore_old_breakpoints;
+ $is_permanently_broken ||= $stable_flag;
-# Variables holding the current line info
-my $Ktoken_vars;
-my $Kfirst_old;
-my $Klast_old;
-my $Klast_old_code;
-my $CODE_type;
+ my $K_opening = $self->[_K_opening_container_]->{$type_sequence};
+ return $no_change if ( !defined($K_opening) );
+ my $iline_first_comma =
+ $self->[_rfirst_comma_line_index_]->{$type_sequence};
+ my $iline_last_comma = $rLL_new->[$Kp]->[_LINE_INDEX_];
+ my $rtype_count = $self->[_rtype_count_by_seqno_]->{$type_sequence};
+ my $comma_count = 0;
+ my $fat_comma_count = 0;
+ my $has_inner_multiline_structure;
+ my $has_inner_multiline_commas;
+
+ # if outer container is paren, return if this is not a possible list
+ # For example, return for an if paren 'if ('
+ my $token = $rLL_new->[$K_opening]->[_TOKEN_];
+ my $is_arrow_call;
+ my $is_hash_value;
+ my $is_paren_list;
+ if ( $token eq '(' ) {
+ $is_paren_list = 1;
+ my $Km = $self->K_previous_nonblank( $K_opening, $rLL_new );
+ if ( defined($Km) ) {
+ my $type_m = $rLL_new->[$Km]->[_TYPE_];
+ my $token_m = $rLL_new->[$Km]->[_TOKEN_];
+ if ( $type_m eq 'k' ) {
+ if ( $is_not_list_paren{$token_m} ) { return $no_change }
+ }
+ $is_arrow_call = $type_m eq '->';
+ $is_hash_value = $type_m eq '=>';
+ }
+ }
+
+ if ($rtype_count) {
+ $comma_count = $rtype_count->{','};
+ $fat_comma_count = $rtype_count->{'=>'};
+ }
+
+ my $follows_isolated_closing_token;
+
+ #----------------------------------------------------------------
+ # If no existing commas, see if we have an inner nested container
+ #----------------------------------------------------------------
+ if (
+ !$comma_count
+ && $if_add # for safety, should be true if no commas
+ && $is_closing_type{$last_nonblank_code_type}
+ )
+ {
-my $rwhitespace_flags;
+ # check for nesting closing containers
+ my $Kpp = $self->K_previous_nonblank( undef, $rLL_new );
+ return if ( !defined($Kpp) );
+ my $seqno_pp = $rLL_new->[$Kpp]->[_TYPE_SEQUENCE_];
+ my $type_pp = $rLL_new->[$Kpp]->[_TYPE_];
+
+ # nesting containers have sequence numbers which differ by 1
+ my $is_nesting_right =
+ $seqno_pp
+ && $is_closing_type{$type_pp}
+ && ( $seqno_pp == $type_sequence + 1 );
+
+ # Do not add a comma which will be deleted by
+ # --delete-weld-interfering commas (b1471)
+ if ( $is_nesting_right
+ && $rOpts_delete_weld_interfering_commas )
+ {
+ return;
+ }
-sub initialize_respace_tokens_closure {
+ # Does this trailing comma follow an isolated closing token?
+ if ($is_nesting_right) {
+ my $ix_pp = $rLL_new->[$Kpp]->[_LINE_INDEX_];
+ my $Kpp_m = $self->K_previous_nonblank( $Kpp, $rLL_new );
+ if ($Kpp_m) {
+ my $ix_pp_m = $rLL_new->[$Kpp_m]->[_LINE_INDEX_];
+ $follows_isolated_closing_token = $ix_pp > $ix_pp_m;
+ }
+ }
- my ($self) = @_;
+ #--------------------------------
+ # If no comma and no fat comma...
+ #--------------------------------
+ if ( !$fat_comma_count ) {
- $rLL_new = []; # This is the new array
+ # containers must be nesting on the right
+ return unless ($is_nesting_right);
- $rLL = $self->[_rLL_];
- $length_function = $self->[_length_function_];
- $is_encoded_data = $self->[_is_encoded_data_];
+ # give up if it is a code block
+ if ( $self->[_rblock_type_of_seqno_]->{$seqno_pp} ) {
+ return;
+ }
- $K_closing_ternary = $self->[_K_closing_ternary_];
- $K_opening_ternary = $self->[_K_opening_ternary_];
- $rchildren_of_seqno = $self->[_rchildren_of_seqno_];
- $rhas_broken_code_block = $self->[_rhas_broken_code_block_];
- $rhas_broken_list = $self->[_rhas_broken_list_];
- $rhas_broken_list_with_lec = $self->[_rhas_broken_list_with_lec_];
- $rhas_code_block = $self->[_rhas_code_block_];
- $rhas_list = $self->[_rhas_list_];
- $rhas_ternary = $self->[_rhas_ternary_];
- $ris_assigned_structure = $self->[_ris_assigned_structure_];
- $ris_broken_container = $self->[_ris_broken_container_];
- $ris_excluded_lp_container = $self->[_ris_excluded_lp_container_];
- $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
- $ris_permanently_broken = $self->[_ris_permanently_broken_];
- $rlec_count_by_seqno = $self->[_rlec_count_by_seqno_];
- $roverride_cab3 = $self->[_roverride_cab3_];
- $rparent_of_seqno = $self->[_rparent_of_seqno_];
- $rtype_count_by_seqno = $self->[_rtype_count_by_seqno_];
- $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ # if outer container is paren, must be sub call or list assignment
+ # Note that _ris_function_call_paren_ does not currently include
+ # calls of the form '->(', so that has to be checked separately.
+ if ( $token eq '('
+ && !$self->[_ris_function_call_paren_]->{$type_sequence}
+ && !$is_arrow_call
+ && !$is_hash_value
+ && !$self->is_list_assignment($K_opening) )
+ {
+ return;
+ }
- %K_first_here_doc_by_seqno = ();
+ my $K_opening_pp = $self->[_K_opening_container_]->{$seqno_pp};
+ return unless defined($K_opening_pp);
+ my $iline_o = $rLL_new->[$K_opening_pp]->[_LINE_INDEX_];
+ my $iline_c = $rLL_new->[$Kpp]->[_LINE_INDEX_];
- $last_nonblank_code_type = ';';
- $last_nonblank_code_token = ';';
- $last_nonblank_block_type = EMPTY_STRING;
- $last_last_nonblank_code_type = ';';
- $last_last_nonblank_code_token = ';';
+ my $rtype_count_pp = $self->[_rtype_count_by_seqno_]->{$seqno_pp};
+ return unless ($rtype_count_pp);
- %seqno_stack = ();
- %K_old_opening_by_seqno = (); # Note: old K index
- $depth_next = 0;
- $depth_next_max = 0;
+ $has_inner_multiline_structure =
+ $iline_c > $iline_o
+ && ( $rtype_count_pp->{','} || $rtype_count_pp->{'=>'} )
+ && !$rtype_count_pp->{';'};
+ return unless ($has_inner_multiline_structure);
- # we will be setting token lengths as we go
- $cumulative_length = 0;
+ # look for inner multiline commas
+ $iline_first_comma =
+ $self->[_rfirst_comma_line_index_]->{$seqno_pp};
+ return if ( !defined($iline_first_comma) );
+ my $iline_ppc = $rLL_new->[$Kpp]->[_LINE_INDEX_];
+ return if ( $iline_ppc <= $iline_first_comma );
+ $has_inner_multiline_commas = 1;
- $Ktoken_vars = undef; # the old K value of $rtoken_vars
- $Kfirst_old = undef; # min K of old line
- $Klast_old = undef; # max K of old line
- $Klast_old_code = undef; # K of last token if side comment
- $CODE_type = EMPTY_STRING;
+ # OK, we have an inner container with commas
+ }
+ }
- # Set the whitespace flags, which indicate the token spacing preference.
- $rwhitespace_flags = $self->set_whitespace_flags();
+ #--------------------------------
+ # Characterize the trailing comma
+ #--------------------------------
+ if ( !defined($iline_first_comma) ) {
- # Note that $K_opening_container and $K_closing_container have values
- # defined in sub get_line() for the previous K indexes. They were needed
- # in case option 'indent-only' was set, and we didn't get here. We no
- # longer need those and will eliminate them now to avoid any possible
- # mixing of old and new values. This must be done AFTER the call to
- # set_whitespace_flags, which needs these.
- $K_opening_container = $self->[_K_opening_container_] = {};
- $K_closing_container = $self->[_K_closing_container_] = {};
+ # Shouldn't happen: if this sub was called without any commas in this
+ # container, then either we should have found one in a nested container
+ # or already returned.
+ if (DEVEL_MODE) {
+ my $type_kp = $rLL_new->[$Kp]->[_TYPE_];
+ Fault(
+"at line $iline_last_comma but line of first comma not defined, at Kp=$Kp, type=$type_kp\n"
+ );
+ }
+ return;
+ }
- return;
+ # multiline commas: first and last commas on different lines
+ # Note that _ris_broken_container_ also stores the line diff
+ # but it is not available at this early stage.
+ my $line_diff_commas = $iline_last_comma - $iline_first_comma;
+ my $has_multiline_commas =
+ $line_diff_commas > 0 || $has_inner_multiline_commas;
-} ## end sub initialize_respace_tokens_closure
+ # Multiline ('m'): the opening and closing tokens on different lines
+ my $iline_o = $rLL_new->[$K_opening]->[_LINE_INDEX_];
+ my $iline_c = $rLL->[$KK]->[_LINE_INDEX_];
+ my $is_multiline = $iline_c > $iline_o;
-sub respace_tokens {
+ # Require additional stability factors when adding commas
+ if ($if_add) {
- my $self = shift;
+ # basic stability rules
+ my $is_stable = (
- #--------------------------------------------------------------------------
- # This routine is called once per file to do as much formatting as possible
- # before new line breaks are set.
- #--------------------------------------------------------------------------
+ # has commas not in parens, or multiple lines ending in commas
+ $comma_count
+ && ( !$is_paren_list || $has_multiline_commas )
- # Return parameters:
- # Set $severe_error=true if processing must terminate immediately
- my ( $severe_error, $rqw_lines );
+ # or contains an inner multiline structure
+ || $has_inner_multiline_structure
- # We change any spaces in --indent-only mode
- if ( $rOpts->{'indent-only'} ) {
+ # or has other stabilizing factors, like comments and blank lines
+ || $is_permanently_broken
+ );
- # We need to define lengths for -indent-only to avoid undefs, even
- # though these values are not actually needed for option --indent-only.
+ # special stability rules for fat-commas ...
+ if ( !$is_stable && $fat_comma_count ) {
- $rLL = $self->[_rLL_];
- $length_function = $self->[_length_function_];
- $cumulative_length = 0;
+ # stable if not in paren list
+ $is_stable ||= !$is_paren_list;
- foreach my $item ( @{$rLL} ) {
- my $token = $item->[_TOKEN_];
- my $token_length = $length_function->($token);
- $cumulative_length += $token_length;
- $item->[_TOKEN_LENGTH_] = $token_length;
- $item->[_CUMULATIVE_LENGTH_] = $cumulative_length;
+ # a paren container must span several lines (b1489, b1490)
+ # and the trailing comma must follow an isolated closing token if
+ # just 1 '=>' (b1492 b1493 b1494)
+ $is_stable ||= ( $iline_c - $iline_o > 1 )
+ && ( $follows_isolated_closing_token
+ || $fat_comma_count > 1 );
}
- return ( $severe_error, $rqw_lines );
+ $is_multiline &&= $is_stable;
}
- # This routine makes all necessary and possible changes to the tokenization
- # after the initial tokenization of the file. This is a tedious routine,
- # but basically it consists of inserting and deleting whitespace between
- # nonblank tokens according to the selected parameters. In a few cases
- # non-space characters are added, deleted or modified.
+ # Bare 'b': a multiline where the closing container token starts a new line:
+ my $is_bare_trailing_comma = $is_multiline && $KK == $Kfirst;
- # The goal of this routine is to create a new token array which only needs
- # the definition of new line breaks and padding to complete formatting. In
- # a few cases we have to cheat a little to achieve this goal. In
- # particular, we may not know if a semicolon will be needed, because it
- # depends on how the line breaks go. To handle this, we include the
- # semicolon as a 'phantom' which can be displayed as normal or as an empty
- # string.
+ #---------------------
+ # Check for a match...
+ #---------------------
- # Method: The old tokens are copied one-by-one, with changes, from the old
- # linear storage array $rLL to a new array $rLL_new.
+ my $match;
- # (re-)initialize closure variables for this problem
- $self->initialize_respace_tokens_closure();
+ #----------------------------
+ # 0 : does not match any list
+ #----------------------------
+ if ( $trailing_comma_style eq '0' ) {
+ $match = 0;
+ }
- #--------------------------------
- # Main over all lines of the file
- #--------------------------------
- my $rlines = $self->[_rlines_];
- my $line_type = EMPTY_STRING;
- my $last_K_out;
+ #------------------------------
+ # '*' or '1' : matches any list
+ #------------------------------
+ elsif ( $trailing_comma_style eq '*' || $trailing_comma_style eq '1' ) {
+ $match = 1;
+ }
- foreach my $line_of_tokens ( @{$rlines} ) {
+ #-----------------------------
+ # 'm' matches a Multiline list
+ #-----------------------------
+ elsif ( $trailing_comma_style eq 'm' ) {
+ $match = $is_multiline;
+ }
- my $input_line_number = $line_of_tokens->{_line_number};
- my $last_line_type = $line_type;
- $line_type = $line_of_tokens->{_line_type};
- next unless ( $line_type eq 'CODE' );
- $CODE_type = $line_of_tokens->{_code_type};
+ #----------------------------------
+ # 'b' matches a Bare trailing comma
+ #----------------------------------
+ elsif ( $trailing_comma_style eq 'b' ) {
+ $match = $is_bare_trailing_comma;
+ }
- if ( $CODE_type eq 'BL' ) {
- my $seqno = $seqno_stack{ $depth_next - 1 };
- if ( defined($seqno) ) {
- $self->[_rblank_and_comment_count_]->{$seqno} += 1;
- $self->set_permanently_broken($seqno)
- if (!$ris_permanently_broken->{$seqno}
- && $rOpts_maximum_consecutive_blank_lines );
- }
- }
+ #--------------------------------------------------------------------------
+ # 'h' matches a bare hash list with about 1 comma and 1 fat comma per line.
+ # 'i' matches a bare stable list with about 1 comma per line.
+ #--------------------------------------------------------------------------
+ elsif ( $trailing_comma_style eq 'h' || $trailing_comma_style eq 'i' ) {
- my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
- next unless defined($Kfirst);
- ( $Kfirst_old, $Klast_old ) = ( $Kfirst, $Klast );
- $Klast_old_code = $Klast_old;
+ # We can treat these together because they are similar.
+ # The set of 'i' matches includes the set of 'h' matches.
- # Be sure an old K value is defined for sub store_token
- $Ktoken_vars = $Kfirst;
+ # the trailing comma must be bare for both 'h' and 'i'
+ return if ( !$is_bare_trailing_comma );
- # Check for correct sequence of token indexes...
- # An error here means that sub write_line() did not correctly
- # package the tokenized lines as it received them. If we
- # get a fault here it has not output a continuous sequence
- # of K values. Or a line of CODE may have been mis-marked as
- # something else. There is no good way to continue after such an
- # error.
- if ( defined($last_K_out) ) {
- if ( $Kfirst != $last_K_out + 1 ) {
- Fault_Warn(
- "Program Bug: last K out was $last_K_out but Kfirst=$Kfirst"
- );
- $severe_error = 1;
- return ( $severe_error, $rqw_lines );
- }
- }
- else {
+ # There must be no more than one comma per line for both 'h' and 'i'
+ # The new_comma_count here will include the trailing comma.
+ my $new_comma_count = $comma_count;
+ $new_comma_count += 1 if ($if_add);
+ my $excess_commas = $new_comma_count - $line_diff_commas - 1;
+ if ( $excess_commas > 0 ) {
- # The first token should always have been given index 0 by sub
- # write_line()
- if ( $Kfirst != 0 ) {
- Fault("Program Bug: first K is $Kfirst but should be 0");
+ # Exception for a special edge case for option 'i': if the trailing
+ # comma is followed by a blank line or comment, then it cannot be
+ # covered. Then we can safely accept a small list to avoid
+ # instability (issue b1443).
+ if ( $trailing_comma_style eq 'i'
+ && $iline_c - $rLL_new->[$Kp]->[_LINE_INDEX_] > 1
+ && $new_comma_count <= 2 )
+ {
+ $match = 1;
}
- }
- $last_K_out = $Klast;
- # Handle special lines of code
- if ( $CODE_type && $CODE_type ne 'NIN' && $CODE_type ne 'VER' ) {
+ # Patch for instability issue b1456: -boc can trick this test; so
+ # skip it when deleting commas to avoid possible instability
+ # with option 'h' in combination with -atc -dtc -boc;
+ elsif (
+ $trailing_comma_style eq 'h'
- # CODE_types are as follows.
- # 'BL' = Blank Line
- # 'VB' = Verbatim - line goes out verbatim
- # 'FS' = Format Skipping - line goes out verbatim, no blanks
- # 'IO' = Indent Only - only indentation may be changed
- # 'NIN' = No Internal Newlines - line does not get broken
- # 'HSC'=Hanging Side Comment - fix this hanging side comment
- # 'BC'=Block Comment - an ordinary full line comment
- # 'SBC'=Static Block Comment - a block comment which does not get
- # indented
- # 'SBCX'=Static Block Comment Without Leading Space
- # 'VER'=VERSION statement
- # '' or (undefined) - no restructions
-
- # For a hanging side comment we insert an empty quote before
- # the comment so that it becomes a normal side comment and
- # will be aligned by the vertical aligner
- if ( $CODE_type eq 'HSC' ) {
-
- # Safety Check: This must be a line with one token (a comment)
- my $rvars_Kfirst = $rLL->[$Kfirst];
- if ( $Kfirst == $Klast && $rvars_Kfirst->[_TYPE_] eq '#' ) {
-
- # Note that even if the flag 'noadd-whitespace' is set, we
- # will make an exception here and allow a blank to be
- # inserted to push the comment to the right. We can think
- # of this as an adjustment of indentation rather than
- # whitespace between tokens. This will also prevent the
- # hanging side comment from getting converted to a block
- # comment if whitespace gets deleted, as for example with
- # the -extrude and -mangle options.
- my $rcopy =
- copy_token_as_type( $rvars_Kfirst, 'q', EMPTY_STRING );
- $self->store_token($rcopy);
- $rcopy = copy_token_as_type( $rvars_Kfirst, 'b', SPACE );
- $self->store_token($rcopy);
- $self->store_token($rvars_Kfirst);
- next;
- }
- else {
+ # this is a deletion (due to -dtc)
+ && !$if_add
- # This line was mis-marked by sub scan_comment. Catch in
- # DEVEL_MODE, otherwise try to repair and keep going.
- Fault(
- "Program bug. A hanging side comment has been mismarked"
- ) if (DEVEL_MODE);
+ # -atc is also set
+ && $rOpts_add_trailing_commas
- $CODE_type = EMPTY_STRING;
- $line_of_tokens->{_code_type} = $CODE_type;
- }
+ # -boc is set and active
+ && $rOpts_break_at_old_comma_breakpoints
+ && !$rOpts_ignore_old_breakpoints
+ )
+ {
+ # ignore this test
}
-
- # Copy tokens unchanged
- foreach my $KK ( $Kfirst .. $Klast ) {
- $Ktoken_vars = $KK;
- $self->store_token( $rLL->[$KK] );
+ else {
+ return 0;
}
- next;
}
- # Handle normal line..
+ # check fat commas
+ if (
+ !$match
+ && $fat_comma_count
+ && (
- # Define index of last token before any side comment for comma counts
- my $type_end = $rLL->[$Klast_old_code]->[_TYPE_];
- if ( ( $type_end eq '#' || $type_end eq 'b' )
- && $Klast_old_code > $Kfirst_old )
+ # - a list of key=>value pairs with at least 2 fat commas is a
+ # match for both 'h' and 'i'
+ $fat_comma_count >= 2
+
+ # - an isolated fat comma is a match for type 'h'
+ # and also 'i' (see note below)
+ || (
+ $fat_comma_count == 1
+ && $new_comma_count == 1
+ ## && $if_add ## removed to fix b1476
+
+ ## removed so that 'i' and 'h' work the same here
+ ## && $trailing_comma_style eq 'h'
+ )
+ )
+ )
{
- $Klast_old_code--;
- if ( $rLL->[$Klast_old_code]->[_TYPE_] eq 'b'
- && $Klast_old_code > $Kfirst_old )
- {
- $Klast_old_code--;
- }
+
+ # but comma count (including trailer) and fat comma count must
+ # differ by by no more than 1. This allows for some small
+ # variations.
+ my $comma_diff = $new_comma_count - $fat_comma_count;
+ $match = ( $comma_diff >= -1 && $comma_diff <= 1 );
}
- # Insert any essential whitespace between lines
- # if last line was normal CODE.
- # Patch for rt #125012: use K_previous_code rather than '_nonblank'
- # because comments may disappear.
- # Note that we must do this even if --noadd-whitespace is set
- if ( $last_line_type eq 'CODE' ) {
- my $type_next = $rLL->[$Kfirst]->[_TYPE_];
- my $token_next = $rLL->[$Kfirst]->[_TOKEN_];
- if (
- is_essential_whitespace(
- $last_last_nonblank_code_token,
- $last_last_nonblank_code_type,
- $last_nonblank_code_token,
- $last_nonblank_code_type,
- $token_next,
- $type_next,
- )
- )
- {
- $self->store_space();
- }
+ # For 'i' only, a list that can be shown to be stable is a match
+ if ( !$match && $trailing_comma_style eq 'i' ) {
+ $match = (
+ $is_permanently_broken
+ || ( $rOpts_break_at_old_comma_breakpoints
+ && !$rOpts_ignore_old_breakpoints )
+ );
}
+ }
- #-----------------------------------------------
- # Inner loop to respace tokens on a line of code
- #-----------------------------------------------
+ #-------------------------------------------------------------------------
+ # Unrecognized parameter. This should have been caught in the input check.
+ #-------------------------------------------------------------------------
+ else {
- # The inner loop is in a separate sub for clarity
- $self->respace_tokens_inner_loop( $Kfirst, $Klast, $input_line_number );
+ DEVEL_MODE && Fault("Unrecognized parameter '$trailing_comma_style'\n");
- } # End line loop
+ # do not add or delete
+ return !$if_add;
+ }
- # finalize data structures
- $self->respace_post_loop_ops();
+ # Now do any special paren check
+ if ( $match
+ && $paren_flag
+ && $paren_flag ne '1'
+ && $paren_flag ne '*'
+ && $closing_token eq ')' )
+ {
+ $match &&=
+ $self->match_paren_control_flag( $type_sequence, $paren_flag,
+ $rLL_new );
+ }
- # Reset memory to be the new array
- $self->[_rLL_] = $rLL_new;
- my $Klimit;
- if ( @{$rLL_new} ) { $Klimit = @{$rLL_new} - 1 }
- $self->[_Klimit_] = $Klimit;
+ # Fix for b1379, b1380, b1381, b1382, b1384 part 1. Mark trailing commas
+ # for use by -vtc logic to avoid instability when -dtc and -atc are both
+ # active.
+ if ($match) {
+ if ( $if_add && $rOpts_delete_trailing_commas
+ || !$if_add && $rOpts_add_trailing_commas )
+ {
+ $self->[_ris_bare_trailing_comma_by_seqno_]->{$type_sequence} = 1;
- # During development, verify that the new array still looks okay.
- DEVEL_MODE && $self->check_token_array();
+ # The combination of -atc and -dtc and -cab=3 can be unstable
+ # (b1394). So we deactivate -cab=3 in this case.
+ # A value of '0' or '4' is required for stability of case b1451.
+ if ( $rOpts_comma_arrow_breakpoints == 3 ) {
+ $self->[_roverride_cab3_]->{$type_sequence} = 0;
+ }
+ }
+ }
+ return $match;
+} ## end sub match_trailing_comma_rule
- # update the token limits of each line
- ( $severe_error, $rqw_lines ) = $self->resync_lines_and_tokens();
+sub store_new_token {
+
+ my ( $self, $type, $token, $Kp ) = @_;
+
+ # Create and insert a completely new token into the output stream
+ # Caller must add space after this token if necessary
+
+ # Input parameters:
+ # $type = the token type
+ # $token = the token text
+ # $Kp = index of the previous token in the new list, $rLL_new
- return ( $severe_error, $rqw_lines );
-} ## end sub respace_tokens
+ # This operation is a little tricky because we are creating a new token and
+ # we have to take care to follow the requested whitespace rules.
-sub respace_tokens_inner_loop {
+ my $Ktop = @{$rLL_new} - 1;
+ my $top_is_space = $Ktop >= 0 && $rLL_new->[$Ktop]->[_TYPE_] eq 'b';
+ if ( $top_is_space && $want_left_space{$type} == WS_NO ) {
- my ( $self, $Kfirst, $Klast, $input_line_number ) = @_;
+ #----------------------------------------------------
+ # Method 1: Convert the top blank into the new token.
+ #----------------------------------------------------
- #-----------------------------------------------------------------
- # Loop to copy all tokens on one line, making any spacing changes,
- # while also collecting information needed by later subs.
- #-----------------------------------------------------------------
- foreach my $KK ( $Kfirst .. $Klast ) {
+ # Be Careful: we are working on the top of the new stack, on a token
+ # which has been stored.
- # TODO: consider eliminating this closure var by passing directly to
- # store_token following pattern of store_tokens_to_go.
- $Ktoken_vars = $KK;
+ $rLL_new->[$Ktop]->[_TOKEN_] = $token;
+ $rLL_new->[$Ktop]->[_TOKEN_LENGTH_] = length($token);
+ $rLL_new->[$Ktop]->[_TYPE_] = $type;
- my $rtoken_vars = $rLL->[$KK];
- my $type = $rtoken_vars->[_TYPE_];
+ # NOTE: we are changing the output stack without updating variables
+ # $last_nonblank_code_type, etc. Future needs might require that
+ # those variables be updated here. For now, we just update the
+ # type counts as necessary.
- # Handle a blank space ...
- if ( $type eq 'b' ) {
+ if ( $is_counted_type{$type} ) {
+ my $seqno = $seqno_stack{ $depth_next - 1 };
+ if ($seqno) {
+ $self->[_rtype_count_by_seqno_]->{$seqno}->{$type}++;
+ }
+ }
+ }
+ else {
- # Delete it if not wanted by whitespace rules
- # or we are deleting all whitespace
- # Note that whitespace flag is a flag indicating whether a
- # white space BEFORE the token is needed
- next if ( $KK >= $Klast ); # skip terminal blank
- my $Knext = $KK + 1;
+ #----------------------------------------
+ # Method 2: Use the normal storage method
+ #----------------------------------------
- if ($rOpts_freeze_whitespace) {
- $self->store_token($rtoken_vars);
- next;
+ # Patch for issue c078: keep line indexes in order. If the top
+ # token is a space that we are keeping (due to '-wls=...) then
+ # we have to check that old line indexes stay in order.
+ # In very rare
+ # instances in which side comments have been deleted and converted
+ # into blanks, we may have filtered down multiple blanks into just
+ # one. In that case the top blank may have a higher line number
+ # than the previous nonblank token. Although the line indexes of
+ # blanks are not really significant, we need to keep them in order
+ # in order to pass error checks.
+ if ($top_is_space) {
+ my $old_top_ix = $rLL_new->[$Ktop]->[_LINE_INDEX_];
+ my $new_top_ix = $rLL_new->[$Kp]->[_LINE_INDEX_];
+ if ( $new_top_ix < $old_top_ix ) {
+ $rLL_new->[$Ktop]->[_LINE_INDEX_] = $new_top_ix;
}
+ }
+ else {
+ if ( $want_left_space{$type} == WS_YES ) {
+ $self->store_token();
+ }
+ }
- my $ws = $rwhitespace_flags->[$Knext];
- if ( $ws == -1
- || $rOpts_delete_old_whitespace )
- {
+ my $rcopy = copy_token_as_type( $rLL_new->[$Kp], $type, $token );
+ $self->store_token($rcopy);
- my $token_next = $rLL->[$Knext]->[_TOKEN_];
- my $type_next = $rLL->[$Knext]->[_TYPE_];
+ }
- my $do_not_delete = is_essential_whitespace(
- $last_last_nonblank_code_token,
- $last_last_nonblank_code_type,
- $last_nonblank_code_token,
- $last_nonblank_code_type,
- $token_next,
- $type_next,
- );
+ $last_last_nonblank_code_type = $last_nonblank_code_type;
+ $last_last_nonblank_code_token = $last_nonblank_code_token;
- # Note that repeated blanks will get filtered out here
- next unless ($do_not_delete);
- }
+ $last_nonblank_code_type = $type;
+ $last_nonblank_code_token = $token;
- # make it just one character
- $rtoken_vars->[_TOKEN_] = SPACE;
- $self->store_token($rtoken_vars);
- next;
- }
+ # This sub is currently called to store non-block types ',' and '->', so:
+ $last_nonblank_block_type = EMPTY_STRING;
- my $token = $rtoken_vars->[_TOKEN_];
+ return;
+} ## end sub store_new_token
- # Handle a sequenced token ... i.e. one of ( ) { } [ ] ? :
- if ( $rtoken_vars->[_TYPE_SEQUENCE_] ) {
+sub check_Q {
- # One of ) ] } ...
- if ( $is_closing_token{$token} ) {
+ my ( $self, $KK, $Kfirst, $line_number ) = @_;
- my $type_sequence = $rtoken_vars->[_TYPE_SEQUENCE_];
- my $block_type = $rblock_type_of_seqno->{$type_sequence};
+ # Check that a quote looks okay, and report possible problems
+ # to the logfile.
+ # Given:
+ # $KK = index of the quote token
+ # $Kfirst = index of first token on the line
+ # $line_number = number of the line in the input stream
- #---------------------------------------------
- # check for semicolon addition in a code block
- #---------------------------------------------
- if ($block_type) {
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ if ( $token =~ /\t/ ) {
+ $self->note_embedded_tab($line_number);
+ }
- # if not preceded by a ';' ..
- if ( $last_nonblank_code_type ne ';' ) {
+ # The remainder of this routine looks for something like
+ # '$var = s/xxx/yyy/;'
+ # in case it should have been '$var =~ s/xxx/yyy/;'
- # tentatively insert a semicolon if appropriate
- $self->add_phantom_semicolon($KK)
- if $rOpts->{'add-semicolons'};
- }
- }
+ # Start by looking for a token beginning with one of: s y m / tr
+ return
+ unless ( $is_s_y_m_slash{ substr( $token, 0, 1 ) }
+ || substr( $token, 0, 2 ) eq 'tr' );
- #----------------------------------------------------------
- # check for addition/deletion of a trailing comma in a list
- #----------------------------------------------------------
- else {
+ # ... and preceded by one of: = == !=
+ my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
+ return unless ( defined($Kp) );
+ my $previous_nonblank_type = $rLL_new->[$Kp]->[_TYPE_];
+ return unless ( $is_unexpected_equals{$previous_nonblank_type} );
+ my $previous_nonblank_token = $rLL_new->[$Kp]->[_TOKEN_];
- # if this is a list ..
- my $rtype_count = $rtype_count_by_seqno->{$type_sequence};
- if ( $rtype_count
- && $rtype_count->{','}
- && !$rtype_count->{';'}
- && !$rtype_count->{'f'} )
- {
+ my $previous_nonblank_type_2 = 'b';
+ my $previous_nonblank_token_2 = EMPTY_STRING;
+ my $Kpp = $self->K_previous_nonblank( $Kp, $rLL_new );
+ if ( defined($Kpp) ) {
+ $previous_nonblank_type_2 = $rLL_new->[$Kpp]->[_TYPE_];
+ $previous_nonblank_token_2 = $rLL_new->[$Kpp]->[_TOKEN_];
+ }
- # if NOT preceded by a comma..
- if ( $last_nonblank_code_type ne ',' ) {
+ my $next_nonblank_token = EMPTY_STRING;
+ my $Kn = $KK + 1;
+ my $Kmax = @{$rLL} - 1;
+ if ( $Kn <= $Kmax && $rLL->[$Kn]->[_TYPE_] eq 'b' ) { $Kn += 1 }
+ if ( $Kn <= $Kmax ) {
+ $next_nonblank_token = $rLL->[$Kn]->[_TOKEN_];
+ }
- # insert a comma if requested
- if ( $rOpts_add_trailing_commas
- && %trailing_comma_rules )
- {
- $self->add_trailing_comma( $KK, $Kfirst,
- $trailing_comma_rules{$token} );
- }
- }
+ my $token_0 = $rLL->[$Kfirst]->[_TOKEN_];
+ my $type_0 = $rLL->[$Kfirst]->[_TYPE_];
- # if preceded by a comma ..
- else {
+ if (
- # delete a trailing comma if requested
- my $deleted;
- if ( $rOpts_delete_trailing_commas
- && %trailing_comma_rules )
- {
- $deleted =
- $self->delete_trailing_comma( $KK, $Kfirst,
- $trailing_comma_rules{$token} );
- }
+ # preceded by simple scalar
+ $previous_nonblank_type_2 eq 'i'
+ && $previous_nonblank_token_2 =~ /^\$/
- # delete a weld-interfering comma if requested
- if ( !$deleted
- && $rOpts_delete_weld_interfering_commas
- && $is_closing_type{
- $last_last_nonblank_code_type} )
- {
- $self->delete_weld_interfering_comma($KK);
- }
- }
- }
- }
- }
- }
+ # followed by some kind of termination
+ # (but give complaint if we can not see far enough ahead)
+ && $next_nonblank_token =~ /^[; \)\}]$/
- # Modify certain tokens here for whitespace
- # The following is not yet done, but could be:
- # sub (x x x)
- # ( $type =~ /^[wit]$/ )
- elsif ( $is_wit{$type} ) {
+ # scalar is not declared
+ && !( $type_0 eq 'k' && $is_my_our_local{$token_0} )
+ )
+ {
+ my $lno = 1 + $rLL_new->[$Kp]->[_LINE_INDEX_];
+ my $guess = substr( $previous_nonblank_token, 0, 1 ) . '~';
+ complain(
+"Line $lno: Note: be sure you want '$previous_nonblank_token' instead of '$guess' here\n"
+ );
+ }
+ return;
+} ## end sub check_Q
- # change '$ var' to '$var' etc
- # change '@ ' to '@'
- # Examples: <<snippets/space1.in>>
- my $ord = ord( substr( $token, 1, 1 ) );
- if (
+} ## end closure respace_tokens
- # quick test for possible blank at second char
- $ord > 0 && ( $ord < ORD_PRINTABLE_MIN
- || $ord > ORD_PRINTABLE_MAX )
- )
- {
- my ( $sigil, $word ) = split /\s+/, $token, 2;
+sub resync_lines_and_tokens {
- # $sigil =~ /^[\$\&\%\*\@]$/ )
- if ( $is_sigil{$sigil} ) {
- $token = $sigil;
- $token .= $word if ( defined($word) ); # fix c104
- $rtoken_vars->[_TOKEN_] = $token;
- }
- }
+ my $self = shift;
- # Trim certain spaces in identifiers
- if ( $type eq 'i' ) {
+ # Re-construct the arrays of tokens associated with the original input
+ # lines since they have probably changed due to inserting and deleting
+ # blanks and a few other tokens.
- if ( $token =~ /$SUB_PATTERN/ ) {
+ # Return parameters:
+ # set severe_error = true if processing needs to terminate
+ my $severe_error;
+ my $rqw_lines = [];
- # -spp = 0 : no space before opening prototype paren
- # -spp = 1 : stable (follow input spacing)
- # -spp = 2 : always space before opening prototype paren
- if ( !defined($rOpts_space_prototype_paren)
- || $rOpts_space_prototype_paren == 1 )
- {
- ## default: stable
- }
- elsif ( $rOpts_space_prototype_paren == 0 ) {
- $token =~ s/\s+\(/\(/;
- }
- elsif ( $rOpts_space_prototype_paren == 2 ) {
- $token =~ s/\(/ (/;
- }
+ my $rLL = $self->[_rLL_];
+ my $Klimit = $self->[_Klimit_];
+ my $rlines = $self->[_rlines_];
+ my @Krange_code_without_comments;
+ my @Klast_valign_code;
- # one space max, and no tabs
- $token =~ s/\s+/ /g;
- $rtoken_vars->[_TOKEN_] = $token;
+ # This is the next token and its line index:
+ my $Knext = 0;
+ my $Kmax = defined($Klimit) ? $Klimit : -1;
+
+ # Verify that old line indexes are in still order. If this error occurs,
+ # check locations where sub 'respace_tokens' creates new tokens (like
+ # blank spaces). It must have set a bad old line index.
+ if ( DEVEL_MODE && defined($Klimit) ) {
+ my $iline = $rLL->[0]->[_LINE_INDEX_];
+ foreach my $KK ( 1 .. $Klimit ) {
+ my $iline_last = $iline;
+ $iline = $rLL->[$KK]->[_LINE_INDEX_];
+ if ( $iline < $iline_last ) {
+ my $KK_m = $KK - 1;
+ my $token_m = $rLL->[$KK_m]->[_TOKEN_];
+ my $token = $rLL->[$KK]->[_TOKEN_];
+ my $type_m = $rLL->[$KK_m]->[_TYPE_];
+ my $type = $rLL->[$KK]->[_TYPE_];
+ Fault(<<EOM);
+Line indexes out of order at index K=$KK:
+at KK-1 =$KK_m: old line=$iline_last, type='$type_m', token='$token_m'
+at KK =$KK: old line=$iline, type='$type', token='$token',
+EOM
+ }
+ }
+ }
- $self->[_ris_special_identifier_token_]->{$token} = 'sub';
+ my $iline = -1;
+ foreach my $line_of_tokens ( @{$rlines} ) {
+ $iline++;
+ next if ( $line_of_tokens->{_line_type} ne 'CODE' );
+
+ # Get the old number of tokens on this line
+ my $rK_range_old = $line_of_tokens->{_rK_range};
+ my ( $Kfirst_old, $Klast_old ) = @{$rK_range_old};
+ my $Kdiff_old = 0;
+ if ( defined($Kfirst_old) ) {
+ $Kdiff_old = $Klast_old - $Kfirst_old;
+ }
+
+ # Find the range of NEW K indexes for the line:
+ # $Kfirst = index of first token on line
+ # $Klast = index of last token on line
+ my ( $Kfirst, $Klast );
+
+ my $Knext_beg = $Knext; # this will be $Kfirst if we find tokens
+
+ # Optimization: Although the actual K indexes may be completely
+ # changed after respacing, the number of tokens on any given line
+ # will often be nearly unchanged. So we will see if we can start
+ # our search by guessing that the new line has the same number
+ # of tokens as the old line.
+ my $Knext_guess = $Knext + $Kdiff_old;
+ if ( $Knext_guess > $Knext
+ && $Knext_guess < $Kmax
+ && $rLL->[$Knext_guess]->[_LINE_INDEX_] <= $iline )
+ {
- }
+ # the guess is good, so we can start our search here
+ $Knext = $Knext_guess + 1;
+ }
- # clean up spaces in package identifiers, like
- # "package Bob::Dog;"
- elsif ( substr( $token, 0, 7 ) eq 'package'
- && $token =~ /^package\s/ )
- {
- $token =~ s/\s+/ /g;
- $rtoken_vars->[_TOKEN_] = $token;
+ # search for the change in input line number
+ while ($Knext <= $Kmax
+ && $rLL->[$Knext]->[_LINE_INDEX_] <= $iline )
+ {
+ $Knext++;
+ }
- $self->[_ris_special_identifier_token_]->{$token} =
- 'package';
+ if ( $Knext > $Knext_beg ) {
- }
+ $Klast = $Knext - 1;
- # trim identifiers of trailing blanks which can occur
- # under some unusual circumstances, such as if the
- # identifier 'witch' has trailing blanks on input here:
- #
- # sub
- # witch
- # () # prototype may be on new line ...
- # ...
- my $ord_ch = ord( substr( $token, -1, 1 ) );
- if (
+ # Delete any terminal blank token
+ if ( $rLL->[$Klast]->[_TYPE_] eq 'b' ) { $Klast -= 1 }
- # quick check for possible ending space
- $ord_ch > 0 && ( $ord_ch < ORD_PRINTABLE_MIN
- || $ord_ch > ORD_PRINTABLE_MAX )
- )
- {
- $token =~ s/\s+$//g;
- $rtoken_vars->[_TOKEN_] = $token;
- }
+ if ( $Klast < $Knext_beg ) {
+ $Klast = undef;
}
- }
-
- # handle semicolons
- elsif ( $type eq ';' ) {
-
- # Remove unnecessary semicolons, but not after bare
- # blocks, where it could be unsafe if the brace is
- # mis-tokenized.
- if (
- $rOpts->{'delete-semicolons'}
- && (
- (
- $last_nonblank_block_type
- && $last_nonblank_code_type eq '}'
- && (
- $is_block_without_semicolon{
- $last_nonblank_block_type}
- || $last_nonblank_block_type =~ /$SUB_PATTERN/
- || $last_nonblank_block_type =~ /^\w+:$/
- )
- )
- || $last_nonblank_code_type eq ';'
- )
- )
- {
+ else {
- # This looks like a deletable semicolon, but even if a
- # semicolon can be deleted it is not necessarily best to do
- # so. We apply these additional rules for deletion:
- # - Always ok to delete a ';' at the end of a line
- # - Never delete a ';' before a '#' because it would
- # promote it to a block comment.
- # - If a semicolon is not at the end of line, then only
- # delete if it is followed by another semicolon or closing
- # token. This includes the comment rule. It may take
- # two passes to get to a final state, but it is a little
- # safer. For example, keep the first semicolon here:
- # eval { sub bubba { ok(0) }; ok(0) } || ok(1);
- # It is not required but adds some clarity.
- my $ok_to_delete = 1;
- if ( $KK < $Klast ) {
- my $Kn = $self->K_next_nonblank($KK);
- if ( defined($Kn) && $Kn <= $Klast ) {
- my $next_nonblank_token_type = $rLL->[$Kn]->[_TYPE_];
- $ok_to_delete = $next_nonblank_token_type eq ';'
- || $next_nonblank_token_type eq '}';
- }
- }
+ $Kfirst = $Knext_beg;
- # do not delete only nonblank token in a file
- else {
- my $Kp = $self->K_previous_code( undef, $rLL_new );
- my $Kn = $self->K_next_nonblank($KK);
- $ok_to_delete = defined($Kn) || defined($Kp);
+ # Save ranges of non-comment code. This will be used by
+ # sub keep_old_line_breaks.
+ if ( $rLL->[$Kfirst]->[_TYPE_] ne '#' ) {
+ push @Krange_code_without_comments, [ $Kfirst, $Klast ];
}
- if ($ok_to_delete) {
- $self->note_deleted_semicolon($input_line_number);
- next;
- }
- else {
- write_logfile_entry("Extra ';'\n");
+ # Only save ending K indexes of code types which are blank
+ # or 'VER'. These will be used for a convergence check.
+ # See related code in sub 'convey_batch_to_vertical_aligner'
+ my $CODE_type = $line_of_tokens->{_code_type};
+ if ( !$CODE_type
+ || $CODE_type eq 'VER' )
+ {
+ push @Klast_valign_code, $Klast;
}
}
}
- # Old patch to add space to something like "x10".
- # Note: This is now done in the Tokenizer, but this code remains
- # for reference.
- elsif ( $type eq 'n' ) {
- if ( substr( $token, 0, 1 ) eq 'x' && $token =~ /^x\d+/ ) {
- $token =~ s/x/x /;
- $rtoken_vars->[_TOKEN_] = $token;
- if (DEVEL_MODE) {
- Fault(<<EOM);
-Near line $input_line_number, Unexpected need to split a token '$token' - this should now be done by the Tokenizer
-EOM
- }
- }
+ # It is only safe to trim the actual line text if the input
+ # line had a terminal blank token. Otherwise, we may be
+ # in a quote.
+ if ( $line_of_tokens->{_ended_in_blank_token} ) {
+ $line_of_tokens->{_line_text} =~ s/\s+$//;
}
+ $line_of_tokens->{_rK_range} = [ $Kfirst, $Klast ];
- # check for a qw quote
- elsif ( $type eq 'q' ) {
-
- # trim blanks from right of qw quotes
- # (To avoid trimming qw quotes use -ntqw; the tokenizer handles
- # this)
- $token =~ s/\s*$//;
- $rtoken_vars->[_TOKEN_] = $token;
- if ( $self->[_save_logfile_] && $token =~ /\t/ ) {
- $self->note_embedded_tab($input_line_number);
- }
- if ( $rwhitespace_flags->[$KK] == WS_YES
- && @{$rLL_new}
- && $rLL_new->[-1]->[_TYPE_] ne 'b'
- && $rOpts_add_whitespace )
- {
- $self->store_space();
+ # Deleting semicolons can create new empty code lines
+ # which should be marked as blank
+ if ( !defined($Kfirst) ) {
+ my $CODE_type = $line_of_tokens->{_code_type};
+ if ( !$CODE_type ) {
+ $line_of_tokens->{_code_type} = 'BL';
}
- $self->store_token($rtoken_vars);
- next;
- } ## end if ( $type eq 'q' )
+ }
+ else {
- # delete repeated commas if requested
- elsif ( $type eq ',' ) {
- if ( $last_nonblank_code_type eq ','
- && $rOpts->{'delete-repeated-commas'} )
+ #---------------------------------------------------
+ # save indexes of all lines with a 'q' at either end
+ # for later use by sub find_multiline_qw
+ #---------------------------------------------------
+ if ( $rLL->[$Kfirst]->[_TYPE_] eq 'q'
+ || $rLL->[$Klast]->[_TYPE_] eq 'q' )
{
- # Could note this deletion as a possible future update:
- ## $self->note_deleted_comma($input_line_number);
- next;
- }
-
- # remember input line index of first comma if -wtc is used
- if (%trailing_comma_rules) {
- my $seqno = $seqno_stack{ $depth_next - 1 };
- if ( defined($seqno)
- && !defined( $self->[_rfirst_comma_line_index_]->{$seqno} )
- )
- {
- $self->[_rfirst_comma_line_index_]->{$seqno} =
- $rtoken_vars->[_LINE_INDEX_];
- }
+ push @{$rqw_lines}, $iline;
}
}
+ }
- # change 'LABEL :' to 'LABEL:'
- elsif ( $type eq 'J' ) {
- $token =~ s/\s+//g;
- $rtoken_vars->[_TOKEN_] = $token;
- }
-
- # check a quote for problems
- elsif ( $type eq 'Q' ) {
- $self->check_Q( $KK, $Kfirst, $input_line_number )
- if ( $self->[_save_logfile_] );
- }
+ # There shouldn't be any nodes beyond the last one. This routine is
+ # relinking lines and tokens after the tokens have been respaced. A fault
+ # here indicates some kind of bug has been introduced into the above loops.
+ # There is not good way to keep going; we better stop here.
+ if ( $Knext <= $Kmax ) {
+ Fault_Warn(
+ "unexpected tokens at end of file when reconstructing lines");
+ $severe_error = 1;
+ return ( $severe_error, $rqw_lines );
+ }
+ $self->[_rKrange_code_without_comments_] = \@Krange_code_without_comments;
- # Store this token with possible previous blank
- if ( $rwhitespace_flags->[$KK] == WS_YES
- && @{$rLL_new}
- && $rLL_new->[-1]->[_TYPE_] ne 'b'
- && $rOpts_add_whitespace )
- {
- $self->store_space();
- }
- $self->store_token($rtoken_vars);
+ # Setup the convergence test in the FileWriter based on line-ending indexes
+ my $file_writer_object = $self->[_file_writer_object_];
+ $file_writer_object->setup_convergence_test( \@Klast_valign_code );
- } # End token loop
+ return ( $severe_error, $rqw_lines );
- return;
-} ## end sub respace_tokens_inner_loop
+} ## end sub resync_lines_and_tokens
-sub respace_post_loop_ops {
+sub package_info_maker {
- my ($self) = @_;
+ my ( $self, $rK_package_list ) = @_;
- # Walk backwards through the tokens, making forward links to sequence items.
- if ( @{$rLL_new} ) {
- my $KNEXT;
- foreach my $KK ( reverse( 0 .. @{$rLL_new} - 1 ) ) {
- $rLL_new->[$KK]->[_KNEXT_SEQ_ITEM_] = $KNEXT;
- if ( $rLL_new->[$KK]->[_TYPE_SEQUENCE_] ) { $KNEXT = $KK }
- }
- $self->[_K_first_seq_item_] = $KNEXT;
- }
+ # Create a hash of values which can be used to find the package of any
+ # token. This sub must be called after rLL has been updated because it
+ # calls parent_seqno_by_K.
- # Find and remember lists by sequence number
- my %is_C_style_for;
- foreach my $seqno ( keys %{$K_opening_container} ) {
- my $K_opening = $K_opening_container->{$seqno};
- next unless defined($K_opening);
+ # Given:
+ # @{$rK_package_list} = a simple list of token index K of each 'package'
+ # statement in the file.
+ # Returns:
+ # {
+ # 'rpackage_info_list' => \@package_info_list,
+ # 'rpackage_lookup_list' => \@package_lookup_list,
+ # }
+ # which are two lists with useful information on all packages
- # code errors may leave undefined closing tokens
- my $K_closing = $K_closing_container->{$seqno};
- next unless defined($K_closing);
+ my $rLL = $self->[_rLL_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $Klimit = @{$rLL} - 1;
+
+ # RETURN LIST #1: package_info_list:
+ # The package of a token at an arbitrary index K is the last entry
+ # in the list for which K_opening < K < K_closing.
+ # If no package is found, then the package is 'main'.
+ # This list is in order of the index K of the package statements.
+ # so the search can stop if we find K_opening > K.
+ my @package_info_list;
+
+ # Start with an entry for 'main'
+ push @package_info_list,
+ {
+ type => 'package',
+ name => 'main',
+ level => 0,
+ line_start => 0,
+ K_opening => 0,
+ K_closing => $Klimit,
+ is_block => 0,
+ max_change => 0,
+ block_count => 0,
+ };
- my $lx_open = $rLL_new->[$K_opening]->[_LINE_INDEX_];
- my $lx_close = $rLL_new->[$K_closing]->[_LINE_INDEX_];
- my $line_diff = $lx_close - $lx_open;
- $ris_broken_container->{$seqno} = $line_diff;
+ my @package_stack;
+ push @package_stack, 0;
- # See if this is a list
- my $is_list;
- my $rtype_count = $rtype_count_by_seqno->{$seqno};
- if ($rtype_count) {
- my $comma_count = $rtype_count->{','};
- my $fat_comma_count = $rtype_count->{'=>'};
- my $semicolon_count = $rtype_count->{';'};
- if ( $rtype_count->{'f'} ) {
- $semicolon_count += $rtype_count->{'f'};
- $is_C_style_for{$seqno} = 1;
- }
+ # RETURN LIST #2: package_lookup_list:
+ # A flat list of [$name, $Kbegin, $Kend], where package is name '$name'
+ # from token index $Kbegin to the index $Kend. This is easier to use than
+ # LIST #1 since it eliminates the need for a stack.
+ my @package_lookup_list;
+ push @package_lookup_list, [ 'main', 0, 0 ];
- # We will define a list to be a container with one or more commas
- # and no semicolons. Note that we have included the semicolons
- # in a 'for' container in the semicolon count to keep c-style for
- # statements from being formatted as lists.
- if ( ( $comma_count || $fat_comma_count ) && !$semicolon_count ) {
- $is_list = 1;
+ foreach my $KK ( @{$rK_package_list} ) {
+ my $item = $rLL->[$KK];
+ my $type = $item->[_TYPE_];
- # We need to do one more check for a parenthesized list:
- # At an opening paren following certain tokens, such as 'if',
- # we do not want to format the contents as a list.
- if ( $rLL_new->[$K_opening]->[_TOKEN_] eq '(' ) {
- my $Kp = $self->K_previous_code( $K_opening, $rLL_new );
- if ( defined($Kp) ) {
- my $type_p = $rLL_new->[$Kp]->[_TYPE_];
- my $token_p = $rLL_new->[$Kp]->[_TOKEN_];
- $is_list =
- $type_p eq 'k'
- ? !$is_nonlist_keyword{$token_p}
- : !$is_nonlist_type{$type_p};
- }
- }
- }
+ # Stored K values may be off by 1 due to an added blank
+ if ( $type eq 'b' ) {
+ $KK += 1;
+ $item = $rLL->[$KK];
+ $type = $item->[_TYPE_];
}
- # Look for a block brace marked as uncertain. If the tokenizer thinks
- # its guess is uncertain for the type of a brace following an unknown
- # bareword then it adds a trailing space as a signal. We can fix the
- # type here now that we have had a better look at the contents of the
- # container. This fixes case b1085. To find the corresponding code in
- # Tokenizer.pm search for 'b1085' with an editor.
- my $block_type = $rblock_type_of_seqno->{$seqno};
- if ( $block_type && substr( $block_type, -1, 1 ) eq SPACE ) {
-
- # Always remove the trailing space
- $block_type =~ s/\s+$//;
-
- # Try to filter out parenless sub calls
- my $Knn1 = $self->K_next_nonblank( $K_opening, $rLL_new );
- my $Knn2;
- if ( defined($Knn1) ) {
- $Knn2 = $self->K_next_nonblank( $Knn1, $rLL_new );
- }
- my $type_nn1 = defined($Knn1) ? $rLL_new->[$Knn1]->[_TYPE_] : 'b';
- my $type_nn2 = defined($Knn2) ? $rLL_new->[$Knn2]->[_TYPE_] : 'b';
+ # shouldn't happen:
+ if ( $type ne 'P' ) {
+ DEVEL_MODE && Fault("type '$type' expected to be 'P'\n");
+ next;
+ }
- # if ( $type_nn1 =~ /^[wU]$/ && $type_nn2 =~ /^[wiqQGCZ]$/ ) {
- if ( $wU{$type_nn1} && $wiq{$type_nn2} ) {
- $is_list = 0;
- }
+ my $token = $item->[_TOKEN_];
+ my ( $keyword, $name ) = split /\s+/, $token, 2;
- # Convert to a hash brace if it looks like it holds a list
- if ($is_list) {
+ my $K_opening = $KK;
+ my $lx_start = $item->[_LINE_INDEX_];
- $block_type = EMPTY_STRING;
+ # for non-BLOCK form:
+ my $level = $item->[_LEVEL_];
+ my $parent_seqno = $self->parent_seqno_by_K($KK);
+ my $is_block = 0;
- $rLL_new->[$K_opening]->[_CI_LEVEL_] = 1;
- $rLL_new->[$K_closing]->[_CI_LEVEL_] = 1;
- }
+ # Check for BLOCK form:
+ # package NAME VERSION BLOCK
- $rblock_type_of_seqno->{$seqno} = $block_type;
+ # Skip past VERSION
+ my $Kn = $self->K_next_code($KK);
+ if ( $Kn && $rLL->[$Kn]->[_TYPE_] eq 'n' ) {
+ $Kn = $self->K_next_code($Kn);
}
- # Handle a list container
- if ( $is_list && !$block_type ) {
- $ris_list_by_seqno->{$seqno} = $seqno;
- my $seqno_parent = $rparent_of_seqno->{$seqno};
- my $depth = 0;
- while ( defined($seqno_parent) && $seqno_parent ne SEQ_ROOT ) {
- $depth++;
+ # Look for BLOCK
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] eq '{' ) {
+ my $seqno_n = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ $level += 1;
+ $parent_seqno = $seqno_n;
+ $is_block = $seqno_n;
+ }
- # for $rhas_list we need to save the minimum depth
- if ( !$rhas_list->{$seqno_parent}
- || $rhas_list->{$seqno_parent} > $depth )
- {
- $rhas_list->{$seqno_parent} = $depth;
- }
+ my $K_closing = $Klimit;
+ if ( $parent_seqno != SEQ_ROOT ) {
+ my $Kc = $K_closing_container->{$parent_seqno};
+ if ( defined($Kc) ) {
+ $K_closing = $Kc;
+ }
+ }
- if ($line_diff) {
- $rhas_broken_list->{$seqno_parent} = 1;
+ # This is the index of this new package in the package_info_list
+ my $ii_next = @package_info_list;
- # Patch1: We need to mark broken lists with non-terminal
- # line-ending commas for the -bbx=2 parameter. This insures
- # that the list will stay broken. Otherwise the flag
- # -bbx=2 can be unstable. This fixes case b789 and b938.
+ while (@package_stack) {
+ my $ii = $package_stack[-1];
+ my $Kc = $package_info_list[$ii]->{K_closing};
- # Patch2: Updated to also require either one fat comma or
- # one more line-ending comma. Fixes cases b1069 b1070
- # b1072 b1076.
- if (
- $rlec_count_by_seqno->{$seqno}
- && ( $rlec_count_by_seqno->{$seqno} > 1
- || $rtype_count_by_seqno->{$seqno}->{'=>'} )
- )
- {
- $rhas_broken_list_with_lec->{$seqno_parent} = 1;
- }
- }
- $seqno_parent = $rparent_of_seqno->{$seqno_parent};
+ # pop an inactive stack item and keep going
+ if ( $Kc < $K_opening ) {
+ pop @package_stack;
+ my $i_top = $package_stack[-1];
+ my $name_top = $package_info_list[$i_top]->{name};
+ push @package_lookup_list, [ $name_top, $Kc + 1 ];
+ next;
}
- }
- # Handle code blocks ...
- # The -lp option needs to know if a container holds a code block
- elsif ( $block_type && $rOpts_line_up_parentheses ) {
- my $seqno_parent = $rparent_of_seqno->{$seqno};
- while ( defined($seqno_parent) && $seqno_parent ne SEQ_ROOT ) {
- $rhas_code_block->{$seqno_parent} = 1;
- $rhas_broken_code_block->{$seqno_parent} = $line_diff;
- $seqno_parent = $rparent_of_seqno->{$seqno_parent};
+ # end a stack item at this level
+ else {
+ my $level_i = $package_info_list[$ii]->{level};
+ if ( $level_i == $level ) {
+ $package_info_list[$ii]->{K_closing} = $K_opening - 1;
+ pop @package_stack;
+ }
}
- }
+ last;
+ } ## end while (@package_stack)
+
+ push @package_lookup_list, [ $name, $K_opening ];
+ push @package_stack, $ii_next;
+
+ # max_change and block_count are for possible future usage
+ push @package_info_list,
+ {
+ type => $keyword,
+ name => $name,
+ level => $level,
+ line_start => $lx_start + 1,
+ K_opening => $K_opening,
+ K_closing => $K_closing,
+ is_block => $is_block,
+ max_change => 0,
+ block_count => 0,
+ };
}
- # Find containers with ternaries, needed for -lp formatting.
- foreach my $seqno ( keys %{$K_opening_ternary} ) {
- my $seqno_parent = $rparent_of_seqno->{$seqno};
- while ( defined($seqno_parent) && $seqno_parent ne SEQ_ROOT ) {
- $rhas_ternary->{$seqno_parent} = 1;
- $seqno_parent = $rparent_of_seqno->{$seqno_parent};
- }
+ my $imax = @package_lookup_list - 1;
+ my $Kend = $Klimit;
+ foreach my $i ( reverse( 0 .. $imax ) ) {
+ $package_lookup_list[$i]->[2] = $Kend;
+ $Kend = $package_lookup_list[$i]->[1] - 1;
}
- # Turn off -lp for containers with here-docs with text within a container,
- # since they have their own fixed indentation. Fixes case b1081.
- if ($rOpts_line_up_parentheses) {
- foreach my $seqno ( keys %K_first_here_doc_by_seqno ) {
- my $Kh = $K_first_here_doc_by_seqno{$seqno};
- my $Kc = $K_closing_container->{$seqno};
- my $line_Kh = $rLL_new->[$Kh]->[_LINE_INDEX_];
- my $line_Kc = $rLL_new->[$Kc]->[_LINE_INDEX_];
- next if ( $line_Kh == $line_Kc );
- $ris_excluded_lp_container->{$seqno} = 1;
- }
+ # Eliminate any needless starting package 'main'
+ if ( @package_lookup_list > 1 && $package_lookup_list[0]->[2] < 0 ) {
+ shift @package_lookup_list;
}
- # Set a flag to turn off -cab=3 in complex structures. Otherwise,
- # instability can occur. When it is overridden the behavior of the closest
- # match, -cab=2, will be used instead. This fixes cases b1096 b1113.
- if ( $rOpts_comma_arrow_breakpoints == 3 ) {
- foreach my $seqno ( keys %{$K_opening_container} ) {
+ return {
+ 'rpackage_info_list' => \@package_info_list,
+ 'rpackage_lookup_list' => \@package_lookup_list,
+ };
+} ## end sub package_info_maker
- my $rtype_count = $rtype_count_by_seqno->{$seqno};
- next unless ( $rtype_count && $rtype_count->{'=>'} );
+use constant DEBUG_COUNT => 0;
- # override -cab=3 if this contains a sub-list
- if ( !defined( $roverride_cab3->{$seqno} ) ) {
- if ( $rhas_list->{$seqno} ) {
- $roverride_cab3->{$seqno} = 2;
- }
+my %is_non_interfering_keyword;
+my %is_keyword_returning_scalar;
- # or if this is a sub-list of its parent container
- else {
- my $seqno_parent = $rparent_of_seqno->{$seqno};
- if ( defined($seqno_parent)
- && $ris_list_by_seqno->{$seqno_parent} )
- {
- $roverride_cab3->{$seqno} = 2;
- }
- }
- }
+BEGIN {
+
+ # Builtin keywords which do not interfere with counting args.
+ # They do not produce arrays and do not consume more than one arg, so
+ # following parens are not required.
+ my @q = qw(
+ abs and chr cmp continue cos
+ defined delete do else elsif eq
+ exp fc ge gt hex int
+ lc lcfirst le length local log
+ lt my ne not oct or
+ ord ord our pop pos rand
+ ref scalar shift sin sqrt srand
+ state uc ucfirst undef xor
+ );
+ @is_non_interfering_keyword{@q} = (1) x scalar(@q);
+
+ # Builtin keywords possibly taking multiple parameters but returning a
+ # scalar value. These can be handled if the args are in parens.
+ @q = qw( substr join atan2 );
+ @is_keyword_returning_scalar{@q} = (1) x scalar(@q);
+}
+
+sub count_list_elements {
+ my ( $self, $rarg_list ) = @_;
+
+ # Given call arg hash containing:
+ # $seqno_list = sequence number of a paren of list to be counted, or
+ # $K_list_start = starting index of list (for 'return' lists)
+ # $shift_count_min = starting min arg count items to include
+ # $shift_count_max = starting max arg count items to include
+ # $is_signature = true if this is a sub signature list
+ # $self_name = name of first arg found
+
+ # Return:
+ # -shift_count_min => starting min arg count items to include, or
+ # undef if a specific number was not determined
+ # -shift_count_max => starting max arg count items to include
+ # undef if a specific number was not determined
+ # -self_name => possibly updated name of first arg
+ # -initialized => a hash entry maintained by this routine
+ # for keeping track of repeated calls for 'return' lists
+
+ # Method:
+ # - The basic method is to count commas, but
+ # - if we encounter sigils @ or % or other problems which prevent a
+ # count, then we do a simple return; the count will then be indefinite.
+
+ # Set the counts to undef in case we have to do a simple return upon
+ # encountering an indeterminate list count
+ my $shift_count_min_input = $rarg_list->{shift_count_min};
+## my $shift_count_max_input = $rarg_list->{shift_count_max};
+ $rarg_list->{shift_count_min} = undef;
+ $rarg_list->{shift_count_max} = undef;
+
+ my $seqno_list = $rarg_list->{seqno_list};
+ my $K_list_start = $rarg_list->{K_list_start};
+ my $is_signature = $rarg_list->{is_signature};
+ my $self_name = $is_signature ? EMPTY_STRING : $rarg_list->{self_name};
+
+ my $rLL = $self->[_rLL_];
+ my $K_list_end;
+
+ # Input option 1: $seqno_list is a container
+ my $is_return_list;
+ if ( defined($seqno_list) ) {
+ $K_list_start = $self->[_K_opening_container_]->{$seqno_list};
+ $K_list_end = $self->[_K_closing_container_]->{$seqno_list};
+ return unless ( defined($K_list_end) );
+ }
+
+ # Input option 2: $K_list_start is the index of a token,
+ # such as 'return', which has trailing args to count.
+ elsif ( defined($K_list_start) ) {
+
+ # Skip past a leading blank if necessary
+ if ( $rLL->[$K_list_start]->[_TYPE_] eq 'b' ) { $K_list_start++ }
+
+ $is_return_list = $rLL->[$K_list_start]->[_TYPE_] eq 'k'
+ && $rLL->[$K_list_start]->[_TOKEN_] eq 'return';
+ $K_list_end = @{$rLL} - 1;
+
+ # Optimization for common case of simple return
+ my $Kn = $self->K_next_code($K_list_start);
+ return unless ($Kn);
+ my $type_n = $rLL->[$Kn]->[_TYPE_];
+ if ( $type_n eq ';'
+ || $is_closing_type{$type_n}
+ || ( $type_n eq 'k' && $is_if_unless{ $rLL->[$Kn]->[_TOKEN_] } ) )
+ {
+ $rarg_list->{shift_count_max} = 0;
+ return;
}
- }
- # Add -ci to C-style for loops (issue c154)
- # This is much easier to do here than in the tokenizer.
- foreach my $seqno ( keys %is_C_style_for ) {
- my $K_opening = $K_opening_container->{$seqno};
- my $K_closing = $K_closing_container->{$seqno};
- my $type_last = 'f';
- for my $KK ( $K_opening + 1 .. $K_closing - 1 ) {
- $rLL_new->[$KK]->[_CI_LEVEL_] = $type_last eq 'f' ? 0 : 1;
- my $type = $rLL_new->[$KK]->[_TYPE_];
- if ( $type ne 'b' && $type ne '#' ) { $type_last = $type }
+ # Check for 'return ()'
+ if ( $rLL->[$Kn]->[_TOKEN_] eq '(' ) {
+ my $Knn = $self->K_next_code($Kn);
+ if ( $Knn && $rLL->[$Knn]->[_TOKEN_] eq ')' ) {
+ $rarg_list->{shift_count_max} = 0;
+ return;
+ }
}
}
- return;
-} ## end sub respace_post_loop_ops
+ else {
+ DEVEL_MODE && Fault("Neither seqno_list nor K_list_start defined\n");
+ return;
+ }
-sub set_permanently_broken {
- my ( $self, $seqno ) = @_;
- while ( defined($seqno) ) {
- $ris_permanently_broken->{$seqno} = 1;
- $seqno = $rparent_of_seqno->{$seqno};
+ # Initialize the arg count for this call. We start with any 'shift' counts
+ # previously seen if this is not a signature or 'return' list
+ my $arg_count = 0;
+ if ( $seqno_list && $shift_count_min_input && !$is_signature ) {
+ $arg_count = $shift_count_min_input;
}
- return;
-} ## end sub set_permanently_broken
-sub store_token {
+ # For signature lists we need to remember a minimum
+ my $arg_count_min;
- my ( $self, $item ) = @_;
+ my @seqno_stack;
+ if ($seqno_list) { push @seqno_stack, $seqno_list }
- #------------------------------------------
- # Store one token during respace operations
- #------------------------------------------
+ my $KK = $K_list_start;
+ my $KK_last_last_nb;
+ my $KK_last_nb;
+ my $KK_this_nb = $K_list_start;
- # Input parameter:
- # $item = ref to a token
+ my $backup_on_last = sub {
- # NOTE: this sub is called once per token so coding efficiency is critical.
+ # exclude the latest token upon encountering end of list
+ # to avoid adding 1 extra comma at the end
+ $KK_this_nb = $KK_last_nb;
+ $KK_last_nb = $KK_last_last_nb;
+ $KK_last_last_nb = undef;
+ return;
+ }; ## end $backup_on_last = sub
- # The next multiple assignment statements are significantly faster than
- # doing them one-by-one.
- my (
+ #--------------------------------------------------------
+ # Main loop to scan the container looking for list items.
+ #--------------------------------------------------------
+ while ( ++$KK < $K_list_end ) {
- $type,
- $token,
- $type_sequence,
+ # safety check - shouldn't happen
+ if ( !$KK || $KK <= $KK_this_nb ) {
+ if (DEVEL_MODE) {
+ my $lno = $rLL->[$KK_this_nb]->[_LINE_INDEX_] + 1;
+ Fault("near line $lno: index $KK decreased, was $KK_this_nb\n");
+ }
+ return;
+ }
- ) = @{$item}[
+ my $type = $rLL->[$KK]->[_TYPE_];
+ next if ( $type eq 'b' );
+ next if ( $type eq '#' );
+ last if ( $type eq ';' );
+ return if ( $type eq '..' );
- _TYPE_,
- _TOKEN_,
- _TYPE_SEQUENCE_,
+ # i.e., ($str=~/(\d+)(\w+)/) may be a list of n items
+ return if ( $type eq '=~' );
- ];
+ $KK_last_last_nb = $KK_last_nb;
+ $KK_last_nb = $KK_this_nb;
+ $KK_this_nb = $KK;
+ my $token = $rLL->[$KK]->[_TOKEN_];
- # Set the token length. Later it may be adjusted again if phantom or
- # ignoring side comment lengths.
- my $token_length =
- $is_encoded_data ? $length_function->($token) : length($token);
+ # Handle a sequenced item
+ if ( my $seqno = $rLL->[$KK]->[_TYPE_SEQUENCE_] ) {
- # handle blanks
- if ( $type eq 'b' ) {
+ if ( $is_opening_type{$type} ) {
+ if ( $token eq '(' ) {
- # Do not output consecutive blanks. This situation should have been
- # prevented earlier, but it is worth checking because later routines
- # make this assumption.
- if ( @{$rLL_new} && $rLL_new->[-1]->[_TYPE_] eq 'b' ) {
- return;
- }
- }
+ # Skip past args to args to subs not returning
+ # lists, like 'pop(' 'length('
+ if ($KK_last_nb) {
+ my $token_last = $rLL->[$KK_last_nb]->[_TOKEN_];
+ my $type_last = $rLL->[$KK_last_nb]->[_TYPE_];
+ if ( $type_last eq 'k'
+ && $is_non_interfering_keyword{$token_last} )
+ {
+ $KK = $self->[_K_closing_container_]->{$seqno};
+ next;
+ }
+ }
- # handle comments
- elsif ( $type eq '#' ) {
+ # If not a list..
+ if ( !$self->is_list_by_seqno($seqno) ) {
- # trim comments if necessary
- my $ord = ord( substr( $token, -1, 1 ) );
- if (
- $ord > 0
- && ( $ord < ORD_PRINTABLE_MIN
- || $ord > ORD_PRINTABLE_MAX )
- && $token =~ s/\s+$//
- )
- {
- $token_length = $length_function->($token);
- $item->[_TOKEN_] = $token;
- }
+ # always enter a container following 'return', as in:
+ # return (find_sub($subname) =~ /^(.*):(\d+)-(\d+)$/);
+ if ( $is_return_list && $KK_last_nb == $K_list_start ) {
+ push @seqno_stack, $seqno;
+ next;
+ }
- # Mark length of side comments as just 1 if sc lengths are ignored
- if ( $rOpts_ignore_side_comment_lengths
- && ( !$CODE_type || $CODE_type eq 'HSC' ) )
- {
- $token_length = 1;
- }
- my $seqno = $seqno_stack{ $depth_next - 1 };
- if ( defined($seqno) ) {
- $self->[_rblank_and_comment_count_]->{$seqno} += 1
- if ( $CODE_type eq 'BC' );
- $self->set_permanently_broken($seqno)
- if !$ris_permanently_broken->{$seqno};
- }
- }
+ my $Kc = $self->[_K_closing_container_]->{$seqno};
+ if ( !$Kc ) { $backup_on_last->(); last }
+
+ # Enter nested parens with inner list
+ # ( ( $v1, $v2) )
+ # | | | |
+ # $KK $Kn $Kc_p $Kc
+ if ( $self->[_rhas_list_]->{$seqno} ) {
+ my $Kc_p = $self->K_previous_code($Kc);
+ if ( $Kc_p && $rLL->[$Kc_p]->[_TOKEN_] eq ')' ) {
+ my $seqno_c_p =
+ $rLL->[$Kc_p]->[_TYPE_SEQUENCE_];
+ if ( $seqno_c_p && $seqno_c_p == $seqno + 1 ) {
+ my $Kn = $self->K_next_code($KK);
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] eq '(' )
+ {
+ push @seqno_stack, $seqno;
+ next;
+ }
+ }
+ }
+ }
- # handle non-blanks and non-comments
- else {
+ # enter a list slice, such as '(caller)[1,2]'
+ my $Kn = $self->K_next_code($Kc);
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] eq '[' ) {
+ my $seqno_next = $rLL->[$Kn]->[_TYPE_SEQUENCE_];
+ if ( $seqno_next
+ && $self->is_list_by_seqno($seqno_next) )
+ {
+ $KK = $Kn;
+ push @seqno_stack, $seqno_next;
+ next;
+ }
+ }
- my $block_type;
+ my $KK_n = $self->K_next_code($KK);
+ if ($KK_n) {
+
+ # look for something like return (@list), which
+ # will not be marked as a list due to lack of a
+ # comma
+ my $type_KK_n = $rLL->[$KK_n]->[_TYPE_];
+ my $token_KK_n = $rLL->[$KK_n]->[_TOKEN_];
+ if ( $type_KK_n eq 't' || $type_KK_n eq 'i' ) {
+ my $sigil = substr( $token_KK_n, 0, 1 );
+ if ( $sigil eq '@' || $sigil eq '%' ) { return }
+ }
+ elsif ( $type_KK_n eq 'k' ) {
- # check for a sequenced item (i.e., container or ?/:)
- if ($type_sequence) {
+ # look for something like
+ # return (map { ...
+ if ( !$is_non_interfering_keyword{$token_KK_n} )
+ {
+ return;
+ }
+ }
+ else { }
+ }
+ }
- # This will be the index of this item in the new array
- my $KK_new = @{$rLL_new};
+ # a list..
+ else {
- if ( $is_opening_token{$token} ) {
+ # Descend into a paren list in some special cases:
+ if ($KK_last_nb) {
- $K_opening_container->{$type_sequence} = $KK_new;
- $block_type = $rblock_type_of_seqno->{$type_sequence};
+ my $token_last = $rLL->[$KK_last_nb]->[_TOKEN_];
+ my $type_last = $rLL->[$KK_last_nb]->[_TYPE_];
- # Fix for case b1100: Count a line ending in ', [' as having
- # a line-ending comma. Otherwise, these commas can be hidden
- # with something like --opening-square-bracket-right
- if ( $last_nonblank_code_type eq ','
- && $Ktoken_vars == $Klast_old_code
- && $Ktoken_vars > $Kfirst_old )
- {
- $rlec_count_by_seqno->{$type_sequence}++;
- }
+ # 'return (' or 'my ('
+ my $ok = $type_last eq 'k'
+ && ( $token_last eq 'return'
+ || $token_last eq 'my' );
- if ( $last_nonblank_code_type eq '='
- || $last_nonblank_code_type eq '=>' )
- {
- $ris_assigned_structure->{$type_sequence} =
- $last_nonblank_code_type;
+ # ',('
+ $ok ||= $type_last eq ',';
+
+ # '(('
+ $ok ||= $token_last eq '(';
+
+ # 'wantarray ? ('
+ $ok ||=
+ $KK_last_last_nb
+ && $is_return_list
+ && $rLL->[$KK_last_nb]->[_TYPE_] eq '?'
+ && $rLL->[$KK_last_last_nb]->[_TOKEN_] eq
+ 'wantarray';
+
+ if ($ok) {
+ push @seqno_stack, $seqno;
+ next;
+ }
+ }
+ }
}
- my $seqno_parent = $seqno_stack{ $depth_next - 1 };
- $seqno_parent = SEQ_ROOT unless defined($seqno_parent);
- push @{ $rchildren_of_seqno->{$seqno_parent} }, $type_sequence;
- $rparent_of_seqno->{$type_sequence} = $seqno_parent;
- $seqno_stack{$depth_next} = $type_sequence;
- $K_old_opening_by_seqno{$type_sequence} = $Ktoken_vars;
- $depth_next++;
+ # Otherwise skip past this container
+ $KK = $self->[_K_closing_container_]->{$seqno};
+ next;
+ }
+ elsif ( $is_closing_type{$type} ) {
+ my $seqno_test = pop @seqno_stack;
+ if ( $seqno_test && $seqno_test eq $seqno ) {
- if ( $depth_next > $depth_next_max ) {
- $depth_next_max = $depth_next;
+ # hide all closing tokens to avoid adding an extra
+ # comma at the end at something like '$x,)'
+ $backup_on_last->();
+ next;
+ }
+ $backup_on_last->();
+ last;
+ }
+ elsif ( $type eq '?' ) {
+
+ # continue scanning ternary for 'return wantarray ?'
+ if ( $rLL->[$KK_last_nb]->[_TOKEN_] eq 'wantarray'
+ && $rLL->[$KK_last_nb]->[_TYPE_] eq 'k'
+ && $KK_last_last_nb
+ && $rLL->[$KK_last_last_nb]->[_TOKEN_] eq 'return'
+ && $rLL->[$KK_last_last_nb]->[_TYPE_] eq 'k' )
+ {
+ push @seqno_stack, $seqno;
+ next;
+ }
+
+ # give up in a return list
+ if ($is_return_list) {
+ return;
}
+
+ # otherwise skip past this ternary
+ $KK = $self->[_K_closing_ternary_]->{$seqno};
+ next;
}
- elsif ( $is_closing_token{$token} ) {
+ elsif ( $type eq ':' ) {
+ my $seqno_test = pop @seqno_stack;
+ if ( $seqno_test && $seqno_test eq $seqno ) {
- $K_closing_container->{$type_sequence} = $KK_new;
- $block_type = $rblock_type_of_seqno->{$type_sequence};
+ # for wantarray ternary, assume one item after ':'
+ # TODO: if wantarray was preceded by '!' then we should
+ # swap the two counts here
+ $arg_count_min = 1;
+ $backup_on_last->();
+ last;
+ }
+ $backup_on_last->();
+ last;
+ }
+ else {
+ DEVEL_MODE
+ && Fault("unexpected seqno=$seqno for type='$type'\n");
+ }
+ }
- # Do not include terminal commas in counts
- if ( $last_nonblank_code_type eq ','
- || $last_nonblank_code_type eq '=>' )
- {
- $rtype_count_by_seqno->{$type_sequence}
- ->{$last_nonblank_code_type}--;
+ # handle identifiers
+ elsif ( $type eq 'i' || $type eq 't' ) {
+ my $sigil = substr( $token, 0, 1 );
- if ( $Ktoken_vars == $Kfirst_old
- && $last_nonblank_code_type eq ','
- && $rlec_count_by_seqno->{$type_sequence} )
- {
- $rlec_count_by_seqno->{$type_sequence}--;
- }
+ # give up if we find list sigils not preceded by 'scalar'
+ if ( $sigil eq '%' || $sigil eq '@' ) {
+ my $K_last = $self->K_previous_code($KK);
+ if ( defined($K_last) ) {
+ my $type_last = $rLL->[$K_last]->[_TYPE_];
+ next if ( $type_last eq '+' || $type_last eq 'p' );
+ next if ( $type_last eq q{\\} );
+ next if ( $type_last eq '!' );
+ my $token_last = $rLL->[$K_last]->[_TOKEN_];
+ next if ( $type_last eq 'k' && $token_last eq 'scalar' );
}
+ return;
+ }
- # Update the stack...
- $depth_next--;
+ # remember the name of the first item, maybe something like '$self'
+ elsif ( $sigil eq '$'
+ && !$self_name
+ && !$arg_count )
+ {
+ $self_name = $token;
+ $rarg_list->{self_name} = $self_name;
}
else {
+ # continue search
+ }
+ }
- # For ternary, note parent but do not include as child
- my $seqno_parent = $seqno_stack{ $depth_next - 1 };
- $seqno_parent = SEQ_ROOT unless defined($seqno_parent);
- $rparent_of_seqno->{$type_sequence} = $seqno_parent;
+ # handle commas: count commas separating args in a list
+ elsif ( $type eq ',' ) {
+ $arg_count++;
+ }
- # These are not yet used but could be useful
- if ( $token eq '?' ) {
- $K_opening_ternary->{$type_sequence} = $KK_new;
- }
- elsif ( $token eq ':' ) {
- $K_closing_ternary->{$type_sequence} = $KK_new;
- }
- else {
+ # treat fat commas as commas
+ elsif ( $type eq '=>' ) {
+ $arg_count++;
+ }
- # We really shouldn't arrive here, just being cautious:
- # The only sequenced types output by the tokenizer are the
- # opening & closing containers and the ternary types. Each
- # of those was checked above. So we would only get here
- # if the tokenizer has been changed to mark some other
- # tokens with sequence numbers.
- if (DEVEL_MODE) {
- Fault(
-"Unexpected token type with sequence number: type='$type', seqno='$type_sequence'"
- );
- }
- }
+ # an '=' in a signature indicates an optional arg
+ elsif ( $type eq '=' ) {
+ if ( $is_signature && !defined($arg_count_min) ) {
+ $arg_count_min = $arg_count;
}
}
- # Remember the most recent two non-blank, non-comment tokens.
- # NOTE: the phantom semicolon code may change the output stack
- # without updating these values. Phantom semicolons are considered
- # the same as blanks for now, but future needs might change that.
- # See the related note in sub 'add_phantom_semicolon'.
- $last_last_nonblank_code_type = $last_nonblank_code_type;
- $last_last_nonblank_code_token = $last_nonblank_code_token;
+ # check for a paren-less call
+ elsif ( $is_kwU{$type} ) {
- $last_nonblank_code_type = $type;
- $last_nonblank_code_token = $token;
- $last_nonblank_block_type = $block_type;
+ # Something like 'length $str' is ok
+ if ( $type eq 'k' ) {
- # count selected types
- if ( $is_counted_type{$type} ) {
- my $seqno = $seqno_stack{ $depth_next - 1 };
- if ( defined($seqno) ) {
- $rtype_count_by_seqno->{$seqno}->{$type}++;
+ # Something like 'length $str' is ok
+ next if ( $is_non_interfering_keyword{$token} );
- # Count line-ending commas for -bbx
- if ( $type eq ',' && $Ktoken_vars == $Klast_old_code ) {
- $rlec_count_by_seqno->{$seqno}++;
- }
+ next if ( $token eq 'wantarray' );
- # Remember index of first here doc target
- if ( $type eq 'h' && !$K_first_here_doc_by_seqno{$seqno} ) {
- my $KK_new = @{$rLL_new};
- $K_first_here_doc_by_seqno{$seqno} = $KK_new;
+ # hop over asubs
+ next if ( $token eq 'sub' );
+
+ # something like return 1 if ...
+ if ( $is_if_unless{$token} ) {
+ $backup_on_last->();
+ last;
}
}
+
+ # Certain subsequent tokens prevent problems
+ my $Kn = $self->K_next_code($KK);
+ next unless defined($Kn);
+ my $token_Kn = $rLL->[$Kn]->[_TOKEN_];
+ my $type_Kn = $rLL->[$Kn]->[_TYPE_];
+ next
+ if ( $token_Kn eq ')'
+ || $type_Kn eq '=>'
+ || $type_Kn eq '->'
+ || $type_Kn eq ',' );
+
+ # Certain keywords returning scalars are okay if not made
+ # as paren-less calls
+ next
+ if ( $type eq 'k'
+ && $token_Kn eq '('
+ && $is_keyword_returning_scalar{$token} );
+
+ # Otherwise, the safe thing is to give up because a function call:
+ # -might be paren-less with multiple args, or
+ # -it might return a list (i.e. splice, split, localtime, ...)
+ # which will interfere with counting args
+ if (DEBUG_COUNT) {
+ my $lno = $rLL->[$KK]->[_LINE_INDEX_] + 1;
+ my $input_stream_name = get_input_stream_name();
+ print {*STDERR}
+"DEBUG_COUNT: file $input_stream_name line=$lno type=$type tok=$token token_Kn=$token_Kn\n";
+ }
+ return;
+ }
+
+ else {
+ # continue search
}
+ } ## end while ( ++$KK < $K_list_end)
+
+ # Increase the count by 1 if the list does not have a trailing comma
+ if ( defined($KK_this_nb)
+ && $KK_this_nb > $K_list_start
+ && $rLL->[$KK_this_nb]->[_TYPE_] ne ',' )
+ {
+ $arg_count++;
}
- # cumulative length is the length sum including this token
- $cumulative_length += $token_length;
+ if ( !defined($arg_count_min) ) {
+ $arg_count_min = $arg_count;
+ }
- $item->[_CUMULATIVE_LENGTH_] = $cumulative_length;
- $item->[_TOKEN_LENGTH_] = $token_length;
+ $rarg_list->{shift_count_min} = $arg_count_min;
+ $rarg_list->{shift_count_max} = $arg_count;
+ return;
- # For reference, here is how to get the parent sequence number.
- # This is not used because it is slower than finding it on the fly
- # in sub parent_seqno_by_K:
+} ## end sub count_list_elements
- # my $seqno_parent =
- # $type_sequence && $is_opening_token{$token}
- # ? $seqno_stack{ $depth_next - 2 }
- # : $seqno_stack{ $depth_next - 1 };
- # my $KK = @{$rLL_new};
- # $rseqno_of_parent_by_K->{$KK} = $seqno_parent;
+# A constant to limit backward searches
+use constant MANY_TOKENS => 100;
- # and finally, add this item to the new array
- push @{$rLL_new}, $item;
- return;
-} ## end sub store_token
+my %is_shift_pop;
+my %is_scalar_sigil;
+my %is_array_sigil;
-sub store_space {
- my ($self) = @_;
+BEGIN {
+ my @q = qw( shift pop );
+ @is_shift_pop{@q} = (1) x scalar(@q);
+ @q = qw( $ * & );
+ @is_scalar_sigil{@q} = (1) x scalar(@q);
+ @q = qw( @ % );
+ @is_array_sigil{@q} = (1) x scalar(@q);
+}
- # Store a blank space in the new array
- # - but never start the array with a space
- # - and never store two consecutive spaces
- if ( @{$rLL_new}
- && $rLL_new->[-1]->[_TYPE_] ne 'b' )
- {
- my $ritem = [];
- $ritem->[_TYPE_] = 'b';
- $ritem->[_TOKEN_] = SPACE;
- $ritem->[_TYPE_SEQUENCE_] = EMPTY_STRING;
+sub count_prototype_args {
+ my ($string) = @_;
+
+ # Given
+ # $string = a string with a prototype in parens, such as '($$;$)'
+ # Returns ($count_min, $count_max)
+ # $count_min = min specific number of args expected, or
+ # undef if number of args can vary
+ # $count_max = max specific number of args expected, or
+ # undef if number of args can vary
+ my @chars = split //, $string;
+ my $count_min = 0;
+ my $count_max = 0;
+ my $saw_semicolon;
+ my $bump_count = sub {
+ $count_max++;
+ $count_min++ if ( !$saw_semicolon );
+ return;
+ };
+ my $saw_array = sub {
+ $count_max = undef;
+ $count_min = undef if ( !$saw_semicolon );
+ return;
+ };
+ while (@chars) {
+ my $ch = shift @chars;
+ if ( !defined($ch) ) { $saw_array->(); last }
+ elsif ( $ch eq '(' ) { last if ($count_min) }
+ elsif ( $ch eq ')' ) { last }
+ elsif ( $ch eq ';' && !$saw_semicolon ) { $saw_semicolon = 1 }
+ elsif ( $ch eq '_' && !$saw_semicolon ) {
+ $saw_semicolon = 1;
+ $bump_count->() if ( !$count_min );
+ }
+ elsif ( $is_array_sigil{$ch} ) { $saw_array->(); last }
+ elsif ( $is_scalar_sigil{$ch} ) { $bump_count->(); }
+ elsif ( $ch eq q{\\} ) {
+ $ch = shift @chars;
+ last unless defined($ch);
+ $bump_count->();
+ }
+ else { next }
+ } ## end while (@chars)
+ return ( $count_min, $count_max );
+} ## end sub count_prototype_args
+
+sub find_sub_token {
+
+ my ( $self, $seqno_block ) = @_;
+
+ # Given:
+ # $seqno_block = sequence number of a sub block brace
+ # Return:
+ # $Ksub = index of the actual 'sub' token for the sub
+ # this will include the name of a named sub, and any prototype
+ # undef if cannot find it; this is not a critical sub, so no heroics
+ #
+ # Notation:
+ #
+ # sub find_sub_token {
+ # | |
+ # $Ksub --$K_opening_container for $seqno_block
- $ritem->[_LINE_INDEX_] =
- $rLL_new->[-1]->[_LINE_INDEX_];
+ my $rLL = $self->[_rLL_];
- # The level and ci_level of newly created spaces should be the same
- # as the previous token. Otherwise the coding for the -lp option
- # can create a blinking state in some rare cases (see b1109, b1110).
- $ritem->[_LEVEL_] =
- $rLL_new->[-1]->[_LEVEL_];
- $ritem->[_CI_LEVEL_] =
- $rLL_new->[-1]->[_CI_LEVEL_];
+ # See if sub respace_tokens saved the index of the previous type 'S'
+ # for us. May need to back up 1 token if spaces were deleted.
+ my $K_sub = $self->[_rK_sub_by_seqno_]->{$seqno_block};
+ if ( defined($K_sub) ) {
+ my $type = $rLL->[$K_sub]->[_TYPE_];
+ if ( $type ne 'S' ) {
+ $K_sub -= 1;
+ $type = $rLL->[$K_sub]->[_TYPE_];
+ if ( $type ne 'S' ) {
+ if (DEVEL_MODE) {
+ my $token = $rLL->[$K_sub]->[_TOKEN_];
+ my $lno = $rLL->[$K_sub]->[_LINE_INDEX_] + 1;
+ my $block_type =
+ $self->[_rblock_type_of_seqno_]->{$seqno_block};
+ Fault(<<EOM);
+line $lno: Bad Ksub=$K_sub for block $seqno_block,
+expecting type 'S' and token=$block_type
+found type '$type' and token='$token'
+EOM
+ }
- $self->store_token($ritem);
+ # This shouldn't happen, but try to keep going
+ # with the help of the search loop below.
+ $K_sub = undef;
+ }
+ }
}
- return;
-} ## end sub store_space
+ # Must search for it...
+ # Scan backward from the opening brace to find the keyword 'sub'
+ if ( !defined($K_sub) ) {
-sub add_phantom_semicolon {
+ # We normally only arrive here for anonymous subs. But also
+ # if --indent-only is set because respace_tokens is skipped.
+ my $K_opening_block = $self->[_K_opening_container_]->{$seqno_block};
+ my $Kt_min = $K_opening_block - MANY_TOKENS;
+ if ( $Kt_min < 0 ) { $Kt_min = 0 }
+ foreach my $Kt ( reverse( $Kt_min .. $K_opening_block ) ) {
+ my $token = $rLL->[$Kt]->[_TOKEN_];
+ my $type = $rLL->[$Kt]->[_TYPE_];
+ if ( $type eq 'S' ) {
- my ( $self, $KK ) = @_;
+ # type 'S' could be 'method xxx' or '$fn=sub () {' - see c372
+ $K_sub = $Kt;
+ last;
+ }
+ if ( ( $type eq 'k' || $type eq 'i' )
+ && substr( $token, 0, 3 ) eq 'sub' )
+ {
- # The token at old index $KK is a closing block brace, and not preceded
- # by a semicolon. Before we push it onto the new token list, we may
- # want to add a phantom semicolon which can be activated if the the
- # block is broken on output.
+ # anonymous subs are type 'k'
+ $K_sub = $Kt;
+ last;
+ }
+ }
+ }
+ return $K_sub;
+} ## end sub find_sub_token
- # We are only adding semicolons for certain block types
- my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
- return unless ($type_sequence);
- my $block_type = $rblock_type_of_seqno->{$type_sequence};
- return unless ($block_type);
- return
- unless ( $ok_to_add_semicolon_for_block_type{$block_type}
- || $block_type =~ /^(sub|package)/
- || $block_type =~ /^\w+\:$/ );
+sub count_default_sub_args {
+ my ( $self, $item, $seqno ) = @_;
- # Find the most recent token in the new token list
- my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
- return unless ( defined($Kp) ); # shouldn't happen except for bad input
+ # Given:
+ # $item = hash ref with sub arg info
+ # $seqno => sequence number of a sub block of a paren
+ # containing possible default args
+ # Task:
+ # count default args and update minimum arg count in $item
- my $type_p = $rLL_new->[$Kp]->[_TYPE_];
- my $token_p = $rLL_new->[$Kp]->[_TOKEN_];
- my $type_sequence_p = $rLL_new->[$Kp]->[_TYPE_SEQUENCE_];
+ my $rLL = $self->[_rLL_];
+ return unless ($seqno);
+
+ # The token before the opening must be a ',' or '('
+ my $K_o = $self->[_K_opening_container_]->{$seqno};
+ my $K_test = $self->K_previous_code($K_o);
+ return unless defined($K_test);
+ my $token_test = $rLL->[$K_test]->[_TOKEN_];
+ return if ( $token_test ne ',' && $token_test ne '(' );
+
+ # Check that an opening token has the previous sequence number
+ if ( $token_test eq '(' ) {
+ my $seqno_o = $rLL->[$K_test]->[_TYPE_SEQUENCE_];
+ if ( !$seqno_o || $seqno_o != $seqno - 1 ) {
+
+ # shouldn't happen: may be bad call value since the token
+ # with '$seqno' was just before a closing paren
+ DEVEL_MODE && Fault("seqno_o=$seqno_o != $seqno-1\n");
+ return;
+ }
+ }
- # Do not add a semicolon if...
- return
- if (
+ my $rtype_count = $self->[_rtype_count_by_seqno_]->{$seqno};
+ my $default_arg_count;
+ if ($rtype_count) {
- # it would follow a comment (and be isolated)
- $type_p eq '#'
+ # One or more commas, like: ( ... $v1, $v2, ($d1, $d2) )=@_
+ # Note that the comma_count does not include any trailing comma
+ # so we always add 1
+ $default_arg_count = $rtype_count->{','} + 1;
+ }
- # it follows a code block ( because they are not always wanted
- # there and may add clutter)
- || $type_sequence_p && $rblock_type_of_seqno->{$type_sequence_p}
+ if ( !defined($default_arg_count) ) {
- # it would follow a label
- || $type_p eq 'J'
+ # Check for empty parens, like: ( ... $v1, $v2, () )=@_
+ my $K_n = $self->K_next_code($K_o);
+ my $K_c = $self->[_K_closing_container_]->{$seqno};
+ return if ( $K_n == $K_c );
- # it would be inside a 'format' statement (and cause syntax error)
- || ( $type_p eq 'k'
- && $token_p =~ /format/ )
+ # No commas but not empty, so 1 arg in parens
+ # Something like: ( ... $v1, $v2, ($d1) )=@_
+ $default_arg_count = 1;
+ }
+ return unless ($default_arg_count);
- );
+ # Update the minimum count to exclude the defaults
+ if ( $item->{shift_count_min} >= $default_arg_count ) {
+ $item->{shift_count_min} -= $default_arg_count;
+ }
+ else {
+ DEVEL_MODE
+ && Fault(
+"default count is $default_arg_count but total is $item->{shift_count_min}"
+ );
+ }
- # Do not add a semicolon if it would impede a weld with an immediately
- # following closing token...like this
- # { ( some code ) }
- # ^--No semicolon can go here
+ return;
+} ## end sub count_default_sub_args
- # look at the previous token... note use of the _NEW rLL array here,
- # but sequence numbers are invariant.
- my $seqno_inner = $rLL_new->[$Kp]->[_TYPE_SEQUENCE_];
+sub count_sub_input_args {
+ my ( $self, $item ) = @_;
- # If it is also a CLOSING token we have to look closer...
- if (
- $seqno_inner
- && $is_closing_token{$token_p}
+ # Given: $item = hash ref with
+ # seqno => $seqno_block = sequence number of a sub block
+ # max_arg_count => optional optimization flag, see note below
- # we only need to look if there is just one inner container..
- && defined( $rchildren_of_seqno->{$type_sequence} )
- && @{ $rchildren_of_seqno->{$type_sequence} } == 1
- )
- {
+ # Updates hash ref $item with values for keys:
+ # shift_count_min => minimum absolute number of input args
+ # shift_count_max => maximum absolute number of input args
+ # self_name => name of first arg (if it can be determined)
+ # is_signature => true if args are in a signature
+ # .. plus several other quantities of interest to the caller
+ # These keys are left undefined if they cannot be determined.
+ # 'shift_count_min' and 'shift_count_max' are the same except for
+ # a signature or prototype.
- # Go back and see if the corresponding two OPENING tokens are also
- # together. Note that we are using the OLD K indexing here:
- my $K_outer_opening = $K_old_opening_by_seqno{$type_sequence};
- if ( defined($K_outer_opening) ) {
- my $K_nxt = $self->K_next_nonblank($K_outer_opening);
- if ( defined($K_nxt) ) {
- my $seqno_nxt = $rLL->[$K_nxt]->[_TYPE_SEQUENCE_];
+ my $seqno_block = $item->{seqno};
+ return unless ($seqno_block);
- # Is the next token after the outer opening the same as
- # our inner closing (i.e. same sequence number)?
- # If so, do not insert a semicolon here.
- return if ( $seqno_nxt && $seqno_nxt == $seqno_inner );
- }
- }
- }
+ # Pull out optional optimization flag. If this is true then there
+ # may be calls to this sub with args, so we should to do a full
+ # search of the entire sub if this would cause a -wma warning.
+ my $max_arg_count = $item->{max_arg_count};
- # We will insert an empty semicolon here as a placeholder. Later, if
- # it becomes the last token on a line, we will bring it to life. The
- # advantage of doing this is that (1) we just have to check line
- # endings, and (2) the phantom semicolon has zero width and therefore
- # won't cause needless breaks of one-line blocks.
- my $Ktop = -1;
- if ( $rLL_new->[$Ktop]->[_TYPE_] eq 'b'
- && $want_left_space{';'} == WS_NO )
- {
+ my $rLL = $self->[_rLL_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $K_opening_block = $self->[_K_opening_container_]->{$seqno_block};
+
+ # Find index '$K' of the last '@_' in this sub, if any
+ # Note on '$K_last_at_underscore': if we exit with only seeing shifts,
+ # but a pre-scan saw @_ somewhere after the last K, then the count
+ # is dubious and we do a simple return
+ my $K_last_at_underscore = 0;
+ my $rKlist = $self->[_rK_AT_underscore_by_sub_seqno_]->{$seqno_block};
+ if ( defined($rKlist) ) {
+ $K_last_at_underscore = $rKlist->[-1];
+ }
+
+ # Note on $_[n]: if there are any shifts of @_ or references to @_, we
+ # cannot use these for a count. Otherwise, we can use the range of n in
+ # $_[n] to get an expected arg count if all indexes n are simple integers.
+ # So for example if we see anything like $_[2+$i] we have to give up.
+ my $seqno_at_index_min;
+ my $at_index_min;
+ my $at_index_max;
+
+ my $dollar_underscore_zero_name = sub {
+
+ # Find the first arg name for a sub which references $_[0] and does
+ # not do shifting. There are two possibilities:
+ # return '$word' in something like '$word = $_[0];'
+ # return nothing otherwise
+ return unless ( $seqno_at_index_min && $at_index_min == 0 );
+ my $Ko = $K_opening_container->{$seqno_at_index_min};
+ my $Kc = $K_closing_container->{$seqno_at_index_min};
+ return unless ( $Ko && $Kc );
+ my $K_semicolon = $self->K_next_code($Kc);
+ return unless ( $K_semicolon && $rLL->[$K_semicolon]->[_TYPE_] eq ';' );
+ my $K_m = $self->K_previous_code($Ko);
+ return unless ( $K_m && $rLL->[$K_m]->[_TOKEN_] eq '$_' );
+ my $K_mm = $self->K_previous_code($K_m);
+ return unless ( $K_mm && $rLL->[$K_mm]->[_TYPE_] eq '=' );
+ my $K_mmm = $self->K_previous_code($K_mm);
+ return unless ( $K_mmm && $rLL->[$K_mmm]->[_TYPE_] eq 'i' );
+ my $name = $rLL->[$K_mmm]->[_TOKEN_];
+ return unless ( $name =~ /^\$\w/ );
+ return $name;
+ }; ## end $dollar_underscore_zero_name = sub
- # convert the blank into a semicolon..
- # be careful: we are working on the new stack top
- # on a token which has been stored.
- my $rcopy = copy_token_as_type( $rLL_new->[$Ktop], 'b', SPACE );
+ my $rseqno_DOLLAR_underscore =
+ $self->[_rDOLLAR_underscore_by_sub_seqno_]->{$seqno_block};
+ if ( !defined($rKlist) && $rseqno_DOLLAR_underscore ) {
+ my $ok;
+ foreach my $seqno_DOLLAR ( @{$rseqno_DOLLAR_underscore} ) {
+ $ok = 0;
+ my $Ko = $K_opening_container->{$seqno_DOLLAR};
+ my $Kn = $self->K_next_code($Ko);
+ last unless ($Kn);
+ last unless ( $rLL->[$Kn]->[_TYPE_] eq 'n' );
+ my $token = ( $rLL->[$Kn]->[_TOKEN_] );
+ last unless ( $token =~ /^\d+$/ );
+ my $Knn = $self->K_next_code($Kn);
+ my $Kc = $K_closing_container->{$seqno_DOLLAR};
+ last unless ( $Knn && $Kc && $Knn == $Kc );
- # Convert the existing blank to:
- # a phantom semicolon for one_line_block option = 0 or 1
- # a real semicolon for one_line_block option = 2
- my $tok = EMPTY_STRING;
- my $len_tok = 0;
- if ( $rOpts_one_line_block_semicolons == 2 ) {
- $tok = ';';
- $len_tok = 1;
+ if ( !defined($at_index_min) || $token < $at_index_min ) {
+ $at_index_min = $token;
+ if ( !defined($seqno_at_index_min) ) {
+ $seqno_at_index_min = $seqno_DOLLAR;
+ }
+ }
+ if ( !defined($at_index_max) || $token > $at_index_max ) {
+ $at_index_max = $token;
+ }
+ $ok = 1;
+ }
+ if ( !$ok ) {
+ $at_index_min = undef;
+ $at_index_max = undef;
}
+ }
+
+ # flag indicating we saw a "pop @_" or just "pop;";
+ my $saw_pop_at_underscore;
- $rLL_new->[$Ktop]->[_TOKEN_] = $tok;
- $rLL_new->[$Ktop]->[_TOKEN_LENGTH_] = $len_tok;
- $rLL_new->[$Ktop]->[_TYPE_] = ';';
+ my $ix_HERE_END = -1;
- $self->[_rtype_count_by_seqno_]->{$type_sequence}->{';'}++;
+ my $K_sub = $self->find_sub_token($seqno_block);
- # NOTE: we are changing the output stack without updating variables
- # $last_nonblank_code_type, etc. Future needs might require that
- # those variables be updated here. For now, it seems ok to skip
- # this.
+ # shouldn't happen:
+ if ( !defined($K_sub) || $K_sub >= $K_opening_block ) {
+ if ( !defined($K_sub) ) { $K_sub = 'undef' }
+ DEVEL_MODE && Fault("Bad K_sub=$K_sub, opening=$K_opening_block\n");
+ return;
+ }
- # Then store a new blank
- $self->store_token($rcopy);
+ #----------------------------------
+ # Check for and process a prototype
+ #----------------------------------
+ my $sub_token = $rLL->[$K_sub]->[_TOKEN_];
+ my $iproto_beg = index( $sub_token, '(' );
+ if ( $iproto_beg > 0 ) {
+ my $iproto_end = index( $sub_token, ')', $iproto_beg );
+ if ( $iproto_end > $iproto_beg ) {
+ my $prototype =
+ substr( $sub_token, $iproto_beg, $iproto_end - $iproto_beg + 1 );
+ my ( $prototype_count_min, $prototype_count_max ) =
+ count_prototype_args($prototype);
+ $item->{prototype} = $prototype;
+ $item->{prototype_count_min} = $prototype_count_min;
+ $item->{prototype_count_max} = $prototype_count_max;
+
+ # Since we don't yet know if we must add 1 for a method call, we
+ # will just continue normally and let the caller figure it out.
+ }
+ }
+
+ #---------------------------------------
+ # Check for and process a signature list
+ #---------------------------------------
+ my $Ksub_p = $self->K_next_code($K_sub);
+ if ( $rLL->[$Ksub_p]->[_TYPE_SEQUENCE_]
+ && $rLL->[$Ksub_p]->[_TOKEN_] eq '(' )
+ {
+ # Switch to searching the signature container. We will get the
+ # count when we arrive at the closing token.
+ my $seqno_list = $rLL->[$Ksub_p]->[_TYPE_SEQUENCE_];
+ $item->{seqno_list} = $seqno_list;
+ $item->{is_signature} = 1;
+ $self->count_list_elements($item);
+
+ # We are finished for a signature list
+ return;
}
- else {
- # Patch for issue c078: keep line indexes in order. If the top
- # token is a space that we are keeping (due to '-wls=';') then
- # we have to check that old line indexes stay in order.
- # In very rare
- # instances in which side comments have been deleted and converted
- # into blanks, we may have filtered down multiple blanks into just
- # one. In that case the top blank may have a higher line number
- # than the previous nonblank token. Although the line indexes of
- # blanks are not really significant, we need to keep them in order
- # in order to pass error checks.
- if ( $rLL_new->[$Ktop]->[_TYPE_] eq 'b' ) {
- my $old_top_ix = $rLL_new->[$Ktop]->[_LINE_INDEX_];
- my $new_top_ix = $rLL_new->[$Kp]->[_LINE_INDEX_];
- if ( $new_top_ix < $old_top_ix ) {
- $rLL_new->[$Ktop]->[_LINE_INDEX_] = $new_top_ix;
- }
- }
+ #-------------------------------------------------------------
+ # Main loop: look for =shift; and =@_; within sub block braces
+ #-------------------------------------------------------------
+ my $seqno = $seqno_block;
+ my $K_opening = $self->[_K_opening_container_]->{$seqno};
+ my $K_closing = $self->[_K_closing_container_]->{$seqno};
+ return unless defined($K_closing);
- my $rcopy = copy_token_as_type( $rLL_new->[$Kp], ';', EMPTY_STRING );
- $self->store_token($rcopy);
- }
- return;
-} ## end sub add_phantom_semicolon
+ my $level_opening = $rLL->[$K_opening]->[_LEVEL_];
-sub add_trailing_comma {
+ # Count number of 'shift;' at the top level
+ my $shift_count = 0;
+ my $self_name = EMPTY_STRING;
+ my $semicolon_count_after_last_shift = 0;
+ my $in_interpolated_quote;
- # Implement the --add-trailing-commas flag to the line end before index $KK:
+ my $KK = $K_opening;
+ my $KK_this_nb = $KK;
+ while ( ++$KK < $K_closing ) {
- my ( $self, $KK, $Kfirst, $trailing_comma_rule ) = @_;
+ # safety check - shouldn't happen
+ if ( !$KK || $KK <= $KK_this_nb ) {
+ if (DEVEL_MODE) {
+ my $lno = $rLL->[$KK_this_nb]->[_LINE_INDEX_] + 1;
+ Fault("near line $lno: index $KK decreased, was $KK_this_nb\n");
+ }
+ return;
+ }
- # Input parameter:
- # $KK = index of closing token in old ($rLL) token list
- # which starts a new line and is not preceded by a comma
- # $Kfirst = index of first token on the current line of input tokens
- # $add_flags = user control flags
+ my $type = $rLL->[$KK]->[_TYPE_];
+ next if ( $type eq 'b' );
+ next if ( $type eq '#' );
+ $KK_this_nb = $KK;
- # For example, we might want to add a comma here:
+ my $token = $rLL->[$KK]->[_TOKEN_];
- # bless {
- # _name => $name,
- # _price => $price,
- # _rebate => $rebate <------ location of possible bare comma
- # }, $pkg;
- # ^-------------------closing token at index $KK on new line
+ # Note that '$_' here is marked as type 'Z': print $_[0];
+ if ( $type eq 'i' || $type eq 'Z' ) {
+
+ # look for '@_'
+ if ( $token eq '@_' ) {
+
+ # Found '@_': the search will end here
+ my $level = $rLL->[$KK]->[_LEVEL_];
+
+ # Give up upon finding @_ at a lower level
+ return unless ( $level == $level_opening + 1 );
+
+ # Look ahead for ';'
+ my $K_p = $self->K_next_code($KK);
+ return unless ($K_p);
+ return unless ( $rLL->[$K_p]->[_TYPE_] eq ';' );
+
+ # Look back for ' = @_'
+ my $K_m = $self->K_previous_code($KK);
+ return unless defined($K_m);
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+ return unless ( $type_m eq '=' );
+
+ # Look back for ' ) = @_'
+ my $K_mm = $self->K_previous_code($K_m);
+ return unless defined($K_mm);
+ my $token_mm = $rLL->[$K_mm]->[_TOKEN_];
+ my $seqno_mm = $rLL->[$K_mm]->[_TYPE_SEQUENCE_];
+
+ # Count args in the list ( ... ) = @_;
+ if ( $seqno_mm && $token_mm eq ')' ) {
+ $item->{seqno_list} = $seqno_mm;
+ $item->{is_signature} = 0;
+ $item->{shift_count_min} = $shift_count;
+ $item->{shift_count_max} = $shift_count;
+ $self->count_list_elements($item);
+
+ # Count default args placed in separate parens, such as:
+ # .. $v1 ,($def1, $def2)) = @_
+ # .. $v1 ,($def1, $def2),) = @_
+
+ # look at the token before the last ')'
+ my $K_mm_p = $self->K_previous_code($K_mm);
+ my $token_mm_p =
+ $K_mm_p ? $rLL->[$K_mm_p]->[_TOKEN_] : SPACE;
+
+ # skip past a trailing comma
+ if ( $token_mm_p eq ',' ) {
+ $K_mm_p = $self->K_previous_code($K_mm_p);
+ $token_mm_p =
+ $K_mm_p ? $rLL->[$K_mm_p]->[_TOKEN_] : SPACE;
+ }
- # Do not add a comma if it would follow a comment
- my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
- return unless ( defined($Kp) );
- my $type_p = $rLL_new->[$Kp]->[_TYPE_];
- return if ( $type_p eq '#' );
+ # if we find a closing paren, count the items and
+ # update shift_count_min
+ if ( $token_mm_p eq ')' ) {
+ my $seqno_mm_p = $rLL->[$K_mm_p]->[_TYPE_SEQUENCE_];
+ $self->count_default_sub_args( $item, $seqno_mm_p );
+ }
- # see if the user wants a trailing comma here
- my $match =
- $self->match_trailing_comma_rule( $KK, $Kfirst, $Kp,
- $trailing_comma_rule, 1 );
+ # NOTE: this could disagree with $_[n] usage; we
+ # ignore this for now.
+ return;
+ }
- # if so, add a comma
- if ($match) {
- my $Knew = $self->store_new_token( ',', ',', $Kp );
- }
+ # Give up if = @_ is not preceded by a simple list
+ return;
+ }
- return;
+ # Give up if we find an indexed ref to $_[..]
+ elsif ( $token eq '$_' ) {
-} ## end sub add_trailing_comma
+ # Found $_: currently the search ends at '$_['
+ my $Kn = $self->K_next_code($KK);
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] eq '[' ) {
-sub delete_trailing_comma {
+ # Give up unless we might be able to define a count
+ # when there are just references to $_[n] values
+ if ( !defined($at_index_max) || $shift_count ) {
+ return;
+ }
+ }
+ }
- my ( $self, $KK, $Kfirst, $trailing_comma_rule ) = @_;
+ # Give up at something like '&func;'
+ elsif ( substr( $token, 0, 1 ) eq '&' ) {
+ my $Kn = $self->K_next_code($KK);
+ if ( $Kn && $rLL->[$Kn]->[_TOKEN_] ne '(' ) {
+ return;
+ }
+ }
- # Apply the --delete-trailing-commas flag to the comma before index $KK
+ else {
+ # continue search
+ }
+ }
- # Input parameter:
- # $KK = index of a closing token in OLD ($rLL) token list
- # which is preceded by a comma on the same line.
- # $Kfirst = index of first token on the current line of input tokens
- # $delete_option = user control flag
+ #------------------------------
+ # look for '=shift;' or '=pop;'
+ #------------------------------
+ elsif ( $type eq 'k' ) {
+ if ( $is_shift_pop{$token} ) {
+
+ # look for 'shift;' and count as 1 arg
+ my $Kp = $self->K_next_code($KK);
+ return unless defined($Kp);
+ my $type_p = $rLL->[$Kp]->[_TYPE_];
+ my $token_p = $rLL->[$Kp]->[_TOKEN_];
+
+ # look for any of these with shift or pop:
+ # shift;
+ # shift @_;
+ # shift();
+ # shift(@_);
+
+ # remove any opening paren
+ my $in_parens;
+ if ( $token_p eq '(' ) {
+ $in_parens = 1;
+ $Kp = $self->K_next_code($Kp);
+ return unless defined($Kp);
+ $type_p = $rLL->[$Kp]->[_TYPE_];
+ $token_p = $rLL->[$Kp]->[_TOKEN_];
+ }
+
+ # look for '@_'
+ if ( $type_p eq 'i' || $type_p eq 't' ) {
+
+ # keep going if not @_
+ next if ( $token_p ne '@_' );
+
+ $Kp = $self->K_next_code($Kp);
+ return unless defined($Kp);
+ $type_p = $rLL->[$Kp]->[_TYPE_];
+ $token_p = $rLL->[$Kp]->[_TOKEN_];
+ }
+
+ # remove any closing paren
+ if ( $in_parens && $token_p eq ')' ) {
+ $Kp = $self->K_next_code($Kp);
+ return unless defined($Kp);
+ $type_p = $rLL->[$Kp]->[_TYPE_];
+ $token_p = $rLL->[$Kp]->[_TOKEN_];
+ }
+
+ # Just give up if this shift is not followed by a semicolon or
+ # closing brace or arrow. This is the safe thing to do to avoid
+ # false errors. There are too many ways for problems to arise.
+ # Especially if the next token is one of '||' '//' 'or'.
+ return
+ if ( $type_p ne ';' && $type_p ne '->' && $Kp ne $K_closing );
+ my $level = $rLL->[$KK]->[_LEVEL_];
+
+ # Give up on lower level shifts
+ return unless ( $level == $level_opening + 1 );
+
+ # If we get to the end without finding '(..) = @_;' then
+ # we will consider the count unreliable if we saw a 'pop'
+ # or if a previous block contained other statements.
+ $saw_pop_at_underscore ||= $token eq 'pop';
+
+ $shift_count++;
+ $semicolon_count_after_last_shift = 0;
+
+ # Save self name:
+ # '$self = shift'
+ # | | |
+ # $K_mm $K_m $KK
+ if ( $shift_count == 1 && !$self_name ) {
+ my $K_m = $self->K_previous_code($KK);
+ return unless ( defined($K_m) );
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+
+ # For something like: sub get_thing {shift->{thing}}
+ # use $_[0] as the name
+ if ( $type_p eq '->' ) {
+ if ( $type_m eq '{' || $type_m eq ';' ) {
+ $self_name = '$_[0]';
+ $item->{self_name} = $self_name;
+ }
+ }
+ else {
+ if ( $type_m eq '=' ) {
+
+ my $K_mm = $self->K_previous_code($K_m);
+ return unless defined($K_mm);
+
+ my $type_mm = $rLL->[$K_mm]->[_TYPE_];
+ my $token_mm = $rLL->[$K_mm]->[_TOKEN_];
+ my $seqno_mm = $rLL->[$K_mm]->[_TYPE_SEQUENCE_];
+
+ # check for $self in parens, like ($self)=shift
+ if ( $seqno_mm && $token_mm eq ')' ) {
+ my $Ko = $K_opening_container->{$seqno_mm};
+ $K_mm = $self->K_next_code($Ko);
+ if ($K_mm) {
+ $type_mm = $rLL->[$K_mm]->[_TYPE_];
+ $token_mm = $rLL->[$K_mm]->[_TOKEN_];
+ }
+ }
- # Returns true if the comma was deleted
+ if ( $type_mm eq 'i' ) {
+ $self_name = $token_mm;
- # For example, we might want to delete this comma:
- # my @asset = ("FASMX", "FASGX", "FASIX",);
- # | |^--------token at index $KK
- # | ^------comma of interest
- # ^-------------token at $Kfirst
+ # we store self_name immediately because it will
+ # be needed even if we cannot get an arg count
+ $item->{self_name} = $self_name;
+ }
+ }
+ }
+ }
- # Verify that the previous token is a comma. Note that we are working in
- # the new token list $rLL_new.
- my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
- return unless ( defined($Kp) );
- if ( $rLL_new->[$Kp]->[_TYPE_] ne ',' ) {
+ # Skip past any parens and @_; let the semicolon be seen next
+ if ( $KK < $Kp - 1 ) { $KK = $Kp - 1 }
- # there must be a '#' between the ',' and closing token; give up.
- return;
- }
+ }
+ elsif ( $token eq 'bless' ) {
- # Do not delete commas when formatting under stress to avoid instability.
- # This fixes b1389, b1390, b1391, b1392. The $high_stress_level has
- # been found to work well for trailing commas.
- if ( $rLL_new->[$Kp]->[_LEVEL_] >= $high_stress_level ) {
- return;
- }
+ # Could look for something like the following:
+ # my $self = bless {}, $class;
+ # my $self = bless {}, shift;
- # See if the user wants this trailing comma
- my $match =
- $self->match_trailing_comma_rule( $KK, $Kfirst, $Kp,
- $trailing_comma_rule, 0 );
+ }
+ elsif ( $is_if_unless{$token} ) {
- # Patch: the --noadd-whitespace flag can cause instability in complex
- # structures. In this case do not delete the comma. Fixes b1409.
- if ( !$match && !$rOpts_add_whitespace ) {
- my $Kn = $self->K_next_nonblank($KK);
- if ( defined($Kn) ) {
- my $type_n = $rLL->[$Kn]->[_TYPE_];
- if ( $type_n ne ';' && $type_n ne '#' ) { return }
- }
- }
+ #-------------------------------
+ # RETURN: Optional early return.
+ #-------------------------------
+ # Give up and exit at 'if' or 'unless' if we have seen a few
+ # semicolons following the last 'shift'. The number '2' here
+ # has been found to work well.
+ if ( $semicolon_count_after_last_shift > 2 ) {
+ if ( !defined($max_arg_count)
+ || $max_arg_count <= $shift_count )
+ {
- # If no match, delete it
- if ( !$match ) {
+ if ( !$saw_pop_at_underscore
+ && $KK >= $K_last_at_underscore )
+ {
+ $item->{shift_count_min} = $shift_count;
+ $item->{shift_count_max} = $shift_count;
+ }
+ return;
+ }
+ }
+ }
+ else {
+ }
+ }
- return $self->unstore_last_nonblank_token(',');
- }
- return;
+ # Check for a container boundary
+ elsif ( $rLL->[$KK]->[_TYPE_SEQUENCE_] ) {
+ if ( $is_opening_type{$type} ) {
-} ## end sub delete_trailing_comma
+ my $seqno_test = $rLL->[$KK]->[_TYPE_SEQUENCE_];
-sub delete_weld_interfering_comma {
+ #---------------------------------------------
+ # Skip past a sub declearation within this sub
+ #---------------------------------------------
+ if ( $self->[_ris_sub_block_]->{$seqno_test}
+ || $self->[_ris_asub_block_]->{$seqno_test} )
+ {
+ my $Kc = $self->[_K_closing_container_]->{$seqno_test};
+ return if ( !$Kc );
+ return if ( $Kc <= $KK );
+ $KK = $Kc;
+ }
+ }
+ }
+ elsif ( $type eq ';' ) {
+ $semicolon_count_after_last_shift++;
+ }
- my ( $self, $KK ) = @_;
+ # scan a quote for @_ and $_[
+ elsif ( $type eq 'Q' ) {
- # Apply the flag '--delete-weld-interfering-commas' to the comma
- # before index $KK
+ my $K_last_code = $self->K_previous_code($KK);
+ next unless defined($K_last_code);
+ my $K_last_type = $rLL->[$K_last_code]->[_TYPE_];
+ if ( $K_last_type eq 'Q' ) {
- # Input parameter:
- # $KK = index of a closing token in OLD ($rLL) token list
- # which is preceded by a comma on the same line.
+ # starting in quote : use old interpolation value
+ }
+ elsif ( $is_re_match_op{$K_last_type} ) {
+ $in_interpolated_quote = 1;
+ }
- # Returns true if the comma was deleted
+ # is not interpolated for leading operators: qw q tr y '
+ elsif ( $token =~ /^(qw | q[^qrx] | tr | [y\'] )/x ) {
+ $in_interpolated_quote = 0;
+ }
- # For example, we might want to delete this comma:
+ # is interpolated for everything else
+ else {
+ $in_interpolated_quote = 1;
+ }
- # my $tmpl = { foo => {no_override => 1, default => 42}, };
- # || ^------$KK
- # |^---$Kp
- # $Kpp---^
- #
- # Note that:
- # index $KK is in the old $rLL array, but
- # indexes $Kp and $Kpp are in the new $rLL_new array.
+ # look for '@_' and '$_[' in an interpolated quote
+ next unless ($in_interpolated_quote);
+ my $pos;
+ $pos = index( $token, '@_' );
+ return
+ if ( $pos == 0
+ || $pos > 0 && substr( $token, $pos - 1, 1 ) ne BACKSLASH );
- my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
- return unless ($type_sequence);
+ $pos = index( $token, '$_[' );
+ return
+ if ( $pos == 0
+ || $pos > 0 && substr( $token, $pos - 1, 1 ) ne BACKSLASH );
+ }
- # Find the previous token and verify that it is a comma.
- my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
- return unless ( defined($Kp) );
- if ( $rLL_new->[$Kp]->[_TYPE_] ne ',' ) {
+ # scan here text for @_ and $_[
+ elsif ( $type eq 'h' ) {
+ next if ( !is_interpolated_here_doc($token) );
+ my $ix_line = $rLL->[$KK]->[_LINE_INDEX_];
+ my $ix_HERE = max( $ix_HERE_END, $ix_line );
+ ( $ix_HERE_END, my $here_text ) = $self->get_here_text($ix_HERE);
- # it is not a comma, so give up ( it is probably a '#' )
- return;
- }
+ if ($here_text) {
+ my $pos;
+ $pos = index( $here_text, '@_' );
+ return
+ if (
+ $pos == 0
+ || ( $pos > 0
+ && substr( $here_text, $pos - 1, 1 ) ne BACKSLASH )
+ );
- # This must be the only comma in this list
- my $rtype_count = $self->[_rtype_count_by_seqno_]->{$type_sequence};
- return
- unless ( defined($rtype_count)
- && $rtype_count->{','}
- && $rtype_count->{','} == 1 );
+ $pos = index( $here_text, '$_[' );
+ return
+ if (
+ $pos == 0
+ || ( $pos > 0
+ && substr( $here_text, $pos - 1, 1 ) ne BACKSLASH )
+ );
+ }
+ }
+ else {
+ # continue search
+ }
+ } ## end while ( ++$KK < $K_closing)
- # Back up to the previous closing token
- my $Kpp = $self->K_previous_nonblank( $Kp, $rLL_new );
- return unless ( defined($Kpp) );
- my $seqno_pp = $rLL_new->[$Kpp]->[_TYPE_SEQUENCE_];
- my $type_pp = $rLL_new->[$Kpp]->[_TYPE_];
+ #--------------------------------
+ # the whole file has been scanned
+ #--------------------------------
- # The containers must be nesting (i.e., sequence numbers must differ by 1 )
- if ( $seqno_pp && $is_closing_type{$type_pp} ) {
- if ( $seqno_pp == $type_sequence + 1 ) {
+ # if no shifts @_ and no references to @_, look for $[n]
+ if ( defined($at_index_max) && !$shift_count ) {
+ $shift_count = $at_index_max + 1;
- # remove the ',' from the top of the new token list
- return $self->unstore_last_nonblank_token(',');
+ # Create a self name like '$_[0]' if we can't find user-defined name.
+ # Then any sub calls with '$_[0]->' will be recognized as self
+ # calls by sub cross_check_sub_calls.
+ if ( !$self_name && $at_index_min == 0 ) {
+ $self_name = $dollar_underscore_zero_name->();
+ $self_name = '$_[0]' unless ($self_name);
+ $item->{self_name} = $self_name;
}
}
+
+ if ( !$saw_pop_at_underscore ) {
+ $item->{shift_count_min} = $shift_count;
+ $item->{shift_count_max} = $shift_count;
+ }
return;
-} ## end sub delete_weld_interfering_comma
+} ## end sub count_sub_input_args
-sub unstore_last_nonblank_token {
+use constant DEBUG_RETURN_COUNT => 0;
- my ( $self, $type ) = @_;
+sub count_sub_return_args {
+ my ( $self, $item ) = @_;
- # remove the most recent nonblank token from the new token list
- # Input parameter:
- # $type = type to be removed (for safety check)
+ # Given: $item = hash ref with
+ # seqno => sequence number of a sub block
+ # Set values for these keys in '$item':
+ # return_count_min => minimum number of output args
+ # = undef if indeterminate, such as @list
+ # K_return_count_min => K value of the min
+ # return_count_max => maximum number of output args
+ # = undef if indeterminate, such as @list
+ # K_return_count_max => K value of the max
+ my $seqno_sub = $item->{seqno};
+ return unless ($seqno_sub);
+
+ my $rKlist = $self->[_rK_return_by_sub_seqno_]->{$seqno_sub};
+ return if ( !defined($rKlist) );
+
+ # loop over all return statements in this sub
+ my $rLL = $self->[_rLL_];
+ my $rhash = {};
+ my $rK_return_count_hash = {};
+
+ # retain old vars during transition phase
+ my $return_count_min;
+ my $return_count_max;
+
+ foreach ( @{$rKlist} ) {
+ my $K_return = $rLL->[$_]->[_TYPE_] eq 'b' ? $_ + 1 : $_;
+## my $type = $rLL->[$K_return]->[_TYPE_];
+ my $token = $rLL->[$K_return]->[_TOKEN_];
+ if ( $token ne 'return' ) {
+ DEVEL_MODE && Fault("expecting 'return' but got $token\n");
+ last;
+ }
+ $rhash->{K_list_start} = $K_return;
+ $self->count_list_elements($rhash);
+ my $count = $rhash->{shift_count_max};
+ if ( !defined($count) ) {
+ $item->{return_count_indefinite} = $K_return;
+ $item->{return_count_max} = undef;
+ last;
+ }
- # Returns true if success
- # false if error
+ # new count?
+ if ( !$rK_return_count_hash->{$count} ) {
+ $rK_return_count_hash->{$count} = $K_return;
+ }
- # This was written and is used for removing commas, but might
- # be useful for other tokens. If it is ever used for other tokens
- # then the issue of what to do about the other variables, such
- # as token counts and the '$last...' vars needs to be considered.
+ # retain old vars during transition phase
+ # Note: using <= to match old results but could use <
+ if ( !defined($return_count_min) || $count <= $return_count_min ) {
+ $return_count_min = $count;
+ $item->{return_count_min} = $count;
+ $item->{K_return_count_min} = $K_return;
+ }
- # Safety check, shouldn't happen
- if ( @{$rLL_new} < 3 ) {
- DEVEL_MODE && Fault("not enough tokens on stack to remove '$type'\n");
- return;
+ # Note: using >= to match old results but could use >
+ if ( !defined($return_count_max) || $count >= $return_count_max ) {
+ $return_count_max = $count;
+ $item->{return_count_max} = $count;
+ $item->{K_return_count_max} = $K_return;
+ }
+ }
+
+ $item->{rK_return_count_hash} = $rK_return_count_hash;
+
+ if ( DEBUG_RETURN_COUNT > 1 ) {
+ my $min = $item->{return_count_min};
+ my $max = $item->{return_count_max};
+ $min = '*' unless defined($min);
+ $max = '*' unless defined($max);
+ print "DEBUG_RETURN: returning min=$min max=$max\n";
}
+ return;
+} ## end sub count_sub_return_args
- my ( $rcomma, $rblank );
+sub count_return_values_wanted {
+ my ( $self, $item ) = @_;
- # case 1: pop comma from top of stack
- if ( $rLL_new->[-1]->[_TYPE_] eq $type ) {
- $rcomma = pop @{$rLL_new};
+ # Given: $item = a hash ref with
+ # seqno_list => sequence number the call arg list of a sub call
+ # Set value for this key in '$item':
+ # return_count_wanted => number of return items wanted from the call
+ # = undef if indeterminate, such as @list
+
+ # get the sequence number of the call arg list for this call
+ my $seqno_list = $item->{seqno_list};
+ return unless ($seqno_list);
+
+ # Give up if call is followed by a bound operator, for example
+ # my ( $fh, $tmpfile ) = $self->io()->tempfile( DIR => $dir );
+ # |
+ # ^--$Kc
+ my $rLL = $self->[_rLL_];
+ my $Kc = $self->[_K_closing_container_]->{$seqno_list};
+ my $Kc_n = $self->K_next_code($Kc);
+ if ($Kc_n) {
+ my $type_n = $rLL->[$Kc_n]->[_TYPE_];
+ my $ok = $type_n eq ';' || $is_closing_type{$type_n};
+ if ( !$ok && $type_n eq 'k' ) {
+ my $token_n = $rLL->[$Kc_n]->[_TOKEN_];
+ $ok ||= $is_if_unless{$token_n};
+ $ok ||= $is_and_or{$token_n};
+ }
+ return unless $ok;
+ }
+
+ my $Ko = $self->[_K_opening_container_]->{$seqno_list};
+ my $K_m = $self->K_previous_code($Ko);
+ my $K_mm = $self->K_previous_code($K_m);
+ return unless ( defined($K_mm) );
+ my $type_m = $rLL->[$K_m]->[_TYPE_];
+ my $token_m = $rLL->[$K_m]->[_TOKEN_];
+ my $type_mm = $rLL->[$K_mm]->[_TYPE_];
+
+ # start of backwards search depends on the call type...
+ # note: see var $rsub_call_paren_info_by_seqno in sub respace_tokens
+ my $K_equals;
+
+ # 'function('
+ if ( $type_m eq 'U' || $type_m eq 'w' ) {
+ $K_equals = $K_mm;
+ }
+
+ # '->function('
+ elsif ( $type_m eq 'i' && $type_mm eq '->' ) {
+ my $K_mmm = $self->K_previous_code($K_mm);
+ my $K_mm4 = $self->K_previous_code($K_mmm);
+ return unless defined($K_mm4);
+ my $type_mmm = $rLL->[$K_mmm]->[_TYPE_];
+
+ # something like '$self->function('
+ if ( $type_mmm eq 'i' ) {
+ $K_equals = $K_mm4;
+ }
+
+ # something complex like '$hash_of_objects{my_obj}->function('
+ else {
+
+ # TBD:
+ return;
+ }
}
- # case 2: pop blank and then comma from top of stack
- elsif ($rLL_new->[-1]->[_TYPE_] eq 'b'
- && $rLL_new->[-2]->[_TYPE_] eq $type )
- {
- $rblank = pop @{$rLL_new};
- $rcomma = pop @{$rLL_new};
+ # '&function('
+ elsif ( $type_m eq 'i' && substr( $token_m, 0, 1 ) eq '&' ) {
+ $K_equals = $K_mm;
}
- # case 3: error, shouldn't happen unless bad call
+ # '$function->(' [ TODO: simple anonymous sub call, not used yet ]
+ elsif ( $type_m eq '->' && $type_mm eq 'i' ) {
+ my $K_mmm = $self->K_previous_code($K_mm);
+ $K_equals = $K_mmm;
+ }
+
+ # error
else {
- DEVEL_MODE && Fault("Could not find token type '$type' to remove\n");
+ DEVEL_MODE
+ && Fault(
+"unexpected call with type_m=$type_m token_m=$token_m type_mm=$type_mm\n"
+ );
return;
}
- # A note on updating vars set by sub store_token for this comma: If we
- # reduce the comma count by 1 then we also have to change the variable
- # $last_nonblank_code_type to be $last_last_nonblank_code_type because
- # otherwise sub store_token is going to ALSO reduce the comma count.
- # Alternatively, we can leave the count alone and the
- # $last_nonblank_code_type alone. Then sub store_token will produce
- # the correct result. This is simpler and is done here.
+ # look for '='
+ # Note that this ignores a return via a slice, like
+ # ($v1,$v2) =(f(x))[1,3]
+ # because this is an array return, and we just want explicit lists
+ if ( !$K_equals || $rLL->[$K_equals]->[_TYPE_] ne '=' ) {
+ return;
+ }
+
+ my $K_c = $self->K_previous_code($K_equals);
+ return unless ( defined($K_c) );
+ my $type_c = $rLL->[$K_c]->[_TYPE_];
+ my $token_c = $rLL->[$K_c]->[_TOKEN_];
+ if ( $token_c ne ')' ) {
- # Now add a blank space after the comma if appropriate.
- # Some unusual spacing controls might need another iteration to
- # reach a final state.
- if ( $rLL_new->[-1]->[_TYPE_] ne 'b' ) {
- if ( defined($rblank) ) {
- $rblank->[_CUMULATIVE_LENGTH_] -= 1; # fix for deleted comma
- push @{$rLL_new}, $rblank;
+ # Handle @array = f(x) or $scalar=f(x), and things like
+ # $rhash->{vv} = f();
+ # $hash{vv} = f();
+ # $array[$index] = f();
+ if ( $is_closing_type{$type_c} ) {
+
+ # backup from the closing brace to any identifier
+ # Note: currently only going back one index, a sub could
+ # be written to handle more complex things
+ my $seqno_c = $rLL->[$K_c]->[_TYPE_SEQUENCE_];
+ return if ( !$seqno_c );
+ my $Ko_c = $self->[_K_opening_container_]->{$seqno_c};
+ return unless defined($Ko_c);
+ my $K_c_new = $self->K_previous_code($Ko_c);
+ return unless defined($K_c_new);
+ $type_c = $rLL->[$K_c_new]->[_TYPE_];
+ $token_c = $rLL->[$K_c_new]->[_TOKEN_];
+
+ if ( $type_c eq '->' ) {
+ $K_c_new = $self->K_previous_code($K_c_new);
+ return unless defined($K_c_new);
+ $type_c = $rLL->[$K_c_new]->[_TYPE_];
+ $token_c = $rLL->[$K_c_new]->[_TOKEN_];
+ }
+ }
+
+ if ( $type_c eq 'i' || $type_c eq 't' ) {
+ my $sigil = substr( $token_c, 0, 1 );
+ if ( $sigil eq '$' ) {
+ $item->{return_count_wanted} = 1;
+ $item->{want_scalar} = 1;
+ }
}
+ return;
}
- return 1;
-} ## end sub unstore_last_nonblank_token
-sub match_trailing_comma_rule {
+ # Count elements in (list of values)=f(x)
+ my $seqno_lhs = $rLL->[$K_c]->[_TYPE_SEQUENCE_];
+ return unless ($seqno_lhs);
+ my $rhash = {};
+ $rhash->{seqno_list} = $seqno_lhs;
+ $self->count_list_elements($rhash);
+ my $return_count_wanted = $rhash->{shift_count_max};
+ if ( DEBUG_RETURN_COUNT > 1 ) {
+ print "DEBUG_RETURN_COUNT: want $return_count_wanted\n";
+ }
+ $item->{return_count_wanted} = $return_count_wanted;
+ return;
+} ## end sub count_return_values_wanted
- my ( $self, $KK, $Kfirst, $Kp, $trailing_comma_rule, $if_add ) = @_;
+sub sub_def_info_maker {
- # Decide if a trailing comma rule is matched.
+ my ( $self, $rpackage_lookup_list, $rprelim_call_info ) = @_;
- # Input parameter:
- # $KK = index of closing token in old ($rLL) token list which follows
- # the location of a possible trailing comma. See diagram below.
- # $Kfirst = (old) index of first token on the current line of input tokens
- # $Kp = index of previous nonblank token in new ($rLL_new) array
- # $trailing_comma_rule = packed user control flags
- # $if_add = true if adding comma, false if deleteing comma
+ # Given:
+ # $rpackage_lookup_list = list with info for finding containing package
+ # $rprelim_call_info = hash ref with first try at call info
+
+ # Returns two hash references:
+ # \%sub_info_by_seqno,
+ # \%sub_seqno_by_key,
+ # where
+ # $sub_info_by_seqno{seqno} = {
+ # seqno => $seqno,
+ # package => $package,
+ # name => $name,
+ # seqno_list => $seqno of the paren list of args
+ # shift_count => number of args
+ # is_signature => true if seqno_list is a sub signature
+ # self_name => name of first arg
+ # }
+ # and
+ # $sub_seqno_by_key{'package::name'} = seqno;
+ # which gives the seqno for a sub name
+
+ # TODO: possible future update:
+ # package name for 'my' sub and anonymous sub will be parent sub seqno
- # Returns:
- # false if no match
- # true if match
+ my $rLL = $self->[_rLL_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $ris_sub_block = $self->[_ris_sub_block_];
- # For example, we might be checking for addition of a comma here:
+ #----------------------------------
+ # Main loop over subs to count args
+ #----------------------------------
+ my @package_stack = reverse( @{$rpackage_lookup_list} );
+ my ( $current_package, $Kbegin, $Kend ) = @{ pop @package_stack };
+ my %sub_info_by_seqno;
+ my %sub_seqno_by_key;
+ foreach my $seqno ( sort { $a <=> $b } keys %{$ris_sub_block} ) {
- # bless {
- # _name => $name,
- # _price => $price,
- # _rebate => $rebate <------ location of possible trailing comma
- # }, $pkg;
- # ^-------------------closing token at index $KK
+ # update the current package
+ my $Ko = $K_opening_container->{$seqno};
+ while ( $Ko > $Kend && @package_stack ) {
+ ( $current_package, $Kbegin, $Kend ) = @{ pop @package_stack };
+ }
+ my $block_type = $rblock_type_of_seqno->{$seqno};
- return unless ($trailing_comma_rule);
- my ( $trailing_comma_style, $paren_flag ) = @{$trailing_comma_rule};
+ #-----------------------------
+ # Get the sub name and package
+ #-----------------------------
- # List of $trailing_comma_style values:
- # undef stable: do not change
- # '0' : no list should have a trailing comma
- # '1' or '*' : every list should have a trailing comma
- # 'm' a multi-line list should have a trailing commas
- # 'b' trailing commas should be 'bare' (comma followed by newline)
- # 'h' lists of key=>value pairs with a bare trailing comma
- # 'i' same as s=h but also include any list with no more than about one
- # comma per line
- # ' ' or -wtc not defined : leave trailing commas unchanged [DEFAULT].
+ # Examples of what we want to extract from '$block_type':
+ # $block_type $name
+ # 'sub setidentifier($)' => 'setidentifier'
+ # 'method setidentifier($)' => 'setidentifier'
+ # Examples:
+ # "sub hello", "sub hello($)", "sub hello ($)"
+ # There will be a single space after 'sub' but any number before
+ # prototype
+ my $name = $block_type;
+ my $pos_space = index( $block_type, SPACE );
+ if ( $pos_space > 0 ) {
+ $name = substr( $block_type, $pos_space + 1 );
+ }
+ my $pos_paren = index( $name, '(' );
+ my $prototype;
+ if ( $pos_paren > 0 ) {
+ $prototype = substr( $name, $pos_paren );
+ $name = substr( $name, 0, $pos_paren );
+ $name =~ s/\s+$//;
+ }
+
+ my $package = $current_package;
+ if ( ( index( $name, ':' ) >= 0 || index( $name, "'" ) >= 0 )
+ && $name =~ /^(.*\W)(\w+)$/ )
+ {
+ $package = $1;
+ $name = $2;
+ $package =~ s/\'/::/g;
+ $package =~ s/::$//;
+ }
+ $package = 'main' unless ($package);
+
+ # Make a hash of info for this sub
+ my $lno = $rLL->[$Ko]->[_LINE_INDEX_] + 1;
+ my $item = {
+ seqno => $seqno,
+ package => $package,
+ name => $name,
+ line_number => $lno,
+ };
- # Note: an interesting generalization would be to let an upper case
- # letter denote the negation of styles 'm', 'b', 'h', 'i'. This might
- # be useful for undoing operations. It would be implemented as a wrapper
- # around this routine.
+ my $key = $package . '::' . $name;
- #-----------------------------------------
- # No style defined : do not add or delete
- #-----------------------------------------
- if ( !defined($trailing_comma_style) ) { return !$if_add }
+ # Set flag indicating if args may be expected to allow optimization
+ my $call_item = $rprelim_call_info->{$key};
+ $item->{max_arg_count} = $call_item->{max_arg_count};
- #----------------------------------------
- # Set some flags describing this location
- #----------------------------------------
- my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
- return unless ($type_sequence);
- my $closing_token = $rLL->[$KK]->[_TOKEN_];
- my $rtype_count = $self->[_rtype_count_by_seqno_]->{$type_sequence};
- return unless ( defined($rtype_count) && $rtype_count->{','} );
- my $is_permanently_broken =
- $self->[_ris_permanently_broken_]->{$type_sequence};
+ # Add a count of the number of input args
+ $self->count_sub_input_args($item);
- # Note that _ris_broken_container_ also stores the line diff
- # but it is not available at this early stage.
- my $K_opening = $self->[_K_opening_container_]->{$type_sequence};
- return if ( !defined($K_opening) );
-
- # multiline definition 1: opening and closing tokens on different lines
- my $iline_o = $rLL_new->[$K_opening]->[_LINE_INDEX_];
- my $iline_c = $rLL->[$KK]->[_LINE_INDEX_];
- my $line_diff_containers = $iline_c - $iline_o;
- my $has_multiline_containers = $line_diff_containers > 0;
-
- # multiline definition 2: first and last commas on different lines
- my $iline_first = $self->[_rfirst_comma_line_index_]->{$type_sequence};
- my $iline_last = $rLL_new->[$Kp]->[_LINE_INDEX_];
- my $has_multiline_commas;
- my $line_diff_commas = 0;
- if ( !defined($iline_first) ) {
-
- # shouldn't happen if caller checked comma count
- my $type_kp = $rLL_new->[$Kp]->[_TYPE_];
- Fault(
-"at line $iline_last but line of first comma not defined, at Kp=$Kp, type=$type_kp\n"
- ) if (DEVEL_MODE);
- }
- else {
- $line_diff_commas = $iline_last - $iline_first;
- $has_multiline_commas = $line_diff_commas > 0;
- }
+ # Add a count of the number of return args
+ $self->count_sub_return_args($item);
- # To avoid instability in edge cases, when adding commas we uses the
- # multiline_commas definition, but when deleting we use multiline
- # containers. This fixes b1384, b1396, b1397, b1398, b1400.
- my $is_multiline =
- $if_add ? $has_multiline_commas : $has_multiline_containers;
+ # Store the sub info by sequence number
+ $sub_info_by_seqno{$seqno} = $item;
- my $is_bare_multiline_comma = $is_multiline && $KK == $Kfirst;
+ # and save the sub sequence number indexed by sub name
+ $sub_seqno_by_key{$key} = $seqno;
+ }
+ return ( \%sub_info_by_seqno, \%sub_seqno_by_key );
+} ## end sub sub_def_info_maker
- my $match;
+sub update_sub_call_paren_info {
- #----------------------------
- # 0 : does not match any list
- #----------------------------
- if ( $trailing_comma_style eq '0' ) {
- $match = 0;
- }
+ my ( $self, $rpackage_lookup_list ) = @_;
- #------------------------------
- # '*' or '1' : matches any list
- #------------------------------
- elsif ( $trailing_comma_style eq '*' || $trailing_comma_style eq '1' ) {
- $match = 1;
- }
+ # Given:
+ # $rpackage_lookup_list = list with info for finding containing package
- #-----------------------------
- # 'm' matches a Multiline list
- #-----------------------------
- elsif ( $trailing_comma_style eq 'm' ) {
- $match = $is_multiline;
- }
+ # Update the hash of info about the call parameters with arg counts
+ # and package. It contains the sequence number of each paren and
+ # type of call, and we must add the arg count and package.
- #----------------------------------
- # 'b' matches a Bare trailing comma
- #----------------------------------
- elsif ( $trailing_comma_style eq 'b' ) {
- $match = $is_bare_multiline_comma;
- }
+ my $rLL = $self->[_rLL_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $rtype_count_by_seqno = $self->[_rtype_count_by_seqno_];
+ my $rsub_call_paren_info_by_seqno =
+ $self->[_rsub_call_paren_info_by_seqno_];
- #--------------------------------------------------------------------------
- # 'h' matches a bare hash list with about 1 comma and 1 fat comma per line.
- # 'i' matches a bare stable list with about 1 comma per line.
- #--------------------------------------------------------------------------
- elsif ( $trailing_comma_style eq 'h' || $trailing_comma_style eq 'i' ) {
+ my @package_stack = reverse( @{$rpackage_lookup_list} );
+ my ( $current_package, $Kbegin, $Kend ) = @{ pop @package_stack };
- # We can treat these together because they are similar.
- # The set of 'i' matches includes the set of 'h' matches.
+ my $is_dollar_underscore_zero = sub {
- # the trailing comma must be bare for both 'h' and 'i'
- return if ( !$is_bare_multiline_comma );
+ my ($K_closing_bracket) = @_;
- # There must be no more than one comma per line for both 'h' and 'i'
- # The new_comma_count here will include the trailing comma.
- my $new_comma_count = $rtype_count->{','};
- $new_comma_count += 1 if ($if_add);
- my $excess_commas = $new_comma_count - $line_diff_commas - 1;
- if ( $excess_commas > 0 ) {
+ # Given:
+ # $K_closing_bracket - index of a ']'
+ # Return:
+ # true of this is the end of '$_[0]'
+ # false otherwise
+ #
+ # return $_[0]->PP_decode_json(...
+ # |
+ # ---$K_closing_bracket
+ return unless ($K_closing_bracket);
+ my $seqno = $rLL->[$K_closing_bracket]->[_TYPE_SEQUENCE_];
+ return unless ($seqno);
+ my $Ko = $K_opening_container->{$seqno};
+ return unless defined($Ko);
+ my $Knum = $self->K_next_code($Ko);
+ return unless ( $Knum && $rLL->[$Knum]->[_TOKEN_] eq '0' );
+ my $Kc = $self->K_next_code($Knum);
+ return unless ( $Kc eq $K_closing_bracket );
+ my $K_p = $self->K_previous_code($Ko);
+ return unless ( $rLL->[$K_p]->[_TOKEN_] eq '$_' );
+ return 1;
+ }; ## end $is_dollar_underscore_zero = sub
- # Exception for a special edge case for option 'i': if the trailing
- # comma is followed by a blank line or comment, then it cannot be
- # covered. Then we can safely accept a small list to avoid
- # instability (issue b1443).
- if ( $trailing_comma_style eq 'i'
- && $iline_c - $rLL_new->[$Kp]->[_LINE_INDEX_] > 1
- && $new_comma_count <= 2 )
- {
- $match = 1;
- }
- else {
- return;
- }
+ #----------------------------------------------
+ # Loop over sequence numbers of all call parens
+ #----------------------------------------------
+ # parens are of the form f( ->f( &f( where 'f' is a bareword
+ # ^ ^ ^
+ # Note that we do not handle anonymous subs because it is not possible to
+ # connect them to the actual sub definition.
+ foreach
+ my $seqno ( sort { $a <=> $b } keys %{$rsub_call_paren_info_by_seqno} )
+ {
+
+ # update the current package
+ my $Ko = $K_opening_container->{$seqno};
+ while ( $Ko > $Kend && @package_stack ) {
+ ( $current_package, $Kbegin, $Kend ) = @{ pop @package_stack };
}
- # a list of key=>value pairs with at least 2 fat commas is a match
- # for both 'h' and 'i'
- my $fat_comma_count = $rtype_count->{'=>'};
- if ( !$match && $fat_comma_count && $fat_comma_count >= 2 ) {
+ # get the next call list
+ my $item = $rsub_call_paren_info_by_seqno->{$seqno};
+ my $name = $item->{token_m};
+ my $type_mm = $item->{type_mm};
- # comma count (including trailer) and fat comma count must differ by
- # by no more than 1. This allows for some small variations.
- my $comma_diff = $new_comma_count - $fat_comma_count;
- $match = ( $comma_diff >= -1 && $comma_diff <= 1 );
- }
+ # find function and package
+ my $is_ampersand_call;
- # For 'i' only, a list that can be shown to be stable is a match
- if ( !$match && $trailing_comma_style eq 'i' ) {
- $match = (
- $is_permanently_broken
- || ( $rOpts_break_at_old_comma_breakpoints
- && !$rOpts_ignore_old_breakpoints )
- );
+ # name will be like '&function' for an & call
+ if ( substr( $name, 0, 1 ) eq '&' ) {
+ $is_ampersand_call = 1;
+ $name = substr( $name, 1 );
}
- }
-
- #-------------------------------------------------------------------------
- # Unrecognized parameter. This should have been caught in the input check.
- #-------------------------------------------------------------------------
- else {
- DEVEL_MODE && Fault("Unrecognized parameter '$trailing_comma_style'\n");
+ my $call_type = $is_ampersand_call ? '&' : EMPTY_STRING;
- # do not add or delete
- return !$if_add;
- }
+ my $caller_name = EMPTY_STRING;
+ my $class_name = EMPTY_STRING;
+ if ( $type_mm eq '->' ) {
+ $call_type = '->';
+ my $K_m = $self->K_previous_code($Ko);
+ my $K_mm = $self->K_previous_code($K_m);
+ my $K_mmm = $self->K_previous_code($K_mm);
+ if ( defined($K_mmm) ) {
+ my $type_mmm = $rLL->[$K_mmm]->[_TYPE_];
+ my $token_mmm = $rLL->[$K_mmm]->[_TOKEN_];
+ if ( $type_mmm eq 'i' ) {
+ $caller_name = $token_mmm;
+ }
+ elsif ( $type_mmm eq 'w' ) {
- # Now do any special paren check
- if ( $match
- && $paren_flag
- && $paren_flag ne '1'
- && $paren_flag ne '*'
- && $closing_token eq ')' )
- {
- $match &&=
- $self->match_paren_control_flag( $type_sequence, $paren_flag,
- $rLL_new );
- }
+ ## A::B->do_something( $var1, $var2 );
+ ## wwww->iiiiiiiiiiii{ iiiii, iiiii };
+ if ( index( $token_mmm, '::' ) >= 0 ) {
+ $class_name = $token_mmm;
+ $class_name =~ s/::$//;
+ }
+ }
+ elsif ( $token_mmm eq ']' ) {
+ if ( $is_dollar_underscore_zero->($K_mmm) ) {
+ $caller_name = '$_[0]';
+ }
+ }
+ else { }
+ }
+ }
- # Fix for b1379, b1380, b1381, b1382, b1384 part 1. Mark trailing commas
- # for use by -vtc logic to avoid instability when -dtc and -atc are both
- # active.
- if ($match) {
- if ( $if_add && $rOpts_delete_trailing_commas
- || !$if_add && $rOpts_add_trailing_commas )
+ # look for explicit package on name
+ my $package = $current_package;
+ if ( ( index( $name, ':' ) >= 0 || index( $name, "'" ) >= 0 )
+ && $name =~ /^(.*\W)(\w+)$/ )
{
- $self->[_ris_bare_trailing_comma_by_seqno_]->{$type_sequence} = 1;
+ $package = $1;
+ $name = $2;
+ $package =~ s/\'/::/g;
+ $package =~ s/::$//;
+ }
+ else {
+ if ($class_name) {
+ $package = $class_name;
+ }
+ }
+ if ( !$package ) { $package = 'main' }
- # The combination of -atc and -dtc and -cab=3 can be unstable
- # (b1394). So we deactivate -cab=3 in this case.
- # A value of '0' or '4' is required for stability of case b1451.
- if ( $rOpts_comma_arrow_breakpoints == 3 ) {
- $self->[_roverride_cab3_]->{$type_sequence} = 0;
+ # count the args
+ my $rtype_count = $rtype_count_by_seqno->{$seqno};
+ my $arg_count = 0;
+ if ($rtype_count) {
+ my $comma_count = $rtype_count->{','};
+ my $fat_comma_count = $rtype_count->{'=>'};
+ if ($comma_count) { $arg_count += $comma_count }
+ if ($fat_comma_count) { $arg_count += $fat_comma_count }
+ }
+
+ # The comma count does not include any trailing comma, so add 1..
+ if ( !$arg_count ) {
+
+ # ..but not if parens are empty
+ my $Kc = $K_closing_container->{$seqno};
+ my $Kn = $Ko + 1;
+ if ( $Kn < $Kc ) {
+ my $type_n = $rLL->[$Kn]->[_TYPE_];
+ if ( $type_n eq 'b' ) {
+ $Kn += 1;
+ $type_n = $rLL->[$Kn]->[_TYPE_];
+ }
+ if ( $type_n eq '#' ) {
+ $Kn = $self->K_next_code($Ko);
+ }
+ if ( $Kn != $Kc ) { $arg_count += 1 }
}
}
+ else {
+ $arg_count += 1;
+ }
+
+ # The arg count is undefined if there are non-scalars in the list
+ $item->{seqno_list} = $seqno;
+ if ($arg_count) {
+ $item->{is_signature} = 0;
+ $item->{shift_count_min} = 0;
+ $item->{self_name} = EMPTY_STRING;
+ $self->count_list_elements($item);
+ $arg_count = $item->{shift_count_min};
+ }
+
+ # get the return count expected for this call by scanning to the left
+ $self->count_return_values_wanted($item);
+
+ # update the hash of info for this item
+ my $line_number = $rLL->[$Ko]->[_LINE_INDEX_] + 1;
+ $item->{arg_count} = $arg_count;
+ $item->{package} = $package;
+ $item->{name} = $name;
+ $item->{line_number} = $line_number;
+ $item->{call_type} = $call_type;
+ $item->{caller_name} = $caller_name;
+ $item->{class_name} = $class_name;
}
- return $match;
-} ## end sub match_trailing_comma_rule
+ return;
+} ## end sub update_sub_call_paren_info
-sub store_new_token {
+{
+ #-----------------------------------------------------
+ # Sub to look at first use of $self in a specified sub
+ #-----------------------------------------------------
+ my %self_call_cache;
+ my %is_oo_call_cache;
- my ( $self, $type, $token, $Kp ) = @_;
+ sub initialize_self_call_cache {
+ my $self = shift;
- # Create and insert a completely new token into the output stream
+ # must be called once per file before first call to sub self_call_check
+ %self_call_cache = ();
+ %is_oo_call_cache = ();
+ return;
+ } ## end sub initialize_self_call_cache
- # Input parameters:
- # $type = the token type
- # $token = the token text
- # $Kp = index of the previous token in the new list, $rLL_new
+ sub self_call_check {
+ my ( $self, $seqno_sub ) = @_;
- # Returns:
- # $Knew = index in $rLL_new of the new token
+ # Try to decide if a sub call with '$self->' is a call to an
+ # internal sub by looking at the first '$self' usage.
- # This operation is a little tricky because we are creating a new token and
- # we have to take care to follow the requested whitespace rules.
+ # Given:
+ # $seqno_sub = sequence number of sub to be checked
+ # Return:
+ # $is_self_call = true if this is an internal $self-> call
+ # based on the first $self in the sub.
+ # $is_oo_call = true if a call '$self->' appears to be
+ # within an OO framework which hides the $self arg.
+ # This uses the variable _rK_first_self_by_sub_seqno_ which
+ # is set by sub respace_tokens.
- my $Ktop = @{$rLL_new} - 1;
- my $top_is_space = $Ktop >= 0 && $rLL_new->[$Ktop]->[_TYPE_] eq 'b';
- my $Knew;
- if ( $top_is_space && $want_left_space{$type} == WS_NO ) {
+ my $is_self_call = $self_call_cache{$seqno_sub};
+ my $is_oo_call = $is_oo_call_cache{$seqno_sub};
- #----------------------------------------------------
- # Method 1: Convert the top blank into the new token.
- #----------------------------------------------------
+ if ( !defined($is_self_call) ) {
+ $is_self_call = 0;
+ $is_oo_call = 0;
- # Be Careful: we are working on the top of the new stack, on a token
- # which has been stored.
+ my $rLL = $self->[_rLL_];
+ my $K_first_self =
+ $self->[_rK_first_self_by_sub_seqno_]->{$seqno_sub};
- my $rcopy = copy_token_as_type( $rLL_new->[$Ktop], 'b', SPACE );
+ # an index K stored by respace_tokens may be 1 low
+ $K_first_self++
+ if ( $K_first_self
+ && $rLL->[$K_first_self]->[_TYPE_] eq 'b' );
- $Knew = $Ktop;
- $rLL_new->[$Knew]->[_TOKEN_] = $token;
- $rLL_new->[$Knew]->[_TOKEN_LENGTH_] = length($token);
- $rLL_new->[$Knew]->[_TYPE_] = $type;
+ my $Kn = $self->K_next_code($K_first_self);
+ my $type_n = $Kn ? $rLL->[$Kn]->[_TYPE_] : 'b';
- # NOTE: we are changing the output stack without updating variables
- # $last_nonblank_code_type, etc. Future needs might require that
- # those variables be updated here. For now, we just update the
- # type counts as necessary.
+ #-----------------------------------------
+ # Try 3a. if "$self->" then assume OO call
+ #-----------------------------------------
+ if ( $type_n eq '->' ) {
+ $is_self_call = 1;
- if ( $is_counted_type{$type} ) {
- my $seqno = $seqno_stack{ $depth_next - 1 };
- if ($seqno) {
- $self->[_rtype_count_by_seqno_]->{$seqno}->{$type}++;
+ # Also set a flag to reduce the call arg count by 1
+ # because it looks this is an OO system which
+ # hides the $self call arg.
+ # NOTE: to be sure, we could scan all sub args
+ # in advance to check that all first sub args
+ # are not named $self
+ $is_oo_call = 1;
}
- }
- # Then store a new blank
- $self->store_token($rcopy);
- }
- else {
+ #--------------------------
+ # Try 3b. "$self = bless"
+ #--------------------------
+ elsif ( $type_n eq '=' ) {
+ my $Knn = $self->K_next_code($Kn);
+ $is_self_call = $Knn && $rLL->[$Knn]->[_TOKEN_] eq 'bless';
+ }
- #----------------------------------------
- # Method 2: Use the normal storage method
- #----------------------------------------
+ # none of the above
+ else { }
- # Patch for issue c078: keep line indexes in order. If the top
- # token is a space that we are keeping (due to '-wls=...) then
- # we have to check that old line indexes stay in order.
- # In very rare
- # instances in which side comments have been deleted and converted
- # into blanks, we may have filtered down multiple blanks into just
- # one. In that case the top blank may have a higher line number
- # than the previous nonblank token. Although the line indexes of
- # blanks are not really significant, we need to keep them in order
- # in order to pass error checks.
- if ($top_is_space) {
- my $old_top_ix = $rLL_new->[$Ktop]->[_LINE_INDEX_];
- my $new_top_ix = $rLL_new->[$Kp]->[_LINE_INDEX_];
- if ( $new_top_ix < $old_top_ix ) {
- $rLL_new->[$Ktop]->[_LINE_INDEX_] = $new_top_ix;
- }
+ $self_call_cache{$seqno_sub} = $is_self_call;
+ $is_oo_call_cache{$seqno_sub} = $is_oo_call;
}
+ return ( $is_self_call, $is_oo_call );
+ } ## end sub self_call_check
+}
- my $rcopy = copy_token_as_type( $rLL_new->[$Kp], $type, $token );
- $self->store_token($rcopy);
- $Knew = @{$rLL_new} - 1;
- }
- return $Knew;
-} ## end sub store_new_token
+use constant DEBUG_SELF => 0;
-sub check_Q {
+sub cross_check_sub_calls {
- # Check that a quote looks okay, and report possible problems
- # to the logfile.
+ my ($self) = @_;
- my ( $self, $KK, $Kfirst, $line_number ) = @_;
- my $token = $rLL->[$KK]->[_TOKEN_];
- if ( $token =~ /\t/ ) {
- $self->note_embedded_tab($line_number);
- }
+ # This routine looks for issues for these parameters:
+ # --dump-mismatched-args
+ # --warn-mismatched-args
+ # --dump-mismatched-returns
+ # --warn-mismatched-returns
- # The remainder of this routine looks for something like
- # '$var = s/xxx/yyy/;'
- # in case it should have been '$var =~ s/xxx/yyy/;'
+ # It returns a hash of values with any warnings found
- # Start by looking for a token beginning with one of: s y m / tr
- return
- unless ( $is_s_y_m_slash{ substr( $token, 0, 1 ) }
- || substr( $token, 0, 2 ) eq 'tr' );
+ my $rLL = $self->[_rLL_];
- # ... and preceded by one of: = == !=
- my $Kp = $self->K_previous_nonblank( undef, $rLL_new );
- return unless ( defined($Kp) );
- my $previous_nonblank_type = $rLL_new->[$Kp]->[_TYPE_];
- return unless ( $is_unexpected_equals{$previous_nonblank_type} );
- my $previous_nonblank_token = $rLL_new->[$Kp]->[_TOKEN_];
+ # The mismatched-args checks are indicated by these letters:
+ # a = both method and non-method calls to a sub
+ # - even for two subs in a different package
+ # o = overcount: call arg counts exceed number expected by a sub
+ # u = undercount: call arg counts less than number expected by a sub
+ # - except if expecting N or less (N=4 by default)
+ # i = indeterminate: expected number of args was not determined
+ my %call_arg_issue_note = (
+ a => "both method and non-method calls to a sub",
+ o => "excess args passed",
+ u => "fewer args than expected passed",
+ i => "indeterminate sub arg count",
+ );
+ my %do_mismatched_call_type = %call_arg_issue_note;
+ my $mismatched_arg_undercount_cutoff = 0;
+ my $mismatched_arg_overcount_cutoff = 0;
+ my $ris_mismatched_call_excluded_name = {};
+
+ # The mismatched-returns checks are indicated by these letters:
+ my %return_issue_note = (
+ x => "want array but no return seen",
+ y => "want scalar but no return seen",
+ o => "want array with excess count",
+ u => "want array with count not matched by sub",
+ s => "want scalar but sub only returns arrays with count >1",
+ );
+ my %do_mismatched_return_type = %return_issue_note;
+ my $ris_mismatched_return_excluded_name = {};
- my $previous_nonblank_type_2 = 'b';
- my $previous_nonblank_token_2 = EMPTY_STRING;
- my $Kpp = $self->K_previous_nonblank( $Kp, $rLL_new );
- if ( defined($Kpp) ) {
- $previous_nonblank_type_2 = $rLL_new->[$Kpp]->[_TYPE_];
- $previous_nonblank_token_2 = $rLL_new->[$Kpp]->[_TOKEN_];
- }
+ # initialize a cache used for efficiency
+ $self->initialize_self_call_cache();
- my $next_nonblank_token = EMPTY_STRING;
- my $Kn = $KK + 1;
- my $Kmax = @{$rLL} - 1;
- if ( $Kn <= $Kmax && $rLL->[$Kn]->[_TYPE_] eq 'b' ) { $Kn += 1 }
- if ( $Kn <= $Kmax ) {
- $next_nonblank_token = $rLL->[$Kn]->[_TOKEN_];
- }
+ my $is_dump =
+ $rOpts->{'dump-mismatched-args'} || $rOpts->{'dump-mismatched-returns'};
- my $token_0 = $rLL->[$Kfirst]->[_TOKEN_];
- my $type_0 = $rLL->[$Kfirst]->[_TYPE_];
+ # initialize if not in a dump mode
+ if ( !$is_dump ) {
- if (
+ %do_mismatched_call_type = %{$rwarn_mismatched_arg_types};
+ $mismatched_arg_undercount_cutoff =
+ $rOpts->{'warn-mismatched-arg-undercount-cutoff'};
+ $mismatched_arg_overcount_cutoff =
+ $rOpts->{'warn-mismatched-arg-overcount-cutoff'};
+ $ris_mismatched_call_excluded_name =
+ $ris_warn_mismatched_arg_excluded_name;
- # preceded by simple scalar
- $previous_nonblank_type_2 eq 'i'
- && $previous_nonblank_token_2 =~ /^\$/
+ %do_mismatched_return_type = %{$rwarn_mismatched_return_types};
+ $ris_mismatched_return_excluded_name =
+ $ris_warn_mismatched_return_excluded_name;
+ }
- # followed by some kind of termination
- # (but give complaint if we can not see far enough ahead)
- && $next_nonblank_token =~ /^[; \)\}]$/
+ # hardwired name exclusions
+ $ris_mismatched_call_excluded_name->{AUTOLOAD} = 1;
+ $ris_mismatched_call_excluded_name->{DESTROY} = 1;
- # scalar is not declared
- ## =~ /^(my|our|local)$/
- && !( $type_0 eq 'k' && $is_my_our_local{$token_0} )
- )
- {
- my $lno = 1 + $rLL_new->[$Kp]->[_LINE_INDEX_];
- my $guess = substr( $previous_nonblank_token, 0, 1 ) . '~';
- complain(
-"Line $lno: Note: be sure you want '$previous_nonblank_token' instead of '$guess' here\n"
- );
- }
- return;
-} ## end sub check_Q
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $rK_package_list = $self->[_rK_package_list_];
+ my $ris_my_sub_by_seqno = $self->[_ris_my_sub_by_seqno_];
+ my $rsub_call_paren_info_by_seqno =
+ $self->[_rsub_call_paren_info_by_seqno_];
+ my $rK_bless_by_sub_seqno = $self->[_rK_bless_by_sub_seqno_];
-} ## end closure respace_tokens
+ #----------------------------
+ # Make a package lookup table
+ #----------------------------
+ my $rpackage_lists = $self->package_info_maker($rK_package_list);
+ my $rpackage_lookup_list = $rpackage_lists->{'rpackage_lookup_list'};
-sub copy_token_as_type {
+ #-------------------------------------------
+ # Update sub call paren info with arg counts
+ #-------------------------------------------
+ $self->update_sub_call_paren_info($rpackage_lookup_list);
- # This provides a quick way to create a new token by
- # slightly modifying an existing token.
- my ( $rold_token, $type, $token ) = @_;
- if ( !defined($token) ) {
- if ( $type eq 'b' ) {
- $token = SPACE;
- }
- elsif ( $type eq 'q' ) {
- $token = EMPTY_STRING;
- }
- elsif ( $type eq '->' ) {
- $token = '->';
+ #----------------------------------
+ # Preliminary min and max call args
+ #----------------------------------
+ # This is preliminary because some of the calls will eventually be
+ # rejected if they appear to be to external objects. This info is
+ # needed to optimize the sub arg search in the case of zero args.
+ my %upper_bound_call_info;
+ foreach my $seqno ( keys %{$rsub_call_paren_info_by_seqno} ) {
+ my $rcall_item = $rsub_call_paren_info_by_seqno->{$seqno};
+
+ my $call_type = $rcall_item->{call_type};
+ my $package = $rcall_item->{package};
+ my $name = $rcall_item->{name};
+ my $arg_count = $rcall_item->{arg_count};
+ my $key = $package . '::' . $name;
+
+ next unless defined($arg_count);
+ if ( $call_type eq '->' ) {
+ $arg_count += 1;
+ $upper_bound_call_info{$key}->{method_call_count}++;
}
- elsif ( $type eq ';' ) {
- $token = ';';
+ else {
+ $upper_bound_call_info{$key}->{direct_call_count}++;
}
- elsif ( $type eq ',' ) {
- $token = ',';
+ my $max = $upper_bound_call_info{$key}->{max_arg_count};
+ my $min = $upper_bound_call_info{$key}->{min_arg_count};
+ if ( !defined($max) || $arg_count > $max ) {
+ $upper_bound_call_info{$key}->{max_arg_count} = $arg_count;
}
- else {
-
- # Unexpected type ... this sub will work as long as both $token and
- # $type are defined, but we should catch any unexpected types during
- # development.
- if (DEVEL_MODE) {
- Fault(<<EOM);
-sub 'copy_token_as_type' received token type '$type' but expects just one of: 'b' 'q' '->' or ';'
-EOM
- }
-
- # Shouldn't get here
- $token = $type;
+ if ( !defined($min) || $arg_count < $min ) {
+ $upper_bound_call_info{$key}->{min_arg_count} = $arg_count;
}
}
- my @rnew_token = @{$rold_token};
- $rnew_token[_TYPE_] = $type;
- $rnew_token[_TOKEN_] = $token;
- $rnew_token[_TYPE_SEQUENCE_] = EMPTY_STRING;
- return \@rnew_token;
-} ## end sub copy_token_as_type
-
-sub K_next_code {
- my ( $self, $KK, $rLL ) = @_;
+ #-----------------------------------
+ # Get arg counts for sub definitions
+ #-----------------------------------
+ my ( $rsub_info_by_seqno, $rsub_seqno_by_key ) =
+ $self->sub_def_info_maker( $rpackage_lookup_list,
+ \%upper_bound_call_info );
- # return the index K of the next nonblank, non-comment token
- return unless ( defined($KK) && $KK >= 0 );
+ # Hash to hold combined info for subs and calls
+ my %common_hash;
- # use the standard array unless given otherwise
- $rLL = $self->[_rLL_] unless ( defined($rLL) );
- my $Num = @{$rLL};
- my $Knnb = $KK + 1;
- while ( $Knnb < $Num ) {
- if ( !defined( $rLL->[$Knnb] ) ) {
-
- # We seem to have encountered a gap in our array.
- # This shouldn't happen because sub write_line() pushed
- # items into the $rLL array.
- Fault("Undefined entry for k=$Knnb") if (DEVEL_MODE);
- return;
+ #---------------------------------------------
+ # First split the calls into direct and method
+ #---------------------------------------------
+ my @method_call_seqnos;
+ foreach my $seqno ( keys %{$rsub_call_paren_info_by_seqno} ) {
+ my $rcall_item = $rsub_call_paren_info_by_seqno->{$seqno};
+ my $package = $rcall_item->{package};
+ my $name = $rcall_item->{name};
+ my $key = $package . '::' . $name;
+ if ( $rcall_item->{call_type} eq '->' ) {
+ push @method_call_seqnos, $seqno;
}
- if ( $rLL->[$Knnb]->[_TYPE_] ne 'b'
- && $rLL->[$Knnb]->[_TYPE_] ne '#' )
- {
- return $Knnb;
+ else {
+ push @{ $common_hash{$key}->{direct_calls} }, $rcall_item;
}
- $Knnb++;
}
- return;
-} ## end sub K_next_code
-sub K_next_nonblank {
- my ( $self, $KK, $rLL ) = @_;
+ #----------------------------------------------
+ # Now split method calls into self and external
+ #----------------------------------------------
+ my @debug_warnings;
+ foreach my $seqno (@method_call_seqnos) {
+ my $rcall_item = $rsub_call_paren_info_by_seqno->{$seqno};
+ my $package = $rcall_item->{package};
+ my $name = $rcall_item->{name};
+ my $caller_name = $rcall_item->{caller_name};
+ my $class_name = $rcall_item->{class_name};
+ my $key_receiver_sub = $package . '::' . $name;
+ my $is_self_call;
+
+ # Find the sub which contains this call
+ my $seqno_sub_parent = $self->parent_sub_seqno($seqno);
+ if ($seqno_sub_parent) {
+ my $item = $rsub_info_by_seqno->{$seqno_sub_parent};
+ if ($item) {
+
+ my $key_parent_sub = $item->{package} . '::' . $item->{name};
+ my $parent_self_name = $item->{self_name};
+ my $caller_is_dollar_self = $caller_name eq '$self';
+
+ # Decide if this method call is to an internal sub:
+ # Try 1 and Try 2 are general, for any object name
+ # Try 3 and Try 4 are guesses for common uses of '$self'
+
+ #------------------------------------------------
+ # Try 1: Parent sub self name matches caller name
+ #------------------------------------------------
+ if ($parent_self_name) {
+
+ # and the only calls to parent sub (if any) are arrow calls.
+ if (
+ $parent_self_name eq $caller_name
+ && ( !$common_hash{$key_parent_sub}->{direct_calls}
+ || $caller_is_dollar_self )
+ )
+ {
+ $is_self_call = 1;
+ }
+ }
- # return the index K of the next nonblank token, or
- # return undef if none
- return unless ( defined($KK) && $KK >= 0 );
+ #---------------------------------------------------------
+ # Try 2. See if the name was blessed in the containing sub
+ #---------------------------------------------------------
+ if ( !$is_self_call ) {
+ my $item_self = $item->{self_name};
+ $item_self = 'undef' unless $item_self;
+ my $rK_bless_list =
+ $rK_bless_by_sub_seqno->{$seqno_sub_parent};
+ if ($rK_bless_list) {
+ my $Ko = $K_opening_container->{$seqno};
+ foreach my $blessing ( @{$rK_bless_list} ) {
+
+ # Index K and blessed name were stored with sub.
+ # $K_blessed may be 1 token before K of '$self'
+ my ( $K_blessed, $name_blessed ) = @{$blessing};
+
+ # name of blessed object must match
+ next if ( $name_blessed ne $caller_name );
+
+ # keyword 'bless' must be at top sub level. We have
+ # to back up 1 token in case $self is in parens.
+ my $Kp = $self->K_previous_code($K_blessed);
+ next if ( !$Kp );
+ my $parent_seqno = $self->parent_seqno_by_K($Kp);
+ next
+ if (!$parent_seqno
+ || $parent_seqno != $seqno_sub_parent );
- # The third arg allows this routine to be used on any array. This is
- # useful in sub respace_tokens when we are copying tokens from an old $rLL
- # to a new $rLL array. But usually the third arg will not be given and we
- # will just use the $rLL array in $self.
- $rLL = $self->[_rLL_] unless ( defined($rLL) );
- my $Num = @{$rLL};
- my $Knnb = $KK + 1;
- return unless ( $Knnb < $Num );
- return $Knnb if ( $rLL->[$Knnb]->[_TYPE_] ne 'b' );
- return unless ( ++$Knnb < $Num );
- return $Knnb if ( $rLL->[$Knnb]->[_TYPE_] ne 'b' );
-
- # Backup loop. Very unlikely to get here; it means we have neighboring
- # blanks in the token stream.
- $Knnb++;
- while ( $Knnb < $Num ) {
-
- # Safety check, this fault shouldn't happen: The $rLL array is the
- # main array of tokens, so all entries should be used. It is
- # initialized in sub write_line, and then re-initialized by sub
- # store_token() within sub respace_tokens. Tokens are pushed on
- # so there shouldn't be any gaps.
- if ( !defined( $rLL->[$Knnb] ) ) {
- Fault("Undefined entry for k=$Knnb") if (DEVEL_MODE);
- return;
- }
- if ( $rLL->[$Knnb]->[_TYPE_] ne 'b' ) { return $Knnb }
- $Knnb++;
- }
- return;
-} ## end sub K_next_nonblank
+ # bless must be before the call
+ next if ( $K_blessed > $Ko );
-sub K_previous_code {
+ $is_self_call = 1;
+ last;
+ }
+ }
+ }
- # return the index K of the previous nonblank, non-comment token
- # Call with $KK=undef to start search at the top of the array
- my ( $self, $KK, $rLL ) = @_;
+ #-------------------------------------------------------
+ # Try 3. Caller is '$self'; look at first '$self' in sub
+ #-------------------------------------------------------
+ if ( !$is_self_call && $caller_is_dollar_self ) {
+ ( $is_self_call, $rcall_item->{is_oo_call} ) =
+ $self->self_call_check($seqno_sub_parent);
+ }
+
+ #-------------------------------------------------------------
+ # Try 4. caller is '$self': receiver='$self', '$class', '$_[0]'
+ #-------------------------------------------------------------
+ if ( !$is_self_call && $caller_is_dollar_self ) {
+ my $seqno_sub_called =
+ $rsub_seqno_by_key->{$key_receiver_sub};
+ if ($seqno_sub_called) {
+ my $item_called =
+ $rsub_info_by_seqno->{$seqno_sub_called};
+ my $receiver = $item_called->{self_name};
+
+ #------------------------------------------------
+ # Try 4a: receiver has some recognized self names
+ #------------------------------------------------
+ if (
+ $receiver
+ && ( $receiver eq $caller_name
+ || $receiver eq '$class'
+ || $receiver eq '$_[0]' )
+ )
+ {
+ $is_self_call = 1;
+ }
- # use the standard array unless given otherwise
- $rLL = $self->[_rLL_] unless ( defined($rLL) );
- my $Num = @{$rLL};
- if ( !defined($KK) ) { $KK = $Num }
- elsif ( $KK > $Num ) {
+ #-----------------------------------
+ # Try 4b: check for a recursive call
+ #-----------------------------------
+ else {
+ $is_self_call =
+ $seqno_sub_called == $seqno_sub_parent;
+ }
+ }
+ }
- # This fault can be caused by a programming error in which a bad $KK is
- # given. The caller should make the first call with KK_new=undef to
- # avoid this error.
- Fault(
-"Program Bug: K_previous_nonblank_new called with K=$KK which exceeds $Num"
- ) if (DEVEL_MODE);
- return;
- }
- my $Kpnb = $KK - 1;
- while ( $Kpnb >= 0 ) {
- if ( $rLL->[$Kpnb]->[_TYPE_] ne 'b'
- && $rLL->[$Kpnb]->[_TYPE_] ne '#' )
- {
- return $Kpnb;
- }
- $Kpnb--;
- }
- return;
-} ## end sub K_previous_code
+ if ( DEBUG_SELF
+ && !$is_self_call
+ && $caller_is_dollar_self
+ && $seqno_sub_parent )
+ {
+ my $Ko_sub = $K_opening_container->{$seqno_sub_parent};
+ my $ln_parent = $rLL->[$Ko_sub]->[_LINE_INDEX_] + 1;
+ my $Ko = $K_opening_container->{$seqno};
+ my $ln = $rLL->[$Ko]->[_LINE_INDEX_] + 1;
+ my $parent_self = $item->{self_name};
+ my $receiver_self = 'missing';
+ my $ln_receiver = 'undef';
+ my $seqno_sub_called =
+ $rsub_seqno_by_key->{$key_receiver_sub};
+
+ if ($seqno_sub_called) {
+ my $item_called =
+ $rsub_info_by_seqno->{$seqno_sub_called};
+ $receiver_self = $item_called->{self_name};
+ my $Ko_receiver =
+ $K_opening_container->{$seqno_sub_called};
+ $ln_receiver = $rLL->[$Ko_receiver]->[_LINE_INDEX_] + 1;
+ }
-sub K_previous_nonblank {
+ # use DEBUG_SELF=3 to see missing subs
+ else {
+ next if ( DEBUG_SELF < 3 );
+ }
- # return index of previous nonblank token before item K;
- # Call with $KK=undef to start search at the top of the array
- my ( $self, $KK, $rLL ) = @_;
+ # use DEBUG_SELF=2 to see undef-self-undef
+ next
+ if ( DEBUG_SELF < 2 && !$parent_self && !$receiver_self );
+ if ( !$parent_self ) { $parent_self = 'undef' }
+ if ( !$receiver_self ) { $receiver_self = 'undef' }
+ push @debug_warnings,
+ {
+ Ko => $Ko,
+ caller_name => $caller_name,
+ parent_self => $parent_self,
+ receiver_self => $receiver_self,
+ sub_called => $name,
+ line_number => $ln,
+ ln_parent => $ln_parent,
+ ln_receiver => $ln_receiver,
+ };
+ }
+ }
+ }
- # use the standard array unless given otherwise
- $rLL = $self->[_rLL_] unless ( defined($rLL) );
- my $Num = @{$rLL};
- if ( !defined($KK) ) { $KK = $Num }
- elsif ( $KK > $Num ) {
+ # Save this method call as either an internal (self) or external call
+ if ($is_self_call) {
+ push @{ $common_hash{$key_receiver_sub}->{self_calls} },
+ $rcall_item;
+ }
+ else {
- # This fault can be caused by a programming error in which a bad $KK is
- # given. The caller should make the first call with KK_new=undef to
- # avoid this error.
- Fault(
-"Program Bug: K_previous_nonblank_new called with K=$KK which exceeds $Num"
- ) if (DEVEL_MODE);
- return;
+ # mark calls made by unknown (non-self) objects, we can't track
+ # them, but we can track calls at the class level.
+ if ( !$class_name ) {
+ $rcall_item->{is_unknown_object_call} = 1;
+ }
+ }
}
- my $Kpnb = $KK - 1;
- return unless ( $Kpnb >= 0 );
- return $Kpnb if ( $rLL->[$Kpnb]->[_TYPE_] ne 'b' );
- return unless ( --$Kpnb >= 0 );
- return $Kpnb if ( $rLL->[$Kpnb]->[_TYPE_] ne 'b' );
- # Backup loop. We should not get here unless some routine
- # slipped repeated blanks into the token stream.
- return unless ( --$Kpnb >= 0 );
- while ( $Kpnb >= 0 ) {
- if ( $rLL->[$Kpnb]->[_TYPE_] ne 'b' ) { return $Kpnb }
- $Kpnb--;
+ if ( DEBUG_SELF && @debug_warnings ) {
+ @debug_warnings = sort { $a->{Ko} <=> $b->{Ko} } @debug_warnings;
+ my $output_string = EMPTY_STRING;
+ foreach my $item (@debug_warnings) {
+## my $caller_name = $item->{caller_name};
+ my $parent_self = $item->{parent_self};
+ my $receiver_self = $item->{receiver_self};
+ my $sub_called = $item->{sub_called};
+ my $line_number = $item->{line_number};
+ my $ln_parent = $item->{ln_parent};
+ my $ln_receiver = $item->{ln_receiver};
+ $output_string .=
+"$line_number: \$self->$sub_called in parent line $ln_parent with self=$parent_self to receiver line $ln_receiver with self=$receiver_self\n";
+ }
+ warning($output_string);
}
- return;
-} ## end sub K_previous_nonblank
-
-sub parent_seqno_by_K {
- # Return the sequence number of the parent container of token K, if any.
+ #-------------------------------
+ # Loop to merge prototype counts
+ #-------------------------------
+ foreach my $key ( keys %common_hash ) {
+ my $seqno_sub = $rsub_seqno_by_key->{$key};
+ next if ( !defined($seqno_sub) );
+ my $rsub_item = $rsub_info_by_seqno->{$seqno_sub};
+ next if ( !$rsub_item->{prototype} );
+ my $item = $common_hash{$key};
+ my $rdirect_calls = $item->{direct_calls};
+ my $rself_calls = $item->{self_calls};
+ my $num_direct = defined($rdirect_calls) ? @{$rdirect_calls} : 0;
+ my $num_self = defined($rself_calls) ? @{$rself_calls} : 0;
- my ( $self, $KK ) = @_;
- my $rLL = $self->[_rLL_];
+ # Use prototype values if given and all calls are direct
+ # Otherwise, ignore the prototype.
+ next if ($num_self);
+ next if ( !$num_direct );
- # The task is to jump forward to the next container token
- # and use the sequence number of either it or its parent.
+ my $shift_count_min = $rsub_item->{prototype_count_min};
+ my $shift_count_max = $rsub_item->{prototype_count_max};
+ if ($num_self) {
+ if ( defined($shift_count_min) ) { $shift_count_min++ }
+ if ( defined($shift_count_max) ) { $shift_count_max++ }
+ }
- # For example, consider the following with seqno=5 of the '[' and ']'
- # being called with index K of the first token of each line:
+ # For calls with '&' to subs with prototypes, use the upper bound of
+ # the prototype max and the max found by scanning the script.
+ my $shift_count_max_amp = $shift_count_max;
+ if ( defined($shift_count_max) ) {
+ my $standard_max = $rsub_item->{shift_count_max};
+ if ( !defined($standard_max) || $standard_max > $shift_count_max ) {
+ $shift_count_max_amp = $standard_max;
+ }
+ }
+ $rsub_item->{shift_count_max_amp} = $shift_count_max_amp;
- # # result
- # push @tests, # -
- # [ # -
- # sub { 99 }, 'do {&{%s} for 1,2}', # 5
- # '(&{})(&{})', undef, # 5
- # [ 2, 2, 0 ], 0 # 5
- # ]; # -
+ # overwrite values found by scanning the script with prototype values
+ $rsub_item->{shift_count_min} = $shift_count_min;
+ $rsub_item->{shift_count_max} = $shift_count_max;
- # NOTE: The ending parent will be SEQ_ROOT for a balanced file. For
- # unbalanced files, last sequence number will either be undefined or it may
- # be at a deeper level. In either case we will just return SEQ_ROOT to
- # have a defined value and allow formatting to proceed.
- my $parent_seqno = SEQ_ROOT;
- my $type_sequence = $rLL->[$KK]->[_TYPE_SEQUENCE_];
- if ($type_sequence) {
- $parent_seqno = $self->[_rparent_of_seqno_]->{$type_sequence};
}
- else {
- my $Kt = $rLL->[$KK]->[_KNEXT_SEQ_ITEM_];
- if ( defined($Kt) ) {
- $type_sequence = $rLL->[$Kt]->[_TYPE_SEQUENCE_];
- my $type = $rLL->[$Kt]->[_TYPE_];
- # if next container token is closing, it is the parent seqno
- if ( $is_closing_type{$type} ) {
- $parent_seqno = $type_sequence;
+ #--------------------------------------------------------------
+ # Loop over all sub calls to compare call and return arg counts
+ #--------------------------------------------------------------
+ foreach my $seqno ( keys %{$rsub_call_paren_info_by_seqno} ) {
+
+ my $rcall_item = $rsub_call_paren_info_by_seqno->{$seqno};
+
+ # Skip method calls by unknown objects
+ next if ( $rcall_item->{is_unknown_object_call} );
+
+ my $arg_count = $rcall_item->{arg_count};
+ my $return_count_wanted = $rcall_item->{return_count_wanted};
+ my $want_scalar = $rcall_item->{want_scalar};
+ my $package = $rcall_item->{package};
+ my $name = $rcall_item->{name};
+ my $call_type = $rcall_item->{call_type};
+ my $key = $package . '::' . $name;
+
+ my ( $shift_count_min, $shift_count_max, $self_name );
+ my ( $return_count_min, $return_count_max, $return_count_indefinite );
+ my ($rK_return_count_hash);
+
+ # look for the sub ..
+ my $seqno_sub = $rsub_seqno_by_key->{$key};
+ my $rK_return_list;
+ my $saw_wantarray;
+ if ( defined($seqno_sub) ) {
+
+ my $rsub_item = $rsub_info_by_seqno->{$seqno_sub};
+ $saw_wantarray =
+ defined( $self->[_rK_wantarray_by_sub_seqno_]->{$seqno_sub} );
+
+ # skip 'my' subs for now, they need special treatment. If
+ # anonymous subs are added, 'my' subs could also be added then.
+ if ( !$ris_my_sub_by_seqno->{$seqno_sub} ) {
+ $common_hash{$key}->{rsub_item} = $rsub_item;
+ $shift_count_min = $rsub_item->{shift_count_min};
+ $shift_count_max = $rsub_item->{shift_count_max};
+ if ( $call_type eq '&' && $rsub_item->{prototype} ) {
+ $shift_count_max = $rsub_item->{shift_count_max_amp};
+ }
+ $self_name = $rsub_item->{self_name};
+ $return_count_min = $rsub_item->{return_count_min};
+ $return_count_max = $rsub_item->{return_count_max};
+ $return_count_indefinite =
+ $rsub_item->{return_count_indefinite};
+ $rK_return_list =
+ $self->[_rK_return_by_sub_seqno_]->{$seqno_sub};
+ $common_hash{$key}->{rK_return_list} = $rK_return_list;
+ $rK_return_count_hash = $rsub_item->{rK_return_count_hash};
+ }
+ }
+
+ #------------------------------------
+ # compare caller/sub input arg counts
+ #------------------------------------
+ if ( defined($shift_count_min) && defined($arg_count) ) {
+ if ( $call_type eq '->' && !$rcall_item->{is_oo_call} ) {
+ $arg_count += 1;
}
+ my $excess = $arg_count - $shift_count_min;
- # otherwise we want its parent container
+ my $max = $common_hash{$key}->{max_arg_count};
+ my $min = $common_hash{$key}->{min_arg_count};
+ if ( !defined($max) || $arg_count > $max ) {
+ $common_hash{$key}->{max_arg_count} = $arg_count;
+ }
+ if ( !defined($min) || $arg_count < $min ) {
+ $common_hash{$key}->{min_arg_count} = $arg_count;
+ }
+ if ( $excess < 0 ) {
+ push @{ $common_hash{$key}->{under_count} }, $rcall_item;
+ }
+ elsif ( $excess > 0 ) {
+ if ( defined($shift_count_max) ) {
+ $excess = $arg_count - $shift_count_max;
+ if ( $excess > 0 ) {
+ push @{ $common_hash{$key}->{over_count} }, $rcall_item;
+ }
+ }
+ }
else {
- $parent_seqno = $self->[_rparent_of_seqno_]->{$type_sequence};
+ ## $excess = 0
}
}
- }
- $parent_seqno = SEQ_ROOT unless ( defined($parent_seqno) );
- return $parent_seqno;
-} ## end sub parent_seqno_by_K
-sub is_in_block_by_i {
- my ( $self, $i ) = @_;
+ #---------------------------------------------
+ # compare caller/sub return counts if possible
+ #---------------------------------------------
- # returns true if
- # token at i is contained in a BLOCK
- # or is at root level
- # or there is some kind of error (i.e. unbalanced file)
- # returns false otherwise
+ # rhs check: only check subs returning finite lists (i.e. not '@list');
+ next if ($return_count_indefinite);
- if ( $i < 0 ) {
- DEVEL_MODE && Fault("Bad call, i='$i'\n");
- return 1;
- }
+ # lhs check: only check when a finite return list is wanted
+ next if ( !$return_count_wanted );
- my $seqno = $parent_seqno_to_go[$i];
- return 1 if ( !$seqno || $seqno eq SEQ_ROOT );
- return 1 if ( $self->[_rblock_type_of_seqno_]->{$seqno} );
- return;
-} ## end sub is_in_block_by_i
+ # ignore scalar if wantarray seen
+ next if ( $want_scalar && $saw_wantarray );
-sub is_in_list_by_i {
- my ( $self, $i ) = @_;
+ # update min-max want ranges for the output report
+ my $max = $common_hash{$key}->{want_count_max};
+ my $min = $common_hash{$key}->{want_count_min};
+ if ( !defined($max) || $return_count_wanted > $max ) {
+ $common_hash{$key}->{want_count_max} = $return_count_wanted;
+ }
+ if ( !defined($min) || $return_count_wanted < $min ) {
+ $common_hash{$key}->{want_count_min} = $return_count_wanted;
+ }
- # returns true if token at i is contained in a LIST
- # returns false otherwise
- my $seqno = $parent_seqno_to_go[$i];
- return unless ( $seqno && $seqno ne SEQ_ROOT );
- if ( $self->[_ris_list_by_seqno_]->{$seqno} ) {
- return 1;
- }
- return;
-} ## end sub is_in_list_by_i
+ # return issue 'x': want array but no return seen
+ # return issue 'y': want scalar but no return seen
+ if ( !defined($rK_return_list) ) {
+ if ($want_scalar) {
+ push @{ $common_hash{$key}->{return_issues}->{y} }, $rcall_item;
+ }
+ else {
+ push @{ $common_hash{$key}->{return_issues}->{x} }, $rcall_item;
+ }
+ }
-sub is_list_by_K {
+ # safety check
+ elsif ( !defined($return_count_max) ) {
- # Return true if token K is in a list
- my ( $self, $KK ) = @_;
+ # shouldn't happen-should be defined if $rK_return_list is defined
+ DEVEL_MODE && Fault("return_count_max should be defined here\n");
+ }
- my $parent_seqno = $self->parent_seqno_by_K($KK);
- return unless defined($parent_seqno);
- return $self->[_ris_list_by_seqno_]->{$parent_seqno};
-} ## end sub is_list_by_K
+ # check for exact match
+ elsif ( $return_count_wanted == $return_count_max ) {
+ ## ok
+ }
-sub is_list_by_seqno {
+ # return issue 'o': overwant
+ elsif ( $return_count_wanted > $return_count_max ) {
- # Return true if the immediate contents of a container appears to be a
- # list.
- my ( $self, $seqno ) = @_;
- return unless defined($seqno);
- return $self->[_ris_list_by_seqno_]->{$seqno};
-} ## end sub is_list_by_seqno
+ # but no error for scalar request of 1 when max 0 returned
+ if ( !$want_scalar ) {
+ push @{ $common_hash{$key}->{return_issues}->{o} }, $rcall_item;
+ }
+ }
-sub resync_lines_and_tokens {
+ # if want less than max...
+ else {
- my $self = shift;
+ # issue 'u': want array for an unmatched count less than max
+ # issue 's': want scalar but all return counts are >1
+ if ( defined($rK_return_count_hash) ) {
+ my $K_return = $rK_return_count_hash->{$return_count_wanted};
+ if ( !defined($K_return) ) {
+ if ($want_scalar) {
+ push @{ $common_hash{$key}->{return_issues}->{s} },
+ $rcall_item;
+ }
+ else {
+ push @{ $common_hash{$key}->{return_issues}->{u} },
+ $rcall_item;
+ }
+ }
+ }
+ else {
+ ## safety check, shouldn't happen
+ DEVEL_MODE && Fault("return count hash not defined\n");
+ }
+ }
+ }
- # Re-construct the arrays of tokens associated with the original input
- # lines since they have probably changed due to inserting and deleting
- # blanks and a few other tokens.
+ #------------------------------------
+ # Construct one-line warning messages
+ #------------------------------------
+ my @call_arg_warnings;
+ my @return_warnings;
+ my $max_shift_count_with_undercount = 0;
+ my $number_of_undercount_warnings = 0;
- # Return paremeters:
- # set severe_error = true if processing needs to terminate
- my $severe_error;
- my $rqw_lines = [];
+ # variables with information about a sub needed for warning output:
+ my (
- my $rLL = $self->[_rLL_];
- my $Klimit = $self->[_Klimit_];
- my $rlines = $self->[_rlines_];
- my @Krange_code_without_comments;
- my @Klast_valign_code;
+ $lno, $name,
+ $shift_count_min, $shift_count_max,
+ $min_arg_count, $max_arg_count,
+ $return_count_min, $return_count_max,
+ $want_count_min, $want_count_max,
+ );
- # This is the next token and its line index:
- my $Knext = 0;
- my $Kmax = defined($Klimit) ? $Klimit : -1;
+ my $push_call_arg_warning = sub {
+ my ( $letter, $note ) = @_;
+ my $shift_count = $shift_count_min;
+ if ( $shift_count_min ne '*' && $shift_count_min ne $shift_count_max ) {
+ $shift_count = "$shift_count_min-$shift_count_max";
+ }
+ my $output_line =
+"$lno:$letter:$name:$shift_count:$min_arg_count:$max_arg_count: $note\n";
+ push @call_arg_warnings,
+ {
+ line_number => $lno,
+ letter => $letter,
+ name => $name,
+ output_line => $output_line,
+ };
+ return;
+ }; ## end $push_call_arg_warning = sub
- # Verify that old line indexes are in still order. If this error occurs,
- # check locations where sub 'respace_tokens' creates new tokens (like
- # blank spaces). It must have set a bad old line index.
- if ( DEVEL_MODE && defined($Klimit) ) {
- my $iline = $rLL->[0]->[_LINE_INDEX_];
- foreach my $KK ( 1 .. $Klimit ) {
- my $iline_last = $iline;
- $iline = $rLL->[$KK]->[_LINE_INDEX_];
- if ( $iline < $iline_last ) {
- my $KK_m = $KK - 1;
- my $token_m = $rLL->[$KK_m]->[_TOKEN_];
- my $token = $rLL->[$KK]->[_TOKEN_];
- my $type_m = $rLL->[$KK_m]->[_TYPE_];
- my $type = $rLL->[$KK]->[_TYPE_];
- Fault(<<EOM);
-Line indexes out of order at index K=$KK:
-at KK-1 =$KK_m: old line=$iline_last, type='$type_m', token='$token_m'
-at KK =$KK: old line=$iline, type='$type', token='$token',
-EOM
- }
+ my $push_return_warning = sub {
+ my ( $letter, $note, $lno_return ) = @_;
+ my $return_count = $return_count_min;
+ if ( $return_count_min ne '*'
+ && $return_count_min ne $return_count_max )
+ {
+ $return_count = "$return_count_min-$return_count_max";
}
- }
+ my $output_line =
+"$lno_return:$letter:$name:$return_count:$want_count_min:$want_count_max: $note\n";
+ push @return_warnings,
+ {
+ line_number => $lno_return,
+ letter => $letter,
+ name => $name,
+ output_line => $output_line,
+ };
+ return;
+ }; ## end $push_return_warning = sub
+
+ #-------------------
+ # Loop over each sub
+ #-------------------
+ foreach my $key ( keys %common_hash ) {
+ my $item = $common_hash{$key};
+
+ # Check for mixed method/direct calls:
+ my $rsub_item = $item->{rsub_item};
+ next unless defined($rsub_item);
+
+ $name = $rsub_item->{name};
+ $lno = $rsub_item->{line_number};
+## my $rK_return_list = $item->{rK_return_list};
+ my $rself_calls = $item->{self_calls};
+ my $rdirect_calls = $item->{direct_calls};
+ my $num_self = defined($rself_calls) ? @{$rself_calls} : 0;
+ my $num_direct = defined($rdirect_calls) ? @{$rdirect_calls} : 0;
+
+## my $K_return_count_min = $rsub_item->{K_return_count_min};
+ my $K_return_count_max = $rsub_item->{K_return_count_max};
+
+ $shift_count_min = $rsub_item->{shift_count_min};
+ $shift_count_max = $rsub_item->{shift_count_max};
+ $return_count_min = $rsub_item->{return_count_min};
+ $return_count_max = $rsub_item->{return_count_max};
+ $min_arg_count = $item->{min_arg_count};
+ $max_arg_count = $item->{max_arg_count};
+ $want_count_min = $item->{want_count_min};
+ $want_count_max = $item->{want_count_max};
+
+ # change undefs to '*' for the output text
+ foreach (
- my $iline = -1;
- foreach my $line_of_tokens ( @{$rlines} ) {
- $iline++;
- my $line_type = $line_of_tokens->{_line_type};
- if ( $line_type eq 'CODE' ) {
+ $shift_count_min, $shift_count_max,
+ $return_count_min, $return_count_max,
+ $min_arg_count, $max_arg_count,
+ $want_count_min, $want_count_max,
+ )
+ {
+ $_ = '*' unless defined($_);
+ }
- # Get the old number of tokens on this line
- my $rK_range_old = $line_of_tokens->{_rK_range};
- my ( $Kfirst_old, $Klast_old ) = @{$rK_range_old};
- my $Kdiff_old = 0;
- if ( defined($Kfirst_old) ) {
- $Kdiff_old = $Klast_old - $Kfirst_old;
- }
-
- # Find the range of NEW K indexes for the line:
- # $Kfirst = index of first token on line
- # $Klast = index of last token on line
- my ( $Kfirst, $Klast );
-
- my $Knext_beg = $Knext; # this will be $Kfirst if we find tokens
-
- # Optimization: Although the actual K indexes may be completely
- # changed after respacing, the number of tokens on any given line
- # will often be nearly unchanged. So we will see if we can start
- # our search by guessing that the new line has the same number
- # of tokens as the old line.
- my $Knext_guess = $Knext + $Kdiff_old;
- if ( $Knext_guess > $Knext
- && $Knext_guess < $Kmax
- && $rLL->[$Knext_guess]->[_LINE_INDEX_] <= $iline )
- {
+ #-----------------------------------------------------------------
+ # Make a one-line message for each mismatch call issue of this sub
+ #-----------------------------------------------------------------
- # the guess is good, so we can start our search here
- $Knext = $Knext_guess + 1;
- }
+ my $rover_count = $item->{over_count};
+ my $runder_count = $item->{under_count};
+ my $num_over_count = defined($rover_count) ? @{$rover_count} : 0;
+ my $num_under_count = defined($runder_count) ? @{$runder_count} : 0;
- while ($Knext <= $Kmax
- && $rLL->[$Knext]->[_LINE_INDEX_] <= $iline )
- {
- $Knext++;
- }
+ #--------------------------------------------------
+ # issue 'a': subs with both self-> and direct calls
+ #--------------------------------------------------
+ if ( $num_self && $num_direct && $do_mismatched_call_type{'a'} ) {
- if ( $Knext > $Knext_beg ) {
+ my $letter = 'a';
+ my $lines_self_calls = stringify_line_range($rself_calls);
+ my $lines_direct_calls = stringify_line_range($rdirect_calls);
+ my $self_name = $rsub_item->{self_name};
+ if ( !defined($self_name) ) { $self_name = EMPTY_STRING }
+ my $ess1 = $num_self > 1 ? 's' : EMPTY_STRING;
+ my $ess2 = $num_direct > 1 ? 's' : EMPTY_STRING;
+ my $str = $self_name . '->call' . $ess1;
+ my $note =
+"$num_self $str($lines_self_calls) and $num_direct call$ess2($lines_direct_calls)";
+ $push_call_arg_warning->( $letter, $note );
+ }
- $Klast = $Knext - 1;
+ #---------------------------------------------------------
+ # Ignore calls to a sub which was not defined in this file
+ #---------------------------------------------------------
+ if ( !defined($rsub_item) ) {
+ next;
+ }
- # Delete any terminal blank token
- if ( $rLL->[$Klast]->[_TYPE_] eq 'b' ) { $Klast -= 1 }
+ #-------------------------------------------------------------------
+ # issue 'i': indeterminate. Could not determine a specific arg count
+ #-------------------------------------------------------------------
+ if ( $shift_count_min eq '*' ) {
+ my $letter = 'i';
+ if ( $do_mismatched_call_type{$letter} ) {
- if ( $Klast < $Knext_beg ) {
- $Klast = undef;
- }
- else {
+ # skip *:*:* (no disagreement - call counts also indeterminate)
+ next
+ if ( $shift_count_min eq $min_arg_count
+ && $shift_count_min eq $max_arg_count );
- $Kfirst = $Knext_beg;
+ my $note = $call_arg_issue_note{$letter};
+ $push_call_arg_warning->( $letter, $note );
+ }
+ }
- # Save ranges of non-comment code. This will be used by
- # sub keep_old_line_breaks.
- if ( $rLL->[$Kfirst]->[_TYPE_] ne '#' ) {
- push @Krange_code_without_comments, [ $Kfirst, $Klast ];
- }
+ # otherwise check call arg counts
+ else {
- # Only save ending K indexes of code types which are blank
- # or 'VER'. These will be used for a convergence check.
- # See related code in sub 'convey_batch_to_vertical_aligner'
- my $CODE_type = $line_of_tokens->{_code_type};
- if ( !$CODE_type
- || $CODE_type eq 'VER' )
- {
- push @Klast_valign_code, $Klast;
- }
- }
+ #---------------------
+ # issue 'o': overcount
+ #---------------------
+ if ( $num_over_count
+ && $do_mismatched_call_type{'o'}
+ && $shift_count_max >= $mismatched_arg_overcount_cutoff )
+ {
+ my $letter = 'o';
+ my $line_range = stringify_line_range($rover_count);
+ my $total = $num_direct + $num_self;
+ my $note = $call_arg_issue_note{$letter};
+ $note .=
+ $total > 1
+ ? " at $num_over_count of $total calls ($line_range)"
+ : " at $line_range";
+ $push_call_arg_warning->( $letter, $note );
}
- # It is only safe to trim the actual line text if the input
- # line had a terminal blank token. Otherwise, we may be
- # in a quote.
- if ( $line_of_tokens->{_ended_in_blank_token} ) {
- $line_of_tokens->{_line_text} =~ s/\s+$//;
- }
- $line_of_tokens->{_rK_range} = [ $Kfirst, $Klast ];
+ #----------------------
+ # issue 'u': undercount
+ #----------------------
+ if ($num_under_count) {
- # Deleting semicolons can create new empty code lines
- # which should be marked as blank
- if ( !defined($Kfirst) ) {
- my $CODE_type = $line_of_tokens->{_code_type};
- if ( !$CODE_type ) {
- $line_of_tokens->{_code_type} = 'BL';
+ if ( $shift_count_min > $max_shift_count_with_undercount ) {
+ $max_shift_count_with_undercount = $shift_count_min;
}
- }
- else {
- #---------------------------------------------------
- # save indexes of all lines with a 'q' at either end
- # for later use by sub find_multiline_qw
- #---------------------------------------------------
- if ( $rLL->[$Kfirst]->[_TYPE_] eq 'q'
- || $rLL->[$Klast]->[_TYPE_] eq 'q' )
+ # Skip the warning for small lists with undercount
+ if ( $do_mismatched_call_type{'u'}
+ && $shift_count_min >= $mismatched_arg_undercount_cutoff )
{
- push @{$rqw_lines}, $iline;
+ my $letter = 'u';
+ my $line_range = stringify_line_range($runder_count);
+ my $total = $num_direct + $num_self;
+
+ my $note = $call_arg_issue_note{$letter};
+ $note .=
+ $total > 1
+ ? " at $num_under_count of $total calls ($line_range)"
+ : " at $line_range";
+
+ $number_of_undercount_warnings++;
+ $push_call_arg_warning->( $letter, $note );
}
}
}
+
+ #-------------------------------------------------------------------
+ # Make a one-line message for each mismatch return issue of this sub
+ #-------------------------------------------------------------------
+ my $return_issues = $item->{return_issues};
+ if ($return_issues) {
+ foreach my $letter ( keys %return_issue_note ) {
+ next if ( !$do_mismatched_return_type{$letter} );
+ my $rissues = $return_issues->{$letter};
+ my $number = defined($rissues) ? @{$rissues} : 0;
+ next unless ($number);
+ my $line_range = stringify_line_range($rissues);
+ my $total = $num_direct + $num_self;
+
+ my $note = $return_issue_note{$letter};
+ $note .=
+ $total > 1
+ ? " at $number of $total calls ($line_range)"
+ : " at $line_range";
+
+ # The one-line message shows the line number of the return
+ # with the maximum count if there are returns. If no returns
+ # (types 'x' and 'y') it shows the first line of the sub ($lno).
+ my $lno_return =
+ defined($K_return_count_max)
+ ? $rLL->[$K_return_count_max]->[_LINE_INDEX_] + 1
+ : $lno;
+
+ $push_return_warning->( $letter, $note, $lno_return );
+ } ## end loop to save one line for mismatched returns
+ }
}
- # There shouldn't be any nodes beyond the last one. This routine is
- # relinking lines and tokens after the tokens have been respaced. A fault
- # here indicates some kind of bug has been introduced into the above loops.
- # There is not good way to keep going; we better stop here.
- if ( $Knext <= $Kmax ) {
- Fault_Warn(
- "unexpected tokens at end of file when reconstructing lines");
- $severe_error = 1;
- return ( $severe_error, $rqw_lines );
+ #-----------------------------------------------
+ # Make the sorted/filtered call arg issue report
+ #-----------------------------------------------
+ my $rcall_arg_warnings = sort_warnings( \@call_arg_warnings );
+ $rcall_arg_warnings = filter_excluded_names( $rcall_arg_warnings,
+ $ris_mismatched_call_excluded_name );
+ my $call_arg_warning_output = EMPTY_STRING;
+ my $call_arg_hint = EMPTY_STRING;
+ if ( @{$rcall_arg_warnings} ) {
+ my $header =
+ "Issue types are 'a'=arrow mismatch 'u'=undercount 'o'=overcount";
+ if ($is_dump) { $header .= " 'i'=indeterminate" }
+ $call_arg_warning_output = <<EOM;
+$header
+Line:Issue:Sub:#args:Min:Max: note
+EOM
+ foreach ( @{$rcall_arg_warnings} ) {
+ $call_arg_warning_output .= $_->{output_line};
+ }
+ if ( !$is_dump && $number_of_undercount_warnings ) {
+ my $wmauc_min = $max_shift_count_with_undercount + 1;
+ $call_arg_hint = <<EOM;
+Note: use -wmauc=$wmauc_min or greater to prevent undercount warnings in this file
+or put parentheses around default sub args and use -wmauc=0
+EOM
+ $call_arg_warning_output .= $call_arg_hint;
+ }
}
- $self->[_rKrange_code_without_comments_] = \@Krange_code_without_comments;
- # Setup the convergence test in the FileWriter based on line-ending indexes
- my $file_writer_object = $self->[_file_writer_object_];
- $file_writer_object->setup_convergence_test( \@Klast_valign_code );
+ #---------------------------------------------
+ # Make the sorted/filtered return issue report
+ #---------------------------------------------
+ my $rreturn_warnings = sort_warnings( \@return_warnings );
+ $rreturn_warnings = filter_excluded_names( $rreturn_warnings,
+ $ris_mismatched_return_excluded_name );
- return ( $severe_error, $rqw_lines );
+ my $return_warning_output = EMPTY_STRING;
+ if ( @{$rreturn_warnings} ) {
+ $return_warning_output = <<EOM;
+Issue types 'u'=under-want 'o'=over-want 'x','y'=no return 's'=scalar-array mix
+Line:Issue:Sub:#Returned:Min_wanted:Max_wanted: note
+EOM
+ foreach ( @{$rreturn_warnings} ) {
+ $return_warning_output .= $_->{output_line};
+ }
+ }
-} ## end sub resync_lines_and_tokens
+ return {
+ call_arg_warning_output => $call_arg_warning_output,
+ return_warning_output => $return_warning_output,
+ };
+} ## end sub cross_check_sub_calls
+
+sub sort_warnings {
+
+ my ($rwarnings) = @_;
+
+ # Given:
+ # $rwarnigns = ref to list of warning info hashes
+ # Return updated $rwarnings
+ # - Sorted by line number
+ if ( @{$rwarnings} ) {
+
+ # sort by line number
+ $rwarnings = [
+ sort {
+ $a->{line_number} <=> $b->{line_number}
+ || $a->{letter} cmp $b->{letter}
+ } @{$rwarnings}
+ ];
+ }
+ return $rwarnings;
+} ## end sub sort_warnings
+
+sub stringify_line_range {
+ my ($rcalls) = @_;
+
+ # Given:
+ # $rcalls = ref to list of call info
+ # Return:
+ # $string = single line of text with just the line range
+
+ my $string = EMPTY_STRING;
+ if ( $rcalls && @{$rcalls} ) {
+ my @sorted =
+ sort { $a->{line_number} <=> $b->{line_number} } @{$rcalls};
+ my $num = @sorted;
+ my $lno_beg = $sorted[0]->{line_number};
+ my $lno_end = $sorted[-1]->{line_number};
+ if ( $num == 1 ) {
+ $string = "line $lno_beg";
+ }
+ elsif ( $num == 2 ) {
+ $string = "lines $lno_beg,$lno_end";
+ }
+ else {
+ $string = "lines $lno_beg..$lno_end";
+ }
+ }
+ return $string;
+} ## end sub stringify_line_range
+
+sub initialize_warn_mismatched {
+
+ # a - mismatched arrow operator calls
+ # o - overcount
+ # u - undercount
+ $rwarn_mismatched_arg_types =
+ initialize_warn_hash( 'warn-mismatched-arg-types', 1, [qw( a o u )] );
+ $ris_warn_mismatched_arg_excluded_name =
+ make_excluded_name_hash('warn-mismatched-arg-exclusion-list');
+
+ # x - want array but no return seen
+ # o - want array with excess count
+ # u - want array with unmatched count
+ # y - want scalar but no return seen
+ # s - want scalar but only arrays with count > 1 returned
+ $rwarn_mismatched_return_types =
+ initialize_warn_hash( 'warn-mismatched-return-types',
+ 1, [qw( x o u y s )] );
+ $ris_warn_mismatched_return_excluded_name =
+ make_excluded_name_hash('warn-mismatched-return-exclusion-list');
+ return;
+} ## end sub initialize_warn_mismatched
+
+sub warn_mismatched {
+ my ($self) = @_;
+
+ # process both --warn-mismatched-args and --warn-mismatched-returns,
+ my $rhash = $self->cross_check_sub_calls();
+
+ my $wma_key = 'warn-mismatched-args';
+ if ( $rOpts->{$wma_key} ) {
+ my $output_lines = $rhash->{call_arg_warning_output};
+ if ($output_lines) {
+ chomp $output_lines;
+ warning(<<EOM);
+Begin scan for --$wma_key
+$output_lines
+End scan for --$wma_key
+EOM
+ }
+ }
+
+ my $wmr_key = 'warn-mismatched-returns';
+ if ( $rOpts->{$wmr_key} ) {
+ my $output_lines = $rhash->{return_warning_output};
+ if ($output_lines) {
+ chomp $output_lines;
+ warning(<<EOM);
+Begin scan for --$wmr_key
+$output_lines
+End scan for --$wmr_key
+EOM
+ }
+ }
+ return;
+} ## end sub warn_mismatched
+
+sub dump_mismatched_args {
+ my ($self) = @_;
+
+ # process a --dump-mismatched-args command
+ my $rhash = $self->cross_check_sub_calls();
+ my $output_string = $rhash->{call_arg_warning_output};
+ if ($output_string) {
+ my $input_stream_name = get_input_stream_name();
+ chomp $output_string;
+ print {*STDOUT} <<EOM;
+$input_stream_name: output for --dump-mismatched-args
+$output_string
+EOM
+ }
+ return;
+} ## end sub dump_mismatched_args
+
+sub dump_mismatched_returns {
+ my ($self) = @_;
+
+ # process a --dump-mismatched-returns command
+ my $rhash = $self->cross_check_sub_calls();
+ my $output_string = $rhash->{return_warning_output};
+ if ($output_string) {
+ my $input_stream_name = get_input_stream_name();
+ chomp $output_string;
+ print {*STDOUT} <<EOM;
+$input_stream_name: output for --dump-mismatched-returns
+$output_string
+EOM
+ }
+ return;
+} ## end sub dump_mismatched_returns
sub check_for_old_break {
my ( $self, $KK, $rkeep_break_hash, $rbreak_hash ) = @_;
if ( $flag eq 'b' ) { $match = $block_type }
elsif ( $flag eq 'B' ) { $match = !$block_type }
else {
- # unknown code - no match
+ ## unknown code - no match
+ DEVEL_MODE && Fault(<<EOM);
+unexpected code '$flag' for --keep-old-breakpoints: expecting 'b' or 'B'
+EOM
}
}
+ else {
+ # no match
+ }
}
if ($match) {
my $type = $rLL->[$KK]->[_TYPE_];
sub keep_old_line_breaks {
+ my ($self) = @_;
+
# Called once per file to find and mark any old line breaks which
# should be kept. We will be translating the input hashes into
# token indexes.
# = 2 make a soft break (keep building current batch)
# best for something like leading ->
- my ($self) = @_;
-
my $rLL = $self->[_rLL_];
my $rKrange_code_without_comments =
$self->[_rKrange_code_without_comments_];
# leading '->' use a value of 2 which causes a soft
# break rather than a hard break
if ( $type eq '->' ) {
+
+ # ignore -bom after an opening token ( a syntax error, b1475 )
+ my $Kp = $self->K_previous_nonblank($Kfirst);
+ next if ( !defined($Kp) );
+ next if ( $is_opening_type{ $rLL->[$Kp]->[_TYPE_] } );
+
+ # ignore -bom if this does not look like a method call; c426
+ my $Kn = $self->K_next_nonblank($Kfirst);
+ next if ( !defined($Kn) );
+ my $token_n = $rLL->[$Kn]->[_TYPE_];
+ next if ( $token_n eq '{' || $token_n eq '[' );
+
$rbreak_before_Kfirst->{$Kfirst} = 2;
}
# Fix for b1120: only for parens, not braces
elsif ( $token eq ')' ) {
my $Kn = $self->K_next_nonblank($Kfirst);
- next
- unless ( defined($Kn)
- && $Kn <= $Klast
- && $rLL->[$Kn]->[_TYPE_] eq '->' );
+ next if ( !defined($Kn) );
+ next if ( $Kn > $Klast );
+ next if ( $rLL->[$Kn]->[_TYPE_] ne '->' );
my $seqno = $rLL->[$Kfirst]->[_TYPE_SEQUENCE_];
- next unless ($seqno);
+ next if ( !$seqno );
# Note: in previous versions there was a fix here to avoid
# instability between conflicting -bom and -pvt or -pvtc flags.
# opens the opening paren when the closing paren opens.
# Relevant cases are b977, b1215, b1270, b1303
+ # ignore -bom if this does not look like a method call; c426
+ $Kn = $self->K_next_nonblank($Kn);
+ next if ( !defined($Kn) );
+ my $token_n = $rLL->[$Kn]->[_TYPE_];
+ next if ( $token_n eq '{' || $token_n eq '[' );
+
$rbreak_container->{$seqno} = 1;
}
+ else {
+ # not a special case
+ }
}
}
sub weld_containers {
+ my ($self) = @_;
+
# Called once per file to do any welding operations requested by --weld*
# flags.
- my ($self) = @_;
# This count is used to eliminate needless calls for weld checks elsewhere
$total_weld_count = 0;
$Kend = $rK_weld_right->{$Kstart};
my $Knext = $rK_weld_right->{$Kend};
while ( defined($Knext) ) {
+ if ( $Knext <= $Kend ) {
+ ## shouldn't happen: K should increase for right weld
+ DEVEL_MODE && Fault(<<EOM);
+Error: Knext=$Knext = rK_weld_right->{$Kend} is not increasing
+EOM
+ last;
+ }
$Kend = $Knext;
$Knext = $rK_weld_right->{$Kend};
- }
+ } ## end while ( defined($Knext) )
# Set weld values this chain
foreach ( $Kstart + 1 .. $Kend ) {
return;
} ## end sub weld_containers
-sub cumulative_length_before_K {
- my ( $self, $KK ) = @_;
- my $rLL = $self->[_rLL_];
- return ( $KK <= 0 ) ? 0 : $rLL->[ $KK - 1 ]->[_CUMULATIVE_LENGTH_];
-}
-
sub weld_cuddled_blocks {
+
my ($self) = @_;
# Called once per file to handle cuddled formatting
# loop over structure items to find cuddled pairs
my $level = 0;
- my $KNEXT = $self->[_K_first_seq_item_];
- while ( defined($KNEXT) ) {
- my $KK = $KNEXT;
- $KNEXT = $rLL->[$KNEXT]->[_KNEXT_SEQ_ITEM_];
+ foreach my $KK ( @{ $self->[_rK_sequenced_token_list_] } ) {
my $rtoken_vars = $rLL->[$KK];
my $type_sequence = $rtoken_vars->[_TYPE_SEQUENCE_];
if ( !$type_sequence ) {
# A fault here implies that an error was made in the little loop at
# the bottom of sub 'respace_tokens' which set the values of
- # _KNEXT_SEQ_ITEM_. Or an error has been introduced in the
- # loop control lines above.
+ # _rK_sequenced_token_list_. Or an error has been introduced in
+ # the loop control lines above.
Fault("sequence = $type_sequence not defined at K=$KK")
if (DEVEL_MODE);
next;
if ( $level < $last_level ) { $in_chain{$last_level} = undef }
elsif ( $level > $last_level ) { $in_chain{$level} = undef }
+ else {
+ # level unchanged
+ }
# We are only looking at code blocks
my $token = $rtoken_vars->[_TOKEN_];
if ( !$block_type ) {
# patch for unrecognized block types which may not be labeled
- my $Kp = $self->K_previous_nonblank($KK);
- while ( $Kp && $rLL->[$Kp]->[_TYPE_] eq '#' ) {
- $Kp = $self->K_previous_nonblank($Kp);
- }
+ my $Kp = $self->K_previous_code($KK);
next unless $Kp;
$block_type = $rLL->[$Kp]->[_TOKEN_];
}
else { $in_chain{$level} = undef }
}
}
+ else {
+ # not a curly brace
+ }
}
return;
} ## end sub weld_cuddled_blocks
sub find_nested_pairs {
- my $self = shift;
+
+ my ($self) = @_;
# This routine is called once per file to do preliminary work needed for
# the --weld-nested option. This information is also needed for adding
# semicolons.
+ # Returns:
+ # \@nested_pairs = ref to a list in which each item is a ref to
+ # to the sequence numbers of two nested containers:
+ # [ $seqno_inner, $seqno_outer ]
+
my $rLL = $self->[_rLL_];
return unless ( defined($rLL) && @{$rLL} );
my $Num = @{$rLL};
my $K_opening_container = $self->[_K_opening_container_];
my $K_closing_container = $self->[_K_closing_container_];
my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
# We define an array of pairs of nested containers
my @nested_pairs;
if ( $K_outer_closing < $Num
&& $rLL->[$K_outer_closing]->[_TYPE_] eq 'b' );
- next unless ( $K_outer_closing < $Num );
+ next if ( $K_outer_closing >= $Num );
my $outer_seqno = $rLL->[$K_outer_closing]->[_TYPE_SEQUENCE_];
- next unless ($outer_seqno);
+ next if ( !$outer_seqno );
my $token_outer_closing = $rLL->[$K_outer_closing]->[_TOKEN_];
- next unless ( $is_closing_token{$token_outer_closing} );
+ next if ( !$is_closing_token{$token_outer_closing} );
# Simple filter: No commas or semicolons in the outer container
my $rtype_count = $self->[_rtype_count_by_seqno_]->{$outer_seqno};
# Now we have to check the opening tokens.
my $K_outer_opening = $K_opening_container->{$outer_seqno};
my $K_inner_opening = $K_opening_container->{$inner_seqno};
- next unless defined($K_outer_opening) && defined($K_inner_opening);
+ next if ( !defined($K_outer_opening) );
+ next if ( !defined($K_inner_opening) );
my $inner_blocktype = $rblock_type_of_seqno->{$inner_seqno};
my $outer_blocktype = $rblock_type_of_seqno->{$outer_seqno};
# Verify that the inner opening token is the next container after the
# outer opening token.
- my $K_io_check = $rLL->[$K_outer_opening]->[_KNEXT_SEQ_ITEM_];
+ my $K_io_check = $rK_next_seqno_by_K->[$K_outer_opening];
next unless defined($K_io_check);
if ( $K_io_check != $K_inner_opening ) {
next unless defined($seqno_signature);
my $K_signature_closing = $K_closing_container->{$seqno_signature};
next unless defined($K_signature_closing);
- my $K_test = $rLL->[$K_signature_closing]->[_KNEXT_SEQ_ITEM_];
+ my $K_test = $rK_next_seqno_by_K->[$K_signature_closing];
next
unless ( defined($K_test) && $K_test == $K_inner_opening );
# Count the number of nonblank characters separating them.
# Note: the $nonblank_count includes the inner opening container
# but not the outer opening container, so it will be >= 1.
- if ( $K_diff < 0 ) { next } # Shouldn't happen
+ if ( $K_diff < 0 ) {
+
+ # Shouldn't happen
+ DEVEL_MODE
+ && Fault(
+"unexpected negative index diff=$K_diff = Kio-Koo =$K_inner_opening - $K_outer_opening"
+ );
+ next;
+ }
my $nonblank_count = 0;
my $type;
my $is_name;
next;
}
+ #------------------------------------
+ # Make the final list of nested pairs
+ #------------------------------------
+
# The weld routine expects the pairs in order in the form
# [$seqno_inner, $seqno_outer]
# And they must be in the same order as the inner closing tokens
sub match_paren_control_flag {
+ my ( $self, $seqno, $flag, ($rLL) ) = @_;
+
+ # Input parameters:
+ # $seqno = sequence number of the container (should be paren)
+ # $flag = the flag which defines what matches
+ # $rLL = an optional alternate token list needed for respace operations
+
# Decide if this paren is excluded by user request:
# undef matches no parens
# '*' matches all parens
# 'F' matches if 'f' does not.
# 'w' matches if either 'k' or 'f' match.
# 'W' matches if 'w' does not.
- my ( $self, $seqno, $flag, $rLL ) = @_;
- # Input parameters:
- # $seqno = sequence number of the container (should be paren)
- # $flag = the flag which defines what matches
- # $rLL = an optional alternate token list needed for respace operations
$rLL = $self->[_rLL_] unless ( defined($rLL) );
return 0 unless ( defined($flag) );
elsif ( $flag eq 'F' ) { $match = !$is_f }
elsif ( $flag eq 'w' ) { $match = $is_w }
elsif ( $flag eq 'W' ) { $match = !$is_w }
+ else {
+ ## no match
+ DEVEL_MODE && Fault(<<EOM);
+unexpected code '$flag' in sub match_paren_control_flag: expecting one of kKfFwW
+EOM
+ }
return $match;
} ## end sub match_paren_control_flag
sub is_excluded_weld {
- # decide if this weld is excluded by user request
my ( $self, $KK, $is_leading ) = @_;
+
+ # Decide if this weld is excluded by user request
+
+ # Given:
+ # $KK = index of this weld token
+ # $is_leading = true if this will the outer token of a weld
+
my $rLL = $self->[_rLL_];
my $rtoken_vars = $rLL->[$KK];
my $token = $rtoken_vars->[_TOKEN_];
@type_ok_after_bareword{@q} = (1) x scalar(@q);
# these types do not 'like' to be separated from a following paren
- @q = qw(w i q Q G C Z U);
- @{has_tight_paren}{@q} = (1) x scalar(@q);
+ @q = qw( w i q Q G C Z U );
+ @has_tight_paren{@q} = (1) x scalar(@q);
} ## end BEGIN
use constant DEBUG_WELD => 0;
sub setup_new_weld_measurements {
+ my ( $self, $Kouter_opening, $Kinner_opening ) = @_;
+
# Define quantities to check for excess line lengths when welded.
# Called by sub 'weld_nested_containers' and sub 'weld_nested_quotes'
- my ( $self, $Kouter_opening, $Kinner_opening ) = @_;
-
- # Given indexes of outer and inner opening containers to be welded:
- # $Kouter_opening, $Kinner_opening
+ # Given:
+ # ($Kouter_opening, $Kinner_opening) = indexes of outer and inner opening
+ # containers to be welded
# Returns these variables:
# $new_weld_ok = true (new weld ok) or false (do not start new weld)
# $starting_lentot = starting cumulative length
# $msg = diagnostic message for debugging
- my $rLL = $self->[_rLL_];
- my $rlines = $self->[_rlines_];
+ my $rLL = $self->[_rLL_];
+ my $rlines = $self->[_rlines_];
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
my $starting_level;
my $starting_ci;
my $iline_oo = $rLL->[$Kouter_opening]->[_LINE_INDEX_];
my $rK_range = $rlines->[$iline_oo]->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ my ( $Kfirst, $Klast_uu ) = @{$rK_range};
#-------------------------------------------------------------------------
# We now define a reference index, '$Kref', from which to start measuring
# the previous line in length calculations. This check added to
# fix case b1174 which had a '?' on the line
my $no_previous_seq_item = $Kref == $Kouter_opening
- || $rLL->[$Kref]->[_KNEXT_SEQ_ITEM_] == $Kouter_opening;
-
+ || $rK_next_seqno_by_K->[$Kref] == $Kouter_opening;
if ( $no_previous_seq_item
&& substr( $type_prev, 0, 1 ) eq '=' )
{
if ( $type_prev eq '=>' ) {
my $iline_prev = $rLL->[$Kprev]->[_LINE_INDEX_];
my $rK_range_prev = $rlines->[$iline_prev]->{_rK_range};
- my ( $Kfirst_prev, $Klast_prev ) = @{$rK_range_prev};
+ my ( $Kfirst_prev, $Klast_prev_uu ) = @{$rK_range_prev};
+ my $nb_count = 0;
foreach my $KK ( reverse( $Kfirst_prev .. $Kref - 1 ) ) {
next if ( $rLL->[$KK]->[_TYPE_] eq 'b' );
$Kref = $KK;
- last;
+
+ # Continue at type 'w' to get previous dash. Example:
+ # -classification => [ qw(
+ # This fixes b1502.
+ last if ( $nb_count || $rLL->[$KK]->[_TYPE_] ne 'w' );
+ $nb_count++;
}
}
}
}
+ else {
+ # do not need to backup
+ }
}
# STEP 3: Now look ahead for a ternary and, if found, use it.
# Also look for a ')' at the same level and, if found, use it.
# This fixes case b1224.
if ( $Kref < $Kouter_opening ) {
- my $Knext = $rLL->[$Kref]->[_KNEXT_SEQ_ITEM_];
- my $level_oo = $rLL->[$Kouter_opening]->[_LEVEL_];
- while ( $Knext < $Kouter_opening ) {
+ my $Knext = $rK_next_seqno_by_K->[$Kref];
+ my $level_oo = $rLL->[$Kouter_opening]->[_LEVEL_];
+ my $Knext_last = $Knext;
+ while ( $Knext && $Knext < $Kouter_opening ) {
if ( $rLL->[$Knext]->[_LEVEL_] == $level_oo ) {
if ( $is_ternary{ $rLL->[$Knext]->[_TYPE_] }
|| $rLL->[$Knext]->[_TOKEN_] eq ')' )
last;
}
}
- $Knext = $rLL->[$Knext]->[_KNEXT_SEQ_ITEM_];
+ $Knext = $rK_next_seqno_by_K->[$Knext];
+ if ( $Knext <= $Knext_last ) {
+ ## shouldn't happen: $rK_next_seqno_by_K is corrupted
+ DEVEL_MODE && Fault(<<EOM);
+Knext should not increase: Knext_last=$Knext_last >= Knext=$Knext
+EOM
+ last;
+ }
+ $Knext_last = $Knext;
+ } ## end while ( $Knext && $Knext ...)
+ }
+
+ # fix c1468 - do not measure from a leading opening block brace -
+ # which is not a one-line block
+ if ( $Kref < $Kouter_opening
+ && $Kref == $Kfirst
+ && $rLL->[$Kref]->[_TOKEN_] eq '{' )
+ {
+ my $seqno_ref = $rLL->[$Kref]->[_TYPE_SEQUENCE_];
+ if ($seqno_ref) {
+ my $block_type = $self->[_rblock_type_of_seqno_]->{$seqno_ref};
+ if ($block_type) {
+ my $Kref_c = $self->[_K_closing_container_]->{$seqno_ref};
+ my $ln_ref_o = $rLL->[$Kref]->[_LINE_INDEX_];
+ my $ln_ref_c = $rLL->[$Kref_c]->[_LINE_INDEX_];
+ if ( $ln_ref_c > $ln_ref_o ) {
+ $Kref = $self->K_next_nonblank($Kref);
+ }
+ }
}
}
my $type_prev = $rLL->[$Kprev]->[_TYPE_];
my $type_pp = 'b';
if ( $Kprev >= 0 ) { $type_pp = $rLL->[ $Kprev - 1 ]->[_TYPE_] }
- unless (
- $type_prev =~ /^[\,\.\;]/
- || $type_prev =~ /^[=\{\[\(\L]/
- && ( $type_pp eq 'b' || $type_pp eq '}' || $type_first eq 'k' )
- || $type_first =~ /^[=\,\.\;\{\[\(\L]/
- || $type_first eq '||'
- || (
- $type_first eq 'k'
- && ( $token_first eq 'if'
- || $token_first eq 'or' )
- )
- )
- {
+
+ my $is_good_location =
+
+ $type_prev =~ /^[\,\.\;]/
+ || ( $type_prev =~ /^[=\{\[\(\L]/
+ && ( $type_pp eq 'b' || $type_pp eq '}' || $type_first eq 'k' ) )
+ || $type_first =~ /^[=\,\.\;\{\[\(\L]/
+ || $type_first eq '||'
+ || (
+ $type_first eq 'k'
+ && ( $token_first eq 'if'
+ || $token_first eq 'or' )
+ );
+
+ if ( !$is_good_location ) {
$msg =
"Skipping weld: poor break with -lp and ci at type_first='$type_first' type_prev='$type_prev' type_pp=$type_pp\n";
$new_weld_ok = 0;
} ## end sub setup_new_weld_measurements
sub excess_line_length_for_Krange {
+
my ( $self, $Kfirst, $Klast ) = @_;
# returns $excess_length =
} ## end sub excess_line_length_for_Krange
sub weld_nested_containers {
+
my ($self) = @_;
# Called once per file for option '--weld-nested-containers'
my $rlines = $self->[_rlines_];
my $K_opening_container = $self->[_K_opening_container_];
my $K_closing_container = $self->[_K_closing_container_];
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
my $ris_asub_block = $self->[_ris_asub_block_];
my $rmax_vertical_tightness = $self->[_rmax_vertical_tightness_];
my $rnested_pairs = $self->find_nested_pairs();
# Return unless there are nested pairs to weld
- return unless defined($rnested_pairs) && @{$rnested_pairs};
+ return unless ( defined($rnested_pairs) && @{$rnested_pairs} );
# NOTE: It would be nice to apply RULE 5 right here by deleting unwanted
# pairs. But it isn't clear if this is possible because we don't know
my $iline_outer_opening = -1;
my $weld_count_this_start = 0;
+ my $weld_starts_in_block = 0;
# OLD: $single_line_tol added to fix cases b1180 b1181
# = $rOpts_continuation_indentation > $rOpts_indent_columns ? 1 : 0;
- # NEW: $single_line_tol=0; fixes b1212 and b1180-1181 work now
- my $single_line_tol = 0;
+ # NEW: $single_line_tol=0 fixes b1212; and b1180-1181 work ok now
+ # =1 for -vmll and -lp; fixes b1452, b1453, b1454
+ # NOTE: the combination -vmll and -lp can be unstable, especially when
+ # also combined with -wn. It may eventually be necessary to turn off -vmll
+ # if -lp is set. For now, this works. The value '1' is a minimum which
+ # works but can be increased if necessary.
+ my $single_line_tol =
+ $rOpts_variable_maximum_line_length && $rOpts_line_up_parentheses
+ ? 1
+ : 0;
my $multiline_tol = $single_line_tol + 1 +
max( $rOpts_indent_columns, $rOpts_continuation_indentation );
# Main loop over nested pairs...
# We are working from outermost to innermost pairs so that
# level changes will be complete when we arrive at the inner pairs.
- while ( my $item = pop( @{$rnested_pairs} ) ) {
+ while ( @{$rnested_pairs} ) {
+ my $item = pop @{$rnested_pairs};
my ( $inner_seqno, $outer_seqno ) = @{$item};
my $Kouter_opening = $K_opening_container->{$outer_seqno};
my $inner_level = $inner_opening->[_LEVEL_];
if ( $inner_level >= $high_stress_level ) { next }
+ # extra tolerance added under high stress to fix b1481
+ my $stress_tol = ( $high_stress_level - $inner_level <= 1 ) ? 1 : 0;
+
# Set flag saying if this pair starts a new weld
my $starting_new_weld = !( @welds && $outer_seqno == $welds[-1]->[0] );
foreach my $iline ( $iline_oo + 1 .. $iline_io ) {
my $rK_range = $rlines->[$iline]->{_rK_range};
next unless defined($rK_range);
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ my ( $Kfirst, $Klast_uu ) = @{$rK_range};
next unless defined($Kfirst);
if ( $rLL->[$Kfirst]->[_TYPE_] eq '->' ) {
$do_not_weld_rule = 7;
{
if (DEBUG_WELD) {
$Msg .= "RULE 0: Not welding due to sheared inner parens\n";
- print $Msg;
+ print {*STDOUT} $Msg;
}
next;
}
# Remember the line we are using as a reference
$iline_outer_opening = $iline_oo;
$weld_count_this_start = 0;
+ $weld_starts_in_block = 0;
( my $new_weld_ok, $maximum_text_length, $starting_lentot, my $msg )
= $self->setup_new_weld_measurements( $Kouter_opening,
|| $iline_ic != $iline_oc )
)
{
- if (DEBUG_WELD) { print $msg}
+ if (DEBUG_WELD) { print {*STDOUT} $msg }
next;
}
# Then do not weld if no other containers between inner
# opening and closing.
- my $Knext_seq_item = $inner_opening->[_KNEXT_SEQ_ITEM_];
+ my $Knext_seq_item = $rK_next_seqno_by_K->[$Kinner_opening];
if ( $Knext_seq_item == $Kinner_closing ) {
$do_not_weld_rule = 1;
}
$is_one_line_weld || $is_multiline_weld
? $single_line_tol
: $multiline_tol;
+ $tol += $stress_tol;
# By how many characters does this exceed the text window?
my $excess =
if (DEBUG_WELD) {
$Msg .= "Not welding due to RULE $do_not_weld_rule\n";
- print $Msg;
+ print {*STDOUT} $Msg;
}
# Normally, a broken pair should not decrease indentation of
$weld_count_this_start++;
if (DEBUG_WELD) {
$Msg .= "Starting new weld\n";
- print $Msg;
+ print {*STDOUT} $Msg;
}
push @welds, $item;
+ my $parent_seqno = $self->parent_seqno_by_K($Kouter_closing);
+ $weld_starts_in_block = $parent_seqno == SEQ_ROOT
+ || $rblock_type_of_seqno->{$parent_seqno};
+
$rK_weld_right->{$Kouter_opening} = $Kinner_opening;
$rK_weld_left->{$Kinner_opening} = $Kouter_opening;
$weld_count_this_start++;
if (DEBUG_WELD) {
$Msg .= "Extending current weld\n";
- print $Msg;
+ print {*STDOUT} $Msg;
}
unshift @{ $welds[-1] }, $inner_seqno;
$rK_weld_right->{$Kouter_opening} = $Kinner_opening;
$rLL->[$Kinner_opening]->[_CI_LEVEL_] =
$rLL->[$Kouter_opening]->[_CI_LEVEL_];
- # But do not copy the closing ci level ... it can give poor results
- ## $rLL->[$Kinner_closing]->[_CI_LEVEL_] =
- ## $rLL->[$Kouter_closing]->[_CI_LEVEL_];
+ # But only copy the closing ci level if the outer container is
+ # in a block; otherwise poor results can be produced.
+ if ($weld_starts_in_block) {
+ $rLL->[$Kinner_closing]->[_CI_LEVEL_] =
+ $rLL->[$Kouter_closing]->[_CI_LEVEL_];
+ }
}
- }
+ } ## end while ( @{$rnested_pairs})
return;
} ## end sub weld_nested_containers
sub weld_nested_quotes {
+ my $self = shift;
+
# Called once per file for option '--weld-nested-containers'. This
# does welding on qw quotes.
- my $self = shift;
-
# See if quotes are excluded from welding
my $rflags = $weld_nested_exclusion_rules{'q'};
return if ( defined($rflags) && defined( $rflags->[1] ) );
- my $rK_weld_left = $self->[_rK_weld_left_];
- my $rK_weld_right = $self->[_rK_weld_right_];
-
my $rLL = $self->[_rLL_];
return unless ( defined($rLL) && @{$rLL} );
my $Num = @{$rLL};
+ my $rK_weld_left = $self->[_rK_weld_left_];
+ my $rK_weld_right = $self->[_rK_weld_right_];
my $K_opening_container = $self->[_K_opening_container_];
my $K_closing_container = $self->[_K_closing_container_];
my $rlines = $self->[_rlines_];
return if ( $test_type ne $quote_type );
}
return 1;
- };
+ }; ## end $is_single_quote = sub
# Length tolerance - same as previously used for sub weld_nested
my $multiline_tol =
1 + max( $rOpts_indent_columns, $rOpts_continuation_indentation );
# look for single qw quotes nested in containers
- my $KNEXT = $self->[_K_first_seq_item_];
- while ( defined($KNEXT) ) {
- my $KK = $KNEXT;
- $KNEXT = $rLL->[$KNEXT]->[_KNEXT_SEQ_ITEM_];
- my $rtoken_vars = $rLL->[$KK];
- my $outer_seqno = $rtoken_vars->[_TYPE_SEQUENCE_];
- if ( !$outer_seqno ) {
- next if ( $KK == 0 ); # first token in file may not be container
+ foreach my $outer_seqno ( keys %{$K_opening_container} ) {
+ my $Kouter_opening = $K_opening_container->{$outer_seqno};
- # A fault here implies that an error was made in the little loop at
- # the bottom of sub 'respace_tokens' which set the values of
- # _KNEXT_SEQ_ITEM_. Or an error has been introduced in the
- # loop control lines above.
- Fault("sequence = $outer_seqno not defined at K=$KK")
- if (DEVEL_MODE);
- next;
+ # see if the next token is a quote of some type
+ my $Kn = $Kouter_opening + 1;
+ next if ( $Kn >= $Num - 1 );
+ my $next_type = $rLL->[$Kn]->[_TYPE_];
+ if ( $next_type eq 'b' ) {
+ $next_type = $rLL->[ ++$Kn ]->[_TYPE_];
}
- my $token = $rtoken_vars->[_TOKEN_];
- if ( $is_opening_token{$token} ) {
+ next if ( $next_type ne 'q' && $next_type ne 'Q' );
+ my $next_token = $rLL->[$Kn]->[_TOKEN_];
+ next if ( substr( $next_token, 0, 1 ) ne 'q' );
- # see if the next token is a quote of some type
- my $Kn = $KK + 1;
- $Kn += 1
- if ( $Kn < $Num && $rLL->[$Kn]->[_TYPE_] eq 'b' );
- next unless ( $Kn < $Num );
+ # The token before the closing container must also be a quote
+ my $Kouter_closing = $K_closing_container->{$outer_seqno};
+ my $Kinner_closing = $self->K_previous_nonblank($Kouter_closing);
+ next unless ( $rLL->[$Kinner_closing]->[_TYPE_] eq $next_type );
- my $next_token = $rLL->[$Kn]->[_TOKEN_];
- my $next_type = $rLL->[$Kn]->[_TYPE_];
- next
- unless ( ( $next_type eq 'q' || $next_type eq 'Q' )
- && substr( $next_token, 0, 1 ) eq 'q' );
-
- # The token before the closing container must also be a quote
- my $Kouter_closing = $K_closing_container->{$outer_seqno};
- my $Kinner_closing = $self->K_previous_nonblank($Kouter_closing);
- next unless $rLL->[$Kinner_closing]->[_TYPE_] eq $next_type;
-
- # This is an inner opening container
- my $Kinner_opening = $Kn;
-
- # Do not weld to single-line quotes. Nothing is gained, and it may
- # look bad.
- next if ( $Kinner_closing == $Kinner_opening );
-
- # Only weld to quotes delimited with container tokens. This is
- # because welding to arbitrary quote delimiters can produce code
- # which is less readable than without welding.
- my $closing_delimiter =
- substr( $rLL->[$Kinner_closing]->[_TOKEN_], -1, 1 );
- next
- unless ( $is_closing_token{$closing_delimiter}
- || $closing_delimiter eq '>' );
+ # This is an inner opening container
+ my $Kinner_opening = $Kn;
- # Now make sure that there is just a single quote in the container
- next
- unless (
- $is_single_quote->(
- $Kinner_opening + 1,
- $Kinner_closing - 1,
- $next_type
- )
- );
+ # Do not weld to single-line quotes. Nothing is gained, and it may
+ # look bad.
+ next if ( $Kinner_closing == $Kinner_opening );
- # OK: This is a candidate for welding
- my $Msg = EMPTY_STRING;
- my $do_not_weld;
-
- my $Kouter_opening = $K_opening_container->{$outer_seqno};
- my $iline_oo = $rLL->[$Kouter_opening]->[_LINE_INDEX_];
- my $iline_io = $rLL->[$Kinner_opening]->[_LINE_INDEX_];
- my $iline_oc = $rLL->[$Kouter_closing]->[_LINE_INDEX_];
- my $iline_ic = $rLL->[$Kinner_closing]->[_LINE_INDEX_];
- my $is_old_weld =
- ( $iline_oo == $iline_io && $iline_ic == $iline_oc );
-
- # Fix for case b1189. If quote is marked as type 'Q' then only weld
- # if the two closing tokens are on the same input line. Otherwise,
- # the closing line will be output earlier in the pipeline than
- # other CODE lines and welding will not actually occur. This will
- # leave a half-welded structure with potential formatting
- # instability. This might be fixed by adding a check for a weld on
- # a closing Q token and sending it down the normal channel, but it
- # would complicate the code and is potentially risky.
- next
- if (!$is_old_weld
- && $next_type eq 'Q'
- && $iline_ic != $iline_oc );
+ # RULE: Avoid welding under stress. This is an alternate b1502 fix.
+ my $inner_level = $rLL->[$Kinner_opening]->[_LEVEL_];
+ if ( $inner_level >= $high_stress_level ) { next }
- # If welded, the line must not exceed allowed line length
- ( my $ok_to_weld, $maximum_text_length, $starting_lentot, my $msg )
- = $self->setup_new_weld_measurements( $Kouter_opening,
- $Kinner_opening );
- if ( !$ok_to_weld ) {
- if (DEBUG_WELD) { print $msg}
- next;
- }
+ # Only weld to quotes delimited with container tokens. This is
+ # because welding to arbitrary quote delimiters can produce code
+ # which is less readable than without welding.
+ my $closing_delimiter =
+ substr( $rLL->[$Kinner_closing]->[_TOKEN_], -1, 1 );
+ next
+ unless ( $is_closing_token{$closing_delimiter}
+ || $closing_delimiter eq '>' );
- my $length =
- $rLL->[$Kinner_opening]->[_CUMULATIVE_LENGTH_] - $starting_lentot;
- my $excess = $length + $multiline_tol - $maximum_text_length;
+ # Now make sure that there is just a single quote in the container
+ next
+ unless (
+ $is_single_quote->(
+ $Kinner_opening + 1,
+ $Kinner_closing - 1,
+ $next_type
+ )
+ );
- my $excess_max = ( $is_old_weld ? $multiline_tol : 0 );
- if ( $excess >= $excess_max ) {
- $do_not_weld = 1;
- }
+ # OK: This is a candidate for welding
+ my $Msg = EMPTY_STRING;
+ my $do_not_weld;
+
+ my $iline_oo = $rLL->[$Kouter_opening]->[_LINE_INDEX_];
+ my $iline_io = $rLL->[$Kinner_opening]->[_LINE_INDEX_];
+ my $iline_oc = $rLL->[$Kouter_closing]->[_LINE_INDEX_];
+ my $iline_ic = $rLL->[$Kinner_closing]->[_LINE_INDEX_];
+ my $is_old_weld = ( $iline_oo == $iline_io && $iline_ic == $iline_oc );
+
+ # Fix for case b1189. If quote is marked as type 'Q' then only weld
+ # if the two closing tokens are on the same input line. Otherwise,
+ # the closing line will be output earlier in the pipeline than
+ # other CODE lines and welding will not actually occur. This will
+ # leave a half-welded structure with potential formatting
+ # instability. This might be fixed by adding a check for a weld on
+ # a closing Q token and sending it down the normal channel, but it
+ # would complicate the code and is potentially risky.
+ next
+ if (!$is_old_weld
+ && $next_type eq 'Q'
+ && $iline_ic != $iline_oc );
+
+ # If welded, the line must not exceed allowed line length
+ ( my $ok_to_weld, $maximum_text_length, $starting_lentot, my $msg ) =
+ $self->setup_new_weld_measurements( $Kouter_opening,
+ $Kinner_opening );
+ if ( !$ok_to_weld ) {
+ if (DEBUG_WELD) { print {*STDOUT} $msg }
+ next;
+ }
- if (DEBUG_WELD) {
- if ( !$is_old_weld ) { $is_old_weld = EMPTY_STRING }
- $Msg .=
+ my $length =
+ $rLL->[$Kinner_opening]->[_CUMULATIVE_LENGTH_] - $starting_lentot;
+ my $excess = $length + $multiline_tol - $maximum_text_length;
+
+ my $excess_max = ( $is_old_weld ? $multiline_tol : 0 );
+ if ( $excess >= $excess_max ) {
+ $do_not_weld = 1;
+ }
+
+ if (DEBUG_WELD) {
+ if ( !$is_old_weld ) { $is_old_weld = EMPTY_STRING }
+ $Msg .=
"excess=$excess>=$excess_max, multiline_tol=$multiline_tol, is_old_weld='$is_old_weld'\n";
- }
+ }
- # Check weld exclusion rules for outer container
- if ( !$do_not_weld ) {
- my $is_leading = !defined( $rK_weld_left->{$Kouter_opening} );
- if ( $self->is_excluded_weld( $KK, $is_leading ) ) {
- if (DEBUG_WELD) {
- $Msg .=
+ # Check weld exclusion rules for outer container
+ if ( !$do_not_weld ) {
+ my $is_leading = !defined( $rK_weld_left->{$Kouter_opening} );
+ if ( $self->is_excluded_weld( $Kouter_opening, $is_leading ) ) {
+ if (DEBUG_WELD) {
+ $Msg .=
"No qw weld due to weld exclusion rules for outer container\n";
- }
- $do_not_weld = 1;
}
+ $do_not_weld = 1;
}
+ }
- # Check the length of the last line (fixes case b1039)
- if ( !$do_not_weld ) {
- my $rK_range_ic = $rlines->[$iline_ic]->{_rK_range};
- my ( $Kfirst_ic, $Klast_ic ) = @{$rK_range_ic};
- my $excess_ic =
- $self->excess_line_length_for_Krange( $Kfirst_ic,
- $Kouter_closing );
-
- # Allow extra space for additional welded closing container(s)
- # and a space and comma or semicolon.
- # NOTE: weld len has not been computed yet. Use 2 spaces
- # for now, correct for a single weld. This estimate could
- # be made more accurate if necessary.
- my $weld_len =
- defined( $rK_weld_right->{$Kouter_closing} ) ? 2 : 0;
- if ( $excess_ic + $weld_len + 2 > 0 ) {
- if (DEBUG_WELD) {
- $Msg .=
-"No qw weld due to excess ending line length=$excess_ic + $weld_len + 2 > 0\n";
- }
- $do_not_weld = 1;
- }
- }
+ # Check the length of the last line (fixes case b1039)
+ if ( !$do_not_weld ) {
+ my $rK_range_ic = $rlines->[$iline_ic]->{_rK_range};
+ my ( $Kfirst_ic, $Klast_ic_uu ) = @{$rK_range_ic};
+ my $excess_ic =
+ $self->excess_line_length_for_Krange( $Kfirst_ic,
+ $Kouter_closing );
- if ($do_not_weld) {
+ # Allow extra space for additional welded closing container(s)
+ # and a space and comma or semicolon.
+ # NOTE: weld len has not been computed yet. Use 2 spaces
+ # for now, correct for a single weld. This estimate could
+ # be made more accurate if necessary.
+ my $weld_len = defined( $rK_weld_right->{$Kouter_closing} ) ? 2 : 0;
+ if ( $excess_ic + $weld_len + 2 > 0 ) {
if (DEBUG_WELD) {
- $Msg .= "Not Welding QW\n";
- print $Msg;
+ $Msg .=
+"No qw weld due to excess ending line length=$excess_ic + $weld_len + 2 > 0\n";
}
- next;
+ $do_not_weld = 1;
}
+ }
- # OK to weld
+ if ($do_not_weld) {
if (DEBUG_WELD) {
- $Msg .= "Welding QW\n";
- print $Msg;
+ $Msg .= "Not Welding QW\n";
+ print {*STDOUT} $Msg;
}
+ next;
+ }
- $rK_weld_right->{$Kouter_opening} = $Kinner_opening;
- $rK_weld_left->{$Kinner_opening} = $Kouter_opening;
+ # OK to weld
+ if (DEBUG_WELD) {
+ $Msg .= "Welding QW\n";
+ print {*STDOUT} $Msg;
+ }
- $rK_weld_right->{$Kinner_closing} = $Kouter_closing;
- $rK_weld_left->{$Kouter_closing} = $Kinner_closing;
+ $rK_weld_right->{$Kouter_opening} = $Kinner_opening;
+ $rK_weld_left->{$Kinner_opening} = $Kouter_opening;
- # Undo one indentation level if an extra level was added to this
- # multiline quote
- my $qw_seqno =
- $self->[_rstarting_multiline_qw_seqno_by_K_]->{$Kinner_opening};
- if ( $qw_seqno
- && $self->[_rmultiline_qw_has_extra_level_]->{$qw_seqno} )
- {
- foreach my $K ( $Kinner_opening + 1 .. $Kinner_closing - 1 ) {
- $rLL->[$K]->[_LEVEL_] -= 1;
- }
- $rLL->[$Kinner_opening]->[_CI_LEVEL_] = 0;
- $rLL->[$Kinner_closing]->[_CI_LEVEL_] = 0;
+ $rK_weld_right->{$Kinner_closing} = $Kouter_closing;
+ $rK_weld_left->{$Kouter_closing} = $Kinner_closing;
+
+ # Undo one indentation level if an extra level was added to this
+ # multiline quote
+ my $qw_seqno =
+ $self->[_rstarting_multiline_qw_seqno_by_K_]->{$Kinner_opening};
+ if ( $qw_seqno
+ && $self->[_rmultiline_qw_has_extra_level_]->{$qw_seqno} )
+ {
+ foreach my $K ( $Kinner_opening + 1 .. $Kinner_closing - 1 ) {
+ $rLL->[$K]->[_LEVEL_] -= 1;
}
+ $rLL->[$Kinner_opening]->[_CI_LEVEL_] = 0;
+ $rLL->[$Kinner_closing]->[_CI_LEVEL_] = 0;
+ }
- # undo CI for other welded quotes
- else {
+ # undo CI for other welded quotes
+ else {
- foreach my $K ( $Kinner_opening .. $Kinner_closing ) {
- $rLL->[$K]->[_CI_LEVEL_] = 0;
- }
+ foreach my $K ( $Kinner_opening .. $Kinner_closing ) {
+ $rLL->[$K]->[_CI_LEVEL_] = 0;
}
+ }
- # Change the level of a closing qw token to be that of the outer
- # containing token. This will allow -lp indentation to function
- # correctly in the vertical aligner.
- # Patch to fix c002: but not if it contains text
- if ( length( $rLL->[$Kinner_closing]->[_TOKEN_] ) == 1 ) {
- $rLL->[$Kinner_closing]->[_LEVEL_] =
- $rLL->[$Kouter_closing]->[_LEVEL_];
- }
+ # Change the level of a closing qw token to be that of the outer
+ # containing token. This will allow -lp indentation to function
+ # correctly in the vertical aligner.
+ # Patch to fix c002: but not if it contains text
+ if ( length( $rLL->[$Kinner_closing]->[_TOKEN_] ) == 1 ) {
+ $rLL->[$Kinner_closing]->[_LEVEL_] =
+ $rLL->[$Kouter_closing]->[_LEVEL_];
}
}
return;
my ( $self, $seqno ) = @_;
- # given a sequence number:
- # return true if it is welded either left or right
- # return false otherwise
+ # Given:
+ # $seqno = a sequence number:
+ # Return:
+ # true if it is welded either left or right
+ # false otherwise
return unless ( $total_weld_count && defined($seqno) );
my $KK_o = $self->[_K_opening_container_]->{$seqno};
return unless defined($KK_o);
sub mark_short_nested_blocks {
+ my $self = shift;
+
# This routine looks at the entire file and marks any short nested blocks
# which should not be broken. The results are stored in the hash
# $rshort_nested->{$type_sequence}
# The flag which is set here will be checked in two places:
# 'sub process_line_of_CODE' and 'sub starting_one_line_block'
- my $self = shift;
return if $rOpts->{'indent-only'};
my $rLL = $self->[_rLL_];
return unless ( $rOpts->{'one-line-block-nesting'} );
- my $K_opening_container = $self->[_K_opening_container_];
- my $K_closing_container = $self->[_K_closing_container_];
- my $rbreak_container = $self->[_rbreak_container_];
- my $ris_broken_container = $self->[_ris_broken_container_];
- my $rshort_nested = $self->[_rshort_nested_];
- my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $rbreak_container = $self->[_rbreak_container_];
+ my $ris_broken_container = $self->[_ris_broken_container_];
+ my $rshort_nested = $self->[_rshort_nested_];
+ my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
+ my $rK_sequenced_token_list = $self->[_rK_sequenced_token_list_];
# Variables needed for estimating line lengths
my $maximum_text_length;
my $length = $self->cumulative_length_before_K($K) - $starting_lentot;
my $excess_length = $length + $length_tol - $maximum_text_length;
return ($excess_length);
- };
+ }; ## end $excess_length_to_K = sub
# loop over all containers
my @open_block_stack;
my $iline = -1;
- my $KNEXT = $self->[_K_first_seq_item_];
- while ( defined($KNEXT) ) {
- my $KK = $KNEXT;
- $KNEXT = $rLL->[$KNEXT]->[_KNEXT_SEQ_ITEM_];
+ foreach my $KK ( @{$rK_sequenced_token_list} ) {
my $rtoken_vars = $rLL->[$KK];
my $type_sequence = $rtoken_vars->[_TYPE_SEQUENCE_];
if ( !$type_sequence ) {
# A fault here implies that an error was made in the little loop at
# the bottom of sub 'respace_tokens' which set the values of
- # _KNEXT_SEQ_ITEM_. Or an error has been introduced in the
+ # $rK_sequenced_token_list. Or an error has been introduced in the
# loop control lines above.
Fault("sequence = $type_sequence not defined at K=$KK")
if (DEVEL_MODE);
# We are only marking nested code blocks,
# so check for a previous block on the stack
- next unless ( @open_block_stack > 1 );
+ next if ( @open_block_stack <= 1 );
# Looks OK, mark this as a short nested block
$rshort_nested->{$type_sequence} = 1;
my ($self) = @_;
- # Called once per file to do special indentation adjustments.
- # These routines adjust levels either by changing _CI_LEVEL_ directly or
+ # Called once per file to define the levels to be used for computing
+ # actual indentation. These levels are initialized to be the structural
+ # levels and then are adjusted if necessary for special purposes.
+ # The adjustments are made either by changing _CI_LEVEL_ directly or
# by setting modified levels in the array $self->[_radjusted_levels_].
- # Initialize the adjusted levels. These will be the levels actually used
- # for computing indentation.
-
# NOTE: This routine is called after the weld routines, which may have
- # already adjusted _LEVEL_, so we are making adjustments on top of those
- # levels. It would be much nicer to have the weld routines also use this
- # adjustment, but that gets complicated when we combine -gnu -wn and have
- # some welded quotes.
- my $Klimit = $self->[_Klimit_];
- my $rLL = $self->[_rLL_];
- my $radjusted_levels = $self->[_radjusted_levels_];
+ # already adjusted the initial values of _LEVEL_, so we are making
+ # adjustments on top of those levels. It would be nicer to have the
+ # weld routines also use this adjustment, but that gets complicated
+ # when we combine -gnu -wn and also have some welded quotes.
+ my $rLL = $self->[_rLL_];
+ return unless ( @{$rLL} );
- return unless ( defined($Klimit) );
+ # Initialize the adjusted levels to be the structural levels
+ my @adjusted_levels = map { $_->[_LEVEL_] } @{$rLL};
+ $self->[_radjusted_levels_] = \@adjusted_levels;
- foreach my $KK ( 0 .. $Klimit ) {
- $radjusted_levels->[$KK] = $rLL->[$KK]->[_LEVEL_];
- }
+ my $min_starting_level = min(@adjusted_levels);
# First set adjusted levels for any non-indenting braces.
$self->do_non_indenting_braces();
$self->extended_ci()
if ($rOpts_extended_continuation_indentation);
- # Now clip any adjusted levels to be non-negative
- $self->clip_adjusted_levels();
+ # Now clip any starting or adjusted levels to be non-negative
+ $self->clip_adjusted_levels($min_starting_level);
return;
} ## end sub special_indentation_adjustments
sub clip_adjusted_levels {
+ my ( $self, $min_starting_level ) = @_;
+
# Replace any negative adjusted levels with zero.
- # Negative levels can occur in files with brace errors.
- my ($self) = @_;
+ # Negative levels can only occur in files with brace errors.
+ # Given:
+ # $min_starting_level = minimum (adjusted) level of the input stream
+
+ # Clip the original _LEVEL_ values to zero if necessary
+ my $rLL = $self->[_rLL_];
+ if ( $min_starting_level < 0 ) {
+ foreach my $item ( @{$rLL} ) {
+ if ( $item->[_LEVEL_] < 0 ) { $item->[_LEVEL_] = 0 }
+ }
+ }
+
+ # Clip the adjusted levels to zero if necessary
my $radjusted_levels = $self->[_radjusted_levels_];
- return unless defined($radjusted_levels) && @{$radjusted_levels};
+ return unless ( defined($radjusted_levels) && @{$radjusted_levels} );
my $min = min( @{$radjusted_levels} ); # fast check for min
if ( $min < 0 ) {
# slow loop, but rarely needed
foreach ( @{$radjusted_levels} ) { $_ = 0 if ( $_ < 0 ) }
}
+
return;
} ## end sub clip_adjusted_levels
sub do_non_indenting_braces {
+ my ($self) = @_;
+
# Called once per file to handle the --non-indenting-braces parameter.
# Remove indentation within marked braces if requested
- my ($self) = @_;
# Any non-indenting braces have been found by sub find_non_indenting_braces
# and are defined by the following hash:
my $KK = $K_opening_container->{$seqno};
my $line_of_tokens = $rlines->[$ix];
my $rK_range = $line_of_tokens->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ my ( $Kfirst_uu, $Klast ) = @{$rK_range};
$rspecial_side_comment_type->{$Klast} = 'NIB';
push @K_stack, [ $KK, 1 ];
my $Kc = $K_closing_container->{$seqno};
if ( !@whitespace_level_stack ) {
push @whitespace_level_stack, $level_abs;
}
- elsif ( $level_abs > $whitespace_last_level ) {
- $level = $whitespace_level_stack[-1] +
- ( $level_abs - $whitespace_last_level );
+ else {
+ if ( $level_abs > $whitespace_last_level ) {
+ $level = $whitespace_level_stack[-1] +
+ ( $level_abs - $whitespace_last_level );
- if (
- # 1 Try to break at a block brace
- (
- $level > $rOpts_whitespace_cycle
- && $last_nonblank_type eq '{'
- && $last_nonblank_token eq '{'
- )
+ if (
+ # 1 Try to break at a block brace
+ (
+ $level > $rOpts_whitespace_cycle
+ && $last_nonblank_type eq '{'
+ && $last_nonblank_token eq '{'
+ )
- # 2 Then either a brace or bracket
- || ( $level > $rOpts_whitespace_cycle + 1
- && $last_nonblank_token =~ /^[\{\[]$/ )
+ # 2 Then either a brace or bracket
+ || ( $level > $rOpts_whitespace_cycle + 1
+ && $last_nonblank_token =~ /^[\{\[]$/ )
- # 3 Then a paren too
- || $level > $rOpts_whitespace_cycle + 2
- )
- {
- $level = 1;
+ # 3 Then a paren too
+ || $level > $rOpts_whitespace_cycle + 2
+ )
+ {
+ $level = 1;
+ }
+ push @whitespace_level_stack, $level;
}
- push @whitespace_level_stack, $level;
}
$level = $whitespace_level_stack[-1];
$radjusted_levels->[$KK] = $level;
my ($self) = @_;
- # This routine is called once per batch to implement parameters
+ # This routine is called once per batch to implement parameters:
# --break-before-hash-brace=n and similar -bbx=n flags
# and their associated indentation flags:
# --break-before-hash-brace-and-indent and similar -bbxi=n
$length_tol *= 2;
}
+ #-------------------------------------------------------
+ # These arrays are used to mark the affected containers:
+ #-------------------------------------------------------
my $rbreak_before_container_by_seqno = {};
my $rwant_reduced_ci = {};
+
+ #------------------------------
+ # Main loop over all containers
+ #------------------------------
foreach my $seqno ( keys %{$K_opening_container} ) {
#----------------------------------------------------------------
# decide if a list is complex. For now it is only applied when -lp
# and -vmll are used, but eventually it may become the standard method.
# Fixes b1274, b1275, and others, including b1099.
+ # Update: case b1469 also had this type of problem; it had the
+ # combination ci>i and used -xci. This is just a band-aid; eventually
+ # it might be best if all cases use this logic, but that would change
+ # existing formatting.
if ( $break_option == 2 ) {
+ my $b1469 = $rOpts_continuation_indentation > $rOpts_indent_columns
+ && $rOpts_extended_continuation_indentation;
+
if ( $rOpts_line_up_parentheses
- || $rOpts_variable_maximum_line_length )
+ || $rOpts_variable_maximum_line_length
+ || $b1469 )
{
# Start with the basic definition of a complex list...
my $token_p = defined($Kp) ? $rLL->[$Kp]->[_TOKEN_] : 'b';
if ( $is_opening_token{$token_p} && $token_p ne '(' ) {
- my $Kc = $K_closing_container->{$seqno};
- my $Km = $self->K_previous_nonblank($Kc);
- my $token_m =
- defined($Km) ? $rLL->[$Km]->[_TOKEN_] : 'b';
+ my $Kc = $K_closing_container->{$seqno};
+ my $Km = $self->K_previous_nonblank($Kc);
+ my $token_m = 'b';
+ my $type_m = SPACE;
+ if ( defined($Km) ) {
+ $token_m = $rLL->[$Km]->[_TOKEN_];
+ $type_m = $rLL->[$Km]->[_TYPE_];
+ }
# ignore any optional ending comma
- if ( $token_m eq ',' ) {
+ if ( $type_m eq ',' ) {
$Km = $self->K_previous_nonblank($Km);
$token_m =
defined($Km) ? $rLL->[$Km]->[_TOKEN_] : 'b';
&& $rlec_count_by_seqno->{$seqno};
DEBUG_BBX
- && print STDOUT
+ && print {*STDOUT}
"BBX: Looking at seqno=$seqno, token = $token with option=$break_option\n";
# -bbx=1 = stable, try to follow input
my $iline = $rLL->[$KK]->[_LINE_INDEX_];
my $rK_range = $rlines->[$iline]->{_rK_range};
- my ( $Kfirst, $Klast ) = @{$rK_range};
+ my ( $Kfirst, $Klast_uu ) = @{$rK_range};
next unless ( $KK == $Kfirst );
}
if ( !$ok_to_break ) {
DEBUG_BBX
- && print STDOUT "Not breaking at seqno=$seqno: $Msg\n";
+ && print {*STDOUT} "Not breaking at seqno=$seqno: $Msg\n";
next;
}
DEBUG_BBX
- && print STDOUT "OK to break at seqno=$seqno: $Msg\n";
+ && print {*STDOUT} "OK to break at seqno=$seqno: $Msg\n";
# Patch: turn off -xci if -bbx=2 and -lp
# This fixes cases b1090 b1095 b1101 b1116 b1118 b1121 b1122
# ok to break
}
- # Shouldn't happen! Bad flag, but make behavior same as 3
+ # Bad flag, this shouldn't happen because of the integer range checks.
+ # Continue using behavior same as option 3 if not in DEVEL_MODE
else {
- # ok to break
+ DEVEL_MODE && Fault(<<EOM);
+Bad -bbx break option=$break_option for '$token': fix integer range checks.
+EOM
}
# Set a flag for actual implementation later in
# sub insert_breaks_before_list_opening_containers
$rbreak_before_container_by_seqno->{$seqno} = 1;
DEBUG_BBX
- && print STDOUT "BBX: ok to break at seqno=$seqno\n";
+ && print {*STDOUT} "BBX: ok to break at seqno=$seqno\n";
# -bbxi=0: Nothing more to do if the ci value remains unchanged
my $ci_flag = $container_indentation_options{$token};
next unless ($rtype_count);
my $fat_comma_count = $rtype_count->{'=>'};
DEBUG_BBX
- && print STDOUT "BBX: fat comma count=$fat_comma_count\n";
+ && print {*STDOUT} "BBX: fat comma count=$fat_comma_count\n";
if ( $fat_comma_count && $fat_comma_count >= 2 ) { $OK = 1 }
}
$self->cumulative_length_before_K($KK);
my $excess_length = $length - $maximum_text_length;
DEBUG_BBX
- && print STDOUT
+ && print {*STDOUT}
"BBX: excess=$excess_length: maximum_text_length=$maximum_text_length, length=$length, ci=$ci\n";
# OK if the net container definitely breaks on length
if ( $excess_length > $length_tol ) {
$OK = 1;
DEBUG_BBX
- && print STDOUT "BBX: excess_length=$excess_length\n";
+ && print {*STDOUT} "BBX: excess_length=$excess_length\n";
}
# Otherwise skip it
# Part 3: Looks OK: apply -bbx=n and any related -bbxi=n flag
#------------------------------------------------------------
- DEBUG_BBX && print STDOUT "BBX: OK to break\n";
+ DEBUG_BBX && print {*STDOUT} "BBX: OK to break\n";
# -bbhbi=n
# -bbsbi=n
# n=0 default indentation (usually one ci)
# n=1 outdent one ci
# n=2 indent one level (minus one ci)
- # n=3 indent one extra ci [This may be dropped]
# NOTE: We are adjusting indentation of the opening container. The
# closing container will normally follow the indentation of the opening
# unknown option
else {
# Shouldn't happen - leave ci unchanged
+ DEVEL_MODE && Fault(<<EOM);
+unexpected ci flag '$ci_flag' for -bbpi -bbsbi -bbhbi: expecting one of 0 1 2
+EOM
}
$rLL->[$KK]->[_CI_LEVEL_] = $ci if ( $ci >= 0 );
}
+ #------------------
+ # Store the results
+ #------------------
$self->[_rbreak_before_container_by_seqno_] =
$rbreak_before_container_by_seqno;
$self->[_rwant_reduced_ci_] = $rwant_reduced_ci;
+
return;
} ## end sub break_before_list_opening_containers
sub extended_ci {
+ my ($self) = @_;
+
# This routine implements the -xci (--extended-continuation-indentation)
# flag. We add CI to interior tokens of a container which itself has CI but
# only if a token does not already have CI.
# The operations to remove unwanted CI are done in sub 'undo_ci'.
- my ($self) = @_;
-
my $rLL = $self->[_rLL_];
return unless ( defined($rLL) && @{$rLL} );
my %available_space;
# Loop over all opening container tokens
- my $K_opening_container = $self->[_K_opening_container_];
- my $K_closing_container = $self->[_K_closing_container_];
+ my $K_opening_container = $self->[_K_opening_container_];
+ my $K_closing_container = $self->[_K_closing_container_];
+ my $rK_sequenced_token_list = $self->[_rK_sequenced_token_list_];
my @seqno_stack;
my $seqno_top;
- my $KLAST;
- my $KNEXT = $self->[_K_first_seq_item_];
+ my $K_last;
# The following variable can be used to allow a little extra space to
# avoid blinkers. A value $len_tol = 20 fixed the following
# be used to minimize the chance of a blinker.
my $len_tol = 0;
- while ( defined($KNEXT) ) {
+ foreach my $KK ( @{$rK_sequenced_token_list} ) {
# Fix all tokens up to the next sequence item if we are changing CI
if ($seqno_top) {
my $is_list = $ris_list_by_seqno->{$seqno_top};
my $space = $available_space{$seqno_top};
my $count = 0;
- foreach my $Kt ( $KLAST + 1 .. $KNEXT - 1 ) {
+ foreach my $Kt ( $K_last + 1 .. $KK - 1 ) {
next if ( $rLL->[$Kt]->[_CI_LEVEL_] );
$ris_seqno_controlling_ci->{$seqno_top} += $count;
}
- $KLAST = $KNEXT;
- my $KK = $KNEXT;
- $KNEXT = $rLL->[$KNEXT]->[_KNEXT_SEQ_ITEM_];
+ $K_last = $KK;
my $seqno = $rLL->[$KK]->[_TYPE_SEQUENCE_];
}
# If this does not have ci, update ci if necessary and continue looking
- elsif ( !$rLL->[$KK]->[_CI_LEVEL_] ) {
- if ($seqno_top) {
- $rLL->[$KK]->[_CI_LEVEL_] = 1;
- $rseqno_controlling_my_ci->{$KK} = $seqno_top;
- $ris_seqno_controlling_ci->{$seqno_top}++;
+ else {
+ if ( !$rLL->[$KK]->[_CI_LEVEL_] ) {
+ if ($seqno_top) {
+ $rLL->[$KK]->[_CI_LEVEL_] = 1;
+ $rseqno_controlling_my_ci->{$KK} = $seqno_top;
+ $ris_seqno_controlling_ci->{$seqno_top}++;
+ }
+ next;
}
- next;
}
# We are looking for opening container tokens with ci
# shouldn't happen
if ( $type ne 'q' ) {
- DEVEL_MODE && print STDERR <<EOM;
+ DEVEL_MODE && print {*STDERR} <<EOM;
STRANGE: started multiline qw at K=$K_start_multiline_qw but didn't see q qw at K=$Kfirst\n";
EOM
$K_start_multiline_qw = undef;
# works well but is currently only activated when the -xci flag is set.
# The reason is to avoid unexpected changes in formatting.
if ($rOpts_extended_continuation_indentation) {
- while ( my ( $qw_seqno_x, $rKrange ) =
- each %{$rKrange_multiline_qw_by_seqno} )
- {
+ foreach my $qw_seqno_x ( keys %{$rKrange_multiline_qw_by_seqno} ) {
+ my $rKrange = $rKrange_multiline_qw_by_seqno->{$qw_seqno_x};
my ( $Kbeg, $Kend ) = @{$rKrange};
# require isolated closing token
- my $token_end = $rLL->[$Kend]->[_TOKEN_];
- next
- unless ( length($token_end) == 1
- && ( $is_closing_token{$token_end} || $token_end eq '>' ) );
+ my $token_end = $rLL->[$Kend]->[_TOKEN_];
+ my $is_isolated_closing = length($token_end) == 1
+ && ( $is_closing_token{$token_end} || $token_end eq '>' );
+ next unless ($is_isolated_closing);
# require isolated opening token
my $token_beg = $rLL->[$Kbeg]->[_TOKEN_];
# multiline quotes
if ( $rOpts_line_up_parentheses && !$rOpts_extended_line_up_parentheses ) {
- while ( my ( $qw_seqno_x, $rKrange ) =
- each %{$rKrange_multiline_qw_by_seqno} )
- {
+ foreach my $qw_seqno_x ( keys %{$rKrange_multiline_qw_by_seqno} ) {
+ my $rKrange = $rKrange_multiline_qw_by_seqno->{$qw_seqno_x};
my ( $Kbeg, $Kend ) = @{$rKrange};
my $parent_seqno = $self->parent_seqno_by_K($Kend);
next unless ($parent_seqno);
unless ($is_tightly_contained);
# continue up the tree marking parent containers
- while (1) {
- $parent_seqno = $self->[_rparent_of_seqno_]->{$parent_seqno};
- last
- unless ( defined($parent_seqno)
- && $parent_seqno ne SEQ_ROOT );
- $ris_excluded_lp_container->{$parent_seqno} = 1;
- }
+ $self->mark_parent_containers( $parent_seqno,
+ $ris_excluded_lp_container );
}
}
} ## end BEGIN
sub is_fragile_block_type {
+
my ( $self, $block_type, $seqno ) = @_;
# Given:
} ## end sub xlp_collapsed_lengths_initialize
sub cumulative_length_to_comma {
+
my ( $self, $KK, $K_comma, $K_closing ) = @_;
# Given:
my $rlines = $self->[_rlines_];
my $rcollapsed_length_by_seqno = $self->[_rcollapsed_length_by_seqno_];
my $rtype_count_by_seqno = $self->[_rtype_count_by_seqno_];
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
my $K_start_multiline_qw;
my $level_start_multiline_qw = 0;
# Second rule: otherwise, look for an extra indentation
# level from the start and add one indentation level if
# found.
- elsif ( $level > $level_start_multiline_qw ) {
- $len += $rOpts_indent_columns;
+ else {
+ if ( $level > $level_start_multiline_qw ) {
+ $len += $rOpts_indent_columns;
+ }
}
if ( $len > $max_prong_len ) { $max_prong_len = $len }
}
}
+ # If starting in quote type Q we have no control over indentation
+ # so just ignore the length of this token (see git #138)
+ elsif ( $rLL->[$K_first]->[_TYPE_] eq 'Q' ) {
+ if ( $line_of_tokens->{_starting_in_quote} ) {
+ $K_begin_loop = $K_first + 1;
+ next if ( $K_begin_loop > $K_last );
+ }
+ }
+ else {
+ }
+
$K_start_multiline_qw = undef;
# Find the terminal token, before any side comment
&& !$has_comment )
{
my $seqno_end = $rLL->[$K_terminal]->[_TYPE_SEQUENCE_];
- my $Kc_test = $rLL->[$K_terminal]->[_KNEXT_SEQ_ITEM_];
+ my $Kc_test = $rK_next_seqno_by_K->[$K_terminal];
# We are looking for a short broken remnant on the next
# line; something like the third line here (b1408):
my ( $self, $iline, $K_begin_loop, $K_terminal, $K_last ) = @_;
+ # Loop over tokens on a line for sub xlp_collapse_lengths
+
+ # Given:
+ # $iline = line number in input stream
+ # ($K_begin_loop, $K_terminal) = token index range to scan
+ # $K_last = last token index on this line
+
my $rLL = $self->[_rLL_];
my $K_closing_container = $self->[_K_closing_container_];
#----------------------------------
# Loop over tokens on this line ...
#----------------------------------
+ my $type;
foreach my $KK ( $K_begin_loop .. $K_terminal ) {
- my $type = $rLL->[$KK]->[_TYPE_];
- next if ( $type eq 'b' );
+ next if ( ( $type = $rLL->[$KK]->[_TYPE_] ) eq 'b' );
#------------------------
# Handle sequenced tokens
$handle_len = $rOpts_indent_columns;
}
- elsif ( $is_handle_type{$last_nonblank_type} ) {
- $handle_len = $len;
- $handle_len += 1
- if ( $KK > 0 && $rLL->[ $KK - 1 ]->[_TYPE_] eq 'b' );
+ else {
+ if ( $is_handle_type{$last_nonblank_type} ) {
+ $handle_len = $len;
+ $handle_len += 1
+ if ( $KK > 0
+ && $rLL->[ $KK - 1 ]->[_TYPE_] eq 'b' );
+ }
}
# Set a flag if the 'Interrupted List Rule' will be applied
elsif ( $collapsed_len < MIN_BLOCK_LEN ) {
$collapsed_len = MIN_BLOCK_LEN;
}
+ else {
+ # none of these rules applies
+ }
}
# Store the result. Some extra space, '2', allows for
}
}
- # it is a ternary - no special processing for these yet
+ # it is a ternary or input file is unbalanced
else {
}
# Count lengths of things like 'xx => yy' as a single item
if ( $type eq '=>' ) {
$len += $token_length + 1;
+
+ # fix $len for -naws, issue b1457
+ if ( !$rOpts_add_whitespace ) {
+ if ( defined( $rLL->[ $KK + 1 ] )
+ && $rLL->[ $KK + 1 ]->[_TYPE_] ne 'b' )
+ {
+ $len -= 1;
+ }
+ }
+
if ( $len > $max_prong_len ) { $max_prong_len = $len }
}
elsif ( $last_nonblank_type eq '=>' ) {
sub is_excluded_lp {
- # Decide if this container is excluded by user request:
- # returns true if this token is excluded (i.e., may not use -lp)
- # returns false otherwise
+ my ( $self, $KK ) = @_;
- # The control hash can either describe:
- # what to exclude: $line_up_parentheses_control_is_lxpl = 1, or
- # what to include: $line_up_parentheses_control_is_lxpl = 0
+ # Decide if this container is excluded by user request
- # Input parameter:
+ # Given:
# $KK = index of the container opening token
+ # Return:
+ # true if this token is excluded (i.e., may not use -lp)
+ # false otherwise
+
+ # The control hash can either describe:
+ # what to exclude: $line_up_parentheses_control_is_lpxl = 1, or
+ # what to include: $line_up_parentheses_control_is_lpxl = 0
- my ( $self, $KK ) = @_;
my $rLL = $self->[_rLL_];
my $rtoken_vars = $rLL->[$KK];
my $token = $rtoken_vars->[_TOKEN_];
if ( !defined($rflags) ) {
# There is no entry for this container, so we are done
- return !$line_up_parentheses_control_is_lxpl;
+ return !$line_up_parentheses_control_is_lpxl;
}
my ( $flag1, $flag2 ) = @{$rflags};
elsif ( $flag1 eq 'F' ) { $match_flag1 = !$is_f }
elsif ( $flag1 eq 'w' ) { $match_flag1 = $is_w }
elsif ( $flag1 eq 'W' ) { $match_flag1 = !$is_w }
- ## else { no match found }
+ else {
+ ## no match
+ DEVEL_MODE && Fault(<<EOM);
+unexpected --lp-exclusion code '$flag1': expecting one of kKfFwW
+EOM
+ }
}
# See if we can exclude this based on the flag1 test...
- if ($line_up_parentheses_control_is_lxpl) {
+ if ($line_up_parentheses_control_is_lpxl) {
return 1 if ($match_flag1);
}
else {
sub process_all_lines {
+ my $self = shift;
+
#----------------------------------------------------------
# Main loop to format all lines of a file according to type
#----------------------------------------------------------
- my $self = shift;
my $rlines = $self->[_rlines_];
my $rOpts_keep_old_blank_lines = $rOpts->{'keep-old-blank-lines'};
my $file_writer_object = $self->[_file_writer_object_];
# Let logger see all non-blank lines of code. This is a slow
# operation so we avoid it if it is not going to be saved.
if ( $save_logfile && $logger_object ) {
+
+ # get updated indentation levels
+ my $rK_range = $line_of_tokens->{_rK_range};
+ my ( $Kfirst, $Klast_uu ) = @{$rK_range};
+ if ( defined($Kfirst) ) {
+ my $level_0 = $self->[_radjusted_levels_]->[$Kfirst];
+ my $ci_level_0 =
+ $self->[_rLL_]->[$Kfirst]->[_CI_LEVEL_];
+ $line_of_tokens->{_level_0} = $level_0;
+ $line_of_tokens->{_ci_level_0} = $ci_level_0;
+ }
+
$logger_object->black_box( $line_of_tokens,
- $vertical_aligner_object->get_output_line_number );
+ $vertical_aligner_object->get_output_line_number() );
}
}
# Handle Format Skipping (FS) and Verbatim (VB) Lines
if ( $CODE_type eq 'VB' || $CODE_type eq 'FS' ) {
- $self->write_unindented_line("$input_line");
+ $self->write_unindented_line($input_line);
$file_writer_object->reset_consecutive_blank_lines();
next;
}
# out of __END__ and __DATA__ sections, because
# the user may be using this section for any purpose whatsoever
if ( $rOpts->{'delete-pod'} ) { $skip_line = 1; }
- if ( $rOpts->{'trim-pod'} ) { $input_line =~ s/\s+$// }
+ if ( $rOpts->{'trim-pod'} ) {
+ chomp $input_line;
+ $input_line =~ s/\s+$//;
+ $input_line .= "\n";
+ }
if ( !$skip_line
&& !$in_format_skipping_section
&& $line_type eq 'POD_START'
elsif ( $line_type eq 'SKIP_END' ) {
$file_writer_object->reset_consecutive_blank_lines();
}
+ else {
+ ## some other line type
+ }
# write unindented non-code line
if ( !$skip_line ) {
# 0 = never (delete if exist)
# 1 = stable (keep unchanged)
# 2 = always (insert if missing)
- return $rhash_of_desires
- unless $rOpts_kgb_size_min > 0
+ my $ok = $rOpts_kgb_size_min > 0
&& ( $rOpts_kgb_before != 1
|| $rOpts_kgb_after != 1
|| $rOpts_kgb_inside
|| $rOpts_kgb_delete );
+ return $rhash_of_desires if ( !$ok );
+
+ # The following parameter combination can be unstable (c302):
+ if ( $rOpts_kgb_size_max
+ && $rOpts_kgb_after == INSERT
+ && $rOpts_kgb_before == DELETE )
+ {
+ # We reset kgb_before=STABLE to fix and continue
+ $rOpts_kgb_before = STABLE;
+ }
+
return;
} ## end sub kgb_initialize_options
} ## end sub kgb_initialize
sub kgb_insert_blank_after {
+
my ($i) = @_;
+
+ # Given:
+ # $i = line number after which blank is requested
+
$rhash_of_desires->{$i} = 1;
my $ip = $i + 1;
if ( defined( $rhash_of_desires->{$ip} )
return unless ($rOpts_kgb_inside);
# loop over sub-groups, index k
- push @subgroup, scalar @group;
+ push @subgroup, scalar(@group);
my $kbeg = 1;
my $kend = @subgroup - 1;
foreach my $k ( $kbeg .. $kend ) {
my $j_e = $subgroup[$k] - 1;
# index i is the actual line number of a keyword
- my ( $i_b, $tok_b, $count_b ) = @{ $group[$j_b] };
- my ( $i_e, $tok_e, $count_e ) = @{ $group[$j_e] };
+ my ( $i_b, $tok_b_uu, $count_b ) = @{ $group[$j_b] };
+ my ( $i_e_uu, $tok_e_uu, $count_e ) = @{ $group[$j_e] };
my $num = $count_e - $count_b + 1;
# This subgroup runs from line $ib to line $ie-1, but may contain
my $nog_b = my $nog_e = 1;
if ( @iblanks && !$rOpts_kgb_delete ) {
my $j_bb = $j_b + $num - 1;
- my ( $i_bb, $tok_bb, $count_bb ) = @{ $group[$j_bb] };
+ my ( $i_bb_uu, $tok_bb_uu, $count_bb ) = @{ $group[$j_bb] };
$nog_b = $count_bb - $count_b + 1 == $num;
my $j_ee = $j_e - ( $num - 1 );
- my ( $i_ee, $tok_ee, $count_ee ) = @{ $group[$j_ee] };
+ my ( $i_ee_uu, $tok_ee_uu, $count_ee ) = @{ $group[$j_ee] };
$nog_e = $count_e - $count_ee + 1 == $num;
}
if ( $nog_b && $k > $kbeg ) {
kgb_insert_blank_after( $i_b - 1 );
}
if ( $nog_e && $k < $kend ) {
- my ( $i_ep, $tok_ep, $count_ep ) =
+ my ( $i_ep, $tok_ep_uu, $count_ep_uu ) =
@{ $group[ $j_e + 1 ] };
kgb_insert_blank_after( $i_ep - 1 );
}
# delete line $i if it is blank
my $rlines = $self->[_rlines_];
- return unless ( $i >= 0 && $i < @{$rlines} );
+ return if ( $i < 0 || $i >= @{$rlines} );
return if ( $rlines->[$i]->{_line_type} ne 'CODE' );
my $code_type = $rlines->[$i]->{_code_type};
if ( $code_type eq 'BL' ) { $rhash_of_desires->{$i} = 2; }
# always remove unwanted trailing blank lines from our list
return unless (@iblanks);
- while ( my $ibl = pop(@iblanks) ) {
+ while (@iblanks) {
+ my $ibl = pop @iblanks;
if ( $ibl < $iend ) { push @iblanks, $ibl; last }
$iend = $ibl;
}
# now mark mark interior blank lines for deletion if requested
return unless ($rOpts_kgb_delete);
- while ( my $ibl = pop(@iblanks) ) { $rhash_of_desires->{$ibl} = 2 }
+ while (@iblanks) {
+ my $ibl = pop @iblanks;
+ $rhash_of_desires->{$ibl} = 2;
+ }
return;
} ## end sub kgb_delete_inner_blank_lines
sub kgb_end_group {
- # end a group of keywords
- my ( $self, $bad_ending ) = @_;
+ my ( $self, ($bad_ending) ) = @_;
+
+ # End a group of keywords
+
+ # Given:
+ # $bad_ending = false if group ends ok
+ # true if group ends badly (strange pattern)
+
if ( defined($ibeg) && $ibeg >= 0 ) {
# then handle sufficiently large groups
# Do not insert a blank after a comment
# (this could be subject to a flag in the future)
- if ( $code_type !~ /(BC|SBC|SBCX)/ ) {
+ if ( $code_type !~ /(?:BC|SBC|SBCX)/ ) {
if ( $rOpts_kgb_before == INSERT ) {
kgb_insert_blank_after( $ibeg - 1 );
elsif ( $rOpts_kgb_before == DELETE ) {
$self->kgb_delete_if_blank( $ibeg - 1 );
}
+ else {
+ ## == STABLE
+ }
}
}
elsif ( $rOpts_kgb_after == DELETE ) {
$self->kgb_delete_if_blank( $iend + 1 );
}
+ else {
+ ## == STABLE
+ }
}
}
}
sub kgb_find_container_end {
+ my ($self) = @_;
+
# If the keyword line is continued onto subsequent lines, find the
# closing token '$K_closing' so that we can easily skip past the
# contents of the container.
# -contents only one level deep
# -not welded
- my ($self) = @_;
-
# First check: skip if next line is not one deeper
my $Knext_nonblank = $self->K_next_nonblank($K_last);
return if ( !defined($Knext_nonblank) );
# Opening container must exist and be on this line
my $Ko = $self->[_K_opening_container_]->{$parent_seqno};
- return unless ( defined($Ko) && $Ko > $K_first && $Ko <= $K_last );
+ return if ( !defined($Ko) || $Ko <= $K_first || $Ko > $K_last );
# Verify that the closing container exists and is on a later line
my $Kc = $self->[_K_closing_container_]->{$parent_seqno};
- return unless ( defined($Kc) && $Kc > $K_last );
+ return if ( !defined($Kc) || $Kc <= $K_last );
# That's it
$K_closing = $Kc;
} ## end sub kgb_find_container_end
sub kgb_add_to_group {
+
my ( $self, $i, $token, $level ) = @_;
# End the previous group if we have reached the maximum
return;
} ## end sub kgb_add_to_group
- #---------------------
- # -kgb main subroutine
- #---------------------
-
sub keyword_group_scan {
+
my $self = shift;
# Called once per file to process --keyword-group-blanks-* parameters.
+ # This is the main subroutine for the -kgb option
# Task:
# Manipulate blank lines around keyword groups (kgb* flags)
if ( $CODE_type eq 'BL' ) {
if ( $ibeg >= 0 ) {
$iend = $i;
- push @{iblanks}, $i;
+ push @iblanks, $i;
# propagate current subgroup token
my $tok = $group[-1]->[1];
# Check for deviation from PATTERN 2, single statement:
elsif ( $level != $level_beg ) { $self->kgb_end_group(1) }
+ else {
+ ## no deviation
+ }
}
# Do not look for keywords in lists ( keyword 'my' can occur in
# lists, see case b760); fixed for c048.
- if ( $self->is_list_by_K($K_first) ) {
+ # Switch from ->is_list_by_K to !->is_in_block_by_K to fix b1464
+ if ( !$self->is_in_block_by_K($K_first) ) {
if ( $ibeg >= 0 ) { $iend = $i }
next;
}
my $line_of_tokens;
my $no_internal_newlines;
my $CODE_type;
+ my $current_line_starts_in_quote;
# range of K of tokens for the current line
my ( $K_first, $K_last );
# past stored nonblank tokens and flags
my (
- $K_last_nonblank_code, $looking_for_else,
+ $K_last_nonblank_code, $K_dangling_elsif,
$is_static_block_comment, $last_CODE_type,
$last_line_had_side_comment, $next_parent_seqno,
$next_slevel,
# Called once at the start of a new file
sub initialize_process_line_of_CODE {
$K_last_nonblank_code = undef;
- $looking_for_else = 0;
+ $K_dangling_elsif = 0;
$is_static_block_comment = 0;
$last_line_had_side_comment = 0;
$next_parent_seqno = SEQ_ROOT;
sub leading_spaces_to_go {
- # return the number of indentation spaces for a token in the output
- # stream
-
my ($ii) = @_;
+
+ # Return the number of indentation spaces for token at index $ii
+ # in the output stream
+
return 0 if ( $ii < 0 );
my $indentation = $leading_spaces_to_go[$ii];
return ref($indentation) ? $indentation->get_spaces() : $indentation;
sub create_one_line_block {
- # set index starting next one-line block
+ # note that this updates a closure variable
+ $index_start_one_line_block = shift;
+
+ # Set index starting next one-line block
+ # Given:
+ # $index_start_one_line_block = starting index in _to_go array
+ # undef => end current one-line block
+ #
# call with no args to delete the current one-line block
- ($index_start_one_line_block) = @_;
return;
} ## end sub create_one_line_block
}
return;
}
+ else {
+ ## all ok
+ }
}
# Do not start a batch with a blank token.
$next_parent_seqno = $rparent_of_seqno->{$seqno};
}
else {
- my $Kt = $rLL->[$Ktoken_vars]->[_KNEXT_SEQ_ITEM_];
+ my $Kt = $self->[_rK_next_seqno_by_K_]->[$Ktoken_vars];
if ( defined($Kt) ) {
- my $type_sequence_t = $rLL->[$Kt]->[_TYPE_SEQUENCE_];
- my $type_t = $rLL->[$Kt]->[_TYPE_];
# if next container token is closing, it is the parent seqno
- if ( $is_closing_type{$type_t} ) {
- $next_parent_seqno = $type_sequence_t;
+ if ( $is_closing_type{ $rLL->[$Kt]->[_TYPE_] } ) {
+ $next_parent_seqno = $rLL->[$Kt]->[_TYPE_SEQUENCE_];
}
# otherwise we want its parent container
else {
$next_parent_seqno =
- $rparent_of_seqno->{$type_sequence_t};
+ $rparent_of_seqno->{ $rLL->[$Kt]->[_TYPE_SEQUENCE_] };
}
}
}
$next_parent_seqno = SEQ_ROOT
- unless ( defined($next_parent_seqno) );
+ if ( !defined($next_parent_seqno) );
#--------------------------------------
# End coding from sub parent_seqno_by_K
$next_slevel = $rdepth_of_opening_seqno->[$next_parent_seqno] + 1;
}
- # Clip levels to zero if there are level errors in the file.
- # We had to wait until now for reasons explained in sub 'write_line'.
- if ( $level < 0 ) { $level = 0 }
-
# Safety check that length is defined. This is slow and should not be
# needed now, so just do it in DEVEL_MODE to check programming changes.
# Formerly needed for --indent-only, in which the entire set of tokens
## $block_type_to_go[$max_index_to_go] = EMPTY_STRING;
## $type_sequence_to_go[$max_index_to_go] = $seqno;
- # NOTE1: nobreak_to_go can be treated as a sparse array, but testing
+ # NOTE: nobreak_to_go can be treated as a sparse array, but testing
# showed that there is almost no efficiency gain because an if test
# would need to be added.
- # NOTE2: Eventually '$type_sequence_to_go' can be also handled as a
- # sparse array with undef's, but this will require extensive testing
- # because of its heavy use.
-
# We keep a running sum of token lengths from the start of this batch:
# summed_lengths_to_go[$i] = total length to just before token $i
# summed_lengths_to_go[$i+1] = total length to just after token $i
# Define the indentation that this token will have in two cases:
# Without CI = reduced_spaces_to_go
# With CI = leading_spaces_to_go
- if ( ( $Ktoken_vars == $K_first )
- && $line_of_tokens->{_starting_in_quote} )
+ $leading_spaces_to_go[$max_index_to_go] =
+ $reduced_spaces_to_go[$max_index_to_go] =
+ $rOpts_indent_columns * $radjusted_levels->[$Ktoken_vars];
+ if ($ci_level) {
+ $leading_spaces_to_go[$max_index_to_go] +=
+ $rOpts_continuation_indentation;
+ }
+
+ # Correct these values if we are starting in a continued quote
+ if ( $current_line_starts_in_quote
+ && $Ktoken_vars == $K_first )
{
# in a continued quote - correct value set above if first token
if ( $max_index_to_go == 0 ) { $starting_in_quote = 1 }
$leading_spaces_to_go[$max_index_to_go] = 0;
$reduced_spaces_to_go[$max_index_to_go] = 0;
}
- else {
- $leading_spaces_to_go[$max_index_to_go] =
- $reduced_spaces_to_go[$max_index_to_go] =
- $rOpts_indent_columns * $radjusted_levels->[$Ktoken_vars];
-
- $leading_spaces_to_go[$max_index_to_go] +=
- $rOpts_continuation_indentation * $ci_level
- if ($ci_level);
- }
DEBUG_STORE && do {
- my ( $a, $b, $c ) = caller();
- print STDOUT
-"STORE: from $a $c: storing token $token type $type lev=$level at $max_index_to_go\n";
+ my ( $pkg, $file_uu, $lno ) = caller();
+ print {*STDOUT}
+"STORE: from $pkg $lno: storing token $token type $type lev=$level at $max_index_to_go\n";
};
return;
} ## end sub store_token_to_go
sub flush_batch_of_CODE {
- # Finish and process the current batch.
- # This must be the only call to grind_batch_of_CODE()
my ($self) = @_;
- # If a batch has been started ...
- if ( $max_index_to_go >= 0 ) {
-
- # Create an array to hold variables for this batch
- my $this_batch = [];
+ # Finish and process the current batch.
+ # This must be the only call to grind_batch_of_CODE()
- $this_batch->[_starting_in_quote_] = 1 if ($starting_in_quote);
- $this_batch->[_ending_in_quote_] = 1 if ($ending_in_quote);
+ return if ( $max_index_to_go < 0 );
- if ( $CODE_type || $last_CODE_type ) {
- $this_batch->[_batch_CODE_type_] =
- $K_to_go[$max_index_to_go] >= $K_first
- ? $CODE_type
- : $last_CODE_type;
- }
+ # Create an array to hold variables for this batch
+ my $this_batch = $self->[_this_batch_] = [];
- $last_line_had_side_comment =
- ( $max_index_to_go > 0 && $types_to_go[$max_index_to_go] eq '#' );
+ $this_batch->[_starting_in_quote_] = 1 if ($starting_in_quote);
+ $this_batch->[_ending_in_quote_] = 1 if ($ending_in_quote);
- # The flag $is_static_block_comment applies to the line which just
- # arrived. So it only applies if we are outputting that line.
- if ( $is_static_block_comment && !$last_line_had_side_comment ) {
- $this_batch->[_is_static_block_comment_] =
- $K_to_go[0] == $K_first;
- }
+ if ( $CODE_type || $last_CODE_type ) {
+ $this_batch->[_batch_CODE_type_] =
+ $K_to_go[$max_index_to_go] >= $K_first
+ ? $CODE_type
+ : $last_CODE_type;
+ }
- $this_batch->[_ri_starting_one_line_block_] =
- $ri_starting_one_line_block;
+ $last_line_had_side_comment =
+ ( $max_index_to_go > 0 && $types_to_go[$max_index_to_go] eq '#' );
- $self->[_this_batch_] = $this_batch;
+ # The flag $is_static_block_comment applies to the line which just
+ # arrived. So it only applies if we are outputting that line.
+ if ( $is_static_block_comment && !$last_line_had_side_comment ) {
+ $this_batch->[_is_static_block_comment_] = $K_to_go[0] == $K_first;
+ }
- #-------------------
- # process this batch
- #-------------------
- $self->grind_batch_of_CODE();
+ $this_batch->[_ri_starting_one_line_block_] =
+ $ri_starting_one_line_block;
- # Done .. this batch is history
- $self->[_this_batch_] = undef;
+ #-------------------
+ # process this batch
+ #-------------------
+ $self->grind_batch_of_CODE();
- initialize_batch_variables();
- }
+ # Done .. this batch is history
+ initialize_batch_variables();
return;
} ## end sub flush_batch_of_CODE
return;
} ## end sub flush_vertical_aligner
- # flush is called to output any tokens in the pipeline, so that
- # an alternate source of lines can be written in the correct order
sub flush {
- my ( $self, $CODE_type_flush ) = @_;
+ my ( $self, ($CODE_type_flush) ) = @_;
- # end the current batch with 1 exception
+ # Sub flush is called to output any tokens in the pipeline, so that
+ # an alternate source of lines can be written in the correct order
+ # Optional parameter:
+ # $CODE_type_flush = 'BL' for flushing to insert a blank line
$index_start_one_line_block = undef;
- # Exception: if we are flushing within the code stream only to insert
- # blank line(s), then we can keep the batch intact at a weld. This
- # improves formatting of -ce. See test 'ce1.ce'
- if ( $CODE_type_flush && $CODE_type_flush eq 'BL' ) {
- $self->end_batch() if ( $max_index_to_go >= 0 );
- }
+ # End the current batch, if it holds any tokens, with 1 exception
+ if ( $max_index_to_go >= 0 ) {
+
+ # Exception: if we are flushing within the code stream only to
+ # insert blank line(s), then we can keep the batch intact at a
+ # weld. This improves formatting of -ce. See test 'ce1.ce'
+ if ( $CODE_type_flush && $CODE_type_flush eq 'BL' ) {
+ $self->end_batch();
+ }
- # otherwise, we have to shut things down completely.
- else { $self->flush_batch_of_CODE() }
+ # otherwise, we have to shut things down completely.
+ else { $self->flush_batch_of_CODE() }
+ }
$self->flush_vertical_aligner();
return;
$is_assignment_or_fat_comma{'=>'} = 1;
}
+ sub add_missing_else {
+
+ my ($self) = @_;
+
+ # Add a missing 'else' block.
+ # $K_dangling_elsif = index of closing elsif brace not followed by else
+
+ # Make sure everything looks okay
+ if ( !$K_dangling_elsif
+ || $K_dangling_elsif < $K_first
+ || $rLL->[$K_dangling_elsif]->[_TYPE_] ne '}' )
+ {
+ DEVEL_MODE && Fault("could not find closing elsif brace\n");
+ }
+
+ my $comment = $rOpts->{'add-missing-else-comment'};
+
+ # Safety check
+ if ( substr( $comment, 0, 1 ) ne '#' ) { $comment = '#' . $comment }
+
+ # Calculate indentation
+ my $level = $radjusted_levels->[$K_dangling_elsif];
+ my $spaces = SPACE x ( $level * $rOpts_indent_columns );
+ my $line1 = $spaces . "else {\n";
+ my $line3 = $spaces . "}\n";
+ $spaces .= SPACE x $rOpts_indent_columns;
+ my $line2 = $spaces . $comment . "\n";
+
+ # clear the output pipeline
+ $self->flush();
+
+ my $file_writer_object = $self->[_file_writer_object_];
+
+ $file_writer_object->write_code_line($line1);
+ $file_writer_object->write_code_line($line2);
+ $file_writer_object->write_code_line($line3);
+ return;
+ } ## end sub add_missing_else
+
sub process_line_of_CODE {
my ( $self, $my_line_of_tokens ) = @_;
}
( $K_first, $K_last ) = @{$rK_range};
- $last_CODE_type = $CODE_type;
- $CODE_type = $line_of_tokens->{_code_type};
+ $last_CODE_type = $CODE_type;
+ $CODE_type = $line_of_tokens->{_code_type};
+ $current_line_starts_in_quote = $line_of_tokens->{_starting_in_quote};
$rLL = $self->[_rLL_];
$radjusted_levels = $self->[_radjusted_levels_];
my ( $is_block_comment, $has_side_comment );
if ( $rLL->[$K_last]->[_TYPE_] eq '#' ) {
- if ( $K_last == $K_first ) { $is_block_comment = 1 }
- else { $has_side_comment = 1 }
+ if ( $K_last == $K_first && $CODE_type ne 'HSC' ) {
+ $is_block_comment = 1;
+ }
+ else { $has_side_comment = 1 }
}
my $is_static_block_comment_without_leading_space =
#--------------------------------------------
if ( $self->[_save_logfile_] ) {
+ my $guessed_indentation_level =
+ $line_of_tokens->{_guessed_indentation_level};
+
# Compare input/output indentation except for:
# - hanging side comments
# - continuation lines (have unknown leading blank space)
# - and lines which are quotes (they may have been outdented)
- my $guessed_indentation_level =
- $line_of_tokens->{_guessed_indentation_level};
+ my $exception =
+ $CODE_type eq 'HSC'
+ || $rtok_first->[_CI_LEVEL_] > 0
+ || $guessed_indentation_level == 0
+ && $rtok_first->[_TYPE_] eq 'Q';
- unless ( $CODE_type eq 'HSC'
- || $rtok_first->[_CI_LEVEL_] > 0
- || $guessed_indentation_level == 0
- && $rtok_first->[_TYPE_] eq 'Q' )
- {
+ if ( !$exception ) {
my $input_line_number = $line_of_tokens->{_line_number};
$self->compare_indentation_levels( $K_first,
$guessed_indentation_level, $input_line_number );
$self->flush();
my $line = $input_line;
- # Fix for rt #125506 Unexpected string formating
+ # Fix for rt #125506 Unexpected string formatting
# in which leading space of a terminal quote was removed
$line =~ s/\s+$//;
$line =~ s/^\s+// unless ( $line_of_tokens->{_starting_in_quote} );
# Handle all other lines ...
#---------------------------
- # If we just saw the end of an elsif block, write nag message
- # if we do not see another elseif or an else.
- if ($looking_for_else) {
-
- ## /^(elsif|else)$/
- if ( !$is_elsif_else{ $rLL->[$K_first_true]->[_TOKEN_] } ) {
- write_logfile_entry("(No else block)\n");
- }
- $looking_for_else = 0;
- }
+ $K_dangling_elsif = 0;
# This is a good place to kill incomplete one-line blocks
if ( $max_index_to_go >= 0 ) {
}
}
+ if ( $K_dangling_elsif && $rOpts_add_missing_else ) {
+ $self->add_missing_else();
+ }
+
return;
} ## end sub process_line_of_CODE
# } else ...
if ($rbrace_follower) {
my $token = $rtoken_vars->[_TOKEN_];
- unless ( $rbrace_follower->{$token} ) {
+ if ( !$rbrace_follower->{$token} ) {
$self->end_batch() if ( $max_index_to_go >= 0 );
}
$rbrace_follower = undef;
$is_closing_BLOCK = 1;
$nobreak_BEFORE_BLOCK = $no_internal_newlines;
}
+ else {
+ ## error - block should be enclosed by curly brace
+ DEVEL_MODE && Fault(<<EOM);
+block type '$block_type' has unexpected container type '$type'
+EOM
+ }
}
}
# if before last token ... do not allow breaks which would
# promote a side comment to a block comment
- elsif ($Ktoken_vars == $K_last - 1
+ if ( $Ktoken_vars == $K_last - 1
|| $Ktoken_vars == $K_last - 2
&& $rLL->[ $K_last - 1 ]->[_TYPE_] eq 'b' )
{
$self->store_token_to_go( $Ktoken_vars, $rtoken_vars );
$self->end_batch()
- unless (
- $no_internal_newlines
- || ( $rOpts_keep_interior_semicolons
- && $Ktoken_vars < $K_last )
- || ( $next_nonblank_token eq '}' )
+ if (
+ !$no_internal_newlines
+ && ( !$rOpts_keep_interior_semicolons
+ || $Ktoken_vars >= $K_last )
+ && ( $next_nonblank_token ne '}' )
);
}
{
# but only if allowed
- unless ($nobreak_BEFORE_BLOCK) {
+ if ( !$nobreak_BEFORE_BLOCK ) {
# since we already stored this token, we must unstore it
$self->unstore_token_to_go();
# Remember the type of token just before the
# opening brace. It would be more general to use
# a stack, but this will work for one-line blocks.
- $one_line_block_type =
- $types_to_go[$index_start_one_line_block];
+ # c1461 fix
+ my $Ko = $self->[_K_opening_container_]->{$type_sequence};
+ my $Kom = $self->K_previous_nonblank($Ko);
+ if ( defined($Kom) ) {
+ $one_line_block_type = $rLL->[$Kom]->[_TYPE_];
+ }
# we have to actually make it by removing tentative
# breaks that were set within it
&& $Kc - $Ktoken_vars <= 2 );
$rbrace_follower = undef if ($keep_going);
}
+ else {
+ ## not an exception
+ }
}
else {
$rbrace_follower = \%is_anon_sub_brace_follower;
# complain if not.
if ( $block_type eq 'elsif' ) {
- if ( $next_nonblank_token_type eq 'b' ) { # end of line?
- $looking_for_else = 1; # ok, check on next line
+ # more code on this line ? ( this is unusual )
+ if ( $next_nonblank_token_type ne 'b'
+ && $next_nonblank_token_type ne '#' )
+ {
+ # check for 'elsif' or 'else'
+ if ( !$is_elsif_else{$next_nonblank_token} ) {
+ write_logfile_entry("(No else block)\n");
+
+ # Note that we cannot add a missing else block
+ # in this case because more code follows the
+ # closing elsif brace on the same line.
+ if ( $rOpts_warn_missing_else && !DEVEL_MODE ) {
+ my $lno =
+ $rLL->[$Ktoken_vars]->[_LINE_INDEX_] + 1;
+ warning("$lno: No else block\n");
+ }
+ }
}
+
+ # no more code on this line, so check on next line
else {
- ## /^(elsif|else)$/
- if ( !$is_elsif_else{$next_nonblank_token} ) {
- write_logfile_entry("No else block :(\n");
+ my $K_next = $self->K_next_code($K_last);
+ if ( !defined($K_next)
+ || $rLL->[$K_next]->[_TYPE_] ne 'k'
+ || !$is_elsif_else{ $rLL->[$K_next]->[_TOKEN_] } )
+ {
+ $K_dangling_elsif = $Ktoken_vars;
+ write_logfile_entry("(No else block)\n");
+ if ( $rOpts_warn_missing_else && !DEVEL_MODE ) {
+ my $lno =
+ $rLL->[$Ktoken_vars]->[_LINE_INDEX_] + 1;
+ if ($rOpts_add_missing_else) {
+ warning(
+ "$lno: Adding missing else block\n");
+ }
+ else {
+ warning(
+"$lno: No else block (use -ame to add one)\n"
+ );
+ }
+ }
}
}
}
elsif ( ( $next_nonblank_token_type eq 'b' )
&& $rOpts_add_newlines )
{
- unless ($rbrace_follower) {
+ if ( !$rbrace_follower ) {
$self->end_batch()
- unless ( $no_internal_newlines
- || $max_index_to_go < 0 );
+ if (!$no_internal_newlines
+ && $max_index_to_go >= 0 );
}
}
elsif ($rbrace_follower) {
# 'do' block. This could also be used for other block
# types, but that would cause a significant change in
# existing formatting without much benefit.
- if ( $next_nonblank_token eq ','
+ if ( $next_nonblank_token_type eq ','
&& $Knnb eq $K_last
&& $block_type eq 'do'
&& $rOpts_add_newlines
}
else {
$self->end_batch()
- unless ( $no_internal_newlines
- || $max_index_to_go < 0 );
+ if (!$no_internal_newlines
+ && $max_index_to_go >= 0 );
}
$rbrace_follower = undef;
else {
$self->end_batch()
- unless ( $no_internal_newlines
- || $max_index_to_go < 0 );
+ if ( !$no_internal_newlines && $max_index_to_go >= 0 );
}
} ## end treatment of closing block token
} ## end closure process_line_of_CODE
-sub is_trailing_comma {
- my ( $self, $KK ) = @_;
-
- # Given:
- # $KK - index of a comma in token list
- # Return:
- # true if the comma at index $KK is a trailing comma
- # false if not
-
- my $rLL = $self->[_rLL_];
- my $type_KK = $rLL->[$KK]->[_TYPE_];
- if ( $type_KK ne ',' ) {
- DEVEL_MODE
- && Fault("Bad call: expected type ',' but received '$type_KK'\n");
- return;
- }
- my $Knnb = $self->K_next_nonblank($KK);
- if ( defined($Knnb) ) {
- my $type_sequence = $rLL->[$Knnb]->[_TYPE_SEQUENCE_];
- my $type_Knnb = $rLL->[$Knnb]->[_TYPE_];
- if ( $type_sequence && $is_closing_type{$type_Knnb} ) {
- return 1;
- }
- }
- return;
-} ## end sub is_trailing_comma
-
sub tight_paren_follows {
my ( $self, $K_to_go_0, $K_ic ) = @_;
- # Input parameters:
+ # Given:
# $K_to_go_0 = first token index K of this output batch (=K_to_go[0])
# $K_ic = index of the closing do brace (=K_to_go[$max_index_to_go])
- # Return parameter:
+ # Return:
# false if we want a break after the closing do brace
# true if we do not want a break after the closing do brace
sub starting_one_line_block {
+ my ( $self, $Kj, $K_last_nonblank, $K_last ) = @_;
+
# After seeing an opening curly brace, look for the closing brace and see
# if the entire block will fit on a line. This routine is not always right
# so a check is made later (at the closing brace) to make sure we really
# Calls 'create_one_line_block' if one-line block might be formed.
- # Also returns a flag '$too_long':
- # true = distance from opening keyword to OPENING brace exceeds
+ # Returns:
+ # $too_long:
+ # true = distance from opening keyword to OPENING brace exceeds
# the maximum line length.
- # false (simple return) => not too long
+ # false otherwise
# Note that this flag is for distance from the statement start to the
# OPENING brace, not the closing brace.
- my ( $self, $Kj, $K_last_nonblank, $K_last ) = @_;
-
my $rbreak_container = $self->[_rbreak_container_];
my $rshort_nested = $self->[_rshort_nested_];
my $rLL = $self->[_rLL_];
my $rblock_type_of_seqno = $self->[_rblock_type_of_seqno_];
# kill any current block - we can only go 1 deep
- create_one_line_block();
+ create_one_line_block(undef);
my $i_start = 0;
my $seqno = $type_sequence_to_go[$i_start];
return unless ($seqno);
my $K_opening = $K_opening_container->{$seqno};
- return unless defined($K_opening);
+ return if ( !defined($K_opening) );
my $i_opening = $i_start + ( $K_opening - $K_start );
# give up if not on this line
- return unless ( $i_opening >= 0 );
+ return if ( $i_opening < 0 );
$i_start = $i_opening;
# go back one token before the opening paren
my $lev = $levels_to_go[$i_start];
if ( $lev > $rLL->[$Kj]->[_LEVEL_] ) { return }
}
- }
+ # include a length of any preceding assignment token if we break before
+ # it (b1461)
+ elsif ( $i_start > 0 ) {
+ my $i_eq = $i_start - 1;
+ if ( $types_to_go[$i_eq] eq 'b' && $i_eq > 0 ) { $i_eq--; }
+ my $type_eq = $types_to_go[$i_eq];
+ if ( $is_assignment{$type_eq} && $want_break_before{$type_eq} ) {
+ $i_start = $i_eq;
+ }
+ }
+ else {
+ # $i_start is 0 - cannot back up
+ }
+ }
elsif ( $previous_nonblank_token eq ')' ) {
# For something like "if (xxx) {", the keyword "if" will be
if ( substr( $block_type, -2, 2 ) eq '()' ) {
$stripped_block_type = substr( $block_type, 0, -2 );
}
- unless ( $tokens_to_go[$i_start] eq $stripped_block_type ) {
+ if ( $tokens_to_go[$i_start] ne $stripped_block_type ) {
return;
}
}
if ( $types_to_go[$i_start] eq 'b' ) {
$i_start++;
}
- unless ( $tokens_to_go[$i_start] eq $block_type ) {
+ if ( $tokens_to_go[$i_start] ne $block_type ) {
return;
}
}
# ; # very long comment......
# so we do not need to include the length of the comment, which
# would break the block. Project 'bioperl' has coding like this.
- ## !~ /^(if|else|elsif|unless)$/
if ( !$is_if_unless_elsif_else{$block_type}
|| $K_last == $Ki_nonblank )
{
# essential but helps keep newer and older formatting the same.
$self->[_ris_short_broken_eval_block_]->{$type_sequence_j} = 1;
}
+ else {
+ # do not continue the search
+ }
}
return;
} ## end sub starting_one_line_block
sub compare_indentation_levels {
- # Check to see if output line tabbing agrees with input line
- # this can be very useful for debugging a script which has an extra
+ my ( $self, $K_first, $guessed_indentation_level, $line_number ) = @_;
+
+ # Check to see if output line leading space agrees with input line.
+ # This can be very useful for debugging a script which has an extra
# or missing brace.
+ # Given:
+ # $K_first = index of first token on the line
+ # $guessed_indentation_level = guess based on leading spaces in input
+ # $line_number = line number in input stream
- my ( $self, $K_first, $guessed_indentation_level, $line_number ) = @_;
return unless ( defined($K_first) );
my $rLL = $self->[_rLL_];
# - If a break is made after an opening token, then a break will
# also be made before the corresponding closing token.
- # Returns '$i_nonblank':
+ # Returns: $i_nonblank
# = index of the token after which the breakpoint was actually placed
# = undef if breakpoint was not set.
my $i_nonblank;
}
DEBUG_FORCE && do {
- my ( $a, $b, $c ) = caller();
+ my ( $pkg, $file_uu, $lno ) = caller();
my $msg =
-"FORCE $forced_breakpoint_count after call from $a $c with i=$i max=$max_index_to_go";
+"FORCE $forced_breakpoint_count after call from $pkg $lno with i=$i max=$max_index_to_go";
if ( !defined($i_nonblank) ) {
$i = EMPTY_STRING unless defined($i);
$msg .= " but could not set break after i='$i'\n";
" Also set closing breakpoint corresponding to this token\n";
}
}
- print STDOUT $msg;
+ print {*STDOUT} $msg;
};
return $i_nonblank;
sub set_forced_breakpoint_AFTER {
my ( $self, $i ) = @_;
- # This routine is only called by sub set_forced_breakpoint and
- # sub set_closing_breakpoint.
-
# Set a breakpoint AFTER the token at index $i in the _to_go arrays.
# Exceptions:
# - the index of the token after which the break was set, or
# - undef if no break was set
- return unless ( defined($i) && $i >= 0 );
+ # This routine is only called by sub set_forced_breakpoint and
+ # sub set_closing_breakpoint.
+
+ return if ( !defined($i) );
+ return if ( $i < 0 );
# Back up at a blank so we have a token to examine.
# This was added to fix for cases like b932 involving an '=' break.
my $token = $tokens_to_go[$i];
my $type = $types_to_go[$i];
+ # patch for phantom commas, used for -qwaf
+ if ( !$token && $type eq ',' ) { $token = ',' }
+
# For certain tokens, use user settings to decide if we break before or
# after it
if ( $break_before_or_after_token{$token}
# breaks are forced before 'if' and 'unless'
elsif ( $is_if_unless{$token} && $type eq 'k' ) { $i-- }
+ else {
+ # do not break before
+ }
if ( $i >= 0 && $i <= $max_index_to_go ) {
my $i_nonblank = ( $types_to_go[$i] ne 'b' ) ? $i : $i - 1;
if ( $i_start < 0 ) {
$i_start = 0;
- my ( $a, $b, $c ) = caller();
+ my ( $pkg, $file_uu, $lno ) = caller();
# Bad call, can only be due to a recent programming change.
Fault(
-"Program Bug: undo_forced_breakpoint_stack from $a $c has bad i=$i_start "
+"Program Bug: undo_forced_breakpoint_stack from $pkg $lno has bad i=$i_start "
) if (DEVEL_MODE);
return;
}
while ( $forced_breakpoint_undo_count > $i_start ) {
+ $forced_breakpoint_undo_count--;
my $i =
- $forced_breakpoint_undo_stack[ --$forced_breakpoint_undo_count ];
+ $forced_breakpoint_undo_stack[$forced_breakpoint_undo_count];
if ( $i >= 0 && $i <= $max_index_to_go ) {
$forced_breakpoint_to_go[$i] = 0;
$forced_breakpoint_count--;
DEBUG_UNDOBP && do {
- my ( $a, $b, $c ) = caller();
- print STDOUT
-"UNDOBP: undo forced_breakpoint i=$i $forced_breakpoint_undo_count from $a $c max=$max_index_to_go\n";
+ my ( $pkg, $file_uu, $lno ) = caller();
+ print {*STDOUT}
+"UNDOBP: undo forced_breakpoint i=$i $forced_breakpoint_undo_count from $pkg $lno max=$max_index_to_go\n";
};
}
# shouldn't happen, but not a critical error
else {
if (DEVEL_MODE) {
- my ( $a, $b, $c ) = caller();
+ my ( $pkg, $file_uu, $lno ) = caller();
Fault(<<EOM);
-Program Bug: undo_forced_breakpoint from $a $c has i=$i but max=$max_index_to_go
+Program Bug: undo_forced_breakpoint from $pkg $lno has i=$i but max=$max_index_to_go
EOM
}
}
- }
+ } ## end while ( $forced_breakpoint_undo_count...)
return;
} ## end sub undo_forced_breakpoint_stack
} ## end closure set_forced_breakpoint
sub set_closing_breakpoint {
- # set a breakpoint at a matching closing token
my ( $self, $i_break ) = @_;
+ # Set a breakpoint at a matching closing token
+ # Given:
+ # $i_break = index of the opening token
+
if ( defined( $mate_index_to_go[$i_break] ) ) {
# Don't reduce the '2' in the statement below.
sub check_grind_input {
+ my ($self) = @_;
+
# Check for valid input to sub grind_batch_of_CODE. An error here
# would most likely be due to an error in 'sub store_token_to_go'.
- my ($self) = @_;
+ # NOTE: This is only called when DEVEL_MODE is set.
# Be sure there are tokens in the batch
if ( $max_index_to_go < 0 ) {
# The tokens of the batch are in the '_to_go' arrays.
#-----------------------------------------------------------------
+ # $this_batch = ref to array of vars for this output batch
my $this_batch = $self->[_this_batch_];
+
$this_batch->[_peak_batch_size_] = $peak_batch_size;
$this_batch->[_batch_count_] = ++$batch_count;
$output_str = join EMPTY_STRING,
@tokens_to_go[ 0 .. $max_index_to_go ];
}
- print STDERR <<EOM;
+ print {*STDOUT} <<EOM;
grind got batch number $batch_count with $max_index_to_go tokens, last type '$type' tok='$token', text:
$output_str
EOM
return if ( $max_index_to_go < 0 );
+ my $lp_object_count_this_batch;
if ($rOpts_line_up_parentheses) {
- $self->set_lp_indentation();
+ $lp_object_count_this_batch = $self->set_lp_indentation();
}
- #--------------------------------------------------
- # Shortcut for block comments
- # Note that this shortcut does not work for -lp yet
- #--------------------------------------------------
- elsif ( !$max_index_to_go && $types_to_go[0] eq '#' ) {
- my $ibeg = 0;
- $this_batch->[_ri_first_] = [$ibeg];
- $this_batch->[_ri_last_] = [$ibeg];
+ #-----------------------------
+ # Shortcut for block comments.
+ #-----------------------------
- $self->convey_batch_to_vertical_aligner();
+ my $is_HSC;
- my $level = $levels_to_go[$ibeg];
- $self->[_last_line_leading_type_] = $types_to_go[$ibeg];
- $self->[_last_line_leading_level_] = $level;
- $nonblank_lines_at_depth[$level] = 1;
- return;
+ if ( !$max_index_to_go
+ && $types_to_go[0] eq '#' )
+ {
+
+ # But not for block comments with lp because they must use the lp
+ # corrector step below.
+ # And not for hanging side comments.
+
+ my $batch_CODE_type = $this_batch->[_batch_CODE_type_];
+ $is_HSC = $batch_CODE_type && $batch_CODE_type eq 'HSC';
+
+ if ( !$is_HSC
+ && !$lp_object_count_this_batch )
+ {
+ my $ibeg = 0;
+ $this_batch->[_ri_first_] = [$ibeg];
+ $this_batch->[_ri_last_] = [$ibeg];
+
+ $self->convey_batch_to_vertical_aligner();
+
+ my $level = $levels_to_go[$ibeg];
+ $self->[_last_line_leading_type_] = $types_to_go[$ibeg];
+ $self->[_last_line_leading_level_] = $level;
+ $nonblank_lines_at_depth[$level] = 1;
+ return;
+ }
}
#-------------
# gather info needed by sub break_long_lines
if ( $type_sequence_to_go[$i] ) {
- my $seqno = $type_sequence_to_go[$i];
- my $token = $tokens_to_go[$i];
# remember indexes of any tokens controlling xci
# in this batch. This list is needed by sub undo_ci.
+ my $seqno = $type_sequence_to_go[$i];
if ( $self->[_ris_seqno_controlling_ci_]->{$seqno} ) {
push @ix_seqno_controlling_ci, $i;
}
- if ( $is_opening_sequence_token{$token} ) {
+ if ( $is_opening_sequence_token{ $tokens_to_go[$i] } ) {
if ( $self->[_rbreak_container_]->{$seqno} ) {
$self->set_forced_breakpoint($i);
}
push @colon_list, $type;
}
}
- elsif ( $is_closing_sequence_token{$token} ) {
+ else { ## $is_closing_sequence_token{$token}
if ( $i > 0 && $self->[_rbreak_container_]->{$seqno} ) {
$self->set_forced_breakpoint( $i - 1 );
if ( $type eq ':' ) {
push @colon_list, $type;
}
- } ## end elsif ( $is_closing_sequence_token...)
+ }
} ## end if ($seqno)
elsif ( $type eq 'f' ) {
push @i_for_semicolon, $i;
}
+ else {
+ ## not a special type
+ }
} ## end for ( my $i = 0 ; $i <=...)
# quit if we see anything besides words, function, blanks
# at this level
elsif ( $types_to_go[$i] !~ /^[\(\)Gwib]$/ ) { last }
+ else {
+ ## keep going
+ }
}
}
$blank_count = 1;
}
}
+ else {
+ # no blank line needed
+ }
}
# blank lines before subs except declarations and one-liners
- elsif ( $leading_type eq 'i' ) {
+ # Fix for c250: added new type 'P', changed 'i' to 'S'
+ elsif ( $leading_type eq 'S' || $leading_type eq 'P' ) {
my $special_identifier =
$self->[_ris_special_identifier_token_]->{$leading_token};
if ($special_identifier) {
!$self->[_ris_short_broken_eval_block_]->{$pseqno}
);
}
+ else {
+ DEVEL_MODE && Fault(<<EOM);
+Found special identifier '$special_identifier', but expecting 'sub' or 'package'
+EOM
+ }
}
}
}
}
}
+ else {
+ # no blank line needed
+ }
if ($blank_count) {
# update blank line variables and count number of consecutive
# non-blank, non-comment lines at this level
if ( $leading_level == $last_line_leading_level
- && $leading_type ne '#'
+ && ( $leading_type ne '#' || $is_HSC )
&& defined( $nonblank_lines_at_depth[$leading_level] ) )
{
$nonblank_lines_at_depth[$leading_level]++;
$nonblank_lines_at_depth[$leading_level] = 1;
}
- $self->[_last_line_leading_type_] = $leading_type;
+ $self->[_last_line_leading_type_] = $is_HSC ? 'q' : $leading_type;
$self->[_last_line_leading_level_] = $leading_level;
#--------------------------
$rLL->[$Kend]->[_LINE_INDEX_] - $rLL->[$Kbeg]->[_LINE_INDEX_];
}
+ # Optional optimization: avoid calling break_lists for a single block
+ # brace. This is done by turning off the flag $is_unbalanced_batch.
+ elsif ($is_unbalanced_batch) {
+ my $block_type = $block_type_to_go[0];
+ if ( $block_type
+ && !$lp_object_count_this_batch
+ && $is_block_without_semicolon{$block_type} )
+ {
+ # opening blocks can skip break_lists call if no commas in
+ # container.
+ if ( $leading_type eq '{' ) {
+ my $seqno = $type_sequence_to_go[0];
+ my $rtype_count = $self->[_rtype_count_by_seqno_]->{$seqno};
+ if ($rtype_count) {
+ my $comma_count = $rtype_count->{','};
+ if ( !$comma_count ) {
+ $is_unbalanced_batch = 0;
+ }
+ }
+ }
+
+ # closing block braces can be skipped
+ else {
+ $is_unbalanced_batch = 0;
+ }
+
+ }
+ }
+ else {
+ # single balanced token
+ }
+
my $rbond_strength_bias = [];
if (
$is_long_line
$self->break_all_chain_tokens( $ri_first, $ri_last );
+ $self->break_method_call_chains( $ri_first, $ri_last )
+ if ( %{ $self->[_rseqno_arrow_call_chain_start_] } );
+
$self->break_equals( $ri_first, $ri_last )
if @{$ri_first} >= 3;
}
}
+ # The batch has now been divided into lines
+ $this_batch->[_ri_first_] = $ri_first;
+ $this_batch->[_ri_last_] = $ri_last;
+
#-------------------
# -lp corrector step
#-------------------
- if ($rOpts_line_up_parentheses) {
- $self->correct_lp_indentation( $ri_first, $ri_last );
+ if ($lp_object_count_this_batch) {
+ $self->correct_lp_indentation();
}
#--------------------
# ship this batch out
#--------------------
- $this_batch->[_ri_first_] = $ri_first;
- $this_batch->[_ri_last_] = $ri_last;
-
$self->convey_batch_to_vertical_aligner();
#-------------------------------------------------------------------
sub iprev_to_go {
my ($i) = @_;
+
+ # Given index $i of a token in the '_to_go' arrays, return
+ # the index of the previous nonblank token.
return $i - 1 > 0
&& $types_to_go[ $i - 1 ] eq 'b' ? $i - 2 : $i - 1;
- }
+ } ## end sub iprev_to_go
sub unmask_phantom_token {
my ( $self, $iend ) = @_;
sub save_opening_indentation {
- # This should be called after each batch of tokens is output. It
- # saves indentations of lines of all unmatched opening tokens.
+ my ( $self, $rindentation_list ) = @_;
+
+ # Save indentations of lines of all unmatched opening tokens.
# These will be used by sub get_opening_indentation.
+ # This should be called after each batch of tokens is output.
- my ( $self, $ri_first, $ri_last, $rindentation_list,
- $runmatched_opening_indexes )
- = @_;
+ # Given:
+ # $rindentation_list = ref to indentations for each line
+
+ # $runmatched_opening_indexes = list of indexes of unmatched tokens
+ my $this_batch = $self->[_this_batch_];
+ my $runmatched_opening_indexes =
+ $this_batch->[_runmatched_opening_indexes_];
$runmatched_opening_indexes = []
if ( !defined($runmatched_opening_indexes) );
# we need to save indentations of any unmatched opening tokens
# in this batch because we may need them in a subsequent batch.
- foreach ( @{$runmatched_opening_indexes}, @i_qw ) {
+ foreach my $i_opening ( @{$runmatched_opening_indexes}, @i_qw ) {
- my $seqno = $type_sequence_to_go[$_];
+ my $seqno = $type_sequence_to_go[$i_opening];
if ( !$seqno ) {
- if ( $seqno_qw_opening && $_ == $max_index_to_go ) {
+ if ( $seqno_qw_opening && $i_opening == $max_index_to_go ) {
$seqno = $seqno_qw_opening;
}
else {
}
$saved_opening_indentation{$seqno} = [
- lookup_opening_indentation(
- $_, $ri_first, $ri_last, $rindentation_list
+ $self->lookup_opening_indentation(
+ $i_opening, $rindentation_list
)
];
}
} ## end sub save_opening_indentation
sub get_saved_opening_indentation {
+
my ($seqno) = @_;
+
+ # Lookup indentation of an output line with a given container token
+
+ # Given:
+ # $seqno = sequence number of a container token
+
my ( $indent, $offset, $is_leading, $exists ) = ( 0, 0, 0, 0 );
if ($seqno) {
sub lookup_opening_indentation {
- # get the indentation of the line in the current output batch
+ my ( $self, $i_opening, $rindentation_list ) = @_;
+
+ # Get the indentation of the line in the current output batch
# which output a selected opening token
#
- # given:
+ # Given:
# $i_opening - index of an opening token in the current output batch
# whose line indentation we need
- # $ri_first - reference to list of the first index $i for each output
- # line in this batch
- # $ri_last - reference to list of the last index $i for each output line
- # in this batch
# $rindentation_list - reference to a list containing the indentation
# used for each line. (NOTE: the first slot in
# this list is the last returned line number, and this is
# followed by the list of indentations).
#
- # return
+ # Return
# -the indentation of the line which contained token $i_opening
# -and its offset (number of columns) from the start of the line
- my ( $i_opening, $ri_start, $ri_last, $rindentation_list ) = @_;
+ # $ri_first - reference to list of the first index $i for each output
+ # line in this batch
+ # $ri_last - reference to list of the last index $i for each output
+ # line in this batch
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
if ( !@{$ri_last} ) {
my $nline = $rindentation_list->[0]; # line number of previous lookup
# reset line location if necessary
- $nline = 0 if ( $i_opening < $ri_start->[$nline] );
+ $nline = 0 if ( $i_opening < $ri_first->[$nline] );
# find the correct line
- unless ( $i_opening > $ri_last->[-1] ) {
+ if ( $i_opening <= $ri_last->[-1] ) {
while ( $i_opening > $ri_last->[$nline] ) { $nline++; }
}
$rindentation_list->[0] =
$nline; # save line number to start looking next call
- my $ibeg = $ri_start->[$nline];
+ my $ibeg = $ri_first->[$nline];
my $offset = token_sequence_length( $ibeg, $i_opening ) - 1;
my $is_leading = ( $ibeg == $i_opening );
return ( $rindentation_list->[ $nline + 1 ], $offset, $is_leading );
sub terminal_type_i {
- # returns type of last token on this line (terminal token), as follows:
- # returns # for a full-line comment
- # returns ' ' for a blank line
- # otherwise returns final token type
-
my ( $ibeg, $iend ) = @_;
+ # Given:
+ # ($ibeg, $iend) = index range of the current output buffer line
+ # Returns type of last token on this line (terminal token), as follows:
+ # # for a full-line comment
+ # ' ' for a blank line
+ # otherwise returns final token type
+
# Start at the end and work backwards
my $i = $iend;
my $type_i = $types_to_go[$i];
if ( $type_i eq '#' ) {
$i--;
if ( $i < $ibeg ) {
- return wantarray ? ( $type_i, $ibeg ) : $type_i;
+ return $type_i;
}
$type_i = $types_to_go[$i];
}
if ( $type_i eq 'b' ) {
$i--;
if ( $i < $ibeg ) {
- return wantarray ? ( $type_i, $ibeg ) : $type_i;
+ return $type_i;
}
$type_i = $types_to_go[$i];
}
{
$type_i = 'b';
}
- return wantarray ? ( $type_i, $i ) : $type_i;
+ return $type_i;
} ## end sub terminal_type_i
sub pad_array_to_go {
+ my ($self) = @_;
+
# To simplify coding in break_lists and set_bond_strengths, it helps to
# create some extra blank tokens at the end of the arrays. We also add
# some undef's to help guard against using invalid data.
- my ($self) = @_;
$K_to_go[ $max_index_to_go + 1 ] = undef;
$tokens_to_go[ $max_index_to_go + 1 ] = EMPTY_STRING;
$tokens_to_go[ $max_index_to_go + 2 ] = EMPTY_STRING;
# Nesting depths are set to be >=0 in sub write_line, so it should
# not be possible to get here unless the code has a bracing error
# which leaves a closing brace with zero nesting depth.
- unless ( get_saw_brace_error() ) {
+ if ( !get_saw_brace_error() ) {
if (DEVEL_MODE) {
Fault(<<EOM);
Program bug in pad_array_to_go: hit nesting error which should have been caught
elsif ( $is_opening_type{ $types_to_go[$max_index_to_go] } ) {
$nesting_depth_to_go[ $max_index_to_go + 1 ] += 1;
}
+ else {
+ ## must be ? or :
+ }
return;
} ## end sub pad_array_to_go
+sub break_method_call_chains {
+
+ my ( $self, $ri_left, $ri_right ) = @_;
+
+ # If there is a break at any member of a method call chain, break
+ # at each method call in the chain (all or none logic). git #171.
+
+ # Given:
+ # $ri_first - reference to list of the first index $i for each output
+ # line in this batch
+ # $ri_last - reference to list of the last index $i for each output line
+ # in this batch
+
+ return unless ( %{ $self->[_rseqno_arrow_call_chain_start_] } );
+
+ # Look for '->' breakpoints
+ my @i_arrow_breaks;
+ my $rlist = !$want_break_before{'->'} ? $ri_right : $ri_left;
+ foreach my $ii ( @{$rlist} ) {
+ if ( $types_to_go[$ii] eq '->' ) { push @i_arrow_breaks, $ii }
+ }
+ return unless (@i_arrow_breaks);
+
+ # See if these are part of a call chain
+ my @insert_list;
+ my %is_end_i;
+ @is_end_i{ @{$ri_left} } = (1) x scalar( @{$ri_left} );
+ @is_end_i{ @{$ri_right} } = (1) x scalar( @{$ri_right} );
+ my $one = !$want_break_before{'->'} ? 0 : 1;
+ foreach my $ii (@i_arrow_breaks) {
+
+ my $ip = iprev_to_go($ii);
+ next if ( $ip < 0 || $tokens_to_go[$ip] ne ')' );
+ my $seqno = $type_sequence_to_go[$ip];
+ my $seqno_start = $self->[_rseqno_arrow_call_chain_start_]->{$seqno};
+ next unless ($seqno_start);
+
+ # Found a call chain...
+ my @Klist = @{ $self->[_rarrow_call_chain_]->{$seqno_start} };
+ my $Kref = $K_to_go[0];
+ foreach my $KK (@Klist) {
+
+ # Add missing breaks
+ my $i_K = $KK - $Kref;
+ next if ( $i_K <= 0 || $i_K >= $max_index_to_go );
+ next if ( $is_end_i{$i_K} );
+ if ( $K_to_go[$i_K] != $KK ) {
+ ## shouldn't happen due to previous checks on i vs K
+ DEVEL_MODE && Fault(<<EOM);
+ unexpected array offset error i=$i_K K=$KK Kref= $Kref
+EOM
+ next;
+ }
+ push @insert_list, $i_K - $one;
+ }
+ }
+
+ # Insert any new break points
+ if (@insert_list) {
+ $self->insert_additional_breaks( \@insert_list, $ri_left, $ri_right );
+ }
+ return;
+} ## end sub break_method_call_chains
+
sub break_all_chain_tokens {
- # scan the current breakpoints looking for breaks at certain "chain
+ my ( $self, $ri_left, $ri_right ) = @_;
+
+ # Scan the current breakpoints looking for breaks at certain "chain
# operators" (. : && || + etc) which often occur repeatedly in a long
# statement. If we see a break at any one, break at all similar tokens
# within the same container.
- #
- my ( $self, $ri_left, $ri_right ) = @_;
+
+ # Given:
+ # $ri_first - reference to list of the first index $i for each output
+ # line in this batch
+ # $ri_last - reference to list of the last index $i for each output line
+ # in this batch
my %saw_chain_type;
my %left_chain_type;
my %right_chain_type;
my %interior_chain_type;
+ my @insert_list;
my $nmax = @{$ri_right} - 1;
# scan the left and right end tokens of all lines
- my $count = 0;
+ my $end_count = 0;
for my $n ( 0 .. $nmax ) {
my $il = $ri_left->[$n];
my $ir = $ri_right->[$n];
next if ( $typel eq '?' );
push @{ $left_chain_type{$keyl} }, $il;
$saw_chain_type{$keyl} = 1;
- $count++;
+ $end_count++;
}
if ( $is_chain_operator{$keyr} && !$want_break_before{$typer} ) {
next if ( $typer eq '?' );
push @{ $right_chain_type{$keyr} }, $ir;
$saw_chain_type{$keyr} = 1;
- $count++;
+ $end_count++;
}
}
- return unless $count;
+
+ return unless $end_count;
# now look for any interior tokens of the same types
- $count = 0;
+ my $interior_count = 0;
my $has_interior_dot_or_plus;
for my $n ( 0 .. $nmax ) {
my $il = $ri_left->[$n];
$key = '*' if ( $key eq '/' );
if ( $saw_chain_type{$key} ) {
push @{ $interior_chain_type{$key} }, $i;
- $count++;
+ $interior_count++;
$has_interior_dot_or_plus ||= ( $key eq '.' || $key eq '+' );
}
}
}
- return unless $count;
+ return unless $interior_count;
my @keys = keys %saw_chain_type;
}
# now make a list of all new break points
- my @insert_list;
# loop over all chain types
foreach my $key (@keys) {
sub insert_additional_breaks {
- # this routine will add line breaks at requested locations after
+ my ( $self, $ri_break_list, $ri_first, $ri_last ) = @_;
+
+ # This routine will add line breaks at requested locations after
# sub break_long_lines has made preliminary breaks.
- my ( $self, $ri_break_list, $ri_first, $ri_last ) = @_;
+ # Given:
+ # $ri_break_list = list of index locations for additional breaks
+ # $ri_first - reference to current list of the first index $i for each
+ # output line in this batch
+ # $ri_last - reference to current list of the last index $i for each
+ # output line in this batch
+
my $i_f;
my $i_l;
my $line_number = 0;
$i_f = $ri_first->[$line_number];
$i_l = $ri_last->[$line_number];
- while ( $i_break_left >= $i_l ) {
+ while ( $i_l <= $i_break_left ) {
$line_number++;
# shouldn't happen unless caller passes bad indexes
}
$i_f = $ri_first->[$line_number];
$i_l = $ri_last->[$line_number];
- }
+ } ## end while ( $i_l <= $i_break_left)
# Do not leave a blank at the end of a line; back up if necessary
if ( $types_to_go[$i_break_left] eq 'b' ) { $i_break_left-- }
sub in_same_container_i {
- # Check to see if tokens at i1 and i2 are in the same container, and
+ my ( $self, $i1, $i2 ) = @_;
+
+ # Check to see if tokens at $i1 and $i2 are in the same container, and
# not separated by certain characters: => , ? : || or
# This is an interface between the _to_go arrays to the rLL array
- my ( $self, $i1, $i2 ) = @_;
# quick check
my $parent_seqno_1 = $parent_seqno_to_go[$i1];
return if ( $parent_seqno_to_go[$i2] ne $parent_seqno_1 );
if ( $i2 < $i1 ) { ( $i1, $i2 ) = ( $i2, $i1 ) }
- my $K1 = $K_to_go[$i1];
- my $K2 = $K_to_go[$i2];
- my $rLL = $self->[_rLL_];
+ my $K1 = $K_to_go[$i1];
+ my $K2 = $K_to_go[$i2];
my $depth_1 = $nesting_depth_to_go[$i1];
return if ( $depth_1 < 0 );
my $type_1 = $types_to_go[$i1];
my $rbreak = ( $type_1 ne ':' ) ? $ris_break_token : $ris_comma_token;
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
+
# Fast preliminary loop to verify that tokens are in the same container
- my $KK = $K1;
- while (1) {
- $KK = $rLL->[$KK]->[_KNEXT_SEQ_ITEM_];
- last if !defined($KK);
+ my $KK = $K1;
+ my $Knext_last = $KK;
+ while ( defined( $KK = $rK_next_seqno_by_K->[$KK] ) ) {
+
+ if ( $KK <= $Knext_last ) {
+ ## shouldn't happen: $rK_next_seqno_by_K is corrupted
+ DEVEL_MODE && Fault(<<EOM);
+Knext should not increase: Knext_last=$Knext_last >= Knext=$KK
+EOM
+ last;
+ }
+ $Knext_last = $KK;
+
last if ( $KK >= $K2 );
my $ii = $i1 + $KK - $K1;
my $depth_i = $nesting_depth_to_go[$ii];
my $tok_i = $tokens_to_go[$ii];
return if ( $tok_i eq '?' || $tok_i eq ':' );
}
- }
+ } ## end while ( defined( $KK = $rK_next_seqno_by_K...))
# Slow loop checking for certain characters
my $depth_i = $nesting_depth_to_go[$ii];
next if ( $depth_i > $depth_1 );
return if ( $depth_i < $depth_1 );
- my $tok_i = $tokens_to_go[$ii];
- return if ( $rbreak->{$tok_i} );
+ my $typ_i = $types_to_go[$ii];
+ if ( $typ_i eq 'k' ) { $typ_i = $tokens_to_go[$ii] }
+ return if ( $rbreak->{$typ_i} );
}
return 1;
} ## end sub in_same_container_i
sub break_equals {
+ my ( $self, $ri_left, $ri_right ) = @_;
+
# Look for assignment operators that could use a breakpoint.
+
+ # Given:
+ # $ri_first - reference to current list of the first index $i for each
+ # output line in this batch
+ # $ri_last - reference to current list of the last index $i for each
+ # output line in this batch
+
# For example, in the following snippet
#
# $HOME = $ENV{HOME}
# The logic here follows the logic in set_logical_padding, which
# will add the padding in the second line to improve alignment.
#
- my ( $self, $ri_left, $ri_right ) = @_;
my $nmax = @{$ri_right} - 1;
- return unless ( $nmax >= 2 );
+ return if ( $nmax < 2 );
# scan the left ends of first two lines
my $tokbeg = EMPTY_STRING;
for my $n ( 1 .. 2 ) {
my $il_n = $ri_left->[$n];
my $ir_n = $ri_right->[$n];
- foreach my $i ( $il_n + 1 .. $ir_n ) {
- my $type = $types_to_go[$i];
+ foreach my $ii ( $il_n + 1 .. $ir_n ) {
+ my $type = $types_to_go[$ii];
return
if ( $is_assignment{$type}
- && $nesting_depth_to_go[$i] eq $depth_beg );
+ && $nesting_depth_to_go[$ii] eq $depth_beg );
}
}
sub Debug_dump_breakpoints {
- # Debug routine to dump current breakpoints...not normally called
- # We are given indexes to the current lines:
- # $ri_beg = ref to array of BEGinning indexes of each line
- # $ri_end = ref to array of ENDing indexes of each line
my ( $self, $ri_beg, $ri_end, $msg ) = @_;
- print STDERR "----Dumping breakpoints from: $msg----\n";
+
+ # Debug routine to dump current breakpoints...not normally called
+ # Given: indexes to the current lines:
+ # $ri_beg = ref to array of BEGinning indexes of each line
+ # $ri_end = ref to array of ENDing indexes of each line
+ print {*STDOUT} "----Dumping breakpoints from: $msg----\n";
for my $n ( 0 .. @{$ri_end} - 1 ) {
my $ibeg = $ri_beg->[$n];
my $iend = $ri_end->[$n];
foreach my $i ( $ibeg .. $iend ) {
$text .= $tokens_to_go[$i];
}
- print STDERR "$n ($ibeg:$iend) $text\n";
+ print {*STDOUT} "$n ($ibeg:$iend) $text\n";
}
- print STDERR "----\n";
+ print {*STDOUT} "----\n";
return;
} ## end sub Debug_dump_breakpoints
sub delete_one_line_semicolons {
my ( $self, $ri_beg, $ri_end ) = @_;
+
+ # Given: indexes to the current lines:
+ # $ri_beg = ref to array of beginning indexes of each line
+ # $ri_end = ref to array of ending indexes of each line
+
my $rLL = $self->[_rLL_];
my $K_opening_container = $self->[_K_opening_container_];
my $ibeg_2 = $ri_beg->[$nn];
# Define certain good joint tokens
- my ( $itok, $itokp, $itokm );
+ my $itok;
foreach my $itest ( $iend_1, $ibeg_2 ) {
my $type = $types_to_go[$itest];
if ( $is_math_op{$type}
my $num_sections = @{$rsections};
if ( DEBUG_RECOMBINE > 1 ) {
- print STDERR <<EOM;
+ print {*STDOUT} <<EOM;
sections=$num_sections; nmax_sec=$nmax_section
EOM
}
if ( DEBUG_RECOMBINE > 0 ) {
my $max = 0;
- print STDERR
+ print {*STDOUT}
"-----\n$num_sections sections found for nmax=$nmax_start\n";
foreach my $sect ( @{$rsections} ) {
my ( $nbeg, $nend ) = @{$sect};
my $num = $nend - $nbeg;
if ( $num > $max ) { $max = $num }
- print STDERR "$nbeg $nend\n";
+ print {*STDOUT} "$nbeg $nend\n";
}
- print STDERR "max size=$max of $nmax_start lines\n";
+ print {*STDOUT} "max size=$max of $nmax_start lines\n";
}
# Loop over all sub-sections. Note that we have to work backwards
# from the end of the batch since the sections use original line
# numbers, and the line numbers change as we go.
- while ( my $section = pop @{$rsections} ) {
+ foreach my $section ( reverse @{$rsections} ) {
my ( $nbeg, $nend ) = @{$section};
$self->recombine_section_loop(
{
_has_terminal_semicolon => $has_terminal_semicolon,
}
);
- }
+ } ## end while ( my $section = pop...)
return;
} ## end sub recombine_breakpoints
# };
+ #-------------
+ # How it works
+ #-------------
+
+ # We are working with a sequence of output lines and looking at
+ # each pair. We must decide if it is better to join each of
+ # these line pairs.
+
+ # The brute force method is to loop through all line pairs and
+ # join the best possible pair, as determined by either some
+ # logical criterion or by the maximum 'bond strength' assigned
+ # to the joining token. Then keep doing this until there are
+ # no remaining line pairs to join.
+
+ # This works, but a problem is that it can theoretically take
+ # on the order of N^2 comparisons in some pathological cases.
+ # This can require an excessive amount of run time.
+
+ # We can avoid excessive run time by conceptually dividing the
+ # work into two phases. In the first phase we make any joints
+ # required by user settings or logic other than the strength of
+ # joints. In the second phase we make any remaining joints
+ # based on strengths. To do this optimally, we do a preliminary
+ # sort on joint strengths and always loop in that order. That
+ # way, we can stop a search on the first joint strength because
+ # it will be the maximum.
+
+ # This method is very fast, requiring no more than 3*N line
+ # comparisons, where N is the number of lines (see below).
+
my $ri_beg = $rhash->{_ri_beg};
my $ri_end = $rhash->{_ri_end};
# Iteration limit
#----------------
- # This was originally an O(n-squared) loop which required a check on
- # the maximum number of iterations for safety. It is now a very fast
- # loop which runs in O(n) time, but a check on total number of
- # iterations is retained to guard against future programming errors.
+ # This is now a very fast loop which runs in O(n) time, but a
+ # check on total number of iterations is retained to guard
+ # against future programming errors.
# Most cases require roughly 1 comparison per line pair (1 full pass).
# The upper bound is estimated to be about 3 comparisons per line pair
# The most extreme cases in my collection are:
# camel1.t - needs 2.7 compares per line (12 without optimization)
# ternary.t - needs 2.8 compares per line (12 without optimization)
- # So a value of MAX_COMPARE_RATIO = 3 looks like an upper bound as
+ # c206 - needs 3.3 compares per line, found with random testing
+ # So a value of MAX_COMPARE_RATIO = 4 looks like an upper bound as
# long as optimization is used. A value of 20 should allow all code to
# pass even if optimization is turned off for testing.
-
- # The OPTIMIZE_OK flag should be true except for testing.
- use constant MAX_COMPARE_RATIO => 20;
- use constant OPTIMIZE_OK => 1;
+ use constant MAX_COMPARE_RATIO => DEVEL_MODE ? 4 : 20;
my $num_pairs = $nend - $nbeg + 1;
my $max_compares = MAX_COMPARE_RATIO * $num_pairs;
last;
}
- } ## end iteration loop
+ } ## end while (1)
if (DEBUG_RECOMBINE) {
my $ratio = sprintf "%0.3f", $rhash->{_num_compares} / $num_pairs;
- print STDERR
+ print {*STDOUT}
"exiting recombine_inner_loop with $nmax_last lines, opt=$rhash->{_optimization_on}, starting pairs=$num_pairs, num_compares=$rhash->{_num_compares}, ratio=$ratio\n";
}
sub recombine_inner_loop {
my ( $self, $rhash ) = @_;
- # This is the inner loop of the recombine operation. We look at all of
- # the remaining joints in this section and select the best joint to be
- # recombined. If a recombination is made, the number of lines
- # in this section will be reduced by one.
+ # This is the inner loop of the recombine operation. We are working on
+ # a sequence of multiple lines. We look at each pair of lines and
+ # decide if formatting would be improved if the pair were joined
+ # into a single line. If there are multiple of such possible
+ # recombinations, we select the best. If a recombination is made,
+ # the number of lines in this group of lines will be reduced by one.
+ # See comments in the calling routine for further explanation.
+ # Input:
+ # $rhash has parameters controlling this recombine operation
# Returns: nothing
my $rK_weld_right = $self->[_rK_weld_right_];
my $ix_best = 0;
my $num_bs = 0;
- # The range of lines in this group is $nbeg to $nstop
+ # The index range of lines in this group is $nbeg to $nstop
my $nmax = @{$ri_end} - 1;
my $nstop = $nmax - $rhash->{_num_freeze};
my $num_joints = $nstop - $nbeg;
my $type_ibeg_2 = $types_to_go[$ibeg_2];
DEBUG_RECOMBINE > 1 && do {
- print STDERR
+ print {*STDOUT}
"RECOMBINE: ix=$ix iend1=$iend_1 iend2=$iend_2 n=$n nmax=$nmax if=$ibeg_1 type=$type_ibeg_1 =$tokens_to_go[$ibeg_1] next_type=$type_ibeg_2 next_tok=$tokens_to_go[$ibeg_2]\n";
};
# do not recombine if we would skip in indentation levels
if ( $n < $nmax ) {
- my $if_next = $ri_beg->[ $n + 1 ];
+
+ my $if_next = $ri_beg->[ $n + 1 ];
+ my $level_1 = $levels_to_go[$ibeg_1];
+ my $level_2 = $levels_to_go[$ibeg_2];
+ my $level_if_next = $levels_to_go[$if_next];
+
next
if (
- $levels_to_go[$ibeg_1] < $levels_to_go[$ibeg_2]
- && $levels_to_go[$ibeg_2] < $levels_to_go[$if_next]
+ $level_1 < $level_2
+ && $level_2 < $level_if_next
# but an isolated 'if (' is undesirable
&& !(
# we have seen a good break on strength, and
&& $num_bs
- # we are allowed to optimize
- && OPTIMIZE_OK
-
)
{
+
+ # To deactivate optimization for testing purposes, the next
+ # line can be commented out. This will increase run time.
$rhash->{_optimization_on} = 1;
if (DEBUG_RECOMBINE) {
my $num_compares = $rhash->{_num_compares};
my $pair_count = @ix_list;
- print STDERR
+ print {*STDOUT}
"Entering optimization phase at $num_compares compares, pair count = $pair_count\n";
}
}
my $nmax = @{$ri_end} - 1;
my $ibeg_1 = $ri_beg->[ $n - 1 ];
my $iend_1 = $ri_end->[ $n - 1 ];
- my $ibeg_2 = $ri_beg->[$n];
+## my $ibeg_2 = $ri_beg->[$n];
my $iend_2 = $ri_end->[$n];
if ($itok) {
elsif ( $is_assignment{$type} ) {
##TBD
- } ## end assignment
+ }
+ else {
+ # not a special type
+ }
+ ## end assignment
}
# ok to combine lines
my ( $ri_beg, $ri_end, $n, $this_line_is_semicolon_terminated ) = @_;
# Recombine Section 2:
- # Examine token at $iend_1 (right end of first line of pair)
+ # We are comparing two lines to see if they should be combined
+ # into a single line. This sub examines the token '$iend_1' in
+ # the following diagram (right end of first line of pair):
# Here are Indexes of the endpoint tokens of the two lines:
#
$skip_Section_3 ||= 1;
}
- return
- unless (
+ my $keep_going = (
$skip_Section_3
- # handle '.' and '?' specially below
- || ( $type_ibeg_2 =~ /^[\.\?]$/ )
+ # handle '.' and '?' specially below
+ || ( $type_ibeg_2 =~ /^[\.\?]$/ )
- # fix for c054 (unusual -pbp case)
- || $type_ibeg_2 eq '=='
+ # fix for c054 (unusual -pbp case)
+ || $type_ibeg_2 eq '=='
+ );
- );
+ return unless ($keep_going);
}
elsif ( $type_iend_1 eq '{' ) {
if ( $levels_to_go[$ibeg_1] ne $levels_to_go[$ibeg_2] );
# do not recombine unless next line ends in :
- return unless $type_iend_2 eq ':';
+ return unless ( $type_iend_2 eq ':' );
}
# for lines ending in a comma...
if ( $type_ibeg_1 eq '}'
&& $type_ibeg_2 eq 'i' )
{
- return
- unless ( ( $ibeg_1 == ( $iend_1 - 1 ) )
- && ( $iend_2 == ( $ibeg_2 + 1 ) )
- && $this_line_is_semicolon_terminated );
+ my $combine_ok =
+ ( ( $ibeg_1 == ( $iend_1 - 1 ) )
+ && ( $iend_2 == ( $ibeg_2 + 1 ) )
+ && $this_line_is_semicolon_terminated );
+
+ return if ( !$combine_ok );
# override breakpoint
$forced_breakpoint_to_go[$iend_1] = 0;
# do not recombine after a comma unless this will
# leave just 1 more line
- return unless ( $n + 1 >= $nmax );
+ return if ( $n + 1 < $nmax );
# do not recombine if there is a change in
# indentation depth
&& $ibeg_2 == $iend_2
&& token_sequence_length( $ibeg_2, $ibeg_2 ) <
$rOpts_short_concatenation_item_length );
- my $is_ternary = (
+ my $is_ternary_joint = (
$type_ibeg_1 eq '?' && ( $ibeg_3 >= 0
&& $types_to_go[$ibeg_3] eq ':' )
);
# will put ?/: at start of adjacent lines
if ( $ibeg_1 != $iend_1
&& !$is_short_quote
- && !$is_ternary )
+ && !$is_ternary_joint )
{
- return
- unless (
+ my $combine_ok = (
(
# unless we can reduce this to two lines
$nmax < $n + 2
- # or three lines, the last with a leading
- # semicolon
- || ( $nmax == $n + 2
+ # or three lines, the last with a leading
+ # semicolon
+ || ( $nmax == $n + 2
&& $types_to_go[$ibeg_nmax] eq ';' )
- # or the next line ends with a here doc
- || $type_iend_2 eq 'h'
+ # or the next line ends with a here doc
+ || $type_iend_2 eq 'h'
- # or the next line ends in an open paren or
- # brace and the break hasn't been forced
- # [dima.t]
- || ( !$forced_breakpoint_to_go[$iend_1]
+ # or the next line ends in an open paren or
+ # brace and the break hasn't been forced
+ # [dima.t]
+ || (!$forced_breakpoint_to_go[$iend_1]
&& $type_iend_2 eq '{' )
)
- # do not recombine if the two lines might align
- # well this is a very approximate test for this
- && (
+ # do not recombine if the two lines might align
+ # well this is a very approximate test for this
+ && (
# RT#127633 - the leading tokens are not
# operators
# or they are different
|| ( $ibeg_3 >= 0
&& $type_ibeg_2 ne $types_to_go[$ibeg_3] )
- )
- );
+ )
+ );
+
+ return if ( !$combine_ok );
if (
}
}
- unless ( $tokens_to_go[$ibeg_2] =~ /^[\{\(\[]$/ ) {
+ if ( $tokens_to_go[$ibeg_2] !~ /^[\{\(\[]$/ ) {
$forced_breakpoint_to_go[$iend_1] = 0;
}
}
$summed_lengths_to_go[$ibeg_2];
my $iend_1_minus = max( $ibeg_1, iprev_to_go($iend_1) );
- return
- unless (
+ my $combine_ok = (
# ... unless there is just one and we can reduce
# this to two lines if we do. For example, this
# check for 2 lines, not in a long broken '.' chain
( $n == 2 && $n == $nmax && $type_iend_1 ne $type_iend_2 )
- # ... or this would strand a short quote , like this
- # "some long quote" .
- # "\n";
- || (
+ # ... or this would strand a short quote , like this
+ # "some long quote" .
+ # "\n";
+ || (
$types_to_go[$i_next_nonblank] eq 'Q'
&& $i_next_nonblank >= $iend_2 - 2
&& $token_lengths_to_go[$i_next_nonblank] <
# additional constraints to fix c167
&& ( $types_to_go[$iend_1_minus] ne 'Q'
|| $summed_len_2 < $summed_len_1 )
- )
- );
+ )
+ );
+ return if ( !$combine_ok );
+
+ # added for issue c352
+ if ($this_line_is_semicolon_terminated) {
+ $forced_breakpoint_to_go[$iend_1] = 0;
+ }
+ }
+ else {
+ # not a special type
}
return ( 1, $skip_Section_3 );
} ## end sub recombine_section_2
# Scan line ibeg_2 to $iend_2 up to last token for complexity.
# We are not counting the last token in case it is an opening paren.
+
+ # Given:
+ # $ri_end - ref to list of indexes of line-ending tokens
+ # $n = current line index
+ # $nmax = maximum line index
+ # ($ibeg_2, $iend_2) = index range of line to scan
+
# Return:
# true if rhs is simple, ok to recombine
# false otherwise
my ( $ri_beg, $ri_end, $n, $this_line_is_semicolon_terminated ) = @_;
# Recombine Section 3:
- # Examine token at $ibeg_2 (right end of first line of pair)
+ # We are comparing two lines to see if they should be combined
+ # into a single line. This sub examines the token '$ibeg_2' in
+ # the following diagram (left end of second line of pair):
# Here are Indexes of the endpoint tokens of the two lines:
#
&& $types_to_go[$ii] eq ':'
&& $levels_to_go[$ii] == $lev;
}
- return unless ( $local_count > 1 );
+ return if ( $local_count <= 1 );
}
$forced_breakpoint_to_go[$iend_1] = 0;
}
my $summed_len_2 = $summed_lengths_to_go[ $iend_2 + 1 ] -
$summed_lengths_to_go[$ibeg_2];
- return
- unless (
+ my $combine_ok = (
# ... unless there is just one and we can reduce
# this to two lines if we do. For example, this
( $n == 2 && $n == $nmax && $type_ibeg_1 ne $type_ibeg_2 )
- # ... or this would strand a short quote , like this
- # . "some long quote"
- # . "\n";
- || (
+ # ... or this would strand a short quote , like this
+ # . "some long quote"
+ # . "\n";
+ || (
$types_to_go[$i_next_nonblank] eq 'Q'
&& $i_next_nonblank >= $iend_2 - 1
&& $token_lengths_to_go[$i_next_nonblank] <
&& $n == $nmax
&& $this_line_is_semicolon_terminated )
)
- )
- );
+ )
+ );
+
+ return if ( !$combine_ok );
+
+ # added for issue c352
+ if ($this_line_is_semicolon_terminated) {
+ $forced_breakpoint_to_go[$iend_1] = 0;
+ }
}
# handle leading keyword..
# handle leading "or"
if ( $tokens_to_go[$ibeg_2] eq 'or' ) {
- return
- unless (
+
+ my $combine_ok = (
$this_line_is_semicolon_terminated
- && (
+ && (
$type_ibeg_1 eq '}'
|| (
# then combine everything together
&& ( $iend_2 - $ibeg_2 <= 7 )
)
- )
- );
+ )
+ );
+
+ return if ( !$combine_ok );
#X: RT #81854
$forced_breakpoint_to_go[$iend_1] = 0
- unless ( $old_breakpoint_to_go[$iend_1] );
+ if ( !$old_breakpoint_to_go[$iend_1] );
}
# handle leading 'and' and 'xor'
# if !$this->{Parents}{$_}
# or $this->{Parents}{$_} eq $_;
#
- return
- unless (
- $this_line_is_semicolon_terminated
- && (
+ my $combine_ok = $this_line_is_semicolon_terminated
+ && (
- # following 'if' or 'unless' or 'or'
- $type_ibeg_1 eq 'k'
- && ( $is_if_unless{ $tokens_to_go[$ibeg_1] }
- || $tokens_to_go[$ibeg_1] eq 'or' )
- )
+ # following 'if' or 'unless' or 'or'
+ $type_ibeg_1 eq 'k'
+ && ( $is_if_unless{ $tokens_to_go[$ibeg_1] }
+ || $tokens_to_go[$ibeg_1] eq 'or' )
);
+ return if ( !$combine_ok );
}
# handle leading "if" and "unless"
# if ( $lang !~ /${l}$/i );
# into:
# next if ( $lang !~ /${l}$/i );
- return
- unless (
- $this_line_is_semicolon_terminated
+ my $combine_ok = $this_line_is_semicolon_terminated
- # previous line begins with 'and' or 'or'
- && $type_ibeg_1 eq 'k'
- && $is_and_or{ $tokens_to_go[$ibeg_1] }
-
- );
+ # previous line begins with 'and' or 'or'
+ && $type_ibeg_1 eq 'k'
+ && $is_and_or{ $tokens_to_go[$ibeg_1] };
+ return if ( !$combine_ok );
}
# handle all other leading keywords
# keywords look best at start of lines,
# but combine things like "1 while"
- unless ( $is_assignment{$type_iend_1} ) {
+ if ( !$is_assignment{$type_iend_1} ) {
return
if ( ( $type_iend_1 ne 'k' )
&& ( $tokens_to_go[$ibeg_2] ne 'while' ) );
# maybe looking at something like:
# unless $TEXTONLY || $item =~ m%</?(hr>|p>|a|img)%i;
+ my $combine_ok = $this_line_is_semicolon_terminated
- return
- unless (
- $this_line_is_semicolon_terminated
-
- # previous line begins with an 'if' or 'unless'
- # keyword
- && $type_ibeg_1 eq 'k'
- && $is_if_unless{ $tokens_to_go[$ibeg_1] }
+ # previous line begins with an 'if' or 'unless'
+ # keyword
+ && $type_ibeg_1 eq 'k'
+ && $is_if_unless{ $tokens_to_go[$ibeg_1] };
- );
+ return if ( !$combine_ok );
}
# handle line with leading = or similar
elsif ( $is_assignment{$type_ibeg_2} ) {
return unless ( $n == 1 || $n == $nmax );
return if ( $old_breakpoint_to_go[$iend_1] );
- return
- unless (
+ my $combine_ok = (
- # unless we can reduce this to two lines
+ # if we can reduce this to two lines
$nmax == 2
- # or three lines, the last with a leading semicolon
- || ( $nmax == 3 && $types_to_go[$ibeg_nmax] eq ';' )
+ # or three lines, the last with a leading semicolon
+ || ( $nmax == 3 && $types_to_go[$ibeg_nmax] eq ';' )
- # or the next line ends with a here doc
- || $type_iend_2 eq 'h'
+ # or the next line ends with a here doc
+ || $type_iend_2 eq 'h'
- # or this is a short line ending in ;
- || ( $n == $nmax
+ # or this is a short line ending in ;
+ || ( $n == $nmax
&& $this_line_is_semicolon_terminated )
- );
+ );
+ return if ( !$combine_ok );
$forced_breakpoint_to_go[$iend_1] = 0;
+ }
+ else {
+
}
return ( 1, $bs_tweak );
} ## end sub recombine_section_3
# Called once per batch to look for and do any final line breaks for
# long ternary chains
+ # Given:
+ # $ri_left = ref to array with token indexes of the left line ends
+ # $ri_right = ref to array with token indexes of the right line ends
my $nmax = @{$ri_right} - 1;
my $typer = $types_to_go[$ir];
return if ( $typel eq '?' );
return if ( $typer eq '?' );
- if ( $typel eq ':' ) { $i_first_colon = $il; last; }
- elsif ( $typer eq ':' ) { $i_first_colon = $ir; last; }
+ if ( $typel eq ':' ) { $i_first_colon = $il; last; }
+ if ( $typer eq ':' ) { $i_first_colon = $ir; last; }
}
# For long ternary chains,
# This routine is called once per batch to implement the parameters
# --break-before-hash-brace, etc.
+ # Given:
+ # $ri_left = ref to array with token indexes of the left line ends
+ # $ri_right = ref to array with token indexes of the right line ends
+
# Nothing to do if none of these parameters has been set
return unless %break_before_container_types;
my $nmax = @{$ri_right} - 1;
- return unless ( $nmax >= 0 );
+ return if ( $nmax < 0 );
my $rLL = $self->[_rLL_];
for my $n ( 0 .. $nmax ) {
my $il = $ri_left->[$n];
my $ir = $ri_right->[$n];
- next unless ( $ir > $il );
+ next if ( $ir <= $il );
my $Kl = $K_to_go[$il];
my $Kr = $K_to_go[$ir];
my $Kend = $Kr;
}
my $token = $rLL->[$Kend]->[_TOKEN_];
- next unless ( $is_opening_token{$token} );
- next unless ( $Kl < $Kend - 1 );
+ next if ( !$is_opening_token{$token} );
+ next if ( $Kl >= $Kend - 1 );
my $seqno = $rLL->[$Kend]->[_TYPE_SEQUENCE_];
- next unless ( defined($seqno) );
+ next if ( !defined($seqno) );
# Use the flag which was previously set
next unless ( $rbreak_before_container_by_seqno->{$seqno} );
sub correct_lp_indentation {
+ my ($self) = @_;
+
# When the -lp option is used, we need to make a last pass through
# each line to correct the indentation positions in case they differ
# from the predictions. This is necessary because perltidy uses a
# predictor is usually good, but sometimes stumbles. The corrector
# tries to patch things up once the actual opening paren locations
# are known.
- my ( $self, $ri_first, $ri_last ) = @_;
+
+ my $this_batch = $self->[_this_batch_];
+
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
# first remove continuation indentation if appropriate
my $max_line = @{$ri_first} - 1;
# blocks may be too long when given -lp indentation. We will fix that now
# if possible, using the list of these closing block indexes.
my $ri_starting_one_line_block =
- $self->[_this_batch_]->[_ri_starting_one_line_block_];
+ $this_batch->[_ri_starting_one_line_block_];
if ( @{$ri_starting_one_line_block} ) {
$self->correct_lp_indentation_pass_1( $ri_first, $ri_last,
$ri_starting_one_line_block );
#
# We leave it to the aligner to decide how to do this.
if ( $line == 1 && $i == $ibeg ) {
- $self->[_this_batch_]->[_do_not_pad_] = 1;
+ $this_batch->[_do_not_pad_] = 1;
}
#--------------------------------------------
$actual_pos = $predicted_pos;
- my ( $indent, $offset, $is_leading, $exists ) =
+ my ( $indent, $offset, $is_leading_uu, $exists_uu ) =
get_saved_opening_indentation($align_seqno);
if ( defined($indent) ) {
# So some of the one-line blocks may be too long when given -lp
# indentation. We will fix that now if possible, using the list of these
# closing block indexes.
+ # Given:
+ # $ri_first - reference to current list of the first index $i for each
+ # output line in this batch
+ # $ri_last - reference to current list of the last index $i for each
+ # output line in this batch
+ # $ri_starting_one_line_block = list of indexes starting 1-line blocks
my @ilist = @{$ri_starting_one_line_block};
return unless (@ilist);
my $max_line = @{$ri_first} - 1;
- my $inext = shift(@ilist);
+ my $inext = shift @ilist;
# loop over lines, checking length of each with a one-line block
my ( $ibeg, $iend );
if ( $available_spaces > 0 ) {
my $delete_want = min( $available_spaces, $excess );
- my $deleted_spaces =
+ my $deleted_spaces_uu =
$self->reduce_lp_indentation( $ibeg, $delete_want );
$available_spaces = $self->get_available_spaces_to_go($ibeg);
}
sub undo_lp_ci {
+ my ( $self, $line_open, $i_start, $closing_index, $ri_first, $ri_last ) =
+ @_;
+
# If there is a single, long parameter within parens, like this:
#
# $self->command( "/msg "
# . $infoline->chan
# . " You said $1, but did you know that it's square was "
# . $1 * $1 . " ?");
+ # Given:
+ # $line_open = index of line with opening paren
+ # $i_start = index of token at end of starting line ["/msg" above]
+ # $closing_index = index of the closing token
+ # $ri_first - reference to current list of the first index $i for each
+ # output line in this batch
+ # $ri_last - reference to current list of the last index $i for each
+ # output line in this batch
- my ( $self, $line_open, $i_start, $closing_index, $ri_first, $ri_last ) =
- @_;
my $max_line = @{$ri_first} - 1;
# must be multiple lines
- return unless $max_line > $line_open;
+ return if ( $max_line <= $line_open );
my $lev_start = $levels_to_go[$i_start];
my $ci_start_plus = 1 + $ci_levels_to_go[$i_start];
return if ( $lev_start != $levels_to_go[$ibeg] );
return if ( $ci_start_plus != $ci_levels_to_go[$ibeg] );
last if ( $closing_index <= $iend );
- }
+ } ## end while ( ++$n <= $max_line)
# we can reduce the indentation of all continuation lines
my $continuation_line_count = $n - $line_open;
return;
} ## end sub undo_lp_ci
-###############################################
-# CODE SECTION 10: Code to break long statments
-###############################################
+################################################
+# CODE SECTION 10: Code to break long statements
+################################################
use constant DEBUG_BREAK_LINES => 0;
sub break_long_lines {
+ my ( $self, $saw_good_break, $rcolon_list, $rbond_strength_bias ) = @_;
+
#-----------------------------------------------------------
# Break a batch of tokens into lines which do not exceed the
# maximum line length.
#-----------------------------------------------------------
- my ( $self, $saw_good_break, $rcolon_list, $rbond_strength_bias ) = @_;
-
# Input parameters:
# $saw_good_break - a flag set by break_lists
# $rcolon_list - ref to a list of all the ? and : tokens in the batch,
if ( $ii >= 0 && $ii <= $max_index_to_go ) {
$rbond_strength_to_go->[$ii] += $bias;
}
- elsif (DEVEL_MODE) {
- my $KK = $K_to_go[0];
- my $lno = $self->[_rLL_]->[$KK]->[_LINE_INDEX_];
- Fault(
+ else {
+ if (DEVEL_MODE) {
+ my $KK = $K_to_go[0];
+ my $lno = $self->[_rLL_]->[$KK]->[_LINE_INDEX_];
+ Fault(
"Bad bond strength bias near line $lno: i=$ii must be between 0 and $max_index_to_go\n"
- );
+ );
+ }
}
}
}
#------------------------------------------------------------------
# Find the best next breakpoint based on token-token bond strengths
#------------------------------------------------------------------
- my ( $i_lowest, $lowest_strength, $leading_alignment_type, $Msg ) =
+ my ( $i_lowest, $lowest_strength, $Msg ) =
$self->break_lines_inner_loop(
$i_begin,
elsif ( $types_to_go[$i_lowest] eq '?' ) {
$self->set_closing_breakpoint($i_lowest);
}
+ else {
+ # not at a '?'
+ }
#--------------------------------------------------------
# ?/: rule 3 : if we break at a ':' then we save
elsif ( $types_to_go[$i_lowest] eq ':' ) {
push @i_colon_breaks, $i_lowest;
}
+ else {
+ # not at a ':'
+ }
# here we should set breaks for all '?'/':' pairs which are
# separated by this line
}
+ # Fix two-line shear (c406)
+ my $i_next_nonblank = $inext_to_go[$i_lowest];
+ if ( $tokens_to_go[$i_next_nonblank] eq ')' ) {
+
+ # Example of a '2 line shear':
+
+ # $wrapped->add_around_modifier(
+ # sub { push @tracelog => 'around 1'; $_[0]->(); } );
+
+ # If we try formatting this with increasing line lengths, the
+ # break based on bond strengths is after the '(' until the closing
+ # paren is just beyond the line length limit. In that case, it can
+ # switch to being just before the ')'. This is rare, and may be
+ # undesirable because it can cause unexpected formatting
+ # variations between similar code, and worse, instability with
+ # trailing commas. So we check for that here and put the break
+ # back after the opening '(' if the ')' is not preceded by a ','.
+ # Issue c406.
+ my $i_prev = iprev_to_go($i_next_nonblank);
+ my $i_opening = $mate_index_to_go[$i_next_nonblank];
+ if ( $types_to_go[$i_prev] ne ','
+ && defined($i_opening)
+ && $i_opening > $i_last_break )
+ {
+ # set a forced breakpoint to block recombination
+ $i_lowest = $i_opening;
+ $forced_breakpoint_to_go[$i_lowest] = 1;
+ }
+ }
+
+ #--------------------------------------------------
# guard against infinite loop (should never happen)
+ #--------------------------------------------------
if ( $i_lowest <= $i_last_break ) {
DEVEL_MODE
&& Fault("i_lowest=$i_lowest <= i_last_break=$i_last_break\n");
}
DEBUG_BREAK_LINES
- && print STDOUT
+ && print {*STDOUT}
"BREAK: best is i = $i_lowest strength = $lowest_strength;\nReason>> $Msg\n";
$line_count++;
if ( ( $i_begin <= $imax ) && ( $types_to_go[$i_begin] eq 'b' ) ) {
$i_begin++;
}
- }
+ } ## end while ( $i_begin <= $imax)
#-------------------------------------------------
# END of main loop to set continuation breakpoints
use constant TINY_BIAS => 0.0001;
use constant MAX_BIAS => 0.001;
+my %is_dot_and_or;
+
+BEGIN {
+ my @q = qw( . && || );
+ @is_dot_and_or{@q} = (1) x scalar(@q);
+}
+
sub break_lines_inner_loop {
- #-----------------------------------------------------------------
# Find the best next breakpoint in index range ($i_begin .. $imax)
# which, if possible, does not exceed the maximum line length.
- #-----------------------------------------------------------------
my (
- $self, #
+ $self,
$i_begin,
$i_last_break,
# Returns:
# $i_lowest = index of best breakpoint
# $lowest_strength = 'bond strength' at best breakpoint
- # $leading_alignment_type = special token type after break
# $Msg = string of debug info
my $Msg = EMPTY_STRING;
# Do not separate an isolated bare word from an opening paren.
# Alternate Fix #2 for issue b1299. This waits as long as possible
# to make the decision.
+ # Note for fix #c250: to keep line breaks unchanged under -extrude when
+ # switching from 'i' to 'S' for subs, we would have to also check 'S', i.e.
+ # =~/^[Si]$/. But this was never necessary at a sub signature, so we leave
+ # it alone and allow the new version to be different for --extrude. For a
+ # test file run perl527/signatures.t with --extrude.
if ( $types_to_go[$i_begin] eq 'i'
&& substr( $tokens_to_go[$i_begin], 0, 1 ) =~ /\w/ )
{
&& ( $nesting_depth_to_go[$i_begin] >
$nesting_depth_to_go[$i_next_nonblank] )
&& (
- $next_nonblank_type =~ /^(\.|\&\&|\|\|)$/
+ ## /^(\.|\&\&|\|\|)$/
+ $is_dot_and_or{$next_nonblank_type}
|| (
$next_nonblank_type eq 'k'
(
$next_nonblank_block_type =~ /$SUB_PATTERN/
- || $next_nonblank_block_type =~ /$ASUB_PATTERN/
+ || $matches_ASUB{$next_nonblank_block_type}
)
&& ( $nesting_depth_to_go[$i_begin] ==
$nesting_depth_to_go[$i_next_nonblank] )
}
if ( length($ltok) > 6 ) { $ltok = substr( $ltok, 0, 8 ) }
if ( length($rtok) > 6 ) { $rtok = substr( $rtok, 0, 8 ) }
- print STDOUT
+ print {*STDOUT}
"BREAK: i=$i_test imax=$imax $types_to_go[$i_test] $next_nonblank_type sp=($leading_spaces) lnext= $summed_lengths_to_go[$i_testp2] str=$strength $ltok $rtok\n";
};
};
last;
}
- }
+ } ## end while ( ++$i_test <= $imax)
#-----------------------------------------------------
# End INNER_LOOP over the indexes in the _to_go arrays
# We will break at imax if no other break was found.
if ( $i_lowest < 0 ) { $i_lowest = $imax }
- return ( $i_lowest, $lowest_strength, $leading_alignment_type, $Msg );
+ return ( $i_lowest, $lowest_strength, $Msg );
} ## end sub break_lines_inner_loop
sub do_colon_breaks {
+
my ( $self, $ri_colon_breaks, $ri_first, $ri_last ) = @_;
- # using a simple method for deciding if we are in a ?/: chain --
- # this is a chain if it has multiple ?/: pairs all in order;
- # otherwise not.
- # Note that if line starts in a ':' we count that above as a break
+ # Given:
+ # $ri_colon_breaks = ref to list of indexes breaks at ':' tokens
+ # $ri_first - reference to current list of the first index $i for each
+ # output line in this batch
+ # $ri_last - reference to current list of the last index $i for each
+ # output line in this batch
+
+ # Add additional breaks if we are in a ?/: chain.
+ # Simplified method used here: This is a ?/: chain if it has
+ # multiple ?/: pairs all in order; otherwise not.
my @insert_list = ();
foreach ( @{$ri_colon_breaks} ) {
# These routines and variables are involved in finding good
# places to break long lists.
- use constant DEBUG_BREAK_LISTS => 0;
-
my (
$block_type,
$current_depth,
$depth,
- $i,
$i_last_colon,
$i_line_end,
$i_line_start,
@item_count_stack,
@last_comma_index,
@last_dot_index,
- @last_nonblank_type,
+ @last_nonblank_type_stack,
@old_breakpoint_count_stack,
@opening_structure_index_stack,
@rfor_semicolon_list,
# boost tol for combination -lp -xci
if ($rOpts_extended_continuation_indentation) {
$lp_tol_boost = 2;
+
+ # and one more for -lp -xci -vmll (b1470, b1474, b1266)
+ if ($rOpts_variable_maximum_line_length) {
+ $lp_tol_boost = max( 2, $rOpts_indent_columns );
+ }
}
# boost tol for combination -lp and any -vtc > 0, but only for
# a new depth
sub check_for_new_minimum_depth {
my ( $self, $depth_t, $seqno ) = @_;
+
+ # Initialize for a new minimum depth
+
+ # Given:
+ # $depth_t = new depth
+ # $seqno = sequence number of the parent container
if ( $depth_t < $minimum_depth ) {
$minimum_depth = $depth_t;
# these arrays need not retain values between calls
+ my $old_seqno = $type_sequence_stack[$depth_t];
+ my $changed_seqno = !defined($old_seqno) || $old_seqno != $seqno;
$type_sequence_stack[$depth_t] = $seqno;
$override_cab3[$depth_t] = undef;
if ( $rOpts_comma_arrow_breakpoints == 3 && $seqno ) {
$override_cab3[$depth_t] = $self->[_roverride_cab3_]->{$seqno};
}
- $breakpoint_stack[$depth_t] = $starting_breakpoint_count;
- $container_type[$depth_t] = EMPTY_STRING;
- $identifier_count_stack[$depth_t] = 0;
- $index_before_arrow[$depth_t] = -1;
- $interrupted_list[$depth_t] = 1;
- $item_count_stack[$depth_t] = 0;
- $last_nonblank_type[$depth_t] = EMPTY_STRING;
+ $breakpoint_stack[$depth_t] = $starting_breakpoint_count;
+ $container_type[$depth_t] = EMPTY_STRING;
+ $identifier_count_stack[$depth_t] = 0;
+ $index_before_arrow[$depth_t] = -1;
+ $interrupted_list[$depth_t] = 1;
+ $item_count_stack[$depth_t] = 0;
+ $last_nonblank_type_stack[$depth_t] = EMPTY_STRING;
$opening_structure_index_stack[$depth_t] = -1;
$breakpoint_undo_stack[$depth_t] = undef;
$i_equals[$depth_t] = -1;
# these arrays must retain values between calls
- if ( !defined( $has_broken_sublist[$depth_t] ) ) {
+ if ( $changed_seqno || !defined( $has_broken_sublist[$depth_t] ) ) {
$dont_align[$depth_t] = 0;
$has_broken_sublist[$depth_t] = 0;
$want_comma_break[$depth_t] = 0;
return;
} ## end sub check_for_new_minimum_depth
- # routine to decide which commas to break at within a container;
- # returns:
- # $bp_count = number of comma breakpoints set
- # $do_not_break_apart = a flag indicating if container need not
- # be broken open
sub set_comma_breakpoints {
- my ( $self, $dd, $rbond_strength_bias ) = @_;
+ my ( $self, $i, $dd, $rbond_strength_bias ) = @_;
+
+ # Decide which commas to break at within a container
+ # Given:
+ # $i = index of current token in main loop over tokens, or
+ # = $max_index_to_go + 1 for post-loop operations (c410)
+ # $dd = stack depth
+ # $rbond-strength_bias = ref to array of bond strength biases which
+ # may be updated for commas not in lists
+ # Return:
+ # $bp_count = number of comma breakpoints set
+ # $do_not_break_apart = a flag indicating if container need not
+ # be broken open
my $bp_count = 0;
my $do_not_break_apart = 0;
# always open comma lists not preceded by keywords,
# barewords, identifiers (that is, anything that doesn't
# look like a function call)
- my $must_break_open = $last_nonblank_type[$dd] !~ /^[kwiU]$/;
+ # c250: added new sub identifier type 'S'
+ my $must_break_open =
+ $last_nonblank_type_stack[$dd] !~ /^[kwiUS]$/;
$self->table_maker(
{
$bp_count = $forced_breakpoint_count - $fbc;
$do_not_break_apart = 0 if $must_break_open;
}
+ else {
+ ## no real commas, nothing to do
+ }
}
return ( $bp_count, $do_not_break_apart );
} ## end sub set_comma_breakpoints
BEGIN {
- my @q = qw< k R } ) ] Y Z U w i q Q .
- = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=>;
+ my @q = qw# k R } ) ] Y Z U w i q Q .
+ = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x= #;
@is_uncontained_comma_break_included_type{@q} = (1) x scalar(@q);
} ## end BEGIN
sub do_uncontained_comma_breaks {
- # Handle commas not in containers...
+ my ( $self, $dd, $rbond_strength_bias ) = @_;
+
+ # Handle commas not in containers
+
+ # Given:
+ # $dd = depth of this layer of commas
+ # $rbond_strength_bias = array of bond strengths to be updated
+
# This is a catch-all routine for commas that we
# don't know what to do with because the don't fall
# within containers. We will bias the bond strength
# won't work very well. However, the user can always
# prevent following the old breakpoints with the
# -iob flag.
- my ( $self, $dd, $rbond_strength_bias ) = @_;
# Check added for issue c131; an error here would be due to an
# error initializing @comma_index when entering depth $dd.
} ## end sub do_uncontained_comma_breaks
my %is_logical_container;
- my %quick_filter;
+ my %quick_filter_A;
+ my %quick_filter_B;
BEGIN {
my @q = qw# if elsif unless while and or err not && | || ? : ! #;
@is_logical_container{@q} = (1) x scalar(@q);
- # This filter will allow most tokens to skip past a section of code
- %quick_filter = %is_assignment;
- @q = qw# => . ; < > ~ #;
+ # Filters to allow most tokens to skip past tedious if-elsif blocks
+ %quick_filter_A = %is_assignment;
+ @q = qw( || && f k );
+ @quick_filter_A{@q} = (1) x scalar(@q);
+
+ %quick_filter_B = %is_assignment;
+ @q = qw# => . ; < > ~ #;
push @q, ',';
push @q, 'f'; # added for ';' for issue c154
- @quick_filter{@q} = (1) x scalar(@q);
+ @quick_filter_B{@q} = (1) x scalar(@q);
+
} ## end BEGIN
sub set_for_semicolon_breakpoints {
my ( $self, $dd ) = @_;
+
+ # Given:
+ # $dd = depth of this layer
+
+ # Set breakpoints for semicolons in C-style 'for' containers
foreach ( @{ $rfor_semicolon_list[$dd] } ) {
$self->set_forced_breakpoint($_);
}
return;
} ## end sub set_for_semicolon_breakpoints
- sub set_logical_breakpoints {
- my ( $self, $dd ) = @_;
+ sub set_logical_breakpoints {
+ my ( $self, $dd ) = @_;
+
+ # Given:
+ # $dd = depth of this layer
+
+ # Set breakpoints at logical operators
if (
$item_count_stack[$dd] == 0
&& $is_logical_container{ $container_type[$dd] }
# Look for breaks in this order:
# 0 1 2 3
# or and || &&
- foreach my $i ( 0 .. 3 ) {
- if ( $rand_or_list[$dd][$i] ) {
- foreach ( @{ $rand_or_list[$dd][$i] } ) {
+ foreach my $ii ( 0 .. 3 ) {
+ if ( $rand_or_list[$dd][$ii] ) {
+ foreach ( @{ $rand_or_list[$dd][$ii] } ) {
$self->set_forced_breakpoint($_);
}
sub is_unbreakable_container {
- # never break a container of one of these types
- # because bad things can happen (map1.t)
my $dd = shift;
+
+ # Given:
+ # $dd = depth of this layer
+ # Return:
+ # true if the container should not be broken
+ # false otherwise
+
+ # never break a container of one of these types
+ # because bad things can happen (map1.t):
return $is_sort_map_grep{ $container_type[$dd] };
} ## end sub is_unbreakable_container
my ( $self, $is_long_line, $rbond_strength_bias ) = @_;
- #--------------------------------------------------------------------
# This routine is called once per batch, if the batch is a list, to
# set line breaks so that hierarchical structure can be displayed and
- # so that list items can be vertically aligned. The output of this
- # routine is stored in the array @forced_breakpoint_to_go, which is
- # used by sub 'break_long_lines' to set final breakpoints. This is
- # probably the most complex routine in perltidy, so I have
- # broken it into pieces and over-commented it.
- #--------------------------------------------------------------------
+ # so that list items can be vertically aligned.
+
+ # Given:
+ # $is_long_line = true if this batch requires multiple output lines
+ # $rbond_strength_bias = array of bond strengths to be updated
+ # Task:
+ # Update the array @forced_breakpoint_to_go with breakpoints.
+ # This array is used by sub 'break_long_lines' to set final
+ # breakpoints.
+
+ # This is probably the most complex routine in perltidy,
+ # so I have broken it into pieces and over-commented it.
$starting_depth = $nesting_depth_to_go[0];
$block_type = SPACE;
$current_depth = $starting_depth;
- $i = -1;
$i_last_colon = -1;
$i_line_end = -1;
$i_line_start = -1;
#----------------------------------------
# Main loop over all tokens in this batch
#----------------------------------------
- while ( ++$i <= $max_index_to_go ) {
+ foreach my $i ( 0 .. $max_index_to_go ) {
if ( $type ne 'b' ) {
$i_last_nonblank_token = $i - 1;
$last_nonblank_type = $type;
$last_nonblank_token = $token;
$last_nonblank_block_type = $block_type;
}
- $type = $types_to_go[$i];
+
+ # set break if flag was set
+ if ( $i_want_previous_break >= 0 ) {
+ $self->set_forced_breakpoint($i_want_previous_break);
+ $i_want_previous_break = -1;
+ }
+
+ $type = $types_to_go[$i];
+ next if ( $type eq 'b' );
+
+ $last_old_breakpoint_count = $old_breakpoint_count;
+
$block_type = $block_type_to_go[$i];
$token = $tokens_to_go[$i];
$type_sequence = $type_sequence_to_go[$i];
# Loop Section A: Look for special breakpoints...
#-------------------------------------------
- # set break if flag was set
- if ( $i_want_previous_break >= 0 ) {
- $self->set_forced_breakpoint($i_want_previous_break);
- $i_want_previous_break = -1;
- }
-
- $last_old_breakpoint_count = $old_breakpoint_count;
-
# Check for a good old breakpoint ..
if ( $old_breakpoint_to_go[$i] ) {
( $i_want_previous_break, $i_old_assignment_break ) =
- $self->examine_old_breakpoint( $i_next_nonblank,
+ $self->examine_old_breakpoint( $i, $i_next_nonblank,
$i_want_previous_break, $i_old_assignment_break );
}
- next if ( $type eq 'b' );
-
$depth = $nesting_depth_to_go[ $i + 1 ];
$total_depth_variation += abs( $depth - $depth_last );
# remember locations of '||' and '&&' for possible breaks if we
# decide this is a long logical expression.
- if ( $type eq '||' ) {
- push @{ $rand_or_list[$depth][2] }, $i;
- ++$has_old_logical_breakpoints[$depth]
- if ( ( $i == $i_line_start || $i == $i_line_end )
- && $rOpts_break_at_old_logical_breakpoints );
- }
- elsif ( $type eq '&&' ) {
- push @{ $rand_or_list[$depth][3] }, $i;
- ++$has_old_logical_breakpoints[$depth]
- if ( ( $i == $i_line_start || $i == $i_line_end )
- && $rOpts_break_at_old_logical_breakpoints );
- }
- elsif ( $type eq 'f' ) {
- push @{ $rfor_semicolon_list[$depth] }, $i;
- }
- elsif ( $type eq 'k' ) {
- if ( $token eq 'and' ) {
- push @{ $rand_or_list[$depth][1] }, $i;
+ if ( $quick_filter_A{$type} ) {
+ if ( $type eq '||' ) {
+ push @{ $rand_or_list[$depth][2] }, $i;
++$has_old_logical_breakpoints[$depth]
if ( ( $i == $i_line_start || $i == $i_line_end )
&& $rOpts_break_at_old_logical_breakpoints );
}
-
- # break immediately at 'or's which are probably not in a logical
- # block -- but we will break in logical breaks below so that
- # they do not add to the forced_breakpoint_count
- elsif ( $token eq 'or' ) {
- push @{ $rand_or_list[$depth][0] }, $i;
+ elsif ( $type eq '&&' ) {
+ push @{ $rand_or_list[$depth][3] }, $i;
++$has_old_logical_breakpoints[$depth]
if ( ( $i == $i_line_start || $i == $i_line_end )
&& $rOpts_break_at_old_logical_breakpoints );
- if ( $is_logical_container{ $container_type[$depth] } ) {
+ }
+ elsif ( $type eq 'f' ) {
+ push @{ $rfor_semicolon_list[$depth] }, $i;
+ }
+ elsif ( $type eq 'k' ) {
+ if ( $token eq 'and' ) {
+ push @{ $rand_or_list[$depth][1] }, $i;
+ ++$has_old_logical_breakpoints[$depth]
+ if ( ( $i == $i_line_start || $i == $i_line_end )
+ && $rOpts_break_at_old_logical_breakpoints );
}
- else {
- if ($is_long_line) { $self->set_forced_breakpoint($i) }
- elsif ( ( $i == $i_line_start || $i == $i_line_end )
+
+ # break immediately at 'or's which are probably not in a
+ # logical block -- but we will break in logical breaks
+ # below so that they do not add to the
+ # forced_breakpoint_count
+ elsif ( $token eq 'or' ) {
+ push @{ $rand_or_list[$depth][0] }, $i;
+ ++$has_old_logical_breakpoints[$depth]
+ if ( ( $i == $i_line_start || $i == $i_line_end )
+ && $rOpts_break_at_old_logical_breakpoints );
+ if ( $is_logical_container{ $container_type[$depth] } )
+ {
+ }
+ else {
+ if ($is_long_line) {
+ $self->set_forced_breakpoint($i);
+ }
+ elsif ( ( $i == $i_line_start || $i == $i_line_end )
+ && $rOpts_break_at_old_logical_breakpoints )
+ {
+ $saw_good_breakpoint = 1;
+ }
+ else {
+ ## not a good break
+ }
+ }
+ }
+ elsif ( $token eq 'if' || $token eq 'unless' ) {
+ push @{ $rand_or_list[$depth][4] }, $i;
+ if ( ( $i == $i_line_start || $i == $i_line_end )
&& $rOpts_break_at_old_logical_breakpoints )
{
- $saw_good_breakpoint = 1;
+ $self->set_forced_breakpoint($i);
}
}
- }
- elsif ( $token eq 'if' || $token eq 'unless' ) {
- push @{ $rand_or_list[$depth][4] }, $i;
- if ( ( $i == $i_line_start || $i == $i_line_end )
- && $rOpts_break_at_old_logical_breakpoints )
- {
- $self->set_forced_breakpoint($i);
+ else {
+ ## not one of: 'and' 'or' 'if' 'unless'
}
}
- }
- elsif ( $is_assignment{$type} ) {
- $i_equals[$depth] = $i;
+ elsif ( $is_assignment{$type} ) {
+ $i_equals[$depth] = $i;
+ }
+ else {
+ # error : no code to handle a type in %quick_filter_A
+ DEVEL_MODE && Fault(<<EOM);
+Missing code to handle token type '$type' which is in the quick_filter_A
+EOM
+ }
}
- #-----------------------------------------
- # Loop Section B: Handle a sequenced token
- #-----------------------------------------
if ($type_sequence) {
- $self->break_lists_type_sequence;
- }
- #------------------------------------------
- # Loop Section C: Handle Increasing Depth..
- #------------------------------------------
+ #-----------------------------------------
+ # Loop Section B: Handle a sequenced token
+ #-----------------------------------------
+ $self->break_lists_type_sequence($i);
- # hardened against bad input syntax: depth jump must be 1 and type
- # must be opening..fixes c102
- if ( $depth == $current_depth + 1 && $is_opening_type{$type} ) {
- $self->break_lists_increasing_depth();
- }
+ #------------------------------------------
+ # Loop Section C: Handle Increasing Depth..
+ #------------------------------------------
- #------------------------------------------
- # Loop Section D: Handle Decreasing Depth..
- #------------------------------------------
+ # hardened against bad input syntax: depth jump must be 1 and
+ # type must be opening..fixes c102
+ if ( $depth == $current_depth + 1 && $is_opening_type{$type} ) {
+ $self->break_lists_increasing_depth($i);
+ }
+
+ #------------------------------------------
+ # Loop Section D: Handle Decreasing Depth..
+ #------------------------------------------
- # hardened against bad input syntax: depth jump must be 1 and type
- # must be closing .. fixes c102
- elsif ( $depth == $current_depth - 1 && $is_closing_type{$type} ) {
+ # hardened against bad input syntax: depth jump must be 1 and
+ # type must be closing .. fixes c102
+ elsif ($depth == $current_depth - 1
+ && $is_closing_type{$type} )
+ {
- $self->break_lists_decreasing_depth();
+ # Note that $rbond_strength_bias will not get changed by
+ # this call. It gets changed in the call to
+ # set_comma_breakpoints at the end of this routine for
+ # commas not in lists.
+ $self->break_lists_decreasing_depth( $i,
+ $rbond_strength_bias );
- $comma_follows_last_closing_token =
- $next_nonblank_type eq ',' || $next_nonblank_type eq '=>';
+ $comma_follows_last_closing_token =
+ $next_nonblank_type eq ',' || $next_nonblank_type eq '=>';
+ }
+ else {
+ ## not a depth change
+ }
}
#----------------------------------
$current_depth = $depth;
# most token types can skip the rest of this loop
- next unless ( $quick_filter{$type} );
+ next if ( !$quick_filter_B{$type} );
+
+ # Turn off comma alignment if we are sure that this is not a list
+ # environment. To be safe, we will do this if we see certain
+ # non-list tokens, such as ';', '=', and also the environment is
+ # not a list.
+ ## $type =~ /^[\;\<\>\~f]$/ || $is_assignment{$type}
+ if ( $is_non_list_type{$type} ) {
+ if ( !$self->is_in_list_by_i($i) ) {
+ $dont_align[$depth] = 1;
+ $want_comma_break[$depth] = 0;
+ $index_before_arrow[$depth] = -1;
+
+ # no special comma breaks in C-style 'for' terms (c154)
+ if ( $type eq 'f' ) { $last_comma_index[$depth] = undef }
+ }
+ }
+
+ # handle any commas
+ elsif ( $type eq ',' ) {
+ $self->study_comma( $i, $comma_follows_last_closing_token );
+ }
# handle comma-arrow
- if ( $type eq '=>' ) {
+ elsif ( $type eq '=>' ) {
next if ( $last_nonblank_type eq '=>' );
next if $rOpts_break_at_old_comma_breakpoints;
next
$want_comma_break[$depth] = 1;
$index_before_arrow[$depth] = $i_last_nonblank_token;
next;
- }
+ }
elsif ( $type eq '.' ) {
$last_dot_index[$depth] = $i;
- }
-
- # Turn off comma alignment if we are sure that this is not a list
- # environment. To be safe, we will do this if we see certain
- # non-list tokens, such as ';', '=', and also the environment is
- # not a list.
- ## $type =~ /^[\;\<\>\~f]$/ || $is_assignment{$type}
- elsif ( $is_non_list_type{$type}
- && !$self->is_in_list_by_i($i) )
- {
- $dont_align[$depth] = 1;
- $want_comma_break[$depth] = 0;
- $index_before_arrow[$depth] = -1;
- # no special comma breaks in C-style 'for' terms (c154)
- if ( $type eq 'f' ) { $last_comma_index[$depth] = undef }
}
+ else {
- # now just handle any commas
- next if ( $type ne ',' );
- $self->study_comma($comma_follows_last_closing_token);
-
- } ## end while ( ++$i <= $max_index_to_go)
+ # error : no code to handle a type in %quick_filter_B
+ DEVEL_MODE && Fault(<<EOM);
+Missing code to handle token type '$type' which is in the quick_filter_B
+EOM
+ }
- #-------------------------------------------
- # END of loop over all tokens in this batch
- # Now set breaks for any unfinished lists ..
- #-------------------------------------------
+ } ## end main loop over tokens
+ #----------------------------------------
+ # Now set breaks for any unfinished lists
+ #----------------------------------------
foreach my $dd ( reverse( $minimum_depth .. $current_depth ) ) {
$interrupted_list[$dd] = 1;
$has_broken_sublist[$dd] = 1 if ( $dd < $current_depth );
- $self->set_comma_breakpoints( $dd, $rbond_strength_bias )
- if ( $item_count_stack[$dd] );
+ if ( $item_count_stack[$dd] ) {
+ $self->set_comma_breakpoints( $max_index_to_go + 1,
+ $dd, $rbond_strength_bias );
+
+ }
$self->set_logical_breakpoints($dd)
if ( $has_old_logical_breakpoints[$dd] );
- $self->set_for_semicolon_breakpoints($dd);
+ $self->set_for_semicolon_breakpoints($dd)
+ if ( @{ $rfor_semicolon_list[$dd] } );
# break open container...
my $i_opening = $opening_structure_index_stack[$dd];
- if ( defined($i_opening) && $i_opening >= 0 ) {
- $self->set_forced_breakpoint($i_opening)
- unless (
- is_unbreakable_container($dd)
-
- # Avoid a break which would place an isolated ' or "
- # on a line
- || ( $type eq 'Q'
- && $i_opening >= $max_index_to_go - 2
- && ( $token eq "'" || $token eq '"' ) )
- );
+ if (
+ defined($i_opening)
+ && $i_opening >= 0
+ && !is_unbreakable_container($dd)
+
+ # Avoid a break which would place an isolated ' or "
+ # on a line
+ && !(
+ $type eq 'Q'
+ && $i_opening >= $max_index_to_go - 2
+ && ( $token eq "'" || $token eq '"' )
+ )
+ )
+ {
+ $self->set_forced_breakpoint($i_opening);
}
} ## end for ( my $dd = $current_depth...)
- #----------------------------------------
- # Return the flag '$saw_good_breakpoint'.
- #----------------------------------------
- # This indicates if the input file had some good breakpoints. This
- # flag will be used to force a break in a line shorter than the
+ #------------------------------------------------
+ # Set the return the flag '$saw_good_breakpoint'.
+ #------------------------------------------------
+ # This flag indicates if the input file had some good breakpoints.
+ # It will be used to force a break in a line shorter than the
# allowed line length.
if ( $has_old_logical_breakpoints[$current_depth] ) {
$saw_good_breakpoint = 1;
{
$saw_good_breakpoint = 1;
}
+ else {
+ ## not a good breakpoint
+ }
return $saw_good_breakpoint;
} ## end sub break_lists
sub study_comma {
- # study and store info for a list comma
+ my ( $self, $i, $comma_follows_last_closing_token ) = @_;
- my ( $self, $comma_follows_last_closing_token ) = @_;
+ # Study and store info for a list comma
+ # Given:
+ # $i = index of this comma in the _to_go output batch array
+ # $comma_follows_last_closing_token = true if it follows ')' '}' or ']'
$last_dot_index[$depth] = undef;
$last_comma_index[$depth] = $i;
# but not if there is a side comment after the comma
if ( $want_comma_break[$depth] ) {
- if ( $next_nonblank_type =~ /^[\)\}\]R]$/ ) {
+ if ( $is_closing_type{$next_nonblank_type} ) {
if ($rOpts_comma_arrow_breakpoints) {
$want_comma_break[$depth] = 0;
return;
# $y - $R, -fill => 'black',
# );
my $ibreak = $index_before_arrow[$depth] - 1;
- if ( $ibreak > 0
- && $tokens_to_go[ $ibreak + 1 ] !~ /^[\)\}\]]$/ )
+ if ( $ibreak > 0
+ && !$is_closing_token{ $tokens_to_go[ $ibreak + 1 ] } )
{
if ( $tokens_to_go[$ibreak] eq '-' ) { $ibreak-- }
if ( $types_to_go[$ibreak] eq 'b' ) { $ibreak-- }
# LIKE_THIS,=> 4,
# );
# This example is for -tso but should be general rule
- if ( $tokens_to_go[ $ibreak + 1 ] ne '->'
- && $tokens_to_go[ $ibreak + 1 ] ne ',' )
+ if ( $types_to_go[ $ibreak + 1 ] ne '->'
+ && $types_to_go[ $ibreak + 1 ] ne ',' )
{
$self->set_forced_breakpoint($ibreak);
}
# Setup filters for detecting very poor breaks to ignore.
# b1097: old breaks after type 'L' and before 'R' are poor
# b1450: old breaks at 'eq' and related operators are poor
- my @q = qw(== <= >= !=);
+ my @q = qw( == <= >= != );
- @{poor_types}{@q} = (1) x scalar(@q);
- @{poor_next_types}{@q} = (1) x scalar(@q);
+ @poor_types{@q} = (1) x scalar(@q);
+ @poor_next_types{@q} = (1) x scalar(@q);
$poor_types{'L'} = 1;
$poor_next_types{'R'} = 1;
- @q = qw(eq ne le ge lt gt);
- @{poor_keywords}{@q} = (1) x scalar(@q);
- @{poor_next_keywords}{@q} = (1) x scalar(@q);
+ @q = qw( eq ne le ge lt gt );
+ @poor_keywords{@q} = (1) x scalar(@q);
+ @poor_next_keywords{@q} = (1) x scalar(@q);
} ## end BEGIN
sub examine_old_breakpoint {
- my ( $self, $i_next_nonblank, $i_want_previous_break,
+ my ( $self, $i, $i_next_nonblank, $i_want_previous_break,
$i_old_assignment_break )
= @_;
if ( $next_nonblank_type eq 'k' ) {
$poor_break ||= $poor_next_keywords{$next_nonblank_token};
}
- else { $poor_break ||= $poor_next_types{$next_nonblank_type} }
+ else {
+ $poor_break ||= $poor_next_types{$next_nonblank_type};
+ }
# Also ignore any high stress level breaks; fixes b1395
$poor_break ||= $levels_to_go[$i] >= $high_stress_level;
elsif ( $is_assignment{$next_nonblank_type} ) {
$i_old_assignment_break = $i_next_nonblank;
}
+ else {
+ ## not old assignment break
+ }
RETURN:
return ( $i_want_previous_break, $i_old_assignment_break );
sub break_lists_type_sequence {
- my ($self) = @_;
+ my ( $self, $i ) = @_;
# We have encountered a sequenced token while setting list breakpoints
+ # Given:
+ # $i = index of this token in the _to_go output batch array
+
# if closing type, one of } ) ] :
if ( $is_closing_sequence_token{$token} ) {
sub break_lists_increasing_depth {
- my ($self) = @_;
+ my ( $self, $i ) = @_;
+
+ # Given:
+ # $i = index of this token in the _to_go output batch array
#--------------------------------------------
# prepare for a new list when depth increases
$index_before_arrow[$depth] = -1;
$interrupted_list[$depth] = 0;
$item_count_stack[$depth] = 0;
- $last_nonblank_type[$depth] = $last_nonblank_type;
+ $last_nonblank_type_stack[$depth] = $last_nonblank_type;
$opening_structure_index_stack[$depth] = $i;
$breakpoint_undo_stack[$depth] = $forced_breakpoint_undo_count;
$dont_align[$depth] =
# code BLOCKS are handled at a higher level
- ##( $block_type ne EMPTY_STRING )
$block_type
# certain paren lists
sub break_lists_decreasing_depth {
- my ( $self, $rbond_strength_bias ) = @_;
+ my ( $self, $i, $rbond_strength_bias ) = @_;
+
+ # Given:
+ # $i = index of this token in the _to_go output batch array
+ # $rbond_strength_bias = list of bond strengths to be updated
# We have arrived at a closing container token in sub break_lists:
# the token at index $i is one of these: ')','}', ']'
# formatting styles
# These breaks are made by calling sub 'set_forced_breakpoint'
+ # Note that $rbond_strength_bias is passed to sub
+ # set_comma_breakpoints, but it will not be changed. It only gets
+ # changed by later calls for incomplete lists.
+
$self->check_for_new_minimum_depth( $depth, $parent_seqno_to_go[$i] )
if ( $depth < $minimum_depth );
# Set breaks at commas to display a table of values if appropriate
#-----------------------------------------------------------------
my ( $bp_count, $do_not_break_apart ) = ( 0, 0 );
- ( $bp_count, $do_not_break_apart ) =
- $self->set_comma_breakpoints( $current_depth, $rbond_strength_bias )
- if ( $item_count_stack[$current_depth] );
+ if ( $item_count_stack[$current_depth] ) {
+ ( $bp_count, $do_not_break_apart ) =
+ $self->set_comma_breakpoints( $i, $current_depth,
+ $rbond_strength_bias );
+ }
#-----------------------------------------------------------
# Now set flags needed to decide if we should break open the
{
$is_long_term =
$cab_flag == 4
- || $cab_flag == 0 && $last_nonblank_token eq ','
+ || $cab_flag == 0 && $last_nonblank_type eq ','
|| $cab_flag == 5 && $old_breakpoint_to_go[$i_opening];
}
$last_old_breakpoint_count
# or user wants to form long blocks with arrows
- || $cab_flag == 2
+ # check on _rbreak_container_ added for b1500
+ || ( $cab_flag == 2
+ && !$self->[_rbreak_container_]->{$type_sequence} )
)
# and we made breakpoints between the opening and closing
{
$self->set_forced_breakpoint($i_prev);
}
+ else {
+ ## not a breakpoint
+ }
}
}
my $icomma = $last_comma_index[$depth];
if ( defined($icomma) && ( $i_opening - $icomma ) < 5 ) {
- unless ( $forced_breakpoint_to_go[$icomma] ) {
+ if ( !$forced_breakpoint_to_go[$icomma] ) {
$self->set_forced_breakpoint($icomma);
}
}
# they are complex
set_fake_breakpoint();
}
+ else {
+ ## do not break open
+ }
return;
} ## end sub break_lists_decreasing_depth
BEGIN {
# Added 'w' to fix b1172
- my @q = qw(k w i Z ->);
+ my @q = qw( k w i Z -> );
@is_kwiZ{@q} = (1) x scalar(@q);
# added = for b1211
- @q = qw<( [ { L R } ] ) = b>;
+ @q = qw< ( [ { L R } ] ) = b >;
push @q, ',';
@is_key_type{@q} = (1) x scalar(@q);
} ## end BEGIN
# token.
my ( $self, $i_opening_paren ) = @_;
+ # Given:
+ # $i_opening_paren = index of the opening token in the _to_go arrays
+ # note: it could be any of { [ (
+
# This will be the return index
my $i_opening_minus = $i_opening_paren;
if ( $types_to_go[$i_opening_minus] eq 'b' ) { $i_opening_minus++ }
}
+ # fix for b1467
+ elsif ( $type_prev_nb eq '=' ) {
+ if ( $want_break_before{$type_prev_nb} ) {
+ $i_opening_minus = $iprev_nb;
+ }
+ }
+
+ else {
+ ## previous token not special
+ }
+
DEBUG_FIND_START && print <<EOM;
FIND_START: i=$i_opening_paren tok=$tokens_to_go[$i_opening_paren] => im=$i_opening_minus tok=$tokens_to_go[$i_opening_minus]
EOM
{ ## begin closure table_maker
- my %is_keyword_with_special_leading_term;
-
- BEGIN {
-
- # These keywords have prototypes which allow a special leading item
- # followed by a list
- my @q = qw(
- chmod
- formline
- grep
- join
- kill
- map
- pack
- printf
- push
- sprintf
- unshift
- );
- @is_keyword_with_special_leading_term{@q} = (1) x scalar(@q);
- } ## end BEGIN
-
use constant DEBUG_SPARSE => 0;
sub table_maker {
+ my ( $self, $rhash_IN ) = @_;
+
# Given a list of comma-separated items, set breakpoints at some of
# the commas, if necessary, to make it easy to read.
# This is done by making calls to 'set_forced_breakpoint'.
# $rhash_A: For contents see return from sub 'table_layout_A'
# $rhash_B: For contents see return from sub 'table_layout_B'
- my ( $self, $rhash_IN ) = @_;
-
# Find lengths of all list items needed for calculating page layout
my $rhash_A = table_layout_A($rhash_IN);
return if ( !defined($rhash_A) );
# Rule.
#-------------------------------------------------------------
if ($has_broken_sublist) {
-
$self->apply_broken_sublist_rule( $rhash_A, $interrupted );
-
return;
}
}
}
+ # Increase tol when -atc and -dtc are both used to allow for
+ # possible loss in length on next pass due to a comma. Fixes b1455.
+ if (
+ $rOpts_delete_trailing_commas
+ && $rOpts_add_trailing_commas
+
+ # optional additional restriction which works for b1455:
+ && $rOpts_extended_continuation_indentation
+ && $rOpts_continuation_indentation > $rOpts_indent_columns
+ )
+ {
+ $tol += 1;
+ }
+
+ # c410: check for $i_closing_paren > $max_index_to_go
my $i_opening_minus = $self->find_token_starting_list($i_opening_paren);
- my $excess =
- $self->excess_line_length( $i_opening_minus, $i_closing_paren );
+ my $iend = min( $i_closing_paren, $max_index_to_go );
+ my $excess = $self->excess_line_length( $i_opening_minus, $iend );
return if ( $excess + $tol <= 0 );
#---------------------------------------
my ( $self, $rhash_A, $interrupted ) = @_;
+ # Break at (almost) every comma for a list containing a broken
+ # sublist.
+
my $ritem_lengths = $rhash_A->{_ritem_lengths};
my $ri_term_begin = $rhash_A->{_ri_term_begin};
my $ri_term_end = $rhash_A->{_ri_term_end};
else {
$skipped_count = 0;
my $i_tc = $ri_term_comma->[ $j - 1 ];
- last unless defined $i_tc;
+ last unless defined($i_tc);
$self->set_forced_breakpoint($i_tc);
}
}
) = @_;
- # The number of fields worked out to be negative, so we
- # have to make an emergency fix.
+ # The computed number of table fields is negative, so we have to make
+ # an emergency fix.
my $rcomma_index = $rhash_IN->{rcomma_index};
my $next_nonblank_type = $rhash_IN->{next_nonblank_type};
sub break_multiline_list {
my ( $self, $rhash_IN, $rhash_A, $i_opening_minus ) = @_;
- # Overriden variables
+ # We have a list spanning multiple lines and are trying
+ # to decide the best way to set comma breakpoints.
+
+ # Overridden variables
my $item_count = $rhash_A->{_item_count_A};
my $identifier_count = $rhash_A->{_identifier_count_A};
# Derived variables:
- my $ritem_lengths = $rhash_A->{_ritem_lengths};
- my $ri_term_begin = $rhash_A->{_ri_term_begin};
- my $ri_term_end = $rhash_A->{_ri_term_end};
+## my $ritem_lengths = $rhash_A->{_ritem_lengths};
+## my $ri_term_begin = $rhash_A->{_ri_term_begin};
+## my $ri_term_end = $rhash_A->{_ri_term_end};
my $ri_term_comma = $rhash_A->{_ri_term_comma};
my $rmax_length = $rhash_A->{_rmax_length};
my $comma_count = $rhash_A->{_comma_count};
my $first_term_length = $rhash_A->{_first_term_length};
my $i_first_comma = $rhash_A->{_i_first_comma};
my $i_last_comma = $rhash_A->{_i_last_comma};
- my $i_true_last_comma = $rhash_A->{_i_true_last_comma};
-
- # Veriables received from caller
- my $i_opening_paren = $rhash_IN->{i_opening_paren};
- my $i_closing_paren = $rhash_IN->{i_closing_paren};
- my $rcomma_index = $rhash_IN->{rcomma_index};
- my $next_nonblank_type = $rhash_IN->{next_nonblank_type};
- my $list_type = $rhash_IN->{list_type};
- my $interrupted = $rhash_IN->{interrupted};
+## my $i_true_last_comma = $rhash_A->{_i_true_last_comma};
+
+ # Variables received from caller
+ my $i_opening_paren = $rhash_IN->{i_opening_paren};
+## my $i_closing_paren = $rhash_IN->{i_closing_paren};
+ my $rcomma_index = $rhash_IN->{rcomma_index};
+ my $next_nonblank_type = $rhash_IN->{next_nonblank_type};
+ my $list_type = $rhash_IN->{list_type};
+## my $interrupted = $rhash_IN->{interrupted};
my $rdo_not_break_apart = $rhash_IN->{rdo_not_break_apart};
my $must_break_open = $rhash_IN->{must_break_open};
+
## NOTE: these input vars from caller use the values from rhash_A (see above):
## my $item_count = $rhash_IN->{item_count};
## my $identifier_count = $rhash_IN->{identifier_count};
$item_count = $hash_B->{_item_count_B};
# New variables
- my $columns = $hash_B->{_columns};
- my $formatted_columns = $hash_B->{_formatted_columns};
- my $formatted_lines = $hash_B->{_formatted_lines};
- my $max_width = $hash_B->{_max_width};
- my $new_identifier_count = $hash_B->{_new_identifier_count};
- my $number_of_fields = $hash_B->{_number_of_fields};
- my $odd_or_even = $hash_B->{_odd_or_even};
+ my $columns = $hash_B->{_columns};
+ my $formatted_columns = $hash_B->{_formatted_columns};
+ my $formatted_lines = $hash_B->{_formatted_lines};
+ my $max_width = $hash_B->{_max_width};
+ my $new_identifier_count = $hash_B->{_new_identifier_count};
+ my $number_of_fields = $hash_B->{_number_of_fields};
+## my $odd_or_even = $hash_B->{_odd_or_even};
my $packed_columns = $hash_B->{_packed_columns};
my $packed_lines = $hash_B->{_packed_lines};
my $pair_width = $hash_B->{_pair_width};
# NOTE: we should really use the true break count here,
# which can be greater if there are large terms and
# little space, but usually this will work well enough.
- unless ($must_break_open) {
-
- if ( $break_count <= 1 ) {
- ${$rdo_not_break_apart} = 1;
- }
- elsif ( $is_lp_formatting && !$need_lp_break_open ) {
+ if ( !$must_break_open ) {
+ if ( $break_count <= 1
+ || ( $is_lp_formatting && !$need_lp_break_open ) )
+ {
${$rdo_not_break_apart} = 1;
}
}
# How many spaces across the page will we fill?
my $columns_per_line =
- ( int $number_of_fields / 2 ) * $pair_width +
+ int( $number_of_fields / 2 ) * $pair_width +
( $number_of_fields % 2 ) * $max_width;
- print STDOUT
+ print {*STDOUT}
"SPARSE:cols=$columns commas=$comma_count items:$item_count ids=$identifier_count pairwidth=$pair_width fields=$number_of_fields lines packed: $packed_lines packed_cols=$packed_columns fmtd:$formatted_lines cols /line:$columns_per_line unused:$unused_columns fmtd:$formatted_columns sparsity=$sparsity allow=$max_allowed_sparsity\n";
};
# actually looking back token by token.
if ( !$too_long && $i_opening_paren > 0 && $list_type eq '=>' ) {
my $i_opening_minus_test = $i_opening_paren - 4;
- if ( $i_opening_minus >= 0 ) {
+ if ( $i_opening_minus_test >= 0 ) {
$too_long = $self->excess_line_length( $i_opening_minus_test,
$i_effective_last_comma + 1 ) > 0;
}
$ri_ragged_break_list );
++$break_count if ($use_separate_first_term);
- unless ($must_break_open_container) {
- if ( $break_count <= 1 ) {
- ${$rdo_not_break_apart} = 1;
- }
- elsif ( $is_lp_formatting && !$need_lp_break_open ) {
+ if ( !$must_break_open_container ) {
+ if ( $break_count <= 1
+ || ( $is_lp_formatting && !$need_lp_break_open ) )
+ {
${$rdo_not_break_apart} = 1;
}
}
# Returns:
# - nothing if this list is empty, or
- # - a ref to a hash containg some derived parameters
+ # - a ref to a hash containing some derived parameters
my $i_opening_paren = $rhash_IN->{i_opening_paren};
my $i_closing_paren = $rhash_IN->{i_closing_paren};
( $types_to_go[ $i_last_comma + 1 ] eq 'b' )
? $i_last_comma + 1
: $i_last_comma;
+
+ # NOTE: $i_closing_paren = $max_index_to_go+1 for a list which does
+ # not end in a closing paren. So the following test works (c410)
my $i_e =
( $types_to_go[ $i_closing_paren - 1 ] eq 'b' )
? $i_closing_paren - 2
# Returns:
# - nothing if nothing more to do
- # - a ref to a hash containg some derived parameters
+ # - a ref to a hash containing some derived parameters
# Variables from caller
my $i_opening_paren = $rhash_IN->{i_opening_paren};
${$rdo_not_break_apart} = 1;
}
}
+ else {
+ # break
+ }
}
}
{
$number_of_fields_best = $number_of_fields_max;
}
+ else {
+ # no change
+ }
# If we are crowded and the -lp option is being used, try
# to undo some indentation
{
( $number_of_fields, $number_of_fields_best, $columns ) =
$self->lp_table_fix(
-
- $columns,
- $i_first_comma,
- $max_width,
- $number_of_fields,
- $number_of_fields_best,
- $odd_or_even,
- $pair_width,
- $ritem_lengths,
-
+ {
+ columns => $columns,
+ i_first_comma => $i_first_comma,
+ max_width => $max_width,
+ number_of_fields => $number_of_fields,
+ number_of_fields_best => $number_of_fields_best,
+ odd_or_even => $odd_or_even,
+ pair_width => $pair_width,
+ ritem_lengths => $ritem_lengths,
+ }
);
}
#------------------------------------------------------------------
# How many lines will this require?
- my $formatted_lines = $item_count / ($number_of_fields);
- if ( $formatted_lines != int $formatted_lines ) {
- $formatted_lines = 1 + int $formatted_lines;
+ my $formatted_lines = $item_count / $number_of_fields;
+ if ( $formatted_lines != int($formatted_lines) ) {
+ $formatted_lines = 1 + int($formatted_lines);
}
# So far we've been trying to fill out to the right margin. But
# try to undo some -lp indentation to improve table formatting
- my (
-
- $self, #
-
- $columns,
- $i_first_comma,
- $max_width,
- $number_of_fields,
- $number_of_fields_best,
- $odd_or_even,
- $pair_width,
- $ritem_lengths,
+ my ( $self, $rcall_hash ) = @_;
- ) = @_;
+ my $columns = $rcall_hash->{columns};
+ my $i_first_comma = $rcall_hash->{i_first_comma};
+ my $max_width = $rcall_hash->{max_width};
+ my $number_of_fields = $rcall_hash->{number_of_fields};
+ my $number_of_fields_best = $rcall_hash->{number_of_fields_best};
+ my $odd_or_even = $rcall_hash->{odd_or_even};
+ my $pair_width = $rcall_hash->{pair_width};
+ my $ritem_lengths = $rcall_hash->{ritem_lengths};
my $available_spaces =
$self->get_available_spaces_to_go($i_first_comma);
write_logfile_entry(
"List: auto formatting with $number_of_fields fields/row\n");
+ if ( $number_of_fields < 1 ) {
+ ## shouldn't happen - caller passed bad parameter
+ DEVEL_MODE && Fault("bad number of fields=$number_of_fields\n");
+ return;
+ }
+
my $j_first_break =
$use_separate_first_term
? $number_of_fields
sub study_list_complexity {
+ my ( $self, $ri_term_begin, $ri_term_end, $ritem_lengths, $max_width ) = @_;
+
# Look for complex tables which should be formatted with one term per line.
# Returns the following:
#
# $number_of_fields_best = suggested number of fields based on
# complexity; = 0 if any number may be used.
#
- my ( $self, $ri_term_begin, $ri_term_end, $ritem_lengths, $max_width ) = @_;
my $item_count = @{$ri_term_begin};
my $complex_item_count = 0;
my $number_of_fields_best = $rOpts_maximum_fields_per_table;
my $i_max = @{$ritem_lengths} - 1;
- ##my @item_complexity;
my $i_last_last_break = -3;
my $i_last_break = -2;
elsif ( $types_to_go[$ib] =~ /^[w\-]$/ ) {
$quote_count++;
}
+ else {
+ # not a quote
+ }
if ( $ib eq $ie ) {
if ( $is_quote && $tokens_to_go[$ib] =~ /\s/ ) {
}
}
else {
- if ( grep { $_ eq 'b' } @types_to_go[ $ib .. $ie ] ) {
+ if ( first { $_ eq 'b' } @types_to_go[ $ib .. $ie ] ) {
$complex_item_count++;
$weighted_length *= 2;
}
- if ( grep { $_ eq '..' } @types_to_go[ $ib .. $ie ] ) {
+ if ( first { $_ eq '..' } @types_to_go[ $ib .. $ie ] ) {
$weighted_length += 4;
}
}
# add weight for extra tokens.
$weighted_length += 2 * ( $ie - $ib );
-## my $BUB = join '', @tokens_to_go[$ib..$ie];
-## print "# COMPLEXITY:$weighted_length $BUB\n";
-
-##push @item_complexity, $weighted_length;
-
- # now mark a ragged break after this item it if it is 'long and
- # complex':
+ # mark a ragged break after this item it if it is 'long and complex':
if ( $weighted_length >= $definitely_complex ) {
# if we broke after the previous term
{
pop @i_ragged_break_list;
}
+ else {
+ # ok as is
+ }
}
my $identifier_count = $i_max + 1 - $quote_count;
sub get_maximum_fields_wanted {
+ my ($ritem_lengths) = @_;
+
# Not all tables look good with more than one field of items.
# This routine looks at a table and decides if it should be
# formatted with just one field or not.
# This coding is still under development.
- my ($ritem_lengths) = @_;
-
my $number_of_fields_best = 0;
# For just a few items, we tentatively assume just 1 field.
my $last_length = undef;
my $total_variation_1 = 0;
my $total_variation_2 = 0;
- my @total_variation_2 = ( 0, 0 );
+
+ my @total_variation_2_sums = ( 0, 0 );
foreach my $j ( 0 .. $item_count - 1 ) {
my $ll = $last_length_2[$is_odd];
if ( defined($ll) ) {
my $dl = abs( $length - $ll );
- $total_variation_2[$is_odd] += $dl;
+ $total_variation_2_sums[$is_odd] += $dl;
}
else {
$first_length_2[$is_odd] = $length;
}
$last_length_2[$is_odd] = $length;
}
- $total_variation_2 = $total_variation_2[0] + $total_variation_2[1];
+ $total_variation_2 =
+ $total_variation_2_sums[0] + $total_variation_2_sums[1];
my $factor = ( $item_count > 10 ) ? 1 : ( $item_count > 5 ) ? 0.75 : 0;
- unless ( $total_variation_2 < $factor * $total_variation_1 ) {
+ if ( $total_variation_2 >= $factor * $total_variation_1 ) {
$number_of_fields_best = 1;
}
}
sub compactify_table {
- # given a table with a certain number of fields and a certain number
+ my ( $item_count, $number_of_fields, $formatted_lines, $odd_or_even ) = @_;
+
+ # For a table with a certain number of fields and a certain number
# of lines, see if reducing the number of fields will make it look
# better.
- my ( $item_count, $number_of_fields, $formatted_lines, $odd_or_even ) = @_;
+
+ # Given:
+ # $item_count = count of list items
+ # $number_of_fields = current number of items per line
+ # $formatted_lines = number of lines this will require
+ # $odd_or_even = 1=>odd field count is ok, 2=>want even count
+
+ # Return:
+ # $number_of_fields = updated number of items per line
+
if ( $number_of_fields >= $odd_or_even * 2 && $formatted_lines > 0 ) {
my $min_fields = $number_of_fields;
+ if ( $odd_or_even < 1 ) {
+ ## shouldn't happen - caller passed bad parameter
+ DEVEL_MODE && Fault("bad value for odd_or_even=$odd_or_even\n");
+ return $number_of_fields;
+ }
+
while ($min_fields >= $odd_or_even
&& $min_fields * $formatted_lines >= $item_count )
{
$number_of_fields = $min_fields;
$min_fields -= $odd_or_even;
- }
+ } ## end while ( $min_fields >= $odd_or_even...)
}
return $number_of_fields;
} ## end sub compactify_table
sub set_ragged_breakpoints {
- # Set breakpoints in a list that cannot be formatted nicely as a
- # table.
my ( $self, $ri_term_comma, $ri_ragged_break_list ) = @_;
+ # Set breakpoints in a list that cannot be formatted nicely as a table.
+
my $break_count = 0;
foreach ( @{$ri_ragged_break_list} ) {
my $j = $ri_term_comma->[$_];
sub copy_old_breakpoints {
my ( $self, $i_first_comma, $i_last_comma ) = @_;
+
+ # We are formatting a list and have decided to make comma breaks
+ # the same as in the input file.
+
+ # If the comma style is under certain controls, and if this is a
+ # comma breakpoint with the comma at the beginning of the next
+ # line, then we must pass that index instead. This will allow sub
+ # set_forced_breakpoints to check and follow the user settings. This
+ # produces a uniform style and can prevent instability (b1422).
+ #
+ # The flag '$controlled_comma_style' will be set if the user
+ # entered any of -wbb=',' -wba=',' -kbb=',' -kba=','. It is not
+ # set for the -boc flag, but we will need to treat -boc in the
+ # same way for lists with breaks both before and after commas to
+ # avoid excess iterations.
+
+ my @i_old_breaks;
for my $i ( $i_first_comma .. $i_last_comma ) {
if ( $old_breakpoint_to_go[$i] ) {
+ push @i_old_breaks, $i;
+ }
+ }
- # If the comma style is under certain controls, and if this is a
- # comma breakpoint with the comma is at the beginning of the next
- # line, then we must pass that index instead. This will allow sub
- # set_forced_breakpoints to check and follow the user settings. This
- # produces a uniform style and can prevent instability (b1422).
- #
- # The flag '$controlled_comma_style' will be set if the user
- # entered any of -wbb=',' -wba=',' -kbb=',' -kba=','. It is not
- # needed or set for the -boc flag.
- my $ibreak = $i;
- if ( $types_to_go[$ibreak] ne ',' && $controlled_comma_style ) {
- my $index = $inext_to_go[$ibreak];
- if ( $index > $ibreak && $types_to_go[$index] eq ',' ) {
- $ibreak = $index;
- }
+ # just copy old breakpoints unless $controlled_comma_style or -boc
+ if ( !$controlled_comma_style
+ && !$rOpts_break_at_old_comma_breakpoints )
+ {
+ foreach my $ii (@i_old_breaks) {
+ $self->set_forced_breakpoint($ii);
+ }
+ return;
+ }
+
+ # Scan for commas before and after the old breakpoints...
+ my @i_breaks;
+ my $num_after;
+ my $num_before;
+ foreach my $i (@i_old_breaks) {
+ my $i_break = $i;
+ if ( $types_to_go[$i_break] ne ',' ) {
+ my $index = $inext_to_go[$i_break];
+ if ( $index > $i_break && $types_to_go[$index] eq ',' ) {
+ $i_break = $index;
+ $num_before++;
}
- $self->set_forced_breakpoint($ibreak);
}
+ else { $num_after++; }
+ push @i_breaks, $i_break;
+ }
+
+ # -boc by itself can use old breaks except when there are mixed
+ # leading and trailing commas. In that case excess iterations
+ # can occur (see b878)
+ if ( !$controlled_comma_style
+ && $rOpts_break_at_old_comma_breakpoints )
+ {
+
+ my $mixed = $num_before && $num_after;
+ if ( !$mixed ) {
+ @i_breaks = @i_old_breaks;
+ }
+ }
+
+ foreach my $ii (@i_breaks) {
+ $self->set_forced_breakpoint($ii);
}
+
return;
} ## end sub copy_old_breakpoints
sub set_nobreaks {
my ( $self, $i, $j ) = @_;
+
+ # Given:
+ # $i = starting index in _to_go arrays
+ # $j = ending index in _to_go arrays
+ # Task:
+ # set nobreak_to_go for index range $i .. $j
if ( $i >= 0 && $i <= $j && $j <= $max_index_to_go ) {
0 && do {
- my ( $a, $b, $c ) = caller();
- print STDOUT
-"NOBREAK: forced_breakpoint $forced_breakpoint_count from $a $c with i=$i max=$max_index_to_go type=$types_to_go[$i]\n";
+ my ( $pkg, $file_uu, $lno ) = caller();
+ print {*STDOUT}
+"NOBREAK: forced_breakpoint $forced_breakpoint_count from $pkg $lno with i=$i max=$max_index_to_go type=$types_to_go[$i]\n";
};
@nobreak_to_go[ $i .. $j ] = (1) x ( $j - $i + 1 );
# shouldn't happen; non-critical error
else {
if (DEVEL_MODE) {
- my ( $a, $b, $c ) = caller();
+ my ( $pkg, $file_uu, $lno ) = caller();
Fault(<<EOM);
-NOBREAK ERROR: from $a $c with i=$i j=$j max=$max_index_to_go
+NOBREAK ERROR: from $pkg $lno with i=$i j=$j max=$max_index_to_go
EOM
}
}
# return length of a line of tokens ($ibeg .. $iend)
my ( $ibeg, $iend ) = @_;
- # Start with the leading spaces on this line ...
- my $length = $leading_spaces_to_go[$ibeg];
- if ( ref($length) ) { $length = $length->get_spaces() }
+ # get the leading spaces on this line ...
+ my $spaces = $leading_spaces_to_go[$ibeg];
+ if ( ref($spaces) ) { $spaces = $spaces->get_spaces() }
# ... then add the net token length
- $length +=
- $summed_lengths_to_go[ $iend + 1 ] - $summed_lengths_to_go[$ibeg];
+ return $spaces + $summed_lengths_to_go[ $iend + 1 ] -
+ $summed_lengths_to_go[$ibeg];
- return $length;
} ## end sub total_line_length
sub excess_line_length {
- # return number of characters by which a line of tokens ($ibeg..$iend)
+ my ( $self, $ibeg, $iend, ($ignore_right_weld) ) = @_;
+
+ # Return number of characters by which a line of tokens ($ibeg..$iend)
# exceeds the allowable line length.
+ # Given:
+ # $ibeg, $iend = range of indexes of this line in the _to_go arrays
+ # $ignore_right_weld = optional flag = true to exclude any right weld
# NOTE: profiling shows that efficiency of this routine is essential.
- my ( $self, $ibeg, $iend, $ignore_right_weld ) = @_;
-
# Start with the leading spaces on this line ...
my $excess = $leading_spaces_to_go[$ibeg];
if ( ref($excess) ) { $excess = $excess->get_spaces() }
- # ... then add the net token length, minus the maximum length
- $excess +=
- $summed_lengths_to_go[ $iend + 1 ] -
- $summed_lengths_to_go[$ibeg] -
- $maximum_line_length_at_level[ $levels_to_go[$ibeg] ];
-
# ... and include right weld lengths unless requested not to
if ( $total_weld_count
&& $type_sequence_to_go[$iend]
$excess += $wr if defined($wr);
}
- return $excess;
+ # ... then add the net token length, minus the maximum length
+ return $excess +
+ $summed_lengths_to_go[ $iend + 1 ] -
+ $summed_lengths_to_go[$ibeg] -
+ $maximum_line_length_at_level[ $levels_to_go[$ibeg] ];
+
} ## end sub excess_line_length
sub get_spaces {
$rOpts_extended_line_up_parentheses = 0;
}
+ # fix for b1459: -naws adds stress for -xlp
+ if ( $high_stress_level <= 2 && !$rOpts_add_whitespace ) {
+ $rOpts_extended_line_up_parentheses = 0;
+ }
+
+ # fix for b1465: -vmll adds stress for -xlp
+ if ( $high_stress_level <= 2 && $rOpts_variable_maximum_line_length ) {
+ $rOpts_extended_line_up_parentheses = 0;
+ }
+
$rLP = [];
# initialize the leading whitespace stack to negative levels
BEGIN {
my @q = qw< } ) ] >;
@hash_test1{@q} = (1) x scalar(@q);
- @q = qw(: ? f);
+ @q = qw( : ? f );
push @q, ',';
@hash_test2{@q} = (1) x scalar(@q);
@q = qw( . || && );
my ($self) = @_;
- #------------------------------------------------------------------
# Define the leading whitespace for all tokens in the current batch
# when the -lp formatting is selected.
- #------------------------------------------------------------------
- return unless ($rOpts_line_up_parentheses);
- return unless ( defined($max_index_to_go) && $max_index_to_go >= 0 );
+ # Returns number of tokens in this batch which have leading spaces
+ # defined by an lp object:
+ my $lp_object_count_this_batch = 0;
+
+ # Safety check: this should not be called when there is nothing to do
+ if ( !$rOpts_line_up_parentheses
+ || !defined($max_index_to_go)
+ || $max_index_to_go < 0 )
+ {
+ my $lp_str =
+ defined($rOpts_line_up_parentheses)
+ ? $rOpts_line_up_parentheses
+ : 'undef';
+ my $max_str =
+ defined($max_index_to_go) ? $max_index_to_go : 'undef';
+ DEVEL_MODE
+ && Fault(
+"should not be here with -lp=$lp_str -max_index_to_go=$max_str\n"
+ );
+ return $lp_object_count_this_batch;
+ }
# List of -lp indentation objects created in this batch
$rlp_object_list = [];
my %last_lp_equals = ();
- my $rLL = $self->[_rLL_];
- my $starting_in_quote = $self->[_this_batch_]->[_starting_in_quote_];
+ my $rLL = $self->[_rLL_];
+ my $this_batch = $self->[_this_batch_];
+
+ my $starting_in_quote = $this_batch->[_starting_in_quote_];
my $imin = 0;
elsif ( $is_assignment{$type} ) {
$last_lp_equals{$total_depth} = $ii;
}
+ else {
+ ## not a special type
+ }
# this token might start a new line if ..
if (
# replace leading whitespace with indentation objects where used
#---------------------------------------------------------------
if ( $rLP->[$max_lp_stack]->[_lp_object_] ) {
+ $lp_object_count_this_batch++;
my $lp_object = $rLP->[$max_lp_stack]->[_lp_object_];
$leading_spaces_to_go[$ii] = $lp_object;
if ( $max_lp_stack > 0
undo_incomplete_lp_indentation()
if ( !$rOpts_extended_line_up_parentheses );
- return;
+ return $lp_object_count_this_batch;
} ## end sub set_lp_indentation
sub lp_equals_break_check {
if ( $types_to_go[$i_test] eq 'b' );
}
elsif ( $types_to_go[ $i_test + 1 ] eq 'b' ) { $i_test++ }
+ else {
+ # no change needed
+ }
my $test_position = total_line_length( $i_test, $ii );
my $mll = $maximum_line_length_at_level[ $levels_to_go[$i_test] ];
sub lp_decreasing_depth {
my ( $self, $ii ) = @_;
+ # This is called by sub set_lp_indentation for a token at index $ii
+ # which has a lower nesting depth compared to the previous token.
+ # We have to update the stack variables for the new indentation.
+
my $rLL = $self->[_rLL_];
my $level = $levels_to_go[$ii];
my $ci_level = $ci_levels_to_go[$ii];
- # loop to find the first entry at or completely below this level
+ # loop to decrease $max_lp_stack until we find the first entry at or
+ # completely below this level
while (1) {
# Be sure we have not hit the stack bottom - should never
# happen because only negative levels can get here, and
# $level was forced to be positive above.
- if ( !$max_lp_stack ) {
+ if ( $max_lp_stack <= 0 ) {
# non-fatal, just keep going except in DEVEL_MODE
if (DEVEL_MODE) {
my $total_depth = $nesting_depth_to_go[$ii];
$comma_count = $lp_comma_count{$total_depth};
$arrow_count = $lp_arrow_count{$total_depth};
- $comma_count = 0 unless $comma_count;
- $arrow_count = 0 unless $arrow_count;
+ $comma_count = 0 if ( !defined($comma_count) );
+ $arrow_count = 0 if ( !defined($arrow_count) );
}
$lp_object->set_comma_count($comma_count);
}
}
+ #------------------
# go down one level
+ #------------------
--$max_lp_stack;
my $rLP_top = $rLP->[$max_lp_stack];
$current_ci_level = $ci_lev;
last;
}
- }
+ } ## end while (1)
return;
} ## end sub lp_decreasing_depth
sub lp_increasing_depth {
my ( $self, $ii ) = @_;
+ # This is called by sub set_lp_indentation for a token at index $ii
+ # which has an increasing nesting depth compared to the previous token.
+ # We have to update the stack variables for the new indentation.
+
my $rLL = $self->[_rLL_];
my $type = $types_to_go[$ii];
}
$available_spaces = $test_space_count - $min_gnu_indentation;
+ # Fix for combo -naws and -xlp (b1501; also b1466)
+ my $tol = !$rOpts_add_whitespace
+ && $rOpts_extended_line_up_parentheses ? 1 : 0;
+
# Do not startup -lp indentation mode if no space ...
# ... or if it puts the opening far to the right
if ( !$in_lp_mode
- && ( $available_spaces <= 0 || $next_opening_too_far ) )
+ && ( $available_spaces <= $tol || $next_opening_too_far ) )
{
$space_count += $standard_increment;
$available_spaces = 0;
# comment we don't know what the space to the actual
# next code token will be. We will allow a space for
# sub correct_lp to move it in if necessary.
- if ( $type eq '#'
+ # NOTE for c314, c400: this fix is not really necessary,
+ # and it caused a DEVEL_MODE fault when -i=0.
+ # It could be completely removed, but this would change
+ # existing formatting in a few cases. So for now, the fix
+ # is to only skip this if -i=0.
+ if (
+ $type eq '#'
&& $max_index_to_go > 0
- && $align_seqno )
+ && $align_seqno
+
+ # fix for c314, c400 (see above note)
+ && $rOpts_indent_columns > 0
+ )
{
$available_spaces += 1;
}
available_spaces => $available_spaces,
lp_item_index => $lp_item_index,
align_seqno => $align_seqno,
- stack_depth => $max_lp_stack,
K_begin_line => $K_begin_line,
standard_spaces => $standard_spaces,
K_extra_space => $K_extra_space,
DEBUG_LP && do {
my $tok_beg = $rLL->[$K_begin_line]->[_TOKEN_];
my $token = $tokens_to_go[$ii];
- print STDERR <<EOM;
+ print {*STDOUT} <<EOM;
DEBUG_LP: Created object at tok=$token type=$type for seqno $align_seqno level=$level ci=$ci_level spaces=$space_count avail=$available_spaces kbeg=$K_begin_line tokbeg=$tok_beg lp=$lp_position_predictor
EOM
};
sub check_for_long_gnu_style_lines {
- # look at the current estimated maximum line length, and
+ # Look at the current estimated maximum line length, and
# remove some whitespace if it exceeds the desired maximum
my ($ii_to_go) = @_;
+ # Given:
+ # $ii_to_go = index of current token under consideration
+
# nothing can be done if no stack items defined for this line
return if ( $max_lp_object_list < 0 );
my $tol = 2;
# But reduce tol to 0 at a terminal comma; fixes b1432
- if ( $tokens_to_go[$ii_to_go] eq ','
+ if ( $types_to_go[$ii_to_go] eq ','
&& $ii_to_go < $max_index_to_go )
{
my $in = $ii_to_go + 1;
my $ci_level =
$rlp_object_list->[$i_debug]->get_ci_level();
my $old_level = $rlp_object_list->[$i]->get_level();
- my $old_ci_level =
+ my $old_ci_level_uu =
$rlp_object_list->[$i]->get_ci_level();
Fault(<<EOM);
program bug with -lp: want to delete $deleted_spaces from item $i, but old=$old_spaces deleted: lev=$level ci=$ci_level deleted: level=$old_level ci=$ci_level
EOM
}
}
- }
+ } ## end while ( ++$i <= $max_lp_object_list)
$lp_position_predictor -= $deleted_spaces;
$spaces_needed -= $deleted_spaces;
- last unless ( $spaces_needed > 0 );
+ last if ( $spaces_needed <= 0 );
}
return;
} ## end sub check_for_long_gnu_style_lines
sub reduce_lp_indentation {
- # reduce the leading whitespace at token $i if possible by $spaces_needed
- # (a large value of $spaces_needed will remove all excess space)
+ my ( $self, $i, $spaces_wanted ) = @_;
+
+ # Reduce the leading whitespace at token $i if possible by $spaces_wanted
+ # (a large value of $spaces_wanted will remove all excess space)
# NOTE: to be called from break_lists only for a sequence of tokens
# contained between opening and closing parens/braces/brackets
- my ( $self, $i, $spaces_wanted ) = @_;
my $deleted_spaces = 0;
my $item = $leading_spaces_to_go[$i];
sub check_convey_batch_input {
+ my ($self) = @_;
+
# Check for valid input to sub convey_batch_to_vertical_aligner. An
# error here would most likely be due to an error in the calling
# routine 'sub grind_batch_of_CODE'.
- my ( $self, $ri_first, $ri_last ) = @_;
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
+
+ # $ri_first = ref to list of starting line indexes in _to_go arrays
+ # $ri_last = ref to list of ending line indexes in _to_go arrays
if ( !defined($ri_first) || !defined($ri_last) ) {
Fault(<<EOM);
Undefined line ranges ri_first and/r ri_last
}
my ( $ibeg, $iend );
foreach my $n ( 0 .. $nmax ) {
- my $ibeg_m = $ibeg;
my $iend_m = $iend;
$ibeg = $ri_first->[$n];
$iend = $ri_last->[$n];
my $ris_list_by_seqno = $self->[_ris_list_by_seqno_];
my $this_batch = $self->[_this_batch_];
- my $do_not_pad = $this_batch->[_do_not_pad_];
- my $starting_in_quote = $this_batch->[_starting_in_quote_];
- my $ending_in_quote = $this_batch->[_ending_in_quote_];
- my $is_static_block_comment = $this_batch->[_is_static_block_comment_];
- my $batch_CODE_type = $this_batch->[_batch_CODE_type_];
- my $ri_first = $this_batch->[_ri_first_];
- my $ri_last = $this_batch->[_ri_last_];
+ my $do_not_pad = $this_batch->[_do_not_pad_];
+ my $batch_CODE_type = $this_batch->[_batch_CODE_type_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
- $self->check_convey_batch_input( $ri_first, $ri_last ) if (DEVEL_MODE);
+ $self->check_convey_batch_input() if (DEVEL_MODE);
my $n_last_line = @{$ri_first} - 1;
- my $ibeg_next = $ri_first->[0];
- my $iend_next = $ri_last->[0];
+ my $ibeg = $ri_first->[0];
+ my $iend = $ri_last->[0];
- my $type_beg_next = $types_to_go[$ibeg_next];
- my $type_end_next = $types_to_go[$iend_next];
- my $token_beg_next = $tokens_to_go[$ibeg_next];
+ my $type_beg = $types_to_go[$ibeg];
+ my $type_end = $types_to_go[$iend];
+ my $token_beg = $tokens_to_go[$ibeg];
my $rindentation_list = [0]; # ref to indentations for each line
- my ( $cscw_block_comment, $closing_side_comment, $is_block_comment );
+ my ( $cscw_block_comment, $closing_side_comment, $is_block_comment,
+ $is_HSC );
- if ( !$max_index_to_go && $type_beg_next eq '#' ) {
- $is_block_comment = 1;
+ if ( !$max_index_to_go
+ && $type_beg eq '#' )
+ {
+ if ( $batch_CODE_type && $batch_CODE_type eq 'HSC' ) { $is_HSC = 1 }
+ else { $is_block_comment = 1 }
}
if ($rOpts_closing_side_comments) {
( $closing_side_comment, $cscw_block_comment ) =
- $self->add_closing_side_comment( $ri_first, $ri_last );
- }
-
- if ( $n_last_line > 0 || $rOpts_extended_continuation_indentation ) {
- $self->undo_ci( $ri_first, $ri_last,
- $this_batch->[_rix_seqno_controlling_ci_] );
+ $self->add_closing_side_comment();
}
# for multi-line batches ...
if ( $n_last_line > 0 ) {
+ # undo continuation indentation
+ $self->undo_ci();
+
# flush before a long if statement to avoid unwanted alignment
$self->flush_vertical_aligner()
- if ( $type_beg_next eq 'k'
- && $is_if_unless{$token_beg_next} );
+ if ( $type_beg eq 'k'
+ && $is_if_unless{$token_beg} );
- $self->set_logical_padding( $ri_first, $ri_last, $starting_in_quote )
+ $self->set_logical_padding()
if ($rOpts_logical_padding);
- $self->xlp_tweak( $ri_first, $ri_last )
+ $self->xlp_tweak()
if ($rOpts_extended_line_up_parentheses);
}
+ # -xci must undo continuation indentation even for single lines
+ elsif ($rOpts_extended_continuation_indentation) {
+ $self->undo_ci();
+ }
+ else {
+ # ok: single line, no -xci
+ }
+
if (DEVEL_MODE) { $self->check_batch_summed_lengths() }
# ----------------------------------------------------------
# define the vertical alignments for all lines of this batch
# ----------------------------------------------------------
- my $rline_alignments =
- $self->make_vertical_alignments( $ri_first, $ri_last );
+ my $rline_alignments;
+
+ # Quick handling of lines with a single tokens
+ if ( !$max_index_to_go ) {
+
+ # Hanging side comment
+ if ($is_HSC) {
+ $rline_alignments = make_HSC_vertical_alignments();
+ }
+
+ # All Other single tokens
+ # = [ [ $rtokens, $rfields, $rpatterns, $rfield_lengths ] ];
+ else {
+
+ $rline_alignments = [
+ [
+ [],
+ [ $tokens_to_go[0] ],
+ [ $types_to_go[0] ],
+ [ $summed_lengths_to_go[1] - $summed_lengths_to_go[0] ],
+ ]
+ ];
+ }
+ }
+
+ # Multiple tokens
+ else {
+ $rline_alignments = $self->make_vertical_alignments();
+ }
# ----------------------------------------------
# loop to send each line to the vertical aligner
# ----------------------------------------------
- my ( $type_beg, $type_end, $token_beg, $ljump );
+ my ( $type_beg_last, $type_end_last, $ibeg_next, $iend_next, $ljump );
- for my $n ( 0 .. $n_last_line ) {
+ foreach my $nline ( 0 .. $n_last_line ) {
# ----------------------------------------------------------------
# This hash will hold the args for vertical alignment of this line
# ----------------------------------------------------------------
my $rvao_args = {};
- my $type_beg_last = $type_beg;
- my $type_end_last = $type_end;
-
- my $ibeg = $ibeg_next;
- my $iend = $iend_next;
- my $Kbeg = $K_to_go[$ibeg];
- my $Kend = $K_to_go[$iend];
+ if ( $nline > 0 ) {
+ $type_beg_last = $type_beg;
+ $type_end_last = $type_end;
- $type_beg = $type_beg_next;
- $type_end = $type_end_next;
- $token_beg = $token_beg_next;
+ $ibeg = $ibeg_next;
+ $iend = $iend_next;
- # ---------------------------------------------------
- # Define the check value 'Kend' to send for this line
- # ---------------------------------------------------
- # The 'Kend' value is an integer for checking that lines come out of
- # the far end of the pipeline in the right order. It increases
- # linearly along the token stream. But we only send ending K values of
- # non-comments down the pipeline. This is equivalent to checking that
- # the last CODE_type is blank or equal to 'VER'. See also sub
- # resync_lines_and_tokens for related coding. Note that
- # '$batch_CODE_type' is the code type of the line to which the ending
- # token belongs.
- my $Kend_code =
- $batch_CODE_type && $batch_CODE_type ne 'VER' ? undef : $Kend;
+ $type_beg = $types_to_go[$ibeg];
+ $type_end = $types_to_go[$iend];
+ $token_beg = $tokens_to_go[$ibeg];
+ }
- # Get some vars on line [n+1], if any,
- # and define $ljump = level jump needed by 'sub get_final_indentation'
- if ( $n < $n_last_line ) {
- $ibeg_next = $ri_first->[ $n + 1 ];
- $iend_next = $ri_last->[ $n + 1 ];
+ my $Kbeg = $K_to_go[$ibeg];
+ my $Kend = $K_to_go[$iend];
- $type_beg_next = $types_to_go[$ibeg_next];
- $type_end_next = $types_to_go[$iend_next];
- $token_beg_next = $tokens_to_go[$ibeg_next];
+ if ( $nline < $n_last_line ) {
+ $ibeg_next = $ri_first->[ $nline + 1 ];
+ $iend_next = $ri_last->[ $nline + 1 ];
my $Kbeg_next = $K_to_go[$ibeg_next];
- $ljump = $rLL->[$Kbeg_next]->[_LEVEL_] - $rLL->[$Kend]->[_LEVEL_];
+ $ljump =
+ $rLL->[$Kbeg_next]->[_LEVEL_] - $rLL->[$Kend]->[_LEVEL_];
}
elsif ( !$is_block_comment && $Kend < $Klimit ) {
$ljump = 0;
}
+ # ---------------------------------------------------
+ # Define the check value 'Kend' to send for this line
+ # ---------------------------------------------------
+ # The 'Kend' value is an integer for checking that lines come out of
+ # the far end of the pipeline in the right order. It increases
+ # linearly along the token stream. But we only send ending K values of
+ # non-comments down the pipeline. This is equivalent to checking that
+ # the last CODE_type is blank or equal to 'VER'. See also sub
+ # resync_lines_and_tokens for related coding. Note that
+ # '$batch_CODE_type' is the code type of the line to which the ending
+ # token belongs.
+ if ( !$batch_CODE_type || $batch_CODE_type eq 'VER' ) {
+ $rvao_args->{Kend} = $Kend;
+ }
+
# ---------------------------------------------
# get the vertical alignment info for this line
# ---------------------------------------------
# to achieve vertical alignment. These fields are the actual text
# which will be output, so from here on no more changes can be made to
# the text.
- my $rline_alignment = $rline_alignments->[$n];
- my ( $rtokens, $rfields, $rpatterns, $rfield_lengths ) =
- @{$rline_alignment};
+ my $rline_alignment = $rline_alignments->[$nline];
# Programming check: (shouldn't happen)
# The number of tokens which separate the fields must always be
# one less than the number of fields. If this is not true then
# an error has been introduced in sub make_alignment_patterns.
if (DEVEL_MODE) {
+ my ( $rtokens, $rfields, $rpatterns_uu, $rfield_lengths_uu ) =
+ @{$rline_alignment};
if ( @{$rfields} && ( @{$rtokens} != ( @{$rfields} - 1 ) ) ) {
my $nt = @{$rtokens};
my $nf = @{$rfields};
$ibeg,
$iend,
- $rfields,
- $rpatterns,
- $ri_first,
- $ri_last,
$rindentation_list,
$ljump,
- $starting_in_quote,
- $is_static_block_comment,
);
&& $rOpts_outdent_long_comments
# but not if this is a static block comment
- && !$is_static_block_comment
+ && !$this_batch->[_is_static_block_comment_]
)
)
{
# flush at an 'if' which follows a line with (1) terminal semicolon
# or (2) terminal block_type which is not an 'if'. This prevents
# unwanted alignment between the lines.
- elsif ( $type_beg eq 'k' && $token_beg eq 'if' ) {
+ elsif ( $token_beg eq 'if' && $type_beg eq 'k' ) {
my $type_m = 'b';
my $block_type_m;
$rvao_args->{break_alignment_before} = 1;
}
}
+ else {
+ # do not need to break vertical alignment here
+ }
# ----------------------------------
# define 'rvertical_tightness_flags'
# ----------------------------------
# These flags tell the vertical aligner if/when to combine consecutive
# lines, based on the user input parameters.
- $rvao_args->{rvertical_tightness_flags} =
- $self->set_vertical_tightness_flags( $n, $n_last_line, $ibeg, $iend,
- $ri_first, $ri_last, $ending_in_quote, $closing_side_comment )
- unless ( $is_block_comment
- || $self->[_no_vertical_tightness_flags_] );
+ if ( !$is_block_comment
+ && !$self->[_no_vertical_tightness_flags_] )
+ {
+ $rvao_args->{rvertical_tightness_flags} =
+ $self->set_vertical_tightness_flags( $nline,
+ $closing_side_comment );
+ }
# ----------------------------------
# define 'is_terminal_ternary' flag
# : ' elsewhere in this document'
# );
#
- if ( $type_beg eq ':' || $n > 0 && $type_end_last eq ':' ) {
+ if ( $type_beg eq ':' || $nline > 0 && $type_end_last eq ':' ) {
my $is_terminal_ternary = 0;
- my $last_leading_type = $n > 0 ? $type_beg_last : ':';
+ my $last_leading_type = $nline > 0 ? $type_beg_last : ':';
my $terminal_type = $types_to_go[$i_terminal];
if ( $terminal_type ne ';'
- && $n_last_line > $n
+ && $n_last_line > $nline
&& $level_end == $lev )
{
my $Kbeg_next = $K_to_go[$ibeg_next];
# );
$is_terminal_ternary = 1;
- my $KP = $rLL->[$Kbeg]->[_KNEXT_SEQ_ITEM_];
- while ( defined($KP) && $KP <= $Kend ) {
+ my $rK_next_seqno_by_K = $self->[_rK_next_seqno_by_K_];
+ my $KP = $Kbeg;
+ my $Knext_last = $KP;
+ while ( defined( $KP = $rK_next_seqno_by_K->[$KP] ) ) {
+
+ if ( $KP <= $Knext_last ) {
+ ## shouldn't happen: $rK_next_seqno_by_K is corrupted
+ DEVEL_MODE && Fault(<<EOM);
+Knext should not increase: Knext_last=$Knext_last >= Knext=$KP
+EOM
+ last;
+ }
+ $Knext_last = $KP;
+
+ last if ( $KP > $Kend );
my $type_KP = $rLL->[$KP]->[_TYPE_];
if ( $type_KP eq '?' || $type_KP eq ':' ) {
$is_terminal_ternary = 0;
last;
}
- $KP = $rLL->[$KP]->[_KNEXT_SEQ_ITEM_];
- }
+ } ## end while ( defined( $KP = $rK_next_seqno_by_K...))
}
$rvao_args->{is_terminal_ternary} = $is_terminal_ternary;
}
# -------------------------------------------------
# add any new closing side comment to the last line
# -------------------------------------------------
- if ( $closing_side_comment && $n == $n_last_line && @{$rfields} ) {
+ if ( $closing_side_comment && $nline == $n_last_line ) {
+ my ( $rtokens, $rfields, $rpatterns, $rfield_lengths ) =
+ @{$rline_alignment};
- $rfields->[-1] .= " $closing_side_comment";
+ if ( @{$rfields} ) {
+ $rfields->[-1] .= " $closing_side_comment";
- # NOTE: Patch for csc. We can just use 1 for the length of the csc
- # because its length should not be a limiting factor from here on.
- $rfield_lengths->[-1] += 2;
+ # NOTE: Patch for csc. We can just use 1 for the length of the csc
+ # because its length should not be a limiting factor from here on.
+ $rfield_lengths->[-1] += 2;
- # repack
- $rline_alignment =
- [ $rtokens, $rfields, $rpatterns, $rfield_lengths ];
+ # repack
+ $rline_alignment =
+ [ $rtokens, $rfields, $rpatterns, $rfield_lengths ];
+ }
}
# ------------------------
# -----------------------------------
# Store the remaining non-flag values
# -----------------------------------
- $rvao_args->{Kend} = $Kend_code;
$rvao_args->{ci_level} = $ci_levels_to_go[$ibeg];
$rvao_args->{indentation} = $indentation;
$rvao_args->{level_end} = $nesting_depth_end;
# and either
&& (
# line has either single opening token
- $iend_next == $ibeg_next
+ $iend == $ibeg
# or is a single token followed by opening token.
# Note that sub identifiers have blanks like 'sub doit'
# $token_beg !~ /\s+/
- || ( $iend_next - $ibeg_next <= 2 && index( $token_beg, SPACE ) < 0 )
+ || ( $iend - $ibeg <= 2 && index( $token_beg, SPACE ) < 0 )
)
# and limit total to 10 character widths
- && token_sequence_length( $ibeg_next, $iend_next ) <= 10;
+ && token_sequence_length( $ibeg, $iend ) <= 10;
- # remember indentation of lines containing opening containers for
+ # Remember indentation of lines containing opening containers for
# later use by sub get_final_indentation
- $self->save_opening_indentation( $ri_first, $ri_last,
- $rindentation_list, $this_batch->[_runmatched_opening_indexes_] )
+ $self->save_opening_indentation($rindentation_list)
if ( $this_batch->[_runmatched_opening_indexes_]
|| $types_to_go[$max_index_to_go] eq 'q' );
- # output any new -cscw block comment
+ # Output any new -cscw block comment
if ($cscw_block_comment) {
$self->flush_vertical_aligner();
my $file_writer_object = $self->[_file_writer_object_];
sub check_batch_summed_lengths {
- my ( $self, $msg ) = @_;
+ my ( $self, ($msg) ) = @_;
+
+ # Debug routine for summed lengths
+ # $msg = optional debug message
+
$msg = EMPTY_STRING unless defined($msg);
my $rLL = $self->[_rLL_];
# These 'tokens' are not aligned. We need this to remove [
# from the above list because it has type ='{'
- @q = qw([);
+ @q = qw( [ );
@is_not_vertical_alignment_token{@q} = (1) x scalar(@q);
# these are the only types aligned at a line end
- @q = qw(&& || =>);
+ @q = qw( && || => );
@is_terminal_alignment_type{@q} = (1) x scalar(@q);
# these tokens only align at line level
@is_low_level_alignment_token{@q} = (1) x scalar(@q);
# eq and ne were removed from this list to improve alignment chances
- @q = qw(if unless and or err for foreach while until);
+ @q = qw( if unless and or err for foreach while until );
@is_vertical_alignment_keyword{@q} = (1) x scalar(@q);
} ## end BEGIN
+ # These are the main return variables. They are closure variables
+ # for efficient access by sub .._token_loop needs.
my $ralignment_type_to_go;
my $ralignment_counts;
- my $ralignment_hash_by_line;
sub set_vertical_alignment_markers {
- my ( $self, $ri_first, $ri_last ) = @_;
+ my ($self) = @_;
- #----------------------------------------------------------------------
- # This routine looks at output lines for certain tokens which can serve
- # as vertical alignment markers (such as an '=').
- #----------------------------------------------------------------------
+ # This routine looks at all output lines of a batch for certain tokens
+ # which can serve as vertical alignment markers (such as an '=').
- # Input parameters:
- # $ri_first = ref to list of starting line indexes in _to_go arrays
- # $ri_last = ref to list of ending line indexes in _to_go arrays
+ # $ri_first = ref to list of starting line indexes in _to_go arrays
+ # $ri_last = ref to list of ending line indexes in _to_go arrays
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
# Method: We look at each token $i in this output batch and set
# $ralignment_type_to_go->[$i] equal to those tokens at which we would
# accept vertical alignment.
- # Initialize closure (and return) variables:
- $ralignment_type_to_go = [];
- $ralignment_counts = [];
- $ralignment_hash_by_line = [];
+ #----------------------------
+ # Initialize return variables
+ #----------------------------
+ $ralignment_type_to_go = [];
+ $ralignment_counts = [];
# NOTE: closing side comments can insert up to 2 additional tokens
# beyond the original $max_index_to_go, so we need to check ri_last for
my $token = $tokens_to_go[$max_i];
my $KK = $K_to_go[$max_i];
+ my $is_closing_block = $types_to_go[$i_terminal] eq '}'
+ && $tokens_to_go[$i_terminal] eq '}';
+
+ # Patch to check for asub closing side comments (c380)
+ # These follow '};' rather than bare '}'
+ $is_closing_block ||=
+ $types_to_go[$i_terminal] eq ';'
+ && $i_terminal == $inext_to_go[0]
+ && $types_to_go[0] eq '}'
+ && $tokens_to_go[0] eq '}';
+
# Do not align various special side comments
my $do_not_align = (
# it is any specially marked side comment
( defined($KK) && $self->[_rspecial_side_comment_type_]->{$KK} )
- # or it is a static side comment
+ # or it is a static side comment
|| ( $rOpts->{'static-side-comments'}
&& $token =~ /$static_side_comment_pattern/ )
# or a closing side comment
- || ( $types_to_go[$i_terminal] eq '}'
- && $tokens_to_go[$i_terminal] eq '}'
+ || ( $is_closing_block
&& $token =~ /$closing_side_comment_prefix_pattern/ )
);
if ( !$do_not_align ) {
$ralignment_type_to_go->[$max_i] = '#';
- $ralignment_hash_by_line->[$max_line]->{$max_i} = '#';
$ralignment_counts->[$max_line]++;
}
}
# Loop over each line of this batch ...
# -------------------------------------
- foreach my $line ( 0 .. $max_line ) {
+ foreach my $nline ( 0 .. $max_line ) {
- my $ibeg = $ri_first->[$line];
- my $iend = $ri_last->[$line];
+ my $ibeg = $ri_first->[$nline];
+ my $iend = $ri_last->[$nline];
next if ( $iend <= $ibeg );
#----------------------------------
# Loop over all tokens on this line
#----------------------------------
- $self->set_vertical_alignment_markers_token_loop( $line, $ibeg,
+ $self->set_vertical_alignment_markers_token_loop( $nline, $ibeg,
$iend );
}
RETURN:
- return ( $ralignment_type_to_go, $ralignment_counts,
- $ralignment_hash_by_line );
+ return ( $ralignment_type_to_go, $ralignment_counts );
} ## end sub set_vertical_alignment_markers
- sub set_vertical_alignment_markers_token_loop {
- my ( $self, $line, $ibeg, $iend ) = @_;
+ my %is_dot_question_colon;
- # Set vertical alignment markers for the tokens on one line
- # of the current output batch. This is done by updating the
- # three closure variables:
- # $ralignment_type_to_go
- # $ralignment_counts
- # $ralignment_hash_by_line
+ BEGIN {
+ my @q = qw( . ? : );
+ @is_dot_question_colon{@q} = (1) x scalar(@q);
+ }
+
+ sub set_vertical_alignment_markers_token_loop {
+ my ( $self, $nline, $ibeg, $iend ) = @_;
# Input parameters:
- # $line = index of this line in the current batch
+ # $nline = index of this line in the current batch
# $ibeg, $iend = index range of tokens to check in the _to_go arrays
+ # Task:
+ # Set vertical alignment markers for the tokens on one line
+ # of the current output batch. This is done by updating the
+ # three closure variables needed by sub 'make_alignment_patterns':
+ # $ralignment_type_to_go - alignment type of tokens, like '=', if any
+ # $ralignment_counts - number of alignment tokens in the line
+
my $level_beg = $levels_to_go[$ibeg];
my $token_beg = $tokens_to_go[$ibeg];
my $type_beg = $types_to_go[$ibeg];
- my $type_beg_special_char =
- ( $type_beg eq '.' || $type_beg eq ':' || $type_beg eq '?' );
my $last_vertical_alignment_BEFORE_index = -1;
my $vert_last_nonblank_type = $type_beg;
- my $vert_last_nonblank_token = $token_beg;
# ----------------------------------------------------------------
# Initialization code merged from 'sub delete_needless_alignments'
$i_elsif_close = $mate_index_to_go[$i_good_paren];
if ( !defined($i_elsif_close) ) { $i_elsif_close = -1 }
}
- } ## end if ( $type_beg eq 'k' )
+ }
# --------------------------------------------
# Loop over each token in this output line ...
# --------------------------------------------
+ my $type;
foreach my $i ( $ibeg + 1 .. $iend ) {
- next if ( $types_to_go[$i] eq 'b' );
+ next if ( ( $type = $types_to_go[$i] ) eq 'b' );
- my $type = $types_to_go[$i];
- my $token = $tokens_to_go[$i];
- my $alignment_type = EMPTY_STRING;
+ my $token = $tokens_to_go[$i];
+ my $alignment_type;
# ----------------------------------------------
# Check for 'paren patch' : Remove excess parens
{
if ( $ralignment_type_to_go->[$imate] ) {
$ralignment_type_to_go->[$imate] = EMPTY_STRING;
- $ralignment_counts->[$line]--;
- delete $ralignment_hash_by_line->[$line]->{$imate};
+ $ralignment_counts->[$nline]--;
}
pop @imatch_list;
}
# /^(if|unless|and|or|eq|ne)$/
if ( $is_vertical_alignment_keyword{$token} ) {
$alignment_type = $token;
+
+ # Align postfix 'unless' and 'if' if requested (git #116)
+ # These are the only equivalent keywords. For equivalent
+ # token types see '%operator_map'.
+ if ( $token eq 'unless' && $rOpts_valign_if_unless ) {
+ $alignment_type = 'if';
+ }
}
}
{
$alignment_type = $token;
+ if ( $rOpts_valign_wide_equals && $is_assignment{$type} ) {
+ $alignment_type = '=';
+ }
+
# Do not align a terminal token. Although it might
# occasionally look ok to do this, this has been found to be
# a good general rule. The main problems are:
# $PDL::IO::Pic::biggrays
# ? ( m/GIF/ ? 0 : 1 )
# : ( m/GIF|RAST|IFF/ ? 0 : 1 );
- if ( $type_beg_special_char
- && $i == $ibeg + 2
+ if ( $i == $ibeg + 2
+ && $is_dot_question_colon{$type_beg}
&& $types_to_go[ $i - 1 ] eq 'b' )
{
$alignment_type = EMPTY_STRING;
# elsif ( $b ) { &b }
# ^-------------------aligned parens
if ( $vert_last_nonblank_type eq 'k'
- && !$is_if_unless_elsif{$vert_last_nonblank_token} )
+ && !$is_if_unless_elsif{ $tokens_to_go[ $i - 2 ] } )
{
$alignment_type = EMPTY_STRING;
}
# if ($token ne $type) {$alignment_type .= $type}
}
+ # make qw() functions using -qwaf align 'use' statement
+ elsif ( $type eq 'U' ) {
+ if ( $types_to_go[0] eq 'k'
+ && $tokens_to_go[0] eq 'use'
+ && substr( $token, 0, 2 ) eq 'qw' )
+ {
+ $alignment_type = 'q';
+ }
+ }
+ else {
+ ## not a special type
+ }
+
# NOTE: This is deactivated because it causes the previous
# if/elsif alignment to fail
#elsif ( $type eq '}' && $token eq '}' && $block_type_to_go[$i])
# We want to line up ',' and interior ';' tokens, with the added
# space AFTER these tokens. (Note: interior ';' is included
# because it may occur in short blocks).
- elsif (
+ else {
+ if (
- # previous token IS one of these:
- (
- $vert_last_nonblank_type eq ','
- || $vert_last_nonblank_type eq ';'
- )
+ # previous token IS one of these:
+ (
+ $vert_last_nonblank_type eq ','
+ || $vert_last_nonblank_type eq ';'
+ )
- # and it follows a blank
- && $types_to_go[ $i - 1 ] eq 'b'
+ # and it follows a blank
+ && $types_to_go[ $i - 1 ] eq 'b'
- # and it's NOT one of these
- && !$is_closing_token{$type}
+ # and it's NOT one of these
+ && !$is_closing_token{$type}
- # then go ahead and align
- )
+ # then go ahead and align
+ )
- {
- $alignment_type = $vert_last_nonblank_type;
+ {
+ $alignment_type = $vert_last_nonblank_type;
+ }
}
#-----------------------
# but do not align the opening brace of an anonymous sub
if ( $token eq '{'
&& $block_type_to_go[$i]
- && $block_type_to_go[$i] =~ /$ASUB_PATTERN/ )
+ && $matches_ASUB{ $block_type_to_go[$i] } )
{
}
else {
$ralignment_type_to_go->[$i] = $alignment_type;
- $ralignment_hash_by_line->[$line]->{$i} = $alignment_type;
- $ralignment_counts->[$line]++;
+ $ralignment_counts->[$nline]++;
push @imatch_list, $i;
}
}
- $vert_last_nonblank_type = $type;
- $vert_last_nonblank_token = $token;
+ $vert_last_nonblank_type = $type;
}
return;
} ## end sub set_vertical_alignment_markers_token_loop
} ## end closure set_vertical_alignment_markers
+sub make_HSC_vertical_alignments {
+
+ # This is the alignment for a hanging side comment
+ my $rline_alignments;
+
+ #--------------------------------------
+ # Case 1: no alignments if -naws is set
+ #--------------------------------------
+ if ( !$rOpts_add_whitespace ) {
+
+ # Nothing to do if we are not allowed to add whitespace
+ $rline_alignments = [
+ [
+ [], [ SPACE . $tokens_to_go[0] ],
+ ['#'],
+ [ 1 + $summed_lengths_to_go[1] - $summed_lengths_to_go[0] ],
+ ]
+ ];
+ }
+
+ #-----------------------
+ # Case 2: -nvsc flag set
+ #-----------------------
+ # - For the specific combination -vc -nvsc, we put all side comments
+ # at fixed locations. Note that we will lose hanging side comment
+ # alignments. Otherwise, hsc's can move to strange locations.
+ # - For -nvc -nvsc we make all side comments vertical alignments
+ # because the vertical aligner will check for -nvsc and be able
+ # to reduce the final padding to the side comments for long lines.
+ # and keep hanging side comments aligned.
+ elsif ( !$rOpts_valign_side_comments && $rOpts_valign_code ) {
+ my $pad_spaces = $rOpts->{'minimum-space-to-comment'};
+ $rline_alignments = [
+ [
+
+ [],
+ [ SPACE x $pad_spaces . $tokens_to_go[0] ],
+ ['q'],
+ [
+ $pad_spaces +
+ $summed_lengths_to_go[1] -
+ $summed_lengths_to_go[0]
+ ],
+ ]
+ ];
+ }
+
+ #--------------------------------------
+ # Case 3: Normal case of no constraints
+ #--------------------------------------
+ # Originally, a hanging side comment line was constructed as three tokens:
+ # type 'q' with zero length,
+ # type 'b' with length 1
+ # type '#' with the text of the comment
+ # In this way, the comment became a true side comment through all of the
+ # tokenization operations. However, this caused a problem (c269) with subs
+ # K_next_* and K_previous_*, which would stop at the 'q' token. Rather
+ # than change those to skip an empty 'q', the hanging side comment was
+ # left as a block comment but the line was marked as 'HSC'. Only when
+ # we make the vertical alignments, right here, do we need to construct
+ # the artificial 'q', 'b', '#' sequence for the vertical aligner.
+ else {
+ $rline_alignments = [
+ [
+ ['#'],
+ [ SPACE, $tokens_to_go[0] ],
+ [ 'qb', '#' ],
+ [ 1, $summed_lengths_to_go[1] - $summed_lengths_to_go[0] ],
+ ]
+ ];
+ }
+
+ return $rline_alignments;
+} ## end sub make_HSC_vertical_alignments
+
sub make_vertical_alignments {
- my ( $self, $ri_first, $ri_last ) = @_;
+ my ($self) = @_;
+
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
+
+ # Given:
+ # $ri_first = ref to list of starting line indexes in _to_go arrays
+ # $ri_last = ref to list of ending line indexes in _to_go arrays
#----------------------------
# Shortcut for a single token
}
# Strange line packing, not fatal but should not happen
- elsif (DEVEL_MODE) {
- my $max_line = @{$ri_first} - 1;
- my $ibeg = $ri_first->[0];
- my $iend = $ri_last->[0];
- my $tok_b = $tokens_to_go[$ibeg];
- my $tok_e = $tokens_to_go[$iend];
- my $type_b = $types_to_go[$ibeg];
- my $type_e = $types_to_go[$iend];
- Fault(
+ else {
+
+ if (DEVEL_MODE) {
+ my $max_line = @{$ri_first} - 1;
+ my $ibeg = $ri_first->[0];
+ my $iend = $ri_last->[0];
+ my $tok_b = $tokens_to_go[$ibeg];
+ my $tok_e = $tokens_to_go[$iend];
+ my $type_b = $types_to_go[$ibeg];
+ my $type_e = $types_to_go[$iend];
+ Fault(
"Strange..max_index=0 but nlines=$max_line ibeg=$ibeg tok=$tok_b type=$type_b iend=$iend tok=$tok_e type=$type_e; please check\n"
- );
+ );
+ }
}
}
#---------------------------------------------------------
# Step 1: Define the alignment tokens for the entire batch
#---------------------------------------------------------
- my ( $ralignment_type_to_go, $ralignment_counts, $ralignment_hash_by_line );
+ my ( $ralignment_type_to_go, $ralignment_counts );
# We only need to make this call if vertical alignment of code is
# requested or if a line might have a side comment.
if ( $rOpts_valign_code
|| $types_to_go[$max_index_to_go] eq '#' )
{
- ( $ralignment_type_to_go, $ralignment_counts, $ralignment_hash_by_line )
- = $self->set_vertical_alignment_markers( $ri_first, $ri_last );
+ ( $ralignment_type_to_go, $ralignment_counts ) =
+ $self->set_vertical_alignment_markers();
}
#----------------------------------------------
my $iend = $ri_last->[$line];
my $rtok_fld_pat_len = $self->make_alignment_patterns(
- $ibeg, $iend, $ralignment_type_to_go,
+
+ $ibeg,
+ $iend,
+ $ralignment_type_to_go,
$ralignment_counts->[$line],
- $ralignment_hash_by_line->[$line]
+
);
push @{$rline_alignments}, $rtok_fld_pat_len;
}
sub get_seqno {
- # get opening and closing sequence numbers of a token for the vertical
+ my ( $self, $ii ) = @_;
+
+ # Get opening and closing sequence numbers of a token for the vertical
# aligner. Assign qw quotes a value to allow qw opening and closing tokens
# to be treated somewhat like opening and closing tokens for stacking
# tokens by the vertical aligner.
- my ( $self, $ii, $ending_in_quote ) = @_;
+
+ # Given:
+ # $ii = index of token in the output batch
my $rLL = $self->[_rLL_];
$seqno = $SEQ_QW if ( $token =~ /^qw\s*[\(\{\[]/ );
}
else {
- if ( !$ending_in_quote ) {
+ # _ending_in_quote_ = true if line ends in quote
+ if ( !$self->[_this_batch_]->[_ending_in_quote_] ) {
$seqno = $SEQ_QW if ( $token =~ /[\)\}\]]$/ );
}
}
return ($seqno);
} ## end sub get_seqno
+sub undo_contained_ci {
+
+ my ( $self, $ri_first, $ri_last ) = @_;
+
+ # Given:
+ # $ri_first = ref to list of starting line indexes in _to_go arrays
+ # $ri_last = ref to list of ending line indexes in _to_go arrays
+
+ # Undo ci for a sequence of lines in a container which all have both ci
+ # and a jump in level. Written for issue git #137. This mainly occurs
+ # in code with very long quotes when -nolq is set. Examples:
+
+ # diag( 'Test run performed at: '
+ # . DateTime->now
+ # . ' with Moose '
+ # . ( Moose->VERSION || 'git repo' ) );
+ # $d = sqrt( ( $x->[$x_l] - $x->[$x_r] )**2 +
+ # ( $y->[$x_l] - $y->[$x_r] )**2 );
+
+ # These all involve lines with ci within a complete container, where the
+ # batch ends in ');' or '];' or '};' with possible side comment. The
+ # opening container token does not end a line, and this causes the double
+ # jump.
+
+ my $max_line = @{$ri_first} - 1;
+ return if ( $max_line < 1 );
+
+ my $ibeg_max = $ri_first->[$max_line];
+ my $iend_max = $ri_last->[$max_line];
+ my $i_opening;
+ my $line_last;
+
+ # Look for Case 1: last line begins with ');'
+ if ( $is_closing_token{ $tokens_to_go[$ibeg_max] } ) {
+ my $i_n = $inext_to_go[$ibeg_max];
+ return if ( $i_n < $ibeg_max || $i_n > $iend_max );
+ return if ( $types_to_go[$i_n] ne ';' );
+ $i_opening = $mate_index_to_go[$ibeg_max];
+ return if ( !defined($i_opening) || $i_opening <= 0 );
+ $line_last = $max_line - 1;
+ }
+
+ # Look for Case 2: last line has some text which ends with ');'
+ else {
+ my $i_t = $iend_max;
+ if ( $types_to_go[$i_t] eq '#' ) {
+ $i_t = iprev_to_go($i_t);
+ }
+ return if ( $i_t <= $ibeg_max );
+ return if ( $types_to_go[$i_t] ne ';' );
+ $i_t = iprev_to_go($i_t);
+ return if ( $i_t <= $ibeg_max );
+ return if ( !$is_closing_token{ $tokens_to_go[$i_t] } );
+ $i_opening = $mate_index_to_go[$i_t];
+ return if ( !defined($i_opening) || $i_opening < 0 );
+ $line_last = $max_line;
+ }
+
+ # Scan backwards to the line with the opening container,
+ # looking for a set of lines with ci to remove which have
+ # the same level and ci as the final line of the group
+ my $ibeg_last = $ri_first->[$line_last];
+ my $level_last = $levels_to_go[$ibeg_last];
+ return unless ( $ci_levels_to_go[$ibeg_last] );
+
+ # do not change ci under -lp control
+ return if ( ref( $reduced_spaces_to_go[$ibeg_last] ) );
+
+ my $line_start = $line_last;
+ foreach my $line ( reverse( 0 .. $line_last ) ) {
+ my $ibeg = $ri_first->[$line];
+ return if ( ref( $reduced_spaces_to_go[$ibeg] ) );
+ last if ( !$ci_levels_to_go[$ibeg] );
+ last if ( $levels_to_go[$ibeg] != $level_last );
+ $line_start = $line;
+ }
+
+ # There must be a jump in level and ci from the line before the start,
+ # and it must contain the opening container token.
+ my $line_o = $line_start - 1;
+ return if ( $line_o < 0 );
+ my $ibeg_o = $ri_first->[$line_o];
+ my $iend_o = $ri_last->[$line_o];
+ return if ( $ci_levels_to_go[$ibeg_o] );
+ return if ( $levels_to_go[$ibeg_o] >= $level_last );
+ return if ( $i_opening < $ibeg_o || $i_opening > $iend_o );
+
+ # ok to undo the ci of this group
+ foreach my $line_t ( $line_start .. $line_last ) {
+ my $ibeg_t = $ri_first->[$line_t];
+ $ci_levels_to_go[$ibeg_t] = 0;
+ $leading_spaces_to_go[$ibeg_t] = $reduced_spaces_to_go[$ibeg_t];
+ }
+ return;
+} ## end sub undo_contained_ci
+
{
my %undo_extended_ci;
sub undo_ci {
# Undo continuation indentation in certain sequences
- my ( $self, $ri_first, $ri_last, $rix_seqno_controlling_ci ) = @_;
- my ( $line_1, $line_2, $lev_last );
+ my ($self) = @_;
+
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
+ my $rix_seqno_controlling_ci =
+ $this_batch->[_rix_seqno_controlling_ci_];
+
+ # Given:
+ # $ri_first = ref to list of starting line indexes in _to_go arrays
+ # $ri_last = ref to list of ending line indexes in _to_go arrays
+ # $rix_seqno_controlling_ci = a control array
+
+ my ( $line_1, $line_2 );
my $max_line = @{$ri_first} - 1;
my $rseqno_controlling_my_ci = $self->[_rseqno_controlling_my_ci_];
}
}
+ my $line_double_jump;
foreach my $line ( 0 .. $max_line ) {
my $ibeg = $ri_first->[$line];
my $iend = $ri_last->[$line];
- my $lev = $levels_to_go[$ibeg];
#-----------------------------------
# SECTION 1: Undo needless common CI
# sort { $a <=> $b }
# grep { $lookup->{$_} ne $default } keys %$lookup );
- if ( $line > 0 && !$skip_SECTION_1 ) {
+ if ( $line && !$skip_SECTION_1 ) {
+
+ my $ibeg_last = $ri_first->[ $line - 1 ];
+ my $lev = $levels_to_go[$ibeg];
+ my $lev_last = $levels_to_go[$ibeg_last];
+
+ # set flag for calling undo_contained_ci
+ if ( $lev == $lev_last + 1
+ && $ci_levels_to_go[$ibeg]
+ && !$ci_levels_to_go[$ibeg_last] )
+ {
+ $line_double_jump = $line;
+ }
# if we have started a chain..
if ($line_1) {
# chain ends with previous line
$line_2 = $line - 1;
}
- elsif ( $lev > $lev_last ) {
+ else {
+ # ( $lev > $lev_last )
# kill chain
$line_1 = undef;
}
}
}
}
+
}
#-------------------------------------
$undo_extended_ci{$seqno} = 1;
}
}
+ }
- $lev_last = $lev;
+ #-------------------------------------
+ # Undo ci in containers if -mci is set
+ #-------------------------------------
+ if ( $line_double_jump && $rOpts_minimize_continuation_indentation ) {
+ $self->undo_contained_ci( $ri_first, $ri_last );
}
return;
sub set_logical_padding {
+ my ($self) = @_;
+
# Look at a batch of lines and see if extra padding can improve the
# alignment when there are certain leading operators. Here is an
# example, in which some extra space is introduced before
# {
# &Error_OutOfRange;
# }
- #
- my ( $self, $ri_first, $ri_last, $starting_in_quote ) = @_;
+
+ my $this_batch = $self->[_this_batch_];
+
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
+ my $starting_in_quote = $this_batch->[_starting_in_quote_];
+
my $max_line = @{$ri_first} - 1;
my ( $ibeg, $ibeg_next, $ibegm, $iend, $iendm, $ipad, $pad_spaces,
$tok_next = $tokens_to_go[$ibeg_next];
$type_next = $types_to_go[$ibeg_next];
- $has_leading_op_next = ( $tok_next =~ /^\w/ )
- ? $is_chain_operator{$tok_next} # + - * / : ? && ||
- : $is_chain_operator{$type_next}; # and, or
+ $has_leading_op_next = ( $type_next eq 'k' )
+ ? $is_chain_operator{$tok_next} # and, or
+ : $is_chain_operator{$type_next}; # + - * / : ? && ||
+
+ # Fix for git134
+ if ( !$has_leading_op_next
+ && $iend > $ibeg + 2
+ && $types_to_go[ $ibeg + 1 ] eq 'b'
+ && $is_opening_type{ $types_to_go[$ibeg] }
+ && $nesting_depth_to_go[$iend] > $nesting_depth_to_go[$ibeg] )
+ {
+ my $iend_next = $ri_last->[ $line + 1 ];
+ $self->pad_broken_list( $ibeg, $iend, $ibeg_next, $iend_next );
+ }
next unless ($has_leading_op_next);
$ok_comma = $tok_next_next eq $tok_next;
}
- next
- unless (
- $is_assignment{ $types_to_go[$iendm] }
- || $ok_comma
- || ( $nesting_depth_to_go[$ibegm] <
+ my $ok_pad = (
+ $is_assignment{ $types_to_go[$iendm] }
+ || $ok_comma
+ || ( $nesting_depth_to_go[$ibegm] <
$nesting_depth_to_go[$ibeg] )
- || ( $types_to_go[$iendm] eq 'k'
+ || ( $types_to_go[$iendm] eq 'k'
&& $tokens_to_go[$iendm] eq 'return' )
- );
+ );
+ next if ( !$ok_pad );
# we will add padding before the first token
$ipad = $ibeg;
# find any unclosed container
next
- unless ( $type_sequence_to_go[$i]
- && defined( $mate_index_to_go[$i] )
- && $mate_index_to_go[$i] > $iend );
+ if ( !$type_sequence_to_go[$i]
+ || !defined( $mate_index_to_go[$i] )
+ || $mate_index_to_go[$i] <= $iend );
# find next nonblank token to pad
$ipad = $inext_to_go[$i];
# an editor. In that case either the user will see and
# fix the problem or it will be corrected next time the
# entire file is processed with perltidy.
- my $this_batch = $self->[_this_batch_];
my $peak_batch_size = $this_batch->[_peak_batch_size_];
next if ( $ipad == 0 && $peak_batch_size <= 1 );
}
# we might be able to handle a pad of -1 by removing a blank
- # token
+ # token.
if ( $pad_spaces < 0 ) {
# Deactivated for -kpit due to conflict. This block deletes
# but it may conflict with user spacing requests. For now
# it is just deactivated if the -kpit option is used.
if ( $pad_spaces == -1 ) {
- if ( $ipad > $ibeg
+ if (
+ $ipad > $ibeg
&& $types_to_go[ $ipad - 1 ] eq 'b'
- && !%keyword_paren_inner_tightness )
+ && !%keyword_paren_inner_tightness
+
+ # additional tests added for c385:
+ && (
+ $types_to_go[$inext_next] eq $types_to_go[$ipad]
+ || (
+ $types_to_go[$ipad] eq '!'
+ && ( $types_to_go[ $ipad + 1 ] eq
+ $types_to_go[$inext_next] )
+ )
+ )
+ )
{
$self->pad_token( $ipad - 1, $pad_spaces );
}
} ## end sub set_logical_padding
} ## end closure set_logical_padding
+sub pad_broken_list {
+ my ( $self, $ibeg, $iend, $ibeg_next, $iend_next ) = @_;
+
+ # Given:
+ # $ibeg, $iend = index range of line to get padding
+ # $ibeg_next, $iend_next = index range of next line
+
+ # This fixes a minor issue discussed in git134. In the example shown
+ # below, the is a broken list because of the q term, so line breaks
+ # are copied from the input. We want to insert padding at
+ # '[ $clientKey,' to align with the next line.
+
+ # $q->do(
+ # q{
+ # Something
+ # },
+ # [ $clientKey, ## <-- pad spaces needed here
+ # $systemKey,
+ # ],
+ # );
+
+ # Notation for the line being padded:
+ #
+ # [ $clientKey,
+ # | | |
+ # | | ------- $iend
+ # | ------ $ibeg+2
+ # ---- $ibeg
+
+ # NOTES:
+ # - This particular list is broken because of the 'q' term in the list
+ # - It is extremely rare for this routine to be called for typical code
+ # (I found just two examples in my large collection of test scripts)
+ # - This routine is not called for the last line of a batch. This
+ # is not necessary because perltidy will generally put a break
+ # after the opening token in that case.
+
+ # The basic logic is to pad the first blank space $ibeg+1 using the
+ # leading spaces that would have been given to token at $ibeg+2 if:
+ # - this line begins with an opening token which is
+ # - followed by additional tokens on the same line,
+ # - and is a list container, and
+ # - the line terminates in a comma whose parent is this container,
+ # - then pad using the indentation of the second token
+
+ # So in other words, we are simulating doing a line break after the
+ # first token and then recombining with a -vt operation. That cannot
+ # actually happen for a broken list.
+
+ # Next token must be blank for padding, and must be followed
+ # by at least one token and comma
+ return if ( $iend < $ibeg + 3 || $types_to_go[ $ibeg + 1 ] ne 'b' );
+
+ # This is only for lists
+ my $seqno = $type_sequence_to_go[$ibeg];
+ return if ( !$seqno );
+ my $is_list = $self->[_ris_list_by_seqno_]->{$seqno};
+ return if ( !$is_list );
+
+ # First token on next line must be in same container
+ my $seqno_beg_next = $parent_seqno_to_go[$ibeg_next];
+ return if ( !$seqno_beg_next || $seqno_beg_next != $seqno );
+
+ # This does not work well if the closing token is on the next line
+ return
+ if ( !defined( $mate_index_to_go[$ibeg] )
+ || $mate_index_to_go[$ibeg] <= $iend_next );
+
+ # Line must end in a comma, with possible side comment
+ my $i_terminal = $iend;
+ if ( $types_to_go[$i_terminal] eq '#' ) {
+ $i_terminal -= 1;
+ if ( $types_to_go[$i_terminal] eq 'b' ) {
+ $i_terminal -= 1;
+ }
+ }
+ return if ( $i_terminal < $ibeg + 2 );
+ return if ( $types_to_go[$i_terminal] ne ',' );
+
+ # add padding to make the second token have the same location
+ # as if it had been output separately and later joined with -vt
+ my $lsp = $leading_spaces_to_go[$ibeg];
+ my $lsp_next = $leading_spaces_to_go[$ibeg_next];
+
+ # this is not for -lp style
+ return if ( ref($lsp) || ref($lsp_next) );
+
+ my $pad_spaces =
+ $lsp_next -
+ ( $lsp + $token_lengths_to_go[$ibeg] +
+ $token_lengths_to_go[ $ibeg + 1 ] );
+
+ return if ( $pad_spaces <= 0 );
+
+ # Do not pad if it will cause excess line length
+ my $excess = $self->excess_line_length( $ibeg, $iend );
+ return if ( $excess + $pad_spaces > 0 );
+
+ $self->pad_token( $ibeg + 1, $pad_spaces );
+ return;
+} ## end sub pad_broken_list
+
sub pad_token {
# insert $pad_spaces before token number $ipad
sub xlp_tweak {
+ my ($self) = @_;
+
# Remove one indentation space from unbroken containers marked with
# 'K_extra_space'. These are mostly two-line lists with short names
# formatted with -xlp -pt=2.
# - This is currently only applied to -xlp. It would also work for -lp
# but that style is essentially frozen.
- my ( $self, $ri_first, $ri_last ) = @_;
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
# Must be 2 or more lines
- return unless ( @{$ri_first} > 1 );
+ return if ( @{$ri_first} <= 1 );
# Pull indentation object from start of second line
my $ibeg_1 = $ri_first->[1];
my %keyword_map;
my %operator_map;
- my %is_w_n_C;
+ my %is_k_w_n_C_bang;
my %is_my_local_our;
- my %is_kwU;
my %is_use_like;
my %is_binary_type;
my %is_binary_keyword;
# Note: %block_type_map is now global to enable the -gal=s option
- # map certain keywords to the same 'if' class to align
- # long if/elsif sequences. [elsif.pl]
+ # Map certain keywords to the same 'if' class to align
+ # long if/elsif sequences. [elsif.pl]. But note that this is
+ # only for purposes of making the patterns, not alignment tokens.
+ # The only possible equivalent alignment tokens are 'if' and 'unless',
+ # and this is handled earlier under control of $rOpts_valign_if_unless
+ # to avoid making this a global hash.
%keyword_map = (
'unless' => 'if',
'else' => 'if',
'undef' => 'Q',
);
- # map certain operators to the same class for pattern matching
+ # Map certain operators to the same class for alignment.
+ # Note that this map is for the alignment tokens, not the patterns.
+ # We could have placed 'unless' => 'if' here, but since that is
+ # under control of $rOpts_valign_if_unless, it is handled elsewhere.
%operator_map = (
'!~' => '=~',
'+=' => '+=',
'/=' => '+=',
);
- %is_w_n_C = (
+ %is_k_w_n_C_bang = (
+ 'k' => 1,
'w' => 1,
'n' => 1,
'C' => 1,
+ '!' => 1,
);
# leading keywords which to skip for efficiency when making parenless
# container names
my @q = qw( my local our return );
- @{is_my_local_our}{@q} = (1) x scalar(@q);
+ @is_my_local_our{@q} = (1) x scalar(@q);
# leading keywords where we should just join one token to form
# parenless name
@q = qw( use );
- @{is_use_like}{@q} = (1) x scalar(@q);
-
- # leading token types which may be used to make a container name
- @q = qw( k w U );
- @{is_kwU}{@q} = (1) x scalar(@q);
+ @is_use_like{@q} = (1) x scalar(@q);
# token types which prevent using leading word as a container name
- @q = qw(
- x / : % . | ^ < = > || >= != *= => !~ == && |= .= -= =~ += <= %= ^= x= ~~ ** << /=
- &= // >> ~. &. |. ^.
+ @q = qw{
+ x / : % . | ^ < = > || >= != *= => !~ == && |= .= -= =~ += <=
+ %= ^= x= ~~ ** << /= &= // >> ~. &. |. ^.
**= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~
- );
+ };
push @q, ',';
- @{is_binary_type}{@q} = (1) x scalar(@q);
+ @is_binary_type{@q} = (1) x scalar(@q);
# token keywords which prevent using leading word as a container name
- @q = qw(and or err eq ne cmp);
+ @q = qw( and or err eq ne cmp );
@is_binary_keyword{@q} = (1) x scalar(@q);
# Some common function calls whose args can be aligned. These do not
sub make_alignment_patterns {
- my ( $self, $ibeg, $iend, $ralignment_type_to_go, $alignment_count,
- $ralignment_hash )
- = @_;
+ my (
+ $self,
+
+ $ibeg,
+ $iend,
+ $ralignment_type_to_go,
+ $alignment_count,
+
+ ) = @_;
#------------------------------------------------------------------
# This sub creates arrays of vertical alignment info for one output
# $ibeg, $iend - index range of this line in the _to_go arrays
# $ralignment_type_to_go - alignment type of tokens, like '=', if any
# $alignment_count - number of alignment tokens in the line
- # $ralignment_hash - this contains all of the alignments for this
- # line. It is not yet used but is available for future coding in
- # case there is a need to do a preliminary scan of alignment tokens.
# The arrays which are created contain strings that can be tested by
# the vertical aligner to see if consecutive lines can be aligned
# allowed, even when the alignment tokens match.
# @field_lengths - the display width of each field
- if (DEVEL_MODE) {
- my $new_count = 0;
- if ( defined($ralignment_hash) ) {
- $new_count = keys %{$ralignment_hash};
- }
- my $old_count = $alignment_count;
- $old_count = 0 unless ($old_count);
- if ( $new_count != $old_count ) {
- my $K = $K_to_go[$ibeg];
- my $rLL = $self->[_rLL_];
- my $lnl = $rLL->[$K]->[_LINE_INDEX_];
- Fault(
-"alignment hash token count gives count=$new_count but old count is $old_count near line=$lnl\n"
- );
- }
- }
-
# -------------------------------------
# Shortcut for lines without alignments
# -------------------------------------
return [ $rtokens, $rfields, $rpatterns, $rfield_lengths ];
}
- my $i_start = $ibeg;
- my $depth = 0;
- my $i_depth_prev = $i_start;
- my $depth_prev = $depth;
- my %container_name = ( 0 => EMPTY_STRING );
+ my $i_start = $ibeg;
+ my $depth = 0;
+ my $i_depth_prev = $i_start;
+ my $depth_prev = $depth;
+ my %container_name = ( 0 => EMPTY_STRING );
+ my $saw_exclamation_mark = 0;
my @tokens = ();
my @fields = ();
&& $levels_to_go[$ibeg] eq $levels_to_go[$iterm] )
{
$container_name{'0'} =
- make_uncontained_comma_name( $iterm, $ibeg, $iend );
+ make_uncontained_comma_name( $ibeg, $iterm );
}
}
$patterns[0] = EMPTY_STRING;
my %token_count;
+ my $type;
for my $i ( $ibeg .. $iend ) {
+ # handle a blank space
+ if ( ( $type = $types_to_go[$i] ) eq 'b' ) {
+ $patterns[$j] .= $type;
+ next;
+ }
+
#-------------------------------------------------------------
# Part 1: keep track of containers balanced on this line only.
#-------------------------------------------------------------
# These are used below to prevent unwanted cross-line alignments.
# Unbalanced containers already avoid aligning across
# container boundaries.
- my $type = $types_to_go[$i];
if ( $type_sequence_to_go[$i] ) {
my $token = $tokens_to_go[$i];
if ( $is_opening_token{$token} ) {
# if we are not aligning on this paren...
if ( !$ralignment_type_to_go->[$i] ) {
- my $len = length_tag( $i, $ibeg, $i_start );
+ # Add the length to the name ...
+ my $len = $summed_lengths_to_go[$i] -
+ $summed_lengths_to_go[$i_start];
+
+ # Do not include the length of any '!'. Otherwise,
+ # commas in the following line will not match:
+ # ok( 20, tapprox( ( pdl 2, 3 ), ( pdl 2, 3 ) ) );
+ # ok( 21, !tapprox( ( pdl 2, 3 ), ( pdl 2, 4 ) ) );
+ if ($saw_exclamation_mark) { $len -= 1 }
+
+ # For first token, use distance from start of line
+ # but subtract off the indentation due to level.
+ # Otherwise, results could vary with indentation.
+ if ( $i_start == $ibeg ) {
+ $len +=
+ leading_spaces_to_go($ibeg) -
+ $levels_to_go[$i_start] *
+ $rOpts_indent_columns;
+ }
+ if ( $len < 0 ) { $len = 0 }
# tack this length onto the container name to try
# to make a unique token name
} ## end if ( $i_mate > $i && $i_mate...)
} ## end if ( $is_opening_token...)
- elsif ( $is_closing_type{$token} ) {
+ elsif ( $is_closing_token{$token} ) {
$i_depth_prev = $i;
$depth_prev = $depth;
$depth-- if $depth > 0;
}
+ else {
+ ## must be ternary
+ }
} ## end if ( $type_sequence_to_go...)
#------------------------------------------------------------
if ( $block_type =~ /^[A-Z]+$/ ) { $block_type = 'BEGIN' }
$tok .= $block_type;
+
+ # Avoid aligning opening braces across leading ci level
+ # changes by marking block type with _ci (issue c224)
+ if ( $ci_levels_to_go[$ibeg] ) { $tok .= '_1' }
}
- # Mark multiple copies of certain tokens with the copy number
+ # Mark multiple copies of certain tokens with a copy number.
# This will allow the aligner to decide if they are matched.
- # For now, only do this for equals. For example, the two
- # equals on the next line will be labeled '=0' and '=0.2'.
- # Later, the '=0.2' will be ignored in alignment because it
- # has no match.
-
- # $| = $debug = 1 if $opt_d;
- # $full_index = 1 if $opt_i;
+ # For example, the two equals in the example below will be
+ # labeled '=0' and '=0.2'. Later, the '=0.2' will be ignored
+ # in alignment because it has no match.
+ # $| = $debug = 1 if $opt_d;
+ # $full_index = 1 if $opt_i;
if ( $raw_tok eq '=' || $raw_tok eq '=>' ) {
$token_count{$tok}++;
push( @tokens, $tok );
# get ready for the next batch
- $i_start = $i;
+ $i_start = $i;
+ $saw_exclamation_mark = 0;
$j++;
$patterns[$j] = EMPTY_STRING;
} ## end if ( new synchronization token
# Part 3: continue accumulating the next pattern
#-----------------------------------------------
- # for keywords we have to use the actual text
- if ( $type eq 'k' ) {
+ if ( $is_k_w_n_C_bang{$type} ) {
- my $tok_fix = $tokens_to_go[$i];
+ # for keywords we have to use the actual text
+ if ( $type eq 'k' ) {
- # but map certain keywords to a common string to allow
- # alignment.
- $tok_fix = $keyword_map{$tok_fix}
- if ( defined( $keyword_map{$tok_fix} ) );
- $patterns[$j] .= $tok_fix;
- }
+ my $tok_fix = $tokens_to_go[$i];
- elsif ( $type eq 'b' ) {
- $patterns[$j] .= $type;
- }
+ # but map certain keywords to a common string to allow
+ # alignment.
+ $tok_fix = $keyword_map{$tok_fix}
+ if ( defined( $keyword_map{$tok_fix} ) );
+
+ # VSN PATCH: all 'n' chars in a pattern must be for token
+ # type 'n' (number). i.e. convert 'print' to 'priNt'
+ $tok_fix =~ tr/n/N/;
+
+ $patterns[$j] .= $tok_fix;
+ }
- # Mark most things before arrows as a quote to
- # get them to line up. Testfile: mixed.pl.
+ # ignore any ! in patterns
+ elsif ( $type eq '!' ) {
+ $saw_exclamation_mark = 1;
+ }
- # handle $type =~ /^[wnC]$/
- elsif ( $is_w_n_C{$type} ) {
+ # Handle $type =~ /^[wnC]$/...
+ # Mark most things before arrows as a quote to
+ # get them to line up. Testfile: mixed.pl.
+ else {
- my $type_fix = $type;
+ my $type_fix = $type;
- if ( $i < $iend - 1 ) {
- my $next_type = $types_to_go[ $i + 1 ];
- my $i_next_nonblank =
- ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
+ if ( $i < $iend - 1 ) {
+ my $next_type = $types_to_go[ $i + 1 ];
+ my $i_next_nonblank =
+ ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
- if ( $types_to_go[$i_next_nonblank] eq '=>' ) {
- $type_fix = 'Q';
+ if ( $types_to_go[$i_next_nonblank] eq '=>' ) {
+ $type_fix = 'Q';
- # Patch to ignore leading minus before words,
- # by changing pattern 'mQ' into just 'Q',
- # so that we can align things like this:
- # Button => "Print letter \"~$_\"",
- # -command => [ sub { print "$_[0]\n" }, $_ ],
- if ( $patterns[$j] eq 'm' ) {
- $patterns[$j] = EMPTY_STRING;
+ # Patch to ignore leading minus before words,
+ # by changing pattern 'mQ' into just 'Q',
+ # so that we can align things like this:
+ # Button => "Print letter \"~$_\"",
+ # -command => [ sub { print "$_[0]\n" }, $_ ],
+ if ( $patterns[$j] eq 'm' ) {
+ $patterns[$j] = EMPTY_STRING;
+ }
}
}
- }
-
- # Convert a bareword within braces into a quote for
- # matching. This will allow alignment of expressions like
- # this:
- # local ( $SIG{'INT'} ) = IGNORE;
- # local ( $SIG{ALRM} ) = 'POSTMAN';
- if ( $type eq 'w'
- && $i > $ibeg
- && $i < $iend
- && $types_to_go[ $i - 1 ] eq 'L'
- && $types_to_go[ $i + 1 ] eq 'R' )
- {
- $type_fix = 'Q';
- }
- # patch to make numbers and quotes align
- if ( $type eq 'n' ) { $type_fix = 'Q' }
+ # Convert a bareword within braces into a quote for
+ # matching. This will allow alignment of expressions like
+ # this:
+ # local ( $SIG{'INT'} ) = IGNORE;
+ # local ( $SIG{ALRM} ) = 'POSTMAN';
+ if ( $type eq 'w'
+ && $i > $ibeg
+ && $i < $iend
+ && $types_to_go[ $i - 1 ] eq 'L'
+ && $types_to_go[ $i + 1 ] eq 'R' )
+ {
+ $type_fix = 'Q';
+ }
- $patterns[$j] .= $type_fix;
- } ## end elsif ( $is_w_n_C{$type} )
+ # VSN PATCH: no longer changing 'n' to 'Q' here; this
+ # will be handled in the vertical aligner and allow
+ # the aligner to find numbers more efficiently.
+ ##: if ( $type eq 'n' ) { $type_fix = 'Q' }
- # ignore any ! in patterns
- elsif ( $type eq '!' ) { }
+ $patterns[$j] .= $type_fix;
+ }
+ } ## end elsif ( $is_k_w_n_C{$type} )
# everything else
else {
} ## end sub make_alignment_patterns
sub make_uncontained_comma_name {
- my ( $iterm, $ibeg, $iend ) = @_;
+ my ( $ibeg, $iterm ) = @_;
+
+ # Given:
+ # $ibeg = first index
+ # $iterm = last index
# Make a container name by combining all leading barewords,
# keywords and functions.
my $count_max;
my $iname_end;
my $ilast_blank;
- for ( $ibeg .. $iterm ) {
- my $type = $types_to_go[$_];
-
+ for my $ii ( $ibeg .. $iterm ) {
+ my $type = $types_to_go[$ii];
if ( $type eq 'b' ) {
- $ilast_blank = $_;
+ $ilast_blank = $ii;
next;
}
- my $token = $tokens_to_go[$_];
+ my $token = $tokens_to_go[$ii];
# Give up if we find an opening paren, binary operator or
# comma within or after the proposed container name.
}
# The container name is only built of certain types:
+ # 'k'=builtin keyword, 'U'=user defined sub, 'w'=unknown bareword
last if ( !$is_kwU{$type} );
# Normally it is made of one word, but two words for 'use'
if ( $count == 0 ) {
if ( $type eq 'k'
- && $is_use_like{ $tokens_to_go[$_] } )
+ && $is_use_like{ $tokens_to_go[$ii] } )
{
$count_max = 2;
}
elsif ( defined($count_max) && $count >= $count_max ) {
last;
}
+ else {
+ ## continue
+ }
if ( defined( $name_map{$token} ) ) {
$token = $name_map{$token};
}
$name .= SPACE . $token;
- $iname_end = $_;
+ $iname_end = $ii;
$count++;
}
return $name;
} ## end sub make_uncontained_comma_name
- sub length_tag {
-
- my ( $i, $ibeg, $i_start ) = @_;
-
- # Generate a line length to be used as a tag for rejecting bad
- # alignments. The tag is the length of the line from the previous
- # matching token, or beginning of line, to the function name. This
- # will allow the vertical aligner to reject undesirable matches.
-
- # The basic method: sum length from previous alignment
- my $len = token_sequence_length( $i_start, $i - 1 );
-
- # Minor patch: do not include the length of any '!'.
- # Otherwise, commas in the following line will not
- # match
- # ok( 20, tapprox( ( pdl 2, 3 ), ( pdl 2, 3 ) ) );
- # ok( 21, !tapprox( ( pdl 2, 3 ), ( pdl 2, 4 ) ) );
- if ( grep { $_ eq '!' } @types_to_go[ $i_start .. $i - 1 ] ) {
- $len -= 1;
- }
-
- if ( $i_start == $ibeg ) {
-
- # For first token, use distance from start of
- # line but subtract off the indentation due to
- # level. Otherwise, results could vary with
- # indentation.
- $len +=
- leading_spaces_to_go($ibeg) -
- $levels_to_go[$i_start] * $rOpts_indent_columns;
- }
- if ( $len < 0 ) { $len = 0 }
- return $len;
- } ## end sub length_tag
-
} ## end closure make_alignment_patterns
sub make_paren_name {
sub get_final_indentation {
my (
- $self, #
+ $self,
$ibeg,
$iend,
- $rfields,
- $rpatterns,
- $ri_first,
- $ri_last,
$rindentation_list,
$level_jump,
- $starting_in_quote,
- $is_static_block_comment,
) = @_;
# indentation of a line in the Formatter.
#--------------------------------------------------------------
+ # Given:
+ # ($ibeg, $iend) = index range of tokens on this line
+ # $rindentation_list = ref to indentation of each line in this batch,
+ # to be updated by this sub
+ # $level_jump = level change to $token $ibeg from previous token
+
# It starts with the basic indentation which has been defined for the
# leading token, and then takes into account any options that the user
# has set regarding special indenting and outdenting.
# undo_ci, which was processed earlier, so care has to be taken to
# keep them coordinated.
+ my $this_batch = $self->[_this_batch_];
+
# Find the last code token of this line
my $i_terminal = $iend;
my $terminal_type = $types_to_go[$iend];
# }
#
- # MOJO patch: Set a flag if this lines begins with ')->'
- my $leading_paren_arrow = (
- $is_closing_type_beg
- && $token_beg eq ')'
- && (
- ( $ibeg < $i_terminal && $types_to_go[ $ibeg + 1 ] eq '->' )
- || ( $ibeg < $i_terminal - 1
- && $types_to_go[ $ibeg + 1 ] eq 'b'
- && $types_to_go[ $ibeg + 2 ] eq '->' )
- )
- );
-
#---------------------------------------------------------
# Section 1: set a flag and a default indentation
#
$ibeg,
$iend,
- $ri_first,
- $ri_last,
$rindentation_list,
$level_jump,
$i_terminal,
);
}
- #--------------------------------------------------------
+ #-----------------------------------------
# Section 1B:
- # if at ');', '};', '>;', and '];' of a terminal qw quote
- #--------------------------------------------------------
- elsif (
- substr( $rpatterns->[0], 0, 2 ) eq 'qb'
- && substr( $rfields->[0], -1, 1 ) eq ';'
- ## $rpatterns->[0] =~ /^qb*;$/
- && $rfields->[0] =~ /^([\)\}\]\>]);$/
- )
- {
- if ( $closing_token_indentation{$1} == 0 ) {
+ # if line starts with a non-sequenced item
+ #-----------------------------------------
+ else {
+ if ( $type_beg eq ';' && !$rOpts_indent_leading_semicolon ) {
$adjust_indentation = 1;
}
- else {
- $adjust_indentation = 3;
- }
}
#---------------------------------------------------------
}
#-------------------------------------------------------------------
- # Secton 2B: adjust_indentation == 1
+ # Section 2B: adjust_indentation == 1
# Change the indentation to be that of a different token on the line
#-------------------------------------------------------------------
elsif ( $adjust_indentation == 1 ) {
my $i_ind = $ibeg;
$indentation = $reduced_spaces_to_go[$i_ind];
$lev = $levels_to_go[$i_ind];
- while ( $i_ind < $i_terminal ) {
- $i_ind++;
+ while ( ++$i_ind <= $i_terminal ) {
if ( $levels_to_go[$i_ind] < $lev ) {
$indentation = $reduced_spaces_to_go[$i_ind];
$lev = $levels_to_go[$i_ind];
}
- }
+ } ## end while ( ++$i_ind <= $i_terminal)
}
#--------------------------------------------------------------
- # Secton 2C: adjust_indentation == 2
+ # Section 2C: adjust_indentation == 2
# Handle indented closing token which aligns with opening token
#--------------------------------------------------------------
elsif ( $adjust_indentation == 2 ) {
$indentation = $reduced_spaces_to_go[$i_terminal];
$lev = $levels_to_go[$i_terminal];
}
+ else {
+ # shouldn't happen - default_adjust_indentation is 0 or 1
+ DEVEL_MODE
+ && Fault(
+"default_indentation=$default_adjust_indentation expected to be 0 or 1\n"
+ );
+
+ # continue with 0 if not in DEVEL_MODE
+ $indentation = $leading_spaces_beg;
+ }
}
}
#-------------------------------------------------------------
- # Secton 2D: adjust_indentation == 3
+ # Section 2D: adjust_indentation == 3
# Full indentation of closing tokens (-icb and -icp or -cti=2)
#-------------------------------------------------------------
else {
# Remember indentation except for multi-line quotes, which get
# no indentation
#-------------------------------------------------------------
- if ( !( $ibeg == 0 && $starting_in_quote ) ) {
+ if ( !( $ibeg == 0 && $this_batch->[_starting_in_quote_] ) ) {
$last_indentation_written = $indentation;
$last_unadjusted_indentation = $leading_spaces_beg;
$last_leading_token = $token_beg;
# than the line which contained the corresponding opening token.
#---------------------------------------------------------------------
- # Updated per bug report in alex_bug.pl: we must not
- # mess with the indentation of closing logical braces, so
- # we must treat something like '} else {' as if it were
- # an isolated brace
- my $is_isolated_block_brace = $block_type_beg
- && ( $i_terminal == $ibeg
- || $is_if_elsif_else_unless_while_until_for_foreach{$block_type_beg}
- );
+ if ( defined($opening_indentation) ) {
+
+ # MOJO patch: Set a flag if this lines begins with ')->'
+ my $leading_paren_arrow = (
+ $is_closing_type_beg
+ && $token_beg eq ')'
+ && (
+ (
+ $ibeg < $i_terminal
+ && $types_to_go[ $ibeg + 1 ] eq '->'
+ )
+ || ( $ibeg < $i_terminal - 1
+ && $types_to_go[ $ibeg + 1 ] eq 'b'
+ && $types_to_go[ $ibeg + 2 ] eq '->' )
+ )
+ );
- # only do this for a ':; which is aligned with its leading '?'
- my $is_unaligned_colon = $type_beg eq ':' && !$is_leading;
+ # Updated per bug report in alex_bug.pl: we must not
+ # mess with the indentation of closing logical braces, so
+ # we must treat something like '} else {' as if it were
+ # an isolated brace
+ my $is_isolated_block_brace = $block_type_beg
+ && (
+ $i_terminal == $ibeg
+ || $is_if_elsif_else_unless_while_until_for_foreach{
+ $block_type_beg}
+ );
- if (
- defined($opening_indentation)
- && !$leading_paren_arrow # MOJO patch
- && !$is_isolated_block_brace
- && !$is_unaligned_colon
- )
- {
- if ( get_spaces($opening_indentation) > get_spaces($indentation) ) {
+ # only do this for a ':; which is aligned with its leading '?'
+ my $is_unaligned_colon = $type_beg eq ':' && !$is_leading;
+
+ if (
+ !$leading_paren_arrow
+ && !$is_isolated_block_brace
+ && !$is_unaligned_colon
+ && ( get_spaces($opening_indentation) >
+ get_spaces($indentation) )
+ )
+ {
$indentation = $opening_indentation;
}
}
|| $rOpts_outdent_labels && $type_beg eq 'J'
# or static block comments if requested
- || $is_static_block_comment
+ || $this_batch->[_is_static_block_comment_]
&& $rOpts_outdent_static_block_comments
)
)
# Determine indentation adjustment for a line with a leading closing
# token - i.e. one of these: ) ] } :
+ # The indentation adjustment is found by checking all user controls,
+ # which are sometimes in conflict. So the logic is rather complex.
+
+ # Returns:
+ # Flags giving the indentation to use for this line:
+
+ # $adjust_indentation,
+ # 0 - do not adjust
+ # 1 - outdent
+ # 2 - vertically align with opening token
+ # 3 - indent
+ # $default_adjust_indentation
+ # a default in case $adjust_indentation cannot be used
+ #
+ # Also returns info about the indentation of the opening token,
+ # obtained from sub 'get_opening_indentation':
+
+ # $opening_indentation,
+ # $opening_offset,
+ # $is_leading,
+ # $opening_exists,
+
my (
- $self, #
+ $self,
$ibeg,
$iend,
- $ri_first,
- $ri_last,
$rindentation_list,
$level_jump,
$i_terminal,
my $seqno_beg = $type_sequence_to_go[$ibeg];
my $is_closing_type_beg = $is_closing_type{$type_beg};
+ # Return variables:
my (
$opening_indentation, $opening_offset,
$is_leading, $opening_exists
}
if ( $Kterm == $K_beg ) { $adjust_indentation = 1 }
}
+ else {
+ # no change in ci needed
+ }
}
my $ris_bli_container = $self->[_ris_bli_container_];
$opening_indentation, $opening_offset,
$is_leading, $opening_exists
)
- = $self->get_opening_indentation( $ibeg_weld_fix, $ri_first,
- $ri_last, $rindentation_list, $seqno_qw_closing );
+ = $self->get_opening_indentation(
+
+ $ibeg_weld_fix,
+ $rindentation_list,
+ $seqno_qw_closing
+
+ );
# Patch for rt144979, part 1. Coordinated with part 2.
# Do not undo ci for a cuddled closing brace control; it
$opening_indentation, $opening_offset,
$is_leading, $opening_exists
)
- = $self->get_opening_indentation( $ibeg, $ri_first,
- $ri_last, $rindentation_list );
+ = $self->get_opening_indentation( $ibeg,
+ $rindentation_list, undef );
my $indentation = $leading_spaces_beg;
if ( defined($opening_indentation)
&& get_spaces($indentation) >
$opening_indentation, $opening_offset,
$is_leading, $opening_exists
)
- = $self->get_opening_indentation( $ibeg, $ri_first, $ri_last,
- $rindentation_list );
+ = $self->get_opening_indentation( $ibeg, $rindentation_list,
+ undef );
my $indentation = $leading_spaces_beg;
if ( defined($opening_indentation)
&& get_spaces($indentation) >
elsif ( $cti == 3 ) {
$adjust_indentation = 3;
}
+ else {
+ ## cti == 0
+ }
}
# handle option to indent blocks
$opening_indentation, $opening_offset,
$is_leading, $opening_exists
)
- = $self->get_opening_indentation( $ibeg, $ri_first, $ri_last,
- $rindentation_list );
+ = $self->get_opening_indentation( $ibeg,
+ $rindentation_list, undef );
if ($is_leading) { $adjust_indentation = 2; }
}
+ else {
+ # not a closing type
+ }
+
+ # A final check: reset the flag value from 1 to 0 if moving left would
+ # give this closing token less indentation than the line with its
+ # opening token. We could do this check at the top for more efficiency
+ # except for -lp. For -lp, if the $adjust_indentation flag flips from
+ # 1 to 2, then the -lp logic can do a better recovery if it knows that
+ # the $default_adjust_indentation=1 instead of 0 (c435)
+ if ( $adjust_indentation == 1 ) {
+ my $no_left_adjustment_space = defined($opening_indentation)
+ && get_spaces($leading_spaces_beg) <=
+ get_spaces($opening_indentation);
+ if ($no_left_adjustment_space) {
+ $adjust_indentation = 0;
+ }
+ }
return (
# which matches the token at index $i_opening
# -and its offset (number of columns) from the start of the line
#
- my ( $self, $i_closing, $ri_first, $ri_last, $rindentation_list, $qw_seqno )
- = @_;
+ my (
+ $self,
+
+ $i_closing,
+ $rindentation_list,
+ $qw_seqno
+
+ ) = @_;
# first, see if the opening token is in the current batch
my $i_opening = $mate_index_to_go[$i_closing];
# it is..look up the indentation
( $indent, $offset, $is_leading ) =
- lookup_opening_indentation( $i_opening, $ri_first, $ri_last,
- $rindentation_list );
+ $self->lookup_opening_indentation( $i_opening, $rindentation_list );
}
# if not, it should have been stored in the hash by a previous batch
return;
} ## end sub examine_vertical_tightness_flags
+my %is_uncovered_operator;
+
+# b1060, b1499
+BEGIN {
+ my @q = qw( ? : && || );
+ @is_uncovered_operator{@q} = (1) x scalar(@q);
+}
+
sub set_vertical_tightness_flags {
- my ( $self, $n, $n_last_line, $ibeg, $iend, $ri_first, $ri_last,
- $ending_in_quote, $closing_side_comment )
- = @_;
+ my ( $self, $nline, $closing_side_comment ) = @_;
+
+ # Given:
+ # $nline = index of this line in the current output batch
+ # $closing_side_comment = true if line has side comment
# Define vertical tightness controls for the nth line of a batch.
# Note: do not call this sub for a block comment or if
# continually increase if we allowed it when the -fws flag is set.
# See case b499 for an example.
- # Define these values...
- my $vt_type = 0;
- my $vt_opening_flag = 0;
- my $vt_closing_flag = 0;
- my $vt_seqno = 0;
- my $vt_valid_flag = 0;
- my $vt_seqno_beg = 0;
- my $vt_seqno_end = 0;
- my $vt_min_lines = 0;
- my $vt_max_lines = 0;
-
# Uses these global parameters:
# $rOpts_block_brace_tightness
# $rOpts_block_brace_vertical_tightness
# %stack_closing_token
# %stack_opening_token
+ # Pull out needed batch variables
+ my $this_batch = $self->[_this_batch_];
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
+
+ my $n_last_line = @{$ri_first} - 1;
+ if ( $nline < 0 || $nline > $n_last_line ) {
+ DEVEL_MODE && Fault("bad line index '$nline' ; max is $n_last_line\n");
+ return;
+ }
+ my $ibeg = $ri_first->[$nline];
+ my $iend = $ri_last->[$nline];
+
+ # Define these values for each vertical tightness type:
+ my (
+
+ $vt_type,
+ $vt_opening_flag,
+ $vt_closing_flag,
+ $vt_seqno,
+ $vt_valid_flag,
+ $vt_min_lines,
+ $vt_max_lines
+ );
+
+ # get the sequence numbers of the ends of this line
+ my $vt_seqno_beg =
+ $type_sequence_to_go[$ibeg] ? $type_sequence_to_go[$ibeg]
+ : $types_to_go[$ibeg] eq 'q' ? $self->get_seqno($ibeg)
+ : EMPTY_STRING;
+ my $vt_seqno_end =
+ $type_sequence_to_go[$iend] ? $type_sequence_to_go[$iend]
+ : $types_to_go[$iend] eq 'q' ? $self->get_seqno($iend)
+ : EMPTY_STRING;
+
#--------------------------------------------------------------
# Vertical Tightness Flags Section 1:
# Handle Lines 1 .. n-1 but not the last line
# For non-BLOCK tokens, we will need to examine the next line
# too, so we won't consider the last line.
#--------------------------------------------------------------
- if ( $n < $n_last_line ) {
+ if ( $nline < $n_last_line ) {
+
+ # NOTE: Section 1 has 4 sub-sections: 1a, 1b, 1c, and 1d. The logic to
+ # reach any of these end states is complex, and it is possible but very
+ # unlikely that more than one of these end states could be reached.
+ # The current logic is to keep going and use the last such state.
+ # There are currently no known instances where multiple end states can
+ # be reached, but it is something to be aware of when making changes.
#--------------------------------------------------------------
# Vertical Tightness Flags Section 1a:
# Look for Type 1, last token of this line is a non-block opening token
#--------------------------------------------------------------
- my $ibeg_next = $ri_first->[ $n + 1 ];
+ my $ibeg_next = $ri_first->[ $nline + 1 ];
my $token_end = $tokens_to_go[$iend];
- my $iend_next = $ri_last->[ $n + 1 ];
+ my $iend_next = $ri_last->[ $nline + 1 ];
if (
$type_sequence_to_go[$iend]
# requested
my $ovt = $opening_vertical_tightness{$token_end};
- # Turn off the -vt flag if the next line ends in a weld.
- # This avoids an instability with one-line welds (fixes b1183).
- my $type_end_next = $types_to_go[$iend_next];
- $ovt = 0
- if ( $self->[_rK_weld_left_]->{ $K_to_go[$iend_next] }
- && $is_closing_type{$type_end_next} );
+ # if we are in -lp and the next line ends in a weld..
+ if ( $rOpts_line_up_parentheses
+ && $self->[_rK_weld_left_]->{ $K_to_go[$iend_next] } )
+ {
+ my $type_end_next = $types_to_go[$iend_next];
+
+ # Turn off -vt if the next line ends in a closing token. This
+ # avoids an instability with one-line welds (b1183).
+ if ( $is_closing_type{$type_end_next} ) {
+ $ovt = 0;
+ }
+
+ # Turn off -vt if the next line ends in an opening token. This
+ # avoids an instability (b1460).
+ elsif ( $is_opening_type{$type_end_next} ) {
+ $ovt = 0;
+ }
+
+ # neither opening nor closing
+ else {
+ }
+ }
# The flag '_rbreak_container_' avoids conflict of -bom and -pt=1
# or -pt=2; fixes b1270. See similar patch above for $cvt.
min( $ovt, $self->[_rmax_vertical_tightness_]->{$seqno} );
}
- unless (
- $ovt < 2
- && ( $nesting_depth_to_go[ $iend_next + 1 ] !=
+ if (
+ $ovt >= 2
+ || ( $nesting_depth_to_go[ $iend_next + 1 ] ==
$nesting_depth_to_go[$ibeg_next] )
)
{
$vt_type = 1;
$vt_opening_flag = $ovt;
+ $vt_closing_flag = 0;
$vt_seqno = $type_sequence_to_go[$iend];
$vt_valid_flag = $valid_flag;
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
+
}
}
# token .. and be sure this line does not have a side comment
#--------------------------------------------------------------
my $token_next = $tokens_to_go[$ibeg_next];
- if ( $type_sequence_to_go[$ibeg_next]
+ if (
+ $type_sequence_to_go[$ibeg_next]
&& !$block_type_to_go[$ibeg_next]
&& $is_closing_token{$token_next}
- && $types_to_go[$iend] ne '#' ) # for safety, shouldn't happen!
+ && !$self->[_rbreak_container_]
+ ->{ $type_sequence_to_go[$ibeg_next] } # b1498
+ && $types_to_go[$iend] ne '#'
+ ) # for safety, shouldn't happen!
{
- my $cvt = $closing_vertical_tightness{$token_next};
+ my $cvt = $closing_vertical_tightness{$token_next};
+ my $seqno = $type_sequence_to_go[$ibeg_next];
# Avoid conflict of -bom and -pvt=1 or -pvt=2, fixes b977, b1303
# See similar patch above for $ovt.
- my $seqno = $type_sequence_to_go[$ibeg_next];
- if ( $cvt && $self->[_rbreak_container_]->{$seqno} ) {
+ # NOTE: this is overriden by fix for b1498 above and can
+ # eventually be removed.
+ if ( 0 && $cvt && $self->[_rbreak_container_]->{$seqno} ) {
$cvt = 0;
}
}
# Fix for b1379, b1380, b1381, b1382, b1384 part 2,
- # instablility with adding and deleting trailing commas:
+ # instability with adding and deleting trailing commas:
# Reducing -cvt=2 to =1 fixes stability for -wtc=b in b1379,1380.
# Reducing -cvt>0 to =0 fixes stability for -wtc=b in b1381,1382.
# Reducing -cvt>0 to =0 fixes stability for -wtc=m in b1384
}
$vt_type = 2;
+ $vt_opening_flag = 0;
$vt_closing_flag = $tightness{$token_next} == 2 ? 0 : 1;
$vt_seqno = $type_sequence_to_go[$ibeg_next];
$vt_valid_flag = $valid_flag;
# Fix for case b1060 when both -baoo and -otr are set:
# to avoid blinking, honor the -baoo flag over the -otr flag.
- && $token_end ne '||' && $token_end ne '&&'
+ # b1499 added ? and : for same reason
+ ##&& $token_end ne '||' && $token_end ne '&&'
+ && !$is_uncovered_operator{$token_end}
# Keep break after '=' if -lp. Fixes b964 b1040 b1062 b1083 b1089.
# Generalized from '=' to $is_assignment to fix b1375.
# give -kba priority over -otr (b1445)
&& !$self->[_rbreak_after_Klast_]->{ $K_to_go[$iend] }
+
+ # Fix b1462, b1463: avoid possible edge instability with
+ # the combination -xlp and -dws
+ && !(
+ $rOpts_extended_line_up_parentheses
+ && $rOpts_delete_old_whitespace
+ )
)
{
my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
$vt_type = 2;
+ $vt_opening_flag = 0;
$vt_closing_flag = $spaces;
$vt_seqno = $type_sequence_to_go[$ibeg_next];
$vt_valid_flag = 1;
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
}
#--------------------------------------------------------------
# avoid instability of combo -bom and -sct; b1179
my $seq_next = $type_sequence_to_go[$ibeg_next];
+ my $bom = $seq_next && $self->[_rbreak_container_]->{$seq_next};
$stackable = $stack_closing_token{$token_beg_next}
- unless ( $block_type_to_go[$ibeg_next]
- || $seq_next && $self->[_rbreak_container_]->{$seq_next} );
+ unless ( $block_type_to_go[$ibeg_next] || $bom );
}
elsif ($is_opening_token{$token_end}
&& $is_opening_token{$token_beg_next} )
{
$stackable = $stack_opening_token{$token_beg_next}
- unless ( $block_type_to_go[$ibeg_next] )
- ; # shouldn't happen; just checking
+ unless ( $block_type_to_go[$ibeg_next] ); # shouldn't happen
+ }
+ else {
+ ## not stackable
}
if ($stackable) {
my $is_semicolon_terminated;
- if ( $n + 1 == $n_last_line ) {
- my ( $terminal_type, $i_terminal ) =
- terminal_type_i( $ibeg_next, $iend_next );
+ if ( $nline + 1 == $n_last_line ) {
+ my $terminal_type = terminal_type_i( $ibeg_next, $iend_next );
$is_semicolon_terminated = $terminal_type eq ';'
&& $nesting_depth_to_go[$iend_next] <
$nesting_depth_to_go[$ibeg_next];
my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
$vt_type = 2;
+ $vt_opening_flag = 0;
$vt_closing_flag = $spaces;
$vt_seqno = $type_sequence_to_go[$ibeg_next];
$vt_valid_flag = 1;
-
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
}
}
}
{
$vt_type = 3;
$vt_opening_flag = $rOpts_block_brace_vertical_tightness;
+ $vt_closing_flag = 0;
$vt_seqno = 0;
$vt_valid_flag = 1;
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
}
#--------------------------------------------------------------
&& $ibeg eq $iend
&& $block_type_to_go[$iend]
&& $types_to_go[$iend] eq '}'
- && ( !$closing_side_comment || $n < $n_last_line ) )
+ && ( !$closing_side_comment || $nline < $n_last_line ) )
{
my $spaces = $rOpts_block_brace_tightness == 2 ? 0 : 1;
$vt_type = 4;
+ $vt_opening_flag = 0;
$vt_closing_flag = $spaces;
$vt_seqno = $type_sequence_to_go[$iend];
$vt_valid_flag = 1;
-
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
}
-
- # get the sequence numbers of the ends of this line
- $vt_seqno_beg = $type_sequence_to_go[$ibeg];
- if ( !$vt_seqno_beg ) {
- if ( $types_to_go[$ibeg] eq 'q' ) {
- $vt_seqno_beg = $self->get_seqno( $ibeg, $ending_in_quote );
- }
- else { $vt_seqno_beg = EMPTY_STRING }
+ else {
+ # no -vt flags apply
}
- $vt_seqno_end = $type_sequence_to_go[$iend];
- if ( !$vt_seqno_end ) {
- if ( $types_to_go[$iend] eq 'q' ) {
- $vt_seqno_end = $self->get_seqno( $iend, $ending_in_quote );
+ my $last_vt_type = $self->[_last_vt_type_];
+ $self->[_last_vt_type_] = $vt_type;
+
+ if ( !$vt_type ) {
+
+ # Make a simple return if this line is not involved in vertical
+ # tightness at all.
+ if ( !$vt_seqno_beg
+ && !$vt_seqno_end
+ && !$last_vt_type )
+ {
+ return;
}
- else { $vt_seqno_end = EMPTY_STRING }
+
+ $vt_type = 0;
+ $vt_opening_flag = 0;
+ $vt_closing_flag = 0;
+ $vt_seqno = 0;
+ $vt_valid_flag = 0;
+ $vt_min_lines = 0;
+ $vt_max_lines = 0;
}
+ else {
- if ( !defined($vt_seqno) ) { $vt_seqno = EMPTY_STRING }
+ # guard against undefined sequence numbers pulled from _to_go
+ if ( !defined($vt_seqno) ) { $vt_seqno = 0 }
+ }
- my $rvertical_tightness_flags = {
+ # return the full data structure
+ return {
_vt_type => $vt_type,
_vt_opening_flag => $vt_opening_flag,
_vt_closing_flag => $vt_closing_flag,
_vt_seqno => $vt_seqno,
_vt_valid_flag => $vt_valid_flag,
- _vt_seqno_beg => $vt_seqno_beg,
- _vt_seqno_end => $vt_seqno_end,
_vt_min_lines => $vt_min_lines,
_vt_max_lines => $vt_max_lines,
+ _vt_seqno_beg => $vt_seqno_beg,
+ _vt_seqno_end => $vt_seqno_end,
};
- return ($rvertical_tightness_flags);
} ## end sub set_vertical_tightness_flags
##########################################################
# These routines are called once per batch when the --closing-side-comments flag
# has been set.
- my %block_leading_text;
+ my $rblock_leading_text;
my %block_opening_line_number;
my $csc_new_statement_ok;
my $csc_last_label;
my $leading_block_text_line_number;
sub initialize_csc_vars {
- %block_leading_text = ();
+ $rblock_leading_text = {};
%block_opening_line_number = ();
$csc_new_statement_ok = 1;
$csc_last_label = EMPTY_STRING;
$rOpts_closing_side_comment_maximum_text
)
- # UNLESS: we are adding a closing paren before the brace we seek.
- # This is an attempt to avoid situations where the ... to be
- # added are longer than the omitted right paren, as in:
+ # UNLESS: we are adding a closing paren before the brace we
+ # seek. This is an attempt to avoid situations where the ...
+ # to be added are longer than the omitted right paren, as in:
- # foreach my $item (@a_rather_long_variable_name_here) {
- # &whatever;
- # } ## end foreach my $item (@a_rather_long_variable_name_here...
+ #foreach my $item (@a_rather_long_variable_name_here) {
+ # &whatever;
+ #} ## end foreach my $item (@a_rather_long_variable_name_here...
|| (
$tokens_to_go[$i] eq ')'
$leading_block_text_length_exceeded = 1;
$leading_block_text .= '...';
}
+ else {
+ # not enough space to add text
+ }
}
return;
} ## end sub accumulate_block_text
if ( $token eq '}' ) {
# restore any leading text saved when we entered this block
- if ( defined( $block_leading_text{$type_sequence} ) ) {
+ if ( defined( $rblock_leading_text->{$type_sequence} ) ) {
( $block_leading_text, $rblock_leading_if_elsif_text )
- = @{ $block_leading_text{$type_sequence} };
+ = @{ $rblock_leading_text->{$type_sequence} };
$i_block_leading_text = $i;
- delete $block_leading_text{$type_sequence};
+ delete $rblock_leading_text->{$type_sequence};
$rleading_block_if_elsif_text =
$rblock_leading_if_elsif_text;
}
if ( $accumulating_text_for_block
&& $levels_to_go[$i] <= $leading_block_text_level )
{
- my $lev = $levels_to_go[$i];
reset_block_text_accumulator();
}
if ( $accumulating_text_for_block eq $block_type ) {
# save any leading text before we enter this block
- $block_leading_text{$type_sequence} = [
+ $rblock_leading_text->{$type_sequence} = [
$leading_block_text,
$rleading_block_if_elsif_text
];
}
}
}
+ else {
+ ## should not get here
+ DEVEL_MODE
+ && Fault("token=$token should be '{' or '}' for block\n");
+ }
}
if ( $type eq 'k'
&& $csc_new_statement_ok
&& $is_if_elsif_else_unless_while_until_for_foreach{$token}
- && $token =~ /$closing_side_comment_list_pattern/ )
+ && $token =~ /$closing_side_comment_list_pattern/
+ && $token !~ /$closing_side_comment_exclusion_pattern/ )
{
$self->set_block_text_accumulator($i);
}
next unless ( $matching_char{$char} );
# pop most recently appended character
- my $top = chop($csc);
+ my $top = chop $csc;
# push it back plus the mate to the newest character
# unless they balance each other.
- $csc = $csc . $top . $matching_char{$char} unless $top eq $char;
+ $csc = $csc . $top . $matching_char{$char} unless ( $top eq $char );
}
# return the balanced string
} ## end sub balance_csc_text
} ## end closure balance_csc_text
+sub get_asub_block_label {
+ my ( $self, $seqno ) = @_;
+
+ # Given:
+ # $seqno = the sequence number of an asub block
+ # Return:
+ # $block_label = the text # that will be displayed before 'sub' in its
+ # closing side comment.
+ # Note: see similar inline code in sub find_selected_blocks
+
+ # Example:
+ # my $doit = sub { ...
+ # |
+ # ^----------walk back from here to get
+ # $block_label = '$doit ='
+
+ my $block_label = EMPTY_STRING;
+ return $block_label unless ($seqno);
+ my $K_opening = $self->[_K_opening_container_]->{$seqno};
+ my $rLL = $self->[_rLL_];
+ return $block_label unless ($K_opening);
+ my $K_search_min = max( 0, $K_opening - 6 );
+ my ( $saw_equals, $saw_fat_comma, $blank_count, $nonblank_count );
+ my $text = EMPTY_STRING;
+
+ foreach my $KK ( reverse( $K_search_min .. $K_opening - 1 ) ) {
+ my $token_type = $rLL->[$KK]->[_TYPE_];
+ my $token = $rLL->[$KK]->[_TOKEN_];
+
+ # first nonblank, keyword 'sub', is not part of the label
+ if ($nonblank_count) { $text = $token . $text }
+
+ if ( $token_type eq 'b' ) { $blank_count++; next }
+ else { $nonblank_count++ }
+ if ( $token_type eq '=>' ) { $saw_fat_comma++; next }
+ if ( $token_type eq '=' ) { $saw_equals++; next }
+ if ( $token_type eq 'i' && $saw_equals
+ || ( $token_type eq 'w' || $token_type eq 'Q' ) && $saw_fat_comma )
+ {
+ $block_label = $text;
+ $block_label =~ s/\s*$//;
+ last;
+ }
+ }
+ return $block_label;
+} ## end sub get_asub_block_label
+
sub add_closing_side_comment {
- my ( $self, $ri_first, $ri_last ) = @_;
- my $rLL = $self->[_rLL_];
+ my ($self) = @_;
+
+ my $rLL = $self->[_rLL_];
+ my $this_batch = $self->[_this_batch_];
+
+ my $ri_first = $this_batch->[_ri_first_];
+ my $ri_last = $this_batch->[_ri_last_];
# add closing side comments after closing block braces if -csc used
my ( $closing_side_comment, $cscw_block_comment );
#---------------------------------------------------------------
my $have_side_comment = $types_to_go[$max_index_to_go] eq '#';
+ # Special check for asub closing side comments (c380)
+ # These are the only types which follow '};' instead of a bare '}'
+ if ( $terminal_type eq ';' && $closing_side_comment_want_asub ) {
+ if ( $types_to_go[0] eq '}'
+ && $tokens_to_go[0] eq '}'
+ && $i_terminal == $inext_to_go[0] )
+ {
+ my $seqno = $type_sequence_to_go[0];
+ if ( $self->[_ris_asub_block_]->{$seqno} ) {
+
+ # reset the terminal token to be the closing brace so
+ # that the code below ignores the trailing semicolon
+ $terminal_type = '}';
+ $i_terminal = 0;
+
+ # create a name for this asub block
+ $block_label = $self->get_asub_block_label($seqno);
+ }
+ }
+ }
+
# if this line might end in a block closure..
if (
$terminal_type eq '}'
# .. and if this is one of the types of interest
&& $block_type_to_go[$i_terminal] =~
/$closing_side_comment_list_pattern/
-
- # .. but not an anonymous sub
- # These are not normally of interest, and their closing braces are
- # often followed by commas or semicolons anyway. This also avoids
- # possible erratic output due to line numbering inconsistencies
- # in the cases where their closing braces terminate a line.
- && $block_type_to_go[$i_terminal] ne 'sub'
+ && $block_type_to_go[$i_terminal] !~
+ /$closing_side_comment_exclusion_pattern/
# ..and the corresponding opening brace must is not in this batch
# (because we do not need to tag one-line blocks, although this
$token = balance_csc_text($token)
if $rOpts->{'closing-side-comments-balanced'};
- $token =~ s/\s*$//; # trim any trailing whitespace
+ $token =~ s/\s+$//; # trim any trailing whitespace
# handle case of existing closing side comment
if ($have_side_comment) {
$old_csc =~ s/\s+//g; # trim all whitespace
$new_csc =~ s/[\]\)\}\s]*$//; # trim trailing structures
$old_csc =~ s/[\]\)\}\s]*$//; # trim trailing structures
- $new_csc =~ s/(\.\.\.)$//; # trim trailing '...'
- my $new_trailing_dots = $1;
- $old_csc =~ s/(\.\.\.)\s*$//; # trim trailing '...'
+
+ # trim trailing '...'
+ my $new_trailing_dots = $new_csc =~ s/\.\.\.$//;
+ $old_csc =~ s/\.\.\.\s*$//;
# Patch to handle multiple closing side comments at
# else and elsif's. These have become too complicated
elsif ( $block_type_to_go[$i_terminal] eq 'elsif' ) {
if ( $old_csc =~ /\[\s*if/ ) { $old_csc = $new_csc }
}
+ else {
+ # neither else or elsif
+ }
# if old comment is contained in new comment,
# only compare the common part.
# otherwise we'll make a note of it
else {
+ my $msg_line_number;
+ my $K = $K_to_go[$i_terminal];
+ if ( defined($K) ) {
+ $msg_line_number = $rLL->[$K]->[_LINE_INDEX_] + 1;
+ }
warning(
-"perltidy -cscw replaced: $tokens_to_go[$max_index_to_go]\n"
+"perltidy -cscw replaced: $tokens_to_go[$max_index_to_go]\n",
+ $msg_line_number
);
# save the old side comment in a new trailing block
$ri_last->[-1] = $iend;
}
}
+ else {
+ ## above threshold, cannot delete
+ }
}
# switch to the new csc (unless we deleted it!)
# This is the last routine called when a file is formatted.
# Flush buffer and write any informative messages
- my ( $self, $severe_error ) = @_;
+ my ( $self, ($severe_error) ) = @_;
+
+ # Optional parameter:
+ # $severe_error = true if program is ending on an error
+ # false for normal end
$self->flush();
my $file_writer_object = $self->[_file_writer_object_];
my $first_tabbing_disagreement = $self->[_first_tabbing_disagreement_];
my $last_tabbing_disagreement = $self->[_last_tabbing_disagreement_];
- my $tabbing_disagreement_count = $self->[_tabbing_disagreement_count_];
my $in_tabbing_disagreement = $self->[_in_tabbing_disagreement_];
if ($first_tabbing_disagreement) {
$file_writer_object->report_line_length_errors();
# Define the formatter self-check for convergence.
- $self->[_converged_] =
- $severe_error
- || $file_writer_object->get_convergence_check()
+ $self->[_converged_] = $severe_error
+ || (!$self->[_want_second_iteration_]
+ && $file_writer_object->get_convergence_check() )
|| $rOpts->{'indent-only'};
return;
package Perl::Tidy::HtmlWriter;
use strict;
use warnings;
-our $VERSION = '20230309';
+our $VERSION = '20250105';
+use Carp;
use English qw( -no_match_vars );
use File::Basename;
+use File::Temp qw();
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
+{ #<<< A non-indenting brace to contain all lexical variables
+
# class variables
-use vars qw{
- %html_color
- %html_bold
- %html_italic
- %token_short_names
- %short_to_long_names
- $rOpts
- $css_filename
- $css_linkname
- $missing_html_entities
- $missing_pod_html
-};
+my (
+
+ # INITIALIZER: BEGIN block
+ $missing_html_entities,
+ $missing_pod_html,
+
+ # INITIALIZER: BEGIN block
+ %short_to_long_names,
+ %token_short_names,
+
+ # INITIALIZER: sub check_options
+ $rOpts,
+ $rOpts_html_entities,
+ $css_linkname,
+ %html_bold,
+ %html_color,
+ %html_italic,
+
+);
# replace unsafe characters with HTML entity representation if HTML::Entities
# is available
#{ eval "use HTML::Entities"; $missing_html_entities = $@; }
BEGIN {
+
+ $missing_html_entities = EMPTY_STRING;
if ( !eval { require HTML::Entities; 1 } ) {
$missing_html_entities = $EVAL_ERROR ? $EVAL_ERROR : 1;
}
+
+ $missing_pod_html = EMPTY_STRING;
if ( !eval { require Pod::Html; 1 } ) {
$missing_pod_html = $EVAL_ERROR ? $EVAL_ERROR : 1;
}
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
sub new {
- my ( $class, @args ) = @_;
+ my ( $class, @arglist ) = @_;
+ if ( @arglist % 2 ) { croak "Odd number of items in arg hash list\n" }
my %defaults = (
input_file => undef,
html_toc_extension => undef,
html_src_extension => undef,
);
- my %args = ( %defaults, @args );
+ my %args = ( %defaults, @arglist );
my $input_file = $args{input_file};
my $html_file = $args{html_file};
my $html_toc_extension = $args{html_toc_extension};
my $html_src_extension = $args{html_src_extension};
- my $html_file_opened = 0;
- my $html_fh;
- ( $html_fh, my $html_filename ) =
- Perl::Tidy::streamhandle( $html_file, 'w' );
- unless ($html_fh) {
- Perl::Tidy::Warn("can't open $html_file: $ERRNO\n");
+ my $html_fh = Perl::Tidy::streamhandle( $html_file, 'w' );
+ if ( !$html_fh ) {
+ Perl::Tidy::Warn("can't open html file '$html_file'\n");
return;
}
- $html_file_opened = 1;
+ my $html_file_opened = 1;
if ( !$input_file || $input_file eq '-' || ref($input_file) ) {
$input_file = "NONAME";
# pre section goes directly to the output stream
$html_pre_fh = $html_fh;
- $html_pre_fh->print( <<"PRE_END");
+ $html_pre_fh->print(<<"PRE_END");
<pre>
PRE_END
}
undef $rOpts->{'pod2html'};
}
else {
- ##eval "use Pod::Html";
- #if ($@) {
if ($missing_pod_html) {
Perl::Tidy::Warn(
"unable to find Pod::Html; cannot use pod2html\n-npod disables this message\n"
my $toc_filename;
my $src_filename;
if ( $rOpts->{'frames'} ) {
- unless ($extension) {
+ if ( !$extension ) {
Perl::Tidy::Warn(
"cannot use frames without a specified output extension; ignoring -frm\n"
);
# ----------------------------------------------------------
my $title = $rOpts->{'title'};
- unless ($title) {
- ( $title, my $path ) = fileparse($input_file);
+ if ( !$title ) {
+ ( $title, my $path_uu ) = fileparse($input_file);
}
my $toc_item_count = 0;
my $in_toc_package = EMPTY_STRING;
}, $class;
} ## end sub new
-sub close_object {
- my ($object) = @_;
-
- # returns true if close works, false if not
- # failure probably means there is no close method
- return eval { $object->close(); 1 };
-} ## end sub close_object
-
sub add_toc_item {
# Add an item to the html table of contents.
${$rin_toc_package} = EMPTY_STRING;
}
return;
- };
+ }; ## end $end_package_list = sub
my $start_package_list = sub {
my ( $unique_name, $package ) = @_;
EOM
${$rin_toc_package} = $package;
return;
- };
+ }; ## end $start_package_list = sub
# start the table of contents on the first item
- unless ( ${$rtoc_item_count} ) {
+ if ( !${$rtoc_item_count} ) {
# but just quit if we hit EOF without any other entries
# in this case, there will be no toc
return if ( $type eq 'EOF' );
- $html_toc_fh->print( <<"TOC_END");
+ $html_toc_fh->print(<<"TOC_END");
<!-- BEGIN CODE INDEX --><a name="code-index"></a>
<ul>
TOC_END
# start/stop lists of subs
if ( $type eq 'sub' ) {
my $package = $rpackage_stack->[ ${$rlast_level} ];
- unless ($package) { $package = 'main' }
+ if ( !$package ) { $package = 'main' }
# if we're already in a package/sub list, be sure its the right
# package or else close it
}
# start a package/sub list if necessary
- unless ( ${$rin_toc_package} ) {
+ if ( !${$rin_toc_package} ) {
$start_package_list->( $unique_name, $package );
}
}
# end the table of contents, if any, on the end of file
if ( $type eq 'EOF' ) {
- $html_toc_fh->print( <<"TOC_END");
+ $html_toc_fh->print(<<"TOC_END");
</ul>
<!-- END CODE INDEX -->
TOC_END
# When adding NEW_TOKENS: update this hash table
# $type => $short_name
+ # c250: changed 'M' to 'S'
%token_short_names = (
'#' => 'c',
'n' => 'n',
'f' => 'sc',
'(' => 'p',
')' => 'p',
- 'M' => 'm',
- 'P' => 'pd',
+ 'S' => 'm',
+ 'pd' => 'pd',
'A' => 'co',
);
# These token types will all be called identifiers for now
- my @identifier = qw< i t U C Y Z G :: CORE::>;
+ # Fix for c250: added new type 'P', formerly 'i'
+ # ( but package statements will eventually be split into 'k' and 'i')
+ my @identifier = qw< i t U C Y Z G P :: CORE::>;
@token_short_names{@identifier} = ('i') x scalar(@identifier);
# These token types will be called 'structure'
sub make_getopt_long_names {
my ( $class, $rgetopt_names ) = @_;
- while ( my ( $short_name, $name ) = each %short_to_long_names ) {
- push @{$rgetopt_names}, "html-color-$name=s";
- push @{$rgetopt_names}, "html-italic-$name!";
- push @{$rgetopt_names}, "html-bold-$name!";
+ foreach my $short_name ( keys %short_to_long_names ) {
+ my $long_name = $short_to_long_names{$short_name};
+ push @{$rgetopt_names}, "html-color-$long_name=s";
+ push @{$rgetopt_names}, "html-italic-$long_name!";
+ push @{$rgetopt_names}, "html-bold-$long_name!";
}
push @{$rgetopt_names}, "html-color-background=s";
push @{$rgetopt_names}, "html-linked-style-sheet=s";
my ( $class, $rexpansion ) = @_;
# abbreviations for color/bold/italic properties
- while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
+ foreach my $short_name ( keys %short_to_long_names ) {
+ my $long_name = $short_to_long_names{$short_name};
${$rexpansion}{"hc$short_name"} = ["html-color-$long_name"];
${$rexpansion}{"hb$short_name"} = ["html-bold-$long_name"];
${$rexpansion}{"hi$short_name"} = ["html-italic-$long_name"];
# X11 color names for default settings that seemed to look ok
# (these color names are only used for programming clarity; the hex
# numbers are actually written)
+## use constant SaddleBrown => "#8B4513";
use constant ForestGreen => "#228B22";
- use constant SaddleBrown => "#8B4513";
use constant magenta4 => "#8B008B";
use constant IndianRed3 => "#CD5555";
use constant DeepSkyBlue4 => "#00688B";
# setup property lookup tables for tokens based on their short names
# every token type has a short name, and will use these tables
# to do the html markup
- while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
+ foreach my $short_name ( keys %short_to_long_names ) {
+ my $long_name = $short_to_long_names{$short_name};
$html_color{$short_name} = $rOpts->{"html-color-$long_name"};
$html_bold{$short_name} = $rOpts->{"html-bold-$long_name"};
$html_italic{$short_name} = $rOpts->{"html-italic-$long_name"};
}
# make sure user gives a file name after -css
+ $css_linkname = EMPTY_STRING;
if ( defined( $rOpts->{'html-linked-style-sheet'} ) ) {
$css_linkname = $rOpts->{'html-linked-style-sheet'};
if ( $css_linkname =~ /^-/ ) {
# forgets to specify the style sheet, like this:
# perltidy -html -css myfile1.pl myfile2.pl
# This would cause myfile1.pl to parsed as the style sheet by GetOpts
- my $css_filename = $css_linkname;
- unless ( -e $css_filename ) {
- write_style_sheet_file($css_filename);
+ if ( !-e $css_linkname ) {
+ write_style_sheet_file($css_linkname);
}
}
- $missing_html_entities = 1 unless $rOpts->{'html-entities'};
+ $rOpts_html_entities = $rOpts->{'html-entities'};
return;
} ## end sub check_options
sub write_style_sheet_file {
- my $css_filename = shift;
- my $fh;
- unless ( $fh = IO::File->new("> $css_filename") ) {
- Perl::Tidy::Die("can't open $css_filename: $ERRNO\n");
+ my $filename = shift;
+ my $fh = IO::File->new("> $filename");
+ if ( !$fh ) {
+ Perl::Tidy::Die("can't open $filename: $OS_ERROR\n");
}
write_style_sheet_data($fh);
- close_object($fh);
+ if ( $fh->can('close') && $filename ne '-' && !ref($filename) ) {
+ $fh->close()
+ or
+ Perl::Tidy::Warn("can't close style sheet '$filename' : $OS_ERROR\n");
+ }
return;
} ## end sub write_style_sheet_file
$fh->print(<<"EOM");
/* default style sheet generated by perltidy */
body {background: $bg_color; color: $text_color}
-pre { color: $text_color;
+pre { color: $text_color;
background: $pre_bg_color;
font-family: courier;
- }
+ }
EOM
set_default_color( "html-color-$short_to_long_names{$short_name}", $color );
my $key;
$key = "html-bold-$short_to_long_names{$short_name}";
- $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $bold;
+ $rOpts->{$key} = defined( $rOpts->{$key} ) ? $rOpts->{$key} : $bold;
$key = "html-italic-$short_to_long_names{$short_name}";
- $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $italic;
+ $rOpts->{$key} = defined( $rOpts->{$key} ) ? $rOpts->{$key} : $italic;
return;
} ## end sub set_default_properties
# return 1 if success, 0 otherwise
my ( $self, $pod_string, $css_string, $toc_string, $rpre_string_stack ) =
@_;
- my $input_file = $self->{_input_file};
my $title = $self->{_title};
my $success_flag = 0;
# don't try to use pod2html if no pod
- unless ($pod_string) {
+ if ( !$pod_string ) {
return $success_flag;
}
# Pod::Html requires a real temporary filename
my ( $fh_tmp, $tmpfile ) = File::Temp::tempfile();
- unless ($fh_tmp) {
+ if ( !$fh_tmp ) {
Perl::Tidy::Warn(
"unable to open temporary file $tmpfile; cannot use pod2html\n");
return $success_flag;
# write the pod text to the temporary file
$fh_tmp->print($pod_string);
- $fh_tmp->close();
+
+ if ( !$fh_tmp->close() ) {
+ Perl::Tidy::Warn(
+ "unable to close temporary file $tmpfile; cannot use pod2html\n");
+ return $success_flag;
+ }
# Hand off the pod to pod2html.
# Note that we can use the same temporary filename for input and output
# "backlink=s", "cachedir=s", "htmlroot=s", "libpods=s",
# "podpath=s", "podroot=s"
# Note: -css=s is handled by perltidy itself
- foreach my $kw (qw(backlink cachedir htmlroot libpods podpath podroot))
+ foreach
+ my $kw (qw( backlink cachedir htmlroot libpods podpath podroot ))
{
if ( $rOpts->{$kw} ) { push @args, "--$kw=$rOpts->{$kw}" }
}
# Toggle switches; these have extra leading 'pod'
# "header!", "index!", "recurse!", "quiet!", "verbose!"
- foreach my $kw (qw(podheader podindex podrecurse podquiet podverbose)) {
+ foreach
+ my $kw (qw( podheader podindex podrecurse podquiet podverbose ))
+ {
my $kwd = $kw; # allows us to strip 'pod'
if ( $rOpts->{$kw} ) { $kwd =~ s/^pod//; push @args, "--$kwd" }
elsif ( defined( $rOpts->{$kw} ) ) {
$kwd =~ s/^pod//;
push @args, "--no$kwd";
}
+ else {
+ # user did not set this keyword
+ }
}
# "flush",
# Must clean up if pod2html dies (it can);
# Be careful not to overwrite callers __DIE__ routine
local $SIG{__DIE__} = sub {
- unlink $tmpfile if -e $tmpfile;
+ unlink($tmpfile) if -e $tmpfile;
Perl::Tidy::Die( $_[0] );
};
Pod::Html::pod2html(@args);
}
$fh_tmp = IO::File->new( $tmpfile, 'r' );
- unless ($fh_tmp) {
+ if ( !$fh_tmp ) {
# this error shouldn't happen ... we just used this filename
Perl::Tidy::Warn(
if ($in_toc) { push @toc, $line }
}
return;
- };
+ }; ## end $html_print = sub
# loop over lines of html output from pod2html and merge in
# the necessary perltidy html sections
my $date = localtime;
$timestamp = "on $date";
}
- while ( my $line = $fh_tmp->getline() ) {
+ while ( defined( my $line = $fh_tmp->getline() ) ) {
if ( $line =~ /^\s*<html>\s*$/i ) {
- ##my $date = localtime;
- ##$html_print->("<!-- Generated by perltidy on $date -->\n");
$html_print->("<!-- Generated by perltidy $timestamp -->\n");
$html_print->($line);
}
if ($toc_string) {
$html_print->("<hr />\n") if $rOpts->{'frames'};
$html_print->("<h2>Code Index:</h2>\n");
- ##my @toc = map { $_ .= "\n" } split /\n/, $toc_string;
- my @toc_st = map { $_ . "\n" } split /\n/, $toc_string;
+ my @toc_st = split /^/, $toc_string;
$html_print->(@toc_st);
}
$in_toc = EMPTY_STRING;
if ($toc_string) {
$html_print->("<hr />\n") if $rOpts->{'frames'};
$html_print->("<h2>Code Index:</h2>\n");
- ##my @toc = map { $_ .= "\n" } split /\n/, $toc_string;
- my @toc_st = map { $_ . "\n" } split /\n/, $toc_string;
+ my @toc_st = split /^/, $toc_string;
$html_print->(@toc_st);
}
$in_toc = EMPTY_STRING;
# Intermingle code and pod sections if we saw multiple =cut's.
if ( $self->{_pod_cut_count} > 1 ) {
- my $rpre_string = shift( @{$rpre_string_stack} );
+ my $rpre_string = shift @{$rpre_string_stack};
if ( ${$rpre_string} ) {
$html_print->('<pre>');
$html_print->( ${$rpre_string} );
elsif ( $line =~ /^\s*<\/body>\s*$/i ) {
$saw_body_end = 1;
if ( @{$rpre_string_stack} ) {
- unless ( $self->{_pod_cut_count} > 1 ) {
+ if ( $self->{_pod_cut_count} <= 1 ) {
$html_print->('<hr />');
}
- while ( my $rpre_string = shift( @{$rpre_string_stack} ) ) {
+ while ( @{$rpre_string_stack} ) {
+ my $rpre_string = shift @{$rpre_string_stack};
$html_print->('<pre>');
$html_print->( ${$rpre_string} );
$html_print->('</pre>');
- }
+ } ## end while ( @{$rpre_string_stack...})
}
$html_print->($line);
}
else {
$html_print->($line);
}
- }
+ } ## end while ( defined( my $line...))
$success_flag = 1;
- unless ($saw_body) {
+ if ( !$saw_body ) {
Perl::Tidy::Warn("Did not see <body> in pod2html output\n");
$success_flag = 0;
}
- unless ($saw_body_end) {
+ if ( !$saw_body_end ) {
Perl::Tidy::Warn("Did not see </body> in pod2html output\n");
$success_flag = 0;
}
- unless ($saw_index) {
+ if ( !$saw_index ) {
Perl::Tidy::Warn("Did not find INDEX END in pod2html output\n");
$success_flag = 0;
}
- close_object($html_fh);
+ if ( $html_fh->can('close') ) {
+ $html_fh->close();
+ }
# note that we have to unlink tmpfile before making frames
# because the tmpfile may be one of the names used for frames
if ( -e $tmpfile ) {
- unless ( unlink($tmpfile) ) {
+ if ( !unlink($tmpfile) ) {
Perl::Tidy::Warn(
- "couldn't unlink temporary file $tmpfile: $ERRNO\n");
+ "couldn't unlink temporary file $tmpfile: $OS_ERROR\n");
$success_flag = 0;
}
}
# $html_filename contains the no-frames html output
# $rtoc is a reference to an array with the table of contents
my ( $self, $rtoc ) = @_;
- my $input_file = $self->{_input_file};
my $html_filename = $self->{_html_file};
my $toc_filename = $self->{_toc_filename};
my $src_filename = $self->{_src_filename};
# 3. - the frame which contains them
# get basenames for relative links
- my ( $toc_basename, $toc_path ) = fileparse($toc_filename);
- my ( $src_basename, $src_path ) = fileparse($src_filename);
+ my ( $toc_basename, $toc_path_uu ) = fileparse($toc_filename);
+ my ( $src_basename, $src_path_uu ) = fileparse($src_filename);
# 1. Make the table of contents panel, with appropriate changes
# to the anchor names
- my $src_frame_name = 'SRC';
- my $first_anchor =
- write_toc_html( $title, $toc_filename, $src_basename, $rtoc,
- $src_frame_name );
+ my $src_frame_name = 'SRC';
+ my $first_anchor_uu = write_toc_html(
+ {
+ title => $title,
+ toc_filename => $toc_filename,
+ src_basename => $src_basename,
+ rtoc => $rtoc,
+ src_frame_name => $src_frame_name,
+ }
+ );
# 2. The current .html filename is renamed to be the contents panel
rename( $html_filename, $src_filename )
or Perl::Tidy::Die(
- "Cannot rename $html_filename to $src_filename: $ERRNO\n");
+ "Cannot rename $html_filename to $src_filename: $OS_ERROR\n");
# 3. Then use the original html filename for the frame
write_frame_html(
- $title, $html_filename, $top_basename,
- $toc_basename, $src_basename, $src_frame_name
+ {
+ title => $title,
+ frame_filename => $html_filename,
+ top_basename => $top_basename,
+ toc_basename => $toc_basename,
+ src_basename => $src_basename,
+ src_frame_name => $src_frame_name,
+ }
);
return;
} ## end sub make_frame
sub write_toc_html {
# write a separate html table of contents file for frames
- my ( $title, $toc_filename, $src_basename, $rtoc, $src_frame_name ) = @_;
+ my ($rarg_hash) = @_;
+
+ my $title = $rarg_hash->{title};
+ my $toc_filename = $rarg_hash->{toc_filename};
+ my $src_basename = $rarg_hash->{src_basename};
+ my $rtoc = $rarg_hash->{rtoc};
+ my $src_frame_name = $rarg_hash->{src_frame_name};
+
my $fh = IO::File->new( $toc_filename, 'w' )
- or Perl::Tidy::Die("Cannot open $toc_filename: $ERRNO\n");
+ or Perl::Tidy::Die("Cannot open $toc_filename: $OS_ERROR\n");
$fh->print(<<EOM);
<html>
<head>
<h1><a href=\"$src_basename#-top-" target="$src_frame_name">$title</a></h1>
EOM
- my $first_anchor =
+ my $first_anchor_uu =
change_anchor_names( $rtoc, $src_basename, "$src_frame_name" );
$fh->print( join EMPTY_STRING, @{$rtoc} );
sub write_frame_html {
# write an html file to be the table of contents frame
- my (
- $title, $frame_filename, $top_basename,
- $toc_basename, $src_basename, $src_frame_name
- ) = @_;
+
+ my ($rarg_hash) = @_;
+
+ my $title = $rarg_hash->{title};
+ my $frame_filename = $rarg_hash->{frame_filename};
+ my $top_basename = $rarg_hash->{top_basename};
+ my $toc_basename = $rarg_hash->{toc_basename};
+ my $src_basename = $rarg_hash->{src_basename};
+ my $src_frame_name = $rarg_hash->{src_frame_name};
my $fh = IO::File->new( $frame_filename, 'w' )
- or Perl::Tidy::Die("Cannot open $toc_basename: $ERRNO\n");
+ or Perl::Tidy::Die("Cannot open $toc_basename: $OS_ERROR\n");
$fh->print(<<EOM);
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
my $post = $5;
my $href = "$filename#$name";
$line = "$pre<a href=\"$href\" target=\"$target\">$post\n";
- unless ($first_anchor) { $first_anchor = $href }
+ if ( !$first_anchor ) { $first_anchor = $href }
}
}
return $first_anchor;
# Path 1: finish up if in -pre mode
# ---------------------------------
if ( $rOpts->{'html-pre-only'} ) {
- $html_fh->print( <<"PRE_END");
+ $html_fh->print(<<"PRE_END");
</pre>
PRE_END
- close_object($html_fh);
+ $html_fh->close()
+ if ( $html_fh->can('close') );
return;
}
my $css_string;
my $fh_css = Perl::Tidy::IOScalar->new( \$css_string, 'w' );
- # use css linked to another file
+ # use css linked to another file,
if ( $rOpts->{'html-linked-style-sheet'} ) {
$fh_css->print(
qq(<link rel="stylesheet" href="$css_linkname" type="text/css" />));
}
- # use css embedded in this file
- elsif ( !$rOpts->{'nohtml-style-sheets'} ) {
- $fh_css->print( <<'ENDCSS');
+ # or no css,
+ elsif ( $rOpts->{'nohtml-style-sheets'} ) {
+
+ }
+
+ # or use css embedded in this file
+ else {
+ $fh_css->print(<<'ENDCSS');
<style type="text/css">
<!--
ENDCSS
write_style_sheet_data($fh_css);
- $fh_css->print( <<"ENDCSS");
+ $fh_css->print(<<"ENDCSS");
-->
</style>
ENDCSS
my $date = localtime;
$timestamp = "on $date";
}
- $html_fh->print( <<"HTML_START");
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+ $html_fh->print(<<"HTML_START");
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by perltidy $timestamp -->
<html xmlns="http://www.w3.org/1999/xhtml">
# output the css, if used
if ($css_string) {
$html_fh->print($css_string);
- $html_fh->print( <<"ENDCSS");
+ $html_fh->print(<<"ENDCSS");
</head>
<body>
ENDCSS
}
else {
- $html_fh->print( <<"HTML_START");
+ $html_fh->print(<<"HTML_START");
</head>
<body bgcolor=\"$rOpts->{'html-color-background'}\" text=\"$rOpts->{'html-color-punctuation'}\">
HTML_START
}
$html_fh->print("<a name=\"-top-\"></a>\n");
- $html_fh->print( <<"EOM");
+ $html_fh->print(<<"EOM");
<h1>$title</h1>
EOM
# copy the pre section(s)
my $fname_comment = $input_file;
$fname_comment =~ s/--+/-/g; # protect HTML comment tags
- $html_fh->print( <<"END_PRE");
+ $html_fh->print(<<"END_PRE");
<hr />
<!-- contents of filename: $fname_comment -->
<pre>
}
# and finish the html page
- $html_fh->print( <<"HTML_END");
+ $html_fh->print(<<"HTML_END");
</pre>
</body>
</html>
HTML_END
- close_object($html_fh);
+ $html_fh->close()
+ if ( $html_fh->can('close') );
if ( $rOpts->{'frames'} ) {
- ##my @toc = map { $_ .= "\n" } split /\n/, ${$rtoc_string};
- my @toc = map { $_ . "\n" } split /\n/, ${$rtoc_string};
+ my @toc = split /^/, ${$rtoc_string};
$self->make_frame( \@toc );
}
return;
# blocks and go out of scope when we leave the block.
#-------------------------------------------------------
if ( $level > ${$rlast_level} ) {
- unless ( $rpackage_stack->[ $level - 1 ] ) {
+ if ( !$rpackage_stack->[ $level - 1 ] ) {
$rpackage_stack->[ $level - 1 ] = 'main';
}
$rpackage_stack->[$level] = $rpackage_stack->[ $level - 1 ];
}
elsif ( $level < ${$rlast_level} ) {
my $package = $rpackage_stack->[$level];
- unless ($package) { $package = 'main' }
+ if ( !$package ) { $package = 'main' }
# if we change packages due to a nesting change, we
# have to make an entry in the toc
$self->add_toc_item( $package, 'package' );
}
}
+ else {
+ ## level unchanged
+ }
${$rlast_level} = $level;
#-------------------------------------------------------
# Intercept a sub name here; split it
# into keyword 'sub' and sub name; and add an
# entry in the toc
+ # Fix for c250: switch from 'i' to 'S'
#-------------------------------------------------------
- if ( $type eq 'i' && $token =~ /^(sub\s+)(\w.*)$/ ) {
+ if ( $type eq 'S' && $token =~ /^(\w+\s+)(\w.*)$/ ) {
$token = $self->markup_html_element( $1, 'k' );
push @colored_tokens, $token;
$token = $2;
- $type = 'M';
+ $type = 'S';
# but don't include sub declarations in the toc;
# these will have leading token types 'i;'
my $signature = join EMPTY_STRING, @{$rtoken_type};
- unless ( $signature =~ /^i;/ ) {
+ if ( $signature !~ /^i;/ ) {
my $subname = $token;
$subname =~ s/[\s\(].*$//; # remove any attributes and prototype
$self->add_toc_item( $subname, 'sub' );
# into keyword 'package' and name; add to the toc,
# and update the package stack
#-------------------------------------------------------
- if ( $type eq 'i' && $token =~ /^(package\s+)(\w.*)$/ ) {
+ # Fix for c250: switch from 'i' to 'P' and allow 'class' or 'package'
+ if ( $type eq 'P' && $token =~ /^(\w+\s+)(\w.*)$/ ) {
$token = $self->markup_html_element( $1, 'k' );
push @colored_tokens, $token;
$token = $2;
sub escape_html {
my $token = shift;
- if ($missing_html_entities) {
+ if ( $missing_html_entities || !$rOpts_html_entities ) {
$token =~ s/\&/&/g;
$token =~ s/\</</g;
$token =~ s/\>/>/g;
$self->add_toc_item( '__DATA__', '__DATA__' );
}
elsif ( $line_type =~ /^POD/ ) {
- $line_character = 'P';
+
+ # fix for c250: changed 'P' to 'pd' here and in %token_short_names
+ # to allow use of 'P' as new package token type
+ $line_character = 'pd';
if ( $rOpts->{'pod2html'} ) {
my $html_pod_fh = $self->{_html_pod_fh};
if ( $line_type eq 'POD_START' ) {
$html_pre_fh->print("$html_line\n");
return;
} ## end sub write_line
+
+} ## end package Perl::Tidy::HtmlWriter
1;
use strict;
use warnings;
use Carp;
-our $VERSION = '20230309';
+our $VERSION = '20250105';
+use constant DEVEL_MODE => 0;
use constant EMPTY_STRING => q{};
sub AUTOLOAD {
# except for a programming error.
our $AUTOLOAD;
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
+
+ # Originally there was a dummy sub close. All calls to it should have been
+ # eliminated, but for safety we will check for them here.
+ return 1 if ( $AUTOLOAD =~ /\bclose$/ && !DEVEL_MODE );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
EOM
exit 1;
-}
+} ## end sub AUTOLOAD
sub DESTROY {
sub new {
my ( $package, $rscalar, $mode ) = @_;
- my $ref = ref $rscalar;
+ my $ref = ref($rscalar);
if ( $ref ne 'SCALAR' ) {
confess <<EOM;
------------------------------------------------------------------------
# Convert a scalar to an array.
# This avoids looking for "\n" on each call to getline
- #
- # NOTES: The -1 count is needed to avoid loss of trailing blank lines
- # (which might be important in a DATA section).
my @array;
if ( $rscalar && ${$rscalar} ) {
-
- #@array = map { $_ .= "\n" } split /\n/, ${$rscalar}, -1;
- @array = map { $_ . "\n" } split /\n/, ${$rscalar}, -1;
-
- # remove possible extra blank line introduced with split
- if ( @array && $array[-1] eq "\n" ) { pop @array }
+ @array = split /^/, ${$rscalar};
}
my $i_next = 0;
return bless [ \@array, $mode, $i_next ], $package;
------------------------------------------------------------------------
EOM
}
-}
+} ## end sub new
sub getline {
my $self = shift;
}
my $i = $self->[2]++;
return $self->[0]->[$i];
-}
+} ## end sub getline
sub print {
my ( $self, $msg ) = @_;
}
${ $self->[0] } .= $msg;
return;
-}
-sub close { return }
+} ## end sub print
1;
-
use strict;
use warnings;
use Carp;
-our $VERSION = '20230309';
+our $VERSION = '20250105';
+
+use constant DEVEL_MODE => 0;
sub AUTOLOAD {
# except for a programming error.
our $AUTOLOAD;
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
+
+ # Originally there was a dummy sub close. All calls to it should have been
+ # eliminated, but for safety we will check for them here.
+ return 1 if ( $AUTOLOAD =~ /\bclose$/ && !DEVEL_MODE );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
EOM
exit 1;
-}
+} ## end sub AUTOLOAD
sub DESTROY {
sub new {
my ( $package, $rarray, $mode ) = @_;
- my $ref = ref $rarray;
+ my $ref = ref($rarray);
if ( $ref ne 'ARRAY' ) {
confess <<EOM;
------------------------------------------------------------------------
------------------------------------------------------------------------
EOM
}
-}
+} ## end sub new
sub getline {
my $self = shift;
}
my $i = $self->[2]++;
return $self->[0]->[$i];
-}
+} ## end sub getline
sub print {
my ( $self, $msg ) = @_;
}
push @{ $self->[0] }, $msg;
return;
-}
-sub close { return }
+} ## end sub print
1;
-
package Perl::Tidy::IndentationItem;
use strict;
use warnings;
-our $VERSION = '20230309';
+
+our $VERSION = '20250105';
BEGIN {
_recoverable_spaces_ => $i++,
_align_seqno_ => $i++,
_marked_ => $i++,
- _stack_depth_ => $i++,
_K_begin_line_ => $i++,
_arrow_count_ => $i++,
_standard_spaces_ => $i++,
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
# align_seqno => # if we are aligning with an opening structure,
# # this is its seqno
# marked => # if visited by corrector logic
- # stack_depth => # indentation nesting depth
# K_begin_line => # first token index K of this level
# arrow_count => # how many =>'s
my $self = [];
+ bless $self, $class;
+
$self->[_spaces_] = $input_hash{spaces};
$self->[_level_] = $input_hash{level};
$self->[_ci_level_] = $input_hash{ci_level};
$self->[_recoverable_spaces_] = 0;
$self->[_align_seqno_] = $input_hash{align_seqno};
$self->[_marked_] = 0;
- $self->[_stack_depth_] = $input_hash{stack_depth};
$self->[_K_begin_line_] = $input_hash{K_begin_line};
$self->[_arrow_count_] = 0;
$self->[_standard_spaces_] = $input_hash{standard_spaces};
$self->[_K_extra_space_] = $input_hash{K_extra_space};
- bless $self, $class;
return $self;
} ## end sub new
# at one indentation item. NOTE: if there are child nodes, their
# total SPACES must be reduced by the caller.
- my ( $item, $spaces_needed ) = @_;
- my $available_spaces = $item->get_available_spaces();
+ my ( $self, $spaces_needed ) = @_;
+ my $available_spaces = $self->get_available_spaces();
my $deleted_spaces =
( $available_spaces > $spaces_needed )
? $spaces_needed
# Fixed for c085; a zero value must remain unchanged unless the closed
# flag has been set.
- my $closed = $item->get_closed();
- $item->decrease_available_spaces($deleted_spaces)
- unless ( $available_spaces == 0 && $closed < 0 );
- $item->decrease_SPACES($deleted_spaces);
- $item->set_recoverable_spaces(0);
+ my $closed = $self->get_closed();
+ $self->decrease_available_spaces($deleted_spaces)
+ if ( $available_spaces != 0 || $closed >= 0 );
+ $self->decrease_SPACES($deleted_spaces);
+ $self->set_recoverable_spaces(0);
return $deleted_spaces;
} ## end sub permanently_decrease_available_spaces
# for an indentation item. We may want to undo this later. NOTE: if
# there are child nodes, their total SPACES must be reduced by the
# caller.
- my ( $item, $spaces_needed ) = @_;
- my $available_spaces = $item->get_available_spaces();
+ my ( $self, $spaces_needed ) = @_;
+ my $available_spaces = $self->get_available_spaces();
my $deleted_spaces =
( $available_spaces > $spaces_needed )
? $spaces_needed
: $available_spaces;
- $item->decrease_available_spaces($deleted_spaces);
- $item->decrease_SPACES($deleted_spaces);
- $item->increase_recoverable_spaces($deleted_spaces);
+ $self->decrease_available_spaces($deleted_spaces);
+ $self->decrease_SPACES($deleted_spaces);
+ $self->increase_recoverable_spaces($deleted_spaces);
return $deleted_spaces;
} ## end sub tentatively_decrease_available_spaces
-sub get_stack_depth {
- return $_[0]->[_stack_depth_];
-}
-
+# time-critical sub
sub get_spaces {
return $_[0]->[_spaces_];
}
sub get_standard_spaces {
- return $_[0]->[_standard_spaces_];
+ my $self = shift;
+ return $self->[_standard_spaces_];
}
+# time-critical sub
sub get_marked {
return $_[0]->[_marked_];
}
} ## end sub set_marked
sub get_available_spaces {
- return $_[0]->[_available_spaces_];
+ my $self = shift;
+ return $self->[_available_spaces_];
}
sub decrease_SPACES {
} ## end sub decrease_available_spaces
sub get_align_seqno {
- return $_[0]->[_align_seqno_];
+ my $self = shift;
+ return $self->[_align_seqno_];
}
sub get_recoverable_spaces {
- return $_[0]->[_recoverable_spaces_];
+ my $self = shift;
+ return $self->[_recoverable_spaces_];
}
sub set_recoverable_spaces {
} ## end sub increase_recoverable_spaces
sub get_ci_level {
- return $_[0]->[_ci_level_];
+ my $self = shift;
+ return $self->[_ci_level_];
}
sub get_level {
- return $_[0]->[_level_];
+ my $self = shift;
+ return $self->[_level_];
}
sub get_spaces_level_ci {
}
sub get_lp_item_index {
- return $_[0]->[_lp_item_index_];
+ my $self = shift;
+ return $self->[_lp_item_index_];
}
sub get_K_begin_line {
- return $_[0]->[_K_begin_line_];
+ my $self = shift;
+ return $self->[_K_begin_line_];
}
sub get_K_extra_space {
- return $_[0]->[_K_extra_space_];
+ my $self = shift;
+ return $self->[_K_extra_space_];
}
sub set_have_child {
} ## end sub set_have_child
sub get_have_child {
- return $_[0]->[_have_child_];
+ my $self = shift;
+ return $self->[_have_child_];
}
sub set_arrow_count {
} ## end sub set_arrow_count
sub get_arrow_count {
- return $_[0]->[_arrow_count_];
+ my $self = shift;
+ return $self->[_arrow_count_];
}
sub set_comma_count {
} ## end sub set_comma_count
sub get_comma_count {
- return $_[0]->[_comma_count_];
+ my $self = shift;
+ return $self->[_comma_count_];
}
sub set_closed {
} ## end sub set_closed
sub get_closed {
- return $_[0]->[_closed_];
+ my $self = shift;
+ return $self->[_closed_];
}
1;
+++ /dev/null
-#####################################################################
-#
-# The Perl::Tidy::LineBuffer class supplies a 'get_line()'
-# method for returning the next line to be parsed, as well as a
-# 'peek_ahead()' method
-#
-# The input parameter is an object with a 'get_line()' method
-# which returns the next line to be parsed
-#
-#####################################################################
-
-package Perl::Tidy::LineBuffer;
-use strict;
-use warnings;
-our $VERSION = '20230309';
-
-sub AUTOLOAD {
-
- # Catch any undefined sub calls so that we are sure to get
- # some diagnostic information. This sub should never be called
- # except for a programming error.
- our $AUTOLOAD;
- return if ( $AUTOLOAD =~ /\bDESTROY$/ );
- my ( $pkg, $fname, $lno ) = caller();
- my $my_package = __PACKAGE__;
- print STDERR <<EOM;
-======================================================================
-Error detected in package '$my_package', version $VERSION
-Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
-Called from File '$fname' at line '$lno'
-This error is probably due to a recent programming change
-======================================================================
-EOM
- exit 1;
-}
-
-sub DESTROY {
-
- # required to avoid call to AUTOLOAD in some versions of perl
-}
-
-sub new {
-
- my ( $class, $line_source_object ) = @_;
-
- return bless {
- _line_source_object => $line_source_object,
- _rlookahead_buffer => [],
- }, $class;
-}
-
-sub peek_ahead {
- my ( $self, $buffer_index ) = @_;
- my $line = undef;
- my $line_source_object = $self->{_line_source_object};
- my $rlookahead_buffer = $self->{_rlookahead_buffer};
- if ( $buffer_index < scalar( @{$rlookahead_buffer} ) ) {
- $line = $rlookahead_buffer->[$buffer_index];
- }
- else {
- $line = $line_source_object->get_line();
- push( @{$rlookahead_buffer}, $line );
- }
- return $line;
-}
-
-sub get_line {
- my $self = shift;
- my $line = undef;
- my $line_source_object = $self->{_line_source_object};
- my $rlookahead_buffer = $self->{_rlookahead_buffer};
-
- if ( scalar( @{$rlookahead_buffer} ) ) {
- $line = shift @{$rlookahead_buffer};
- }
- else {
- $line = $line_source_object->get_line();
- }
- return $line;
-}
-1;
-
+++ /dev/null
-#####################################################################
-#
-# the Perl::Tidy::LineSink class supplies a write_line method for
-# actual file writing
-#
-#####################################################################
-
-package Perl::Tidy::LineSink;
-use strict;
-use warnings;
-our $VERSION = '20230309';
-
-sub AUTOLOAD {
-
- # Catch any undefined sub calls so that we are sure to get
- # some diagnostic information. This sub should never be called
- # except for a programming error.
- our $AUTOLOAD;
- return if ( $AUTOLOAD =~ /\bDESTROY$/ );
- my ( $pkg, $fname, $lno ) = caller();
- my $my_package = __PACKAGE__;
- print STDERR <<EOM;
-======================================================================
-Error detected in package '$my_package', version $VERSION
-Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
-Called from File '$fname' at line '$lno'
-This error is probably due to a recent programming change
-======================================================================
-EOM
- exit 1;
-}
-
-sub DESTROY {
-
- # required to avoid call to AUTOLOAD in some versions of perl
-}
-
-sub new {
-
- my ( $class, @args ) = @_;
-
- my %defaults = (
- output_file => undef,
- line_separator => undef,
- is_encoded_data => undef,
- );
- my %args = ( %defaults, @args );
-
- my $output_file = $args{output_file};
- my $line_separator = $args{line_separator};
- my $is_encoded_data = $args{is_encoded_data};
-
- my $fh = undef;
-
- my $output_file_open = 0;
-
- ( $fh, $output_file ) =
- Perl::Tidy::streamhandle( $output_file, 'w', $is_encoded_data );
- unless ($fh) { Perl::Tidy::Die("Cannot write to output stream\n"); }
- $output_file_open = 1;
-
- return bless {
- _fh => $fh,
- _output_file => $output_file,
- _output_file_open => $output_file_open,
- _line_separator => $line_separator,
- _is_encoded_data => $is_encoded_data,
- }, $class;
-}
-
-sub set_line_separator {
- my ( $self, $val ) = @_;
- $self->{_line_separator} = $val;
- return;
-}
-
-sub write_line {
-
- my ( $self, $line ) = @_;
- my $fh = $self->{_fh};
-
- my $line_separator = $self->{_line_separator};
- if ( defined($line_separator) ) {
- chomp $line;
- $line .= $line_separator;
- }
-
- $fh->print($line) if ( $self->{_output_file_open} );
-
- return;
-}
-
-sub close_output_file {
- my $self = shift;
-
- # Only close physical files, not STDOUT and other objects
- my $output_file = $self->{_output_file};
- if ( $output_file ne '-' && !ref $output_file ) {
- $self->{_fh}->close() if $self->{_output_file_open};
- }
- return;
-}
-
-1;
+++ /dev/null
-#####################################################################
-#
-# the Perl::Tidy::LineSource class supplies an object with a 'get_line()' method
-# which returns the next line to be parsed
-#
-#####################################################################
-
-package Perl::Tidy::LineSource;
-use strict;
-use warnings;
-use English qw( -no_match_vars );
-our $VERSION = '20230309';
-
-use constant DEVEL_MODE => 0;
-
-sub AUTOLOAD {
-
- # Catch any undefined sub calls so that we are sure to get
- # some diagnostic information. This sub should never be called
- # except for a programming error.
- our $AUTOLOAD;
- return if ( $AUTOLOAD =~ /\bDESTROY$/ );
- my ( $pkg, $fname, $lno ) = caller();
- my $my_package = __PACKAGE__;
- print STDERR <<EOM;
-======================================================================
-Error detected in package '$my_package', version $VERSION
-Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
-Called from File '$fname' at line '$lno'
-This error is probably due to a recent programming change
-======================================================================
-EOM
- exit 1;
-}
-
-sub DESTROY {
-
- # required to avoid call to AUTOLOAD in some versions of perl
-}
-
-sub new {
-
- my ( $class, @args ) = @_;
-
- my %defaults = (
- input_file => undef,
- rOpts => undef,
- );
-
- my %args = ( %defaults, @args );
-
- my $input_file = $args{input_file};
- my $rOpts = $args{rOpts};
-
- ( my $fh, $input_file ) = Perl::Tidy::streamhandle( $input_file, 'r' );
- return unless $fh;
-
- return bless {
- _fh => $fh,
- _filename => $input_file,
- _rinput_buffer => [],
- _started => 0,
- }, $class;
-}
-
-sub close_input_file {
- my $self = shift;
-
- # Only close physical files, not STDIN and other objects
- my $filename = $self->{_filename};
- if ( $filename ne '-' && !ref $filename ) {
- my $ok = eval { $self->{_fh}->close(); 1 };
- if ( !$ok && DEVEL_MODE ) {
- Fault("Could not close file handle(): $EVAL_ERROR\n");
- }
- }
- return;
-}
-
-sub get_line {
- my $self = shift;
- my $line = undef;
- my $fh = $self->{_fh};
- my $rinput_buffer = $self->{_rinput_buffer};
-
- if ( scalar( @{$rinput_buffer} ) ) {
- $line = shift @{$rinput_buffer};
- }
- else {
- $line = $fh->getline();
-
- # patch to read raw mac files under unix, dos
- # see if the first line has embedded \r's
- if ( $line && !$self->{_started} ) {
- if ( $line =~ /[\015][^\015\012]/ ) {
-
- # found one -- break the line up and store in a buffer
- @{$rinput_buffer} = map { $_ . "\n" } split /\015/, $line;
- my $count = @{$rinput_buffer};
- $line = shift @{$rinput_buffer};
- }
- $self->{_started}++;
- }
- }
- return $line;
-}
-1;
package Perl::Tidy::Logger;
use strict;
use warnings;
-our $VERSION = '20230309';
+our $VERSION = '20250105';
+use Carp;
use English qw( -no_match_vars );
-use constant DEVEL_MODE => 0;
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
sub new {
- my ( $class, @args ) = @_;
+ my ( $class, @arglist ) = @_;
+ if ( @arglist % 2 ) { croak "Odd number of items in arg hash list\n" }
my %defaults = (
rOpts => undef,
is_encoded_data => undef,
);
- my %args = ( %defaults, @args );
+ my %args = ( %defaults, @arglist );
my $rOpts = $args{rOpts};
my $log_file = $args{log_file};
my $fh_warnings = $rOpts->{'standard-error-output'} ? $fh_stderr : undef;
# remove any old error output file if we might write a new one
- unless ( $fh_warnings || ref($warning_file) ) {
+ if ( !$fh_warnings && !ref($warning_file) ) {
if ( -e $warning_file ) {
unlink($warning_file)
or Perl::Tidy::Die(
- "couldn't unlink warning file $warning_file: $ERRNO\n");
+ "couldn't unlink warning file $warning_file: $OS_ERROR\n");
}
}
_rOpts => $rOpts,
_fh_warnings => $fh_warnings,
_last_input_line_written => 0,
+ _last_input_line_number => undef,
_at_end_of_file => 0,
_use_prefix => 1,
_block_log_output => 0,
_complaint_count => 0,
_is_encoded_data => $is_encoded_data,
_saw_code_bug => -1, # -1=no 0=maybe 1=for sure
- _saw_brace_error => 0,
+ _saw_brace_error => 0,
_output_array => [],
_input_stream_name => $input_stream_name,
_filename_stamp => $filename_stamp,
return $self->{_input_stream_name};
}
+sub set_last_input_line_number {
+ my ( $self, $lno ) = @_;
+ $self->{_last_input_line_number} = $lno;
+ return;
+}
+
sub get_warning_count {
my $self = shift;
return $self->{_warning_count};
sub we_are_at_the_last_line {
my $self = shift;
- unless ( $self->{_wrote_line_information_string} ) {
+ if ( !$self->{_wrote_line_information_string} ) {
$self->write_logfile_entry("Last line\n\n");
}
$self->{_at_end_of_file} = 1;
sub black_box {
my ( $self, $line_of_tokens, $output_line_number ) = @_;
+
+ # This routine saves information comparing the indentation of input
+ # and output lines when a detailed logfile is requested.
+ # This was very useful during the initial development of perltidy.
+
my $input_line = $line_of_tokens->{_line_text};
my $input_line_number = $line_of_tokens->{_line_number};
- # save line information in case we have to write a logfile message
$self->{_line_of_tokens} = $line_of_tokens;
$self->{_output_line_number} = $output_line_number;
$self->{_wrote_line_information_string} = 0;
$structural_indentation_level = 0
if ( $structural_indentation_level < 0 );
$self->{_last_input_line_written} = $input_line_number;
- ( my $out_str = $input_line ) =~ s/^\s*//;
+ ( my $out_str = $input_line ) =~ s/^\s+//;
chomp $out_str;
$out_str = ( '.' x $structural_indentation_level ) . $out_str;
}
sub brace_warning {
- my ( $self, $msg ) = @_;
+ my ( $self, $msg, $msg_line_number ) = @_;
use constant BRACE_WARNING_LIMIT => 10;
my $saw_brace_error = $self->{_saw_brace_error};
if ( $saw_brace_error < BRACE_WARNING_LIMIT ) {
- $self->warning($msg);
+ $self->warning( $msg, $msg_line_number );
}
$saw_brace_error++;
$self->{_saw_brace_error} = $saw_brace_error;
sub complain {
# handle non-critical warning messages based on input flag
- my ( $self, $msg ) = @_;
+ my ( $self, $msg, $msg_line_number ) = @_;
my $rOpts = $self->{_rOpts};
# these appear in .ERR output only if -w flag is used
if ( $rOpts->{'warning-output'} ) {
- $self->warning($msg);
+ $self->warning( $msg, $msg_line_number );
}
# otherwise, they go to the .LOG file
else {
$self->{_complaint_count}++;
+ if ($msg_line_number) {
+
+ # TODO: consider using same prefix as warning()
+ $msg = $msg_line_number . ':' . $msg;
+ }
$self->write_logfile_entry($msg);
}
return;
sub warning {
- # report errors to .ERR file (or stdout)
- my ( $self, $msg ) = @_;
+ my ( $self, $msg, ($msg_line_number) ) = @_;
+
+ # Report errors to .ERR file (or stdout)
+ # Given:
+ # $msg = a string with the warning message
+ # $msg_line_number = optional line number prefix
use constant WARNING_LIMIT => 50;
Perl::Tidy::Warn_count_bump();
my $rOpts = $self->{_rOpts};
- unless ( $rOpts->{'quiet'} ) {
+ if ( !$rOpts->{'quiet'} ) {
my $warning_count = $self->{_warning_count};
my $fh_warnings = $self->{_fh_warnings};
my $is_encoded_data = $self->{_is_encoded_data};
if ( !$fh_warnings ) {
my $warning_file = $self->{_warning_file};
- ( $fh_warnings, my $filename ) =
+ $fh_warnings =
Perl::Tidy::streamhandle( $warning_file, 'w', $is_encoded_data );
- $fh_warnings
- or Perl::Tidy::Die("couldn't open $filename: $ERRNO\n");
- Perl::Tidy::Warn_msg("## Please see file $filename\n")
+ if ( !$fh_warnings ) {
+ Perl::Tidy::Die("couldn't open warning file '$warning_file'\n");
+ }
+ Perl::Tidy::Warn_msg("## Please see file $warning_file\n")
unless ref($warning_file);
$self->{_fh_warnings} = $fh_warnings;
$fh_warnings->print("Perltidy version is $Perl::Tidy::VERSION\n");
}
}
- if ( $self->get_use_prefix() > 0 ) {
+ if ( $self->get_use_prefix() > 0 && defined($msg_line_number) ) {
$self->write_logfile_entry("WARNING: $msg");
# add prefix 'filename:line_no: ' to message lines
- my $input_line_number =
- Perl::Tidy::Tokenizer::get_input_line_number();
- if ( !defined($input_line_number) ) { $input_line_number = -1 }
- my $pre_string = $filename_stamp . $input_line_number . ': ';
+ my $pre_string = $filename_stamp . $msg_line_number . ': ';
chomp $msg;
$msg =~ s/\n/\n$pre_string/g;
$msg = $pre_string . $msg . "\n";
}
sub get_save_logfile {
-
- # Returns a true/false flag indicating whether or not
- # the logfile will be saved.
my $self = shift;
return $self->{_save_logfile};
-} ## end sub get_save_logfile
+}
sub finish {
# called after all formatting to summarize errors
my ($self) = @_;
- my $warning_count = $self->{_warning_count};
- my $save_logfile = $self->{_save_logfile};
- my $log_file = $self->{_log_file};
+ my $warning_count = $self->{_warning_count};
+ my $save_logfile = $self->{_save_logfile};
+ my $log_file = $self->{_log_file};
+ my $msg_line_number = $self->{_last_input_line_number};
if ($warning_count) {
if ($save_logfile) {
$self->block_log_output(); # avoid echoing this to the logfile
$self->warning(
- "The logfile $log_file may contain useful information\n");
+ "The logfile $log_file may contain useful information\n",
+ $msg_line_number );
$self->unblock_log_output();
}
if ( $self->{_complaint_count} > 0 ) {
$self->warning(
-"To see $self->{_complaint_count} non-critical warnings rerun with -w\n"
+"To see $self->{_complaint_count} non-critical warnings rerun with -w\n",
+ $msg_line_number
);
}
if ( $self->{_saw_brace_error}
&& ( $self->{_logfile_gap} > 1 || !$save_logfile ) )
{
- $self->warning("To save a full .LOG file rerun with -g\n");
+ $self->warning( "To save a full .LOG file rerun with -g\n",
+ $msg_line_number );
}
}
if ($save_logfile) {
my $is_encoded_data = $self->{_is_encoded_data};
- my ( $fh, $filename ) =
- Perl::Tidy::streamhandle( $log_file, 'w', $is_encoded_data );
- if ($fh) {
+ my $fh = Perl::Tidy::streamhandle( $log_file, 'w', $is_encoded_data );
+ if ( !$fh ) {
+ Perl::Tidy::Warn("unable to open log file '$log_file'\n");
+ }
+ else {
my $routput_array = $self->{_output_array};
foreach my $line ( @{$routput_array} ) { $fh->print($line) }
- if ( $log_file ne '-' && !ref $log_file ) {
- my $ok = eval { $fh->close(); 1 };
- if ( !$ok && DEVEL_MODE ) {
- Fault("Could not close file handle(): $EVAL_ERROR\n");
- }
+ if ( $fh->can('close')
+ && !ref($log_file)
+ && $log_file ne '-' )
+ {
+ $fh->close()
+ or Perl::Tidy::Warn(
+ "Error closing LOG file '$log_file': $OS_ERROR\n");
}
}
}
#####################################################################
#
-# The Perl::Tidy::Tokenizer package is essentially a filter which
-# reads lines of perl source code from a source object and provides
-# corresponding tokenized lines through its get_line() method. Lines
-# flow from the source_object to the caller like this:
+# Perl::Tidy::Tokenizer reads a source and breaks it into a stream of tokens
#
-# source_object --> LineBuffer_object --> Tokenizer --> calling routine
-# get_line() get_line() get_line() line_of_tokens
+# Usage Outline:
#
-# The source object can be any object with a get_line() method which
-# supplies one line (a character string) perl call.
-# The LineBuffer object is created by the Tokenizer.
-# The Tokenizer returns a reference to a data structure 'line_of_tokens'
-# containing one tokenized line for each call to its get_line() method.
+# STEP 1: initialize or re-initialize Tokenizer with user options
+# Perl::Tidy::Tokenizer::check_options($rOpts);
#
-# WARNING: This is not a real class. Only one tokenizer my be used.
+# STEP 2: create a tokenizer for a specific input source object
+# my $tokenizer = Perl::Tidy::Tokenizer->new(
+# source_object => $source,
+# ...
+# );
+#
+# STEP 3: get and process each tokenized 'line' (a hash ref of token info)
+# while ( my $line = $tokenizer->get_line() ) {
+# $formatter->write_line($line);
+# }
+#
+# STEP 4: report errors
+# my $severe_error = $tokenizer->report_tokenization_errors();
+#
+# The source object can be a STRING ref, an ARRAY ref, or an object with a
+# get_line() method which supplies one line (a character string) perl call.
+#
+# NOTE: This is not a real class. Only one tokenizer my be used.
#
########################################################################
use warnings;
use English qw( -no_match_vars );
-our $VERSION = '20230309';
+our $VERSION = '20250105';
-use Perl::Tidy::LineBuffer;
use Carp;
use constant DEVEL_MODE => 0;
use constant EMPTY_STRING => q{};
use constant SPACE => q{ };
+use constant BACKSLASH => q{\\};
+
+{ #<<< A non-indenting brace to contain all lexical variables
+
+# Parent sequence number of tree of containers; must be 1
+use constant SEQ_ROOT => 1;
+
+# Defaults for guessing old indentation
+use constant INDENT_COLUMNS_DEFAULT => 4;
+use constant TAB_SIZE_DEFAULT => 8;
# Decimal values of some ascii characters for quick checks
use constant ORD_TAB => 9;
use constant ORD_PRINTABLE_MIN => 33;
use constant ORD_PRINTABLE_MAX => 126;
-# PACKAGE VARIABLES for processing an entire FILE.
-# These must be package variables because most may get localized during
-# processing. Most are initialized in sub prepare_for_a_new_file.
-use vars qw{
- $tokenizer_self
-
- $last_nonblank_token
- $last_nonblank_type
- $last_nonblank_block_type
- $statement_type
- $in_attribute_list
- $current_package
- $context
-
- %is_constant
- %is_user_function
- %user_function_prototype
- %is_block_function
- %is_block_list_function
- %saw_function_definition
- %saw_use_module
-
- $brace_depth
- $paren_depth
- $square_bracket_depth
-
- @current_depth
- @total_depth
- $total_depth
- $next_sequence_number
- @nesting_sequence_number
- @current_sequence_number
- @paren_type
- @paren_semicolon_count
- @paren_structural_type
- @brace_type
- @brace_structural_type
- @brace_context
- @brace_package
- @square_bracket_type
- @square_bracket_structural_type
- @depth_array
- @nested_ternary_flag
- @nested_statement_type
- @starting_line_of_current_depth
-};
-
-# GLOBAL CONSTANTS for routines in this package,
-# Initialized in a BEGIN block.
-use vars qw{
- %is_indirect_object_taker
- %is_block_operator
- %expecting_operator_token
- %expecting_operator_types
- %expecting_term_types
- %expecting_term_token
- %is_digraph
- %can_start_digraph
- %is_file_test_operator
- %is_trigraph
- %is_tetragraph
- %is_valid_token_type
- %is_keyword
- %is_my_our_state
- %is_code_block_token
- %is_sort_map_grep_eval_do
- %is_sort_map_grep
- %is_grep_alias
- %really_want_term
- @opening_brace_names
- @closing_brace_names
- %is_keyword_taking_list
- %is_keyword_taking_optional_arg
- %is_keyword_rejecting_slash_as_pattern_delimiter
- %is_keyword_rejecting_question_as_pattern_delimiter
- %is_q_qq_qx_qr_s_y_tr_m
- %is_q_qq_qw_qx_qr_s_y_tr_m
- %is_sub
- %is_package
- %is_comma_question_colon
- %is_if_elsif_unless
- %is_if_elsif_unless_case_when
- %other_line_endings
- %is_END_DATA_format_sub
- %is_semicolon_or_t
- $code_skipping_pattern_begin
- $code_skipping_pattern_end
-};
-
-# GLOBAL VARIABLES which are constant after being configured by user-supplied
-# parameters. They remain constant as a file is being processed.
+# A limit on message length when problems are detected
+use constant LONG_MESSAGE => 256;
+
+# GLOBAL VARIABLES which change during tokenization:
+# These could also be stored in $self but it is more convenient and
+# efficient to make them global lexical variables.
+# INITIALIZER: sub prepare_for_a_new_file
my (
- $rOpts_code_skipping,
+ $brace_depth,
+ $context,
+ $current_package,
+ $last_nonblank_block_type,
+ $last_nonblank_token,
+ $last_nonblank_type,
+ $next_sequence_number,
+ $paren_depth,
+ $rbrace_context,
+ $rbrace_package,
+ $rbrace_structural_type,
+ $rbrace_type,
+ $rcurrent_depth,
+ $rcurrent_sequence_number,
+ $rdepth_array,
+ $ris_block_function,
+ $ris_block_list_function,
+ $ris_constant,
+ $ris_user_function,
+ $rnested_statement_type,
+ $rnested_ternary_flag,
+ $rparen_semicolon_count,
+ $rparen_vars,
+ $rparen_type,
+ $rsaw_function_definition,
+ $rsaw_use_module,
+ $rsquare_bracket_structural_type,
+ $rsquare_bracket_type,
+ $rstarting_line_of_current_depth,
+ $rtotal_depth,
+ $ruser_function_prototype,
+ $square_bracket_depth,
+ $statement_type,
+ $total_depth,
+);
+
+my (
+
+ # GLOBAL CONSTANTS for routines in this package,
+ # INITIALIZER: BEGIN block.
+ %can_start_digraph,
+ %expecting_operator_token,
+ %expecting_operator_types,
+ %expecting_term_token,
+ %expecting_term_types,
+ %is_block_operator,
+ %is_digraph,
+ %is_file_test_operator,
+ %is_if_elsif_unless,
+ %is_if_elsif_unless_case_when,
+ %is_indirect_object_taker,
+ %is_keyword_rejecting_question_as_pattern_delimiter,
+ %is_keyword_rejecting_slash_as_pattern_delimiter,
+ %is_keyword_taking_list,
+ %is_keyword_taking_optional_arg,
+ %is_q_qq_qw_qx_qr_s_y_tr_m,
+ %is_q_qq_qx_qr_s_y_tr_m,
+ %quote_modifiers,
+ %is_semicolon_or_t,
+ %is_sort_map_grep,
+ %is_sort_map_grep_eval_do,
+ %is_tetragraph,
+ %is_trigraph,
+ %is_valid_token_type,
+ %other_line_endings,
+ %really_want_term,
+ @closing_brace_names,
+ @opening_brace_names,
+
+ # GLOBAL CONSTANT hash lookup table of operator expected values
+ # INITIALIZER: BEGIN block
+ %op_expected_table,
+
+ # GLOBAL VARIABLES which are constant after being configured.
+ # INITIALIZER: BEGIN block and modified by sub check_options
+ %is_code_block_token,
+ %is_keyword,
+ %is_my_our_state,
+ %is_package,
+ %matching_end_token,
+
+ # INITIALIZER: sub check_options
$code_skipping_pattern_begin,
$code_skipping_pattern_end,
+ $format_skipping_pattern_begin,
+ $format_skipping_pattern_end,
+
+ $rOpts_code_skipping,
+ $rOpts_code_skipping_begin,
+ $rOpts_format_skipping,
+ $rOpts_format_skipping_begin,
+ $rOpts_format_skipping_end,
+ $rOpts_starting_indentation_level,
+ $rOpts_indent_columns,
+ $rOpts_look_for_hash_bang,
+ $rOpts_look_for_autoloader,
+ $rOpts_look_for_selfloader,
+ $rOpts_trim_qw,
+ $rOpts_extended_syntax,
+ $rOpts_continuation_indentation,
+ $rOpts_outdent_labels,
+ $rOpts_maximum_level_errors,
+ $rOpts_maximum_unexpected_errors,
+ $rOpts_indent_closing_brace,
+ $rOpts_non_indenting_braces,
+ $rOpts_non_indenting_brace_prefix,
+ $rOpts_whitespace_cycle,
+
+ $tabsize,
+ %is_END_DATA_format_sub,
+ %is_grep_alias,
+ %is_sub,
+ $guess_if_method,
);
# possible values of operator_expected()
_in_end_ => $i++,
_in_format_ => $i++,
_in_error_ => $i++,
+ _in_trouble_ => $i++,
+ _warning_count_ => $i++,
+ _html_tag_count_ => $i++,
_in_pod_ => $i++,
- _in_skipped_ => $i++,
+ _in_code_skipping_ => $i++,
+ _in_format_skipping_ => $i++,
_in_attribute_list_ => $i++,
_in_quote_ => $i++,
_quote_target_ => $i++,
_line_start_quote_ => $i++,
_starting_level_ => $i++,
_know_starting_level_ => $i++,
- _tabsize_ => $i++,
- _indent_columns_ => $i++,
- _look_for_hash_bang_ => $i++,
- _trim_qw_ => $i++,
- _continuation_indentation_ => $i++,
- _outdent_labels_ => $i++,
_last_line_number_ => $i++,
_saw_perl_dash_P_ => $i++,
_saw_perl_dash_w_ => $i++,
_saw_use_strict_ => $i++,
- _saw_v_string_ => $i++,
+ _saw_brace_error_ => $i++,
_hit_bug_ => $i++,
_look_for_autoloader_ => $i++,
_look_for_selfloader_ => $i++,
_saw_data_ => $i++,
_saw_negative_indentation_ => $i++,
_started_tokenizing_ => $i++,
- _line_buffer_object_ => $i++,
_debugger_object_ => $i++,
_diagnostics_object_ => $i++,
_logger_object_ => $i++,
+ _save_logfile_ => $i++,
_unexpected_error_count_ => $i++,
_started_looking_for_here_target_at_ => $i++,
_nearly_matched_here_target_at_ => $i++,
_line_of_text_ => $i++,
_rlower_case_labels_at_ => $i++,
- _extended_syntax_ => $i++,
_maximum_level_ => $i++,
_true_brace_error_count_ => $i++,
- _rOpts_maximum_level_errors_ => $i++,
- _rOpts_maximum_unexpected_errors_ => $i++,
- _rOpts_logfile_ => $i++,
_rOpts_ => $i++,
+ _rinput_lines_ => $i++,
+ _input_line_index_next_ => $i++,
+ _rtrimmed_input_lines_ => $i++,
+ _rclosing_brace_indentation_hash_ => $i++,
+ _show_indentation_table_ => $i++,
+ _rnon_indenting_brace_stack_ => $i++,
+ _rbareword_info_ => $i++,
};
} ## end BEGIN
sub DESTROY {
my $self = shift;
- $self->_decrement_count();
+ _decrement_count();
return;
}
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
}
sub Fault {
- my ($msg) = @_;
+ my ( $self, $msg ) = @_;
# This routine is called for errors that really should not occur
# except if there has been a bug introduced by a recent program change.
# Please add comments at calls to Fault to explain why the call
# should not occur, and where to look to fix it.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $pkg = __PACKAGE__;
- my $input_stream_name = get_input_stream_name();
+ # Catch potential error of Fault not called as a method
+ my $input_stream_name;
+ if ( !ref($self) ) {
+ $msg = "Fault not called as a method - please fix\n";
+ if ( $self && length($self) < LONG_MESSAGE ) { $msg .= $self }
+ $self = undef;
+ $input_stream_name = "(UNKNOWN)";
+ }
+ else {
+ $input_stream_name = $self->get_input_stream_name();
+ }
Die(<<EOM);
==============================================================================
==============================================================================
EOM
- # We shouldn't get here, but this return is to keep Perl-Critic from
- # complaining.
- return;
+ croak "unexpected return from sub Die";
} ## end sub Fault
sub bad_pattern {
-
- # See if a pattern will compile. We have to use a string eval here,
- # but it should be safe because the pattern has been constructed
- # by this program.
my ($pattern) = @_;
- my $ok = eval "'##'=~/$pattern/";
- return !defined($ok) || $EVAL_ERROR;
+
+ # Return true if a regex pattern has an error
+ # Note: Formatter.pm also has a copy of this
+ my $regex_uu = eval { qr/$pattern/ };
+ return $EVAL_ERROR;
} ## end sub bad_pattern
-sub make_code_skipping_pattern {
+sub make_skipping_pattern {
my ( $rOpts, $opt_name, $default ) = @_;
+
+ # Make regex patterns for the format-skipping and code-skipping options
my $param = $rOpts->{$opt_name};
- unless ($param) { $param = $default }
- $param =~ s/^\s*//; # allow leading spaces to be like format-skipping
+ if ( !$param ) { $param = $default }
+ $param =~ s/^\s+//;
if ( $param !~ /^#/ ) {
Die("ERROR: the $opt_name parameter '$param' must begin with '#'\n");
}
- my $pattern = '^\s*' . $param . '\b';
+
+ # Note that the ending \s will match a newline
+ my $pattern = '^\s*' . $param . '\s';
if ( bad_pattern($pattern) ) {
Die(
"ERROR: the $opt_name parameter '$param' causes the invalid regex '$pattern'\n"
);
}
return $pattern;
-} ## end sub make_code_skipping_pattern
+} ## end sub make_skipping_pattern
sub check_options {
- # Check Tokenizer parameters
+ # Check and pre-process tokenizer parameters
my $rOpts = shift;
%is_sub = ();
}
}
+ # Set global flag to say if we have to guess if bareword 'method' is
+ # a sub when 'method' is in %is_sub. This will be true unless:
+ # (1) the user entered 'method' as sub alias, or
+ # (2) the user set --use-feature=class
+ # In these two cases we can assume that 'method' is a sub alias.
+ $guess_if_method = 1;
+ if ( $is_sub{'method'} ) { $guess_if_method = 0 }
+
#------------------------------------------------
# Update hash values for any -use-feature options
#------------------------------------------------
- my $use_feature_class = $rOpts->{'use-feature'} =~ /\bclass\b/;
+
+ my $use_feature_class = 1;
+ if ( $rOpts->{'use-feature'} ) {
+ if ( $rOpts->{'use-feature'} =~ /\bnoclass\b/ ) {
+ $use_feature_class = 0;
+ }
+ elsif ( $rOpts->{'use-feature'} =~ /\bclass\b/ ) {
+ $guess_if_method = 0;
+ }
+ else {
+ ## neither 'class' nor 'noclass' seen so use default
+ }
+ }
# These are the main updates for this option. There are additional
# changes elsewhere, usually indicated with a comment 'rt145706'
# 'method' - treated like sub using the sub-alias-list option
# Note: we must not set 'method' to be a keyword to avoid problems
# with older uses.
+ if ($use_feature_class) {
+ $is_sub{'method'} = 1;
+ $is_END_DATA_format_sub{'method'} = 1;
+ }
# 'field' - added as a keyword, and works like 'my'
$is_keyword{'field'} = $use_feature_class;
# Note that 'grep-alias-list' has been preprocessed to be a trimmed,
# space-separated list
my @q = split /\s+/, $rOpts->{'grep-alias-list'};
- @{is_grep_alias}{@q} = (1) x scalar(@q);
- }
+ @is_grep_alias{@q} = (1) x scalar(@q);
+ }
+
+ $rOpts_starting_indentation_level = $rOpts->{'starting-indentation-level'};
+ $rOpts_indent_columns = $rOpts->{'indent-columns'};
+ $rOpts_look_for_hash_bang = $rOpts->{'look-for-hash-bang'};
+ $rOpts_look_for_autoloader = $rOpts->{'look-for-autoloader'};
+ $rOpts_look_for_selfloader = $rOpts->{'look-for-selfloader'};
+ $rOpts_trim_qw = $rOpts->{'trim-qw'};
+ $rOpts_extended_syntax = $rOpts->{'extended-syntax'};
+ $rOpts_continuation_indentation = $rOpts->{'continuation-indentation'};
+ $rOpts_outdent_labels = $rOpts->{'outdent-labels'};
+ $rOpts_maximum_level_errors = $rOpts->{'maximum-level-errors'};
+ $rOpts_maximum_unexpected_errors = $rOpts->{'maximum-unexpected-errors'};
+ $rOpts_code_skipping = $rOpts->{'code-skipping'};
+ $rOpts_code_skipping_begin = $rOpts->{'code-skipping-begin'};
+ $rOpts_format_skipping = $rOpts->{'format-skipping'};
+ $rOpts_format_skipping_begin = $rOpts->{'format-skipping-begin'};
+ $rOpts_format_skipping_end = $rOpts->{'format-skipping-end'};
+ $rOpts_indent_closing_brace = $rOpts->{'indent-closing-brace'};
+ $rOpts_non_indenting_braces = $rOpts->{'non-indenting-braces'};
+ $rOpts_non_indenting_brace_prefix = $rOpts->{'non-indenting-brace-prefix'};
+ $rOpts_whitespace_cycle = $rOpts->{'whitespace-cycle'};
+
+ # In the Tokenizer, --indent-columns is just used for guessing old
+ # indentation, and must be positive. If -i=0 is used for this run (which
+ # is possible) we'll just guess that the old run used 4 spaces per level.
+ if ( !$rOpts_indent_columns ) {
+ $rOpts_indent_columns = INDENT_COLUMNS_DEFAULT;
+ }
+
+ # Define $tabsize, the number of spaces per tab for use in
+ # guessing the indentation of source lines with leading tabs.
+ # Assume same as for this run if tabs are used, otherwise assume
+ # a default value, typically 8
+ $tabsize =
+ $rOpts->{'entab-leading-whitespace'}
+ ? $rOpts->{'entab-leading-whitespace'}
+ : $rOpts->{'tabs'} ? $rOpts->{'indent-columns'}
+ : $rOpts->{'default-tabsize'};
+ if ( !$tabsize ) { $tabsize = TAB_SIZE_DEFAULT }
- $rOpts_code_skipping = $rOpts->{'code-skipping'};
$code_skipping_pattern_begin =
- make_code_skipping_pattern( $rOpts, 'code-skipping-begin', '#<<V' );
+ make_skipping_pattern( $rOpts, 'code-skipping-begin', '#<<V' );
$code_skipping_pattern_end =
- make_code_skipping_pattern( $rOpts, 'code-skipping-end', '#>>V' );
+ make_skipping_pattern( $rOpts, 'code-skipping-end', '#>>V' );
+
+ $format_skipping_pattern_begin =
+ make_skipping_pattern( $rOpts, 'format-skipping-begin', '#<<<' );
+ $format_skipping_pattern_end =
+ make_skipping_pattern( $rOpts, 'format-skipping-end', '#>>>' );
return;
} ## end sub check_options
sub new {
- my ( $class, @args ) = @_;
+ my ( $class, @arglist ) = @_;
+ if ( @arglist % 2 ) { croak "Odd number of items in arg hash list\n" }
- # Note: 'tabs' and 'indent_columns' are temporary and should be
- # removed asap
my %defaults = (
source_object => undef,
debugger_object => undef,
diagnostics_object => undef,
logger_object => undef,
starting_level => undef,
- indent_columns => 4,
- tabsize => 8,
- look_for_hash_bang => 0,
- trim_qw => 1,
- look_for_autoloader => 1,
- look_for_selfloader => 1,
starting_line_number => 1,
- extended_syntax => 0,
rOpts => {},
);
- my %args = ( %defaults, @args );
+ my %args = ( %defaults, @arglist );
# we are given an object with a get_line() method to supply source lines
my $source_object = $args{source_object};
my $rOpts = $args{rOpts};
- # we create another object with a get_line() and peek_ahead() method
- my $line_buffer_object = Perl::Tidy::LineBuffer->new($source_object);
+ # Check call args
+ if ( !defined($source_object) ) {
+ Die(
+"Perl::Tidy::Tokenizer::new called without a 'source_object' parameter\n"
+ );
+ }
+ if ( !ref($source_object) ) {
+ Die(<<EOM);
+sub Perl::Tidy::Tokenizer::new received a 'source_object' parameter which is not a reference;
+'source_object' must be a reference to a STRING, ARRAY, or object with a 'getline' method
+EOM
+ }
+
+ my $logger_object = $args{logger_object};
# Tokenizer state data is as follows:
# _rhere_target_list_ reference to list of here-doc targets
# _line_start_quote_ line where we started looking for a long quote
# _in_here_doc_ flag indicating if we are in a here-doc
# _in_pod_ flag set if we are in pod documentation
- # _in_skipped_ flag set if we are in a skipped section
+ # _in_code_skipping_ flag set if we are in a code skipping section
+ # _in_format_skipping_ flag set if we are in a format skipping section
# _in_error_ flag set if we saw severe error (binary in script)
+ # _in_trouble_ set if we saw a troublesome lexical like 'my sub s'
+ # _warning_count_ number of calls to logger sub warning
+ # _html_tag_count_ number of apparent html tags seen (indicates html)
# _in_data_ flag set if we are in __DATA__ section
# _in_end_ flag set if we are in __END__ section
# _in_format_ flag set if we are in a format description
# _in_attribute_list_ flag telling if we are looking for attributes
# _in_quote_ flag telling if we are chasing a quote
# _starting_level_ indentation level of first line
- # _line_buffer_object_ object with get_line() method to supply source code
# _diagnostics_object_ place to write debugging information
# _unexpected_error_count_ error count used to limit output
# _lower_case_labels_at_ line numbers where lower case labels seen
$self->[_in_end_] = 0;
$self->[_in_format_] = 0;
$self->[_in_error_] = 0;
+ $self->[_in_trouble_] = 0;
+ $self->[_warning_count_] = 0;
+ $self->[_html_tag_count_] = 0;
$self->[_in_pod_] = 0;
- $self->[_in_skipped_] = 0;
+ $self->[_in_code_skipping_] = 0;
+ $self->[_in_format_skipping_] = 0;
$self->[_in_attribute_list_] = 0;
$self->[_in_quote_] = 0;
$self->[_quote_target_] = EMPTY_STRING;
$self->[_line_start_quote_] = -1;
$self->[_starting_level_] = $args{starting_level};
$self->[_know_starting_level_] = defined( $args{starting_level} );
- $self->[_tabsize_] = $args{tabsize};
- $self->[_indent_columns_] = $args{indent_columns};
- $self->[_look_for_hash_bang_] = $args{look_for_hash_bang};
- $self->[_trim_qw_] = $args{trim_qw};
- $self->[_continuation_indentation_] = $args{continuation_indentation};
- $self->[_outdent_labels_] = $args{outdent_labels};
$self->[_last_line_number_] = $args{starting_line_number} - 1;
$self->[_saw_perl_dash_P_] = 0;
$self->[_saw_perl_dash_w_] = 0;
$self->[_saw_use_strict_] = 0;
- $self->[_saw_v_string_] = 0;
+ $self->[_saw_brace_error_] = 0;
$self->[_hit_bug_] = 0;
- $self->[_look_for_autoloader_] = $args{look_for_autoloader};
- $self->[_look_for_selfloader_] = $args{look_for_selfloader};
+ $self->[_look_for_autoloader_] = $rOpts_look_for_autoloader;
+ $self->[_look_for_selfloader_] = $rOpts_look_for_selfloader;
$self->[_saw_autoloader_] = 0;
$self->[_saw_selfloader_] = 0;
$self->[_saw_hash_bang_] = 0;
$self->[_saw_data_] = 0;
$self->[_saw_negative_indentation_] = 0;
$self->[_started_tokenizing_] = 0;
- $self->[_line_buffer_object_] = $line_buffer_object;
$self->[_debugger_object_] = $args{debugger_object};
$self->[_diagnostics_object_] = $args{diagnostics_object};
- $self->[_logger_object_] = $args{logger_object};
+ $self->[_logger_object_] = $logger_object;
$self->[_unexpected_error_count_] = 0;
$self->[_started_looking_for_here_target_at_] = 0;
$self->[_nearly_matched_here_target_at_] = undef;
$self->[_line_of_text_] = EMPTY_STRING;
$self->[_rlower_case_labels_at_] = undef;
- $self->[_extended_syntax_] = $args{extended_syntax};
$self->[_maximum_level_] = 0;
$self->[_true_brace_error_count_] = 0;
- $self->[_rOpts_maximum_level_errors_] = $rOpts->{'maximum-level-errors'};
- $self->[_rOpts_maximum_unexpected_errors_] =
- $rOpts->{'maximum-unexpected-errors'};
- $self->[_rOpts_logfile_] = $rOpts->{'logfile'};
- $self->[_rOpts_] = $rOpts;
+ $self->[_rnon_indenting_brace_stack_] = [];
+ $self->[_show_indentation_table_] = 0;
+ $self->[_rbareword_info_] = {};
+
+ $self->[_rclosing_brace_indentation_hash_] = {
+ valid => undef,
+ rhistory_line_number => [0],
+ rhistory_level_diff => [0],
+ rhistory_anchor_point => [1],
+ };
- # These vars are used for guessing indentation and must be positive
- $self->[_tabsize_] = 8 if ( !$self->[_tabsize_] );
- $self->[_indent_columns_] = 4 if ( !$self->[_indent_columns_] );
+ $self->[_rOpts_] = $rOpts;
+ $self->[_save_logfile_] =
+ defined($logger_object) && $logger_object->get_save_logfile();
bless $self, $class;
- $tokenizer_self = $self;
-
- prepare_for_a_new_file();
+ $self->prepare_for_a_new_file($source_object);
$self->find_starting_indentation_level();
# This is not a full class yet, so die if an attempt is made to
} ## end sub new
+# Called externally
+sub get_unexpected_error_count {
+ my ($self) = @_;
+ return $self->[_unexpected_error_count_];
+}
+
+# Called externally
+sub is_keyword {
+ my ($str) = @_;
+ return $is_keyword{$str};
+}
+
+#----------------------------------------------------------------
+# Line input routines, previously handled by the LineBuffer class
+#----------------------------------------------------------------
+sub make_source_array {
+
+ my ( $self, $line_source_object ) = @_;
+
+ # Convert the source into an array of lines
+ # Given:
+ # $line_source_object = the input source stream
+ # Task:
+ # Convert the source to an array ref and store in $self
+
+ my $rinput_lines = [];
+
+ my $rsource = ref($line_source_object);
+ my $source_string;
+
+ if ( !$rsource ) {
+
+ # shouldn't happen: this should have been checked in sub new
+ $self->Fault(<<EOM);
+sub Perl::Tidy::Tokenizer::new received a 'source_object' parameter which is not a reference;
+'source_object' must be a reference to a STRING, ARRAY, or object with a 'getline' method
+EOM
+ }
+
+ # handle an ARRAY ref
+ elsif ( $rsource eq 'ARRAY' ) {
+ $rinput_lines = $line_source_object;
+ $source_string = join( EMPTY_STRING, @{$line_source_object} );
+ }
+
+ # handle a SCALAR ref
+ elsif ( $rsource eq 'SCALAR' ) {
+ $source_string = ${$line_source_object};
+ my @lines = split /^/, $source_string;
+ $rinput_lines = \@lines;
+ }
+
+ # handle an object - must have a get_line method
+ else {
+
+ # This will die if user's object does have a 'get_line' method
+ my $line;
+ while ( defined( $line = $line_source_object->get_line() ) ) {
+ push( @{$rinput_lines}, $line );
+ }
+ $source_string = join( EMPTY_STRING, @{$rinput_lines} );
+ }
+
+ # Get trimmed lines. It is much faster to strip leading whitespace from
+ # the whole input file at once than line-by-line.
+
+ # Add a terminal newline if needed to keep line count unchanged:
+ # - avoids problem of losing a last line which is just \r and no \n (c283)
+ # - but check input line count to avoid adding line to an empty file (c286)
+ if ( @{$rinput_lines} && $source_string !~ /\n$/ ) {
+ $source_string .= "\n";
+ }
+
+ # Remove leading whitespace except newlines
+ $source_string =~ s/^ [^\S\n]+ //gxm;
+
+ # Then break the string into lines
+ my @trimmed_lines = split /^/, $source_string;
+
+ # Safety check - a change in number of lines would be a disaster
+ if ( @trimmed_lines != @{$rinput_lines} ) {
+
+ # Shouldn't happen - die in DEVEL_MODE and fix
+ my $ntr = @trimmed_lines;
+ my $utr = @{$rinput_lines};
+ DEVEL_MODE
+ && $self->Fault(
+ "trimmed / untrimmed line counts differ: $ntr / $utr\n");
+
+ # Otherwise we can safely continue with undefined trimmed lines. They
+ # will be detected and fixed later.
+ @trimmed_lines = ();
+ }
+
+ $self->[_rinput_lines_] = $rinput_lines;
+ $self->[_rtrimmed_input_lines_] = \@trimmed_lines;
+ $self->[_input_line_index_next_] = 0;
+ return;
+} ## end sub make_source_array
+
+sub peek_ahead {
+ my ( $self, $buffer_index ) = @_;
+
+ # look $buffer_index lines ahead of the current location in the input
+ # stream without disturbing the input
+ my $line;
+ my $rinput_lines = $self->[_rinput_lines_];
+ my $line_index = $buffer_index + $self->[_input_line_index_next_];
+ if ( $line_index < @{$rinput_lines} ) {
+ $line = $rinput_lines->[$line_index];
+ }
+ return $line;
+} ## end sub peek_ahead
+
+#-----------------------------------------
# interface to Perl::Tidy::Logger routines
+#-----------------------------------------
sub warning {
- my $msg = shift;
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ my ( $self, $msg ) = @_;
+
+ my $logger_object = $self->[_logger_object_];
+ $self->[_warning_count_]++;
if ($logger_object) {
- $logger_object->warning($msg);
+ my $msg_line_number = $self->[_last_line_number_];
+ $logger_object->warning( $msg, $msg_line_number );
}
return;
} ## end sub warning
sub get_input_stream_name {
+
+ my $self = shift;
+
my $input_stream_name = EMPTY_STRING;
- my $logger_object = $tokenizer_self->[_logger_object_];
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$input_stream_name = $logger_object->get_input_stream_name();
}
} ## end sub get_input_stream_name
sub complain {
- my $msg = shift;
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ my ( $self, $msg ) = @_;
+
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
- my $input_line_number = $tokenizer_self->[_last_line_number_] + 1;
- $msg = "Line $input_line_number: $msg";
- $logger_object->complain($msg);
+ my $input_line_number = $self->[_last_line_number_];
+ $logger_object->complain( $msg, $input_line_number );
}
return;
} ## end sub complain
sub write_logfile_entry {
- my $msg = shift;
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ my ( $self, $msg ) = @_;
+
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$logger_object->write_logfile_entry($msg);
}
} ## end sub write_logfile_entry
sub interrupt_logfile {
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ my $self = shift;
+
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$logger_object->interrupt_logfile();
}
} ## end sub interrupt_logfile
sub resume_logfile {
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ my $self = shift;
+
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$logger_object->resume_logfile();
}
return;
} ## end sub resume_logfile
+sub brace_warning {
+ my ( $self, $msg ) = @_;
+ $self->[_saw_brace_error_]++;
+
+ my $logger_object = $self->[_logger_object_];
+ if ($logger_object) {
+ my $msg_line_number = $self->[_last_line_number_];
+ $logger_object->brace_warning( $msg, $msg_line_number );
+ }
+ return;
+} ## end sub brace_warning
+
sub increment_brace_error {
- my $logger_object = $tokenizer_self->[_logger_object_];
+
+ # This is same as sub brace_warning but without a message
+ my $self = shift;
+ $self->[_saw_brace_error_]++;
+
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$logger_object->increment_brace_error();
}
return;
} ## end sub increment_brace_error
+sub get_saw_brace_error {
+ my $self = shift;
+ return $self->[_saw_brace_error_];
+}
+
sub report_definite_bug {
- $tokenizer_self->[_hit_bug_] = 1;
- my $logger_object = $tokenizer_self->[_logger_object_];
+ my $self = shift;
+ $self->[_hit_bug_] = 1;
+ my $logger_object = $self->[_logger_object_];
if ($logger_object) {
$logger_object->report_definite_bug();
}
return;
} ## end sub report_definite_bug
-sub brace_warning {
- my $msg = shift;
- my $logger_object = $tokenizer_self->[_logger_object_];
- if ($logger_object) {
- $logger_object->brace_warning($msg);
- }
- return;
-} ## end sub brace_warning
-
-sub get_saw_brace_error {
- my $logger_object = $tokenizer_self->[_logger_object_];
- if ($logger_object) {
- return $logger_object->get_saw_brace_error();
- }
- else {
- return 0;
- }
-} ## end sub get_saw_brace_error
-
-sub get_unexpected_error_count {
- my ($self) = @_;
- return $self->[_unexpected_error_count_];
-}
-
-# interface to Perl::Tidy::Diagnostics routines
+#-------------------------------------
+# Interface to Perl::Tidy::Diagnostics
+#-------------------------------------
sub write_diagnostics {
- my $msg = shift;
- if ( $tokenizer_self->[_diagnostics_object_] ) {
- $tokenizer_self->[_diagnostics_object_]->write_diagnostics($msg);
+ my ( $self, $msg ) = @_;
+ my $input_line_number = $self->[_last_line_number_];
+ my $diagnostics_object = $self->[_diagnostics_object_];
+ if ($diagnostics_object) {
+ $diagnostics_object->write_diagnostics( $msg, $input_line_number );
}
return;
} ## end sub write_diagnostics
-sub get_maximum_level {
- return $tokenizer_self->[_maximum_level_];
-}
-
sub report_tokenization_errors {
my ($self) = @_;
# set severe error flag if tokenizer has encountered file reading problems
# (i.e. unexpected binary characters)
- my $severe_error = $self->[_in_error_];
+ # or code which may not be formatted correctly (such as 'my sub q')
+ # The difference between _in_error_ and _in_trouble_ is that
+ # _in_error_ stops the tokenizer immediately whereas
+ # _in_trouble_ lets the tokenizer finish so that all errors are seen
+ # Both block formatting and cause the input stream to be output verbatim.
+ my $severe_error = $self->[_in_error_] || $self->[_in_trouble_];
+
+ # And do not format if it looks like an html file (c209)
+ $severe_error ||= $self->[_html_tag_count_] && $self->[_warning_count_];
+
+ # Inform the logger object on length of input stream
+ my $logger_object = $self->[_logger_object_];
+ if ($logger_object) {
+ my $last_line_number = $self->[_last_line_number_];
+ $logger_object->set_last_input_line_number($last_line_number);
+ }
- my $maxle = $self->[_rOpts_maximum_level_errors_];
- my $maxue = $self->[_rOpts_maximum_unexpected_errors_];
+ my $maxle = $rOpts_maximum_level_errors;
+ my $maxue = $rOpts_maximum_unexpected_errors;
$maxle = 1 unless defined($maxle);
$maxue = 0 unless defined($maxue);
my $level = get_indentation_level();
- if ( $level != $tokenizer_self->[_starting_level_] ) {
- warning("final indentation level: $level\n");
- my $level_diff = $tokenizer_self->[_starting_level_] - $level;
+ if ( $level != $self->[_starting_level_] ) {
+ $self->warning("final indentation level: $level\n");
+
+ $self->[_show_indentation_table_] = 1;
+
+ my $level_diff = $self->[_starting_level_] - $level;
if ( $level_diff < 0 ) { $level_diff = -$level_diff }
# Set severe error flag if the level error is greater than 1.
# best not to attempt formatting for a high level error.
if ( $maxle >= 0 && $level_diff > $maxle ) {
$severe_error = 1;
- warning(<<EOM);
+ $self->warning(<<EOM);
Formatting will be skipped because level error '$level_diff' exceeds -maxle=$maxle; use -maxle=-1 to force formatting
EOM
}
}
- check_final_nesting_depths();
+ $self->check_final_nesting_depths();
+
+ if ( $self->[_show_indentation_table_] ) {
+ $self->show_indentation_table();
+ }
# Likewise, large numbers of brace errors usually indicate non-perl
# scripts, so set the severe error flag at a low number. This is similar
# to the level check, but different because braces may balance but be
# incorrectly interlaced.
- if ( $tokenizer_self->[_true_brace_error_count_] > 2 ) {
+ if ( $self->[_true_brace_error_count_] > 2 ) {
$severe_error = 1;
}
- if ( $tokenizer_self->[_look_for_hash_bang_]
- && !$tokenizer_self->[_saw_hash_bang_] )
+ if ( $rOpts_look_for_hash_bang
+ && !$self->[_saw_hash_bang_] )
{
- warning(
+ $self->warning(
"hit EOF without seeing hash-bang line; maybe don't need -x?\n");
}
- if ( $tokenizer_self->[_in_format_] ) {
- warning("hit EOF while in format description\n");
+ if ( $self->[_in_format_] ) {
+ $self->warning("hit EOF while in format description\n");
}
- if ( $tokenizer_self->[_in_skipped_] ) {
- write_logfile_entry(
+ if ( $self->[_in_code_skipping_] ) {
+ $self->write_logfile_entry(
"hit EOF while in lines skipped with --code-skipping\n");
}
- if ( $tokenizer_self->[_in_pod_] ) {
+ if ( $self->[_in_pod_] ) {
# Just write log entry if this is after __END__ or __DATA__
# because this happens to often, and it is not likely to be
# a parsing error.
- if ( $tokenizer_self->[_saw_data_] || $tokenizer_self->[_saw_end_] ) {
- write_logfile_entry(
+ if ( $self->[_saw_data_] || $self->[_saw_end_] ) {
+ $self->write_logfile_entry(
"hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
);
}
else {
- complain(
+ $self->complain(
"hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
);
}
}
- if ( $tokenizer_self->[_in_here_doc_] ) {
+ if ( $self->[_in_here_doc_] ) {
$severe_error = 1;
- my $here_doc_target = $tokenizer_self->[_here_doc_target_];
+ my $here_doc_target = $self->[_here_doc_target_];
my $started_looking_for_here_target_at =
- $tokenizer_self->[_started_looking_for_here_target_at_];
+ $self->[_started_looking_for_here_target_at_];
if ($here_doc_target) {
- warning(
+ $self->warning(
"hit EOF in here document starting at line $started_looking_for_here_target_at with target: $here_doc_target\n"
);
}
else {
- warning(<<EOM);
+ $self->warning(<<EOM);
Hit EOF in here document starting at line $started_looking_for_here_target_at with empty target string.
(Perl will match to the end of file but this may not be intended).
EOM
}
my $nearly_matched_here_target_at =
- $tokenizer_self->[_nearly_matched_here_target_at_];
+ $self->[_nearly_matched_here_target_at_];
if ($nearly_matched_here_target_at) {
- warning(
+ $self->warning(
"NOTE: almost matched at input line $nearly_matched_here_target_at except for whitespace\n"
);
}
}
# Something is seriously wrong if we ended inside a quote
- if ( $tokenizer_self->[_in_quote_] ) {
+ if ( $self->[_in_quote_] ) {
$severe_error = 1;
- my $line_start_quote = $tokenizer_self->[_line_start_quote_];
- my $quote_target = $tokenizer_self->[_quote_target_];
+ my $line_start_quote = $self->[_line_start_quote_];
+ my $quote_target = $self->[_quote_target_];
my $what =
- ( $tokenizer_self->[_in_attribute_list_] )
+ ( $self->[_in_attribute_list_] )
? "attribute list"
: "quote/pattern";
- warning(
+ $self->warning(
"hit EOF seeking end of $what starting at line $line_start_quote ending in $quote_target\n"
);
}
- if ( $tokenizer_self->[_hit_bug_] ) {
+ if ( $self->[_hit_bug_] ) {
$severe_error = 1;
}
# a shell script or an html file. But unfortunately this check can
# interfere with some extended syntaxes, such as RPerl, so it has to be off
# by default.
- my $ue_count = $tokenizer_self->[_unexpected_error_count_];
+ my $ue_count = $self->[_unexpected_error_count_];
if ( $maxue > 0 && $ue_count > $maxue ) {
- warning(<<EOM);
+ $self->warning(<<EOM);
Formatting will be skipped since unexpected token count = $ue_count > -maxue=$maxue; use -maxue=0 to force formatting
EOM
$severe_error = 1;
}
- unless ( $tokenizer_self->[_saw_perl_dash_w_] ) {
- if ( $] < 5.006 ) {
- write_logfile_entry("Suggest including '-w parameter'\n");
- }
- else {
- write_logfile_entry("Suggest including 'use warnings;'\n");
- }
+ if ( !$self->[_saw_perl_dash_w_] ) {
+ $self->write_logfile_entry("Suggest including 'use warnings;'\n");
}
- if ( $tokenizer_self->[_saw_perl_dash_P_] ) {
- write_logfile_entry("Use of -P parameter for defines is discouraged\n");
+ if ( $self->[_saw_perl_dash_P_] ) {
+ $self->write_logfile_entry(
+ "Use of -P parameter for defines is discouraged\n");
}
- unless ( $tokenizer_self->[_saw_use_strict_] ) {
- write_logfile_entry("Suggest including 'use strict;'\n");
+ if ( !$self->[_saw_use_strict_] ) {
+ $self->write_logfile_entry("Suggest including 'use strict;'\n");
}
# it is suggested that labels have at least one upper case character
# for legibility and to avoid code breakage as new keywords are introduced
- if ( $tokenizer_self->[_rlower_case_labels_at_] ) {
- my @lower_case_labels_at =
- @{ $tokenizer_self->[_rlower_case_labels_at_] };
- write_logfile_entry(
+ if ( $self->[_rlower_case_labels_at_] ) {
+ my @lower_case_labels_at = @{ $self->[_rlower_case_labels_at_] };
+ $self->write_logfile_entry(
"Suggest using upper case characters in label(s)\n");
local $LIST_SEPARATOR = ')(';
- write_logfile_entry(" defined at line(s): (@lower_case_labels_at)\n");
+ $self->write_logfile_entry(
+ " defined at line(s): (@lower_case_labels_at)\n");
}
return $severe_error;
} ## end sub report_tokenization_errors
+sub show_indentation_table {
+ my ($self) = @_;
+
+ # Output indentation table made at closing braces. This can be helpful for
+ # the case of a missing brace in a previously formatted file.
+
+ # skip if -wc is used (rare); it is too complex to use
+ return if ($rOpts_whitespace_cycle);
+
+ # skip if non-indenting-brace-prefix (very rare, but could be fixed)
+ return if ($rOpts_non_indenting_brace_prefix);
+
+ # skip if starting level is not zero (probably in editor)
+ return if ($rOpts_starting_indentation_level);
+
+ # skip if indentation analysis is not valid
+ my $rhash = $self->[_rclosing_brace_indentation_hash_];
+ return if ( !$rhash->{valid} );
+
+ my $rhistory_line_number = $rhash->{rhistory_line_number};
+ my $rhistory_level_diff = $rhash->{rhistory_level_diff};
+ my $rhistory_anchor_point = $rhash->{rhistory_anchor_point};
+
+ # Remove the first artificial point from the table
+ shift @{$rhistory_line_number};
+ shift @{$rhistory_level_diff};
+ shift @{$rhistory_anchor_point};
+
+ # Remove dubious points at an anchor point = 2 and beyond
+ # These can occur when non-indenting braces are used
+ my $num_his = @{$rhistory_level_diff};
+ foreach my $i ( 0 .. $num_his - 1 ) {
+ if ( $rhistory_anchor_point->[$i] == 2 ) {
+ $num_his = $i;
+ last;
+ }
+ }
+ return if ( $num_his <= 1 );
+
+ # Ignore an ending non-anchor point
+ if ( !$rhistory_anchor_point->[-1] ) {
+ $num_his -= 1;
+ }
+
+ # Ignore an ending point which is the same as the previous point
+ if ( $num_his > 1 ) {
+ if ( $rhistory_level_diff->[ $num_his - 1 ] ==
+ $rhistory_level_diff->[ $num_his - 2 ] )
+ {
+ $num_his -= 1;
+ }
+ }
+
+ # skip if the table does not have at least 2 points to pinpoint an error
+ return if ( $num_his <= 1 );
+
+ # skip if first point shows a level error - the analysis may not be valid
+ return if ( $rhistory_level_diff->[0] );
+
+ # Since the table could be arbitrarily large, we will limit the table to N
+ # lines. If there are more lines than that, we will show N-3 lines, then
+ # ..., then the last 2 lines. Allow about 3 lines per error, so a table
+ # limit of 10 can localize up to about 3 errors in a file.
+ my $nlines_max = 10;
+ my @pre_indexes = ( 0 .. $num_his - 1 );
+ my @post_indexes = ();
+ if ( @pre_indexes > $nlines_max ) {
+ if ( $nlines_max >= 5 ) {
+ @pre_indexes = ( 0 .. $nlines_max - 4 );
+ @post_indexes = ( $num_his - 2, $num_his - 1 );
+ }
+ else {
+ @pre_indexes = ( 0 .. $nlines_max - 1 );
+ }
+ }
+
+ my @output_lines;
+ push @output_lines, <<EOM;
+Table of nesting level differences at closing braces.
+This might help localize brace errors if the file was previously formatted.
+line: error=[new brace level]-[old indentation level]
+EOM
+ foreach my $i (@pre_indexes) {
+ my $lno = $rhistory_line_number->[$i];
+ my $diff = $rhistory_level_diff->[$i];
+ push @output_lines, <<EOM;
+$lno: $diff
+EOM
+ }
+ if (@post_indexes) {
+ push @output_lines, "...\n";
+ foreach my $i (@post_indexes) {
+ my $lno = $rhistory_line_number->[$i];
+ my $diff = $rhistory_level_diff->[$i];
+ push @output_lines, <<EOM;
+$lno: $diff
+EOM
+ }
+ }
+ push @output_lines, "\n";
+ my $output_str = join EMPTY_STRING, @output_lines;
+
+ $self->interrupt_logfile();
+ $self->warning($output_str);
+ $self->resume_logfile();
+
+ return;
+} ## end sub show_indentation_table
+
sub report_v_string {
# warn if this version can't handle v-strings
- my $tok = shift;
- unless ( $tokenizer_self->[_saw_v_string_] ) {
- $tokenizer_self->[_saw_v_string_] =
- $tokenizer_self->[_last_line_number_];
- }
+ my ( $self, $tok ) = @_;
if ( $] < 5.006 ) {
- warning(
+ $self->warning(
"Found v-string '$tok' but v-strings are not implemented in your version of perl; see Camel 3 book ch 2\n"
);
}
return $is_valid_token_type{$type};
}
-sub get_input_line_number {
- return $tokenizer_self->[_last_line_number_];
-}
-
sub log_numbered_msg {
my ( $self, $msg ) = @_;
# write input line number + message to logfile
my $input_line_number = $self->[_last_line_number_];
- write_logfile_entry("Line $input_line_number: $msg");
+ $self->write_logfile_entry("Line $input_line_number: $msg");
return;
} ## end sub log_numbered_msg
-# returns the next tokenized line
sub get_line {
my $self = shift;
+ # Read the next input line and tokenize it
+ # Returns:
+ # $line_of_tokens = ref to hash of info for the tokenized line
+
# USES GLOBAL VARIABLES:
# $brace_depth, $square_bracket_depth, $paren_depth
- my $input_line = $self->[_line_buffer_object_]->get_line();
+ # get the next line from the input array
+ my $input_line;
+ my $trimmed_input_line;
+ my $line_index = $self->[_input_line_index_next_];
+ my $rinput_lines = $self->[_rinput_lines_];
+ if ( $line_index < @{$rinput_lines} ) {
+ $trimmed_input_line = $self->[_rtrimmed_input_lines_]->[$line_index];
+ $input_line = $rinput_lines->[ $line_index++ ];
+ $self->[_input_line_index_next_] = $line_index;
+ }
+
$self->[_line_of_text_] = $input_line;
- return unless ($input_line);
+ return if ( !defined($input_line) );
my $input_line_number = ++$self->[_last_line_number_];
# Find and remove what characters terminate this line, including any
# control r
my $input_line_separator = EMPTY_STRING;
- if ( chomp($input_line) ) {
+ if ( chomp $input_line ) {
$input_line_separator = $INPUT_RECORD_SEPARATOR;
}
# The first test here very significantly speeds things up, but be sure to
# keep the regex and hash %other_line_endings the same.
if ( $other_line_endings{ substr( $input_line, -1 ) } ) {
- if ( $input_line =~ s/((\r|\035|\032)+)$// ) {
- $input_line_separator = $2 . $input_line_separator;
+ if ( $input_line =~ s/([\r\035\032])+$// ) {
+ $input_line_separator = $1 . $input_line_separator;
+
+ # This could make the trimmed input line incorrect, so the
+ # safe thing to do is to make it undef to force it to be
+ # recomputed later.
+ $trimmed_input_line = undef;
}
}
- # for backwards compatibility we keep the line text terminated with
+ # For backwards compatibility we keep the line text terminated with
# a newline character
$input_line .= "\n";
$self->[_line_of_text_] = $input_line;
_curly_brace_depth => $brace_depth,
_square_bracket_depth => $square_bracket_depth,
_paren_depth => $paren_depth,
- _quote_character => EMPTY_STRING,
## Skip these needless initializations for efficiency:
## _rtoken_type => undef,
## _rtokens => undef,
## _rlevels => undef,
## _rblock_type => undef,
## _rtype_sequence => undef,
-## _rci_levels => undef,
## _starting_in_quote => 0,
## _ending_in_quote => 0,
};
# Handle <<~ targets, which are indicated here by a leading space on
# the here quote character
if ( $here_quote_character =~ /^\s/ ) {
- $candidate_target =~ s/^\s*//;
+ $candidate_target =~ s/^\s+//;
}
if ( $candidate_target eq $here_doc_target ) {
$self->[_nearly_matched_here_target_at_] = undef;
# check for error of extra whitespace
# note for PERL6: leading whitespace is allowed
else {
- $candidate_target =~ s/\s*$//;
- $candidate_target =~ s/^\s*//;
+ $candidate_target =~ s/^ \s+ | \s+ $//gx; # trim both ends
if ( $candidate_target eq $here_doc_target ) {
$self->[_nearly_matched_here_target_at_] = $input_line_number;
}
if ( !$self->[_in_format_] ) {
$self->log_numbered_msg("Exiting format section\n");
$line_of_tokens->{_line_type} = 'FORMAT_END';
+
+ # Make the tokenizer mark an opening brace which follows
+ # as a code block. Fixes issue c202/t032.
+ $last_nonblank_token = ';';
+ $last_nonblank_type = ';';
}
}
else {
$self->[_in_pod_] = 0;
}
if ( $input_line =~ /^\#\!.*perl\b/ && !$self->[_in_end_] ) {
- warning(
+ $self->warning(
"Hash-bang in pod can cause older versions of perl to fail! \n"
);
}
}
# print line unchanged if in skipped section
- elsif ( $self->[_in_skipped_] ) {
+ elsif ( $self->[_in_code_skipping_] ) {
$line_of_tokens->{_line_type} = 'SKIP';
if ( $input_line =~ /$code_skipping_pattern_end/ ) {
$line_of_tokens->{_line_type} = 'SKIP_END';
$self->log_numbered_msg("Exiting code-skipping section\n");
- $self->[_in_skipped_] = 0;
+ $self->[_in_code_skipping_] = 0;
+ }
+ elsif ( $input_line =~ /$code_skipping_pattern_begin/ ) {
+
+ # warn of duplicate starting comment lines, git #118
+ my $lno = $self->[_in_code_skipping_];
+ $self->warning(
+ "Already in code-skipping section which started at line $lno\n"
+ );
+ }
+ else {
+ # not a code-skipping control line
}
return $line_of_tokens;
}
return $line_of_tokens;
}
}
+ else {
+ # not a special control line
+ }
# check for a hash-bang line if we haven't seen one
- if ( !$self->[_saw_hash_bang_] ) {
- if ( $input_line =~ /^\#\!.*perl\b/ ) {
- $self->[_saw_hash_bang_] = $input_line_number;
+ if ( !$self->[_saw_hash_bang_]
+ && substr( $input_line, 0, 2 ) eq '#!'
+ && $input_line =~ /^\#\!.*perl\b/ )
+ {
+ $self->[_saw_hash_bang_] = $input_line_number;
- # check for -w and -P flags
- if ( $input_line =~ /^\#\!.*perl\s.*-.*P/ ) {
- $self->[_saw_perl_dash_P_] = 1;
- }
+ # check for -w and -P flags
+ if ( $input_line =~ /^\#\!.*perl\s.*-.*P/ ) {
+ $self->[_saw_perl_dash_P_] = 1;
+ }
- if ( $input_line =~ /^\#\!.*perl\s.*-.*w/ ) {
- $self->[_saw_perl_dash_w_] = 1;
- }
+ if ( $input_line =~ /^\#\!.*perl\s.*-.*w/ ) {
+ $self->[_saw_perl_dash_w_] = 1;
+ }
- if (
- $input_line_number > 1
+ if (
+ $input_line_number > 1
- # leave any hash bang in a BEGIN block alone
- # i.e. see 'debugger-duck_type.t'
- && !(
- $last_nonblank_block_type
- && $last_nonblank_block_type eq 'BEGIN'
- )
- && !$self->[_look_for_hash_bang_]
+ # leave any hash bang in a BEGIN block alone
+ # i.e. see 'debugger-duck_type.t'
+ && !(
+ $last_nonblank_block_type
+ && $last_nonblank_block_type eq 'BEGIN'
+ )
+ && !$rOpts_look_for_hash_bang
- # Try to avoid giving a false alarm at a simple comment.
- # These look like valid hash-bang lines:
+ # Try to avoid giving a false alarm at a simple comment.
+ # These look like valid hash-bang lines:
- #!/usr/bin/perl -w
- #! /usr/bin/perl -w
- #!c:\perl\bin\perl.exe
+ #!/usr/bin/perl -w
+ #! /usr/bin/perl -w
+ #!c:\perl\bin\perl.exe
- # These are comments:
- #! I love perl
- #! sunos does not yet provide a /usr/bin/perl
+ # These are comments:
+ #! I love perl
+ #! sunos does not yet provide a /usr/bin/perl
- # Comments typically have multiple spaces, which suggests
- # the filter
- && $input_line =~ /^\#\!(\s+)?(\S+)?perl/
- )
- {
+ # Comments typically have multiple spaces, which suggests
+ # the filter
+ && $input_line =~ /^\#\!(\s+)?(\S+)?perl/
+ )
+ {
- # this is helpful for VMS systems; we may have accidentally
- # tokenized some DCL commands
- if ( $self->[_started_tokenizing_] ) {
- warning(
+ # this is helpful for VMS systems; we may have accidentally
+ # tokenized some DCL commands
+ if ( $self->[_started_tokenizing_] ) {
+ $self->warning(
"There seems to be a hash-bang after line 1; do you need to run with -x ?\n"
- );
- }
- else {
- complain("Useless hash-bang after line 1\n");
- }
+ );
}
-
- # Report the leading hash-bang as a system line
- # This will prevent -dac from deleting it
else {
- $line_of_tokens->{_line_type} = 'SYSTEM';
- return $line_of_tokens;
+ $self->complain("Useless hash-bang after line 1\n");
}
}
+
+ # Report the leading hash-bang as a system line
+ # This will prevent -dac from deleting it
+ else {
+ $line_of_tokens->{_line_type} = 'SYSTEM';
+ return $line_of_tokens;
+ }
}
# wait for a hash-bang before parsing if the user invoked us with -x
- if ( $self->[_look_for_hash_bang_]
+ if ( $rOpts_look_for_hash_bang
&& !$self->[_saw_hash_bang_] )
{
$line_of_tokens->{_line_type} = 'SYSTEM';
# _in_end_
# _in_format_
# _in_error_
- # _in_skipped_
+ # _in_code_skipping_
+ # _in_format_skipping_
# _in_pod_
# _in_quote_
- $self->tokenize_this_line($line_of_tokens);
+
+ $self->tokenize_this_line( $line_of_tokens, $trimmed_input_line );
# Now finish defining the return structure and return it
$line_of_tokens->{_ending_in_quote} = $self->[_in_quote_];
# handle severe error (binary data in script)
if ( $self->[_in_error_] ) {
$self->[_in_quote_] = 0; # to avoid any more messages
- warning("Giving up after error\n");
+ $self->warning("Giving up after error\n");
$line_of_tokens->{_line_type} = 'ERROR';
reset_indentation_level(0); # avoid error messages
return $line_of_tokens;
# leading =head. In any case, this isn't good.
if ( $input_line =~ /^=cut\b/ ) {
if ( $self->[_saw_data_] || $self->[_saw_end_] ) {
- complain("=cut while not in pod ignored\n");
+ $self->complain("=cut while not in pod ignored\n");
$self->[_in_pod_] = 0;
$line_of_tokens->{_line_type} = 'POD_END';
}
else {
$line_of_tokens->{_line_type} = 'POD_START';
- warning(
+ if ( !DEVEL_MODE ) {
+ $self->warning(
"=cut starts a pod section .. this can fool pod utilities.\n"
- ) unless (DEVEL_MODE);
+ );
+ }
$self->log_numbered_msg("Entering POD section\n");
}
}
}
# handle start of skipped section
- if ( $self->[_in_skipped_] ) {
+ if ( $self->[_in_code_skipping_] ) {
$line_of_tokens->{_line_type} = 'SKIP';
$self->log_numbered_msg("Entering code-skipping section\n");
}
return $line_of_tokens;
}
+ else {
+ # not in __END__ or __DATA__
+ }
# now, finally, we know that this line is type 'CODE'
$line_of_tokens->{_line_type} = 'CODE';
if ( $self->[_in_quote_]
and ( $self->[_line_start_quote_] < 0 ) )
{
-
- #if ( ( my $quote_target = get_quote_target() ) !~ /^\s*$/ ) {
if ( ( my $quote_target = $self->[_quote_target_] ) !~ /^\s*$/ ) {
$self->[_line_start_quote_] = $input_line_number;
$self->log_numbered_msg(
$self->[_line_start_quote_] = -1;
$self->log_numbered_msg("End of multi-line quote or pattern\n");
}
+ else {
+ # not at the edge of a quote
+ }
# we are returning a line of CODE
return $line_of_tokens;
}
# if we know there is a hash_bang line, the level must be zero
- elsif ( $self->[_look_for_hash_bang_] ) {
+ elsif ($rOpts_look_for_hash_bang) {
$self->[_know_starting_level_] = 1;
}
my $i = 0;
# keep looking at lines until we find a hash bang or piece of code
+ # ( or, for now, an =pod line)
my $msg = EMPTY_STRING;
- while ( $line = $self->[_line_buffer_object_]->peek_ahead( $i++ ) ) {
+ my $in_code_skipping;
+ my $line_for_guess;
+ while ( defined( $line = $self->peek_ahead( $i++ ) ) ) {
# if first line is #! then assume starting level is zero
if ( $i == 1 && $line =~ /^\#\!/ ) {
$starting_level = 0;
last;
}
- next if ( $line =~ /^\s*#/ ); # skip past comments
+
+ # ignore lines fenced off with code-skipping comments
+ if ( $line =~ /^\s*#/ ) {
+
+ # use first comment for indentation guess in case of no code
+ if ( !defined($line_for_guess) ) { $line_for_guess = $line }
+
+ if ( !$in_code_skipping ) {
+ if ( $rOpts_code_skipping
+ && $line =~ /$code_skipping_pattern_begin/ )
+ {
+ $in_code_skipping = 1;
+ next;
+ }
+ }
+ else {
+ if ( $line =~ /$code_skipping_pattern_end/ ) {
+ $in_code_skipping = 0;
+ }
+ next;
+ }
+
+ # Note that we could also ignore format-skipping lines here
+ # but it isn't clear if that would be best.
+ # See c326 for example code.
+
+ next;
+ }
+ next if ($in_code_skipping);
+
next if ( $line =~ /^\s*$/ ); # skip past blank lines
- $starting_level = guess_old_indentation_level($line);
+
+ # use first line of code for indentation guess
+ $line_for_guess = $line;
last;
+ } ## end while ( defined( $line = ...))
+
+ if ( defined($line_for_guess) ) {
+ $starting_level =
+ $self->guess_old_indentation_level($line_for_guess);
}
$msg = "Line $i implies starting-indentation-level = $starting_level\n";
- write_logfile_entry("$msg");
+ $self->write_logfile_entry("$msg");
}
$self->[_starting_level_] = $starting_level;
reset_indentation_level($starting_level);
} ## end sub find_starting_indentation_level
sub guess_old_indentation_level {
- my ($line) = @_;
+ my ( $self, $line ) = @_;
# Guess the indentation level of an input line.
#
# a block of code within an editor, so that the block stays at the same
# level when perltidy is applied repeatedly.
#
- # USES GLOBAL VARIABLES: $tokenizer_self
+ # USES GLOBAL VARIABLES: (none)
my $level = 0;
# find leading tabs, spaces, and any statement label
# If there are leading tabs, we use the tab scheme for this run, if
# any, so that the code will remain stable when editing.
- if ($1) { $spaces += length($1) * $tokenizer_self->[_tabsize_] }
+ if ($1) { $spaces += length($1) * $tabsize }
if ($2) { $spaces += length($2) }
# correct for outdented labels
- if ( $3 && $tokenizer_self->[_outdent_labels_] ) {
- $spaces += $tokenizer_self->[_continuation_indentation_];
+ if ( $3
+ && $rOpts_outdent_labels
+ && $rOpts_continuation_indentation > 0 )
+ {
+ $spaces += $rOpts_continuation_indentation;
}
}
- # compute indentation using the value of -i for this run.
- # If -i=0 is used for this run (which is possible) it doesn't matter
- # what we do here but we'll guess that the old run used 4 spaces per level.
- my $indent_columns = $tokenizer_self->[_indent_columns_];
- $indent_columns = 4 if ( !$indent_columns );
- $level = int( $spaces / $indent_columns );
+ $level = int( $spaces / $rOpts_indent_columns );
return ($level);
} ## end sub guess_old_indentation_level
-# This is a currently unused debug routine
sub dump_functions {
+ # This is an unused debug routine, save for future use
+
my $fh = *STDOUT;
- foreach my $pkg ( keys %is_user_function ) {
+ foreach my $pkg ( keys %{$ris_user_function} ) {
$fh->print("\nnon-constant subs in package $pkg\n");
- foreach my $sub ( keys %{ $is_user_function{$pkg} } ) {
+ foreach my $sub ( keys %{ $ris_user_function->{$pkg} } ) {
my $msg = EMPTY_STRING;
- if ( $is_block_list_function{$pkg}{$sub} ) {
+ if ( $ris_block_list_function->{$pkg}{$sub} ) {
$msg = 'block_list';
}
- if ( $is_block_function{$pkg}{$sub} ) {
+ if ( $ris_block_function->{$pkg}{$sub} ) {
$msg = 'block';
}
$fh->print("$sub $msg\n");
}
}
- foreach my $pkg ( keys %is_constant ) {
+ foreach my $pkg ( keys %{$ris_constant} ) {
$fh->print("\nconstants and constant subs in package $pkg\n");
- foreach my $sub ( keys %{ $is_constant{$pkg} } ) {
+ foreach my $sub ( keys %{ $ris_constant->{$pkg} } ) {
$fh->print("$sub\n");
}
}
sub prepare_for_a_new_file {
+ my ( $self, $source_object ) = @_;
+
+ # copy the source object lines to an array of lines
+ $self->make_source_array($source_object);
+
# previous tokens needed to determine what to expect next
$last_nonblank_token = ';'; # the only possible starting state which
$last_nonblank_type = ';'; # will make a leading brace a code block
$last_nonblank_block_type = EMPTY_STRING;
# scalars for remembering statement types across multiple lines
- $statement_type = EMPTY_STRING; # '' or 'use' or 'sub..' or 'case..'
- $in_attribute_list = 0;
+ $statement_type = EMPTY_STRING; # '' or 'use' or 'sub..' or 'case..'
# scalars for remembering where we are in the file
$current_package = "main";
$context = UNKNOWN_CONTEXT;
# hashes used to remember function information
- %is_constant = (); # user-defined constants
- %is_user_function = (); # user-defined functions
- %user_function_prototype = (); # their prototypes
- %is_block_function = ();
- %is_block_list_function = ();
- %saw_function_definition = ();
- %saw_use_module = ();
+ $ris_constant = {}; # user-defined constants
+ $ris_user_function = {}; # user-defined functions
+ $ruser_function_prototype = {}; # their prototypes
+ $ris_block_function = {};
+ $ris_block_list_function = {};
+ $rsaw_function_definition = {};
+ $rsaw_use_module = {};
# variables used to track depths of various containers
# and report nesting errors
- $paren_depth = 0;
- $brace_depth = 0;
- $square_bracket_depth = 0;
- @current_depth = (0) x scalar @closing_brace_names;
- $total_depth = 0;
- @total_depth = ();
- @nesting_sequence_number = ( 0 .. @closing_brace_names - 1 );
- @current_sequence_number = ();
- $next_sequence_number = 2; # The value 1 is reserved for SEQ_ROOT
-
- @paren_type = ();
- @paren_semicolon_count = ();
- @paren_structural_type = ();
- @brace_type = ();
- @brace_structural_type = ();
- @brace_context = ();
- @brace_package = ();
- @square_bracket_type = ();
- @square_bracket_structural_type = ();
- @depth_array = ();
- @nested_ternary_flag = ();
- @nested_statement_type = ();
- @starting_line_of_current_depth = ();
-
- $paren_type[$paren_depth] = EMPTY_STRING;
- $paren_semicolon_count[$paren_depth] = 0;
- $paren_structural_type[$brace_depth] = EMPTY_STRING;
- $brace_type[$brace_depth] = ';'; # identify opening brace as code block
- $brace_structural_type[$brace_depth] = EMPTY_STRING;
- $brace_context[$brace_depth] = UNKNOWN_CONTEXT;
- $brace_package[$paren_depth] = $current_package;
- $square_bracket_type[$square_bracket_depth] = EMPTY_STRING;
- $square_bracket_structural_type[$square_bracket_depth] = EMPTY_STRING;
+ $paren_depth = 0;
+ $brace_depth = 0;
+ $square_bracket_depth = 0;
+ $rcurrent_depth = [ (0) x scalar(@closing_brace_names) ];
+ $total_depth = 0;
+ $rtotal_depth = [];
+ $rcurrent_sequence_number = [];
+ $next_sequence_number = SEQ_ROOT + 1;
+
+ $rparen_type = [];
+ $rparen_semicolon_count = [];
+ $rparen_vars = [];
+ $rbrace_type = [];
+ $rbrace_structural_type = [];
+ $rbrace_context = [];
+ $rbrace_package = [];
+ $rsquare_bracket_type = [];
+ $rsquare_bracket_structural_type = [];
+ $rdepth_array = [];
+ $rnested_ternary_flag = [];
+ $rnested_statement_type = [];
+ $rstarting_line_of_current_depth = [];
+
+ $rparen_type->[$paren_depth] = EMPTY_STRING;
+ $rparen_semicolon_count->[$paren_depth] = 0;
+ $rparen_vars->[$paren_depth] = [];
+ $rbrace_type->[$brace_depth] = ';'; # identify opening brace as code block
+ $rbrace_structural_type->[$brace_depth] = EMPTY_STRING;
+ $rbrace_context->[$brace_depth] = UNKNOWN_CONTEXT;
+ $rbrace_package->[$paren_depth] = $current_package;
+ $rsquare_bracket_type->[$square_bracket_depth] = EMPTY_STRING;
+ $rsquare_bracket_structural_type->[$square_bracket_depth] = EMPTY_STRING;
initialize_tokenizer_state();
return;
# TV2: refs to ARRAYS for processing one LINE
# Re-initialized on each call.
- my $routput_token_list = []; # stack of output token indexes
- my $routput_token_type = []; # token types
- my $routput_block_type = []; # types of code block
- my $routput_container_type = []; # paren types, such as if, elsif, ..
- my $routput_type_sequence = []; # nesting sequential number
- my $routput_indent_flag = []; #
+ my $routput_token_list = []; # stack of output token indexes
+ my $routput_token_type = []; # token types
+ my $routput_block_type = []; # types of code block
+ my $routput_type_sequence = []; # nesting sequential number
+ my $routput_indent_flag = []; #
# TV3: SCALARS for quote variables. These are initialized with a
# subroutine call and continually updated as lines are processed.
my ( $in_quote, $quote_type, $quote_character, $quote_pos, $quote_depth,
- $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers, );
+ $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers );
# TV4: SCALARS for multi-line identifiers and
# statements. These are initialized with a subroutine call
# Initialized once and continually updated as lines are
# processed.
my (
- $nesting_token_string, $nesting_type_string,
- $nesting_block_string, $nesting_block_flag,
- $nesting_list_string, $nesting_list_flag,
- $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
- $in_statement_continuation, $level_in_tokenizer,
- $slevel_in_tokenizer, $rslevel_stack,
+ $nesting_token_string, $nesting_block_string,
+ $nesting_block_flag, $level_in_tokenizer,
);
# TV6: SCALARS for remembering several previous
# tokens. Initialized once and continually updated as
# lines are processed.
my (
- $last_nonblank_container_type, $last_nonblank_type_sequence,
- $last_last_nonblank_token, $last_last_nonblank_type,
- $last_last_nonblank_block_type, $last_last_nonblank_container_type,
- $last_last_nonblank_type_sequence, $last_nonblank_prototype,
+ $last_nonblank_container_type, $last_nonblank_type_sequence,
+ $last_last_nonblank_token, $last_last_nonblank_type,
+ $last_nonblank_prototype,
);
# ----------------------------------------------------------------
sub initialize_tokenizer_state {
+ # GV1: initialized once
# TV1: initialized on each call
# TV2: initialized on each call
# TV3:
$want_paren = EMPTY_STRING;
# TV5:
- $nesting_token_string = EMPTY_STRING;
- $nesting_type_string = EMPTY_STRING;
- $nesting_block_string = '1'; # initially in a block
- $nesting_block_flag = 1;
- $nesting_list_string = '0'; # initially not in a list
- $nesting_list_flag = 0; # initially not in a list
- $ci_string_in_tokenizer = EMPTY_STRING;
- $continuation_string_in_tokenizer = "0";
- $in_statement_continuation = 0;
- $level_in_tokenizer = 0;
- $slevel_in_tokenizer = 0;
- $rslevel_stack = [];
+ $nesting_token_string = EMPTY_STRING;
+ $nesting_block_string = '1'; # initially in a block
+ $nesting_block_flag = 1;
+ $level_in_tokenizer = 0;
# TV6:
- $last_nonblank_container_type = EMPTY_STRING;
- $last_nonblank_type_sequence = EMPTY_STRING;
- $last_last_nonblank_token = ';';
- $last_last_nonblank_type = ';';
- $last_last_nonblank_block_type = EMPTY_STRING;
- $last_last_nonblank_container_type = EMPTY_STRING;
- $last_last_nonblank_type_sequence = EMPTY_STRING;
- $last_nonblank_prototype = EMPTY_STRING;
+ $last_nonblank_container_type = EMPTY_STRING;
+ $last_nonblank_type_sequence = EMPTY_STRING;
+ $last_last_nonblank_token = ';';
+ $last_last_nonblank_type = ';';
+ $last_nonblank_prototype = EMPTY_STRING;
return;
} ## end sub initialize_tokenizer_state
sub save_tokenizer_state {
- my $rTV1 = [
- $block_type, $container_type, $expecting,
+ # Global variables:
+ my $rGV1 = [
+ $brace_depth,
+ $context,
+ $current_package,
+ $last_nonblank_block_type,
+ $last_nonblank_token,
+ $last_nonblank_type,
+ $next_sequence_number,
+ $paren_depth,
+ $rbrace_context,
+ $rbrace_package,
+ $rbrace_structural_type,
+ $rbrace_type,
+ $rcurrent_depth,
+ $rcurrent_sequence_number,
+ $rdepth_array,
+ $ris_block_function,
+ $ris_block_list_function,
+ $ris_constant,
+ $ris_user_function,
+ $rnested_statement_type,
+ $rnested_ternary_flag,
+ $rparen_semicolon_count,
+ $rparen_vars,
+ $rparen_type,
+ $rsaw_function_definition,
+ $rsaw_use_module,
+ $rsquare_bracket_structural_type,
+ $rsquare_bracket_type,
+ $rstarting_line_of_current_depth,
+ $rtotal_depth,
+ $ruser_function_prototype,
+ $square_bracket_depth,
+ $statement_type,
+ $total_depth,
+
+ ];
+
+ # Tokenizer closure variables:
+ my $rTV1 = [
+ $block_type, $container_type, $expecting,
$i, $i_tok, $input_line,
$input_line_number, $last_nonblank_i, $max_token_index,
$next_tok, $next_type, $peeked_ahead,
];
my $rTV2 = [
- $routput_token_list, $routput_token_type,
- $routput_block_type, $routput_container_type,
- $routput_type_sequence, $routput_indent_flag,
+ $routput_token_list, $routput_token_type,
+ $routput_block_type, $routput_type_sequence,
+ $routput_indent_flag,
];
my $rTV3 = [
my $rTV4 = [ $id_scan_state, $identifier, $want_paren ];
my $rTV5 = [
- $nesting_token_string, $nesting_type_string,
- $nesting_block_string, $nesting_block_flag,
- $nesting_list_string, $nesting_list_flag,
- $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
- $in_statement_continuation, $level_in_tokenizer,
- $slevel_in_tokenizer, $rslevel_stack,
+ $nesting_token_string, $nesting_block_string,
+ $nesting_block_flag, $level_in_tokenizer,
];
my $rTV6 = [
- $last_nonblank_container_type,
- $last_nonblank_type_sequence,
- $last_last_nonblank_token,
- $last_last_nonblank_type,
- $last_last_nonblank_block_type,
- $last_last_nonblank_container_type,
- $last_last_nonblank_type_sequence,
+ $last_nonblank_container_type, $last_nonblank_type_sequence,
+ $last_last_nonblank_token, $last_last_nonblank_type,
$last_nonblank_prototype,
];
- return [ $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ];
+ return [ $rGV1, $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ];
} ## end sub save_tokenizer_state
sub restore_tokenizer_state {
my ($rstate) = @_;
- my ( $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ) = @{$rstate};
+ my ( $rGV1, $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ) = @{$rstate};
+
+ (
+ $brace_depth,
+ $context,
+ $current_package,
+ $last_nonblank_block_type,
+ $last_nonblank_token,
+ $last_nonblank_type,
+ $next_sequence_number,
+ $paren_depth,
+ $rbrace_context,
+ $rbrace_package,
+ $rbrace_structural_type,
+ $rbrace_type,
+ $rcurrent_depth,
+ $rcurrent_sequence_number,
+ $rdepth_array,
+ $ris_block_function,
+ $ris_block_list_function,
+ $ris_constant,
+ $ris_user_function,
+ $rnested_statement_type,
+ $rnested_ternary_flag,
+ $rparen_semicolon_count,
+ $rparen_vars,
+ $rparen_type,
+ $rsaw_function_definition,
+ $rsaw_use_module,
+ $rsquare_bracket_structural_type,
+ $rsquare_bracket_type,
+ $rstarting_line_of_current_depth,
+ $rtotal_depth,
+ $ruser_function_prototype,
+ $square_bracket_depth,
+ $statement_type,
+ $total_depth,
+
+ ) = @{$rGV1};
+
(
$block_type, $container_type, $expecting,
$i, $i_tok, $input_line,
) = @{$rTV1};
(
- $routput_token_list, $routput_token_type,
- $routput_block_type, $routput_container_type,
- $routput_type_sequence, $routput_indent_flag,
+ $routput_token_list, $routput_token_type,
+ $routput_block_type, $routput_type_sequence,
+ $routput_indent_flag,
) = @{$rTV2};
(
( $id_scan_state, $identifier, $want_paren ) = @{$rTV4};
(
- $nesting_token_string, $nesting_type_string,
- $nesting_block_string, $nesting_block_flag,
- $nesting_list_string, $nesting_list_flag,
- $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
- $in_statement_continuation, $level_in_tokenizer,
- $slevel_in_tokenizer, $rslevel_stack,
+ $nesting_token_string, $nesting_block_string,
+ $nesting_block_flag, $level_in_tokenizer,
) = @{$rTV5};
(
- $last_nonblank_container_type,
- $last_nonblank_type_sequence,
- $last_last_nonblank_token,
- $last_last_nonblank_type,
- $last_last_nonblank_block_type,
- $last_last_nonblank_container_type,
- $last_last_nonblank_type_sequence,
+ $last_nonblank_container_type, $last_nonblank_type_sequence,
+ $last_last_nonblank_token, $last_last_nonblank_type,
$last_nonblank_prototype,
) = @{$rTV6};
return;
sub split_pretoken {
- my ($numc) = @_;
+ my ( $self, $numc ) = @_;
- # Split the leading $numc characters from the current token (at index=$i)
- # which is pre-type 'w' and insert the remainder back into the pretoken
- # stream with appropriate settings. Since we are splitting a pre-type 'w',
- # there are three cases, depending on if the remainder starts with a digit:
- # Case 1: remainder is type 'd', all digits
- # Case 2: remainder is type 'd' and type 'w': digits and other characters
- # Case 3: remainder is type 'w'
+ # This provides a way to work around the limitations of the
+ # pre-tokenization scheme upon which perltidy is based. It is rarely
+ # needed.
+
+ # Split the leading $numc characters from the current token (at
+ # index=$i) which is pre-type 'w' and insert the remainder back into
+ # the pretoken stream with appropriate settings. Since we are
+ # splitting a pre-type 'w', there are three cases, depending on if the
+ # remainder starts with a digit:
+ # Case 1: remainder is type 'd', all digits
+ # Case 2: remainder is type 'd' and type 'w': digits & other characters
+ # Case 3: remainder is type 'w'
# Examples, for $numc=1:
# $tok => $tok_0 $tok_1 $tok_2
my $len_1 = length($tok_1);
my $len_2 = length($tok_2);
- my $pre_type_0 = 'w';
+ ##my $pre_type_0 = 'w';
my $pre_type_1 = 'd';
my $pre_type_2 = 'w';
$rtokens->[$i] = $tok_0;
return 1;
}
- else {
- # Shouldn't get here
- if (DEVEL_MODE) {
- Fault(<<EOM);
+ # Shouldn't get here - bad call parameters
+ if (DEVEL_MODE) {
+ $self->Fault(<<EOM);
While working near line number $input_line_number, bad arg '$tok' passed to sub split_pretoken()
EOM
- }
}
return;
} ## end sub split_pretoken
}
sub reset_indentation_level {
- $level_in_tokenizer = $slevel_in_tokenizer = shift;
- push @{$rslevel_stack}, $slevel_in_tokenizer;
+ $level_in_tokenizer = shift;
return;
}
sub peeked_ahead {
- my $flag = shift;
+ ( ( my $flag ) ) = @_;
+
+ # get or set the closure flag '$peeked_ahead':
+ # - set $peeked_ahead to $flag if given, then
+ # - return current value
$peeked_ahead = defined($flag) ? $flag : $peeked_ahead;
return $peeked_ahead;
- }
+ } ## end sub peeked_ahead
# ------------------------------------------------------------
# end of tokenizer variable access and manipulation routines
my %is_zero_continuation_block_type;
my @q;
@q = qw( } { BEGIN END CHECK INIT AUTOLOAD DESTROY UNITCHECK continue ;
- if elsif else unless while until for foreach switch case given when);
+ if elsif else unless while until for foreach switch case given when );
@is_zero_continuation_block_type{@q} = (1) x scalar(@q);
my %is_logical_container;
- @q = qw(if elsif unless while and or err not && ! || for foreach);
+ @q = qw( if elsif unless while and or err not && ! || for foreach );
@is_logical_container{@q} = (1) x scalar(@q);
my %is_binary_type;
- @q = qw(|| &&);
+ @q = qw( || && );
@is_binary_type{@q} = (1) x scalar(@q);
my %is_binary_keyword;
- @q = qw(and or err eq ne cmp);
+ @q = qw( and or err eq ne cmp );
@is_binary_keyword{@q} = (1) x scalar(@q);
# 'L' is token for opening { at hash key
@q = qw< L { ( [ >;
@is_opening_type{@q} = (1) x scalar(@q);
+ my %is_opening_or_ternary_type;
+ push @q, '?';
+ @is_opening_or_ternary_type{@q} = (1) x scalar(@q);
+
# 'R' is token for closing } at hash key
my %is_closing_type;
@q = qw< R } ) ] >;
@is_closing_type{@q} = (1) x scalar(@q);
+ my %is_closing_or_ternary_type;
+ push @q, ':';
+ @is_closing_or_ternary_type{@q} = (1) x scalar(@q);
+
my %is_redo_last_next_goto;
- @q = qw(redo last next goto);
+ @q = qw( redo last next goto );
@is_redo_last_next_goto{@q} = (1) x scalar(@q);
my %is_use_require;
- @q = qw(use require);
+ @q = qw( use require );
@is_use_require{@q} = (1) x scalar(@q);
- # This hash holds the array index in $tokenizer_self for these keywords:
+ # This hash holds the array index in $self for these keywords:
# Fix for issue c035: removed 'format' from this hash
my %is_END_DATA = (
'__END__' => _in_end_,
push @q, ',';
@is_list_end_type{@q} = (1) x scalar(@q);
- # original ref: camel 3 p 147,
- # but perl may accept undocumented flags
- # perl 5.10 adds 'p' (preserve)
- # Perl version 5.22 added 'n'
- # From http://perldoc.perl.org/perlop.html we have
- # /PATTERN/msixpodualngc or m?PATTERN?msixpodualngc
- # s/PATTERN/REPLACEMENT/msixpodualngcer
- # y/SEARCHLIST/REPLACEMENTLIST/cdsr
- # tr/SEARCHLIST/REPLACEMENTLIST/cdsr
- # qr/STRING/msixpodualn
- my %quote_modifiers = (
- 's' => '[msixpodualngcer]',
- 'y' => '[cdsr]',
- 'tr' => '[cdsr]',
- 'm' => '[msixpodualngc]',
- 'qr' => '[msixpodualn]',
- 'q' => EMPTY_STRING,
- 'qq' => EMPTY_STRING,
- 'qw' => EMPTY_STRING,
- 'qx' => EMPTY_STRING,
- );
-
# table showing how many quoted things to look for after quote operator..
# s, y, tr have 2 (pattern and replacement)
# others have 1 (pattern only)
);
my %is_for_foreach;
- @q = qw(for foreach);
+ @q = qw( for foreach );
@is_for_foreach{@q} = (1) x scalar(@q);
# These keywords may introduce blocks after parenthesized expressions,
@is_blocktype_with_paren{@q} = (1) x scalar(@q);
my %is_case_default;
- @q = qw(case default);
+ @q = qw( case default );
@is_case_default{@q} = (1) x scalar(@q);
#------------------------
# $replacement_text
# return:
# $rht = reference to any here-doc targets
- my ($replacement_text) = @_;
+ my ( $self, $replacement_text ) = @_;
# quick check
- return unless ( $replacement_text =~ /<</ );
+ return if ( $replacement_text !~ /<</ );
- write_logfile_entry("scanning replacement text for here-doc targets\n");
+ $self->write_logfile_entry(
+ "scanning replacement text for here-doc targets\n");
# save the logger object for error messages
- my $logger_object = $tokenizer_self->[_logger_object_];
-
- # localize all package variables
- local (
- $tokenizer_self, $last_nonblank_token,
- $last_nonblank_type, $last_nonblank_block_type,
- $statement_type, $in_attribute_list,
- $current_package, $context,
- %is_constant, %is_user_function,
- %user_function_prototype, %is_block_function,
- %is_block_list_function, %saw_function_definition,
- $brace_depth, $paren_depth,
- $square_bracket_depth, @current_depth,
- @total_depth, $total_depth,
- @nesting_sequence_number, @current_sequence_number,
- @paren_type, @paren_semicolon_count,
- @paren_structural_type, @brace_type,
- @brace_structural_type, @brace_context,
- @brace_package, @square_bracket_type,
- @square_bracket_structural_type, @depth_array,
- @starting_line_of_current_depth, @nested_ternary_flag,
- @nested_statement_type, $next_sequence_number,
- );
+ my $logger_object = $self->[_logger_object_];
# save all lexical variables
my $rstate = save_tokenizer_state();
_decrement_count(); # avoid error check for multiple tokenizers
# make a new tokenizer
- my $rOpts = {};
- my $source_object = Perl::Tidy::LineSource->new(
- input_file => \$replacement_text,
- rOpts => $rOpts,
- );
my $tokenizer = Perl::Tidy::Tokenizer->new(
- source_object => $source_object,
+ source_object => \$replacement_text,
logger_object => $logger_object,
starting_line_number => $input_line_number,
);
# scan the replacement text
- 1 while ( $tokenizer->get_line() );
+ while ( $tokenizer->get_line() ) { }
# remove any here doc targets
my $rht = undef;
- if ( $tokenizer_self->[_in_here_doc_] ) {
+ if ( $tokenizer->[_in_here_doc_] ) {
$rht = [];
push @{$rht},
[
- $tokenizer_self->[_here_doc_target_],
- $tokenizer_self->[_here_quote_character_]
+ $tokenizer->[_here_doc_target_],
+ $tokenizer->[_here_quote_character_]
];
- if ( $tokenizer_self->[_rhere_target_list_] ) {
- push @{$rht}, @{ $tokenizer_self->[_rhere_target_list_] };
- $tokenizer_self->[_rhere_target_list_] = undef;
+ if ( $tokenizer->[_rhere_target_list_] ) {
+ push @{$rht}, @{ $tokenizer->[_rhere_target_list_] };
+ $tokenizer->[_rhere_target_list_] = undef;
}
- $tokenizer_self->[_in_here_doc_] = undef;
+ $tokenizer->[_in_here_doc_] = undef;
}
# now its safe to report errors
- my $severe_error = $tokenizer->report_tokenization_errors();
+ my $severe_error_uu = $tokenizer->report_tokenization_errors();
# TODO: Could propagate a severe error up
} ## end sub scan_replacement_text
sub scan_bare_identifier {
- ( $i, $tok, $type, $prototype ) =
- scan_bare_identifier_do( $input_line, $i, $tok, $type, $prototype,
- $rtoken_map, $max_token_index );
+ my $self = shift;
+ ( $i, $tok, $type, $prototype ) = $self->scan_bare_identifier_do(
+
+ $input_line,
+ $i,
+ $tok,
+ $type,
+ $prototype,
+ $rtoken_map,
+ $max_token_index
+ );
return;
} ## end sub scan_bare_identifier
sub scan_identifier {
+
+ my $self = shift;
+
(
- $i, $tok, $type, $id_scan_state, $identifier,
+
+ $i,
+ $tok,
+ $type,
+ $id_scan_state,
+ $identifier,
my $split_pretoken_flag
- )
- = scan_complex_identifier( $i, $id_scan_state, $identifier, $rtokens,
- $max_token_index, $expecting, $paren_type[$paren_depth] );
+
+ ) = $self->scan_complex_identifier(
+
+ $i,
+ $id_scan_state,
+ $identifier,
+ $rtokens,
+ $max_token_index,
+ $expecting,
+ $rparen_type->[$paren_depth]
+ );
# Check for signal to fix a special variable adjacent to a keyword,
# such as '$^One$0'.
# Try to fix it by splitting the pretoken
if ( $i > 0
&& $rtokens->[ $i - 1 ] eq '^'
- && split_pretoken(1) )
+ && $self->split_pretoken(1) )
{
$identifier = substr( $identifier, 0, 3 );
$tok = $identifier;
# This shouldn't happen ...
my $var = substr( $tok, 0, 3 );
my $excess = substr( $tok, 3 );
- interrupt_logfile();
- warning(<<EOM);
+ $self->interrupt_logfile();
+ $self->warning(<<EOM);
$input_line_number: Trouble parsing at characters '$excess' after special variable '$var'.
-A space may be needed after '$var'.
+A space may be needed after '$var'.
EOM
- resume_logfile();
+ $self->resume_logfile();
}
}
return;
sub scan_simple_identifier {
+ my $self = shift;
+
# This is a wrapper for sub scan_identifier. It does a fast preliminary
# scan for certain common identifiers:
# '$var', '@var', %var, *var, &var, '@{...}', '%{...}'
$type = 't';
$fast_scan_type = $type;
}
+ else {
+ ## out of tricks
+ }
}
#---------------------------
$identifier = $tok;
$context = UNKNOWN_CONTEXT;
}
+ else {
+ ## out of tricks
+ }
#--------------------------------------
# Verify correctness during development
$tok = $tok_begin;
$i = $i_begin;
- scan_identifier();
+ $self->scan_identifier();
if ( $tok ne $tok_simple
|| $type ne $fast_scan_type
|| $id_scan_state
|| $context ne $context_simple )
{
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
scan_simple_identifier differs from scan_identifier:
simple: i=$i_simple, tok=$tok_simple, type=$fast_scan_type, ident=$identifier_simple, context='$context_simple
full: i=$i, tok=$tok, type=$type, ident=$identifier, context='$context state=$id_scan_state
# call full scanner if fast method did not succeed
#-------------------------------------------------
if ( !$fast_scan_type ) {
- scan_identifier();
+ $self->scan_identifier();
}
return;
} ## end sub scan_simple_identifier
sub method_ok_here {
+ my $self = shift;
+
# Return:
# false if this is definitely an invalid method declaration
# true otherwise (even if not sure)
# TEST 1: look a valid sub NAME
if (
- $input_line =~ m/\G\s*
+ $input_line =~ m{\G\s*
((?:\w*(?:'|::))*) # package - something that ends in :: or '
(\w+) # NAME - required
- /gcx
+ }gcx
)
{
# For possible future use..
- my $subname = $2;
- my $package = $1 ? $1 : EMPTY_STRING;
+ ##my $subname = $2;
+ ##my $package = $1 ? $1 : EMPTY_STRING;
}
else {
return;
my $next_char = EMPTY_STRING;
if ( $input_line =~ m/\s*(\S)/gcx ) { $next_char = $1 }
if ( !$next_char || $next_char eq '#' ) {
- ( $next_char, my $i_next ) =
- find_next_nonblank_token( $max_token_index,
+ ( $next_char, my $i_next_uu ) =
+ $self->find_next_nonblank_token( $max_token_index,
$rtokens, $max_token_index );
}
sub class_ok_here {
+ my $self = shift;
+
# Return:
# false if this is definitely an invalid class declaration
# true otherwise (even if not sure)
#
# class ExtendsBasicAttributes is BasicAttributes{
- # TEST 1: class stmt can only go where a new statment can start
+ # TEST 1: class stmt can only go where a new statement can start
if ( !new_statement_ok() ) { return }
my $i_beg = $i + 1;
# TEST 2: look for a valid NAME
if (
- $input_line =~ m/\G\s*
+ $input_line =~ m{\G\s*
((?:\w*(?:'|::))*) # package - something that ends in :: or '
(\w+) # NAME - required
- /gcx
+ }gcx
)
{
# For possible future use..
- my $subname = $2;
- my $package = $1 ? $1 : EMPTY_STRING;
+ ##my $subname = $2;
+ ##my $package = $1 ? $1 : EMPTY_STRING;
}
else {
return;
my $next_char = EMPTY_STRING;
if ( $input_line =~ m/\s*(\S)/gcx ) { $next_char = $1 }
if ( !$next_char || $next_char eq '#' ) {
- ( $next_char, my $i_next ) =
- find_next_nonblank_token( $max_token_index,
+ ( $next_char, my $i_next_uu ) =
+ $self->find_next_nonblank_token( $max_token_index,
$rtokens, $max_token_index );
}
if ( !$next_char ) {
} ## end sub class_ok_here
sub scan_id {
- ( $i, $tok, $type, $id_scan_state ) =
- scan_id_do( $input_line, $i, $tok, $rtokens, $rtoken_map,
- $id_scan_state, $max_token_index );
+ my $self = shift;
+ ( $i, $tok, $type, $id_scan_state ) = $self->scan_id_do(
+
+ $input_line,
+ $i, $tok,
+ $rtokens,
+ $rtoken_map,
+ $id_scan_state,
+ $max_token_index
+ );
return;
} ## end sub scan_id
sub scan_number {
+ my $self = shift;
my $number;
( $i, $type, $number ) =
- scan_number_do( $input_line, $i, $rtoken_map, $type,
+ $self->scan_number_do( $input_line, $i, $rtoken_map, $type,
$max_token_index );
return $number;
} ## end sub scan_number
sub scan_number_fast {
+ my $self = shift;
+
# This is a wrapper for sub scan_number. It does a fast preliminary
# scan for a simple integer. It calls the original scan_number if it
# does not find one.
$tok = $tok_begin;
$i = $i_begin;
- $number = scan_number();
+ $number = $self->scan_number();
if ( $type ne $type_simple
|| ( $i != $i_simple && $i <= $max_token_index )
|| $number ne $number_simple )
{
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
scan_number_fast differs from scan_number:
simple: i=$i_simple, type=$type_simple, number=$number_simple
full: i=$i, type=$type, number=$number
# call full scanner if may not be integer
#----------------------------------------
if ( !defined($number) ) {
- $number = scan_number();
+ $number = $self->scan_number();
}
return $number;
} ## end sub scan_number_fast
# a sub to warn if token found where term expected
sub error_if_expecting_TERM {
+ my $self = shift;
if ( $expecting == TERM ) {
if ( $really_want_term{$last_nonblank_type} ) {
- report_unexpected( $tok, "term", $i_tok, $last_nonblank_i,
- $rtoken_map, $rtoken_type, $input_line );
+ $self->report_unexpected(
+ {
+ found => $tok,
+ expecting => "term",
+ i_tok => $i_tok,
+ last_nonblank_i => $last_nonblank_i,
+ rpretoken_map => $rtoken_map,
+ rpretoken_type => $rtoken_type,
+ input_line => $input_line,
+ }
+ );
return 1;
}
}
# a sub to warn if token found where operator expected
sub error_if_expecting_OPERATOR {
- my $thing = shift;
+
+ my ( $self, ($thing) ) = @_;
+
+ # Issue warning on error if expecting operator
+ # Given:
+ # $thing = the unexpected token or issue
+ # = undef to use current pre-token
+
if ( $expecting == OPERATOR ) {
if ( !defined($thing) ) { $thing = $tok }
- report_unexpected( $thing, "operator", $i_tok, $last_nonblank_i,
- $rtoken_map, $rtoken_type, $input_line );
+ $self->report_unexpected(
+ {
+ found => $thing,
+ expecting => "operator",
+ i_tok => $i_tok,
+ last_nonblank_i => $last_nonblank_i,
+ rpretoken_map => $rtoken_map,
+ rpretoken_type => $rtoken_type,
+ input_line => $input_line,
+ }
+ );
if ( $i_tok == 0 ) {
- interrupt_logfile();
- warning("Missing ';' or ',' above?\n");
- resume_logfile();
+ $self->interrupt_logfile();
+ $self->warning("Missing ';' or ',' above?\n");
+ $self->resume_logfile();
}
return 1;
}
#------------------
sub do_GREATER_THAN_SIGN {
+ my $self = shift;
+
# '>'
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM );
return;
} ## end sub do_GREATER_THAN_SIGN
sub do_VERTICAL_LINE {
+ my $self = shift;
+
# '|'
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM );
return;
} ## end sub do_VERTICAL_LINE
+ # An identifier in possible indirect object location followed by any of
+ # these tokens: -> , ; } (plus others) is not an indirect object. Fix c257.
+ my %Z_test_hash;
+
+ BEGIN {
+ my @qZ = qw#
+ -> ; } ) ]
+ => =~ = == !~ || >= != *= .. && |= .= -= += <= %=
+ ^= &&= ||= //= <=>
+ #;
+ push @qZ, ',';
+ @Z_test_hash{@qZ} = (1) x scalar(@qZ);
+ }
+
sub do_DOLLAR_SIGN {
+ my $self = shift;
+
# '$'
# start looking for a scalar
- error_if_expecting_OPERATOR("Scalar")
+ $self->error_if_expecting_OPERATOR("Scalar")
if ( $expecting == OPERATOR );
- scan_simple_identifier();
+ $self->scan_simple_identifier();
if ( $identifier eq '$^W' ) {
- $tokenizer_self->[_saw_perl_dash_w_] = 1;
+ $self->[_saw_perl_dash_w_] = 1;
}
# Check for identifier in indirect object slot
$is_indirect_object_taker{$last_nonblank_token}
&& $last_nonblank_type eq 'k'
|| ( ( $last_nonblank_token eq '(' )
- && $is_indirect_object_taker{ $paren_type[$paren_depth] } )
+ && $is_indirect_object_taker{ $rparen_type->[$paren_depth] } )
|| ( $last_nonblank_type eq 'w'
|| $last_nonblank_type eq 'U' ) # possible object
)
{
# An identifier followed by '->' is not indirect object;
- # fixes b1175, b1176
- my ( $next_nonblank_type, $i_next ) =
- find_next_noncomment_type( $i, $rtokens, $max_token_index );
- $type = 'Z' if ( $next_nonblank_type ne '->' );
+ # fixes b1175, b1176. Fix c257: Likewise for other tokens like
+ # comma, semicolon, closing brace, and single space.
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_noncomment_token( $i, $rtokens,
+ $max_token_index );
+ $type = 'Z' if ( !$Z_test_hash{$next_nonblank_token} );
}
return;
} ## end sub do_DOLLAR_SIGN
sub do_LEFT_PARENTHESIS {
+ my $self = shift;
+
# '('
++$paren_depth;
- $paren_semicolon_count[$paren_depth] = 0;
+
+ # variable to enable check for brace after closing paren (c230)
+ my $want_brace = EMPTY_STRING;
+
if ($want_paren) {
$container_type = $want_paren;
+ $want_brace = $want_paren;
$want_paren = EMPTY_STRING;
}
- elsif ( $statement_type =~ /^sub\b/ ) {
+ elsif ( substr( $statement_type, 0, 3 ) eq 'sub'
+ && $statement_type =~ /^sub\b/ )
+ {
$container_type = $statement_type;
}
else {
# NOTE: at present, braces in something like &{ xxx }
# are not marked as a block, we might have a method call.
# Added ')' to fix case c017, something like ()()()
- && $last_nonblank_token !~ /^([\]\}\)\&]|\-\>)/
+ && $last_nonblank_token !~ /^(?:[\]\}\)\&]|\-\>)/
)
{
# ref: camel 3 p 703.
if ( $last_last_nonblank_token eq 'do' ) {
- complain(
+ $self->complain(
"do SUBROUTINE is deprecated; consider & or -> notation\n"
);
}
# if this is an empty list, (), then it is not an
# error; for example, we might have a constant pi and
# invoke it with pi() or just pi;
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens,
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_nonblank_token( $i, $rtokens,
$max_token_index );
# Patch for c029: give up error check if
{
my $hint;
- error_if_expecting_OPERATOR('(');
+ $self->error_if_expecting_OPERATOR('(');
if ( $last_nonblank_type eq 'C' ) {
$hint =
"Do you mean '$last_nonblank_token->(' ?\n";
}
}
+ else {
+ ## no hint
+ }
if ($hint) {
- interrupt_logfile();
- warning($hint);
- resume_logfile();
+ $self->interrupt_logfile();
+ $self->warning($hint);
+ $self->resume_logfile();
}
} ## end if ( $next_nonblank_token...
} ## end else [ if ( $last_last_nonblank_token...
} ## end if ( $expecting == OPERATOR...
}
- # Do not update container type at ') ('; fix for git #105. This will
- # propagate the container type onward so that any subsequent brace gets
- # correctly marked. I have implemented this as a general rule, which
- # should be safe, but if necessary it could be restricted to certain
- # container statement types such as 'for'.
- $paren_type[$paren_depth] = $container_type
- if ( $last_nonblank_token ne ')' );
-
( $type_sequence, $indent_flag ) =
- increase_nesting_depth( PAREN, $rtoken_map->[$i_tok] );
+ $self->increase_nesting_depth( PAREN, $rtoken_map->[$i_tok] );
# propagate types down through nested parens
# for example: the second paren in 'if ((' would be structural
}
if ( $last_nonblank_type eq ')' ) {
- warning(
+ $self->warning(
"Syntax error? found token '$last_nonblank_type' then '('\n");
}
- $paren_structural_type[$paren_depth] = $type;
+
+ # git #105: Copy container type and want-brace flag at ') (';
+ # propagate the container type onward so that any subsequent brace gets
+ # correctly marked. I have implemented this as a general rule, which
+ # should be safe, but if necessary it could be restricted to certain
+ # container statement types such as 'for'.
+ if ( $last_nonblank_token eq ')' ) {
+ my $rvars = $rparen_vars->[$paren_depth];
+ if ( defined($rvars) ) {
+ $container_type = $rparen_type->[$paren_depth];
+ ( my $type_lp_uu, $want_brace ) = @{$rvars};
+ }
+ }
+
+ $rparen_type->[$paren_depth] = $container_type;
+ $rparen_vars->[$paren_depth] = [ $type, $want_brace ];
+ $rparen_semicolon_count->[$paren_depth] = 0;
+
return;
} ## end sub do_LEFT_PARENTHESIS
sub do_RIGHT_PARENTHESIS {
+ my $self = shift;
+
# ')'
( $type_sequence, $indent_flag ) =
- decrease_nesting_depth( PAREN, $rtoken_map->[$i_tok] );
+ $self->decrease_nesting_depth( PAREN, $rtoken_map->[$i_tok] );
- if ( $paren_structural_type[$paren_depth] eq '{' ) {
- $type = '}';
+ my $rvars = $rparen_vars->[$paren_depth];
+ if ( defined($rvars) ) {
+ my ( $type_lp, $want_brace_uu ) = @{$rvars};
+ if ( $type_lp && $type_lp eq '{' ) {
+ $type = '}';
+ }
}
- $container_type = $paren_type[$paren_depth];
+ $container_type = $rparen_type->[$paren_depth];
# restore statement type as 'sub' at closing paren of a signature
# so that a subsequent ':' is identified as an attribute
- if ( $container_type =~ /^sub\b/ ) {
+ if ( substr( $container_type, 0, 3 ) eq 'sub'
+ && $container_type =~ /^sub\b/ )
+ {
$statement_type = $container_type;
}
- # /^(for|foreach)$/
- if ( $is_for_foreach{ $paren_type[$paren_depth] } ) {
- my $num_sc = $paren_semicolon_count[$paren_depth];
+ if ( $is_for_foreach{ $rparen_type->[$paren_depth] } ) {
+ my $num_sc = $rparen_semicolon_count->[$paren_depth];
if ( $num_sc > 0 && $num_sc != 2 ) {
- warning("Expected 2 ';' in 'for(;;)' but saw $num_sc\n");
+ $self->warning("Expected 2 ';' in 'for(;;)' but saw $num_sc\n");
}
}
sub do_COMMA {
+ my $self = shift;
+
# ','
if ( $last_nonblank_type eq ',' ) {
- complain("Repeated ','s \n");
+ $self->complain("Repeated ','s \n");
}
# Note that we have to check both token and type here because a
# comma following a qw list can have last token='(' but type = 'q'
elsif ( $last_nonblank_token eq '(' && $last_nonblank_type eq '{' ) {
- warning("Unexpected leading ',' after a '('\n");
+ $self->warning("Unexpected leading ',' after a '('\n");
+ }
+ else {
+ # no complaints about the comma
}
# patch for operator_expected: note if we are in the list (use.t)
sub do_SEMICOLON {
+ my $self = shift;
+
# ';'
$context = UNKNOWN_CONTEXT;
$statement_type = EMPTY_STRING;
$want_paren = EMPTY_STRING;
- # /^(for|foreach)$/
- if ( $is_for_foreach{ $paren_type[$paren_depth] } )
+ if ( $is_for_foreach{ $rparen_type->[$paren_depth] } )
{ # mark ; in for loop
# Be careful: we do not want a semicolon such as the
#
# for (sort {strcoll($a,$b);} keys %investments) {
- if ( $brace_depth == $depth_array[PAREN][BRACE][$paren_depth]
+ if ( $brace_depth == $rdepth_array->[PAREN][BRACE][$paren_depth]
&& $square_bracket_depth ==
- $depth_array[PAREN][SQUARE_BRACKET][$paren_depth] )
+ $rdepth_array->[PAREN][SQUARE_BRACKET][$paren_depth] )
{
$type = 'f';
- $paren_semicolon_count[$paren_depth]++;
+ $rparen_semicolon_count->[$paren_depth]++;
}
}
return;
sub do_QUOTATION_MARK {
+ my $self = shift;
+
# '"'
- error_if_expecting_OPERATOR("String")
+ $self->error_if_expecting_OPERATOR("String")
if ( $expecting == OPERATOR );
$in_quote = 1;
$type = 'Q';
sub do_APOSTROPHE {
+ my $self = shift;
+
# "'"
- error_if_expecting_OPERATOR("String")
+ $self->error_if_expecting_OPERATOR("String")
if ( $expecting == OPERATOR );
$in_quote = 1;
$type = 'Q';
sub do_BACKTICK {
+ my $self = shift;
+
# '`'
- error_if_expecting_OPERATOR("String")
+ $self->error_if_expecting_OPERATOR("String")
if ( $expecting == OPERATOR );
$in_quote = 1;
$type = 'Q';
sub do_SLASH {
+ my $self = shift;
+
# '/'
my $is_pattern;
elsif ( $expecting == UNKNOWN ) { # indeterminate, must guess..
my $msg;
( $is_pattern, $msg ) =
- guess_if_pattern_or_division( $i, $rtokens, $rtoken_map,
- $max_token_index );
+ $self->guess_if_pattern_or_division( $i, $rtokens, $rtoken_type,
+ $rtoken_map, $max_token_index );
if ($msg) {
- write_diagnostics("DIVIDE:$msg\n");
- write_logfile_entry($msg);
+ $self->write_diagnostics("DIVIDE:$msg\n");
+ $self->write_logfile_entry($msg);
}
}
else { $is_pattern = ( $expecting == TERM ) }
if ($is_pattern) {
$in_quote = 1;
$type = 'Q';
- $allowed_quote_modifiers = '[msixpodualngc]';
+ $allowed_quote_modifiers = $quote_modifiers{'m'};
}
else { # not a pattern; check for a /= token
$type = $tok;
}
- #DEBUG - collecting info on what tokens follow a divide
- # for development of guessing algorithm
- #if ( is_possible_numerator( $i, $rtokens, $max_token_index ) < 0 ) {
- # #write_diagnostics( "DIVIDE? $input_line\n" );
- #}
+ #DEBUG - collecting info on what tokens follow a divide
+ # for development of guessing algorithm
+ ## if (
+ ## $self->is_possible_numerator( $i, $rtokens,
+ ## $max_token_index ) < 0
+ ## )
+ ## {
+ ## $self->write_diagnostics("DIVIDE? $input_line\n");
+ ## }
}
return;
} ## end sub do_SLASH
sub do_LEFT_CURLY_BRACKET {
+ my $self = shift;
+
# '{'
# if we just saw a ')', we will label this block with
# its type. We need to do this to allow sub
$container_type = EMPTY_STRING;
# ATTRS: for a '{' following an attribute list, reset
- # things to look like we just saw the sub name
+ # things to look like we just saw a sub name
# Added 'package' (can be 'class') for --use-feature=class (rt145706)
- if ( $statement_type =~ /^(sub|package)\b/ ) {
+ if ( substr( $statement_type, 0, 3 ) eq 'sub' ) {
+ $last_nonblank_token = $statement_type;
+ $last_nonblank_type = 'S'; # c250 change
+ $statement_type = EMPTY_STRING;
+ }
+ elsif ( substr( $statement_type, 0, 7 ) eq 'package' ) {
$last_nonblank_token = $statement_type;
- $last_nonblank_type = 'i';
+ $last_nonblank_type = 'P'; # c250 change
$statement_type = EMPTY_STRING;
}
}
elsif ( $last_nonblank_token eq ')' ) {
- $last_nonblank_token = $paren_type[ $paren_depth + 1 ];
+ $last_nonblank_token = $rparen_type->[ $paren_depth + 1 ];
# defensive move in case of a nesting error (pbug.t)
# in which this ')' had no previous '('
$last_nonblank_token = 'if';
}
- # check for syntax error here;
- unless ( $is_blocktype_with_paren{$last_nonblank_token} ) {
- if ( $tokenizer_self->[_extended_syntax_] ) {
+ # Syntax check at '){'
+ if ( $is_blocktype_with_paren{$last_nonblank_token} ) {
+
+ my $rvars = $rparen_vars->[ $paren_depth + 1 ];
+ if ( defined($rvars) ) {
+ my ( $type_lp_uu, $want_brace ) = @{$rvars};
+
+ # OLD: Now verify that this is not a trailing form
+ # FIX for git #124: we have to skip this check because
+ # the 'gather' keyword of List::Gather can operate on
+ # a full statement, so it isn't possible to be sure
+ # this is a trailing form.
+ if ( 0 && !$want_brace ) {
+ $self->warning(
+"syntax error at ') {', unexpected '{' after closing ')' of a trailing '$last_nonblank_token'\n"
+ );
+ }
+ }
+ }
+ else {
+ if ($rOpts_extended_syntax) {
# we append a trailing () to mark this as an unknown
# block type. This allows perltidy to format some
else {
my $list =
join( SPACE, sort keys %is_blocktype_with_paren );
- warning(
+ $self->warning(
"syntax error at ') {', didn't see one of: <<$list>>; If this code is okay try using the -xs flag\n"
);
}
{
$last_nonblank_token = $want_paren;
if ( $last_last_nonblank_token eq $want_paren ) {
- warning(
+ $self->warning(
"syntax error at '$want_paren .. {' -- missing \$ loop variable\n"
);
}
$want_paren = EMPTY_STRING;
}
+ else {
+ # not special
+ }
# now identify which of the three possible types of
# curly braces we have: hash index container, anonymous
# which will be blank for an anonymous hash
else {
- $block_type = code_block_type( $i_tok, $rtokens, $rtoken_type,
+ $block_type =
+ $self->code_block_type( $i_tok, $rtokens, $rtoken_type,
$max_token_index );
# patch to promote bareword type to function taking block
}
}
- $brace_type[ ++$brace_depth ] = $block_type;
+ $rbrace_type->[ ++$brace_depth ] = $block_type;
# Patch for CLASS BLOCK definitions: do not update the package for the
# current depth if this is a BLOCK type definition.
# TODO: should make 'class' separate from 'package' and only do
# this for 'class'
- $brace_package[$brace_depth] = $current_package
+ $rbrace_package->[$brace_depth] = $current_package
if ( substr( $block_type, 0, 8 ) ne 'package ' );
- $brace_structural_type[$brace_depth] = $type;
- $brace_context[$brace_depth] = $context;
+ $rbrace_structural_type->[$brace_depth] = $type;
+ $rbrace_context->[$brace_depth] = $context;
( $type_sequence, $indent_flag ) =
- increase_nesting_depth( BRACE, $rtoken_map->[$i_tok] );
+ $self->increase_nesting_depth( BRACE, $rtoken_map->[$i_tok] );
+
return;
} ## end sub do_LEFT_CURLY_BRACKET
sub do_RIGHT_CURLY_BRACKET {
+ my $self = shift;
+
# '}'
- $block_type = $brace_type[$brace_depth];
+ $block_type = $rbrace_type->[$brace_depth];
if ($block_type) { $statement_type = EMPTY_STRING }
- if ( defined( $brace_package[$brace_depth] ) ) {
- $current_package = $brace_package[$brace_depth];
+ if ( defined( $rbrace_package->[$brace_depth] ) ) {
+ $current_package = $rbrace_package->[$brace_depth];
}
# can happen on brace error (caught elsewhere)
else {
}
( $type_sequence, $indent_flag ) =
- decrease_nesting_depth( BRACE, $rtoken_map->[$i_tok] );
+ $self->decrease_nesting_depth( BRACE, $rtoken_map->[$i_tok] );
- if ( $brace_structural_type[$brace_depth] eq 'L' ) {
+ if ( $rbrace_structural_type->[$brace_depth] eq 'L' ) {
$type = 'R';
}
$tok = $block_type;
}
- $context = $brace_context[$brace_depth];
+ # pop non-indenting brace stack if sequence number matches
+ if ( @{ $self->[_rnon_indenting_brace_stack_] }
+ && $self->[_rnon_indenting_brace_stack_]->[-1] eq $type_sequence )
+ {
+ pop @{ $self->[_rnon_indenting_brace_stack_] };
+ }
+
+ $context = $rbrace_context->[$brace_depth];
if ( $brace_depth > 0 ) { $brace_depth--; }
return;
} ## end sub do_RIGHT_CURLY_BRACKET
sub do_AMPERSAND {
+ my $self = shift;
+
# '&' = maybe sub call? start looking
# We have to check for sub call unless we are sure we
# are expecting an operator. This example from s2p
# For example we probably don't want & as sub call here:
# Fcntl::S_IRUSR & $mode;
if ( $expecting == TERM || $next_type ne 'b' ) {
- scan_simple_identifier();
+ $self->scan_simple_identifier();
}
}
else {
sub do_LESS_THAN_SIGN {
+ my $self = shift;
+
# '<' - angle operator or less than?
if ( $expecting != OPERATOR ) {
( $i, $type ) =
- find_angle_operator_termination( $input_line, $i, $rtoken_map,
- $expecting, $max_token_index );
-
- ## This message is not very helpful and quite confusing if the above
- ## routine decided not to write a message with the line number.
- ## if ( $type eq '<' && $expecting == TERM ) {
- ## error_if_expecting_TERM();
- ## interrupt_logfile();
- ## warning("Unterminated <> operator?\n");
- ## resume_logfile();
- ## }
-
+ $self->find_angle_operator_termination( $input_line, $i,
+ $rtoken_map, $expecting, $max_token_index );
}
else {
}
sub do_QUESTION_MARK {
+ my $self = shift;
+
# '?' = conditional or starting pattern?
my $is_pattern;
# /(.*)/ && (print $1,"\n");
my $msg;
( $is_pattern, $msg ) =
- guess_if_pattern_or_conditional( $i, $rtokens, $rtoken_map,
- $max_token_index );
+ $self->guess_if_pattern_or_conditional( $i, $rtokens,
+ $rtoken_type, $rtoken_map, $max_token_index );
- if ($msg) { write_logfile_entry($msg) }
+ if ($msg) { $self->write_logfile_entry($msg) }
}
else { $is_pattern = ( $expecting == TERM ) }
if ($is_pattern) {
$in_quote = 1;
$type = 'Q';
- $allowed_quote_modifiers = '[msixpodualngc]';
+ $allowed_quote_modifiers = $quote_modifiers{'m'};
}
else {
( $type_sequence, $indent_flag ) =
- increase_nesting_depth( QUESTION_COLON, $rtoken_map->[$i_tok] );
+ $self->increase_nesting_depth( QUESTION_COLON,
+ $rtoken_map->[$i_tok] );
}
return;
} ## end sub do_QUESTION_MARK
sub do_STAR {
+ my $self = shift;
+
# '*' = typeglob, or multiply?
- if ( $expecting == UNKNOWN && $last_nonblank_type eq 'Z' ) {
- if ( $next_type ne 'b'
+ if ( $expecting == UNKNOWN ) {
+ if ( $last_nonblank_type eq 'Z'
+ && $next_type ne 'b'
&& $next_type ne '('
&& $next_type ne '#' ) # Fix c036
{
$expecting = TERM;
}
}
+
if ( $expecting == TERM ) {
- scan_simple_identifier();
+ $self->scan_simple_identifier();
}
else {
$i++;
}
}
+ else {
+ ## not multiple characters
+ }
}
return;
} ## end sub do_STAR
sub do_DOT {
+ my $self = shift;
+
# '.' = what kind of . ?
if ( $expecting != OPERATOR ) {
- scan_number();
+ $self->scan_number();
if ( $type eq '.' ) {
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM );
}
}
sub do_COLON {
+ my $self = shift;
+
# ':' = label, ternary, attribute, ?
# if this is the first nonblank character, call it a label
# either after a 'sub' keyword or within a paren list
# Added 'package' (can be 'class') for --use-feature=class (rt145706)
elsif ( $statement_type =~ /^(sub|package)\b/ ) {
- $type = 'A';
- $in_attribute_list = 1;
+ $type = 'A';
+ $self->[_in_attribute_list_] = 1;
}
# Within a signature, unless we are in a ternary. For example,
# from 't/filter_example.t':
# method foo4 ( $class: $bar ) { $class->bar($bar) }
- elsif ( $paren_type[$paren_depth] =~ /^sub\b/
+ elsif ( $rparen_type->[$paren_depth] =~ /^sub\b/
&& !is_balanced_closing_container(QUESTION_COLON) )
{
- $type = 'A';
- $in_attribute_list = 1;
+ $type = 'A';
+ $self->[_in_attribute_list_] = 1;
}
# check for scalar attribute, such as
# my $foo : shared = 1;
elsif ($is_my_our_state{$statement_type}
- && $current_depth[QUESTION_COLON] == 0 )
+ && $rcurrent_depth->[QUESTION_COLON] == 0 )
{
- $type = 'A';
- $in_attribute_list = 1;
+ $type = 'A';
+ $self->[_in_attribute_list_] = 1;
}
# Look for Switch::Plain syntax if an error would otherwise occur
$type = 'J';
}
+ # mark colon as attribute if an error would occur otherwise; git #162
+ elsif ( !$rcurrent_depth->[QUESTION_COLON] ) {
+ $type = 'A';
+ $self->[_in_attribute_list_] = 1;
+ }
+
# otherwise, it should be part of a ?/: operator
else {
( $type_sequence, $indent_flag ) =
- decrease_nesting_depth( QUESTION_COLON, $rtoken_map->[$i_tok] );
+ $self->decrease_nesting_depth( QUESTION_COLON,
+ $rtoken_map->[$i_tok] );
if ( $last_nonblank_token eq '?' ) {
- warning("Syntax error near ? :\n");
+ $self->warning("Syntax error near ? :\n");
}
}
return;
sub do_PLUS_SIGN {
+ my $self = shift;
+
# '+' = what kind of plus?
if ( $expecting == TERM ) {
- my $number = scan_number_fast();
+ my $number = $self->scan_number_fast();
# unary plus is safest assumption if not a number
if ( !defined($number) ) { $type = 'p'; }
sub do_AT_SIGN {
+ my $self = shift;
+
# '@' = sigil for array?
- error_if_expecting_OPERATOR("Array")
+ $self->error_if_expecting_OPERATOR("Array")
if ( $expecting == OPERATOR );
- scan_simple_identifier();
+ $self->scan_simple_identifier();
return;
} ## end sub do_AT_SIGN
sub do_PERCENT_SIGN {
+ my $self = shift;
+
# '%' = hash or modulo?
# first guess is hash if no following blank or paren
if ( $expecting == UNKNOWN ) {
}
}
if ( $expecting == TERM ) {
- scan_simple_identifier();
+ $self->scan_simple_identifier();
}
return;
} ## end sub do_PERCENT_SIGN
sub do_LEFT_SQUARE_BRACKET {
+ my $self = shift;
+
# '['
- $square_bracket_type[ ++$square_bracket_depth ] = $last_nonblank_token;
+ $rsquare_bracket_type->[ ++$square_bracket_depth ] =
+ $last_nonblank_token;
( $type_sequence, $indent_flag ) =
- increase_nesting_depth( SQUARE_BRACKET, $rtoken_map->[$i_tok] );
+ $self->increase_nesting_depth( SQUARE_BRACKET,
+ $rtoken_map->[$i_tok] );
# It may seem odd, but structural square brackets have
# type '{' and '}'. This simplifies the indentation logic.
if ( !is_non_structural_brace() ) {
$type = '{';
}
- $square_bracket_structural_type[$square_bracket_depth] = $type;
+ $rsquare_bracket_structural_type->[$square_bracket_depth] = $type;
return;
} ## end sub do_LEFT_SQUARE_BRACKET
sub do_RIGHT_SQUARE_BRACKET {
+ my $self = shift;
+
# ']'
( $type_sequence, $indent_flag ) =
- decrease_nesting_depth( SQUARE_BRACKET, $rtoken_map->[$i_tok] );
+ $self->decrease_nesting_depth( SQUARE_BRACKET,
+ $rtoken_map->[$i_tok] );
- if ( $square_bracket_structural_type[$square_bracket_depth] eq '{' ) {
+ if ( $rsquare_bracket_structural_type->[$square_bracket_depth] eq '{' )
+ {
$type = '}';
}
# propagate type information for smartmatch operator. This is
# necessary to enable us to know if an operator or term is expected
# next.
- if ( $square_bracket_type[$square_bracket_depth] eq '~~' ) {
- $tok = $square_bracket_type[$square_bracket_depth];
+ if ( $rsquare_bracket_type->[$square_bracket_depth] eq '~~' ) {
+ $tok = $rsquare_bracket_type->[$square_bracket_depth];
}
if ( $square_bracket_depth > 0 ) { $square_bracket_depth--; }
sub do_MINUS_SIGN {
+ my $self = shift;
+
# '-' = what kind of minus?
if ( ( $expecting != OPERATOR )
&& $is_file_test_operator{$next_tok} )
{
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i + 1, $rtokens, $max_token_index );
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_nonblank_token( $i + 1, $rtokens,
+ $max_token_index );
# check for a quoted word like "-w=>xx";
# it is sufficient to just check for a following '='
}
}
elsif ( $expecting == TERM ) {
- my $number = scan_number_fast();
+ my $number = $self->scan_number_fast();
# maybe part of bareword token? unary is safest
if ( !defined($number) ) { $type = 'm'; }
elsif ( $expecting == OPERATOR ) {
}
else {
-
if ( $next_type eq 'w' ) {
$type = 'm';
}
sub do_CARAT_SIGN {
+ my $self = shift;
+
# '^'
# check for special variables like ${^WARNING_BITS}
if ( $expecting == TERM ) {
{
if ( $next_tok eq 'W' ) {
- $tokenizer_self->[_saw_perl_dash_w_] = 1;
+ $self->[_saw_perl_dash_w_] = 1;
}
$tok = $tok . $next_tok;
$i = $i + 1;
&& $last_last_nonblank_type ne 'Z'
&& $last_last_nonblank_token ne '$#' )
{
- warning("Possible syntax error near '{^'\n");
+ $self->warning("Possible syntax error near '{^'\n");
}
}
else {
- unless ( error_if_expecting_TERM() ) {
+ if ( !$self->error_if_expecting_TERM() ) {
# Something like this is valid but strange:
# undef ^I;
- complain("The '^' seems unusual here\n");
+ $self->complain("The '^' seems unusual here\n");
}
}
}
sub do_DOUBLE_COLON {
+ my $self = shift;
+
# '::' = probably a sub call
- scan_bare_identifier();
+ $self->scan_bare_identifier();
return;
} ## end sub do_DOUBLE_COLON
sub do_LEFT_SHIFT {
- # '<<' = maybe a here-doc?
-
-## This check removed because it could be a deprecated here-doc with
-## no specified target. See example in log 16 Sep 2020.
-## return
-## unless ( $i < $max_token_index )
-## ; # here-doc not possible if end of line
+ my $self = shift;
+ # '<<' = maybe a here-doc?
if ( $expecting != OPERATOR ) {
my ( $found_target, $here_doc_target, $here_quote_character,
$saw_error );
$found_target, $here_doc_target, $here_quote_character, $i,
$saw_error
)
- = find_here_doc( $expecting, $i, $rtokens, $rtoken_map,
- $max_token_index );
+ = $self->find_here_doc( $expecting, $i, $rtokens, $rtoken_type,
+ $rtoken_map, $max_token_index );
if ($found_target) {
push @{$rhere_target_list},
$type = 'h';
if ( length($here_doc_target) > 80 ) {
my $truncated = substr( $here_doc_target, 0, 80 );
- complain("Long here-target: '$truncated' ...\n");
+ $self->complain("Long here-target: '$truncated' ...\n");
}
elsif ( !$here_doc_target ) {
- warning(
+ $self->warning(
'Use of bare << to mean <<"" is deprecated' . "\n" )
- unless ($here_quote_character);
+ if ( !$here_quote_character );
}
elsif ( $here_doc_target !~ /^[A-Z_]\w+$/ ) {
- complain(
+ $self->complain(
"Unconventional here-target: '$here_doc_target'\n");
}
+ else {
+ # nothing to complain about
+ }
}
elsif ( $expecting == TERM ) {
- unless ($saw_error) {
+ if ( !$saw_error ) {
# shouldn't happen..arriving here implies an error in
# the logic in sub 'find_here_doc'
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Program bug; didn't find here doc target
EOM
}
- warning(
+ $self->warning(
"Possible program error: didn't find here doc target\n"
);
- report_definite_bug();
+ $self->report_definite_bug();
}
}
+
+ # target not found, expecting == UNKNOWN
+ else {
+ # assume it is a shift
+ }
}
else {
}
sub do_NEW_HERE_DOC {
# '<<~' = a here-doc, new type added in v26
+
+ my $self = shift;
+
return
- unless ( $i < $max_token_index )
- ; # here-doc not possible if end of line
+ if ( $i >= $max_token_index ); # here-doc not possible if end of line
if ( $expecting != OPERATOR ) {
my ( $found_target, $here_doc_target, $here_quote_character,
$saw_error );
$found_target, $here_doc_target, $here_quote_character, $i,
$saw_error
)
- = find_here_doc( $expecting, $i, $rtokens, $rtoken_map,
- $max_token_index );
+ = $self->find_here_doc( $expecting, $i, $rtokens, $rtoken_type,
+ $rtoken_map, $max_token_index );
if ($found_target) {
if ( length($here_doc_target) > 80 ) {
my $truncated = substr( $here_doc_target, 0, 80 );
- complain("Long here-target: '$truncated' ...\n");
+ $self->complain("Long here-target: '$truncated' ...\n");
}
elsif ( $here_doc_target !~ /^[A-Z_]\w+$/ ) {
- complain(
+ $self->complain(
"Unconventional here-target: '$here_doc_target'\n");
}
+ else {
+ # nothing to complain about
+ }
# Note that we put a leading space on the here quote
# character indicate that it may be preceded by spaces
[ $here_doc_target, $here_quote_character ];
$type = 'h';
}
+
+ # target not found ..
elsif ( $expecting == TERM ) {
- unless ($saw_error) {
+ if ( !$saw_error ) {
# shouldn't happen..arriving here implies an error in
# the logic in sub 'find_here_doc'
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Program bug; didn't find here doc target
EOM
}
- warning(
+ $self->warning(
"Possible program error: didn't find here doc target\n"
);
- report_definite_bug();
+ $self->report_definite_bug();
}
}
+
+ # Target not found, expecting==UNKNOWN
+ else {
+ $self->warning("didn't find here doc target after '<<~'\n");
+ }
}
else {
- error_if_expecting_OPERATOR();
+ $self->error_if_expecting_OPERATOR();
}
return;
} ## end sub do_NEW_HERE_DOC
sub do_PLUS_PLUS {
+ my $self = shift;
+
# '++'
# type = 'pp' for pre-increment, '++' for post-increment
- if ( $expecting == TERM ) { $type = 'pp' }
- elsif ( $expecting == UNKNOWN ) {
+ if ( $expecting == OPERATOR ) { $type = '++' }
+ elsif ( $expecting == TERM ) { $type = 'pp' }
+
+ # handle ( $expecting == UNKNOWN )
+ else {
+ # look ahead ..
my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
# Fix for c042: look past a side comment
if ( $next_nonblank_token eq '#' ) {
( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $max_token_index,
+ $self->find_next_nonblank_token( $max_token_index,
$rtokens, $max_token_index );
}
sub do_FAT_COMMA {
+ my $self = shift;
+
# '=>'
if ( $last_nonblank_type eq $tok ) {
- complain("Repeated '=>'s \n");
+ $self->complain("Repeated '=>'s \n");
}
# patch for operator_expected: note if we are in the list (use.t)
sub do_MINUS_MINUS {
+ my $self = shift;
+
# '--'
# type = 'mm' for pre-decrement, '--' for post-decrement
- if ( $expecting == TERM ) { $type = 'mm' }
- elsif ( $expecting == UNKNOWN ) {
+ if ( $expecting == OPERATOR ) { $type = '--' }
+ elsif ( $expecting == TERM ) { $type = 'mm' }
+
+ # handle ( $expecting == UNKNOWN )
+ else {
+
+ # look ahead ..
my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
# Fix for c042: look past a side comment
if ( $next_nonblank_token eq '#' ) {
( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $max_token_index,
+ $self->find_next_nonblank_token( $max_token_index,
$rtokens, $max_token_index );
}
if ( $next_nonblank_token eq '$' ) { $type = 'mm' }
}
+
return;
} ## end sub do_MINUS_MINUS
sub do_LOGICAL_AND {
+ my $self = shift;
+
# '&&'
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM && $last_nonblank_token ne ',' ); #c015
return;
} ## end sub do_LOGICAL_AND
sub do_LOGICAL_OR {
+ my $self = shift;
+
# '||'
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM && $last_nonblank_token ne ',' ); #c015
return;
} ## end sub do_LOGICAL_OR
sub do_SLASH_SLASH {
+ my $self = shift;
+
# '//'
- error_if_expecting_TERM()
+ $self->error_if_expecting_TERM()
if ( $expecting == TERM );
return;
} ## end sub do_SLASH_SLASH
sub do_DIGITS {
+ my $self = shift;
+
# 'd' = string of digits
- error_if_expecting_OPERATOR("Number")
+ $self->error_if_expecting_OPERATOR("Number")
if ( $expecting == OPERATOR );
- my $number = scan_number_fast();
+ my $number = $self->scan_number_fast();
if ( !defined($number) ) {
# shouldn't happen - we should always get a number
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
non-number beginning with digit--program bug
EOM
}
- warning(
+ $self->warning(
"Unexpected error condition: non-number beginning with digit\n"
);
- report_definite_bug();
+ $self->report_definite_bug();
}
return;
} ## end sub do_DIGITS
sub do_ATTRIBUTE_LIST {
- my ($next_nonblank_token) = @_;
+ my ( $self, $next_nonblank_token ) = @_;
# Called at a bareword encountered while in an attribute list
# returns 'is_attribute':
# start just after the word 'prototype'
my $i_beg = $i + 1;
- ( $i, $tok, $type, $id_scan_state ) = do_scan_sub(
+ ( $i, $tok, $type, $id_scan_state ) = $self->do_scan_sub(
{
input_line => $input_line,
i => $i,
return;
} ## end sub do_ATTRIBUTE_LIST
- sub do_QUOTED_BAREWORD {
-
- # find type of a bareword followed by a '=>'
- if ( $is_constant{$current_package}{$tok} ) {
- $type = 'C';
- }
- elsif ( $is_user_function{$current_package}{$tok} ) {
- $type = 'U';
- $prototype = $user_function_prototype{$current_package}{$tok};
- }
- elsif ( $tok =~ /^v\d+$/ ) {
- $type = 'v';
- report_v_string($tok);
- }
- else {
-
- # Bareword followed by a fat comma - see 'git18.in'
- # If tok is something like 'x17' then it could
- # actually be operator x followed by number 17.
- # For example, here:
- # 123x17 => [ 792, 1224 ],
- # (a key of 123 repeated 17 times, perhaps not
- # what was intended). We will mark x17 as type
- # 'n' and it will be split. If the previous token
- # was also a bareword then it is not very clear is
- # going on. In this case we will not be sure that
- # an operator is expected, so we just mark it as a
- # bareword. Perl is a little murky in what it does
- # with stuff like this, and its behavior can change
- # over time. Something like
- # a x18 => [792, 1224], will compile as
- # a key with 18 a's. But something like
- # push @array, a x18;
- # is a syntax error.
- if (
- $expecting == OPERATOR
- && substr( $tok, 0, 1 ) eq 'x'
- && ( length($tok) == 1
- || substr( $tok, 1, 1 ) =~ /^\d/ )
- )
- {
- $type = 'n';
- if ( split_pretoken(1) ) {
- $type = 'x';
- $tok = 'x';
- }
- }
- else {
-
- # git #18
- $type = 'w';
- error_if_expecting_OPERATOR();
- }
- }
- return;
- } ## end sub do_QUOTED_BAREWORD
-
sub do_X_OPERATOR {
+ my $self = shift;
+
if ( $tok eq 'x' ) {
if ( $rtokens->[ $i + 1 ] eq '=' ) { # x=
$tok = 'x=';
# as a number, $type = 'n', and fixed downstream by the
# Formatter.
$type = 'n';
- if ( split_pretoken(1) ) {
+ if ( $self->split_pretoken(1) ) {
$type = 'x';
$tok = 'x';
}
} ## end sub do_X_OPERATOR
sub do_USE_CONSTANT {
- scan_bare_identifier();
- my ( $next_nonblank_tok2, $i_next2 ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+
+ my $self = shift;
+
+ $self->scan_bare_identifier();
+ my ( $next_nonblank_tok2, $i_next2_uu ) =
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
if ($next_nonblank_tok2) {
# versions of perl do not complain here, but
# the coding is retained for reference.
if ( 0 && $next_nonblank_tok2 ne 'qw' ) {
- warning(
+ $self->warning(
"Attempting to define constant '$next_nonblank_tok2' which is a perl keyword\n"
);
}
}
else {
- $is_constant{$current_package}{$next_nonblank_tok2} = 1;
+ $ris_constant->{$current_package}{$next_nonblank_tok2} = 1;
}
}
return;
sub do_KEYWORD {
+ my $self = shift;
+
# found a keyword - set any associated flags
$type = 'k';
# Since for and foreach may not be followed immediately
# by an opening paren, we have to remember which keyword
# is associated with the next '('
- if ( $is_for_foreach{$tok} ) {
+ # Previously, before update c230 : if ( $is_for_foreach{$tok} ) {
+ ##(if elsif unless while until for foreach switch case given when catch)
+ if ( $is_blocktype_with_paren{$tok} ) {
if ( new_statement_ok() ) {
$want_paren = $tok;
}
}
# recognize 'use' statements, which are special
- elsif ( $is_use_require{$tok} ) {
+ if ( $is_use_require{$tok} ) {
$statement_type = $tok;
- error_if_expecting_OPERATOR()
+ $self->error_if_expecting_OPERATOR()
if ( $expecting == OPERATOR );
}
$statement_type = $tok;
}
- # Check for misplaced 'elsif' and 'else', but allow isolated
- # else or elsif blocks to be formatted. This is indicated
- # by a last noblank token of ';'
+ # Check for unexpected 'elsif'
elsif ( $tok eq 'elsif' ) {
if (
- $last_nonblank_token ne ';'
- ## !~ /^(if|elsif|unless)$/
- && !$is_if_elsif_unless{$last_nonblank_block_type}
+ !$is_if_elsif_unless{$last_nonblank_block_type}
+
+ # Allow isolated blocks of any kind during editing
+ # by checking for a last noblank token of ';' and no
+ # sequence numbers having been issued (c272). The check
+ # on sequence number is not perfect but good enough.
+ && !(
+ $last_nonblank_token eq ';'
+ && $next_sequence_number == SEQ_ROOT + 1
+ )
+
)
{
- warning(
+ $self->warning(
"expecting '$tok' to follow one of 'if|elsif|unless'\n");
}
}
+
+ # Check for unexpected 'else'
elsif ( $tok eq 'else' ) {
# patched for SWITCH/CASE
if (
- $last_nonblank_token ne ';'
- ## !~ /^(if|elsif|unless|case|when)$/
- && !$is_if_elsif_unless_case_when{$last_nonblank_block_type}
+ !$is_if_elsif_unless_case_when{$last_nonblank_block_type}
# patch to avoid an unwanted error message for
# the case of a parenless 'case' (RT 105484):
# switch ( 1 ) { case x { 2 } else { } }
- ## !~ /^(if|elsif|unless|case|when)$/
&& !$is_if_elsif_unless_case_when{$statement_type}
- )
- {
- warning(
-"expecting '$tok' to follow one of 'if|elsif|unless|case|when'\n"
+
+ # Allow isolated blocks of any kind during editing (c272)
+ && !(
+ $last_nonblank_token eq ';'
+ && $next_sequence_number == SEQ_ROOT + 1
+ )
+
+ )
+ {
+ $self->warning(
+"expecting '$tok' to follow one of 'if|elsif|unless|case|when'\n"
);
}
}
elsif ( $tok eq 'err' ) {
if ( $expecting != OPERATOR ) { $type = 'w' }
}
+ else {
+ ## no special treatment needed
+ }
return;
} ## end sub do_KEYWORD
sub do_QUOTE_OPERATOR {
+ my $self = shift;
+
if ( $expecting == OPERATOR ) {
# Be careful not to call an error for a qw quote
# See notes in 'sub code_block_type' and
# 'sub is_non_structural_brace'
- unless (
- $tok eq 'qw'
- && ( $last_nonblank_token =~ /^([\]\}\&]|\-\>)/
- || $is_for_foreach{$want_paren} )
- )
- {
- error_if_expecting_OPERATOR();
+ my $paren_list_possible = $tok eq 'qw'
+ && ( $last_nonblank_token =~ /^([\]\}\&]|\-\>)/
+ || $is_for_foreach{$want_paren} );
+
+ if ( !$paren_list_possible ) {
+ $self->error_if_expecting_OPERATOR();
}
}
$in_quote = $quote_items{$tok};
# of leading and trailing whitespace. So they are given a
# separate type, 'q', unless requested otherwise.
$type =
- ( $tok eq 'qw' && $tokenizer_self->[_trim_qw_] )
+ ( $tok eq 'qw' && $rOpts_trim_qw )
? 'q'
: 'Q';
$quote_type = $type;
sub do_UNKNOWN_BAREWORD {
- my ($next_nonblank_token) = @_;
+ my ( $self, $next_nonblank_token ) = @_;
- scan_bare_identifier();
+ $self->scan_bare_identifier();
if ( $statement_type eq 'use'
&& $last_nonblank_token eq 'use' )
{
- $saw_use_module{$current_package}->{$tok} = 1;
+ $rsaw_use_module->{$current_package}->{$tok} = 1;
}
if ( $type eq 'w' ) {
# '*' => \&sse_mul,
# '/' => \&sse_div;
# TODO: this could eventually be generalized
- if ( $saw_use_module{$current_package}->{'RPerl'}
+ if ( $rsaw_use_module->{$current_package}->{'RPerl'}
&& $tok =~ /^sse_(mul|div|add|sub)$/ )
{
}
+ # patch for Syntax::Operator::In, git #162
+ elsif ( $tok eq 'in' && $next_nonblank_token eq ':' ) {
+
+ }
+
# Fix part 1 for git #63 in which a comment falls
# between an -> and the following word. An
# alternate fix would be to change operator_expected
#
elsif ( $last_nonblank_type eq 'C' ) {
if ( $tok !~ /::$/ ) {
- complain(<<EOM);
+ $self->complain(<<EOM);
Expecting operator after '$last_nonblank_token' but found bare word '$tok'
Maybe indirectet object notation?
EOM
}
}
else {
- error_if_expecting_OPERATOR("bareword");
+ $self->error_if_expecting_OPERATOR("bareword");
}
}
# the type the same as if the -> were not separated
elsif ( $last_nonblank_type ne '->' ) { $type = 'U' }
+ # not a special case
+ else { }
+
}
# underscore after file test operator is file handle
# patch for SWITCH/CASE if 'case' and 'when are
# not treated as keywords:
if (
- ( $tok eq 'case' && $brace_type[$brace_depth] eq 'switch' )
+ ( $tok eq 'case' && $rbrace_type->[$brace_depth] eq 'switch' )
|| ( $tok eq 'when'
- && $brace_type[$brace_depth] eq 'given' )
+ && $rbrace_type->[$brace_depth] eq 'given' )
)
{
$statement_type = $tok; # next '{' is block
elsif ( $tok eq 'x' && $last_nonblank_type eq 'w' ) {
$type = 'x';
}
+ else {
+ ## not a special case
+ }
}
}
return;
sub sub_attribute_ok_here {
- my ( $tok_kw, $next_nonblank_token, $i_next ) = @_;
+ my ( $self, $tok_kw, $next_nonblank_token, $i_next ) = @_;
+
+ # Decide if a ':' can introduce an attribute. For example,
+ # something like 'sub :'
+
+ # Given:
+ # $tok_kw = a bareword token
+ # $next_nonblank_token = a following ':' being examined
+ # $i_next = the index of the following ':'
- # Decide if 'sub :' can be the start of a sub attribute list.
- # We will decide based on if the colon is followed by a
- # bareword which is not a keyword.
- # Changed inext+1 to inext to fixed case b1190.
+ # We will decide based on if the colon is followed by a bareword
+ # which is not a keyword. Changed inext+1 to inext to fixed case
+ # b1190.
my $sub_attribute_ok_here;
if ( $is_sub{$tok_kw}
&& $expecting != OPERATOR
&& $next_nonblank_token eq ':' )
{
- my ( $nn_nonblank_token, $i_nn ) =
- find_next_nonblank_token( $i_next, $rtokens, $max_token_index );
+ my ( $nn_nonblank_token, $i_nn_uu ) =
+ $self->find_next_nonblank_token( $i_next, $rtokens,
+ $max_token_index );
$sub_attribute_ok_here =
$nn_nonblank_token =~ /^\w/
&& $nn_nonblank_token !~ /^\d/
return $sub_attribute_ok_here;
} ## end sub sub_attribute_ok_here
+ use constant DEBUG_BAREWORD => 0;
+
+ sub saw_bareword_function {
+ my ( $self, $bareword ) = @_;
+ $self->[_rbareword_info_]->{$current_package}->{$bareword}
+ ->{function_count}++;
+ return;
+ } ## end sub saw_bareword_function
+
+ sub saw_bareword_constant {
+ my ( $self, $bareword ) = @_;
+ $self->[_rbareword_info_]->{$current_package}->{$bareword}
+ ->{constant_count}++;
+ return;
+ } ## end sub saw_bareword_constant
+
+ sub get_bareword_counts {
+ my ( $self, $bareword ) = @_;
+
+ # Given:
+ # $bareword = a bareword
+ # Return:
+ # $function_count = number of times seen as function taking >0 args
+ # $constant_count = number of times seen as function taking 0 args
+ # Note:
+ # $function_count > 0 implies that a TERM should come next
+ # $constant_count > 0 implies that an OPERATOR **may** come next,
+ # but this can be incorrect if $bareword can take 0 or more args.
+ # This is used to help guess tokenization around unknown barewords.
+ my $function_count;
+ my $constant_count;
+ my $rbareword_info_tok = $self->[_rbareword_info_]->{$current_package};
+ if ($rbareword_info_tok) {
+ $rbareword_info_tok = $rbareword_info_tok->{$bareword};
+ if ($rbareword_info_tok) {
+ $function_count = $rbareword_info_tok->{function_count};
+ $constant_count = $rbareword_info_tok->{constant_count};
+
+ # a positive function count overrides a constant count
+ if ($function_count) { $constant_count = 0 }
+ }
+ }
+ if ( !defined($function_count) ) { $function_count = 0 }
+ if ( !defined($constant_count) ) { $constant_count = 0 }
+ return ( $function_count, $constant_count );
+ } ## end sub get_bareword_counts
+
+ # hashes used to help determine a bareword type
+ my %is_wiUC;
+ my %is_function_follower;
+ my %is_constant_follower;
+ my %is_use_require_no;
+
+ BEGIN {
+ my @qz = qw( w i U C );
+ @is_wiUC{@qz} = (1) x scalar(@qz);
+
+ @qz = qw( use require no );
+ @is_use_require_no{@qz} = (1) x scalar(@qz);
+
+ # These pre-token types after a bareword imply that it
+ # is not a constant, except when '(' is followed by ')'.
+ @qz = qw# ( [ { $ @ " ' m #;
+ @is_function_follower{@qz} = (1) x scalar(@qz);
+
+ # These pre-token types after a bareword imply that it
+ # MIGHT be a constant, but it also might be a function taking
+ # 0 or more call args.
+ @qz = qw# ; ) ] } if unless #;
+ push @qz, ',';
+ @is_constant_follower{@qz} = (1) x scalar(@qz);
+ }
+
sub do_BAREWORD {
- my ($is_END_or_DATA) = @_;
+ my ($self) = @_;
# handle a bareword token:
# returns
# true if this token ends the current line
# false otherwise
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ my $next_nonblank_token;
+ my $i_next = $i + 1;
+ if ( $i_next <= $max_token_index && $rtoken_type->[$i_next] eq 'b' ) {
+ $i_next++;
+ }
+ if ( $i_next <= $max_token_index ) {
+ $next_nonblank_token = $rtokens->[$i_next];
+ }
+ else {
+ ( $next_nonblank_token, $i_next ) =
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ }
# a bare word immediately followed by :: is not a keyword;
# use $tok_kw when testing for keywords to avoid a mistake
$tok_kw .= '::';
}
- if ($in_attribute_list) {
- my $is_attribute = do_ATTRIBUTE_LIST($next_nonblank_token);
+ if ( $self->[_in_attribute_list_] ) {
+ my $is_attribute = $self->do_ATTRIBUTE_LIST($next_nonblank_token);
return if ($is_attribute);
}
# Scan a bare word following a -> as an identifier; it could
# have a long package name. Fixes c037, c041.
if ( $last_nonblank_token eq '->' ) {
- scan_bare_identifier();
+ $self->scan_bare_identifier();
# a bareward after '->' gets type 'i'
$type = 'i';
}
# Quote a word followed by => operator
- # unless the word __END__ or __DATA__ and the only word on
- # the line.
- elsif ( !$is_END_or_DATA
- && $next_nonblank_token eq '='
- && $rtokens->[ $i_next + 1 ] eq '>' )
+ elsif (
+ ( $next_nonblank_token eq '=' && $rtokens->[ $i_next + 1 ] eq '>' )
+
+ # unless the word is __END__ or __DATA__ and is the only word on
+ # the line.
+ && ( !defined( $is_END_DATA{$tok_kw} )
+ || $input_line !~ /^\s*__(?:END|DATA)__\s*$/ )
+ )
{
- do_QUOTED_BAREWORD();
+ # Bareword followed by a fat comma - see 'git18.in'
+ # This code was previously sub do_QUOTED_BAREWORD: see c316, c317
+
+ # Older perl:
+ # 'v25=>1' is a v-string key!
+ # '-v25=>1' is also a v-string key!
+ # Deactivated: this is no longer true; see git #165
+ if ( 0 && $tok =~ /^v\d+$/ ) {
+ $type = 'v';
+ $self->complain("v-string used as hash key\n");
+ $self->report_v_string($tok);
+ }
+
+ # If tok is something like 'x17' then it could
+ # actually be operator x followed by number 17.
+ # For example, here:
+ # 123x17 => [ 792, 1224 ],
+ # (a key of 123 repeated 17 times, perhaps not
+ # what was intended). We will mark x17 as type
+ # 'n' and it will be split. If the previous token
+ # was also a bareword then it is not very clear is
+ # going on. In this case we will not be sure that
+ # an operator is expected, so we just mark it as a
+ # bareword. Perl is a little murky in what it does
+ # with stuff like this, and its behavior can change
+ # over time. Something like
+ # a x18 => [792, 1224], will compile as
+ # a key with 18 a's. But something like
+ # push @array, a x18;
+ # is a syntax error.
+ elsif (
+ $expecting == OPERATOR
+ && substr( $tok, 0, 1 ) eq 'x'
+ && ( length($tok) == 1
+ || substr( $tok, 1, 1 ) =~ /^\d/ )
+ )
+ {
+ $type = 'n';
+ if ( $self->split_pretoken(1) ) {
+ $type = 'x';
+ $tok = 'x';
+ }
+ $self->complain("x operator in hash key\n");
+ }
+ else {
+
+ # git #18
+ $type = 'w';
+ $self->error_if_expecting_OPERATOR();
+ }
}
# quote a bare word within braces..like xxx->{s}; note that we
|| substr( $tok, 1, 1 ) =~ /^\d/ )
)
{
- do_X_OPERATOR();
+ $self->do_X_OPERATOR();
}
elsif ( $tok_kw eq 'CORE::' ) {
$type = $tok = $tok_kw;
elsif ( ( $tok eq 'strict' )
and ( $last_nonblank_token eq 'use' ) )
{
- $tokenizer_self->[_saw_use_strict_] = 1;
- scan_bare_identifier();
+ $self->[_saw_use_strict_] = 1;
+ $self->scan_bare_identifier();
}
elsif ( ( $tok eq 'warnings' )
and ( $last_nonblank_token eq 'use' ) )
{
- $tokenizer_self->[_saw_perl_dash_w_] = 1;
+ $self->[_saw_perl_dash_w_] = 1;
# scan as identifier, so that we pick up something like:
# use warnings::register
- scan_bare_identifier();
+ $self->scan_bare_identifier();
}
elsif (
$tok eq 'AutoLoader'
- && $tokenizer_self->[_look_for_autoloader_]
+ && $self->[_look_for_autoloader_]
&& (
$last_nonblank_token eq 'use'
)
)
{
- write_logfile_entry("AutoLoader seen, -nlal deactivates\n");
- $tokenizer_self->[_saw_autoloader_] = 1;
- $tokenizer_self->[_look_for_autoloader_] = 0;
- scan_bare_identifier();
+ $self->write_logfile_entry("AutoLoader seen, -nlal deactivates\n");
+ $self->[_saw_autoloader_] = 1;
+ $self->[_look_for_autoloader_] = 0;
+ $self->scan_bare_identifier();
}
elsif (
$tok eq 'SelfLoader'
- && $tokenizer_self->[_look_for_selfloader_]
+ && $self->[_look_for_selfloader_]
&& ( $last_nonblank_token eq 'use'
|| $input_line =~ /^\s*(use|require)\s+SelfLoader\b/
|| $input_line =~ /\bISA\s*=.*\bSelfLoader\b/ )
)
{
- write_logfile_entry("SelfLoader seen, -nlsl deactivates\n");
- $tokenizer_self->[_saw_selfloader_] = 1;
- $tokenizer_self->[_look_for_selfloader_] = 0;
- scan_bare_identifier();
+ $self->write_logfile_entry("SelfLoader seen, -nlsl deactivates\n");
+ $self->[_saw_selfloader_] = 1;
+ $self->[_look_for_selfloader_] = 0;
+ $self->scan_bare_identifier();
}
elsif ( ( $tok eq 'constant' )
and ( $last_nonblank_token eq 'use' ) )
{
- do_USE_CONSTANT();
+ $self->do_USE_CONSTANT();
}
# various quote operators
elsif ( $is_q_qq_qw_qx_qr_s_y_tr_m{$tok} ) {
- do_QUOTE_OPERATOR();
+ $self->do_QUOTE_OPERATOR();
}
# check for a statement label
&& ( $i_next <= $max_token_index ) # colon on same line
# like 'sub : lvalue' ?
- && !sub_attribute_ok_here( $tok_kw, $next_nonblank_token, $i_next )
- && label_ok()
+ && !$self->sub_attribute_ok_here( $tok_kw, $next_nonblank_token,
+ $i_next )
+ && new_statement_ok()
)
{
if ( $tok !~ /[A-Z]/ ) {
- push @{ $tokenizer_self->[_rlower_case_labels_at_] },
- $input_line_number;
+ push @{ $self->[_rlower_case_labels_at_] }, $input_line_number;
}
$type = 'J';
$tok .= ':';
# Update for --use-feature=class (rt145706):
# We have to be extra careful to avoid misparsing other uses of
# 'method' in older scripts.
- if ( $tok_kw eq 'method' ) {
+ if ( $tok_kw eq 'method' && $guess_if_method ) {
if ( $expecting == OPERATOR
- || $next_nonblank_token !~ /^(\w|\:)/
- || !method_ok_here() )
+ || $next_nonblank_token !~ /^[\w\:]/
+ || !$self->method_ok_here() )
{
- do_UNKNOWN_BAREWORD($next_nonblank_token);
+ $self->do_UNKNOWN_BAREWORD($next_nonblank_token);
}
else {
initialize_subname();
- scan_id();
+ $self->scan_id();
}
}
else {
- error_if_expecting_OPERATOR()
+ $self->error_if_expecting_OPERATOR()
if ( $expecting == OPERATOR );
initialize_subname();
- scan_id();
+ $self->scan_id();
}
}
# package($x) - error
if ( $tok_kw eq 'class' ) {
if ( $expecting == OPERATOR
- || $next_nonblank_token !~ /^(\w|\:)/
- || !class_ok_here() )
+ || $next_nonblank_token !~ /^[\w\:]/
+ || !$self->class_ok_here() )
{
- do_UNKNOWN_BAREWORD($next_nonblank_token);
+ $self->do_UNKNOWN_BAREWORD($next_nonblank_token);
}
- else { scan_id() }
+ else { $self->scan_id() }
}
else {
- error_if_expecting_OPERATOR()
+ $self->error_if_expecting_OPERATOR()
if ( $expecting == OPERATOR );
- scan_id();
+ $self->scan_id();
}
}
# Fix for c035: split 'format' from 'is_format_END_DATA' to be
# more restrictive. Require a new statement to be ok here.
elsif ( $tok_kw eq 'format' && new_statement_ok() ) {
- $type = ';'; # make tokenizer look for TERM next
- $tokenizer_self->[_in_format_] = 1;
- $is_last = 1; ## is last token on this line
+ $type = ';'; # make tokenizer look for TERM next
+ $self->[_in_format_] = 1;
+ $is_last = 1; ## is last token on this line
}
# Note on token types for format, __DATA__, __END__:
# It simplifies things to give these type ';', so that when we
# start rescanning we will be expecting a token of type TERM.
# We will switch to type 'k' before outputting the tokens.
- elsif ( $is_END_DATA{$tok_kw} ) {
+ elsif ( defined( $is_END_DATA{$tok_kw} ) ) {
$type = ';'; # make tokenizer look for TERM next
# Remember that we are in one of these three sections
- $tokenizer_self->[ $is_END_DATA{$tok_kw} ] = 1;
- $is_last = 1; ## is last token on this line
+ $self->[ $is_END_DATA{$tok_kw} ] = 1;
+ $is_last = 1; ## is last token on this line
}
elsif ( $is_keyword{$tok_kw} ) {
- do_KEYWORD();
+ $self->do_KEYWORD();
}
# check for inline label following
# something else --
else {
- do_UNKNOWN_BAREWORD($next_nonblank_token);
+ $self->do_UNKNOWN_BAREWORD($next_nonblank_token);
}
+ #----------------------------------------------------------------
+ # Save info for use in later guessing. Even for types 'i' and 'U'
+ # because those may be marked as type 'w' (barewords) elsewhere.
+ #----------------------------------------------------------------
+ if ( $is_wiUC{$type}
+ && $statement_type ne 'use'
+ && $statement_type ne '_use' )
+ {
+ my $result = "unknown";
+
+ # Words are marked 'function' if they appear in a role which
+ # is not consistent with a constant value. Typically they are
+ # function calls.
+ if ( $type eq 'U'
+ || $is_function_follower{$next_nonblank_token} )
+ {
+
+ my $empty_parens = 0;
+ if ( $next_nonblank_token eq '(' && $i_next < $max_token_index )
+ {
+ my $tok_next_p1 = $rtokens->[ $i_next + 1 ];
+ if ( substr( $tok_next_p1, 0, 1 ) eq SPACE
+ && $i_next + 2 <= $max_token_index )
+ {
+ $tok_next_p1 = $rtokens->[ $i_next + 2 ];
+ }
+ $empty_parens = $tok_next_p1 eq ')';
+ }
+
+ if ( !$empty_parens ) {
+
+ # not a constant term - probably a function
+ $result = "function";
+ $self->saw_bareword_function($tok);
+ }
+ }
+
+ # Words are marked 'constant' if they appear in a role
+ # consistent with a constant value. However, they may simply
+ # be functions which optionally take zero args. So if a word
+ # appears as both constant and function, it is not a constant.
+ elsif ($type eq 'C'
+ || $is_constant_follower{$next_nonblank_token} )
+ {
+
+ my $is_hash_key = $next_nonblank_token eq '}'
+ && (
+ $last_nonblank_type eq 'L'
+ || ( $last_nonblank_type eq 'm'
+ && $last_last_nonblank_type eq 'L' )
+ );
+
+ if (
+
+ # not a hash key like {bareword} or {-bareword}
+ !$is_hash_key
+
+ # not a package name, etc
+ && ( $last_nonblank_type ne 'k'
+ || !$is_use_require_no{$last_nonblank_token} )
+
+ # skip arrow calls, which can go either way
+ && $last_nonblank_token ne '->'
+ )
+ {
+ # possibly a constant or constant function
+ $result = "constant";
+ $self->saw_bareword_constant($tok);
+ }
+ else {
+ $result = "other bareword";
+ }
+ }
+ else {
+ }
+
+ if ( DEBUG_BAREWORD && $result ne 'other bareword' ) {
+ print
+"$input_line_number: $result: $tok: type=$type : last_tok=$last_nonblank_token : next_tok='$next_nonblank_token'\n";
+ }
+ }
return $is_last;
} ## end sub do_BAREWORD
sub do_FOLLOW_QUOTE {
+ my $self = shift;
+
# Continue following a quote on a new line
$type = $quote_type;
- unless ( @{$routput_token_list} ) { # initialize if continuation line
+ if ( !@{$routput_token_list} ) { # initialize if continuation line
push( @{$routput_token_list}, $i );
$routput_token_type->[$i] = $type;
$quoted_string_1,
$quoted_string_2,
- ) = do_quote(
+ ) = $self->do_quote(
$i,
$in_quote,
$quoted_string_1,
$quoted_string_2,
$rtokens,
+ $rtoken_type,
$rtoken_map,
$max_token_index,
# See test 'here2.in'.
if ( $saw_modifier_e && $i_tok >= 0 ) {
- my $rht = scan_replacement_text($qs1);
+ my $rht = $self->scan_replacement_text($qs1);
# Change type from 'Q' to 'h' for quotes with
# here-doc targets so that the formatter (see sub
# This error might also be triggered if my quote
# modifier characters are incomplete
else {
- warning(<<EOM);
+ $self->warning(<<EOM);
Partial match to quote modifier $allowed_quote_modifiers at word: '$str'
Please put a space between quote modifiers and trailing keywords.
# example file: rokicki4.pl
# This error might also be triggered if my quote
# modifier characters are incomplete
- write_logfile_entry(
+ $self->write_logfile_entry(
"Note: found word $str at quote modifier location\n");
}
}
use constant DEBUG_TOKENIZE => 0;
+ my %is_arrow_or_Z;
+
+ BEGIN {
+ my @qZ = qw( -> Z );
+ @is_arrow_or_Z{@qZ} = (1) x scalar(@qZ);
+ }
+
sub tokenize_this_line {
- # This routine breaks a line of perl code into tokens which are of use in
- # indentation and reformatting. One of my goals has been to define tokens
- # such that a newline may be inserted between any pair of tokens without
- # changing or invalidating the program. This version comes close to this,
- # although there are necessarily a few exceptions which must be caught by
- # the formatter. Many of these involve the treatment of bare words.
- #
- # The tokens and their types are returned in arrays. See previous
- # routine for their names.
- #
- # See also the array "valid_token_types" in the BEGIN section for an
- # up-to-date list.
- #
- # To simplify things, token types are either a single character, or they
- # are identical to the tokens themselves.
- #
- # As a debugging aid, the -D flag creates a file containing a side-by-side
- # comparison of the input string and its tokenization for each line of a file.
- # This is an invaluable debugging aid.
- #
- # In addition to tokens, and some associated quantities, the tokenizer
- # also returns flags indication any special line types. These include
- # quotes, here_docs, formats.
- #
- # -----------------------------------------------------------------------
- #
- # How to add NEW_TOKENS:
- #
- # New token types will undoubtedly be needed in the future both to keep up
- # with changes in perl and to help adapt the tokenizer to other applications.
- #
- # Here are some notes on the minimal steps. I wrote these notes while
- # adding the 'v' token type for v-strings, which are things like version
- # numbers 5.6.0, and ip addresses, and will use that as an example. ( You
- # can use your editor to search for the string "NEW_TOKENS" to find the
- # appropriate sections to change):
- #
- # *. Try to talk somebody else into doing it! If not, ..
- #
- # *. Make a backup of your current version in case things don't work out!
- #
- # *. Think of a new, unused character for the token type, and add to
- # the array @valid_token_types in the BEGIN section of this package.
- # For example, I used 'v' for v-strings.
- #
- # *. Implement coding to recognize the $type of the token in this routine.
- # This is the hardest part, and is best done by imitating or modifying
- # some of the existing coding. For example, to recognize v-strings, I
- # patched 'sub scan_bare_identifier' to recognize v-strings beginning with
- # 'v' and 'sub scan_number' to recognize v-strings without the leading 'v'.
- #
- # *. Update sub operator_expected. This update is critically important but
- # the coding is trivial. Look at the comments in that routine for help.
- # For v-strings, which should behave like numbers, I just added 'v' to the
- # regex used to handle numbers and strings (types 'n' and 'Q').
- #
- # *. Implement a 'bond strength' rule in sub set_bond_strengths in
- # Perl::Tidy::Formatter for breaking lines around this token type. You can
- # skip this step and take the default at first, then adjust later to get
- # desired results. For adding type 'v', I looked at sub bond_strength and
- # saw that number type 'n' was using default strengths, so I didn't do
- # anything. I may tune it up someday if I don't like the way line
- # breaks with v-strings look.
- #
- # *. Implement a 'whitespace' rule in sub set_whitespace_flags in
- # Perl::Tidy::Formatter. For adding type 'v', I looked at this routine
- # and saw that type 'n' used spaces on both sides, so I just added 'v'
- # to the array @spaces_both_sides.
- #
- # *. Update HtmlWriter package so that users can colorize the token as
- # desired. This is quite easy; see comments identified by 'NEW_TOKENS' in
- # that package. For v-strings, I initially chose to use a default color
- # equal to the default for numbers, but it might be nice to change that
- # eventually.
- #
- # *. Update comments in Perl::Tidy::Tokenizer::dump_token_types.
- #
- # *. Run lots and lots of debug tests. Start with special files designed
- # to test the new token type. Run with the -D flag to create a .DEBUG
- # file which shows the tokenization. When these work ok, test as many old
- # scripts as possible. Start with all of the '.t' files in the 'test'
- # directory of the distribution file. Compare .tdy output with previous
- # version and updated version to see the differences. Then include as
- # many more files as possible. My own technique has been to collect a huge
- # number of perl scripts (thousands!) into one directory and run perltidy
- # *, then run diff between the output of the previous version and the
- # current version.
- #
- # *. For another example, search for the smartmatch operator '~~'
- # with your editor to see where updates were made for it.
- #
- # -----------------------------------------------------------------------
+ my ( $self, $line_of_tokens, $trimmed_input_line ) = @_;
- my ( $self, $line_of_tokens ) = @_;
- my ($untrimmed_input_line) = $line_of_tokens->{_line_text};
+ # This routine tokenizes one line. The results are stored in
+ # the hash ref '$line_of_tokens'.
+
+ # Given:
+ # $line_of_tokens = ref to hash of values being filled for this line
+ # $trimmed_input_line
+ # = the input line without leading whitespace, OR
+ # = undef if not available
+ # Returns:
+ # nothing
+
+ my $untrimmed_input_line = $line_of_tokens->{_line_text};
# Extract line number for use in error messages
$input_line_number = $line_of_tokens->{_line_number};
- # Check for pod documentation
+ #-------------------------------------
+ # Check for start of pod documentation
+ #-------------------------------------
if ( substr( $untrimmed_input_line, 0, 1 ) eq '='
&& $untrimmed_input_line =~ /^=[A-Za-z_]/ )
{
# Must not be in multi-line quote
# and must not be in an equation
- if ( !$in_quote
- && ( operator_expected( [ 'b', '=', 'b' ] ) == TERM ) )
+ my $blank_after_Z = 1;
+ if (
+ !$in_quote
+ && ( $self->operator_expected( '=', 'b', $blank_after_Z ) ==
+ TERM )
+ )
{
$self->[_in_pod_] = 1;
return;
}
}
- $input_line = $untrimmed_input_line;
-
- chomp $input_line;
-
- # Set a flag to indicate if we might be at an __END__ or __DATA__ line
- # This will be used below to avoid quoting a bare word followed by
- # a fat comma.
- my $is_END_or_DATA;
-
- # Reinitialize the multi-line quote flag
+ #--------------------------
+ # Trim leading whitespace ?
+ #--------------------------
+ # Use untrimmed line if we are continuing in a type 'Q' quote
if ( $in_quote && $quote_type eq 'Q' ) {
$line_of_tokens->{_starting_in_quote} = 1;
+ $input_line = $untrimmed_input_line;
+ chomp $input_line;
}
+
+ # Trim start of this line if we are not continuing a quoted line.
+ # Do not trim end because we might end in a quote (test: deken4.pl)
+ # Perl::Tidy::Formatter will delete needless trailing blanks
else {
$line_of_tokens->{_starting_in_quote} = 0;
- # Trim start of this line unless we are continuing a quoted line.
- # Do not trim end because we might end in a quote (test: deken4.pl)
- # Perl::Tidy::Formatter will delete needless trailing blanks
- $input_line =~ s/^(\s+)//;
+ # Use the pre-computed trimmed line if defined (most efficient)
+ $input_line = $trimmed_input_line;
- # Calculate a guessed level for nonblank lines to avoid calls to
- # sub guess_old_indentation_level()
- if ( length($input_line) && $1 ) {
- my $leading_spaces = $1;
- my $spaces = length($leading_spaces);
+ # otherwise trim the raw input line (much less efficient)
+ if ( !defined($input_line) ) {
+ $input_line = $untrimmed_input_line;
+ $input_line =~ s/^\s+//;
+ }
- # handle leading tabs
- if ( ord( substr( $leading_spaces, 0, 1 ) ) == ORD_TAB
- && $leading_spaces =~ /^(\t+)/ )
- {
- my $tabsize = $self->[_tabsize_];
- $spaces += length($1) * ( $tabsize - 1 );
- }
+ chomp $input_line;
- my $indent_columns = $self->[_indent_columns_];
- $line_of_tokens->{_guessed_indentation_level} =
- int( $spaces / $indent_columns );
+ # define 'guessed_indentation_level' if logfile will be saved
+ if ( $self->[_save_logfile_] && length($input_line) ) {
+ my $guess =
+ $self->guess_old_indentation_level($untrimmed_input_line);
+ $line_of_tokens->{_guessed_indentation_level} = $guess;
}
+ }
- $is_END_or_DATA = substr( $input_line, 0, 1 ) eq '_'
- && $input_line =~ /^__(END|DATA)__\s*$/;
+ #------------
+ # Blank lines
+ #------------
+ if ( !length($input_line) ) {
+ $line_of_tokens->{_line_type} = 'CODE';
+ $line_of_tokens->{_rtokens} = [];
+ $line_of_tokens->{_rtoken_type} = [];
+ $line_of_tokens->{_rlevels} = [];
+ $line_of_tokens->{_rblock_type} = [];
+ $line_of_tokens->{_nesting_tokens_0} = $nesting_token_string;
+ $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string;
+ return;
}
- # Optimize for a full-line comment.
- if ( !$in_quote ) {
- if ( substr( $input_line, 0, 1 ) eq '#' ) {
+ #---------
+ # Comments
+ #---------
+ if ( !$in_quote && substr( $input_line, 0, 1 ) eq '#' ) {
- # and check for skipped section
- if ( $rOpts_code_skipping
- && $input_line =~ /$code_skipping_pattern_begin/ )
- {
- $self->[_in_skipped_] = 1;
- return;
- }
-
- # Optional fast processing of a block comment
- my $ci_string_sum =
- ( my $str = $ci_string_in_tokenizer ) =~ tr/1/0/;
- my $ci_string_i = $ci_string_sum + $in_statement_continuation;
- $line_of_tokens->{_line_type} = 'CODE';
- $line_of_tokens->{_rtokens} = [$input_line];
- $line_of_tokens->{_rtoken_type} = ['#'];
- $line_of_tokens->{_rlevels} = [$level_in_tokenizer];
- $line_of_tokens->{_rci_levels} = [$ci_string_i];
- $line_of_tokens->{_rblock_type} = [EMPTY_STRING];
- $line_of_tokens->{_nesting_tokens_0} = $nesting_token_string;
- $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string;
+ # and check for skipped section
+ if (
+ (
+ substr( $input_line, 0, 4 ) eq '#<<V'
+ || $rOpts_code_skipping_begin
+ )
+ && $rOpts_code_skipping
+
+ # note that the code_skipping_patterns require a newline
+ && $input_line . "\n" =~ /$code_skipping_pattern_begin/
+ )
+ {
+ $self->[_in_code_skipping_] = $self->[_last_line_number_];
return;
}
- # Optimize handling of a blank line
- if ( !length($input_line) ) {
- $line_of_tokens->{_line_type} = 'CODE';
- $line_of_tokens->{_rtokens} = [];
- $line_of_tokens->{_rtoken_type} = [];
- $line_of_tokens->{_rlevels} = [];
- $line_of_tokens->{_rci_levels} = [];
- $line_of_tokens->{_rblock_type} = [];
- $line_of_tokens->{_nesting_tokens_0} = $nesting_token_string;
- $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string;
- return;
+ if ( !$self->[_in_format_skipping_] ) {
+ if (
+ (
+ substr( $input_line, 0, 4 ) eq '#<<<'
+ || $rOpts_format_skipping_begin
+ )
+ && $rOpts_format_skipping
+
+ # note that the code_skipping_patterns require a newline
+ && $input_line . "\n" =~ /$format_skipping_pattern_begin/
+ )
+ {
+ $self->[_in_format_skipping_] = $self->[_last_line_number_];
+ }
+ }
+ else {
+ if (
+ (
+ substr( $input_line, 0, 4 ) eq '#>>>'
+ || $rOpts_format_skipping_end
+ )
+
+ # note that the code_skipping_patterns require a newline
+ && $input_line . "\n" =~ /$format_skipping_pattern_end/
+ )
+ {
+ $self->[_in_format_skipping_] = 0;
+ }
}
+
+ # Optional fast processing of a block comment
+ $line_of_tokens->{_line_type} = 'CODE';
+ $line_of_tokens->{_rtokens} = [$input_line];
+ $line_of_tokens->{_rtoken_type} = ['#'];
+ $line_of_tokens->{_rlevels} = [$level_in_tokenizer];
+ $line_of_tokens->{_rblock_type} = [EMPTY_STRING];
+ $line_of_tokens->{_nesting_tokens_0} = $nesting_token_string;
+ $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string;
+ return;
}
- # update the copy of the line for use in error messages
+ #-------------------------------------
+ # Loop to find all tokens on this line
+ #-------------------------------------
+
+ # Update the copy of the line for use in error messages
# This must be exactly what we give the pre_tokenizer
$self->[_line_of_text_] = $input_line;
# re-initialize for the main loop
- $routput_token_list = []; # stack of output token indexes
- $routput_token_type = []; # token types
- $routput_block_type = []; # types of code block
- $routput_container_type = []; # paren types, such as if, elsif, ..
- $routput_type_sequence = []; # nesting sequential number
+ $routput_token_list = []; # stack of output token indexes
+ $routput_token_type = []; # token types
+ $routput_block_type = []; # types of code block
+ $routput_type_sequence = []; # nesting sequential number
$rhere_target_list = [];
$indent_flag = 0;
$peeked_ahead = 0;
- $self->tokenizer_main_loop($is_END_or_DATA);
-
- #-----------------------------------------------
- # all done tokenizing this line ...
- # now prepare the final list of tokens and types
- #-----------------------------------------------
+ $self->tokenizer_main_loop();
+ #-------------------------------------------------
+ # Done tokenizing this line ... package the result
+ #-------------------------------------------------
$self->tokenizer_wrapup_line($line_of_tokens);
return;
sub tokenizer_main_loop {
- my ( $self, $is_END_or_DATA ) = @_;
+ my ($self) = @_;
- #---------------------------------
# Break one input line into tokens
- #---------------------------------
+ # We are working on closure variables.
+
+ # Start by breaking the line into pre-tokens
+ ( $rtokens, $rtoken_map, $rtoken_type ) = pre_tokenize($input_line);
+
+ # Verify that all leading whitespace has been trimmed
+ # except for quotes of type 'Q' (c273).
+ if ( @{$rtokens}
+ && $rtoken_type->[0] eq 'b'
+ && !( $in_quote && $quote_type eq 'Q' ) )
+ {
- # Input parameter:
- # $is_END_or_DATA is true for a __END__ or __DATA__ line
+ # Shouldn't happen if calling sub did trim operation correctly.
+ DEVEL_MODE && $self->Fault(<<EOM);
+leading blank at line
+$input_line
+EOM
- # start by breaking the line into pre-tokens
- my $max_tokens_wanted = 0; # this signals pre_tokenize to get all tokens
- ( $rtokens, $rtoken_map, $rtoken_type ) =
- pre_tokenize( $input_line, $max_tokens_wanted );
+ # Fix by removing the leading blank token. This fix has been
+ # tested and works correctly even if no whitespaces was trimmed.
+ # But it is an inefficient way to do things because, for example,
+ # it forces all comments to be processed by sub pre_tokenize.
+ # And it may cause indented code-skipping comments to be missed.
+ shift @{$rtokens};
+ shift @{$rtoken_map};
+ shift @{$rtoken_type};
+ }
$max_token_index = scalar( @{$rtokens} ) - 1;
push( @{$rtokens}, SPACE, SPACE, SPACE )
- ; # extra whitespace simplifies logic
+ ; # extra whitespace simplifies logic
push( @{$rtoken_map}, 0, 0, 0 ); # shouldn't be referenced
push( @{$rtoken_type}, 'b', 'b', 'b' );
# initialize for main loop
if (0) { #<<< this is not necessary
foreach my $ii ( 0 .. $max_token_index + 3 ) {
- $routput_token_type->[$ii] = EMPTY_STRING;
- $routput_block_type->[$ii] = EMPTY_STRING;
- $routput_container_type->[$ii] = EMPTY_STRING;
- $routput_type_sequence->[$ii] = EMPTY_STRING;
- $routput_indent_flag->[$ii] = 0;
+ $routput_token_type->[$ii] = EMPTY_STRING;
+ $routput_block_type->[$ii] = EMPTY_STRING;
+ $routput_type_sequence->[$ii] = EMPTY_STRING;
+ $routput_indent_flag->[$ii] = 0;
}
}
$i = -1;
$i_tok = -1;
- #-----------------------------
- # begin main tokenization loop
- #-----------------------------
+ #-----------------------
+ # main tokenization loop
+ #-----------------------
# we are looking at each pre-token of one line and combining them
# into tokens
# continue looking for the end of a quote
if ($in_quote) {
- do_FOLLOW_QUOTE();
+ $self->do_FOLLOW_QUOTE();
last if ( $in_quote || $i > $max_token_index );
}
- if ( $type ne 'b' && $tok ne 'CORE::' ) {
+ if ( $type ne 'b' && $type ne 'CORE::' ) {
# try to catch some common errors
if ( ( $type eq 'n' ) && ( $tok ne '0' ) ) {
if ( $last_nonblank_token eq 'eq' ) {
- complain("Should 'eq' be '==' here ?\n");
+ $self->complain("Should 'eq' be '==' here ?\n");
}
elsif ( $last_nonblank_token eq 'ne' ) {
- complain("Should 'ne' be '!=' here ?\n");
+ $self->complain("Should 'ne' be '!=' here ?\n");
+ }
+ else {
+ # that's all
}
}
# fix c090, only rotate vars if a new token will be stored
if ( $i_tok >= 0 ) {
- $last_last_nonblank_token = $last_nonblank_token;
- $last_last_nonblank_type = $last_nonblank_type;
- $last_last_nonblank_block_type = $last_nonblank_block_type;
- $last_last_nonblank_container_type =
- $last_nonblank_container_type;
- $last_last_nonblank_type_sequence =
- $last_nonblank_type_sequence;
- # Fix part #3 for git82: propagate type 'Z' though L-R pair
- unless ( $type eq 'R' && $last_nonblank_type eq 'Z' ) {
- $last_nonblank_token = $tok;
- $last_nonblank_type = $type;
- }
+ $last_last_nonblank_token = $last_nonblank_token;
+ $last_last_nonblank_type = $last_nonblank_type;
+
$last_nonblank_prototype = $prototype;
$last_nonblank_block_type = $block_type;
$last_nonblank_container_type = $container_type;
$last_nonblank_type_sequence = $type_sequence;
$last_nonblank_i = $i_tok;
+ $last_nonblank_token = $tok;
+ $last_nonblank_type = $type;
}
- # Patch for c030: Fix things in case a '->' got separated from
- # the subsequent identifier by a side comment. We need the
- # last_nonblank_token to have a leading -> to avoid triggering
- # an operator expected error message at the next '('. See also
- # fix for git #63.
- if ( $last_last_nonblank_token eq '->' ) {
- if ( $last_nonblank_type eq 'w'
- || $last_nonblank_type eq 'i' )
- {
- $last_nonblank_token = '->' . $last_nonblank_token;
- $last_nonblank_type = 'i';
+ # Check for patches
+ if ( $is_arrow_or_Z{$last_last_nonblank_type} ) {
+
+ # Patch for c030: Fix things in case a '->' got separated
+ # from the subsequent identifier by a side comment. We
+ # need the last_nonblank_token to have a leading -> to
+ # avoid triggering an operator expected error message at
+ # the next '('. See also fix for git #63.
+ if ( $last_last_nonblank_type eq '->' ) {
+ if ( $last_nonblank_type eq 'w'
+ || $last_nonblank_type eq 'i' )
+ {
+ $last_nonblank_token = '->' . $last_nonblank_token;
+ $last_nonblank_type = 'i';
+ }
+ }
+
+ # Fix part #3 for git82: propagate type 'Z' though L-R pair
+ elsif ( $last_last_nonblank_type eq 'Z' ) {
+ if ( $last_nonblank_type eq 'R' ) {
+ $last_nonblank_type = $last_last_nonblank_type;
+ $last_nonblank_token = $last_last_nonblank_token;
+ }
+ }
+ else {
+ # No other patches
}
}
}
# store previous token type
if ( $i_tok >= 0 ) {
- $routput_token_type->[$i_tok] = $type;
- $routput_block_type->[$i_tok] = $block_type;
- $routput_container_type->[$i_tok] = $container_type;
- $routput_type_sequence->[$i_tok] = $type_sequence;
- $routput_indent_flag->[$i_tok] = $indent_flag;
+ $routput_token_type->[$i_tok] = $type;
+ $routput_block_type->[$i_tok] = $block_type;
+ $routput_type_sequence->[$i_tok] = $type_sequence;
+ $routput_indent_flag->[$i_tok] = $indent_flag;
}
# get the next pre-token and type
my $pre_tok = $tok = $rtokens->[$i]; # get the next pre-token
my $pre_type = $type = $rtoken_type->[$i]; # and type
- # remember the starting index of this token; we will be updating $i
- $i_tok = $i;
-
# re-initialize various flags for the next output token
- $block_type &&= EMPTY_STRING;
- $container_type &&= EMPTY_STRING;
- $type_sequence &&= EMPTY_STRING;
- $indent_flag &&= 0;
- $prototype &&= EMPTY_STRING;
+ (
+
+ # remember the starting index of this token; we will update $i
+ $i_tok,
+ $block_type,
+ $container_type,
+ $type_sequence,
+ $indent_flag,
+ $prototype,
+ )
+ = (
+
+ $i,
+ EMPTY_STRING,
+ EMPTY_STRING,
+ EMPTY_STRING,
+ 0,
+ EMPTY_STRING,
+ );
# this pre-token will start an output token
push( @{$routput_token_list}, $i_tok );
- #--------------------------
- # handle a whitespace token
- #--------------------------
+ #---------------------------------------------------
+ # The token search leads to one of 5 main END NODES:
+ #---------------------------------------------------
+
+ #-----------------------
+ # END NODE 1: whitespace
+ #-----------------------
next if ( $pre_type eq 'b' );
- #-----------------
- # handle a comment
- #-----------------
- last if ( $pre_type eq '#' );
+ #----------------------
+ # END NODE 2: a comment
+ #----------------------
+ if ( $pre_type eq '#' ) {
+
+ # push non-indenting brace stack Look for a possible
+ # non-indenting brace. This is only used to give a hint in
+ # case the file is unbalanced.
+ # Hardwired to '#<<<' for efficiency. We will not use the
+ # result later if the pattern has been changed (very unusual).
+ if ( $last_nonblank_token eq '{'
+ && $last_nonblank_block_type
+ && $last_nonblank_type_sequence
+ && !$self->[_in_format_skipping_]
+ && $rOpts_non_indenting_braces )
+ {
+ my $offset = $rtoken_map->[$i_tok];
+ my $text = substr( $input_line, $offset, 5 );
+ my $len = length($text);
+ if ( $len == 4 && $text eq '#<<<'
+ || $len > 4 && $text eq '#<<< ' )
+ {
+ push @{ $self->[_rnon_indenting_brace_stack_] },
+ $last_nonblank_type_sequence;
+ }
+ }
+ last;
+ }
# continue gathering identifier if necessary
if ($id_scan_state) {
if ( $is_sub{$id_scan_state} || $is_package{$id_scan_state} ) {
- scan_id();
+ $self->scan_id();
}
else {
- scan_identifier();
+ $self->scan_identifier();
}
if ($id_scan_state) {
# done if nothing left to scan on this line
last if ( $i > $max_token_index );
- my ( $next_nonblank_token, $i_next ) =
+ my ( $next_nonblank_token_uu, $i_next ) =
find_next_nonblank_token_on_this_line( $i, $rtokens,
$max_token_index );
$tok = $pre_tok;
}
-## my $prev_tok = $i > 0 ? $rtokens->[ $i - 1 ] : SPACE;
- my $prev_type = $i > 0 ? $rtoken_type->[ $i - 1 ] : 'b';
-
#-----------------------------------------------------------
# Combine pre-tokens into digraphs and trigraphs if possible
#-----------------------------------------------------------
if ( $test_tok eq '//' && $last_nonblank_type ne 'Z' ) {
# note that here $tok = '/' and the next tok and type is '/'
- $expecting = operator_expected( [ $prev_type, $tok, '/' ] );
+ my $blank_after_Z;
+ $expecting =
+ $self->operator_expected( $tok, '/', $blank_after_Z );
# Patched for RT#101547, was 'unless ($expecting==OPERATOR)'
$combine_ok = 0 if ( $expecting == TERM );
}
- # Patch for RT #114359: Missparsing of "print $x ** 0.5;
+ # Patch for RT #114359: mis-parsing of "print $x ** 0.5;
# Accept the digraphs '**' only after type 'Z'
# Otherwise postpone the decision.
if ( $test_tok eq '**' ) {
}
# The only current tetragraph is the double diamond operator
- # and its first three characters are not a trigraph, so
+ # and its first three characters are NOT a trigraph, so
# we do can do a special test for it
- elsif ( $test_tok eq '<<>' ) {
- $test_tok .= $rtokens->[ $i + 2 ];
- if ( $is_tetragraph{$test_tok} ) {
- $tok = $test_tok;
- $i += 2;
+ else {
+ if ( $test_tok eq '<<>' ) {
+ $test_tok .= $rtokens->[ $i + 2 ];
+ if ( $is_tetragraph{$test_tok} ) {
+ $tok = $test_tok;
+ $i += 2;
+ }
}
}
}
$next_tok = $rtokens->[ $i + 1 ];
$next_type = $rtoken_type->[ $i + 1 ];
+ # expecting an operator here? first try table lookup, then function
+ $expecting = $op_expected_table{$last_nonblank_type};
+ if ( !defined($expecting) ) {
+ my $blank_after_Z = $last_nonblank_type eq 'Z'
+ && ( $i == 0 || $rtoken_type->[ $i - 1 ] eq 'b' );
+ $expecting =
+ $self->operator_expected( $tok, $next_type, $blank_after_Z );
+ }
+
DEBUG_TOKENIZE && do {
local $LIST_SEPARATOR = ')(';
my @debug_list = (
- $last_nonblank_token, $tok,
- $next_tok, $brace_depth,
- $brace_type[$brace_depth], $paren_depth,
- $paren_type[$paren_depth],
+ $last_nonblank_token, $tok,
+ $next_tok, $brace_depth,
+ $rbrace_type->[$brace_depth], $paren_depth,
+ $rparen_type->[$paren_depth],
);
- print STDOUT "TOKENIZE:(@debug_list)\n";
+ print {*STDOUT} "TOKENIZE:(@debug_list)\n";
};
- # Turn off attribute list on first non-blank, non-bareword.
- # Added '#' to fix c038 (later moved above).
- if ( $in_attribute_list && $pre_type ne 'w' ) {
- $in_attribute_list = 0;
- }
-
- #--------------------------------------------------------
- # We have the next token, $tok.
- # Now we have to examine this token and decide what it is
- # and define its $type
- #
- # section 1: bare words
- #--------------------------------------------------------
+ # The next token is '$tok'.
+ # Now we have to define its '$type'
+ #------------------------
+ # END NODE 3: a bare word
+ #------------------------
if ( $pre_type eq 'w' ) {
- $expecting =
- operator_expected( [ $prev_type, $tok, $next_type ] );
- my $is_last = do_BAREWORD($is_END_or_DATA);
+ my $is_last = $self->do_BAREWORD();
last if ($is_last);
+ next;
}
- #-----------------------------
- # section 2: strings of digits
- #-----------------------------
- elsif ( $pre_type eq 'd' ) {
- $expecting =
- operator_expected( [ $prev_type, $tok, $next_type ] );
- do_DIGITS();
- }
+ # Turn off attribute list on first non-blank, non-bareword,
+ # and non-comment (added to fix c038)
+ $self->[_in_attribute_list_] = 0;
- #----------------------------
- # section 3: all other tokens
- #----------------------------
- else {
- my $code = $tokenization_code->{$tok};
- if ($code) {
- $expecting =
- operator_expected( [ $prev_type, $tok, $next_type ] );
- $code->();
- redo if $in_quote;
- }
+ #-------------------------------
+ # END NODE 4: a string of digits
+ #-------------------------------
+ if ( $pre_type eq 'd' ) {
+ $self->do_DIGITS();
+ next;
}
- }
- # -----------------------------
- # end of main tokenization loop
- # -----------------------------
+ #------------------------------------------
+ # END NODE 5: everything else (punctuation)
+ #------------------------------------------
+ my $code = $tokenization_code->{$tok};
+ if ($code) {
+ $code->($self);
+ redo if $in_quote;
+ }
+ } ## End main tokenizer loop
# Store the final token
if ( $i_tok >= 0 ) {
- $routput_token_type->[$i_tok] = $type;
- $routput_block_type->[$i_tok] = $block_type;
- $routput_container_type->[$i_tok] = $container_type;
- $routput_type_sequence->[$i_tok] = $type_sequence;
- $routput_indent_flag->[$i_tok] = $indent_flag;
+ $routput_token_type->[$i_tok] = $type;
+ $routput_block_type->[$i_tok] = $block_type;
+ $routput_type_sequence->[$i_tok] = $type_sequence;
+ $routput_indent_flag->[$i_tok] = $indent_flag;
}
# Remember last nonblank values
if ( $type ne 'b' && $type ne '#' ) {
- $last_last_nonblank_token = $last_nonblank_token;
- $last_last_nonblank_type = $last_nonblank_type;
- $last_last_nonblank_block_type = $last_nonblank_block_type;
- $last_last_nonblank_container_type = $last_nonblank_container_type;
- $last_last_nonblank_type_sequence = $last_nonblank_type_sequence;
- $last_nonblank_token = $tok;
- $last_nonblank_type = $type;
- $last_nonblank_block_type = $block_type;
- $last_nonblank_container_type = $container_type;
- $last_nonblank_type_sequence = $type_sequence;
- $last_nonblank_prototype = $prototype;
+
+ $last_last_nonblank_token = $last_nonblank_token;
+ $last_last_nonblank_type = $last_nonblank_type;
+
+ $last_nonblank_prototype = $prototype;
+ $last_nonblank_block_type = $block_type;
+ $last_nonblank_container_type = $container_type;
+ $last_nonblank_type_sequence = $type_sequence;
+ $last_nonblank_token = $tok;
+ $last_nonblank_type = $type;
}
# reset indentation level if necessary at a sub or package
if ( $level_in_tokenizer < 0 ) {
if ( $input_line =~ /^\s*(sub|package)\s+(\w+)/ ) {
reset_indentation_level(0);
- brace_warning("resetting level to 0 at $1 $2\n");
+ $self->brace_warning("resetting level to 0 at $1 $2\n");
}
}
- $self->[_in_attribute_list_] = $in_attribute_list;
- $self->[_in_quote_] = $in_quote;
+ $self->[_in_quote_] = $in_quote;
$self->[_quote_target_] =
- $in_quote ? matching_end_token($quote_character) : EMPTY_STRING;
+ $in_quote
+ ? $matching_end_token{$quote_character}
+ ? $matching_end_token{$quote_character}
+ : $quote_character
+ : EMPTY_STRING;
$self->[_rhere_target_list_] = $rhere_target_list;
return;
# Package a line of tokens for shipping back to the caller
#---------------------------------------------------------
- # Most of the remaining work involves defining the two indentation
- # parameters that the formatter needs for each token:
- # - $level = structural indentation level and
- # - $ci_level = continuation indentation level
-
- # The method for setting the indentation level is straightforward.
- # But the method used to define the continuation indentation is
- # complicated because it has evolved over a long time by trial and
- # error. It could undoubtedly be simplified but it works okay as is.
-
- # Here is a brief description of how indentation is computed.
- # Perl::Tidy computes indentation as the sum of 2 terms:
- #
- # (1) structural indentation, such as if/else/elsif blocks
- # (2) continuation indentation, such as long parameter call lists.
- #
- # These are occasionally called primary and secondary indentation.
- #
- # Structural indentation is introduced by tokens of type '{',
- # although the actual tokens might be '{', '(', or '['. Structural
- # indentation is of two types: BLOCK and non-BLOCK. Default
- # structural indentation is 4 characters if the standard indentation
- # scheme is used.
- #
- # Continuation indentation is introduced whenever a line at BLOCK
- # level is broken before its termination. Default continuation
- # indentation is 2 characters in the standard indentation scheme.
- #
- # Both types of indentation may be nested arbitrarily deep and
- # interlaced. The distinction between the two is somewhat arbitrary.
- #
- # For each token, we will define two variables which would apply if
- # the current statement were broken just before that token, so that
- # that token started a new line:
- #
- # $level = the structural indentation level,
- # $ci_level = the continuation indentation level
- #
- # The total indentation will be $level * (4 spaces) + $ci_level * (2
- # spaces), assuming defaults. However, in some special cases it is
- # customary to modify $ci_level from this strict value.
- #
- # The total structural indentation is easy to compute by adding and
- # subtracting 1 from a saved value as types '{' and '}' are seen.
- # The running value of this variable is $level_in_tokenizer.
- #
- # The total continuation is much more difficult to compute, and
- # requires several variables. These variables are:
- #
- # $ci_string_in_tokenizer = a string of 1's and 0's indicating, for
- # each indentation level, if there are intervening open secondary
- # structures just prior to that level.
- # $continuation_string_in_tokenizer = a string of 1's and 0's
- # indicating if the last token at that level is "continued", meaning
- # that it is not the first token of an expression.
- # $nesting_block_string = a string of 1's and 0's indicating, for each
- # indentation level, if the level is of type BLOCK or not.
- # $nesting_block_flag = the most recent 1 or 0 of $nesting_block_string
- # $nesting_list_string = a string of 1's and 0's indicating, for each
- # indentation level, if it is appropriate for list formatting.
- # If so, continuation indentation is used to indent long list items.
- # $nesting_list_flag = the most recent 1 or 0 of $nesting_list_string
- # @{$rslevel_stack} = a stack of total nesting depths at each
- # structural indentation level, where "total nesting depth" means
- # the nesting depth that would occur if every nesting token
- # -- '{', '[', # and '(' -- , regardless of context, is used to
- # compute a nesting depth.
-
- # Notes on the Continuation Indentation
- #
- # There is a sort of chicken-and-egg problem with continuation
- # indentation. The formatter can't make decisions on line breaks
- # without knowing what 'ci' will be at arbitrary locations.
- #
- # But a problem with setting the continuation indentation (ci) here
- # in the tokenizer is that we do not know where line breaks will
- # actually be. As a result, we don't know if we should propagate
- # continuation indentation to higher levels of structure.
- #
- # For nesting of only structural indentation, we never need to do
- # this. For example, in a long if statement, like this
- #
- # if ( !$output_block_type[$i]
- # && ($in_statement_continuation) )
- # { <--outdented
- # do_something();
- # }
- #
- # the second line has ci but we do normally give the lines within
- # the BLOCK any ci. This would be true if we had blocks nested
- # arbitrarily deeply.
- #
- # But consider something like this, where we have created a break
- # after an opening paren on line 1, and the paren is not (currently)
- # a structural indentation token:
- #
- # my $file = $menubar->Menubutton(
- # qw/-text File -underline 0 -menuitems/ => [
- # [
- # Cascade => '~View',
- # -menuitems => [
- # ...
- #
- # The second line has ci, so it would seem reasonable to propagate
- # it down, giving the third line 1 ci + 1 indentation. This
- # suggests the following rule, which is currently used to
- # propagating ci down: if there are any non-structural opening
- # parens (or brackets, or braces), before an opening structural
- # brace, then ci is propagated down, and otherwise
- # not. The variable $intervening_secondary_structure contains this
- # information for the current token, and the string
- # "$ci_string_in_tokenizer" is a stack of previous values of this
- # variable.
-
- my @token_type = (); # stack of output token types
- my @block_type = (); # stack of output code block types
- my @type_sequence = (); # stack of output type sequence numbers
- my @tokens = (); # output tokens
- my @levels = (); # structural brace levels of output tokens
- my @ci_string = (); # string needed to compute continuation indentation
-
- # Count the number of '1's in the string (previously sub ones_count)
- my $ci_string_sum = ( my $str = $ci_string_in_tokenizer ) =~ tr/1/0/;
+ # Arrays to hold token values for this line:
+ my (
+ @output_levels, @output_block_type, @output_type_sequence,
+ @output_token_type, @output_tokens
+ );
$line_of_tokens->{_nesting_tokens_0} = $nesting_token_string;
- my ( $ci_string_i, $level_i );
+ # Remember starting nesting block string
+ my $nesting_block_string_0 = $nesting_block_string;
#-----------------
# Loop over tokens
#-----------------
- my $rtoken_map_im;
- foreach my $i ( @{$routput_token_list} ) {
-
- my $type_i = $routput_token_type->[$i];
- $level_i = $level_in_tokenizer;
+ # $i is the index of the pretoken which starts this full token
+ foreach my $ii ( @{$routput_token_list} ) {
- # Quick handling of indentation levels for blanks and comments
- if ( $type_i eq 'b' || $type_i eq '#' ) {
- $ci_string_i = $ci_string_sum + $in_statement_continuation;
- }
+ my $type_i = $routput_token_type->[$ii];
- # All other types
- else {
+ #----------------------------------------
+ # Section 1. Handle a non-sequenced token
+ #----------------------------------------
+ if ( !$routput_type_sequence->[$ii] ) {
- # $tok_i is the PRE-token. It only equals the token for symbols
- my $tok_i = $rtokens->[$i];
+ #-------------------------------
+ # Section 1.1. types ';' and 't'
+ #-------------------------------
+ # - output anonymous 'sub' as keyword (type 'k')
+ # - output __END__, __DATA__, and format as type 'k' instead
+ # of ';' to make html colors correct, etc.
+ if ( $is_semicolon_or_t{$type_i} ) {
+ my $tok_i = $rtokens->[$ii];
+ if ( $is_END_DATA_format_sub{$tok_i} ) {
+ $type_i = 'k';
+ }
+ }
- # Check for an invalid token type..
+ #----------------------------------------------
+ # Section 1.2. Check for an invalid token type.
+ #----------------------------------------------
# This can happen by running perltidy on non-scripts although
# it could also be bug introduced by programming change. Perl
# silently accepts a 032 (^Z) and takes it as the end
- if ( !$is_valid_token_type{$type_i} ) {
+ elsif ( !$is_valid_token_type{$type_i} ) {
my $val = ord($type_i);
- warning(
+ $self->warning(
"unexpected character decimal $val ($type_i) in script\n"
);
$self->[_in_error_] = 1;
}
+ else {
+ # valid token type other than ; and t
+ }
+
+ #----------------------------------------------------
+ # Section 1.3. Store values for a non-sequenced token
+ #----------------------------------------------------
+ push( @output_levels, $level_in_tokenizer );
+ push( @output_block_type, EMPTY_STRING );
+ push( @output_type_sequence, EMPTY_STRING );
+ push( @output_token_type, $type_i );
- # $ternary_indentation_flag indicates that we need a change
+ }
+
+ #------------------------------------
+ # Section 2. Handle a sequenced token
+ # One of { [ ( ? : ) ] }
+ #------------------------------------
+ else {
+
+ # $level_i is the level we will store. Levels of braces are
+ # set so that the leading braces have a HIGHER level than their
+ # CONTENTS, which is convenient for indentation.
+ my $level_i = $level_in_tokenizer;
+
+ # $tok_i is the PRE-token. It only equals the token for symbols
+ my $tok_i = $rtokens->[$ii];
+
+ # $routput_indent_flag->[$ii] indicates that we need a change
# in level at a nested ternary, as follows
# 1 => at a nested ternary ?
# -1 => at a nested ternary :
# 0 => otherwise
- my $ternary_indentation_flag = $routput_indent_flag->[$i];
-
- #-------------------------------------------
- # Section 1: handle a level-increasing token
- #-------------------------------------------
- # set primary indentation levels based on structural braces
- # Note: these are set so that the leading braces have a HIGHER
- # level than their CONTENTS, which is convenient for indentation
- # Also, define continuation indentation for each token.
- if ( $type_i eq '{'
- || $type_i eq 'L'
- || $ternary_indentation_flag > 0 )
- {
-
- # if the difference between total nesting levels is not 1,
- # there are intervening non-structural nesting types between
- # this '{' and the previous unclosed '{'
- my $intervening_secondary_structure = 0;
- if ( @{$rslevel_stack} ) {
- $intervening_secondary_structure =
- $slevel_in_tokenizer - $rslevel_stack->[-1];
- }
- # save the current states
- push( @{$rslevel_stack}, 1 + $slevel_in_tokenizer );
- $level_in_tokenizer++;
+ #--------------------------------------------
+ # Section 2.1 Handle a level-increasing token
+ #--------------------------------------------
+ if ( $is_opening_or_ternary_type{$type_i} ) {
- if ( $level_in_tokenizer > $self->[_maximum_level_] ) {
- $self->[_maximum_level_] = $level_in_tokenizer;
- }
+ if ( $type_i eq '?' ) {
- if ($ternary_indentation_flag) {
+ if ( $routput_indent_flag->[$ii] > 0 ) {
+ $level_in_tokenizer++;
- # break BEFORE '?' in a nested ternary
- if ( $type_i eq '?' ) {
+ # break BEFORE '?' in a nested ternary
$level_i = $level_in_tokenizer;
- }
+ $nesting_block_string .= "$nesting_block_flag";
- $nesting_block_string .= "$nesting_block_flag";
- } ## end if ($ternary_indentation_flag)
- else {
-
- if ( $routput_block_type->[$i] ) {
- $nesting_block_flag = 1;
- $nesting_block_string .= '1';
- }
- else {
- $nesting_block_flag = 0;
- $nesting_block_string .= '0';
}
}
+ else {
- # we will use continuation indentation within containers
- # which are not blocks and not logical expressions
- my $bit = 0;
- if ( !$routput_block_type->[$i] ) {
+ $nesting_token_string .= $tok_i;
- # propagate flag down at nested open parens
- if ( $routput_container_type->[$i] eq '(' ) {
- $bit = 1 if $nesting_list_flag;
- }
+ if ( $type_i eq '{' || $type_i eq 'L' ) {
- # use list continuation if not a logical grouping
- # /^(if|elsif|unless|while|and|or|not|&&|!|\|\||for|foreach)$/
- else {
- $bit = 1
- unless
- $is_logical_container{ $routput_container_type
- ->[$i] };
+ $level_in_tokenizer++;
+
+ if ( $routput_block_type->[$ii] ) {
+ $nesting_block_flag = 1;
+ $nesting_block_string .= '1';
+ }
+ else {
+ $nesting_block_flag = 0;
+ $nesting_block_string .= '0';
+ }
}
}
- $nesting_list_string .= $bit;
- $nesting_list_flag = $bit;
-
- $ci_string_in_tokenizer .=
- ( $intervening_secondary_structure != 0 ) ? '1' : '0';
- $ci_string_sum =
- ( my $str = $ci_string_in_tokenizer ) =~ tr/1/0/;
- $continuation_string_in_tokenizer .=
- ( $in_statement_continuation > 0 ) ? '1' : '0';
-
- # Sometimes we want to give an opening brace
- # continuation indentation, and sometimes not. For code
- # blocks, we don't do it, so that the leading '{' gets
- # outdented, like this:
- #
- # if ( !$output_block_type[$i]
- # && ($in_statement_continuation) )
- # { <--outdented
- #
- # For other types, we will give them continuation
- # indentation. For example, here is how a list looks
- # with the opening paren indented:
- #
- # @LoL =
- # ( [ "fred", "barney" ], [ "george", "jane", "elroy" ],
- # [ "homer", "marge", "bart" ], );
- #
- # This looks best when 'ci' is one-half of the
- # indentation (i.e., 2 and 4)
-
- my $total_ci = $ci_string_sum;
- if (
- !$routput_block_type->[$i] # patch: skip for BLOCK
- && ($in_statement_continuation)
- && !( $ternary_indentation_flag && $type_i eq ':' )
- )
- {
- $total_ci += $in_statement_continuation
- unless (
- substr( $ci_string_in_tokenizer, -1 ) eq '1' );
- }
-
- $ci_string_i = $total_ci;
- $in_statement_continuation = 0;
- } ## end if ( $type_i eq '{' ||...})
-
- #-------------------------------------------
- # Section 2: handle a level-decreasing token
- #-------------------------------------------
- elsif ($type_i eq '}'
- || $type_i eq 'R'
- || $ternary_indentation_flag < 0 )
- {
-
- # only a nesting error in the script would prevent
- # popping here
- if ( @{$rslevel_stack} > 1 ) { pop( @{$rslevel_stack} ); }
+ }
- $level_i = --$level_in_tokenizer;
+ #---------------------------------------------
+ # Section 2.2. Handle a level-decreasing token
+ #---------------------------------------------
+ elsif ( $is_closing_or_ternary_type{$type_i} ) {
- if ( $level_in_tokenizer < 0 ) {
- unless ( $self->[_saw_negative_indentation_] ) {
- $self->[_saw_negative_indentation_] = 1;
- warning("Starting negative indentation\n");
+ if ( $type_i ne ':' ) {
+ my $char = chop $nesting_token_string;
+ if ( $char ne $matching_start_token{$tok_i} ) {
+ $nesting_token_string .= $char . $tok_i;
}
}
- # restore previous level values
- if ( length($nesting_block_string) > 1 )
- { # true for valid script
- chop $nesting_block_string;
- $nesting_block_flag =
- substr( $nesting_block_string, -1 ) eq '1';
- chop $nesting_list_string;
- $nesting_list_flag =
- substr( $nesting_list_string, -1 ) eq '1';
-
- chop $ci_string_in_tokenizer;
- $ci_string_sum =
- ( my $str = $ci_string_in_tokenizer ) =~ tr/1/0/;
-
- $in_statement_continuation =
- chop $continuation_string_in_tokenizer;
-
- # zero continuation flag at terminal BLOCK '}' which
- # ends a statement.
- my $block_type_i = $routput_block_type->[$i];
- if ($block_type_i) {
-
- # ...These include non-anonymous subs
- # note: could be sub ::abc { or sub 'abc
- if ( substr( $block_type_i, 0, 3 ) eq 'sub'
- && $block_type_i =~ m/^sub\s*/gc )
- {
-
- # note: older versions of perl require the /gc
- # modifier here or else the \G does not work.
- $in_statement_continuation = 0
- if ( $block_type_i =~ /\G('|::|\w)/gc );
- }
+ if (
+ $type_i eq '}'
+ || $type_i eq 'R'
- # ...and include all block types except user subs
- # with block prototypes and these:
- # (sort|grep|map|do|eval)
- elsif (
- $is_zero_continuation_block_type{$block_type_i}
- )
- {
- $in_statement_continuation = 0;
- }
+ # only the second and higher ? : have levels
+ || $type_i eq ':' && $routput_indent_flag->[$ii] < 0
+ )
+ {
- # ..but these are not terminal types:
- # /^(sort|grep|map|do|eval)$/ )
- elsif ($is_sort_map_grep_eval_do{$block_type_i}
- || $is_grep_alias{$block_type_i} )
- {
- }
+ $level_i = --$level_in_tokenizer;
- # ..and a block introduced by a label
- # /^\w+\s*:$/gc ) {
- elsif ( $block_type_i =~ /:$/ ) {
- $in_statement_continuation = 0;
+ if ( $level_in_tokenizer < 0 ) {
+ if ( !$self->[_saw_negative_indentation_] ) {
+ $self->[_saw_negative_indentation_] = 1;
+ $self->warning(
+ "Starting negative indentation\n");
}
+ }
- # user function with block prototype
- else {
- $in_statement_continuation = 0;
- }
- } ## end if ($block_type_i)
-
- # If we are in a list, then
- # we must set continuation indentation at the closing
- # paren of something like this (paren after $check):
- # assert(
- # __LINE__,
- # ( not defined $check )
- # or ref $check
- # or $check eq "new"
- # or $check eq "old",
- # );
- elsif ( $tok_i eq ')' ) {
- $in_statement_continuation = 1
- if (
- $is_list_end_type{
- $routput_container_type->[$i]
- }
- );
- ##if $routput_container_type->[$i] =~ /^[;,\{\}]$/;
+ # restore previous level values
+ if ( length($nesting_block_string) > 1 )
+ { # true for valid script
+ chop $nesting_block_string;
+ $nesting_block_flag =
+ substr( $nesting_block_string, -1 ) eq '1';
}
- } ## end if ( length($nesting_block_string...))
- $ci_string_i = $ci_string_sum + $in_statement_continuation;
- } ## end elsif ( $type_i eq '}' ||...{)
+ }
+ }
- #-----------------------------------------
- # Section 3: handle a constant level token
- #-----------------------------------------
+ #-----------------------------------------------------
+ # Section 2.3. Unexpected sequenced token type - error
+ #-----------------------------------------------------
else {
- # zero the continuation indentation at certain tokens so
- # that they will be at the same level as its container. For
- # commas, this simplifies the -lp indentation logic, which
- # counts commas. For ?: it makes them stand out.
- if (
- $nesting_list_flag
- ## $type_i =~ /^[,\?\:]$/
- && $is_comma_question_colon{$type_i}
- )
- {
- $in_statement_continuation = 0;
- }
+ # The tokenizer should only be assigning sequence numbers
+ # to types { [ ( ? ) ] } :
+ DEVEL_MODE && $self->Fault(<<EOM);
+unexpected sequence number on token type $type_i with pre-tok=$tok_i
+EOM
+ }
- # Be sure binary operators get continuation indentation.
- # Note: the check on $nesting_block_flag is only needed
- # to add ci to binary operators following a 'try' block,
- # or similar extended syntax block operator (see c158).
- if (
- !$in_statement_continuation
- && ( $nesting_block_flag || $nesting_list_flag )
- && ( $type_i eq 'k' && $is_binary_keyword{$tok_i}
- || $is_binary_type{$type_i} )
- )
- {
- $in_statement_continuation = 1;
- }
+ #------------------------------------------------
+ # Section 2.4. Store values for a sequenced token
+ #------------------------------------------------
- # continuation indentation is sum of any open ci from
- # previous levels plus the current level
- $ci_string_i = $ci_string_sum + $in_statement_continuation;
+ # The starting nesting block string, which is used in any .LOG
+ # output, should include the first token of the line
+ if ( !@output_levels ) {
+ $nesting_block_string_0 = $nesting_block_string;
+ }
- # update continuation flag ...
+ # Store values for a sequenced token
+ push( @output_levels, $level_i );
+ push( @output_block_type, $routput_block_type->[$ii] );
+ push( @output_type_sequence, $routput_type_sequence->[$ii] );
+ push( @output_token_type, $type_i );
- # if we are in a BLOCK
- if ($nesting_block_flag) {
+ }
+ } ## End loop to over tokens
- # the next token after a ';' and label starts a new stmt
- if ( $type_i eq ';' || $type_i eq 'J' ) {
- $in_statement_continuation = 0;
- }
+ #---------------------
+ # Post-loop operations
+ #---------------------
- # otherwise, we are continuing the current statement
- else {
- $in_statement_continuation = 1;
- }
- }
+ $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string_0;
- # if we are not in a BLOCK..
- else {
+ # Form and store the tokens
+ if (@output_levels) {
- # do not use continuation indentation if not list
- # environment (could be within if/elsif clause)
- if ( !$nesting_list_flag ) {
- $in_statement_continuation = 0;
- }
+ my $im = shift @{$routput_token_list};
+ my $offset = $rtoken_map->[$im];
+ foreach my $ii ( @{$routput_token_list} ) {
+ my $numc = $rtoken_map->[$ii] - $offset;
+ push( @output_tokens, substr( $input_line, $offset, $numc ) );
+ $offset += $numc;
- # otherwise, the token after a ',' starts a new term
+ # programming note: it seems most efficient to 'next' out of
+ # a critical loop like this as early as possible. So instead
+ # of 'if ( DEVEL_MODE && $numc < 0 )' we write:
+ next unless DEVEL_MODE;
+ next if ( $numc > 0 );
- # Patch FOR RT#99961; no continuation after a ';'
- # This is needed because perltidy currently marks
- # a block preceded by a type character like % or @
- # as a non block, to simplify formatting. But these
- # are actually blocks and can have semicolons.
- # See code_block_type() and is_non_structural_brace().
- elsif ( $type_i eq ',' || $type_i eq ';' ) {
- $in_statement_continuation = 0;
- }
-
- # otherwise, we are continuing the current term
- else {
- $in_statement_continuation = 1;
- }
- } ## end else [ if ($nesting_block_flag)]
+ # Should not happen unless @{$rtoken_map} is corrupted
+ $self->Fault(
+ "number of characters is '$numc' but should be >0\n");
+ }
- } ## end else [ if ( $type_i eq '{' ||...})]
+ # Form and store the final token of this line
+ my $numc = length($input_line) - $offset;
+ push( @output_tokens, substr( $input_line, $offset, $numc ) );
- #-------------------------------------------
- # Section 4: operations common to all levels
- #-------------------------------------------
+ if (DEVEL_MODE) {
+ if ( $numc <= 0 ) {
- # set secondary nesting levels based on all containment token
- # types Note: these are set so that the nesting depth is the
- # depth of the PREVIOUS TOKEN, which is convenient for setting
- # the strength of token bonds
+ # check '$rtoken_map' and '$routput_token_list'
+ $self->Fault(
+ "Number of Characters is '$numc' but should be >0\n");
+ }
- # /^[L\{\(\[]$/
- if ( $is_opening_type{$type_i} ) {
- $slevel_in_tokenizer++;
- $nesting_token_string .= $tok_i;
- $nesting_type_string .= $type_i;
+ # Make sure we didn't gain or lose any characters
+ my $test_line = join EMPTY_STRING, @output_tokens;
+ if ( $test_line ne $input_line ) {
+ my $len_input = length($input_line);
+ my $len_test = length($test_line);
+
+ # check '$rtoken_map' and '$routput_token_list'
+ $self->Fault(<<EOM);
+Reconstruted line difers from input; input_length=$len_input test_length=$len_test
+input:'$input_line'
+test :'$test_line'
+EOM
}
+ }
+ }
+
+ # Wrap up this line of tokens for shipping to the Formatter
+ $line_of_tokens->{_rtoken_type} = \@output_token_type;
+ $line_of_tokens->{_rtokens} = \@output_tokens;
+ $line_of_tokens->{_rblock_type} = \@output_block_type;
+ $line_of_tokens->{_rtype_sequence} = \@output_type_sequence;
+ $line_of_tokens->{_rlevels} = \@output_levels;
+
+ #-----------------------------------------------------------------
+ # Compare input indentation with computed levels at closing braces
+ #-----------------------------------------------------------------
+ # This may provide a useful hint for error location if the file
+ # is not balanced in braces. Closing braces are used because they
+ # have a well-defined indentation and can be processed efficiently.
+ if ( $output_tokens[0] eq '}' ) {
+
+ my $blk = $output_block_type[0];
+ if (
+ (
+ # builtin block types without continuation indentation
+ $is_zero_continuation_block_type{$blk}
- # /^[R\}\)\]]$/
- elsif ( $is_closing_type{$type_i} ) {
- $slevel_in_tokenizer--;
- my $char = chop $nesting_token_string;
+ # or a named sub, but skip sub aliases for efficiency,
+ # since this is just for diagnostic info
+ || substr( $blk, 0, 4 ) eq 'sub '
+ )
- if ( $char ne $matching_start_token{$tok_i} ) {
- $nesting_token_string .= $char . $tok_i;
- $nesting_type_string .= $type_i;
- }
- else {
- chop $nesting_type_string;
- }
- }
+ # and we are not in format skipping
+ && !$self->[_in_format_skipping_]
+ )
+ {
- # apply token type patch:
- # - output anonymous 'sub' as keyword (type 'k')
- # - output __END__, __DATA__, and format as type 'k' instead
- # of ';' to make html colors correct, etc.
- # The following hash tests are equivalent to these older tests:
- # if ( $type_i eq 't' && $is_sub{$tok_i} ) { $fix_type = 'k' }
- # if ( $type_i eq ';' && $tok_i =~ /\w/ ) { $fix_type = 'k' }
- if ( $is_END_DATA_format_sub{$tok_i}
- && $is_semicolon_or_t{$type_i} )
+ # subtract 1 space for newline in untrimmed line
+ my $untrimmed_input_line = $line_of_tokens->{_line_text};
+ my $space_count =
+ length($untrimmed_input_line) - length($input_line) - 1;
+
+ # check for tabs
+ if ( $space_count
+ && ord( substr( $untrimmed_input_line, 0, 1 ) ) == ORD_TAB )
{
- $type_i = 'k';
+ if ( $untrimmed_input_line =~ /^(\t+)?(\s+)?/ ) {
+ if ($1) { $space_count += length($1) * $tabsize }
+ if ($2) { $space_count += length($2) }
+ }
}
- } ## end else [ if ( $type_i eq 'b' ||...)]
- #--------------------------------
- # Store the values for this token
- #--------------------------------
- push( @ci_string, $ci_string_i );
- push( @levels, $level_i );
- push( @block_type, $routput_block_type->[$i] );
- push( @type_sequence, $routput_type_sequence->[$i] );
- push( @token_type, $type_i );
+ # '$guess' = the level according to indentation
+ my $guess = int( $space_count / $rOpts_indent_columns );
- # Form and store the PREVIOUS token
- if ( defined($rtoken_map_im) ) {
- my $numc =
- $rtoken_map->[$i] - $rtoken_map_im; # how many characters
+ # subtract 1 level from guess for --indent-closing-brace
+ $guess -= 1 if ($rOpts_indent_closing_brace);
- if ( $numc > 0 ) {
- push( @tokens,
- substr( $input_line, $rtoken_map_im, $numc ) );
- }
- else {
+ # subtract 1 from $level for each non-indenting brace level
+ my $adjust = @{ $self->[_rnon_indenting_brace_stack_] };
- # Should not happen unless @{$rtoken_map} is corrupted
- DEVEL_MODE
- && Fault(
- "number of characters is '$numc' but should be >0\n");
- }
- }
+ my $level = $output_levels[0];
- # or grab some values for the leading token (needed for log output)
- else {
- $line_of_tokens->{_nesting_blocks_0} = $nesting_block_string;
- }
+ # find the difference between expected and indentation guess
+ my $level_diff = $level - $adjust - $guess;
- $rtoken_map_im = $rtoken_map->[$i];
- } ## end foreach my $i ( @{$routput_token_list...})
+ my $rhash = $self->[_rclosing_brace_indentation_hash_];
- #------------------------
- # End loop to over tokens
- #------------------------
+ # results are only valid if we guess correctly at the
+ # first spaced brace
+ if ( $space_count && !defined( $rhash->{valid} ) ) {
+ $rhash->{valid} = !$level_diff;
+ }
- # Form and store the final token of this line
- if ( defined($rtoken_map_im) ) {
- my $numc = length($input_line) - $rtoken_map_im;
- if ( $numc > 0 ) {
- push( @tokens, substr( $input_line, $rtoken_map_im, $numc ) );
- }
- else {
+ # save the result
+ my $rhistory_line_number = $rhash->{rhistory_line_number};
+ my $rhistory_level_diff = $rhash->{rhistory_level_diff};
+ my $rhistory_anchor_point = $rhash->{rhistory_anchor_point};
+
+ if ( $rhistory_level_diff->[-1] != $level_diff ) {
+
+ # Patch for non-indenting-braces: if we guess zero and
+ # match before all non-indenting braces have been found,
+ # it means that we would need negative indentation to
+ # match if/when the brace is found. So we have a problem
+ # from here on. We indicate this with a value 2 instead
+ # of 1 as a signal to stop outputting the table here.
+ my $anchor = 1;
+ if ( $guess == 0 && $adjust > 0 ) { $anchor = 2 }
+
+ # add an anchor point
+ push @{$rhistory_level_diff}, $level_diff;
+ push @{$rhistory_line_number}, $input_line_number;
+ push @{$rhistory_anchor_point}, $anchor;
+ }
+ else {
- # Should not happen unless @{$rtoken_map} is corrupted
- DEVEL_MODE
- && Fault(
- "Number of Characters is '$numc' but should be >0\n");
+ # add a movable point following an anchor point
+ if ( $rhistory_anchor_point->[-1] ) {
+ push @{$rhistory_level_diff}, $level_diff;
+ push @{$rhistory_line_number}, $input_line_number;
+ push @{$rhistory_anchor_point}, 0;
+ }
+
+ # extend a movable point
+ else {
+ $rhistory_line_number->[-1] = $input_line_number;
+ }
+ }
}
}
- #----------------------------------------------------------
- # Wrap up this line of tokens for shipping to the Formatter
- #----------------------------------------------------------
- $line_of_tokens->{_rtoken_type} = \@token_type;
- $line_of_tokens->{_rtokens} = \@tokens;
- $line_of_tokens->{_rblock_type} = \@block_type;
- $line_of_tokens->{_rtype_sequence} = \@type_sequence;
- $line_of_tokens->{_rlevels} = \@levels;
- $line_of_tokens->{_rci_levels} = \@ci_string;
-
return;
} ## end sub tokenizer_wrapup_line
+
} ## end tokenize_this_line
#######################################################################
# Tokenizer routines which assist in identifying token types
#######################################################################
-# hash lookup table of operator expected values
-my %op_expected_table;
+# Define Global '%op_expected_table'
+# = hash table of operator expected values based on last nonblank token
# exceptions to perl's weird parsing rules after type 'Z'
my %is_weird_parsing_rule_exception;
# Always expecting TERM following these types:
# note: this is identical to '@value_requestor_type' defined later.
- my @q = qw(
- ; ! + x & ? F J - p / Y : % f U ~ A G j L * . | ^ < = [ m { \ > t
+ # Fix for c250: add new type 'P' for package (expecting VERSION or {}
+ # after package NAMESPACE, so expecting TERM)
+ # Fix for c250: add new type 'S' for sub (not expecting operator)
+ my @q = qw#
+ ; ! + x & ? F J - p / Y : % f U ~ A G j L P S * . | ^ < = [ m { \ > t
|| >= != mm *= => .. !~ == && |= .= pp -= =~ += <= %= ^= x= ~~ ** << /=
&= // >> ~. &. |. ^.
... **= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~
- );
+ #;
push @q, ',';
push @q, '('; # for completeness, not currently a token type
push @q, '->'; # was previously in UNKNOWN
- @{op_expected_table}{@q} = (TERM) x scalar(@q);
+ @op_expected_table{@q} = (TERM) x scalar(@q);
- # Always UNKNOWN following these types;
- # previously had '->' in this list for c030
- @q = qw( w );
- @{op_expected_table}{@q} = (UNKNOWN) x scalar(@q);
+ # No UNKNOWN table types:
+ # removed '->' for c030, now always TERM
+ # removed 'w' for c392 to allow use of 'function_count' info in the sub
# Always expecting OPERATOR ...
# 'n' and 'v' are currently excluded because they might be VERSION numbers
# 'i' is currently excluded because it might be a package
# 'q' is currently excluded because it might be a prototype
# Fix for c030: removed '->' from this list:
- @q = qw( -- C h R ++ ] Q <> ); ## n v q i );
+ # Fix for c250: added 'i' because new type 'P' was added
+ @q = qw( -- C h R ++ ] Q <> i );
push @q, ')';
- @{op_expected_table}{@q} = (OPERATOR) x scalar(@q);
+ @op_expected_table{@q} = (OPERATOR) x scalar(@q);
# Fix for git #62: added '*' and '%'
@q = qw( < ? * % );
- @{is_weird_parsing_rule_exception}{@q} = (1) x scalar(@q);
+ @is_weird_parsing_rule_exception{@q} = (1) x scalar(@q);
@q = qw<) $>;
- @{is_paren_dollar}{@q} = (1) x scalar(@q);
+ @is_paren_dollar{@q} = (1) x scalar(@q);
@q = qw( n v );
- @{is_n_v}{@q} = (1) x scalar(@q);
+ @is_n_v{@q} = (1) x scalar(@q);
} ## end BEGIN
sub operator_expected {
+ my ( $self, $tok, $next_type, $blank_after_Z ) = @_;
+
# Returns a parameter indicating what types of tokens can occur next
# Call format:
- # $op_expected = operator_expected( [ $prev_type, $tok, $next_type ] );
+ # $op_expected =
+ # $self->operator_expected( $tok, $next_type, $blank_after_Z );
# where
- # $prev_type is the type of the previous token (blank or not)
# $tok is the current token
# $next_type is the type of the next token (blank or not)
+ # $blank_after_Z = flag for guessing after a type 'Z':
+ # true if $tok follows type 'Z' with intermediate blank
+ # false if $tok follows type 'Z' with no intermediate blank
+ # ignored if $tok does not follow type 'Z'
# Many perl symbols have two or more meanings. For example, '<<'
# can be a shift operator or a here-doc operator. The
# the 'operator_expected' value by a simple hash lookup. If there are
# exceptions, that is an indication that a new type is needed.
- my ($rarg) = @_;
-
- #-------------
- # Table lookup
- #-------------
+ #--------------------------------------------
+ # Section 1: Table lookup will get most cases
+ #--------------------------------------------
- # Many types are can be obtained by a table lookup given the previous type.
- # This typically handles half or more of the calls.
+ # Many types are can be obtained by a table lookup. This typically handles
+ # more than half of the calls. For speed, the caller may try table lookup
+ # first before calling this sub.
my $op_expected = $op_expected_table{$last_nonblank_type};
if ( defined($op_expected) ) {
DEBUG_OPERATOR_EXPECTED
- && print STDOUT
+ && print {*STDOUT}
"OPERATOR_EXPECTED: Table Lookup; returns $op_expected for last type $last_nonblank_type token $last_nonblank_token\n";
return $op_expected;
}
- #---------------------
- # Handle special cases
- #---------------------
+ DEBUG_OPERATOR_EXPECTED
+ && print {*STDOUT}
+"OPERATOR_EXPECTED: in hardwired table for last type $last_nonblank_type token $last_nonblank_token\n";
- $op_expected = UNKNOWN;
- my ( $prev_type, $tok, $next_type ) = @{$rarg};
+ #---------------------------------------------
+ # Section 2: Handle special cases if necessary
+ #---------------------------------------------
# Types 'k', '}' and 'Z' depend on context
- # Types 'i', 'n', 'v', 'q' currently also temporarily depend on context.
+ # Types 'n', 'v', 'q' also depend on context.
# identifier...
- if ( $last_nonblank_type eq 'i' ) {
- $op_expected = OPERATOR;
-
- # TODO: it would be cleaner to make this a special type
- # expecting VERSION or {} after package NAMESPACE;
- # maybe mark these words as type 'Y'?
- if ( substr( $last_nonblank_token, 0, 7 ) eq 'package'
- && $statement_type =~ /^package\b/
- && $last_nonblank_token =~ /^package\b/ )
- {
- $op_expected = TERM;
- }
- }
+ # Fix for c250: removed coding for type 'i' because 'i' and new type 'P'
+ # are now done by hash table lookup
- # keyword...
- elsif ( $last_nonblank_type eq 'k' ) {
- $op_expected = TERM;
- if ( $expecting_operator_token{$last_nonblank_token} ) {
- $op_expected = OPERATOR;
- }
- elsif ( $expecting_term_token{$last_nonblank_token} ) {
+ #--------------------
+ # Section 2A: keyword
+ #--------------------
+ if ( $last_nonblank_type eq 'k' ) {
+
+ # keywords expecting TERM:
+ if ( $expecting_term_token{$last_nonblank_token} ) {
# Exceptions from TERM:
$last_nonblank_token}
)
{
- $op_expected = OPERATOR;
+ return OPERATOR;
}
# Patch to allow a ? following 'split' to be a deprecated pattern
# from the list
# %is_keyword_rejecting_question_as_pattern_delimiter. This patch
# will force perltidy to guess.
- elsif ($tok eq '?'
+ if ( $tok eq '?'
&& $last_nonblank_token eq 'split' )
{
- $op_expected = UNKNOWN;
+ return UNKNOWN;
}
+
+ return TERM;
+ }
+
+ # keywords expecting OPERATOR:
+ if ( $expecting_operator_token{$last_nonblank_token} ) {
+ return OPERATOR;
}
+
+ return TERM;
+
} ## end type 'k'
- # closing container token...
+ #------------------------------------
+ # Section 2B: Closing container token
+ #------------------------------------
# Note that the actual token for type '}' may also be a ')'.
# $a = do { BLOCK } / 2;
# the $last_nonblank_token is 'do' when $last_nonblank_type eq '}'.
- elsif ( $last_nonblank_type eq '}' ) {
- $op_expected = UNKNOWN;
+ if ( $last_nonblank_type eq '}' ) {
# handle something after 'do' and 'eval'
if ( $is_block_operator{$last_nonblank_token} ) {
# something like $a = do { BLOCK } / 2;
- $op_expected = OPERATOR; # block mode following }
+ return OPERATOR; # block mode following }
}
# $last_nonblank_token =~ /^(\)|\$|\-\>)/
- elsif ( $is_paren_dollar{ substr( $last_nonblank_token, 0, 1 ) }
+ if ( $is_paren_dollar{ substr( $last_nonblank_token, 0, 1 ) }
|| substr( $last_nonblank_token, 0, 2 ) eq '->' )
{
- $op_expected = OPERATOR;
- if ( $last_nonblank_token eq '$' ) { $op_expected = UNKNOWN }
+ if ( $last_nonblank_token eq '$' ) { return UNKNOWN }
+ return OPERATOR;
}
# Check for smartmatch operator before preceding brace or square
#
# qr/3/ ~~ ['1234'] ? 1 : 0;
# map { $_ ~~ [ '0', '1' ] ? 'x' : 'o' } @a;
- elsif ( $last_nonblank_token eq '~~' ) {
- $op_expected = OPERATOR;
+ if ( $last_nonblank_token eq '~~' ) {
+ return OPERATOR;
}
# A right brace here indicates the end of a simple block. All
# block operator keywords have been given those keywords as
# "last_nonblank_token" and caught above. (This statement is order
# dependent, and must come after checking $last_nonblank_token).
- else {
- # patch for dor.t (defined or).
- if ( $tok eq '/'
- && $next_type eq '/'
- && $last_nonblank_token eq ']' )
- {
- $op_expected = OPERATOR;
- }
+ # patch for dor.t (defined or).
+ if ( $tok eq '/'
+ && $next_type eq '/'
+ && $last_nonblank_token eq ']' )
+ {
+ return OPERATOR;
+ }
- # Patch for RT #116344: misparse a ternary operator after an
- # anonymous hash, like this:
- # return ref {} ? 1 : 0;
- # The right brace should really be marked type 'R' in this case,
- # and it is safest to return an UNKNOWN here. Expecting a TERM will
- # cause the '?' to always be interpreted as a pattern delimiter
- # rather than introducing a ternary operator.
- elsif ( $tok eq '?' ) {
- $op_expected = UNKNOWN;
- }
- else {
- $op_expected = TERM;
- }
+ # Patch for RT #116344: misparse a ternary operator after an
+ # anonymous hash, like this:
+ # return ref {} ? 1 : 0;
+ # The right brace should really be marked type 'R' in this case,
+ # and it is safest to return an UNKNOWN here. Expecting a TERM will
+ # cause the '?' to always be interpreted as a pattern delimiter
+ # rather than introducing a ternary operator.
+ if ( $tok eq '?' ) {
+ return UNKNOWN;
}
+ return TERM;
+
} ## end type '}'
- # number or v-string...
+ #-------------------------------
+ # Section 2C: number or v-string
+ #-------------------------------
# An exception is for VERSION numbers a 'use' statement. It has the format
# use Module VERSION LIST
# We could avoid this exception by writing a special sub to parse 'use'
# statements and perhaps mark these numbers with a new type V (for VERSION)
- ##elsif ( $last_nonblank_type =~ /^[nv]$/ ) {
- elsif ( $is_n_v{$last_nonblank_type} ) {
- $op_expected = OPERATOR;
+ if ( $is_n_v{$last_nonblank_type} ) {
if ( $statement_type eq 'use' ) {
- $op_expected = UNKNOWN;
+ return UNKNOWN;
}
+ return OPERATOR;
}
- # quote...
+ #---------------------
+ # Section 2D: qw quote
+ #---------------------
# TODO: labeled prototype words would better be given type 'A' or maybe
# 'J'; not 'q'; or maybe mark as type 'Y'?
- elsif ( $last_nonblank_type eq 'q' ) {
- $op_expected = OPERATOR;
+ if ( $last_nonblank_type eq 'q' ) {
if ( $last_nonblank_token eq 'prototype' ) {
- $op_expected = TERM;
+ return TERM;
}
# update for --use-feature=class (rt145706):
# Look for class VERSION after possible attribute, as in
# class Example::Subclass : isa(Example::Base) 1.345 { ... }
- elsif ( $statement_type =~ /^package\b/ ) {
- $op_expected = TERM;
+ if ( $statement_type =~ /^package\b/ ) {
+ return TERM;
}
+
+ # everything else
+ return OPERATOR;
}
- # file handle or similar
- elsif ( $last_nonblank_type eq 'Z' ) {
+ # Section 2E: bareword
+ if ( $last_nonblank_type eq 'w' ) {
- $op_expected = UNKNOWN;
+ # see if this has been seen in the role of a function taking args
+ my $rinfo = $self->[_rbareword_info_]->{$current_package};
+ if ($rinfo) {
+ $rinfo = $rinfo->{$last_nonblank_token};
+ if ($rinfo) {
+ my $function_count = $rinfo->{function_count};
+ if ( $function_count && $function_count > 0 ) { return TERM }
+ }
+ }
+ return UNKNOWN;
+ }
+
+ #-----------------------------------
+ # Section 2F: file handle or similar
+ #-----------------------------------
+ if ( $last_nonblank_type eq 'Z' ) {
# angle.t
if ( $last_nonblank_token =~ /^\w/ ) {
- $op_expected = UNKNOWN;
+ return UNKNOWN;
}
# Exception to weird parsing rules for 'x(' ... see case b1205:
# In something like 'print $vv x(...' the x is an operator;
# Likewise in 'print $vv x$ww' the x is an operator (case b1207)
# otherwise x follows the weird parsing rules.
- elsif ( $tok eq 'x' && $next_type =~ /^[\(\$\@\%]$/ ) {
- $op_expected = OPERATOR;
+ if ( $tok eq 'x' && $next_type =~ /^[\(\$\@\%]$/ ) {
+ return OPERATOR;
}
# The 'weird parsing rules' of next section do not work for '<' and '?'
# It is best to mark them as unknown. Test case:
# print $fh <DATA>;
- elsif ( $is_weird_parsing_rule_exception{$tok} ) {
- $op_expected = UNKNOWN;
+ if ( $is_weird_parsing_rule_exception{$tok} ) {
+ return UNKNOWN;
}
# For possible file handle like "$a", Perl uses weird parsing rules.
# print $fh &xsi_protos(@mods);
# my $x = new $CompressClass *FH;
# print $OUT +( $count % 15 ? ", " : "\n\t" );
- elsif ($prev_type eq 'b'
+ if ( $blank_after_Z
&& $next_type ne 'b' )
{
- $op_expected = TERM;
+ return TERM;
}
# Note that '?' and '<' have been moved above
# ( $tok =~ /^([x\/\+\-\*\%\&\.\?\<]|\>\>)$/ ) {
- elsif ( $tok =~ /^([x\/\+\-\*\%\&\.]|\>\>)$/ ) {
+ if ( $tok =~ /^([x\/\+\-\*\%\&\.]|\>\>)$/ ) {
# Do not complain in 'use' statements, which have special syntax.
# For example, from RT#130344:
# use lib $FindBin::Bin . '/lib';
if ( $statement_type ne 'use' ) {
- complain(
+ $self->complain(
"operator in possible indirect object location not recommended\n"
);
}
- $op_expected = OPERATOR;
+ return OPERATOR;
}
- }
- # anything else...
- else {
- $op_expected = UNKNOWN;
- }
+ # all other cases
- DEBUG_OPERATOR_EXPECTED
- && print STDOUT
-"OPERATOR_EXPECTED: returns $op_expected for last type $last_nonblank_type token $last_nonblank_token\n";
+ return UNKNOWN;
+ }
- return $op_expected;
+ #--------------------------
+ # Section 2F: anything else
+ #--------------------------
+ return UNKNOWN;
} ## end sub operator_expected
sub new_statement_ok {
- # return true if the current token can start a new statement
- # USES GLOBAL VARIABLES: $last_nonblank_type
-
- return label_ok() # a label would be ok here
-
- || $last_nonblank_type eq 'J'; # or we follow a label
-
-} ## end sub new_statement_ok
-
-sub label_ok {
+ # Returns:
+ # true if a new statement can begin here
+ # false otherwise
- # Decide if a bare word followed by a colon here is a label
# USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
- # $brace_depth, @brace_type
+ # $brace_depth, $rbrace_type
+
+ # Uses:
+ # - See if a 'class' statement can occur here
+ # - See if a keyword begins at a new statement; i.e. is an 'if' a
+ # block if or a trailing if? Also see if 'format' starts a statement.
+ # - Decide if a ':' is part of a statement label (not a ternary)
+
+ # Curly braces are tricky because some small blocks do not get marked as
+ # blocks..
+
+ # if it follows an opening curly brace..
+ if ( $last_nonblank_token eq '{' ) {
+
+ # The safe thing is to return true in all cases because:
+ # - a ternary ':' cannot occur here
+ # - an 'if' here, for example, cannot be a trailing if
+ # See test case c231 for an example.
+ # This works but could be improved, if necessary, by returning
+ # 'false' at obvious non-blocks.
+ return 1;
+ }
- # if it follows an opening or closing code block curly brace..
- if ( ( $last_nonblank_token eq '{' || $last_nonblank_token eq '}' )
+ # if it follows a closing code block curly brace..
+ elsif ($last_nonblank_token eq '}'
&& $last_nonblank_type eq $last_nonblank_token )
{
- # it is a label if and only if the curly encloses a code block
- return $brace_type[$brace_depth];
+ # a new statement can follow certain closing block braces ...
+ # FIXME: The following has worked well but returns true in some cases
+ # where it really should not. We could fix this by either excluding
+ # certain blocks, like sort/map/grep/eval/asub or by just including
+ # certain blocks.
+ return $rbrace_type->[$brace_depth];
}
# otherwise, it is a label if and only if it follows a ';' (real or fake)
else {
return ( $last_nonblank_type eq ';' || $last_nonblank_type eq 'J' );
}
-} ## end sub label_ok
+} ## end sub new_statement_ok
sub code_block_type {
+ my ( $self, $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
+
# Decide if this is a block of code, and its type.
# Must be called only when $type = $token = '{'
# The problem is to distinguish between the start of a block of code
# to indicate the type of code block. (For example, 'last_nonblank_token'
# might be 'if' for an if block, 'else' for an else block, etc).
# USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
- # $last_nonblank_block_type, $brace_depth, @brace_type
+ # $last_nonblank_block_type, $brace_depth, $rbrace_type
# handle case of multiple '{'s
# print "BLOCK_TYPE EXAMINING: type=$last_nonblank_type tok=$last_nonblank_token\n";
- my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
if ( $last_nonblank_token eq '{'
&& $last_nonblank_type eq $last_nonblank_token )
{
# opening brace where a statement may appear is probably
# a code block but might be and anonymous hash reference
- if ( $brace_type[$brace_depth] ) {
- return decide_if_code_block( $i, $rtokens, $rtoken_type,
+ if ( $rbrace_type->[$brace_depth] ) {
+ return $self->decide_if_code_block( $i, $rtokens, $rtoken_type,
$max_token_index );
}
# an opening brace where a statement may appear is probably
# a code block but might be and anonymous hash reference
- return decide_if_code_block( $i, $rtokens, $rtoken_type,
+ return $self->decide_if_code_block( $i, $rtokens, $rtoken_type,
$max_token_index );
}
# a } { situation ...
# could be hash reference after code block..(blktype1.t)
if ($last_nonblank_block_type) {
- return decide_if_code_block( $i, $rtokens, $rtoken_type,
+ return $self->decide_if_code_block( $i, $rtokens, $rtoken_type,
$max_token_index );
}
return $last_nonblank_token;
}
-# otherwise, look at previous token. This must be a code block if
-# it follows any of these:
-# /^(BEGIN|END|CHECK|INIT|AUTOLOAD|DESTROY|UNITCHECK|continue|if|elsif|else|unless|do|while|until|eval|for|foreach|map|grep|sort)$/
+ # otherwise, see if a block must follow the previous token (such as 'if'):
elsif ($is_code_block_token{$last_nonblank_token}
|| $is_grep_alias{$last_nonblank_token} )
{
}
# or a sub or package BLOCK
- elsif ( ( $last_nonblank_type eq 'i' || $last_nonblank_type eq 't' )
- && $last_nonblank_token =~ /^(sub|package)\b/ )
+ # Fixed for c250 to include new package type 'P', and change 'i' to 'S'
+ elsif (
+ $last_nonblank_type eq 'P'
+ || $last_nonblank_type eq 'S'
+ || ( $last_nonblank_type eq 't'
+ && substr( $last_nonblank_token, 0, 3 ) eq 'sub' )
+ )
{
return $last_nonblank_token;
}
- # or a sub alias
- elsif (( $last_nonblank_type eq 'i' || $last_nonblank_type eq 't' )
- && ( $is_sub{$last_nonblank_token} ) )
- {
- return 'sub';
- }
-
elsif ( $statement_type =~ /^(sub|package)\b/ ) {
return $statement_type;
}
# This fixes b1022 b1025 b1027 b1028 b1029 b1030 b1031
return EMPTY_STRING if ( $statement_type eq 'use' );
- return decide_if_code_block( $i, $rtokens, $rtoken_type,
+ return $self->decide_if_code_block( $i, $rtokens, $rtoken_type,
$max_token_index );
}
# map( { foreach my $item ( '0', '1' ) { print $item} } qw(a b c) );
# Check for a code block within a parenthesized function call
elsif ( $last_nonblank_token eq '(' ) {
- my $paren_type = $paren_type[$paren_depth];
+ my $paren_type = $rparen_type->[$paren_depth];
# /^(map|grep|sort)$/
if ( $paren_type && $is_sort_map_grep{$paren_type} ) {
sub decide_if_code_block {
# USES GLOBAL VARIABLES: $last_nonblank_token
- my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
+ my ( $self, $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
# we are at a '{' where a statement may appear.
# We must decide if this brace starts an anonymous hash or a code
@pre_types = @{$rtoken_type}[ $i + 1 .. $max_token_index ];
@pre_tokens = @{$rtokens}[ $i + 1 .. $max_token_index ];
}
+
+ # Here 20 is arbitrary but generous, and prevents wasting lots of time
+ # in mangled files
my ( $rpre_tokens, $rpre_types ) =
- peek_ahead_for_n_nonblank_pre_tokens(20); # 20 is arbitrary but
- # generous, and prevents
- # wasting lots of
- # time in mangled files
+ $self->peek_ahead_for_n_nonblank_pre_tokens(20);
if ( defined($rpre_types) && @{$rpre_types} ) {
push @pre_types, @{$rpre_types};
push @pre_tokens, @{$rpre_tokens};
foreach my $k ( $j + 1 .. @pre_types - 2 ) {
if ( $pre_types[$k] eq $quote_mark ) {
$j = $k + 1;
- ##my $next = $pre_types[$j];
last;
}
}
elsif ( $pre_types[$j] eq '-' && $pre_types[ ++$j ] eq 'w' ) {
$j++;
}
+ else {
+ # none of the above
+ }
if ( $j > $jbeg ) {
$j++ if $pre_types[$j] eq 'b';
# it is a comma which is not a pattern delimiter except for qw
(
$pre_types[$j] eq ','
- ## !~ /^(s|m|y|tr|qr|q|qq|qx)$/
&& !$is_q_qq_qx_qr_s_y_tr_m{ $pre_tokens[$jbeg] }
)
sub report_unexpected {
# report unexpected token type and show where it is
- # USES GLOBAL VARIABLES: $tokenizer_self
- my ( $found, $expecting, $i_tok, $last_nonblank_i, $rpretoken_map,
- $rpretoken_type, $input_line )
- = @_;
-
- if ( ++$tokenizer_self->[_unexpected_error_count_] <= MAX_NAG_MESSAGES ) {
+ # USES GLOBAL VARIABLES: (none)
+ my ( $self, $rcall_hash ) = @_;
+
+ my $found = $rcall_hash->{found};
+ my $expecting = $rcall_hash->{expecting};
+ my $i_tok = $rcall_hash->{i_tok};
+ my $last_nonblank_i = $rcall_hash->{last_nonblank_i};
+ my $rpretoken_map = $rcall_hash->{rpretoken_map};
+ my $rpretoken_type = $rcall_hash->{rpretoken_type};
+ my $input_line = $rcall_hash->{input_line};
+
+ if ( ++$self->[_unexpected_error_count_] <= MAX_NAG_MESSAGES ) {
my $msg = "found $found where $expecting expected";
my $pos = $rpretoken_map->[$i_tok];
- interrupt_logfile();
- my $input_line_number = $tokenizer_self->[_last_line_number_];
+ $self->interrupt_logfile();
+ my $input_line_number = $self->[_last_line_number_];
my ( $offset, $numbered_line, $underline ) =
make_numbered_line( $input_line_number, $input_line, $pos );
$underline = write_on_underline( $underline, $pos - $offset, '^' );
$trailer = " (previous token underlined)";
}
$underline =~ s/\s+$//;
- warning( $numbered_line . "\n" );
- warning( $underline . "\n" );
- warning( $msg . $trailer . "\n" );
- resume_logfile();
+ $self->warning( $numbered_line . "\n" );
+ $self->warning( $underline . "\n" );
+ $self->warning( $msg . $trailer . "\n" );
+ $self->resume_logfile();
}
return;
} ## end sub report_unexpected
BEGIN {
my @q = qw< $ & % * @ ) >;
- @{is_sigil_or_paren}{@q} = (1) x scalar(@q);
+ @is_sigil_or_paren{@q} = (1) x scalar(@q);
- @q = qw(R ]);
- @{is_R_closing_sb}{@q} = (1) x scalar(@q);
+ @q = qw( R ] );
+ @is_R_closing_sb{@q} = (1) x scalar(@q);
} ## end BEGIN
sub is_non_structural_brace {
#
# The matrix
#
-# $depth_array[$a][$b][ $current_depth[$a] ] = $current_depth[$b];
+# $rdepth_array->[$a][$b][ $rcurrent_depth->[$a] ] = $rcurrent_depth->[$b];
#
# saves the nesting depth of brace type $b (where $b is either of the other
# nesting types) when brace type $a enters a new depth. When this depth
# way.
sub increase_nesting_depth {
- my ( $aa, $pos ) = @_;
+ my ( $self, $aa, $pos ) = @_;
- # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
- # @current_sequence_number, @depth_array, @starting_line_of_current_depth,
- # $statement_type
- $current_depth[$aa]++;
+ # Given:
+ # $aa = integer code of container type, 0-3
+ # $pos = position of character, for error message
+
+ # USES GLOBAL VARIABLES: $rcurrent_depth,
+ # $rcurrent_sequence_number, $rdepth_array,
+ # $rstarting_line_of_current_depth, $statement_type
+ my $cd_aa = ++$rcurrent_depth->[$aa];
$total_depth++;
- $total_depth[$aa][ $current_depth[$aa] ] = $total_depth;
- my $input_line_number = $tokenizer_self->[_last_line_number_];
- my $input_line = $tokenizer_self->[_line_of_text_];
+ $rtotal_depth->[$aa][$cd_aa] = $total_depth;
+ my $input_line_number = $self->[_last_line_number_];
+ my $input_line = $self->[_line_of_text_];
# Sequence numbers increment by number of items. This keeps
# a unique set of numbers but still allows the relative location
# make a new unique sequence number
my $seqno = $next_sequence_number++;
- $current_sequence_number[$aa][ $current_depth[$aa] ] = $seqno;
+ $rcurrent_sequence_number->[$aa][$cd_aa] = $seqno;
- $starting_line_of_current_depth[$aa][ $current_depth[$aa] ] =
+ $rstarting_line_of_current_depth->[$aa][$cd_aa] =
[ $input_line_number, $input_line, $pos ];
for my $bb ( 0 .. @closing_brace_names - 1 ) {
next if ( $bb == $aa );
- $depth_array[$aa][$bb][ $current_depth[$aa] ] = $current_depth[$bb];
+ $rdepth_array->[$aa][$bb][$cd_aa] = $rcurrent_depth->[$bb];
}
# set a flag for indenting a nested ternary statement
my $indent = 0;
if ( $aa == QUESTION_COLON ) {
- $nested_ternary_flag[ $current_depth[$aa] ] = 0;
- if ( $current_depth[$aa] > 1 ) {
- if ( $nested_ternary_flag[ $current_depth[$aa] - 1 ] == 0 ) {
- my $pdepth = $total_depth[$aa][ $current_depth[$aa] - 1 ];
+ $rnested_ternary_flag->[$cd_aa] = 0;
+ if ( $cd_aa > 1 ) {
+ if ( $rnested_ternary_flag->[ $cd_aa - 1 ] == 0 ) {
+ my $pdepth = $rtotal_depth->[$aa][ $cd_aa - 1 ];
if ( $pdepth == $total_depth - 1 ) {
$indent = 1;
- $nested_ternary_flag[ $current_depth[$aa] - 1 ] = -1;
+ $rnested_ternary_flag->[ $cd_aa - 1 ] = -1;
}
}
}
}
# Fix part #1 for git82: save last token type for propagation of type 'Z'
- $nested_statement_type[$aa][ $current_depth[$aa] ] =
+ $rnested_statement_type->[$aa][$cd_aa] =
[ $statement_type, $last_nonblank_type, $last_nonblank_token ];
$statement_type = EMPTY_STRING;
return ( $seqno, $indent );
sub is_balanced_closing_container {
+ my ($aa) = @_;
+
# Return true if a closing container can go here without error
# Return false if not
- my ($aa) = @_;
+ # Given:
+ # $aa = integer code of container type, 0-3
# cannot close if there was no opening
- return unless ( $current_depth[$aa] > 0 );
+ my $cd_aa = $rcurrent_depth->[$aa];
+ return if ( $cd_aa <= 0 );
# check that any other brace types $bb contained within would be balanced
for my $bb ( 0 .. @closing_brace_names - 1 ) {
next if ( $bb == $aa );
return
- unless ( $depth_array[$aa][$bb][ $current_depth[$aa] ] ==
- $current_depth[$bb] );
+ if ( $rdepth_array->[$aa][$bb][$cd_aa] != $rcurrent_depth->[$bb] );
}
# OK, everything will be balanced
sub decrease_nesting_depth {
- my ( $aa, $pos ) = @_;
+ my ( $self, $aa, $pos ) = @_;
- # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
- # @current_sequence_number, @depth_array, @starting_line_of_current_depth
+ # Given:
+ # $aa = integer code of container type, 0-3
+ # $pos = position of character, for error message
+
+ # USES GLOBAL VARIABLES: $rcurrent_depth,
+ # $rcurrent_sequence_number, $rdepth_array, $rstarting_line_of_current_depth
# $statement_type
my $seqno = 0;
- my $input_line_number = $tokenizer_self->[_last_line_number_];
- my $input_line = $tokenizer_self->[_line_of_text_];
+ my $input_line_number = $self->[_last_line_number_];
+ my $input_line = $self->[_line_of_text_];
my $outdent = 0;
$total_depth--;
- if ( $current_depth[$aa] > 0 ) {
+ my $cd_aa = $rcurrent_depth->[$aa];
+ if ( $cd_aa > 0 ) {
# set a flag for un-indenting after seeing a nested ternary statement
- $seqno = $current_sequence_number[$aa][ $current_depth[$aa] ];
+ $seqno = $rcurrent_sequence_number->[$aa][$cd_aa];
if ( $aa == QUESTION_COLON ) {
- $outdent = $nested_ternary_flag[ $current_depth[$aa] ];
+ $outdent = $rnested_ternary_flag->[$cd_aa];
}
# Fix part #2 for git82: use saved type for propagation of type 'Z'
# through type L-R braces. Perl seems to allow ${bareword}
# as an indirect object, but nothing much more complex than that.
- ( $statement_type, my $saved_type, my $saved_token ) =
- @{ $nested_statement_type[$aa][ $current_depth[$aa] ] };
+ ( $statement_type, my $saved_type, my $saved_token_uu ) =
+ @{ $rnested_statement_type->[$aa][ $rcurrent_depth->[$aa] ] };
if ( $aa == BRACE
&& $saved_type eq 'Z'
&& $last_nonblank_type eq 'w'
- && $brace_structural_type[$brace_depth] eq 'L' )
+ && $rbrace_structural_type->[$brace_depth] eq 'L' )
{
$last_nonblank_type = $saved_type;
}
for my $bb ( 0 .. @closing_brace_names - 1 ) {
next if ( $bb == $aa );
- unless ( $depth_array[$aa][$bb][ $current_depth[$aa] ] ==
- $current_depth[$bb] )
- {
+ if ( $rdepth_array->[$aa][$bb][$cd_aa] != $rcurrent_depth->[$bb] ) {
my $diff =
- $current_depth[$bb] -
- $depth_array[$aa][$bb][ $current_depth[$aa] ];
+ $rcurrent_depth->[$bb] - $rdepth_array->[$aa][$bb][$cd_aa];
# don't whine too many times
- my $saw_brace_error = get_saw_brace_error();
+ my $saw_brace_error = $self->get_saw_brace_error();
if (
$saw_brace_error <= MAX_NAG_MESSAGES
&& ( ( $diff > 0 ) || ( $saw_brace_error <= 0 ) )
)
{
- interrupt_logfile();
- my $rsl =
- $starting_line_of_current_depth[$aa]
- [ $current_depth[$aa] ];
+ $self->interrupt_logfile();
+ my $rsl = $rstarting_line_of_current_depth->[$aa][$cd_aa];
my $sl = $rsl->[0];
my $rel = [ $input_line_number, $input_line, $pos ];
my $el = $rel->[0];
( $diff > 0 )
? $opening_brace_names[$bb]
: $closing_brace_names[$bb];
- write_error_indicator_pair( @{$rsl}, '^' );
+ $self->write_error_indicator_pair( @{$rsl}, '^' );
my $msg = <<"EOM";
Found $diff extra $bname$ess between $opening_brace_names[$aa] on line $sl and $closing_brace_names[$aa] on line $el
EOM
if ( $diff > 0 ) {
my $rml =
- $starting_line_of_current_depth[$bb]
- [ $current_depth[$bb] ];
+ $rstarting_line_of_current_depth->[$bb]
+ [ $rcurrent_depth->[$bb] ];
my $ml = $rml->[0];
$msg .=
" The most recent un-matched $bname is on line $ml\n";
- write_error_indicator_pair( @{$rml}, '^' );
+ $self->write_error_indicator_pair( @{$rml}, '^' );
}
- write_error_indicator_pair( @{$rel}, '^' );
- warning($msg);
- resume_logfile();
+ $self->write_error_indicator_pair( @{$rel}, '^' );
+ $self->warning($msg);
+ $self->resume_logfile();
}
- increment_brace_error();
+ $self->increment_brace_error();
+ if ( $bb eq BRACE ) { $self->[_show_indentation_table_] = 1 }
}
}
- $current_depth[$aa]--;
+ $rcurrent_depth->[$aa]--;
}
else {
- my $saw_brace_error = get_saw_brace_error();
+ my $saw_brace_error = $self->get_saw_brace_error();
if ( $saw_brace_error <= MAX_NAG_MESSAGES ) {
my $msg = <<"EOM";
There is no previous $opening_brace_names[$aa] to match a $closing_brace_names[$aa] on line $input_line_number
EOM
- indicate_error( $msg, $input_line_number, $input_line, $pos, '^' );
+ $self->indicate_error( $msg, $input_line_number, $input_line, $pos,
+ '^' );
}
- increment_brace_error();
+ $self->increment_brace_error();
+ if ( $aa eq BRACE ) { $self->[_show_indentation_table_] = 1 }
# keep track of errors in braces alone (ignoring ternary nesting errors)
- $tokenizer_self->[_true_brace_error_count_]++
+ $self->[_true_brace_error_count_]++
if ( $closing_brace_names[$aa] ne "':'" );
}
return ( $seqno, $outdent );
sub check_final_nesting_depths {
- # USES GLOBAL VARIABLES: @current_depth, @starting_line_of_current_depth
+ # USES GLOBAL VARIABLES: $rcurrent_depth, $rstarting_line_of_current_depth
+ my $self = shift;
for my $aa ( 0 .. @closing_brace_names - 1 ) {
- if ( $current_depth[$aa] ) {
- my $rsl =
- $starting_line_of_current_depth[$aa][ $current_depth[$aa] ];
+ my $cd_aa = $rcurrent_depth->[$aa];
+ if ($cd_aa) {
+ my $rsl = $rstarting_line_of_current_depth->[$aa][$cd_aa];
my $sl = $rsl->[0];
my $msg = <<"EOM";
-Final nesting depth of $opening_brace_names[$aa]s is $current_depth[$aa]
+Final nesting depth of $opening_brace_names[$aa]s is $cd_aa
The most recent un-matched $opening_brace_names[$aa] is on line $sl
EOM
- indicate_error( $msg, @{$rsl}, '^' );
- increment_brace_error();
+ $self->indicate_error( $msg, @{$rsl}, '^' );
+ $self->increment_brace_error();
+ if ( $aa eq BRACE ) { $self->[_show_indentation_table_] = 1 }
}
}
return;
sub peek_ahead_for_n_nonblank_pre_tokens {
- # returns next n pretokens if they exist
- # returns undef's if hits eof without seeing any pretokens
- # USES GLOBAL VARIABLES: $tokenizer_self
- my $max_pretokens = shift;
+ my ( $self, $max_pretokens ) = @_;
+
+ # Given:
+ # $max_pretokens = number of pretokens wanted
+ # Return:
+ # next $max_pretokens pretokens if they exist
+ # undef's if hits eof without seeing any pretokens
+
+ # USES GLOBAL VARIABLES: (none)
my $line;
my $i = 0;
my ( $rpre_tokens, $rmap, $rpre_types );
- while ( $line =
- $tokenizer_self->[_line_buffer_object_]->peek_ahead( $i++ ) )
- {
- $line =~ s/^\s*//; # trim leading blanks
+ while ( defined( $line = $self->peek_ahead( $i++ ) ) ) {
+ $line =~ s/^\s+//; # trim leading blanks
next if ( length($line) <= 0 ); # skip blank
next if ( $line =~ /^#/ ); # skip comment
( $rpre_tokens, $rmap, $rpre_types ) =
pre_tokenize( $line, $max_pretokens );
last;
- }
+ } ## end while ( defined( $line = ...))
return ( $rpre_tokens, $rpre_types );
} ## end sub peek_ahead_for_n_nonblank_pre_tokens
# look ahead for next non-blank, non-comment line of code
sub peek_ahead_for_nonblank_token {
- # USES GLOBAL VARIABLES: $tokenizer_self
- my ( $rtokens, $max_token_index ) = @_;
+ my ( $self, $rtokens, $max_token_index ) = @_;
+
+ # Given:
+ # $rtokens = ref to token array
+ # $max_token_index = index of last token in $rtokens
+ # Task:
+ # Update $rtokens with next nonblank token
+
+ # USES GLOBAL VARIABLES: (none)
my $line;
my $i = 0;
- while ( $line =
- $tokenizer_self->[_line_buffer_object_]->peek_ahead( $i++ ) )
- {
- $line =~ s/^\s*//; # trim leading blanks
+ while ( defined( $line = $self->peek_ahead( $i++ ) ) ) {
+ $line =~ s/^\s+//; # trim leading blanks
next if ( length($line) <= 0 ); # skip blank
next if ( $line =~ /^#/ ); # skip comment
# Updated from 2 to 3 to get trigraphs, added for case b1175
- my ( $rtok, $rmap, $rtype ) = pre_tokenize( $line, 3 );
+ my ( $rtok, $rmap_uu, $rtype_uu ) = pre_tokenize( $line, 3 );
my $j = $max_token_index + 1;
foreach my $tok ( @{$rtok} ) {
$rtokens->[ ++$j ] = $tok;
}
last;
- }
+ } ## end while ( defined( $line = ...))
return;
} ## end sub peek_ahead_for_nonblank_token
sub guess_if_pattern_or_conditional {
- # this routine is called when we have encountered a ? following an
+ my ( $self, $i, $rtokens, $rtoken_type, $rtoken_map_uu, $max_token_index )
+ = @_;
+
+ # This routine is called when we have encountered a ? following an
# unknown bareword, and we must decide if it starts a pattern or not
- # input parameters:
+ # Given:
# $i - token index of the ? starting possible pattern
- # output parameters:
+ # $rtokens ... = the token arrays
+ # Return:
# $is_pattern = 0 if probably not pattern, =1 if probably a pattern
# msg = a warning or diagnostic message
+
# USES GLOBAL VARIABLES: $last_nonblank_token
- my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
my $is_pattern = 0;
- my $msg = "guessing that ? after $last_nonblank_token starts a ";
+ my $msg = "guessing that ? after '$last_nonblank_token' starts a ";
if ( $i >= $max_token_index ) {
$msg .= "conditional (no end to pattern found on the line)\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
}
- else {
- my $ibeg = $i;
- $i = $ibeg + 1;
- my $next_token = $rtokens->[$i]; # first token after ?
-
- # look for a possible ending ? on this line..
- my $in_quote = 1;
- my $quote_depth = 0;
- my $quote_character = EMPTY_STRING;
- my $quote_pos = 0;
- my $quoted_string;
- (
- $i,
- $in_quote,
- $quote_character,
- $quote_pos,
- $quote_depth,
- $quoted_string,
+ my $ibeg = $i;
+ $i = $ibeg + 1;
+ ##my $next_token = $rtokens->[$i]; # first token after ?
- ) = follow_quoted_string(
+ # look for a possible ending ? on this line..
+ my $in_quote = 1;
+ my $quote_depth = 0;
+ my $quote_character = EMPTY_STRING;
+ my $quote_pos = 0;
+ my $quoted_string;
+ (
- $ibeg,
- $in_quote,
- $rtokens,
- $quote_character,
- $quote_pos,
- $quote_depth,
- $max_token_index,
+ $i,
+ $in_quote,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $quoted_string,
- );
+ ) = $self->follow_quoted_string(
- if ($in_quote) {
+ $ibeg,
+ $in_quote,
+ $rtokens,
+ $rtoken_type,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $max_token_index,
- # we didn't find an ending ? on this line,
- # so we bias towards conditional
- $is_pattern = 0;
- $msg .= "conditional (no ending ? on this line)\n";
+ );
- # we found an ending ?, so we bias towards a pattern
- }
- else {
+ if ($in_quote) {
- # Watch out for an ending ? in quotes, like this
- # my $case_flag = File::Spec->case_tolerant ? '(?i)' : '';
- my $s_quote = 0;
- my $d_quote = 0;
- my $colons = 0;
- foreach my $ii ( $ibeg + 1 .. $i - 1 ) {
- my $tok = $rtokens->[$ii];
- if ( $tok eq ":" ) { $colons++ }
- if ( $tok eq "'" ) { $s_quote++ }
- if ( $tok eq '"' ) { $d_quote++ }
- }
- if ( $s_quote % 2 || $d_quote % 2 || $colons ) {
- $is_pattern = 0;
- $msg .= "found ending ? but unbalanced quote chars\n";
- }
- elsif ( pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
- $is_pattern = 1;
- $msg .= "pattern (found ending ? and pattern expected)\n";
- }
- else {
- $msg .= "pattern (uncertain, but found ending ?)\n";
- }
- }
+ # we didn't find an ending ? on this line,
+ # so we bias towards conditional
+ $is_pattern = 0;
+ $msg .= "conditional (no ending ? on this line)\n";
+ return ( $is_pattern, $msg );
+ }
+
+ # we found an ending ?, so we bias towards a pattern
+
+ # Watch out for an ending ? in quotes, like this
+ # my $case_flag = File::Spec->case_tolerant ? '(?i)' : '';
+ my $s_quote = 0;
+ my $d_quote = 0;
+ my $colons = 0;
+ foreach my $ii ( $ibeg + 1 .. $i - 1 ) {
+ my $tok = $rtokens->[$ii];
+ if ( $tok eq ":" ) { $colons++ }
+ if ( $tok eq "'" ) { $s_quote++ }
+ if ( $tok eq '"' ) { $d_quote++ }
}
+ if ( $s_quote % 2 || $d_quote % 2 || $colons ) {
+ $is_pattern = 0;
+ $msg .= "found ending ? but unbalanced quote chars\n";
+ return ( $is_pattern, $msg );
+ }
+ if ( $self->pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
+ $is_pattern = 1;
+ $msg .= "pattern (found ending ? and pattern expected)\n";
+ return ( $is_pattern, $msg );
+ }
+ $msg .= "pattern (uncertain, but found ending ?)\n";
return ( $is_pattern, $msg );
} ## end sub guess_if_pattern_or_conditional
BEGIN {
# Constants like 'pi' in Trig.pm are common
- my @q = qw(pi pi2 pi4 pip2 pip4);
- @{is_known_constant}{@q} = (1) x scalar(@q);
+ my @q = qw( pi pi2 pi4 pip2 pip4 );
+ @is_known_constant{@q} = (1) x scalar(@q);
# parenless calls of 'ok' are common
@q = qw( ok );
- @{is_known_function}{@q} = (1) x scalar(@q);
+ @is_known_function{@q} = (1) x scalar(@q);
} ## end BEGIN
sub guess_if_pattern_or_division {
- # this routine is called when we have encountered a / following an
+ my ( $self, $i, $rtokens, $rtoken_type, $rtoken_map, $max_token_index ) =
+ @_;
+
+ # This routine is called when we have encountered a / following an
# unknown bareword, and we must decide if it starts a pattern or is a
- # division
- # input parameters:
+ # division.
+ # Given:
# $i - token index of the / starting possible pattern
- # output parameters:
+ # $rtokens ... = the token arrays
+ # Return:
# $is_pattern = 0 if probably division, =1 if probably a pattern
# msg = a warning or diagnostic message
# USES GLOBAL VARIABLES: $last_nonblank_token
- my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
+ my $msg = "guessing that / after '$last_nonblank_token' starts a ";
+ my $ibeg = $i;
my $is_pattern = 0;
- my $msg = "guessing that / after $last_nonblank_token starts a ";
-
- if ( $i >= $max_token_index ) {
- $msg .= "division (no end to pattern found on the line)\n";
- }
- else {
- my $ibeg = $i;
- my $divide_possible =
- is_possible_numerator( $i, $rtokens, $max_token_index );
- if ( $divide_possible < 0 ) {
- $msg = "pattern (division not possible here)\n";
- $is_pattern = 1;
- return ( $is_pattern, $msg );
- }
+ my $divide_possible =
+ $self->is_possible_numerator( $i, $rtokens, $max_token_index );
- $i = $ibeg + 1;
- my $next_token = $rtokens->[$i]; # first token after slash
-
- # One of the things we can look at is the spacing around the slash.
- # There # are four possible spacings around the first slash:
- #
- # return pi/two;#/; -/-
- # return pi/ two;#/; -/+
- # return pi / two;#/; +/+
- # return pi /two;#/; +/- <-- possible pattern
- #
- # Spacing rule: a space before the slash but not after the slash
- # usually indicates a pattern. We can use this to break ties.
+ if ( $divide_possible < 0 ) {
+ $msg .= "pattern (division not possible here)\n";
+ $is_pattern = 1;
+ $self->saw_bareword_function($last_nonblank_token);
+ return ( $is_pattern, $msg );
+ }
+ if ( $divide_possible == 4 ) {
+ $msg .= "division (pattern not possible here)\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
+ }
- my $is_pattern_by_spacing =
- ( $i > 1 && $next_token !~ m/^\s/ && $rtokens->[ $i - 2 ] =~ m/^\s/ );
+ # anything left on line?
+ if ( $i >= $max_token_index ) {
+ $msg .= "division (line ends with this /)\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
+ }
- # look for a possible ending / on this line..
- my $in_quote = 1;
- my $quote_depth = 0;
- my $quote_character = EMPTY_STRING;
- my $quote_pos = 0;
- my $quoted_string;
- (
- $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
- $quoted_string
- )
- = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
- $quote_pos, $quote_depth, $max_token_index );
+ # quick check for no pattern-ending slash on this line
+ my $pos_beg = $rtoken_map->[$ibeg];
+ my $input_line = $self->[_line_of_text_];
+ if ( index( $input_line, '/', $pos_beg + 1 ) < 0 ) {
+ $msg .= "division (no ending / on this line)\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
+ }
- if ($in_quote) {
+ # Setup spacing rule before we change $i below..
+ $i = $ibeg + 1;
+ my $next_token = $rtokens->[$i]; # first token after slash
- # we didn't find an ending / on this line, so we bias towards
- # division
- if ( $divide_possible >= 0 ) {
- $is_pattern = 0;
- $msg .= "division (no ending / on this line)\n";
- }
- else {
+ # There are four possible spacings around the first slash:
+ #
+ # return pi/two;#/; -/-
+ # return pi/ two;#/; -/+
+ # return pi / two;#/; +/+
+ # return pi /two;#/; +/- <-- possible pattern
+ #
+ # Spacing rule: a space before the slash but not after the slash
+ # usually indicates a pattern. We can use this to break ties.
+ # Note: perl seems to take a newline as a space in this rule (c243)
+ my $space_before = $i < 2 || $rtokens->[ $i - 2 ] =~ m/^\s/;
+ my $space_after = $next_token =~ m/^\s/;
+ my $is_pattern_by_spacing = $space_before && !$space_after;
+
+ # Make an accurate search for a possible terminating / on this line..
+ my $in_quote = 1;
+ my $quote_depth = 0;
+ my $quote_character = EMPTY_STRING;
+ my $quote_pos = 0;
+ my $quoted_string;
+ (
- # assuming a multi-line pattern ... this is risky, but division
- # does not seem possible. If this fails, it would either be due
- # to a syntax error in the code, or the division_expected logic
- # needs to be fixed.
- $msg = "multi-line pattern (division not possible)\n";
- $is_pattern = 1;
- }
- }
+ $i,
+ $in_quote,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $quoted_string
+ )
+ = $self->follow_quoted_string(
- # we found an ending /, so we bias slightly towards a pattern
- else {
+ $ibeg,
+ $in_quote,
+ $rtokens,
+ $rtoken_type,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $max_token_index
+ );
- my $pattern_expected =
- pattern_expected( $i, $rtokens, $max_token_index );
+ # if we didn't find an ending / on this line ..
+ if ($in_quote) {
+ $is_pattern = 0;
+ $msg .= "division (no ending / on this line)\n";
+ return ( $is_pattern, $msg );
+ }
- if ( $pattern_expected >= 0 ) {
+ # we found an ending /, see if it might terminate a pattern
+ my $pattern_expected =
+ $self->pattern_expected( $i, $rtokens, $max_token_index );
- # pattern looks possible...
- if ( $divide_possible >= 0 ) {
+ if ( $pattern_expected < 0 ) {
+ $is_pattern = 0;
+ $msg .= "division (pattern not possible)\n";
+ return ( $is_pattern, $msg );
+ }
- # Both pattern and divide can work here...
+ # Both pattern and divide can work here...
+ # Check for known constants in the numerator, like 'pi'
+ if ( $is_known_constant{$last_nonblank_token} ) {
+ $msg .=
+"division (pattern works too but saw known constant '$last_nonblank_token')\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
+ }
- # Increase weight of divide if a pure number follows
- $divide_possible += $next_token =~ /^\d+$/;
+ # Check for known functions like 'ok'
+ if ( $is_known_function{$last_nonblank_token} ) {
+ $msg .= "pattern (division works too but saw '$last_nonblank_token')\n";
+ $is_pattern = 1;
+ return ( $is_pattern, $msg );
+ }
- # Check for known constants in the numerator, like 'pi'
- if ( $is_known_constant{$last_nonblank_token} ) {
- $msg .=
-"division (pattern works too but saw known constant '$last_nonblank_token')\n";
- $is_pattern = 0;
- }
+ # If one rule is more probable, use it
+ if ( $divide_possible > $pattern_expected ) {
+ $msg .= "division (more likely based on following tokens)\n";
+ $is_pattern = 0;
+ return ( $is_pattern, $msg );
+ }
- # A very common bare word in pattern expressions is 'ok'
- elsif ( $is_known_function{$last_nonblank_token} ) {
- $msg .=
-"pattern (division works too but saw '$last_nonblank_token')\n";
- $is_pattern = 1;
- }
+ # finally, we have to use the spacing rule
+ if ($is_pattern_by_spacing) {
+ $msg .= "pattern (guess on spacing, but division possible too)\n";
+ $is_pattern = 1;
+ }
+ else {
+ $msg .= "division (guess on spacing, but pattern is possible too)\n";
+ $is_pattern = 0;
+ }
- # If one rule is more definite, use it
- elsif ( $divide_possible > $pattern_expected ) {
- $msg .=
- "division (more likely based on following tokens)\n";
- $is_pattern = 0;
- }
+ return ( $is_pattern, $msg );
+} ## end sub guess_if_pattern_or_division
- # otherwise, use the spacing rule
- elsif ($is_pattern_by_spacing) {
- $msg .=
-"pattern (guess on spacing, but division possible too)\n";
- $is_pattern = 1;
- }
- else {
- $msg .=
-"division (guess on spacing, but pattern is possible too)\n";
- $is_pattern = 0;
- }
- }
+sub guess_if_here_doc {
- # divide_possible < 0 means divide can not work here
- else {
- $is_pattern = 1;
- $msg .= "pattern (division not possible)\n";
- }
- }
+ my ( $self, $next_token ) = @_;
- # pattern does not look possible...
- else {
+ # Try to resolve here-doc vs. shift by looking ahead for
+ # non-code or the end token (currently only looks for end token)
- if ( $divide_possible >= 0 ) {
- $is_pattern = 0;
- $msg .= "division (pattern not possible)\n";
- }
+ # Given:
+ # $next_token = the next token after '<<'
- # Neither pattern nor divide look possible...go by spacing
- else {
- if ($is_pattern_by_spacing) {
- $msg .= "pattern (guess on spacing)\n";
- $is_pattern = 1;
- }
- else {
- $msg .= "division (guess on spacing)\n";
- $is_pattern = 0;
- }
- }
- }
- }
- }
- return ( $is_pattern, $msg );
-} ## end sub guess_if_pattern_or_division
+ # Return:
+ # 1 if it is probably a here doc
+ # 0 if not
-# try to resolve here-doc vs. shift by looking ahead for
-# non-code or the end token (currently only looks for end token)
-# returns 1 if it is probably a here doc, 0 if not
-sub guess_if_here_doc {
+ # USES GLOBAL VARIABLES: $current_package $ris_constant,
# This is how many lines we will search for a target as part of the
- # guessing strategy. It is a constant because there is probably
- # little reason to change it.
- # USES GLOBAL VARIABLES: $tokenizer_self, $current_package
- # %is_constant,
+ # guessing strategy. There is probably little reason to change it.
my $HERE_DOC_WINDOW = 40;
- my $next_token = shift;
my $here_doc_expected = 0;
my $line;
my $k = 0;
my $msg = "checking <<";
- while ( $line =
- $tokenizer_self->[_line_buffer_object_]->peek_ahead( $k++ ) )
- {
+ while ( defined( $line = $self->peek_ahead( $k++ ) ) ) {
chomp $line;
-
- if ( $line =~ /^$next_token$/ ) {
+ if ( $line eq $next_token ) {
$msg .= " -- found target $next_token ahead $k lines\n";
$here_doc_expected = 1; # got it
last;
}
last if ( $k >= $HERE_DOC_WINDOW );
- }
+ } ## end while ( defined( $line = ...))
- unless ($here_doc_expected) {
+ if ( !$here_doc_expected ) {
if ( !defined($line) ) {
$here_doc_expected = -1; # hit eof without seeing target
}
else { # still unsure..taking a wild guess
- if ( !$is_constant{$current_package}{$next_token} ) {
+ if ( !$ris_constant->{$current_package}{$next_token} ) {
$here_doc_expected = 1;
$msg .=
" -- guessing it's a here-doc ($next_token not a constant)\n";
}
}
}
- write_logfile_entry($msg);
+ $self->write_logfile_entry($msg);
return $here_doc_expected;
} ## end sub guess_if_here_doc
sub scan_bare_identifier_do {
- # this routine is called to scan a token starting with an alphanumeric
+ my (
+
+ $self,
+
+ $input_line,
+ $i,
+ $tok,
+ $type,
+ $prototype,
+ $rtoken_map,
+ $max_token_index
+
+ ) = @_;
+
+ # This routine is called to scan a token starting with an alphanumeric
# variable or package separator, :: or '.
+
+ # Given:
+ # current scan state variables
+
# USES GLOBAL VARIABLES: $current_package, $last_nonblank_token,
- # $last_nonblank_type,@paren_type, $paren_depth
+ # $last_nonblank_type, $rparen_type, $paren_depth
- my ( $input_line, $i, $tok, $type, $prototype, $rtoken_map,
- $max_token_index )
- = @_;
- my $i_begin = $i;
my $package = undef;
my $i_beg = $i;
# we have to back up one pretoken at a :: since each : is one pretoken
if ( $tok eq '::' ) { $i_beg-- }
- if ( $tok eq '->' ) { $i_beg-- }
my $pos_beg = $rtoken_map->[$i_beg];
pos($input_line) = $pos_beg;
# A::
# ::A
# A'B
- if ( $input_line =~ m/\G\s*((?:\w*(?:'|::)))*(?:(?:->)?(\w+))?/gc ) {
-
+ if (
+ $input_line =~ m{
+ \G\s* # start at pos
+ ( (?:\w*(?:'|::))* ) # $1 = maybe package name like A:: A::B:: or A'
+ (\w+)? # $2 = maybe followed by sub name
+ }gcx
+ )
+ {
my $pos = pos($input_line);
my $numc = $pos - $pos_beg;
$tok = substr( $input_line, $pos_beg, $numc );
my $sub_name = EMPTY_STRING;
if ( defined($2) ) { $sub_name = $2; }
- if ( defined($1) ) {
+ if ( defined($1) && length($1) ) {
$package = $1;
+ # patch: check for package call A::B::C->
+ # in this case, C is part of the package name
+ if ($sub_name) {
+ if ( $input_line =~ m{ \G\s*(?:->) }gcx ) {
+ $package .= $sub_name;
+ $sub_name = EMPTY_STRING;
+ }
+ pos($input_line) = $pos;
+ }
+
# patch: don't allow isolated package name which just ends
# in the old style package separator (single quote). Example:
# use CGI':all';
# check for v-string with leading 'v' type character
# (This seems to have precedence over filehandle, type 'Y')
- if ( $tok =~ /^v\d[_\d]*$/ ) {
+ if ( substr( $tok, 0, 1 ) eq 'v' && $tok =~ /^v\d[_\d]*$/ ) {
# we only have the first part - something like 'v101' -
# look for more
$tok = substr( $input_line, $pos_beg, $numc );
}
$type = 'v';
-
- # warn if this version can't handle v-strings
- report_v_string($tok);
- }
-
- elsif ( $is_constant{$package}{$sub_name} ) {
- $type = 'C';
+ $self->report_v_string($tok);
}
# bareword after sort has implied empty prototype; for example:
$type = 'Z';
}
+ # issue c382: this elsif statement moved from above because
+ # previous check for type 'Z' after sort has priority.
+ elsif ( $ris_constant->{$package}{$sub_name} ) {
+ $type = 'C';
+ }
+
# Note: strangely, perl does not seem to really let you create
# functions which act like eval and do, in the sense that eval
# and do may have operators following the final }, but any operators
# If this ever changes, here is the update
# to make perltidy behave accordingly:
- # elsif ( $is_block_function{$package}{$tok} ) {
+ # elsif ( $ris_block_function->{$package}{$tok} ) {
# $tok='eval'; # patch to do braces like eval - doesn't work
# $type = 'k';
#}
# TODO: This could become a separate type to allow for different
# future behavior:
- elsif ( $is_block_function{$package}{$sub_name} ) {
+ elsif ( $ris_block_function->{$package}{$sub_name} ) {
$type = 'G';
}
- elsif ( $is_block_list_function{$package}{$sub_name} ) {
+ elsif ( $ris_block_list_function->{$package}{$sub_name} ) {
$type = 'G';
}
- elsif ( $is_user_function{$package}{$sub_name} ) {
+ elsif ( $ris_user_function->{$package}{$sub_name} ) {
$type = 'U';
- $prototype = $user_function_prototype{$package}{$sub_name};
+ $prototype = $ruser_function_prototype->{$package}{$sub_name};
}
# check for indirect object
# or preceded by something like 'print(' or 'printf('
|| (
( $last_nonblank_token eq '(' )
- && $is_indirect_object_taker{ $paren_type[$paren_depth]
+ && $is_indirect_object_taker{
+ $rparen_type->[$paren_depth]
}
)
# could be bug caused by older perltidy if
# followed by '('
if ( $input_line =~ m/\G\s*\(/gc ) {
- complain(
+ $self->complain(
"Caution: unknown word '$tok' in indirect object slot\n"
);
}
$type = 'Z';
}
}
+
+ # none of the above special types
+ else {
+ }
}
# Now we must convert back from character position
( $i, $error ) =
inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
if ($error) {
- warning("scan_bare_identifier: Possibly invalid tokenization\n");
+ $self->warning(
+ "scan_bare_identifier: Possibly invalid tokenization\n");
}
}
$type = 'w';
# change this warning to log message if it becomes annoying
- warning("didn't find identifier after leading ::\n");
+ $self->warning("didn't find identifier after leading ::\n");
}
return ( $i, $tok, $type, $prototype );
} ## end sub scan_bare_identifier_do
sub scan_id_do {
-# This is the new scanner and will eventually replace scan_identifier.
-# Only type 'sub' and 'package' are implemented.
-# Token types $ * % @ & -> are not yet implemented.
-#
-# Scan identifier following a type token.
-# The type of call depends on $id_scan_state: $id_scan_state = ''
-# for starting call, in which case $tok must be the token defining
-# the type.
-#
-# If the type token is the last nonblank token on the line, a value
-# of $id_scan_state = $tok is returned, indicating that further
-# calls must be made to get the identifier. If the type token is
-# not the last nonblank token on the line, the identifier is
-# scanned and handled and a value of '' is returned.
-# USES GLOBAL VARIABLES: $current_package, $last_nonblank_token, $in_attribute_list,
-# $statement_type, $tokenizer_self
-
- my ( $input_line, $i, $tok, $rtokens, $rtoken_map, $id_scan_state,
- $max_token_index )
- = @_;
+ my (
+
+ $self,
+
+ $input_line,
+ $i,
+ $tok,
+ $rtokens,
+ $rtoken_map,
+ $id_scan_state,
+ $max_token_index
+
+ ) = @_;
+
+ # Scan identifier following a type token.
+ # Given:
+ # current scan state variables
+
+ # This is the new scanner and may eventually replace scan_identifier.
+ # Only type 'sub' and 'package' are implemented.
+ # Token types $ * % @ & -> are not yet implemented.
+ #
+ # The type of call depends on $id_scan_state: $id_scan_state = ''
+ # for starting call, in which case $tok must be the token defining
+ # the type.
+ #
+ # If the type token is the last nonblank token on the line, a value
+ # of $id_scan_state = $tok is returned, indicating that further
+ # calls must be made to get the identifier. If the type token is
+ # not the last nonblank token on the line, the identifier is
+ # scanned and handled and a value of '' is returned.
+
use constant DEBUG_NSCAN => 0;
my $type = EMPTY_STRING;
- my ( $i_beg, $pos_beg );
+ my $i_beg;
#print "NSCAN:entering i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
#my ($a,$b,$c) = caller;
# only a '#' immediately after a '$' is not a comment
if ( $next_nonblank_token eq '#' ) {
- unless ( $tok eq '$' ) {
+ if ( $tok ne '$' ) {
$blank_line = 1;
}
}
}
# handle non-blank line; identifier, if any, must follow
- unless ($blank_line) {
+ if ( !$blank_line ) {
if ( $is_sub{$id_scan_state} ) {
- ( $i, $tok, $type, $id_scan_state ) = do_scan_sub(
+ ( $i, $tok, $type, $id_scan_state ) = $self->do_scan_sub(
{
input_line => $input_line,
i => $i,
}
elsif ( $is_package{$id_scan_state} ) {
- ( $i, $tok, $type ) =
- do_scan_package( $input_line, $i, $i_beg, $tok, $type, $rtokens,
- $rtoken_map, $max_token_index );
+ ( $i, $tok, $type ) = $self->do_scan_package(
+ {
+ input_line => $input_line,
+ i => $i,
+ i_beg => $i_beg,
+ tok => $tok,
+ type => $type,
+ rtokens => $rtokens,
+ rtoken_map => $rtoken_map,
+ max_token_index => $max_token_index,
+ }
+ );
$id_scan_state = EMPTY_STRING;
}
else {
- warning("invalid token in scan_id: $tok\n");
+ $self->warning("invalid token in scan_id: $tok\n");
$id_scan_state = EMPTY_STRING;
}
}
# shouldn't happen:
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Program bug in scan_id: undefined type but scan_state=$id_scan_state
EOM
}
- warning(
+ $self->warning(
"Possible program bug in sub scan_id: undefined type but scan_state=$id_scan_state\n"
);
- report_definite_bug();
+ $self->report_definite_bug();
}
DEBUG_NSCAN && do {
- print STDOUT
+ print {*STDOUT}
"NSCAN: returns i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
};
return ( $i, $tok, $type, $id_scan_state );
sub check_prototype {
my ( $proto, $package, $subname ) = @_;
- return unless ( defined($package) && defined($subname) );
+
+ # Classify a sub based on its prototype
+ return if ( !defined($package) );
+ return if ( !defined($subname) );
if ( defined($proto) ) {
$proto =~ s/^\s*\(\s*//;
$proto =~ s/\s*\)$//;
if ($proto) {
- $is_user_function{$package}{$subname} = 1;
- $user_function_prototype{$package}{$subname} = "($proto)";
+ $ris_user_function->{$package}{$subname} = 1;
+ $ruser_function_prototype->{$package}{$subname} = "($proto)";
# prototypes containing '&' must be treated specially..
if ( $proto =~ /\&/ ) {
# right curly braces of prototypes ending in
# '&' may be followed by an operator
if ( $proto =~ /\&$/ ) {
- $is_block_function{$package}{$subname} = 1;
+ $ris_block_function->{$package}{$subname} = 1;
}
# right curly braces of prototypes NOT ending in
# '&' may NOT be followed by an operator
- elsif ( $proto !~ /\&$/ ) {
- $is_block_list_function{$package}{$subname} = 1;
+ else {
+ $ris_block_list_function->{$package}{$subname} = 1;
}
}
}
else {
- $is_constant{$package}{$subname} = 1;
+ $ris_constant->{$package}{$subname} = 1;
}
}
else {
- $is_user_function{$package}{$subname} = 1;
+ $ris_user_function->{$package}{$subname} = 1;
}
return;
} ## end sub check_prototype
sub do_scan_package {
- # do_scan_package parses a package name
- # it is called with $i_beg equal to the index of the first nonblank
+ my ( $self, $rcall_hash ) = @_;
+
+ my $input_line = $rcall_hash->{input_line};
+ my $i = $rcall_hash->{i};
+ my $i_beg = $rcall_hash->{i_beg};
+ my $tok = $rcall_hash->{tok};
+ my $type = $rcall_hash->{type};
+ my $rtokens = $rcall_hash->{rtokens};
+ my $rtoken_map = $rcall_hash->{rtoken_map};
+ my $max_token_index = $rcall_hash->{max_token_index};
+
+ # Parse a package name.
+ # This is called with $i_beg equal to the index of the first nonblank
# token following a 'package' token.
# USES GLOBAL VARIABLES: $current_package,
# character and at least three components.
# reference http://perldoc.perl.org/functions/package.html
- my ( $input_line, $i, $i_beg, $tok, $type, $rtokens, $rtoken_map,
- $max_token_index )
- = @_;
my $package = undef;
my $pos_beg = $rtoken_map->[$i_beg];
pos($input_line) = $pos_beg;
my $pos = pos($input_line);
my $numc = $pos - $pos_beg;
$tok = 'package ' . substr( $input_line, $pos_beg, $numc );
- $type = 'i';
+ $type = 'P'; # Fix for c250, previously 'i'
# Now we must convert back from character position
# to pre_token index.
my $error;
( $i, $error ) =
inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
- if ($error) { warning("Possibly invalid package\n") }
+ if ($error) { $self->warning("Possibly invalid package\n") }
$current_package = $package;
# we should now have package NAMESPACE
# package NAMESPACE VERSION
# package NAMESPACE BLOCK
# package NAMESPACE VERSION BLOCK
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
# check that something recognizable follows, but do not parse.
# A VERSION number will be parsed later as a number or v-string in the
$statement_type = $tok;
}
else {
- warning(
+ $self->warning(
"Unexpected '$next_nonblank_token' after package name '$tok'\n"
);
}
return ( $i, $tok, $type );
} ## end sub do_scan_package
-my %is_special_variable_char;
+{ ## begin closure for sub scan_complex_identifier
-BEGIN {
+ use constant DEBUG_SCAN_ID => 0;
- # These are the only characters which can (currently) form special
- # variables, like $^W: (issue c066).
- my @q =
- qw{ ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ };
- @{is_special_variable_char}{@q} = (1) x scalar(@q);
-} ## end BEGIN
+ # Constant hash:
+ my %is_special_variable_char;
-{ ## begin closure for sub scan_complex_identifier
+ BEGIN {
- use constant DEBUG_SCAN_ID => 0;
+ # These are the only characters which can (currently) form special
+ # variables, like $^W: (issue c066).
+ my @q =
+ qw{ ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ };
+ @is_special_variable_char{@q} = (1) x scalar(@q);
+ } ## end BEGIN
# These are the possible states for this scanner:
my $scan_state_SIGIL = '$';
#----------------------------------
sub do_id_scan_state_dollar {
+ my $self = shift;
+
# We saw a sigil, now looking to start a variable name
if ( $tok eq '$' ) {
}
}
elsif ( $tok =~ /^\w/ ) { # alphanumeric ..
- $saw_alpha = 1;
- $id_scan_state = $scan_state_COLON; # now need ::
+ $saw_alpha = 1;
$identifier .= $tok;
+
+ # now need :: except for special digit vars like '$1' (c208)
+ $id_scan_state = $tok =~ /^\d/ ? EMPTY_STRING : $scan_state_COLON;
}
elsif ( $tok eq '::' ) {
$id_scan_state = $scan_state_ALPHA;
"Space in identifier, following $identifier\n";
}
else {
- ## ok: silently accept space after '$' and '@' sigils
+ # silently accept space after '$' and '@' sigils
}
}
}
}
else {
$i = $i_save;
- write_logfile_entry( 'Use of $# is deprecated' . "\n" );
+ $self->write_logfile_entry(
+ 'Use of $# is deprecated' . "\n" );
}
}
elsif ( $identifier eq '$$' ) {
sub do_id_scan_state_alpha {
+ my $self = shift;
+
# looking for alphanumeric after ::
$tok_is_blank = $tok =~ /^\s*$/;
sub do_id_scan_state_colon {
+ my $self = shift;
+
# looking for possible :: after alphanumeric
$tok_is_blank = $tok =~ /^\s*$/;
sub do_id_scan_state_left_paren {
+ my $self = shift;
+
# looking for possible '(' of a prototype
if ( $tok eq '(' ) { # got it
sub do_id_scan_state_right_paren {
+ my $self = shift;
+
# looking for a ')' of prototype to close a '('
$tok_is_blank = $tok =~ /^\s*$/;
$identifier .= $tok;
}
else { # probable error in script, but keep going
- warning("Unexpected '$tok' while seeking end of prototype\n");
+ $self->warning(
+ "Unexpected '$tok' while seeking end of prototype\n");
$identifier .= $tok;
}
return;
sub do_id_scan_state_ampersand {
- # Starting sub call after seeing an '&'
+ my $self = shift;
+ # Starting sub call after seeing an '&'
if ( $tok =~ /^[\$\w]/ ) { # alphanumeric ..
$id_scan_state = $scan_state_COLON; # now need ::
$saw_alpha = 1;
# Special variable (c066)
$identifier .= $tok;
- $type = '&';
+ $type = 'i';
- # There may be one more character, not a space, after the ^
+ # To be a special $^ variable, there may be one more character,
+ # not a space, after the ^
my $next1 = $rtokens->[ $i + 1 ];
my $chr = substr( $next1, 0, 1 );
if ( $is_special_variable_char{$chr} ) {
}
else {
- # it is &^
+ # It is &^. This is parsed by perl as a call to sub '^',
+ # even though it would be difficult to create a sub '^'.
+ # So we mark it as an identifier (c068).
$id_scan_state = EMPTY_STRING;
}
}
sub scan_complex_identifier {
+ (
+ my $self,
+
+ $i,
+ $id_scan_state,
+ $identifier,
+ $rtokens,
+ $max_token_index,
+ $expecting,
+ $container_type
+
+ ) = @_;
+
# This routine assembles tokens into identifiers. It maintains a
# scan state, id_scan_state. It updates id_scan_state based upon
# current id_scan_state and token, and returns an updated
# This routine now serves a a backup for sub scan_simple_identifier
# which handles most identifiers.
- (
- $i, $id_scan_state, $identifier, $rtokens, $max_token_index,
- $expecting, $container_type
- ) = @_;
+ # Note that $self must be a 'my' variable and not be a closure
+ # variables like the other args. Otherwise it will not get
+ # deleted by a DESTROY call at the end of a file. Then an
+ # attempt to create multiple tokenizers can occur when multiple
+ # files are processed, causing an error.
# return flag telling caller to split the pretoken
my $split_pretoken_flag;
# shouldn't happen: bad call parameter
my $msg =
-"Program bug detected: scan_identifier received bad starting token = '$tok'\n";
- if (DEVEL_MODE) { Fault($msg) }
- if ( !$tokenizer_self->[_in_error_] ) {
- warning($msg);
- $tokenizer_self->[_in_error_] = 1;
+"Program bug detected: scan_complex_identifier received bad starting token = '$tok'\n";
+ if (DEVEL_MODE) { $self->Fault($msg) }
+ if ( !$self->[_in_error_] ) {
+ $self->warning($msg);
+ $self->[_in_error_] = 1;
}
$id_scan_state = EMPTY_STRING;
# check for a valid starting state
if ( DEVEL_MODE && !$is_returnable_scan_state{$id_scan_state} ) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Unexpected starting scan state in sub scan_complex_identifier: '$id_scan_state'
EOM
}
# unknown state - should not happen
else {
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Unknown scan state in sub scan_complex_identifier: '$id_scan_state'
Scan state at sub entry was '$id_scan_state_begin'
EOM
$i++;
}
- $code->();
+ $code->($self);
# check for forward progress: a decrease in the index $i
# implies that scanning has finished
last if ( $i <= $i_start_loop );
- } ## end of main loop
+ } ## end while ( $i < $max_token_index...)
#-------------
# Check result
}
if ( $id_scan_state eq $scan_state_RPAREN ) {
- warning(
+ $self->warning(
"Hit end of line while seeking ) to end prototype\n");
}
# In something like '$${' we have type '$$' (and only
# part of an identifier)
&& !( $identifier =~ /\$$/ && $tok eq '{' )
-
- ## && ( $identifier !~ /^(sub |package )$/ )
&& $identifier ne 'sub '
&& $identifier ne 'package '
)
# See if we formed an identifier...
if ($identifier) {
$tok = $identifier;
- if ($message) { write_logfile_entry($message) }
+ if ($message) { $self->write_logfile_entry($message) }
}
# did not find an identifier, back up
RETURN:
DEBUG_SCAN_ID && do {
- my ( $a, $b, $c ) = caller;
- print STDOUT
+ my ( $a, $b, $c ) = caller();
+ print {*STDOUT}
"SCANID: called from $a $b $c with tok, i, state, identifier =$tok_begin, $i_begin, $id_scan_state_begin, $identifier_begin\n";
- print STDOUT
+ print {*STDOUT}
"SCANID: returned with tok, i, state, identifier =$tok, $i, $id_scan_state, $identifier\n";
};
- return ( $i, $tok, $type, $id_scan_state, $identifier,
- $split_pretoken_flag );
+
+ return (
+
+ $i,
+ $tok,
+ $type,
+ $id_scan_state,
+ $identifier,
+ $split_pretoken_flag
+ );
} ## end sub scan_complex_identifier
} ## end closure for sub scan_complex_identifier
# lexical subs with these names can cause parsing errors in this version
my @q = qw( m q qq qr qw qx s tr y );
- @{warn_if_lexical}{@q} = (1) x scalar(@q);
+ @warn_if_lexical{@q} = (1) x scalar(@q);
} ## end BEGIN
# saved package and subnames in case prototype is on separate line
sub do_scan_sub {
- # do_scan_sub parses a sub name and prototype.
+ my ( $self, $rcall_hash ) = @_;
+
+ my $input_line = $rcall_hash->{input_line};
+ my $i = $rcall_hash->{i};
+ my $i_beg = $rcall_hash->{i_beg};
+ my $tok = $rcall_hash->{tok};
+ my $type = $rcall_hash->{type};
+ my $rtokens = $rcall_hash->{rtokens};
+ my $rtoken_map = $rcall_hash->{rtoken_map};
+ my $id_scan_state = $rcall_hash->{id_scan_state};
+ my $max_token_index = $rcall_hash->{max_token_index};
+
+ # Parse a sub name and prototype.
# At present there are three basic CALL TYPES which are
# distinguished by the starting value of '$tok':
# a name is given if and only if a non-anonymous sub is
# appropriate.
# USES GLOBAL VARS: $current_package, $last_nonblank_token,
- # $in_attribute_list, %saw_function_definition,
+ # $rsaw_function_definition,
# $statement_type
- my ($rinput_hash) = @_;
-
- my $input_line = $rinput_hash->{input_line};
- my $i = $rinput_hash->{i};
- my $i_beg = $rinput_hash->{i_beg};
- my $tok = $rinput_hash->{tok};
- my $type = $rinput_hash->{type};
- my $rtokens = $rinput_hash->{rtokens};
- my $rtoken_map = $rinput_hash->{rtoken_map};
- my $id_scan_state = $rinput_hash->{id_scan_state};
- my $max_token_index = $rinput_hash->{max_token_index};
-
my $i_entry = $i;
# Determine the CALL TYPE
# Look for the sub NAME if this is a SUB call
if (
$call_type == SUB_CALL
- && $input_line =~ m/\G\s*
+ && $input_line =~ m{\G\s*
((?:\w*(?:'|::))*) # package - something that ends in :: or '
(\w+) # NAME - required
- /gcx
+ }gcx
)
{
$match = 1;
my $is_lexical_sub =
$last_nonblank_type eq 'k' && $last_nonblank_token eq 'my';
if ( $is_lexical_sub && $1 ) {
- warning("'my' sub $subname cannot be in package '$1'\n");
+ $self->warning("'my' sub $subname cannot be in package '$1'\n");
$is_lexical_sub = 0;
}
# lexical subs use the block sequence number as a package name
my $seqno =
- $current_sequence_number[BRACE][ $current_depth[BRACE] ];
- $seqno = 1 unless ( defined($seqno) );
+ $rcurrent_sequence_number->[BRACE]
+ [ $rcurrent_depth->[BRACE] ];
+ $seqno = 1 if ( !defined($seqno) );
$package = $seqno;
if ( $warn_if_lexical{$subname} ) {
- warning(
+ $self->warning(
"'my' sub '$subname' matches a builtin name and may not be handled correctly in this perltidy version.\n"
);
+
+ # This may end badly, it is safest to block formatting
+ # For an example, see perl527/lexsub.t (issue c203)
+ $self->[_in_trouble_] = 1;
}
}
else {
my $pos = pos($input_line);
my $numc = $pos - $pos_beg;
$tok = 'sub ' . substr( $input_line, $pos_beg, $numc );
- $type = 'i';
+ $type = 'S'; ## Fix for c250, was 'i';
# remember the sub name in case another call is needed to
# get the prototype
# regex below uses [A-Za-z] rather than \w
# This is the old regex which has been replaced:
# $input_line =~ m/\G(\s*\([^\)\(\}\{\,#]*\))? # PROTO
+ # Added '=' for issue c362
my $saw_opening_paren = $input_line =~ /\G\s*\(/;
if (
- $input_line =~ m/\G(\s*\([^\)\(\}\{\,#A-Za-z]*\))? # PROTO
+ $input_line =~ m{\G(\s*\([^\)\(\}\{\,#A-Za-z=]*\))? # PROTO
(\s*:)? # ATTRS leading ':'
- /gcx
+ }gcx
&& ( $1 || $2 )
)
{
elsif ( $call_type == PAREN_CALL ) {
$tok = $last_nonblank_token;
}
+ else {
+ }
$match ||= 1;
# Patch part #1 to fixes cases b994 and b1053:
# Mark an anonymous sub keyword without prototype as type 'k', i.e.
# 'sub : lvalue { ...'
- $type = 'i';
+ $type = 'S'; ## C250, was 'i';
if ( $tok eq 'sub' && !$proto ) { $type = 'k' }
}
# catch case of line with leading ATTR ':' after anonymous sub
if ( $pos == $pos_beg && $tok eq ':' ) {
- $type = 'A';
- $in_attribute_list = 1;
+ $type = 'A';
+ $self->[_in_attribute_list_] = 1;
}
# Otherwise, if we found a match we must convert back from
my $error;
( $i, $error ) = inverse_pretoken_map( $i, $pos, $rtoken_map,
$max_token_index );
- if ($error) { warning("Possibly invalid sub\n") }
+ if ($error) { $self->warning("Possibly invalid sub\n") }
# Patch part #2 to fixes cases b994 and b1053:
# Do not let spaces be part of the token of an anonymous sub
}
# check for multiple definitions of a sub
- ( $next_nonblank_token, my $i_next ) =
+ ( $next_nonblank_token, my $i_next_uu ) =
find_next_nonblank_token_on_this_line( $i, $rtokens,
$max_token_index );
}
if ( $next_nonblank_token =~ /^(\s*|#)$/ )
{ # skip blank or side comment
- my ( $rpre_tokens, $rpre_types ) =
- peek_ahead_for_n_nonblank_pre_tokens(1);
+ my ( $rpre_tokens, $rpre_types_uu ) =
+ $self->peek_ahead_for_n_nonblank_pre_tokens(1);
if ( defined($rpre_tokens) && @{$rpre_tokens} ) {
$next_nonblank_token = $rpre_tokens->[0];
}
# Check for multiple definitions of a sub, but
# it is ok to have multiple sub BEGIN, etc,
# so we do not complain if name is all caps
- if ( $saw_function_definition{$subname}{$package}
+ if ( $rsaw_function_definition->{$subname}{$package}
&& $subname !~ /^[A-Z]+$/ )
{
- my $lno = $saw_function_definition{$subname}{$package};
+ my $lno =
+ $rsaw_function_definition->{$subname}{$package};
if ( $package =~ /^\d/ ) {
- warning(
+ $self->warning(
"already saw definition of lexical 'sub $subname' at line $lno\n"
);
}
else {
- warning(
+ if ( !DEVEL_MODE ) {
+ $self->warning(
"already saw definition of 'sub $subname' in package '$package' at line $lno\n"
- ) unless (DEVEL_MODE);
+ );
+ }
}
}
- $saw_function_definition{$subname}{$package} =
- $tokenizer_self->[_last_line_number_];
+ $rsaw_function_definition->{$subname}{$package} =
+ $self->[_last_line_number_];
}
}
elsif ( $next_nonblank_token eq ';' ) {
substr( $tok, 0, 3 ) eq 'sub' ? $tok : 'sub';
}
}
- elsif ($next_nonblank_token) { # EOF technically ok
- if ( $rinput_hash->{tok} eq 'method' && $call_type == SUB_CALL )
+ # something else..
+ elsif ($next_nonblank_token) {
+
+ if ( $rcall_hash->{tok} eq 'method' && $call_type == SUB_CALL )
{
# For a method call, silently ignore this error (rt145706)
# to avoid needless warnings. Example which can produce it:
}
else {
$subname = EMPTY_STRING unless defined($subname);
- warning(
+ $self->warning(
"expecting ':' or ';' or '{' after definition or declaration of sub '$subname' but saw '$next_nonblank_token'\n"
);
}
}
+
+ # EOF technically ok
+ else {
+ }
+
check_prototype( $proto, $package, $subname );
}
#########################################################################
sub find_next_nonblank_token {
- my ( $i, $rtokens, $max_token_index ) = @_;
+ my ( $self, $i, $rtokens, $max_token_index ) = @_;
# Returns the next nonblank token after the token at index $i
# To skip past a side comment, and any subsequent block comments
# and blank lines, call with i=$max_token_index
+ # Skip any ending blank (fix c258). It would be cleaner if caller passed
+ # $rtoken_map, so we could check for type 'b', and avoid a regex test, but
+ # benchmarking shows that this test does not take significant time. So
+ # that would be a nice update but not essential. Also note that ending
+ # blanks will not occur for text previously processed by perltidy.
+ if ( $i == $max_token_index - 1
+ && $rtokens->[$max_token_index] =~ /^\s+$/ )
+ {
+ $i++;
+ }
+
if ( $i >= $max_token_index ) {
if ( !peeked_ahead() ) {
peeked_ahead(1);
- peek_ahead_for_nonblank_token( $rtokens, $max_token_index );
+ $self->peek_ahead_for_nonblank_token( $rtokens, $max_token_index );
}
}
my $next_nonblank_token = $rtokens->[ ++$i ];
+
+ # Any more tokens?
return ( SPACE, $i )
- unless ( defined($next_nonblank_token) && length($next_nonblank_token) );
+ if ( !defined($next_nonblank_token) || !length($next_nonblank_token) );
- # Quick test for nonblank ascii char. Note that we just have to
- # examine the first character here.
+ # Skip over whitespace
my $ord = ord( substr( $next_nonblank_token, 0, 1 ) );
- if ( $ord >= ORD_PRINTABLE_MIN
- && $ord <= ORD_PRINTABLE_MAX )
- {
- return ( $next_nonblank_token, $i );
- }
+ if (
- # Quick test to skip over an ascii space or tab
- elsif ( $ord == ORD_SPACE || $ord == ORD_TAB ) {
- $next_nonblank_token = $rtokens->[ ++$i ];
- return ( SPACE, $i ) unless defined($next_nonblank_token);
- }
+ ( $ord <= ORD_PRINTABLE_MIN || $ord >= ORD_PRINTABLE_MAX )
- # Slow test to skip over something else identified as whitespace
- elsif ( $next_nonblank_token =~ /^\s*$/ ) {
+ # Quick test for ascii space or tab
+ && (
+ ( $ord == ORD_SPACE || $ord == ORD_TAB )
+
+ # Slow test to for something else identified as whitespace
+ || $next_nonblank_token =~ /^\s+$/
+ )
+ )
+ {
$next_nonblank_token = $rtokens->[ ++$i ];
return ( SPACE, $i ) unless defined($next_nonblank_token);
}
# We should be at a nonblank now
return ( $next_nonblank_token, $i );
+
} ## end sub find_next_nonblank_token
-sub find_next_noncomment_type {
- my ( $i, $rtokens, $max_token_index ) = @_;
+sub find_next_noncomment_token {
+ my ( $self, $i, $rtokens, $max_token_index ) = @_;
# Given the current character position, look ahead past any comments
# and blank lines and return the next token, including digraphs and
# trigraphs.
my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
# skip past any side comment
if ( $next_nonblank_token eq '#' ) {
( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i_next, $rtokens, $max_token_index );
+ $self->find_next_nonblank_token( $i_next, $rtokens,
+ $max_token_index );
}
# check for a digraph
}
return ( $next_nonblank_token, $i_next );
-} ## end sub find_next_noncomment_type
+} ## end sub find_next_noncomment_token
sub is_possible_numerator {
+ my ( $self, $i, $rtokens, $max_token_index ) = @_;
+
# Look at the next non-comment character and decide if it could be a
- # numerator. Return
- # 1 - yes
- # 0 - can't tell
- # -1 - no
+ # numerator. Returns the following code:
+ # -1 - division not possible
+ # 0 - can't tell if division possible
+ # 1 - division possible
+ # 2 - division probable: number follows
+ # 3 - division very probable: number and one of ; ] } follow
+ # 4 - is division, not pattern: number and ) follow
- my ( $i, $rtokens, $max_token_index ) = @_;
- my $is_possible_numerator = 0;
+ my $divide_possible_code = 0;
my $next_token = $rtokens->[ $i + 1 ];
if ( $next_token eq '=' ) { $i++; } # handle /=
my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
if ( $next_nonblank_token eq '#' ) {
( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $max_token_index, $rtokens,
+ $self->find_next_nonblank_token( $max_token_index, $rtokens,
$max_token_index );
}
- if ( $next_nonblank_token =~ /(\(|\$|\w|\.|\@)/ ) {
- $is_possible_numerator = 1;
+ if ( $next_nonblank_token =~ / [ \( \$ \w \. \@ ] /x ) {
+ $divide_possible_code = 1;
+
+ # look ahead one more token for some common patterns, such as
+ # pi/2) pi/2; pi/2}
+ if ( $next_nonblank_token =~ /^\d/ ) {
+ my ( $next_next_nonblank_token, $i_next_next_uu ) =
+ $self->find_next_nonblank_token( $i_next, $rtokens,
+ $max_token_index );
+ if ( $next_next_nonblank_token eq ')' ) {
+ $divide_possible_code = 4;
+ }
+ elsif ($next_next_nonblank_token eq ';'
+ || $next_next_nonblank_token eq ']'
+ || $next_next_nonblank_token eq '}' )
+ {
+ $divide_possible_code = 3;
+ }
+ else {
+ $divide_possible_code = 2;
+ }
+ }
}
elsif ( $next_nonblank_token =~ /^\s*$/ ) {
- $is_possible_numerator = 0;
+ $divide_possible_code = 0;
}
else {
- $is_possible_numerator = -1;
+ $divide_possible_code = -1;
}
- return $is_possible_numerator;
+ return $divide_possible_code;
} ## end sub is_possible_numerator
{ ## closure for sub pattern_expected
# '&&' and '|' instead of '||'
# /(\)|\}|\;|\&\&|\|\||and|or|while|if|unless)/
- my @q = qw( & && | || ? : + - * and or while if unless);
+ my @q = qw( & && | || ? : + - * and or while if unless );
push @q, ')', '}', ']', '>', ',', ';';
- @{pattern_test}{@q} = (1) x scalar(@q);
+ @pattern_test{@q} = (1) x scalar(@q);
} ## end BEGIN
sub pattern_expected {
+ my ( $self, $i, $rtokens, $max_token_index ) = @_;
+
# This a filter for a possible pattern.
# It looks at the token after a possible pattern and tries to
# determine if that token could end a pattern.
# 1 - yes
# 0 - can't tell
# -1 - no
- my ( $i, $rtokens, $max_token_index ) = @_;
my $is_pattern = 0;
my $next_token = $rtokens->[ $i + 1 ];
- if ( $next_token =~ /^[msixpodualgc]/ ) {
+
+ # skip a possible quote modifier
+ my $possible_modifiers = $quote_modifiers{'m'};
+ if ( $next_token =~ /^$possible_modifiers/ ) {
$i++;
- } # skip possible modifier
- my ( $next_nonblank_token, $i_next ) =
- find_next_nonblank_token( $i, $rtokens, $max_token_index );
+ }
+
+ my ( $next_nonblank_token, $i_next_uu ) =
+ $self->find_next_nonblank_token( $i, $rtokens, $max_token_index );
if ( $pattern_test{$next_nonblank_token} ) {
$is_pattern = 1;
sub find_angle_operator_termination {
+ my ( $self, $input_line, $i_beg, $rtoken_map, $expecting, $max_token_index )
+ = @_;
+
# We are looking at a '<' and want to know if it is an angle operator.
- # We are to return:
+ # Return:
# $i = pretoken index of ending '>' if found, current $i otherwise
# $type = 'Q' if found, '>' otherwise
- my ( $input_line, $i_beg, $rtoken_map, $expecting, $max_token_index ) = @_;
+
my $i = $i_beg;
my $type = '<';
pos($input_line) = 1 + $rtoken_map->[$i];
my $filter;
+ my $expecting_TERM = $expecting == TERM;
+
# we just have to find the next '>' if a term is expected
- if ( $expecting == TERM ) { $filter = '[\>]' }
+ if ($expecting_TERM) { $filter = '[\>]' }
# we have to guess if we don't know what is expected
elsif ( $expecting == UNKNOWN ) { $filter = '[\>\;\=\#\|\<]' }
# shouldn't happen - we shouldn't be here if operator is expected
else {
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Bad call to find_angle_operator_termination
EOM
}
}
######################################debug#####
- #write_diagnostics( "ANGLE? :$str\n");
+ #$self->write_diagnostics( "ANGLE? :$str\n");
#print "ANGLE: found $1 at pos=$pos str=$str check=$check\n";
######################################debug#####
$type = 'Q';
# If this happens, it may be necessary to split the pretoken.
if ($error) {
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
unexpected error condition returned by inverse_pretoken_map
EOM
}
- warning(
+ $self->warning(
"Possible tokinization error..please check this line\n");
}
+ # Check for accidental formatting of a markup language doc...
+ # Formatting will be skipped if we set _html_tag_count_ and
+ # also set a warning of any kind.
+ my $is_html_tag;
+ my $is_first_string =
+ $i_beg == 0 && $self->[_last_line_number_] == 1;
+
+ # html comment '<!...' of any type
+ if ( $str =~ /^<\s*!/ ) {
+ $is_html_tag = 1;
+ if ($is_first_string) {
+ $self->warning(
+"looks like a markup language, continuing error checks\n"
+ );
+ }
+ }
+
+ # html end tag, something like </h1>
+ elsif ( $str =~ /^<\s*\/\w+\s*>$/ ) {
+ $is_html_tag = 1;
+ }
+
+ # xml prolog?
+ elsif ( $str =~ /^<\?xml\s.*\?>$/i && $is_first_string ) {
+ $is_html_tag = 1;
+ $self->warning(
+ "looks like a markup language, continuing error checks\n");
+ }
+ else {
+ ## doesn't look like a markup tag
+ }
+
+ if ($is_html_tag) {
+ $self->[_html_tag_count_]++;
+ }
+
# count blanks on inside of brackets
my $blank_count = 0;
$blank_count++ if ( $str =~ /<\s+/ );
# Now let's see where we stand....
# OK if math op not possible
- if ( $expecting == TERM ) {
+ if ($expecting_TERM) {
+ }
+
+ elsif ($is_html_tag) {
}
# OK if there are no more than 2 non-blank pre-tokens inside
elsif ( $i <= $i_beg + 3 + $blank_count ) {
# No longer any need to document this common case
- ## write_diagnostics("ANGLE(1 or 2 tokens): $str\n");
+ ## $self->write_diagnostics("ANGLE(1 or 2 tokens): $str\n");
}
# OK if there is some kind of identifier inside
# print $fh <tvg::INPUT>;
elsif ( $str =~ /^<\s*\$?(\w|::|\s)+\s*>$/ ) {
- write_diagnostics("ANGLE (contains identifier): $str\n");
+ $self->write_diagnostics("ANGLE (contains identifier): $str\n");
}
# Not sure..
else {
# Let's try a Brace Test: any braces inside must balance
- my $br = 0;
- while ( $str =~ /\{/g ) { $br++ }
- while ( $str =~ /\}/g ) { $br-- }
- my $sb = 0;
- while ( $str =~ /\[/g ) { $sb++ }
- while ( $str =~ /\]/g ) { $sb-- }
- my $pr = 0;
- while ( $str =~ /\(/g ) { $pr++ }
- while ( $str =~ /\)/g ) { $pr-- }
+ my $br = $str =~ tr/\{/{/ - $str =~ tr/\}/}/;
+ my $sb = $str =~ tr/\[/[/ - $str =~ tr/\]/]/;
+ my $pr = $str =~ tr/\(/(/ - $str =~ tr/\)/)/;
# if braces do not balance - not angle operator
if ( $br || $sb || $pr ) {
$i = $i_beg;
$type = '<';
- write_diagnostics(
+ $self->write_diagnostics(
"NOT ANGLE (BRACE={$br ($pr [$sb ):$str\n");
}
# Tentatively accepting this as a valid angle operator.
# There are lots more things that can be checked.
else {
- write_diagnostics(
+ $self->write_diagnostics(
"ANGLE-Guessing yes: $str expecting=$expecting\n");
- write_logfile_entry("Guessing angle operator here: $str\n");
+ $self->write_logfile_entry(
+ "Guessing angle operator here: $str\n");
}
}
}
# didn't find ending >
else {
- if ( $expecting == TERM ) {
- warning("No ending > for angle operator\n");
+ if ($expecting_TERM) {
+ $self->warning("No ending > for angle operator\n");
}
}
}
sub scan_number_do {
- # scan a number in any of the formats that Perl accepts
+ my ( $self, $input_line, $i, $rtoken_map, $input_type, $max_token_index ) =
+ @_;
+
+ # Scan a number in any of the formats that Perl accepts
# Underbars (_) are allowed in decimal numbers.
- # input parameters -
+ # Given:
# $input_line - the string to scan
# $i - pre_token index to start scanning
# $rtoken_map - reference to the pre_token map giving starting
# character position in $input_line of token $i
- # output parameters -
+ # Return:
# $i - last pre_token index of the number just scanned
+ # $type - the token type ('v' or 'n')
# number - the number (characters); or undef if not a number
- my ( $input_line, $i, $rtoken_map, $input_type, $max_token_index ) = @_;
my $pos_beg = $rtoken_map->[$i];
my $pos;
- my $i_begin = $i;
- my $number = undef;
- my $type = $input_type;
+ ##my $i_begin = $i;
+ my $number = undef;
+ my $type = $input_type;
my $first_char = substr( $input_line, $pos_beg, 1 );
# Look for bad starting characters; Shouldn't happen..
if ( $first_char !~ /[\d\.\+\-Ee]/ ) {
if (DEVEL_MODE) {
- Fault(<<EOM);
+ $self->Fault(<<EOM);
Program bug - scan_number given bad first character = '$first_char'
EOM
}
my $numc = $pos - $pos_beg;
$number = substr( $input_line, $pos_beg, $numc );
$type = 'v';
- report_v_string($number);
}
# handle octal, hex, binary
# /\G[+-]?0(([xX][0-9a-fA-F_]+)|([0-7_]+)|([bB][01_]+))/g )
# (hex) (octal) (binary)
if (
- $input_line =~
+ $input_line =~ m{
- /\G[+-]?0( # leading [signed] 0
+ \G[+-]?0( # leading [signed] 0
# a hex float, i.e. '0x0.b17217f7d1cf78p0'
([xX][0-9a-fA-F_]* # X and optional leading digits
[0-9a-fA-F_]*) # optional Additional exponent digits
# or hex integer
- |([xX][0-9a-fA-F_]+)
+ |([xX][0-9a-fA-F_]+)
# or octal fraction
|([oO]?[0-7_]+ # string of octal digits
|([oO]?[0-7_]+) # string of octal digits
# or a binary float
- |([bB][01_]* # 'b' with string of binary digits
+ |([bB][01_]* # 'b' with string of binary digits
(\.([01][01_]*)?)? # optional decimal and fraction
[Pp][+-]?[01] # Required exponent indicator, no underscore
[01_]*) # additional exponent bits
# or binary integer
- |([bB][01_]+) # 'b' with string of binary digits
+ |([bB][01_]+) # 'b' with string of binary digits
- )/gx
+ )}gx
)
{
$pos = pos($input_line);
my $error;
( $i, $error ) =
inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
- if ($error) { warning("Possibly invalid number\n") }
+ if ($error) { $self->warning("Possibly invalid number\n") }
return ( $i, $type, $number );
} ## end sub scan_number_do
$i--;
last;
}
- }
+ } ## end while ( ++$i <= $max_token_index)
return ( $i, $error );
} ## end sub inverse_pretoken_map
sub find_here_doc {
- # find the target of a here document, if any
- # input parameters:
+ my (
+
+ $self,
+
+ $expecting,
+ $i,
+ $rtokens,
+ $rtoken_type,
+ $rtoken_map_uu,
+ $max_token_index
+
+ ) = @_;
+
+ # Find the target of a here document, if any
+ # Given:
# $i - token index of the second < of <<
# ($i must be less than the last token index if this is called)
- # output parameters:
+ # Return:
# $found_target = 0 didn't find target; =1 found target
# HERE_TARGET - the target string (may be empty string)
# $i - unchanged if not here doc,
# or index of the last token of the here target
# $saw_error - flag noting unbalanced quote on here target
- my ( $expecting, $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
my $ibeg = $i;
my $found_target = 0;
my $here_doc_target = EMPTY_STRING;
# perl allows a backslash before the target string (heredoc.t)
my $backslash = 0;
- if ( $next_token eq '\\' ) {
+ if ( $next_token eq BACKSLASH ) {
$backslash = 1;
$next_token = $rtokens->[ $i + 2 ];
}
my $quoted_string;
(
- $i, $in_quote, $here_quote_character, $quote_pos, $quote_depth,
+
+ $i,
+ $in_quote,
+ $here_quote_character,
+ $quote_pos,
+ $quote_depth,
$quoted_string
)
- = follow_quoted_string( $i_next_nonblank, $in_quote, $rtokens,
- $here_quote_character, $quote_pos, $quote_depth, $max_token_index );
+ = $self->follow_quoted_string(
+
+ $i_next_nonblank,
+ $in_quote,
+ $rtokens,
+ $rtoken_type,
+ $here_quote_character,
+ $quote_pos,
+ $quote_depth,
+ $max_token_index
+ );
if ($in_quote) { # didn't find end of quote, so no target found
$i = $ibeg;
if ( $expecting == TERM ) {
- warning(
+ $self->warning(
"Did not find here-doc string terminator ($here_quote_character) before end of line \n"
);
$saw_error = 1;
elsif ( ( $next_token =~ /^\s*$/ ) and ( $expecting == TERM ) ) {
$found_target = 1;
- write_logfile_entry(
+ $self->write_logfile_entry(
"found blank here-target after <<; suggest using \"\"\n");
$i = $ibeg;
}
my $here_doc_expected;
if ( $expecting == UNKNOWN ) {
- $here_doc_expected = guess_if_here_doc($next_token);
+ $here_doc_expected = $self->guess_if_here_doc($next_token);
}
else {
$here_doc_expected = 1;
if ( $expecting == TERM ) {
$found_target = 1;
- write_logfile_entry("Note: bare here-doc operator <<\n");
+ $self->write_logfile_entry("Note: bare here-doc operator <<\n");
}
else {
$i = $ibeg;
sub do_quote {
- # follow (or continue following) quoted string(s)
- # $in_quote return code:
- # 0 - ok, found end
- # 1 - still must find end of quote whose target is $quote_character
- # 2 - still looking for end of first of two quotes
- #
- # Returns updated strings:
- # $quoted_string_1 = quoted string seen while in_quote=1
- # $quoted_string_2 = quoted string seen while in_quote=2
my (
+ $self,
+
$i,
$in_quote,
$quote_character,
$quoted_string_1,
$quoted_string_2,
$rtokens,
- $rtoken_map,
+ $rtoken_type,
+ $rtoken_map_uu,
$max_token_index,
) = @_;
+ # Follow (or continue following) quoted string(s)
+ # $in_quote = return code:
+ # 0 - ok, found end
+ # 1 - still must find end of quote whose target is $quote_character
+ # 2 - still looking for end of first of two quotes
+ #
+ # Returns updated strings:
+ # $quoted_string_1 = quoted string seen while in_quote=1
+ # $quoted_string_2 = quoted string seen while in_quote=2
+
my $quoted_string;
if ( $in_quote == 2 ) { # two quotes/quoted_string_1s to follow
my $ibeg = $i;
(
- $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
+
+ $i,
+ $in_quote,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
$quoted_string
)
- = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
- $quote_pos, $quote_depth, $max_token_index );
+ = $self->follow_quoted_string(
+
+ $ibeg,
+ $in_quote,
+ $rtokens,
+ $rtoken_type,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $max_token_index
+ );
$quoted_string_2 .= $quoted_string;
if ( $in_quote == 1 ) {
if ( $quote_character =~ /[\{\[\<\(]/ ) { $i++; }
if ( $in_quote == 1 ) { # one (more) quote to follow
my $ibeg = $i;
(
- $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
+
+ $i,
+ $in_quote,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
$quoted_string
)
- = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
- $quote_pos, $quote_depth, $max_token_index );
+ = $self->follow_quoted_string(
+
+ $ibeg,
+ $in_quote,
+ $rtokens,
+ $rtoken_type,
+ $quote_character,
+ $quote_pos,
+ $quote_depth,
+ $max_token_index
+ );
$quoted_string_1 .= $quoted_string;
if ( $in_quote == 1 ) {
$quoted_string_1 .= "\n";
);
} ## end sub do_quote
+# Some possible non-word quote delimiters, for preliminary checking
+my %is_punct_char;
+
+BEGIN {
+
+ my @q = qw# / " ' { } ( ) [ ] < > ; + - * | % ! x ~ = ? : . ^ & #;
+ push @q, '#';
+ push @q, ',';
+ @is_punct_char{@q} = (1) x scalar(@q);
+}
+
sub follow_quoted_string {
- # scan for a specific token, skipping escaped characters
- # if the quote character is blank, use the first non-blank character
- # input parameters:
- # $rtokens = reference to the array of tokens
- # $i = the token index of the first character to search
- # $in_quote = number of quoted strings being followed
- # $beginning_tok = the starting quote character
- # $quote_pos = index to check next for alphanumeric delimiter
- # output parameters:
- # $i = the token index of the ending quote character
- # $in_quote = decremented if found end, unchanged if not
- # $beginning_tok = the starting quote character
- # $quote_pos = index to check next for alphanumeric delimiter
- # $quote_depth = nesting depth, since delimiters '{ ( [ <' can be nested.
- # $quoted_string = the text of the quote (without quotation tokens)
my (
+ $self,
+
$i_beg,
$in_quote,
$rtokens,
+ $rtoken_type,
$beginning_tok,
$quote_pos,
$quote_depth,
) = @_;
+ # Scan for a specific token, skipping escaped characters.
+ # If the quote character is blank, use the first non-blank character.
+ # Given:
+ # $rtokens = reference to the array of tokens
+ # $i = the token index of the first character to search
+ # $in_quote = number of quoted strings being followed
+ # $beginning_tok = the starting quote character
+ # $quote_pos = index to check next for alphanumeric delimiter
+ # Return:
+ # $i = the token index of the ending quote character
+ # $in_quote = decremented if found end, unchanged if not
+ # $beginning_tok = the starting quote character
+ # $quote_pos = index to check next for alphanumeric delimiter
+ # $quote_depth = nesting depth, since delimiters '{ ( [ <' can be nested.
+ # $quoted_string = the text of the quote (without quotation tokens)
my ( $tok, $end_tok );
my $i = $i_beg - 1;
my $quoted_string = EMPTY_STRING;
0 && do {
- print STDOUT
+ print {*STDOUT}
"QUOTE entering with quote_pos = $quote_pos i=$i beginning_tok =$beginning_tok\n";
};
- # get the corresponding end token
- if ( $beginning_tok !~ /^\s*$/ ) {
- $end_tok = matching_end_token($beginning_tok);
+ # for a non-blank token, get the corresponding end token
+ if (
+ $is_punct_char{$beginning_tok}
+ || ( length($beginning_tok)
+ && $beginning_tok !~ /^\s+$/ )
+ )
+ {
+ $end_tok =
+ $matching_end_token{$beginning_tok}
+ ? $matching_end_token{$beginning_tok}
+ : $beginning_tok;
}
- # a blank token means we must find and use the first non-blank one
+ # for a blank token, find and use the first non-blank one
else {
my $allow_quote_comments = ( $i < 0 ) ? 1 : 0; # i<0 means we saw a <cr>
while ( $i < $max_token_index ) {
$tok = $rtokens->[ ++$i ];
- if ( $tok !~ /^\s*$/ ) {
+ if ( $rtoken_type->[$i] ne 'b' ) {
if ( ( $tok eq '#' ) && ($allow_quote_comments) ) {
$i = $max_token_index;
$beginning_tok = $tok;
$quote_pos = 0;
}
- $end_tok = matching_end_token($beginning_tok);
+ $end_tok =
+ $matching_end_token{$beginning_tok}
+ ? $matching_end_token{$beginning_tok}
+ : $beginning_tok;
$quote_depth = 1;
last;
}
else {
$allow_quote_comments = 1;
}
- }
+ } ## end while ( $i < $max_token_index)
}
# There are two different loops which search for the ending quote
# Case 1 (rare): loop for case of alphanumeric quote delimiter..
# "quote_pos" is the position the current word to begin searching
#----------------------------------------------------------------
- if ( $beginning_tok =~ /\w/ ) {
+ if ( !$is_punct_char{$beginning_tok} && $beginning_tok =~ /\w/ ) {
# Note this because it is not recommended practice except
# for obfuscated perl contests
if ( $in_quote == 1 ) {
- write_logfile_entry(
+ $self->write_logfile_entry(
"Note: alphanumeric quote delimiter ($beginning_tok) \n");
}
if ( $quote_pos == 0 || ( $i < 0 ) ) {
$tok = $rtokens->[ ++$i ];
- if ( $tok eq '\\' ) {
+ if ( $tok eq BACKSLASH ) {
# retain backslash unless it hides the end token
$quoted_string .= $tok
- unless $rtokens->[ $i + 1 ] eq $end_tok;
+ unless ( $rtokens->[ $i + 1 ] eq $end_tok );
$quote_pos++;
last if ( $i >= $max_token_index );
$tok = $rtokens->[ ++$i ];
}
my $old_pos = $quote_pos;
- unless ( defined($tok) && defined($end_tok) && defined($quote_pos) )
- {
-
- }
$quote_pos = 1 + index( $tok, $end_tok, $quote_pos );
if ( $quote_pos > 0 ) {
$quoted_string .= substr( $tok, $old_pos );
}
}
- }
+ } ## end while ( $i <= $max_token_index)
}
#-----------------------------------------------------------------------
elsif ( $tok eq $beginning_tok ) {
$quote_depth++;
}
- elsif ( $tok eq '\\' ) {
+ elsif ( $tok eq BACKSLASH ) {
# retain backslash unless it hides the beginning or end token
$tok = $rtokens->[ ++$i ];
- $quoted_string .= '\\'
- unless ( $tok eq $end_tok || $tok eq $beginning_tok );
+ $quoted_string .= BACKSLASH
+ if ( $tok ne $end_tok && $tok ne $beginning_tok );
+ }
+ else {
+ ## nothing special
}
$quoted_string .= $tok;
- }
+ } ## end while ( $i < $max_token_index)
}
if ( $i > $max_token_index ) { $i = $max_token_index }
return (
} ## end sub follow_quoted_string
sub indicate_error {
- my ( $msg, $line_number, $input_line, $pos, $carrat ) = @_;
- interrupt_logfile();
- warning($msg);
- write_error_indicator_pair( $line_number, $input_line, $pos, $carrat );
- resume_logfile();
+ my ( $self, $msg, $line_number, $input_line, $pos, $caret ) = @_;
+
+ # write input line and line with carat's showing where error was detected
+ $self->interrupt_logfile();
+ $self->warning($msg);
+ $self->write_error_indicator_pair( $line_number, $input_line, $pos,
+ $caret );
+ $self->resume_logfile();
return;
} ## end sub indicate_error
sub write_error_indicator_pair {
- my ( $line_number, $input_line, $pos, $carrat ) = @_;
+ my ( $self, $line_number, $input_line, $pos, $caret ) = @_;
my ( $offset, $numbered_line, $underline ) =
make_numbered_line( $line_number, $input_line, $pos );
- $underline = write_on_underline( $underline, $pos - $offset, $carrat );
- warning( $numbered_line . "\n" );
- $underline =~ s/\s*$//;
- warning( $underline . "\n" );
+ $underline = write_on_underline( $underline, $pos - $offset, $caret );
+ $self->warning( $numbered_line . "\n" );
+ $underline =~ s/\s+$//;
+ $self->warning( $underline . "\n" );
return;
} ## end sub write_error_indicator_pair
sub make_numbered_line {
- # Given an input line, its line number, and a character position of
- # interest, create a string not longer than 80 characters of the form
+ my ( $lineno, $str, $pos ) = @_;
+
+ # Given:
+ # $lineno=line number
+ # $str = an input line
+ # $pos = character position of interest
+ # Create a string not longer than 80 characters of the form:
# $lineno: sub_string
- # such that the sub_string of $str contains the position of interest
+ # such that the sub_string of $str contains the position of interest
#
# Here is an example of what we want, in this case we add trailing
# '...' because the line is long.
# - $underline = a blank 'underline' which is all spaces with the same
# number of characters as the numbered line.
- my ( $lineno, $str, $pos ) = @_;
my $offset = ( $pos < 60 ) ? 0 : $pos - 40;
my $excess = length($str) - $offset - 68;
my $numc = ( $excess > 0 ) ? 68 : undef;
sub write_on_underline {
+ my ( $underline, $pos, $pos_chr ) = @_;
+
# The "underline" is a string that shows where an error is; it starts
# out as a string of blanks with the same length as the numbered line of
# code above it, and we have to add marking to show where an error is.
# This is a trivial thing to do with substr, but there is some
# checking to do.
- my ( $underline, $pos, $pos_chr ) = @_;
-
# check for error..shouldn't happen
- unless ( ( $pos >= 0 ) && ( $pos <= length($underline) ) ) {
+ if ( $pos < 0 || $pos > length($underline) ) {
return $underline;
}
my $excess = length($pos_chr) + $pos - length($underline);
if ( $excess > 0 ) {
$pos_chr = substr( $pos_chr, 0, length($pos_chr) - $excess );
}
- substr( $underline, $pos, length($pos_chr) ) = $pos_chr;
+ substr( $underline, $pos, length($pos_chr), $pos_chr );
return ($underline);
} ## end sub write_on_underline
sub pre_tokenize {
- my ( $str, $max_tokens_wanted ) = @_;
+ my ( $str, ($max_tokens_wanted) ) = @_;
- # Input parameter:
+ # Input parameters:
+ # $str = string to be parsed
# $max_tokens_wanted > 0 to stop on reaching this many tokens.
- # = 0 means get all tokens
-
- # Break a string, $str, into a sequence of preliminary tokens. We
- # are interested in these types of tokens:
- # words (type='w'), example: 'max_tokens_wanted'
- # digits (type = 'd'), example: '0755'
- # whitespace (type = 'b'), example: ' '
- # any other single character (i.e. punct; type = the character itself).
- # We cannot do better than this yet because we might be in a quoted
- # string or pattern. Caller sets $max_tokens_wanted to 0 to get all
- # tokens.
+ # = undef or 0 means get all tokens
+
+ # Break a string, $str, into a sequence of preliminary tokens (pre-tokens).
+ # We look for these types of tokens:
+ # words (type='w'), example: 'max_tokens_wanted'
+ # digits (type = 'd'), example: '0755'
+ # whitespace (type = 'b'), example: ' '
+ # single character punct (type = char) example: '='
+
+ # Later operations will combine one or more of these pre-tokens into final
+ # tokens. We cannot do better than this yet because we might be in a
+ # quoted string or pattern.
# An advantage of doing this pre-tokenization step is that it keeps almost
- # all of the regex work highly localized. A disadvantage is that in some
- # very rare instances we will have to go back and split a pre-token.
+ # all of the regex parsing very simple and localized right here. A
+ # disadvantage is that in some extremely rare instances we will have to go
+ # back and split a pre-token.
# Return parameters:
my @tokens = (); # array of the tokens themselves
my @token_map = (0); # string position of start of each token
my @type = (); # 'b'=whitespace, 'd'=digits, 'w'=alpha, or punct
- do {
-
- # whitespace
- if ( $str =~ /\G(\s+)/gc ) { push @type, 'b'; }
+ if ( !$max_tokens_wanted ) { $max_tokens_wanted = -1 }
- # numbers
- # note that this must come before words!
- elsif ( $str =~ /\G(\d+)/gc ) { push @type, 'd'; }
+ while ( $max_tokens_wanted-- ) {
- # words
- elsif ( $str =~ /\G(\w+)/gc ) { push @type, 'w'; }
-
- # single-character punctuation
- elsif ( $str =~ /\G(\W)/gc ) { push @type, $1; }
+ if (
+ $str =~ m{
+ \G(
+ (\s+) # type 'b' = whitespace - this must come before \W
+ | (\W) # or type=char = single-character, non-whitespace punct
+ | (\d+) # or type 'd' = sequence of digits - must come before \w
+ | (\w+) # or type 'w' = words not starting with a digit
+ )
+ }gcx
+ )
+ {
+ push @tokens, $1;
+ push @type,
+ defined($2) ? 'b' : defined($3) ? $1 : defined($4) ? 'd' : 'w';
+ push @token_map, pos($str);
+ }
# that's all..
else {
return ( \@tokens, \@token_map, \@type );
}
-
- push @tokens, $1;
- push @token_map, pos($str);
-
- } while ( --$max_tokens_wanted != 0 );
+ } ## end while ( $max_tokens_wanted...)
return ( \@tokens, \@token_map, \@type );
} ## end sub pre_tokenize
sub show_tokens {
- # this is an old debug routine
- # not called, but saved for reference
+ # This is an uncalled debug routine, saved for reference
my ( $rtokens, $rtoken_map ) = @_;
my $num = scalar( @{$rtokens} );
foreach my $i ( 0 .. $num - 1 ) {
my $len = length( $rtokens->[$i] );
- print STDOUT "$i:$len:$rtoken_map->[$i]:$rtokens->[$i]:\n";
+ print {*STDOUT} "$i:$len:$rtoken_map->[$i]:$rtokens->[$i]:\n";
}
return;
} ## end sub show_tokens
-{ ## closure for sub matching end token
- my %matching_end_token;
-
- BEGIN {
- %matching_end_token = (
- '{' => '}',
- '(' => ')',
- '[' => ']',
- '<' => '>',
- );
- } ## end BEGIN
-
- sub matching_end_token {
-
- # return closing character for a pattern
- my $beginning_token = shift;
- if ( $matching_end_token{$beginning_token} ) {
- return $matching_end_token{$beginning_token};
- }
- return ($beginning_token);
- } ## end sub matching_end_token
-}
-
sub dump_token_types {
my ( $class, $fh ) = @_;
# adding NEW_TOKENS: add a comment here
$fh->print(<<'END_OF_LIST');
-Here is a list of the token types currently used for lines of type 'CODE'.
-For the following tokens, the "type" of a token is just the token itself.
+Here is a list of the token types currently used for lines of type 'CODE'.
+For the following tokens, the "type" of a token is just the token itself.
.. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
( ) <= >= == =~ !~ != ++ -- /= x=
-... **= <<= >>= &&= ||= //= <=>
-, + - / * | % ! x ~ = \ ? : . < > ^ &
+... **= <<= >>= &&= ||= //= <=>
+, + - / * | % ! x ~ = \ ? : . < > ^ & ^^
The following additional token types are defined:
type meaning
- b blank (white space)
+ b blank (white space)
{ indent: opening structural curly brace or square bracket or paren
(code block, anonymous hash reference, or anonymous array reference)
} outdent: right structural curly brace or square bracket or paren
( left non-structural paren (all but a list right of an =)
) right non-structural paren
L left non-structural curly brace (enclosing a key)
- R right non-structural curly brace
+ R right non-structural curly brace
; terminal semicolon
f indicates a semicolon in a "for" statement
h here_doc operator <<
C user-defined constant or constant function (with void prototype = ())
U user-defined function taking parameters
G user-defined function taking block parameter (like grep/map/eval)
- M (unused, but reserved for subroutine definition name)
- P (unused, but -html uses it to label pod text)
- t type indicater such as %,$,@,*,&,sub
+ S sub definition (reported as type 'i' in older versions)
+ P package definition (reported as type 'i' in older versions)
+ t type indicator such as %,$,@,*,&,sub
w bare word (perhaps a subroutine call)
i identifier of some type (with leading %, $, @, *, &, sub, -> )
n a number
p unary +
m unary -
pp pre-increment operator ++
- mm pre-decrement operator --
+ mm pre-decrement operator --
A : used as attribute separator
-
+
Here are the '_line_type' codes used internally:
SYSTEM - system-specific code before hash-bang line
CODE - line of perl code (including comments)
return;
} ## end sub dump_token_types
+#------------------
+# About Token Types
+#------------------
+
+# The array "valid_token_types" in the BEGIN section has an up-to-date list
+# of token types. Sub 'dump_token_types' should be kept up to date with
+# token types.
+
+# Ideally, tokens are the smallest pieces of text
+# such that a newline may be inserted between any pair of tokens without
+# changing or invalidating the program. This version comes close to this,
+# although there are necessarily a few exceptions which must be caught by
+# the formatter. Many of these involve the treatment of bare words.
+#
+# To simplify things, token types are either a single character, or they
+# are identical to the tokens themselves.
+#
+# As a debugging aid, the -D flag creates a file containing a side-by-side
+# comparison of the input string and its tokenization for each line of a file.
+# This is an invaluable debugging aid.
+#
+# In addition to tokens, and some associated quantities, the tokenizer
+# also returns flags indication any special line types. These include
+# quotes, here_docs, formats.
+#
+#------------------
+# Adding NEW_TOKENS
+#------------------
+#
+# Here are some notes on the minimal steps. I wrote these notes while
+# adding the 'v' token type for v-strings, which are things like version
+# numbers 5.6.0, and ip addresses, and will use that as an example. ( You
+# can use your editor to search for the string "NEW_TOKENS" to find the
+# appropriate sections to change):
+
+# *. For another example, search for the smartmatch operator '~~'
+# with your editor to see where updates were made for it.
+
+# *. For another example, search for the string 'c250', which shows
+# locations where changes for new types 'P' and 'S' were made.
+
+# *. Think of a new, unused character for the token type, and add to
+# the array @valid_token_types in the BEGIN section of this package.
+# For example, I used 'v' for v-strings.
+#
+# *. Implement coding to recognize the $type of the token in this routine.
+# This is the hardest part, and is best done by imitating or modifying
+# some of the existing coding. For example, to recognize v-strings, I
+# patched 'sub scan_bare_identifier' to recognize v-strings beginning with
+# 'v' and 'sub scan_number' to recognize v-strings without the leading 'v'.
+#
+# *. Update sub operator_expected. This update is critically important but
+# the coding is trivial. Look at the comments in that routine for help.
+# For v-strings, which should behave like numbers, I just added 'v' to the
+# regex used to handle numbers and strings (types 'n' and 'Q').
+#
+# *. Implement a 'bond strength' rule in sub set_bond_strengths in
+# Perl::Tidy::Formatter for breaking lines around this token type. You can
+# skip this step and take the default at first, then adjust later to get
+# desired results. For adding type 'v', I looked at sub bond_strength and
+# saw that number type 'n' was using default strengths, so I didn't do
+# anything. I may tune it up someday if I don't like the way line
+# breaks with v-strings look.
+#
+# *. Implement a 'whitespace' rule in sub set_whitespace_flags in
+# Perl::Tidy::Formatter. For adding type 'v', I looked at this routine
+# and saw that type 'n' used spaces on both sides, so I just added 'v'
+# to the array @spaces_both_sides.
+#
+# *. Update HtmlWriter package so that users can colorize the token as
+# desired. This is quite easy; see comments identified by 'NEW_TOKENS' in
+# that package. For v-strings, I initially chose to use a default color
+# equal to the default for numbers, but it might be nice to change that
+# eventually.
+#
+# *. Update comments in Perl::Tidy::Tokenizer::dump_token_types.
+#
+# *. Run lots and lots of debug tests. Start with special files designed
+# to test the new token type. Run with the -D flag to create a .DEBUG
+# file which shows the tokenization. When these work ok, test as many old
+# scripts as possible. Start with all of the '.t' files in the 'test'
+# directory of the distribution file. Compare .tdy output with previous
+# version and updated version to see the differences. Then include as
+# many more files as possible. My own technique has been to collect a huge
+# number of perl scripts (thousands!) into one directory and run perltidy
+# *, then run diff between the output of the previous version and the
+# current version.
+
BEGIN {
# These names are used in error messages
my @q;
- my @digraphs = qw(
+ my @digraphs = qw#
.. :: << >> ** && || // -> => += -= .= %= &= |= ^= *= <>
- <= >= == =~ !~ != ++ -- /= x= ~~ ~. |. &. ^.
- );
+ <= >= == =~ !~ != ++ -- /= x= ~~ ~. |. &. ^. ^^
+ #;
@is_digraph{@digraphs} = (1) x scalar(@digraphs);
@q = qw(
);
@can_start_digraph{@q} = (1) x scalar(@q);
- my @trigraphs = qw( ... **= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~);
+ my @trigraphs = qw( ... **= <<= >>= &&= ||= //= <=> !~~ &.= |.= ^.= <<~ );
@is_trigraph{@trigraphs} = (1) x scalar(@trigraphs);
my @tetragraphs = qw( <<>> );
# make a hash of all valid token types for self-checking the tokenizer
# (adding NEW_TOKENS : select a new character and add to this list)
+ # fix for c250: added new token type 'P' and 'S'
my @valid_token_types = qw#
- A b C G L R f h Q k t w i q n p m F pp mm U j J Y Z v
+ A b C G L R f h Q k t w i q n p m F pp mm U j J Y Z v P S
{ } ( ) [ ] ; + - / * | % ! x ~ = \ ? : . < > ^ &
#;
push( @valid_token_types, @digraphs );
# a list of file test letters, as in -e (Table 3-4 of 'camel 3')
my @file_test_operators =
- qw( A B C M O R S T W X b c d e f g k l o p r s t u w x z);
+ qw( A B C M O R S T W X b c d e f g k l o p r s t u w x z );
@is_file_test_operator{@file_test_operators} =
(1) x scalar(@file_test_operators);
@is_block_operator{@q} = (1) x scalar(@q);
# these functions allow an identifier in the indirect object slot
- @q = qw( print printf sort exec system say);
+ @q = qw( print printf sort exec system say );
@is_indirect_object_taker{@q} = (1) x scalar(@q);
# Note: 'field' will be added by sub check_options if --use-feature=class
- @q = qw(my our state);
+ @q = qw( my our state );
@is_my_our_state{@q} = (1) x scalar(@q);
# These tokens may precede a code block
# now and we could let the extended-syntax coding handle them.
# Added 'default' for Switch::Plain.
# Note: 'ADJUST' will be added by sub check_options if --use-feature=class
- @q =
- qw( BEGIN END CHECK INIT AUTOLOAD DESTROY UNITCHECK continue if elsif else
- unless do while until eval for foreach map grep sort
- switch case given when default catch try finally);
+ @q = qw(
+ BEGIN END CHECK INIT AUTOLOAD DESTROY
+ UNITCHECK continue if elsif else unless
+ do while until eval for foreach
+ map grep sort switch case given
+ when default catch try finally
+ );
@is_code_block_token{@q} = (1) x scalar(@q);
# Note: this hash was formerly named '%is_not_zero_continuation_block_type'
# to their right, or at least they are not expected to be followed
# immediately by operators.
my @value_requestor = qw(
- AUTOLOAD
- BEGIN
- CHECK
- DESTROY
- END
- EQ
- GE
- GT
- INIT
- LE
- LT
- NE
- UNITCHECK
- abs
- accept
- alarm
- and
- atan2
- bind
- binmode
- bless
- break
- caller
- chdir
- chmod
- chomp
- chop
- chown
- chr
- chroot
- close
- closedir
- cmp
- connect
- continue
- cos
- crypt
- dbmclose
- dbmopen
- defined
- delete
- die
- dump
- each
- else
- elsif
- eof
- eq
- evalbytes
- exec
- exists
- exit
- exp
- fc
- fcntl
- fileno
- flock
- for
- foreach
- formline
- ge
- getc
- getgrgid
- getgrnam
- gethostbyaddr
- gethostbyname
- getnetbyaddr
- getnetbyname
- getpeername
- getpgrp
- getpriority
- getprotobyname
- getprotobynumber
- getpwnam
- getpwuid
- getservbyname
- getservbyport
- getsockname
- getsockopt
- glob
- gmtime
- goto
- grep
- gt
- hex
- if
- index
- int
- ioctl
- join
- keys
- kill
- last
- lc
- lcfirst
- le
- length
- link
- listen
- local
- localtime
- lock
- log
- lstat
- lt
- map
- mkdir
- msgctl
- msgget
- msgrcv
- msgsnd
- my
- ne
- next
- no
- not
- oct
- open
- opendir
- or
- ord
- our
- pack
- pipe
- pop
- pos
- print
- printf
- prototype
- push
- quotemeta
- rand
- read
- readdir
- readlink
- readline
- readpipe
- recv
- redo
- ref
- rename
- require
- reset
- return
- reverse
- rewinddir
- rindex
- rmdir
- scalar
- seek
- seekdir
- select
- semctl
- semget
- semop
- send
- sethostent
- setnetent
- setpgrp
- setpriority
- setprotoent
- setservent
- setsockopt
- shift
- shmctl
- shmget
- shmread
- shmwrite
- shutdown
- sin
- sleep
- socket
- socketpair
- sort
- splice
- split
- sprintf
- sqrt
- srand
- stat
- state
- study
- substr
- symlink
- syscall
- sysopen
- sysread
- sysseek
- system
- syswrite
- tell
- telldir
- tie
- tied
- truncate
- uc
- ucfirst
- umask
- undef
- unless
- unlink
- unpack
- unshift
- untie
- until
- use
- utime
- values
- vec
- waitpid
- warn
- while
- write
- xor
-
- switch
- case
- default
- given
- when
- err
- say
- isa
-
- catch
-
+ AUTOLOAD BEGIN CHECK DESTROY
+ END EQ GE GT
+ INIT LE LT NE
+ UNITCHECK abs accept alarm
+ and atan2 bind binmode
+ bless break caller chdir
+ chmod chomp chop chown
+ chr chroot close closedir
+ cmp connect continue cos
+ crypt dbmclose dbmopen defined
+ delete die dump each
+ else elsif eof eq
+ evalbytes exec exists exit
+ exp fc fcntl fileno
+ flock for foreach formline
+ ge getc getgrgid getgrnam
+ gethostbyaddr gethostbyname getnetbyaddr getnetbyname
+ getpeername getpgrp getpriority getprotobyname
+ getprotobynumber getpwnam getpwuid getservbyname
+ getservbyport getsockname getsockopt glob
+ gmtime goto grep gt
+ hex if index int
+ ioctl join keys kill
+ last lc lcfirst le
+ length link listen local
+ localtime lock log lstat
+ lt map mkdir msgctl
+ msgget msgrcv msgsnd my
+ ne next no not
+ oct open opendir or
+ ord our pack pipe
+ pop pos print printf
+ prototype push quotemeta rand
+ read readdir readlink readline
+ readpipe recv redo ref
+ rename require reset return
+ reverse rewinddir rindex rmdir
+ scalar seek seekdir select
+ semctl semget semop send
+ sethostent setnetent setpgrp setpriority
+ setprotoent setservent setsockopt shift
+ shmctl shmget shmread shmwrite
+ shutdown sin sleep socket
+ socketpair sort splice split
+ sprintf sqrt srand stat
+ state study substr symlink
+ syscall sysopen sysread sysseek
+ system syswrite tell telldir
+ tie tied truncate uc
+ ucfirst umask undef unless
+ unlink unpack unshift untie
+ until use utime values
+ vec waitpid warn while
+ write xor case catch
+ default err given isa
+ say switch when
);
# Note: 'ADJUST', 'field' are added by sub check_options
push( @Keywords, @value_requestor );
# These are treated the same but are not keywords:
- my @extra_vr = qw(
- constant
- vars
- );
+ my @extra_vr = qw( constant vars );
push( @value_requestor, @extra_vr );
@expecting_term_token{@value_requestor} = (1) x scalar(@value_requestor);
# so that they might be followed by an operator, or at least
# not a term.
my @operator_requestor = qw(
- endgrent
- endhostent
- endnetent
- endprotoent
- endpwent
- endservent
- fork
- getgrent
- gethostent
- getlogin
- getnetent
- getppid
- getprotoent
- getpwent
- getservent
- setgrent
- setpwent
- time
- times
- wait
+ endgrent endhostent endnetent endprotoent
+ endpwent endservent fork getgrent
+ gethostent getlogin getnetent getppid
+ getprotoent getpwent getservent setgrent
+ setpwent time times wait
wantarray
);
push( @Keywords, @operator_requestor );
# These are treated the same but are not considered keywords:
- my @extra_or = qw(
- STDERR
- STDIN
- STDOUT
- );
+ my @extra_or = qw( STDERR STDIN STDOUT );
push( @operator_requestor, @extra_or );
# these token TYPES expect trailing operator but not a term
# note: ++ and -- are post-increment and decrement, 'C' = constant
my @operator_requestor_types = qw( ++ -- C <> q );
+
+ # NOTE: This hash is available but not currently used
@expecting_operator_types{@operator_requestor_types} =
(1) x scalar(@operator_requestor_types);
**= += -= .= /= *= %= x= &= |= ^= <<= >>= &&= ||= //=
<= >= == != => \ > < % * / ? & | ** <=> ~~ !~~ <<~
f F pp mm Y p m U J G j >> << ^ t
- ~. ^. |. &. ^.= |.= &.=
+ ~. ^. |. &. ^.= |.= &.= ^^
#;
push( @value_requestor_type, ',' )
; # (perl doesn't like a ',' in a qw block)
+
+ # NOTE: This hash is available but not currently used
@expecting_term_types{@value_requestor_type} =
(1) x scalar(@value_requestor_type);
# For simple syntax checking, it is nice to have a list of operators which
# will really be unhappy if not followed by a term. This includes most
# of the above...
- %really_want_term = %expecting_term_types;
+ @really_want_term{@value_requestor_type} =
+ (1) x scalar(@value_requestor_type);
# with these exceptions...
delete $really_want_term{'U'}; # user sub, depends on prototype
delete $really_want_term{'F'}; # file test works on $_ if no following term
delete $really_want_term{'Y'}; # indirect object, too risky to check syntax;
# let perl do it
- @q = qw(q qq qx qr s y tr m);
+ @q = qw( q qq qx qr s y tr m );
@is_q_qq_qx_qr_s_y_tr_m{@q} = (1) x scalar(@q);
# Note added 'qw' here
- @q = qw(q qq qw qx qr s y tr m);
+ @q = qw( q qq qw qx qr s y tr m );
@is_q_qq_qw_qx_qr_s_y_tr_m{@q} = (1) x scalar(@q);
+ # Quote modifiers:
+ # original ref: camel 3 p 147,
+ # but perl may accept undocumented flags
+ # perl 5.10 adds 'p' (preserve)
+ # Perl version 5.22 added 'n'
+ # From http://perldoc.perl.org/perlop.html we have
+ # /PATTERN/msixpodualngc or m?PATTERN?msixpodualngc
+ # s/PATTERN/REPLACEMENT/msixpodualngcer
+ # y/SEARCHLIST/REPLACEMENTLIST/cdsr
+ # tr/SEARCHLIST/REPLACEMENTLIST/cdsr
+ # qr/STRING/msixpodualn
+ %quote_modifiers = (
+ 's' => '[msixpodualngcer]',
+ 'y' => '[cdsr]',
+ 'tr' => '[cdsr]',
+ 'm' => '[msixpodualngc]',
+ 'qr' => '[msixpodualn]',
+ 'q' => EMPTY_STRING,
+ 'qq' => EMPTY_STRING,
+ 'qw' => EMPTY_STRING,
+ 'qx' => EMPTY_STRING,
+ );
+
# Note: 'class' will be added by sub check_options if -use-feature=class
- @q = qw(package);
+ @q = qw( package );
@is_package{@q} = (1) x scalar(@q);
- @q = qw( ? : );
- push @q, ',';
- @is_comma_question_colon{@q} = (1) x scalar(@q);
-
@q = qw( if elsif unless );
@is_if_elsif_unless{@q} = (1) x scalar(@q);
@other_line_endings{@q} = (1) x scalar(@q);
# These keywords are handled specially in the tokenizer code:
- my @special_keywords = qw(
- do
- eval
- format
- m
- package
- q
- qq
- qr
- qw
- qx
- s
- sub
- tr
- y
- );
+ my @special_keywords =
+ qw( do eval format m package q qq qr qw qx s sub tr y );
push( @Keywords, @special_keywords );
# Keywords after which list formatting may be used
# WARNING: do not include |map|grep|eval or perl may die on
# syntax errors (map1.t).
my @keyword_taking_list = qw(
- and
- chmod
- chomp
- chop
- chown
- dbmopen
- die
- elsif
- exec
- fcntl
- for
- foreach
- formline
- getsockopt
- if
- index
- ioctl
- join
- kill
- local
- msgctl
- msgrcv
- msgsnd
- my
- open
- or
- our
- pack
- print
- printf
- push
- read
- readpipe
- recv
- return
- reverse
- rindex
- seek
- select
- semctl
- semget
- send
- setpriority
- setsockopt
- shmctl
- shmget
- shmread
- shmwrite
- socket
- socketpair
- sort
- splice
- split
- sprintf
- state
- substr
- syscall
- sysopen
- sysread
- sysseek
- system
- syswrite
- tie
- unless
- unlink
- unpack
- unshift
- until
- vec
- warn
+ and chmod chomp chop
+ chown dbmopen die elsif
+ exec fcntl for foreach
+ formline getsockopt given if
+ index ioctl join kill
+ local msgctl msgrcv msgsnd
+ my open or our
+ pack print printf push
+ read readpipe recv return
+ reverse rindex seek select
+ semctl semget send setpriority
+ setsockopt shmctl shmget shmread
+ shmwrite socket socketpair sort
+ splice split sprintf state
+ substr syscall sysopen sysread
+ sysseek system syswrite tie
+ unless unlink unpack unshift
+ until vec warn when
while
- given
- when
);
+
+ # NOTE: This hash is available but not currently used
@is_keyword_taking_list{@keyword_taking_list} =
(1) x scalar(@keyword_taking_list);
# This list is used to decide if a pattern delimited by slashes, /pattern/,
# can follow one of these keywords.
- @q = qw(
- chomp eof eval fc lc pop shift uc undef
- );
+ @q = qw( chomp eof eval fc lc pop shift uc undef );
@is_keyword_rejecting_slash_as_pattern_delimiter{@q} =
(1) x scalar(@q);
# currently only used to disambiguate a ? used as a ternary from one used
# as a (deprecated) pattern delimiter. In the future, they might be used
# to give a warning about ambiguous syntax before a /.
- # Note: split has been omitted (see not below).
+ # Note: split has been omitted (see note below).
my @keywords_taking_optional_arg = qw(
- abs
- alarm
- caller
- chdir
- chomp
- chop
- chr
- chroot
- close
- cos
- defined
- die
- eof
- eval
- evalbytes
- exit
- exp
- fc
- getc
- glob
- gmtime
- hex
- int
- last
- lc
- lcfirst
- length
- localtime
- log
- lstat
- mkdir
- next
- oct
- ord
- pop
- pos
- print
- printf
- prototype
- quotemeta
- rand
- readline
- readlink
- readpipe
- redo
- ref
- require
- reset
- reverse
- rmdir
- say
- select
- shift
- sin
- sleep
- sqrt
- srand
- stat
- study
- tell
- uc
- ucfirst
- umask
- undef
- unlink
- warn
+ abs alarm caller chdir chomp chop
+ chr chroot close cos defined die
+ eof eval evalbytes exit exp fc
+ getc glob gmtime hex int last
+ lc lcfirst length localtime log lstat
+ mkdir next oct ord pop pos
+ print printf prototype quotemeta rand readline
+ readlink readpipe redo ref require reset
+ reverse rmdir say select shift sin
+ sleep sqrt srand stat study tell
+ uc ucfirst umask undef unlink warn
write
);
@is_keyword_taking_optional_arg{@keywords_taking_optional_arg} =
# __DATA__ __END__
@is_keyword{@Keywords} = (1) x scalar(@Keywords);
+
+ %matching_end_token = (
+ '{' => '}',
+ '(' => ')',
+ '[' => ']',
+ '<' => '>',
+ );
} ## end BEGIN
+
+} ## end package Perl::Tidy::Tokenizer
1;
use strict;
use warnings;
use Carp;
+
+{ #<<< A non-indenting brace to contain all lexical variables
+
+our $VERSION = '20250105';
use English qw( -no_match_vars );
-our $VERSION = '20230309';
+use Scalar::Util 'refaddr'; # perl 5.8.1 and later
use Perl::Tidy::VerticalAligner::Alignment;
use Perl::Tidy::VerticalAligner::Line;
# sub _flush_comment_lines
# CODE SECTION 5: Code to process groups of code lines
# sub _flush_group_lines
-# CODE SECTION 6: Output Step A
+# CODE SECTION 6: Pad Signed Number Columns
+# sub pad_signed_number_columns
+# CODE SECTION 7: Pad Wide Equals Columns
+# sub pad_wide_equals_columns
+# CODE SECTION 8: Output Step A
# sub valign_output_step_A
-# CODE SECTION 7: Output Step B
+# CODE SECTION 9: Output Step B
# sub valign_output_step_B
-# CODE SECTION 8: Output Step C
+# CODE SECTION 10: Output Step C
# sub valign_output_step_C
-# CODE SECTION 9: Output Step D
+# CODE SECTION 11: Output Step D
# sub valign_output_step_D
-# CODE SECTION 10: Summary
+# CODE SECTION 12: Summary
# sub report_anything_unusual
##################################################################
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
# except if there has been a bug introduced by a recent program change.
# Please add comments at calls to Fault to explain why the call
# should not occur, and where to look to fix it.
- my ( $package0, $filename0, $line0, $subroutine0 ) = caller(0);
- my ( $package1, $filename1, $line1, $subroutine1 ) = caller(1);
- my ( $package2, $filename2, $line2, $subroutine2 ) = caller(2);
+ my ( $package0_uu, $filename0_uu, $line0, $subroutine0_uu ) = caller(0);
+ my ( $package1_uu, $filename1, $line1, $subroutine1 ) = caller(1);
+ my ( $package2_uu, $filename2_uu, $line2_uu, $subroutine2 ) = caller(2);
my $pkg = __PACKAGE__;
my $input_stream_name = get_input_stream_name();
$pkg reports VERSION='$VERSION'.
==============================================================================
EOM
-
- # We shouldn't get here, but this return is to keep Perl-Critic from
- # complaining.
- return;
+ croak "unexpected return from sub Die";
} ## end sub Fault
my %valid_LINE_keys;
_file_writer_object_ => $i++,
_logger_object_ => $i++,
_diagnostics_object_ => $i++,
- _length_function_ => $i++,
-
- _rOpts_ => $i++,
- _rOpts_indent_columns_ => $i++,
- _rOpts_tabs_ => $i++,
- _rOpts_entab_leading_whitespace_ => $i++,
- _rOpts_fixed_position_side_comment_ => $i++,
- _rOpts_minimum_space_to_comment_ => $i++,
- _rOpts_valign_code_ => $i++,
- _rOpts_valign_block_comments_ => $i++,
- _rOpts_valign_side_comments_ => $i++,
+
+ _rOpts_ => $i++,
_last_level_written_ => $i++,
_last_side_comment_column_ => $i++,
use constant DEBUG_TABS => 0;
my $debug_warning = sub {
- print STDOUT "VALIGN_DEBUGGING with key $_[0]\n";
+ my $msg = shift;
+ print {*STDOUT} "VALIGN_DEBUGGING with key $msg\n";
return;
};
%valign_control_hash,
$valign_control_default,
+ $rOpts_indent_columns,
+ $rOpts_tabs,
+ $rOpts_entab_leading_whitespace,
+ $rOpts_fixed_position_side_comment,
+ $rOpts_maximum_line_length,
+ $rOpts_minimum_space_to_comment,
+ $rOpts_valign_code,
+ $rOpts_valign_block_comments,
+ $rOpts_valign_side_comments,
+ $rOpts_valign_signed_numbers,
+ $rOpts_valign_signed_numbers_limit,
+ $rOpts_valign_wide_equals,
+
+ $require_tabs,
+
);
sub check_options {
}
}
+ # Initialize some global options
+ $rOpts_indent_columns = $rOpts->{'indent-columns'};
+ $rOpts_tabs = $rOpts->{'tabs'};
+ $rOpts_entab_leading_whitespace = $rOpts->{'entab-leading-whitespace'};
+ $require_tabs = ( $rOpts_tabs || $rOpts_entab_leading_whitespace )
+ && $rOpts_indent_columns > 0;
+
+ $rOpts_fixed_position_side_comment =
+ $rOpts->{'fixed-position-side-comment'};
+
+ $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
+ $rOpts_minimum_space_to_comment = $rOpts->{'minimum-space-to-comment'};
+ $rOpts_valign_code = $rOpts->{'valign-code'};
+ $rOpts_valign_block_comments = $rOpts->{'valign-block-comments'};
+ $rOpts_valign_side_comments = $rOpts->{'valign-side-comments'};
+ $rOpts_valign_signed_numbers = $rOpts->{'valign-signed-numbers'};
+ $rOpts_valign_signed_numbers_limit =
+ $rOpts->{'valign-signed-numbers-limit'};
+ $rOpts_valign_wide_equals = $rOpts->{'valign-wide-equals'};
+
return;
} ## end sub check_options
sub new {
- my ( $class, @args ) = @_;
+ my ( $class, @arglist ) = @_;
+ if ( @arglist % 2 ) { croak "Odd number of items in arg hash list\n" }
my %defaults = (
rOpts => undef,
file_writer_object => undef,
logger_object => undef,
diagnostics_object => undef,
- length_function => sub { return length( $_[0] ) },
);
- my %args = ( %defaults, @args );
+ my %args = ( %defaults, @arglist );
# Initialize other caches and buffers
initialize_step_B_cache();
initialize_valign_buffer();
- initialize_leading_string_cache();
initialize_decode();
set_logger_object( $args{logger_object} );
$self->[_file_writer_object_] = $args{file_writer_object};
$self->[_logger_object_] = $args{logger_object};
$self->[_diagnostics_object_] = $args{diagnostics_object};
- $self->[_length_function_] = $args{length_function};
- # shortcuts to user options
+ # shortcut to user options
my $rOpts = $args{rOpts};
-
- $self->[_rOpts_] = $rOpts;
- $self->[_rOpts_indent_columns_] = $rOpts->{'indent-columns'};
- $self->[_rOpts_tabs_] = $rOpts->{'tabs'};
- $self->[_rOpts_entab_leading_whitespace_] =
- $rOpts->{'entab-leading-whitespace'};
- $self->[_rOpts_fixed_position_side_comment_] =
- $rOpts->{'fixed-position-side-comment'};
- $self->[_rOpts_minimum_space_to_comment_] =
- $rOpts->{'minimum-space-to-comment'};
- $self->[_rOpts_valign_code_] = $rOpts->{'valign-code'};
- $self->[_rOpts_valign_block_comments_] = $rOpts->{'valign-block-comments'};
- $self->[_rOpts_valign_side_comments_] = $rOpts->{'valign-side-comments'};
+ $self->[_rOpts_] = $rOpts;
# Batch of lines being collected
$self->[_rgroup_lines_] = [];
# push things out the pipeline...
# push out any current group lines
- $self->_flush_group_lines();
+ $self->_flush_group_lines()
+ if ( @{ $self->[_rgroup_lines_] } );
# then anything left in the cache of step_B
$self->_flush_step_B_cache();
sub initialize_for_new_group {
my ($self) = @_;
+ # initialize for a new group of lines to be aligned vertically
+
$self->[_rgroup_lines_] = [];
$self->[_group_type_] = EMPTY_STRING;
$self->[_zero_count_] = 0;
} ## end sub initialize_for_new_group
sub group_line_count {
- return +@{ $_[0]->[_rgroup_lines_] };
+ my $self = shift;
+ return +@{ $self->[_rgroup_lines_] };
}
# interface to Perl::Tidy::Diagnostics routines
return;
}
- sub get_logger_object {
- return $logger_object;
- }
-
sub get_input_stream_name {
my $input_stream_name = EMPTY_STRING;
if ($logger_object) {
use constant DEBUG_VALIGN => 0;
use constant SC_LONG_LINE_DIFF => 12;
+my %is_opening_token;
my %is_closing_token;
+my %is_digit_char;
+my %is_plus_or_minus;
+my %is_if_or;
+my %is_assignment;
+my %is_comma_token;
+my %is_good_marginal_alignment;
BEGIN {
- my @q = qw< } ) ] >;
+
+ my @q = qw< { ( [ >;
+ @is_opening_token{@q} = (1) x scalar(@q);
+
+ @q = qw< } ) ] >;
@is_closing_token{@q} = (1) x scalar(@q);
+
+ @q = qw( 0 1 2 3 4 5 6 7 8 9 );
+ @is_digit_char{@q} = (1) x scalar(@q);
+
+ @q = qw( + - );
+ @is_plus_or_minus{@q} = (1) x scalar(@q);
+
+ @q = qw( if unless or || );
+ @is_if_or{@q} = (1) x scalar(@q);
+
+ @q = qw( = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x= );
+ @is_assignment{@q} = (1) x scalar(@q);
+
+ @q = qw( => );
+ push @q, ',';
+ @is_comma_token{@q} = (1) x scalar(@q);
+
+ # We can be less restrictive in marginal cases at certain "good" alignments
+ @q = qw( { ? => = );
+ push @q, (',');
+ @is_good_marginal_alignment{@q} = (1) x scalar(@q);
+
}
#--------------------------------------------
# we receive one line of specially marked text for vertical alignment.
# We compare the line with the current group, and either:
# - the line joins the current group if alignments match, or
- # - the current group is flushed and a new group is started otherwise
+ # - the current group is flushed and a new group is started
#---------------------------------------------------------------------
#
# The key input parameters describing each line are:
DEBUG_VALIGN && do {
my $nlines = $self->group_line_count();
- print STDOUT
+ print {*STDOUT}
"Entering valign_input: lines=$nlines new #fields= $jmax, leading_count=$leading_space_count, level=$level, group_level=$group_level, level_end=$level_end\n";
};
my $line_count = $self->group_line_count();
my $min_lines = $rvertical_tightness_flags->{_vt_min_lines};
my $max_lines = $rvertical_tightness_flags->{_vt_max_lines};
- $min_lines = 0 unless ($min_lines);
- $max_lines = 1 unless ($max_lines);
+ $min_lines = 0 if ( !$min_lines );
+ $max_lines = 1 if ( !$max_lines );
if ( ( $line_count >= $min_lines )
&& ( $line_count <= $max_lines ) )
{
|| ( $group_maximum_line_length
&& $maximum_line_length != $group_maximum_line_length )
|| $is_outdented
- || ( $is_block_comment && !$self->[_rOpts_valign_block_comments_] )
+ || ( $is_block_comment && !$rOpts_valign_block_comments )
|| ( !$is_block_comment
- && !$self->[_rOpts_valign_side_comments_]
- && !$self->[_rOpts_valign_code_] )
+ && !$rOpts_valign_side_comments
+ && !$rOpts_valign_code )
)
{
- $self->_flush_group_lines( $level - $group_level );
+ $self->_flush_group_lines( $level - $group_level )
+ if ( @{ $self->[_rgroup_lines_] } );
$group_level = $level;
$self->[_group_level_] = $group_level;
return;
}
else {
- $self->_flush_group_lines();
+ $self->_flush_group_lines()
+ if ( @{ $self->[_rgroup_lines_] } );
}
}
my $j_terminal_match;
if ( $is_terminal_ternary && @{$rgroup_lines} ) {
- $j_terminal_match =
- fix_terminal_ternary( $rgroup_lines->[-1], $rfields, $rtokens,
- $rpatterns, $rfield_lengths, $group_level, );
+ $j_terminal_match = fix_terminal_ternary(
+ {
+ old_line => $rgroup_lines->[-1],
+ rfields => $rfields,
+ rtokens => $rtokens,
+ rpatterns => $rpatterns,
+ rfield_lengths => $rfield_lengths,
+ group_level => $group_level,
+ }
+ );
$jmax = @{$rfields} - 1;
}
&& @{$rgroup_lines}
&& $is_balanced_line )
{
-
- $j_terminal_match =
- fix_terminal_else( $rgroup_lines->[-1], $rfields, $rtokens,
- $rpatterns, $rfield_lengths );
+ $j_terminal_match = fix_terminal_else(
+ {
+ old_line => $rgroup_lines->[-1],
+ rfields => $rfields,
+ rtokens => $rtokens,
+ rpatterns => $rpatterns,
+ rfield_lengths => $rfield_lengths,
+ }
+ );
$jmax = @{$rfields} - 1;
}
if ( $jmax <= 0 ) {
$self->[_zero_count_]++;
- if ( @{$rgroup_lines}
+ # VSN PATCH for a single number, part 1.
+ my $is_numeric =
+ $rOpts_valign_signed_numbers && $rpatterns->[0] eq 'n,';
+
+ if ( !$is_numeric
+ && @{$rgroup_lines}
&& !get_recoverable_spaces( $rgroup_lines->[0]->{'indentation'} ) )
{
if ( $rgroup_lines->[0]->{'jmax'} > 1
|| $self->[_zero_count_] > 3 )
{
- $self->_flush_group_lines();
+ $self->_flush_group_lines()
+ if ( @{ $self->[_rgroup_lines_] } );
# Update '$rgroup_lines' - it will become a ref to empty array.
# This allows avoiding a call to get_group_line_count below.
}
# just write this line directly if no current group, no side comment,
- # and no space recovery is needed.
+ # and no space recovery is needed,
+ # and not numeric - VSN PATCH for a single number, part 4.
if ( !@{$rgroup_lines}
+ && !$is_numeric
&& !get_recoverable_spaces($indentation) )
{
# output this group if it ends in a terminal else or ternary line
if ( defined($j_terminal_match) ) {
- $self->_flush_group_lines();
+ $self->_flush_group_lines()
+ if ( @{ $self->[_rgroup_lines_] } );
}
# Force break after jump to lower level
elsif ($level_end < $level
|| $is_closing_token{ substr( $rfields->[0], 0, 1 ) } )
{
- $self->_flush_group_lines(-1);
+ $self->_flush_group_lines(-1)
+ if ( @{ $self->[_rgroup_lines_] } );
+ }
+
+ else {
+ ##ok: no output needed
}
# --------------------------------------------------------------------
# Some old debugging stuff
# --------------------------------------------------------------------
DEBUG_VALIGN && do {
- print STDOUT "exiting valign_input fields:";
+ print {*STDOUT} "exiting valign_input fields:";
dump_array( @{$rfields} );
- print STDOUT "exiting valign_input tokens:";
+ print {*STDOUT} "exiting valign_input tokens:";
dump_array( @{$rtokens} );
- print STDOUT "exiting valign_input patterns:";
+ print {*STDOUT} "exiting valign_input patterns:";
dump_array( @{$rpatterns} );
};
sub join_hanging_comment {
+ my ( $new_line, $old_line ) = @_;
+
# Add dummy fields to a hanging side comment to make it look
# like the first line in its potential group. This simplifies
# the coding.
- my ( $new_line, $old_line ) = @_;
+
+ # Given:
+ # $new_line = ref to hash of the line to be possibly changed
+ # $old_line = ref to hash of the previous reference line
+ # Return:
+ # true if new line modified
+ # false otherwise
my $jmax = $new_line->{'jmax'};
# must be 2 fields
- return 0 unless $jmax == 1;
+ return 0 unless ( $jmax == 1 );
my $rtokens = $new_line->{'rtokens'};
# the second field must be a comment
- return 0 unless $rtokens->[0] eq '#';
+ return 0 unless ( $rtokens->[0] eq '#' );
my $rfields = $new_line->{'rfields'};
# the first field must be empty
- return 0 unless $rfields->[0] =~ /^\s*$/;
+ return 0 if ( $rfields->[0] !~ /^\s*$/ );
# the current line must have fewer fields
my $maximum_field_index = $old_line->{'jmax'};
return 0
- unless $maximum_field_index > $jmax;
+ if ( $maximum_field_index <= $jmax );
# looks ok..
my $rpatterns = $new_line->{'rpatterns'};
return 1;
} ## end sub join_hanging_comment
-{ ## closure for sub decide_if_list
-
- my %is_comma_token;
-
- BEGIN {
-
- my @q = qw( => );
- push @q, ',';
- @is_comma_token{@q} = (1) x scalar(@q);
- } ## end BEGIN
-
- sub decide_if_list {
-
- my $line = shift;
+sub decide_if_list {
- # A list will be taken to be a line with a forced break in which all
- # of the field separators are commas or comma-arrows (except for the
- # trailing #)
+ my $line = shift;
- my $rtokens = $line->{'rtokens'};
- my $test_token = $rtokens->[0];
- my ( $raw_tok, $lev, $tag, $tok_count ) =
- decode_alignment_token($test_token);
- if ( $is_comma_token{$raw_tok} ) {
- my $list_type = $test_token;
- my $jmax = $line->{'jmax'};
-
- foreach ( 1 .. $jmax - 2 ) {
- ( $raw_tok, $lev, $tag, $tok_count ) =
- decode_alignment_token( $rtokens->[$_] );
- if ( !$is_comma_token{$raw_tok} ) {
- $list_type = EMPTY_STRING;
- last;
- }
+ # Given:
+ # $line = ref to hash of values for a line
+ # Task:
+ # Set 'list_type' property
+
+ # A list will be taken to be a line with a forced break in which all
+ # of the field separators are commas or comma-arrows (except for the
+ # trailing #)
+
+ my $rtokens = $line->{'rtokens'};
+ my $test_token = $rtokens->[0];
+ my ( $raw_tok, $lev, $tag, $tok_count ) =
+ decode_alignment_token($test_token);
+ if ( $is_comma_token{$raw_tok} ) {
+ my $list_type = $test_token;
+ my $jmax = $line->{'jmax'};
+
+ foreach ( 1 .. $jmax - 2 ) {
+ ( $raw_tok, $lev, $tag, $tok_count ) =
+ decode_alignment_token( $rtokens->[$_] );
+ if ( !$is_comma_token{$raw_tok} ) {
+ $list_type = EMPTY_STRING;
+ last;
}
- $line->{'list_type'} = $list_type;
}
- return;
- } ## end sub decide_if_list
-}
+ $line->{'list_type'} = $list_type;
+ }
+ return;
+} ## end sub decide_if_list
sub fix_terminal_ternary {
#
# returns the index of the terminal question token, if any
- my ( $old_line, $rfields, $rtokens, $rpatterns, $rfield_lengths,
- $group_level )
- = @_;
+ my ($rcall_hash) = @_;
- return unless ($old_line);
+ my $old_line = $rcall_hash->{old_line};
+ my $rfields = $rcall_hash->{rfields};
+ my $rtokens = $rcall_hash->{rtokens};
+ my $rpatterns = $rcall_hash->{rpatterns};
+ my $rfield_lengths = $rcall_hash->{rfield_lengths};
+ my $group_level = $rcall_hash->{group_level};
+
+ return if ( !$old_line );
use constant EXPLAIN_TERNARY => 0;
if (%valign_control_hash) {
my $align_ok = $valign_control_hash{'?'};
$align_ok = $valign_control_default unless defined($align_ok);
- return unless ($align_ok);
+ return if ( !$align_ok );
}
my $jmax = @{$rfields} - 1;
my $pad_length = 0;
foreach my $j ( 0 .. $maximum_field_index - 1 ) {
my $tok = $rtokens_old->[$j];
- my ( $raw_tok, $lev, $tag, $tok_count ) = decode_alignment_token($tok);
+ my ( $raw_tok, $lev, $tag_uu, $tok_count_uu ) =
+ decode_alignment_token($tok);
if ( $raw_tok eq '?' ) {
$depth_question = $lev;
# depth must be correct
- next unless ( $depth_question eq $group_level );
+ next if ( $depth_question ne $group_level );
$jquestion = $j;
if ( $rfields_old->[ $j + 1 ] =~ /^(\?\s*)/ ) {
last;
}
}
- return unless ( defined($jquestion) ); # shouldn't happen
+ return if ( !defined($jquestion) ); # shouldn't happen
# Now splice the tokens and patterns of the previous line
# into the else line to insure a match. Add empty fields
EXPLAIN_TERNARY && do {
local $LIST_SEPARATOR = '><';
- print STDOUT "CURRENT FIELDS=<@{$rfields_old}>\n";
- print STDOUT "CURRENT TOKENS=<@{$rtokens_old}>\n";
- print STDOUT "CURRENT PATTERNS=<@{$rpatterns_old}>\n";
- print STDOUT "UNMODIFIED FIELDS=<@{$rfields}>\n";
- print STDOUT "UNMODIFIED TOKENS=<@{$rtokens}>\n";
- print STDOUT "UNMODIFIED PATTERNS=<@{$rpatterns}>\n";
+ print {*STDOUT} "CURRENT FIELDS=<@{$rfields_old}>\n";
+ print {*STDOUT} "CURRENT TOKENS=<@{$rtokens_old}>\n";
+ print {*STDOUT} "CURRENT PATTERNS=<@{$rpatterns_old}>\n";
+ print {*STDOUT} "UNMODIFIED FIELDS=<@{$rfields}>\n";
+ print {*STDOUT} "UNMODIFIED TOKENS=<@{$rtokens}>\n";
+ print {*STDOUT} "UNMODIFIED PATTERNS=<@{$rpatterns}>\n";
};
# handle cases of leading colon on this line
# Note that this padding will remain even if the terminal value goes
# out on a separate line. This does not seem to look to bad, so no
# mechanism has been included to undo it.
- my $field1 = shift @fields;
+ my $field1_uu = shift @fields;
my $field_length1 = shift @field_lengths;
my $len_colon = length($colon);
unshift @fields, ( $colon, $pad . $therest );
( $len_colon, $pad_length + $field_length1 - $len_colon );
# change the leading pattern from : to ?
- return unless ( $patterns[0] =~ s/^\:/?/ );
+ return if ( $patterns[0] !~ s/^\:/?/ );
# install leading tokens and patterns of existing line
unshift( @tokens, @{$rtokens_old}[ 0 .. $jquestion ] );
# : ( $mname = $name . '->' );
else {
- return unless ( $jmax > 0 && $tokens[0] ne '#' ); # shouldn't happen
+ return if ( $jmax <= 0 || $tokens[0] eq '#' ); # shouldn't happen
# prepend a leading ? onto the second pattern
$patterns[1] = "?b" . $patterns[1];
EXPLAIN_TERNARY && do {
local $LIST_SEPARATOR = '><';
- print STDOUT "MODIFIED TOKENS=<@tokens>\n";
- print STDOUT "MODIFIED PATTERNS=<@patterns>\n";
- print STDOUT "MODIFIED FIELDS=<@fields>\n";
+ print {*STDOUT} "MODIFIED TOKENS=<@tokens>\n";
+ print {*STDOUT} "MODIFIED PATTERNS=<@patterns>\n";
+ print {*STDOUT} "MODIFIED FIELDS=<@fields>\n";
};
# all ok .. update the arrays
#
# returns a positive value if the else block should be indented
#
- my ( $old_line, $rfields, $rtokens, $rpatterns, $rfield_lengths ) = @_;
+ my ($rcall_hash) = @_;
+
+ my $old_line = $rcall_hash->{old_line};
+ my $rfields = $rcall_hash->{rfields};
+ my $rtokens = $rcall_hash->{rtokens};
+ my $rpatterns = $rcall_hash->{rpatterns};
+ my $rfield_lengths = $rcall_hash->{rfield_lengths};
- return unless ($old_line);
+ return if ( !$old_line );
my $jmax = @{$rfields} - 1;
- return unless ( $jmax > 0 );
+ return if ( $jmax <= 0 );
if (%valign_control_hash) {
my $align_ok = $valign_control_hash{'{'};
$align_ok = $valign_control_default unless defined($align_ok);
- return unless ($align_ok);
+ return if ( !$align_ok );
}
# check for balanced else block following if/elsif/unless
my $rfields_old = $old_line->{'rfields'};
# TBD: add handling for 'case'
- return unless ( $rfields_old->[0] =~ /^(if|elsif|unless)\s*$/ );
+ return if ( $rfields_old->[0] !~ /^(?:if|elsif|unless)\s*$/ );
# look for the opening brace after the else, and extract the depth
my $tok_brace = $rtokens->[0];
my $jparen = 0;
my $tok_paren = '(' . $depth_brace;
my $tok_test = $rtokens_old->[$jparen];
- return unless ( $tok_test eq $tok_paren ); # shouldn't happen
+ return if ( $tok_test ne $tok_paren ); # shouldn't happen
# Now find the opening block brace
my ($jbrace);
last;
}
}
- return unless ( defined($jbrace) ); # shouldn't happen
+ return if ( !defined($jbrace) ); # shouldn't happen
# Now splice the tokens and patterns of the previous line
# into the else line to insure a match. Add empty fields
# See if the current line matches the current vertical alignment group.
- my ( $self, $new_line, $base_line, $prev_line ) = @_;
+ my ( $self, $new_line, $base_line, $prev_line, $group_line_count ) = @_;
# Given:
# $new_line = the line being considered for group inclusion
# $base_line = the first line of the current group
# $prev_line = the line just before $new_line
+ # $group_line_count = number of lines in the current group
- # returns a flag and a value as follows:
+ # Returns: a flag and a value as follows:
# return (0, $imax_align) if the line does not match
# return (1, $imax_align) if the line matches but does not fit
# return (2, $imax_align) if the line matches and fits
use constant MATCH_NO_FIT => 1;
use constant MATCH_AND_FIT => 2;
+ # Return value '$return_value' describes the match with 3 possible values
my $return_value;
- # Returns '$imax_align' which is the index of the maximum matching token.
+ # Return value '$imax_align' is the index of the maximum matching token.
# It will be used in the subsequent left-to-right sweep to align as many
# tokens as possible for lines which partially match.
my $imax_align = -1;
# calculated and stored by sub 'match_line_pair'.
$imax_align = $prev_line->{'imax_pair'};
- if ( $imax_align != $jlimit ) {
+ # Only the following ci sequences are accepted (issue c225):
+ # 0 0 0 ... OK
+ # 0 1 1 ... OK but marginal*
+ # 1 1 1 ... OK
+ # This check is rarely activated, but for example we want
+ # to avoid something like this 'tail wag dog' situation:
+ # $tag =~ s/\b([a-z]+)/\L\u$1/gio;
+ # $tag =~ s/\b([b-df-hj-np-tv-z]+)\b/\U$1/gio
+ # if $tag =~ /-/;
+ # *Note: we could set a flag for the 0 1 marginal case and
+ # use it to prevent alignment of selected token types.
+ my $ci_prev = $prev_line->{'ci_level'};
+ my $ci_new = $new_line->{'ci_level'};
+ if ( $ci_prev != $ci_new
+ && $imax_align >= 0
+ && ( $ci_new == 0 || $group_line_count > 1 ) )
+ {
+ $imax_align = -1;
+ $GoToMsg =
+"Rejected ci: ci_prev=$ci_prev ci_new=$ci_new num=$group_line_count\n";
+ $return_value = NO_MATCH;
+ }
+ elsif ( $imax_align != $jlimit ) {
$GoToMsg = "Not all tokens match: $imax_align != $jlimit\n";
$return_value = NO_MATCH;
}
+ else {
+ ##ok: continue
+ }
}
}
# The new line has alignments identical to the current group. Now we have
# to fit the new line into the group without causing a field to exceed the
# line length limit.
- # return true if successful
- # return false if not successful
+
+ # Given:
+ # $new_line = ref to hash of the new line values
+ # $old_line = ref to hash of the previous line values
+ # Returns:
+ # true if the new line alignments fit the old line
+ # false otherwise
my $jmax = $new_line->{'jmax'};
my $leading_space_count = $new_line->{'leading_space_count'};
if ( $jmax_old ne $jmax ) {
warning(<<EOM);
-Program bug detected in Perl::Tidy::VerticalAligner sub check_fit
+Program bug detected in Perl::Tidy::VerticalAligner sub check_fit
unexpected difference in array lengths: $jmax != $jmax_old
EOM
return;
my ($new_line) = @_;
+ # Given:
+ # $new_line = ref to hash of a line starting a new group
+ # Task:
+ # setup alignment fields for this line
+
my $jmax = $new_line->{'jmax'};
my $rfield_lengths = $new_line->{'rfield_lengths'};
my $col = $new_line->{'leading_space_count'};
# debug routine to dump array contents
local $LIST_SEPARATOR = ')(';
- print STDOUT "(@_)\n";
+ print {*STDOUT} "(@_)\n";
return;
} ## end sub dump_array
sub level_change {
+ my ( $self, $leading_space_count, $diff, $level ) = @_;
+
# compute decrease in level when we remove $diff spaces from the
# leading spaces
- my ( $self, $leading_space_count, $diff, $level ) = @_;
- my $rOpts_indent_columns = $self->[_rOpts_indent_columns_];
+ # Given:
+ # $leading_space_count = current leading line spaces
+ # $diff = number of spaces to remove
+ # $level = current indentation level
+ # Return:
+ # $level = updated level accounting for the loss of spaces
+
if ($rOpts_indent_columns) {
my $olev =
int( ( $leading_space_count + $diff ) / $rOpts_indent_columns );
my ($self) = @_;
my $rgroup_lines = $self->[_rgroup_lines_];
- return unless ( @{$rgroup_lines} );
+ return if ( !@{$rgroup_lines} );
my $group_level = $self->[_group_level_];
my $group_maximum_line_length = $self->[_group_maximum_line_length_];
my $leading_space_count = $self->[_comment_leading_space_count_];
-## my $leading_string =
-## $self->get_leading_string( $leading_space_count, $group_level );
# look for excessively long lines
my $max_excess = 0;
foreach my $item ( @{$rgroup_lines} ) {
- my ( $str, $str_len ) = @{$item};
+ my ( $str_uu, $str_len ) = @{$item};
my $excess =
$str_len + $leading_space_count - $group_maximum_line_length;
if ( $excess > $max_excess ) {
$self->[_last_outdented_line_at_] =
$last_outdented_line_at + $nlines - 1;
my $outdented_line_count = $self->[_outdented_line_count_];
- unless ($outdented_line_count) {
+ if ( !$outdented_line_count ) {
$self->[_first_outdented_line_at_] = $last_outdented_line_at;
}
$outdented_line_count += $nlines;
# This is the vertical aligner internal flush, which leaves the cache
# intact
- my ( $self, $level_jump ) = @_;
+ my ( $self, ($level_jump) ) = @_;
# $level_jump = $next_level-$group_level, if known
# = undef if not known
# Note: only the sign of the jump is needed
my $rgroup_lines = $self->[_rgroup_lines_];
- return unless ( @{$rgroup_lines} );
+ return if ( !@{$rgroup_lines} );
my $group_type = $self->[_group_type_];
my $group_level = $self->[_group_level_];
0 && do {
my ( $a, $b, $c ) = caller();
my $nlines = @{$rgroup_lines};
- print STDOUT
+ print {*STDOUT}
"APPEND0: _flush_group_lines called from $a $b $c lines=$nlines, type=$group_type \n";
};
#------------------------------------------------------------------------
# STEP 1: Remove most unmatched tokens. They block good alignments.
- my ( $max_lev_diff, $saw_side_comment ) =
+ my ( $max_lev_diff_uu, $saw_side_comment, $saw_signed_number ) =
delete_unmatched_tokens( $rgroup_lines, $group_level );
# STEP 2: Sweep top to bottom, forming subgroups of lines with exactly
? get_extra_leading_spaces( $rgroup_lines, $rgroups )
: 0;
- # STEP 6: Output the lines.
+ # STEP 6: add sign padding to columns numbers if needed
+ pad_signed_number_columns($rgroup_lines)
+ if ( $saw_signed_number && $rOpts_valign_signed_numbers );
+
+ # STEP 7: pad wide equals
+ pad_wide_equals_columns($rgroup_lines)
+ if ($rOpts_valign_wide_equals);
+
+ # STEP 8: Output the lines.
# All lines in this group have the same leading spacing and maximum line
# length
my $group_leader_length = $rgroup_lines->[0]->{'leading_space_count'};
sub get_rgroup_jrange {
- return unless @{$rgroups};
- return unless ( $group_line_count > 0 );
+ return if ( !@{$rgroups} );
+ return if ( $group_line_count <= 0 );
my ( $jbeg, $jend ) = @{ $rgroups->[-1] };
return ( $jbeg, $jend );
} ## end sub get_rgroup_jrange
sub end_rgroup {
my ($imax_align) = @_;
- return unless @{$rgroups};
- return unless ( $group_line_count > 0 );
+ return if ( !@{$rgroups} );
+ return if ( $group_line_count <= 0 );
my ( $jbeg, $jend ) = @{ pop @{$rgroups} };
push @{$rgroups}, [ $jbeg, $jend, $imax_align ];
# emergency reset to prevent sweep_left_to_right from trying to match a
# failed terminal else match
- return unless @{$rgroups} > 1;
+ return if ( @{$rgroups} <= 1 );
$rgroups->[-2]->[2] = -1;
return;
} ## end sub block_penultimate_match
sub sweep_top_down {
my ( $self, $rlines, $group_level ) = @_;
+ # This is the first of two major sweeps to find alignments.
+ # The other is sweep_left_to_right.
+
+ # Given:
+ # $rlines = ref to hash of lines in this main alignment group
+ # $group_level = common indentation level of these lines
+ # Return:
+ # $rgroups = ref to hash of subgroups created
+
# Partition the set of lines into final alignment subgroups
# and store the alignments with the lines.
}
my $j_terminal_match = $new_line->{'j_terminal_match'};
- my ( $jbeg, $jend ) = get_rgroup_jrange();
+ my ( $jbeg, $jend_uu ) = get_rgroup_jrange();
if ( !defined($jbeg) ) {
# safety check, shouldn't happen
warning(<<EOM);
-Program bug detected in Perl::Tidy::VerticalAligner sub sweep_top_down
+Program bug detected in Perl::Tidy::VerticalAligner sub sweep_top_down
undefined index for group line count $group_line_count
EOM
$jbeg = $jline;
# There are no matching tokens, so now check side comments.
# Programming note: accessing arrays with index -1 is
# risky in Perl, but we have verified there is at least one
- # line in the group and that there is at least one field.
+ # line in the group and that there is at least one field,
my $prev_comment =
$rall_lines->[ $jline - 1 ]->{'rfields'}->[-1];
my $side_comment = $new_line->{'rfields'}->[-1];
- end_rgroup(-1) unless ( $side_comment && $prev_comment );
+
+ # do not end group if both lines have side comments
+ if ( !$side_comment || !$prev_comment ) {
+
+ # Otherwise - VSN PATCH for a single number:
+ # - do not end group if numeric and no side comment, or
+ # - end if !numeric or side comment
+ my $pat = $new_line->{'rpatterns'}->[0];
+ my $is_numeric = $rOpts_valign_signed_numbers
+ && ( $pat eq 'n,'
+ || $pat eq 'n,b' );
+ end_rgroup(-1) if ( !$is_numeric || $side_comment );
+ }
+ }
+ else {
+ ##ok: continue
}
# See if the new line matches and fits the current group,
if ($group_line_count) {
( $match_code, my $imax_align ) =
$self->check_match( $new_line, $base_line,
- $rall_lines->[ $jline - 1 ] );
+ $rall_lines->[ $jline - 1 ],
+ $group_line_count );
if ( $match_code != 2 ) { end_rgroup($imax_align) }
}
elsif ( $new_line->{'end_group'} ) {
end_rgroup(-1);
}
+
+ else {
+ ##ok: continue
+ }
} ## end loop over lines
end_rgroup(-1);
my ( $line_m, $line, $imax_min ) = @_;
+ # Decide if two adjacent, isolated lines should be aligned
+
# Given:
- # two isolated (list) lines
+ # $line_m, $line = two isolated (list) lines
# imax_min = number of common alignment tokens
# Return:
# $pad_max = maximum suggested pad distance
# = 0 if alignment not recommended
- # Note that this is only for two lines which do not have alignment tokens
- # in common with any other lines. It is intended for lists, but it might
- # also be used for two non-list lines with a common leading '='.
# Allow alignment if the difference in the two unpadded line lengths
# is not more than either line length. The idea is to avoid
# 'VARCHAR', DBI::SQL_VARCHAR, undef, "'", "'", undef, 0, 1,
# 1, 0, 0, 0, undef, 0, 0
# ];
+
+ # Note that this is only for two lines which do not have alignment tokens
+ # in common with any other lines. It is intended for lists, but it might
+ # also be used for two non-list lines with a common leading '='.
+
my $rfield_lengths = $line->{'rfield_lengths'};
my $rfield_lengths_m = $line_m->{'rfield_lengths'};
# Safety check - shouldn't happen
return 0
- unless $imax_min < @{$rfield_lengths} && $imax_min < @{$rfield_lengths_m};
+ if ( $imax_min >= @{$rfield_lengths}
+ || $imax_min >= @{$rfield_lengths_m} );
my $lensum_m = 0;
my $lensum = 0;
foreach my $i ( 0 .. $imax_min ) {
my $pat = $rpatterns->[$i];
my $pat_m = $rpatterns_m->[$i];
- if ( $pat ne $pat_m ) { $patterns_match = 0; last }
+
+ # VSN PATCH: allow numbers to match quotes
+ if ( $pat_m ne $pat && length($pat_m) eq length($pat) ) {
+ $pat =~ tr/n/Q/;
+ $pat_m =~ tr/n/Q/;
+ }
+
+ if ( $pat ne $pat_m ) { $patterns_match = 0; last; }
}
}
my ( $rlines, $rgroups, $group_level ) = @_;
+ # This is the second of two major sweeps to find alignments.
+ # The other is sweep_top_down.
+
+ # Given:
+ # $rlines = ref to hash of lines in this main alignment group
+ # $rgroups = ref to hash of subgroups
+ # $group_level = common indentation level of these lines
+ # Task:
+ # add leading alignments where possible
+
# So far we have divided the lines into groups having an equal number of
# identical alignments. Here we are going to look for common leading
# alignments between the different groups and align them when possible.
# nothing to do if just one group
my $ng_max = @{$rgroups} - 1;
- return unless ( $ng_max > 0 );
+ return if ( $ng_max <= 0 );
#---------------------------------------------------------------------
# Step 1: Loop over groups to find all common leading alignment tokens
$line = $rlines->[$jbeg];
$rtokens = $line->{'rtokens'};
$imax = $line->{'jmax'} - 2;
- $istop = -1 unless ( defined($istop) );
+ $istop = -1 if ( !defined($istop) );
$istop = $imax if ( $istop > $imax );
# Initialize on first group
{
# We will just align assignments and simple lists
- next unless ( $imax_min >= 0 );
+ next if ( $imax_min < 0 );
next
- unless ( $rtokens->[0] =~ /^=\d/
- || $list_type );
+ if ( $rtokens->[0] !~ /^=\d/
+ && !$list_type );
# In this case we will limit padding to a short distance. This
# is a compromise to keep some vertical alignment but prevent large
# gaps, which do not look good for just two lines.
my $pad_max =
two_line_pad( $rlines->[$jbeg], $rlines->[$jbeg_m], $imax_min );
- next unless ($pad_max);
+ next if ( !$pad_max );
my $ng_m = $ng - 1;
$max_move{"$ng_m"} = $pad_max;
$max_move{"$ng"} = $pad_max;
my $var = pop(@todo);
$ng_beg = $var->[1];
}
- my ( $raw_tok, $lev, $tag, $tok_count ) = decode_alignment_token($tok);
+ my ( $raw_tok, $lev, $tag_uu, $tok_count_uu ) =
+ decode_alignment_token($tok);
push @todo, [ $i, $ng_beg, $ng_end, $raw_tok, $lev ];
}
#------------------------------
# Step 3: Execute the task list
#------------------------------
- do_left_to_right_sweep( $rlines, $rgroups, \@todo, \%max_move, $short_pad,
- $group_level );
+ do_left_to_right_sweep(
+ {
+ rlines => $rlines,
+ rgroups => $rgroups,
+ rtodo => \@todo,
+ rmax_move => \%max_move,
+ short_pad => $short_pad,
+ group_level => $group_level,
+ }
+ );
return;
} ## end sub sweep_left_to_right
# alignments we can be less restrictive.
# These are 'good' alignments, which are allowed more padding:
- my @q = qw(
- => = ? if unless or || {
- );
+ my @q = qw( => = ? if unless or || { );
push @q, ',';
@is_good_alignment_token{@q} = (0) x scalar(@q);
$is_good_alignment_token{'='} = 1;
$is_good_alignment_token{'if'} = 1;
$is_good_alignment_token{'unless'} = 1;
- $is_good_alignment_token{'=>'} = 1
+ $is_good_alignment_token{'=>'} = 1;
- # Note the hash values are set so that:
- # if ($is_good_alignment_token{$raw_tok}) => best
- # if defined ($is_good_alignment_token{$raw_tok}) => good or best
+ # Note the hash values are set so that:
+ # if ($is_good_alignment_token{$raw_tok}) => best
+ # if defined ($is_good_alignment_token{$raw_tok}) => good or best
} ## end BEGIN
# This is a sub called by sub do_left_to_right_sweep to
# move the alignment column of token $itok to $col_want for a
# sequence of groups.
- my ( $rlines, $rgroups, $rmax_move, $ngb, $nge, $itok, $col_want,
- $raw_tok )
- = @_;
- return unless ( defined($ngb) && $nge > $ngb );
+ my ($rcall_hash) = @_;
+
+ my $rlines = $rcall_hash->{rlines};
+ my $rgroups = $rcall_hash->{rgroups};
+ my $rmax_move = $rcall_hash->{rmax_move};
+ my $ngb = $rcall_hash->{ngb};
+ my $nge = $rcall_hash->{nge};
+ my $itok = $rcall_hash->{itok};
+ my $col_want = $rcall_hash->{col_want};
+ my $raw_tok = $rcall_hash->{raw_tok};
+
+ return if ( !defined($ngb) || $nge <= $ngb );
foreach my $ng ( $ngb .. $nge ) {
- my ( $jbeg, $jend ) = @{ $rgroups->[$ng] };
+ my ( $jbeg, $jend_uu ) = @{ $rgroups->[$ng] };
my $line = $rlines->[$jbeg];
my $col = $line->get_column($itok);
my $move = $col_want - $col;
# spot to take special action on failure to move
}
+ else {
+ ##ok: (move==0)
+ }
}
return;
} ## end sub move_to_common_column
sub do_left_to_right_sweep {
- my ( $rlines, $rgroups, $rtodo, $rmax_move, $short_pad, $group_level )
- = @_;
+
+ my ($rcall_hash) = @_;
+
+ # This is the worker routine for sub 'sweep_left_to_right'. Make
+ # vertical alignments by sweeping from left to right over groups
+ # of lines which have been located and prepared by the caller.
+
+ my $rlines = $rcall_hash->{rlines};
+ my $rgroups = $rcall_hash->{rgroups};
+ my $rtodo = $rcall_hash->{rtodo};
+ my $rmax_move = $rcall_hash->{rmax_move};
+ my $short_pad = $rcall_hash->{short_pad};
+ my $group_level = $rcall_hash->{group_level};
# $blocking_level[$nj is the level at a match failure between groups
# $ng-1 and $ng
my ( $itok, $ng_beg, $ng_end, $raw_tok, $lev ) = @{$task};
# Nothing to do for a single group
- next unless ( $ng_end > $ng_beg );
+ next if ( $ng_end <= $ng_beg );
my $ng_first; # index of the first group of a continuous sequence
my $col_want; # the common alignment column of a sequence of groups
my $is_blocked = defined( $blocking_level[$ng] )
&& $lev > $blocking_level[$ng];
- # TAIL-WAG-DOG RULE: prevent a 'tail-wag-dog' syndrom, meaning:
+ # TAIL-WAG-DOG RULE: prevent a 'tail-wag-dog' syndrome, meaning:
# Do not let one or two lines with a **different number of
# alignments** open up a big gap in a large block. For
# example, we will prevent something like this, where the first
}
# Otherwise allow some minimal padding of good alignments
- elsif (
+ else {
- defined( $is_good_alignment_token{$raw_tok} )
+ if (
- # We have to be careful if there are just 2 lines. This
- # two-line factor allows large gaps only for 2 lines which
- # are simple lists with fewer items on the second line. It
- # gives results similar to previous versions of perltidy.
- && ( $lines_total > 2
- || $group_list_type && $jmax < $jmax_m && $top_level )
- )
- {
- $factor += 1;
- if ($top_level) {
+ defined( $is_good_alignment_token{$raw_tok} )
+
+ # We have to be careful if there are just 2 lines.
+ # This two-line factor allows large gaps only for 2
+ # lines which are simple lists with fewer items on the
+ # second line. It gives results similar to previous
+ # versions of perltidy.
+ && (
+ $lines_total > 2
+ || ( $group_list_type
+ && $jmax < $jmax_m
+ && $top_level )
+ )
+ )
+ {
$factor += 1;
+ if ($top_level) {
+ $factor += 1;
+ }
}
}
}
move_to_common_column(
- $rlines, $rgroups, $rmax_move, $ng_first,
- $ng - 1, $itok, $col_want, $raw_tok
+ {
+ rlines => $rlines,
+ rgroups => $rgroups,
+ rmax_move => $rmax_move,
+ ngb => $ng_first,
+ nge => $ng - 1,
+ itok => $itok,
+ col_want => $col_want,
+ raw_tok => $raw_tok,
+ }
);
$ng_first = $ng;
$col_want = $col;
if ( $ng_end > $ng_first ) {
move_to_common_column(
- $rlines, $rgroups, $rmax_move, $ng_first,
- $ng_end, $itok, $col_want, $raw_tok
+ {
+ rlines => $rlines,
+ rgroups => $rgroups,
+ rmax_move => $rmax_move,
+ ngb => $ng_first,
+ nge => $ng_end,
+ itok => $itok,
+ col_want => $col_want,
+ raw_tok => $raw_tok,
+ }
);
- } ## end loop over groups for one task
+ }
} ## end loop over tasks
return;
my ( $line_obj, $ridel ) = @_;
- # $line_obj is the line to be modified
- # $ridel is a ref to list of indexes to be deleted
+ # Given:
+ # $line_obj = the line to be modified
+ # $ridel = a ref to list of indexes to be deleted
- # remove an unused alignment token(s) to improve alignment chances
+ # remove unused alignment token(s) to improve alignment chances
- return unless ( defined($line_obj) && defined($ridel) && @{$ridel} );
+ return if ( !defined($line_obj) || !defined($ridel) || !@{$ridel} );
my $jmax_old = $line_obj->{'jmax'};
my $rfields_old = $line_obj->{'rfields'};
sub decode_alignment_token {
+ my ($tok) = @_;
+
# Unpack the values packed in an alignment token
+
+ # Given:
+ # $tok = an alignment token
+ # Returns:
+ # ( $raw_tok, $lev, $tag, $tok_count )
#
# Usage:
# my ( $raw_tok, $lev, $tag, $tok_count ) =
# $nport = $port = shift || $name;
# The first '=' may either be '=0' or '=0.1' [level 0, first equals]
# The second '=' will be '=0.2' [level 0, second equals]
- my ($tok) = @_;
if ( defined( $decoded_token{$tok} ) ) {
return @{ $decoded_token{$tok} };
} ## end sub decode_alignment_token
}
-{ ## closure for sub delete_unmatched_tokens
+sub delete_unmatched_tokens {
+ my ( $rlines, $group_level ) = @_;
- my %is_assignment;
- my %keep_after_deleted_assignment;
+ # Remove as many obviously un-needed alignment tokens as possible.
+ # This will prevent them from interfering with the final alignment.
- BEGIN {
- my @q;
+ # Given:
+ # $rlines = ref to hash of all lines in this alignment group
+ # $group_level = their comment indentation level
- @q = qw(
- = **= += *= &= <<= &&=
- -= /= |= >>= ||= //=
- .= %= ^=
- x=
- );
- @is_assignment{@q} = (1) x scalar(@q);
+ # Return:
+ my $max_lev_diff = 0; # used to avoid a call to prune_tree
+ my $saw_side_comment = 0; # used to avoid a call for side comments
+ my $saw_signed_number = 0; # used to avoid a call for -vsn
- # These tokens may be kept following an = deletion
- @q = qw(
- if unless or ||
- );
- @keep_after_deleted_assignment{@q} = (1) x scalar(@q);
+ # Handle no lines -- shouldn't happen
+ return unless @{$rlines};
- } ## end BEGIN
+ # Handle a single line
+ if ( @{$rlines} == 1 ) {
+ my $line = $rlines->[0];
+ my $jmax = $line->{'jmax'};
+ my $length = $line->{'rfield_lengths'}->[$jmax];
+ $saw_side_comment = $length > 0;
+ return ( $max_lev_diff, $saw_side_comment, $saw_signed_number );
+ }
- sub delete_unmatched_tokens {
- my ( $rlines, $group_level ) = @_;
+ # ignore hanging side comments in these operations
+ my @filtered = grep { !$_->{'is_hanging_side_comment'} } @{$rlines};
+ my $rnew_lines = \@filtered;
- # This is a important first step in vertical alignment in which
- # we remove as many obviously un-needed alignment tokens as possible.
- # This will prevent them from interfering with the final alignment.
+ $saw_side_comment = @filtered != @{$rlines};
+ $max_lev_diff = 0;
- # Returns:
- my $max_lev_diff = 0; # used to avoid a call to prune_tree
- my $saw_side_comment = 0; # used to avoid a call for side comments
+ # nothing to do if all lines were hanging side comments
+ my $jmax = @{$rnew_lines} - 1;
+ return ( $max_lev_diff, $saw_side_comment, $saw_signed_number )
+ if ( $jmax < 0 );
- # Handle no lines -- shouldn't happen
- return unless @{$rlines};
+ #----------------------------------------------------
+ # Create a hash of alignment token info for each line
+ #----------------------------------------------------
+ ( my $rline_hashes, my $requals_info, $saw_side_comment, $max_lev_diff ) =
+ make_alignment_info( $group_level, $rnew_lines, $saw_side_comment );
- # Handle a single line
- if ( @{$rlines} == 1 ) {
- my $line = $rlines->[0];
- my $jmax = $line->{'jmax'};
- my $length = $line->{'rfield_lengths'}->[$jmax];
- $saw_side_comment = $length > 0;
- return ( $max_lev_diff, $saw_side_comment );
+ #------------------------------------------------------------
+ # Find independent subgroups of lines. Neighboring subgroups
+ # do not have a common alignment token.
+ #------------------------------------------------------------
+ my @subgroups;
+ push @subgroups, [ 0, $jmax ];
+ foreach my $jl ( 0 .. $jmax - 1 ) {
+ if ( $rnew_lines->[$jl]->{'end_group'} ) {
+ $subgroups[-1]->[1] = $jl;
+ push @subgroups, [ $jl + 1, $jmax ];
}
+ }
- # ignore hanging side comments in these operations
- my @filtered = grep { !$_->{'is_hanging_side_comment'} } @{$rlines};
- my $rnew_lines = \@filtered;
-
- $saw_side_comment = @filtered != @{$rlines};
- $max_lev_diff = 0;
-
- # nothing to do if all lines were hanging side comments
- my $jmax = @{$rnew_lines} - 1;
- return ( $max_lev_diff, $saw_side_comment ) unless ( $jmax >= 0 );
-
- #----------------------------------------------------
- # Create a hash of alignment token info for each line
- #----------------------------------------------------
- ( my $rline_hashes, my $requals_info, $saw_side_comment, $max_lev_diff )
- = make_alignment_info( $group_level, $rnew_lines, $saw_side_comment );
-
- #------------------------------------------------------------
- # Find independent subgroups of lines. Neighboring subgroups
- # do not have a common alignment token.
- #------------------------------------------------------------
- my @subgroups;
- push @subgroups, [ 0, $jmax ];
- foreach my $jl ( 0 .. $jmax - 1 ) {
- if ( $rnew_lines->[$jl]->{'end_group'} ) {
- $subgroups[-1]->[1] = $jl;
- push @subgroups, [ $jl + 1, $jmax ];
- }
- }
+ #-----------------------------------------------------------
+ # PASS 1 over subgroups to remove unmatched alignment tokens
+ #-----------------------------------------------------------
+ delete_unmatched_tokens_main_loop(
+ $group_level, $rnew_lines, \@subgroups,
+ $rline_hashes, $requals_info
+ );
- #-----------------------------------------------------------
- # PASS 1 over subgroups to remove unmatched alignment tokens
- #-----------------------------------------------------------
- delete_unmatched_tokens_main_loop(
- $group_level, $rnew_lines, \@subgroups,
- $rline_hashes, $requals_info
- );
+ #----------------------------------------------------------------
+ # PASS 2: Construct a tree of matched lines and delete some small
+ # deeper levels of tokens. They also block good alignments.
+ #----------------------------------------------------------------
+ prune_alignment_tree($rnew_lines) if ($max_lev_diff);
- #----------------------------------------------------------------
- # PASS 2: Construct a tree of matched lines and delete some small
- # deeper levels of tokens. They also block good alignments.
- #----------------------------------------------------------------
- prune_alignment_tree($rnew_lines) if ($max_lev_diff);
+ #--------------------------------------------
+ # PASS 3: compare all lines for common tokens
+ #--------------------------------------------
+ $saw_signed_number =
+ match_line_pairs( $rlines, $rnew_lines, \@subgroups, $group_level );
- #--------------------------------------------
- # PASS 3: compare all lines for common tokens
- #--------------------------------------------
- match_line_pairs( $rlines, $rnew_lines, \@subgroups, $group_level );
+ return ( $max_lev_diff, $saw_side_comment, $saw_signed_number );
+} ## end sub delete_unmatched_tokens
- return ( $max_lev_diff, $saw_side_comment );
- } ## end sub delete_unmatched_tokens
+sub make_alignment_info {
- sub make_alignment_info {
+ my ( $group_level, $rnew_lines, $saw_side_comment ) = @_;
- my ( $group_level, $rnew_lines, $saw_side_comment ) = @_;
+ # Create a hash of alignment token info for each line
+ # This info will be used to find common alignments
- #------------------------------------------------------------
- # Loop to create a hash of alignment token info for each line
- #------------------------------------------------------------
- my $rline_hashes = [];
- my @equals_info;
- my @line_info; # no longer used
- my $jmax = @{$rnew_lines} - 1;
- my $max_lev_diff = 0;
- foreach my $line ( @{$rnew_lines} ) {
- my $rhash = {};
- my $rtokens = $line->{'rtokens'};
- my $rpatterns = $line->{'rpatterns'};
- my $i = 0;
- my ( $i_eq, $tok_eq, $pat_eq );
- my ( $lev_min, $lev_max );
- foreach my $tok ( @{$rtokens} ) {
- my ( $raw_tok, $lev, $tag, $tok_count ) =
- decode_alignment_token($tok);
+ # Given:
+ # $group_level = common indentation level
+ # $rnew_lines = ref to hash of line info
+ # $saw_side_comment = true if there is a side comment
+ # Return:
+ # $rline_hashes = ref to hash with new line vars
+ # \@equals_info = ref to array with info on any '=' tokens
+ # $saw_side_comment = updated side comment flag
+ # $max_lev_diff = maximum level change seen
+
+ #----------------
+ # Loop over lines
+ #----------------
+ my $rline_hashes = [];
+ my @equals_info;
+ my @line_info; # no longer used
+ my $jmax = @{$rnew_lines} - 1;
+ my $max_lev_diff = 0;
+ foreach my $line ( @{$rnew_lines} ) {
+ my $rhash = {};
+ my $rtokens = $line->{'rtokens'};
+ my $rpatterns = $line->{'rpatterns'};
+ my $i = 0;
+ my ( $i_eq, $tok_eq, $pat_eq );
+ my ( $lev_min, $lev_max );
+ foreach my $tok ( @{$rtokens} ) {
+ my ( $raw_tok, $lev, $tag, $tok_count ) =
+ decode_alignment_token($tok);
- if ( $tok ne '#' ) {
- if ( !defined($lev_min) ) {
- $lev_min = $lev;
- $lev_max = $lev;
- }
- else {
- if ( $lev < $lev_min ) { $lev_min = $lev }
- if ( $lev > $lev_max ) { $lev_max = $lev }
- }
+ if ( $tok ne '#' ) {
+ if ( !defined($lev_min) ) {
+ $lev_min = $lev;
+ $lev_max = $lev;
}
else {
- if ( !$saw_side_comment ) {
- my $length = $line->{'rfield_lengths'}->[ $i + 1 ];
- $saw_side_comment ||= $length;
- }
+ if ( $lev < $lev_min ) { $lev_min = $lev }
+ if ( $lev > $lev_max ) { $lev_max = $lev }
+ }
+ }
+ else {
+ if ( !$saw_side_comment ) {
+ my $length = $line->{'rfield_lengths'}->[ $i + 1 ];
+ $saw_side_comment ||= $length;
}
+ }
- # Possible future upgrade: for multiple matches,
- # record [$i1, $i2, ..] instead of $i
- $rhash->{$tok} =
- [ $i, undef, undef, $raw_tok, $lev, $tag, $tok_count ];
+ # Possible future upgrade: for multiple matches,
+ # record [$i1, $i2, ..] instead of $i
+ $rhash->{$tok} =
+ [ $i, undef, undef, $raw_tok, $lev, $tag, $tok_count ];
- # remember the first equals at line level
- if ( !defined($i_eq) && $raw_tok eq '=' ) {
+ # remember the first equals at line level
+ if ( !defined($i_eq) && $raw_tok eq '=' ) {
- if ( $lev eq $group_level ) {
- $i_eq = $i;
- $tok_eq = $tok;
- $pat_eq = $rpatterns->[$i];
- }
+ if ( $lev eq $group_level ) {
+ $i_eq = $i;
+ $tok_eq = $tok;
+ $pat_eq = $rpatterns->[$i];
}
- $i++;
- }
- push @{$rline_hashes}, $rhash;
- push @equals_info, [ $i_eq, $tok_eq, $pat_eq ];
- push @line_info, [ $lev_min, $lev_max ];
- if ( defined($lev_min) ) {
- my $lev_diff = $lev_max - $lev_min;
- if ( $lev_diff > $max_lev_diff ) { $max_lev_diff = $lev_diff }
}
+ $i++;
}
+ push @{$rline_hashes}, $rhash;
+ push @equals_info, [ $i_eq, $tok_eq, $pat_eq ];
+ push @line_info, [ $lev_min, $lev_max ];
+ if ( defined($lev_min) ) {
+ my $lev_diff = $lev_max - $lev_min;
+ if ( $lev_diff > $max_lev_diff ) { $max_lev_diff = $lev_diff }
+ }
+ }
- #----------------------------------------------------
- # Loop to compare each line pair and remember matches
- #----------------------------------------------------
- my $rtok_hash = {};
- my $nr = 0;
- foreach my $jl ( 0 .. $jmax - 1 ) {
- my $nl = $nr;
- $nr = 0;
- my $jr = $jl + 1;
- my $rhash_l = $rline_hashes->[$jl];
- my $rhash_r = $rline_hashes->[$jr];
- foreach my $tok ( keys %{$rhash_l} ) {
- if ( defined( $rhash_r->{$tok} ) ) {
- my $il = $rhash_l->{$tok}->[0];
- my $ir = $rhash_r->{$tok}->[0];
- $rhash_l->{$tok}->[2] = $ir;
- $rhash_r->{$tok}->[1] = $il;
- if ( $tok ne '#' ) {
- push @{ $rtok_hash->{$tok} }, ( $jl, $jr );
- $nr++;
- }
+ #----------------------------------------------------
+ # Loop to compare each line pair and remember matches
+ #----------------------------------------------------
+ my $rtok_hash = {};
+ my $nr = 0;
+ foreach my $jl ( 0 .. $jmax - 1 ) {
+ my $nl = $nr;
+ $nr = 0;
+ my $jr = $jl + 1;
+ my $rhash_l = $rline_hashes->[$jl];
+ my $rhash_r = $rline_hashes->[$jr];
+ foreach my $tok ( keys %{$rhash_l} ) {
+ if ( defined( $rhash_r->{$tok} ) ) {
+ my $il = $rhash_l->{$tok}->[0];
+ my $ir = $rhash_r->{$tok}->[0];
+ $rhash_l->{$tok}->[2] = $ir;
+ $rhash_r->{$tok}->[1] = $il;
+ if ( $tok ne '#' ) {
+ push @{ $rtok_hash->{$tok} }, ( $jl, $jr );
+ $nr++;
}
}
+ }
- # Set a line break if no matching tokens between these lines
- # (this is not strictly necessary now but does not hurt)
- if ( $nr == 0 && $nl > 0 ) {
- $rnew_lines->[$jl]->{'end_group'} = 1;
- }
+ # Set a line break if no matching tokens between these lines
+ # (this is not strictly necessary now but does not hurt)
+ if ( $nr == 0 && $nl > 0 ) {
+ $rnew_lines->[$jl]->{'end_group'} = 1;
+ }
- # Also set a line break if both lines have simple equals but with
- # different leading characters in patterns. This check is similar
- # to one in sub check_match, and will prevent sub
- # prune_alignment_tree from removing alignments which otherwise
- # should be kept. This fix is rarely needed, but it can
- # occasionally improve formatting.
- # For example:
- # my $name = $this->{Name};
- # $type = $this->ctype($genlooptype) if defined $genlooptype;
- # my $declini = ( $asgnonly ? "" : "\t$type *" );
- # my $cast = ( $type ? "($type *)" : "" );
- # The last two lines start with 'my' and will not match the
- # previous line starting with $type, so we do not want
- # prune_alignment tree to delete their ? : alignments at a deeper
- # level.
- my ( $i_eq_l, $tok_eq_l, $pat_eq_l ) = @{ $equals_info[$jl] };
- my ( $i_eq_r, $tok_eq_r, $pat_eq_r ) = @{ $equals_info[$jr] };
- if ( defined($i_eq_l) && defined($i_eq_r) ) {
-
- # Also, do not align equals across a change in ci level
- my $ci_jump = $rnew_lines->[$jl]->{'ci_level'} !=
- $rnew_lines->[$jr]->{'ci_level'};
+ # Also set a line break if both lines have simple equals but with
+ # different leading characters in patterns. This check is similar
+ # to one in sub check_match, and will prevent sub
+ # prune_alignment_tree from removing alignments which otherwise
+ # should be kept. This fix is rarely needed, but it can
+ # occasionally improve formatting.
+ # For example:
+ # my $name = $this->{Name};
+ # $type = $this->ctype($genlooptype) if defined $genlooptype;
+ # my $declini = ( $asgnonly ? "" : "\t$type *" );
+ # my $cast = ( $type ? "($type *)" : "" );
+ # The last two lines start with 'my' and will not match the
+ # previous line starting with $type, so we do not want
+ # prune_alignment tree to delete their ? : alignments at a deeper
+ # level.
+ my ( $i_eq_l, $tok_eq_l, $pat_eq_l ) = @{ $equals_info[$jl] };
+ my ( $i_eq_r, $tok_eq_r, $pat_eq_r ) = @{ $equals_info[$jr] };
+ if ( defined($i_eq_l) && defined($i_eq_r) ) {
+
+ # Also, do not align equals across a change in ci level
+ my $ci_jump = $rnew_lines->[$jl]->{'ci_level'} !=
+ $rnew_lines->[$jr]->{'ci_level'};
- if (
- $tok_eq_l eq $tok_eq_r
- && $i_eq_l == 0
- && $i_eq_r == 0
- && ( substr( $pat_eq_l, 0, 1 ) ne substr( $pat_eq_r, 0, 1 )
- || $ci_jump )
- )
- {
- $rnew_lines->[$jl]->{'end_group'} = 1;
- }
+ if (
+ $tok_eq_l eq $tok_eq_r
+ && $i_eq_l == 0
+ && $i_eq_r == 0
+ && ( substr( $pat_eq_l, 0, 1 ) ne substr( $pat_eq_r, 0, 1 )
+ || $ci_jump )
+ )
+ {
+ $rnew_lines->[$jl]->{'end_group'} = 1;
}
}
- return ( $rline_hashes, \@equals_info, $saw_side_comment,
- $max_lev_diff );
- } ## end sub make_alignment_info
+ }
+ return ( $rline_hashes, \@equals_info, $saw_side_comment, $max_lev_diff );
+} ## end sub make_alignment_info
- sub delete_unmatched_tokens_main_loop {
+sub delete_unmatched_tokens_main_loop {
- my (
- $group_level, $rnew_lines, $rsubgroups,
- $rline_hashes, $requals_info
- ) = @_;
+ my ( $group_level, $rnew_lines, $rsubgroups, $rline_hashes, $requals_info )
+ = @_;
- #--------------------------------------------------------------
- # Main loop over subgroups to remove unmatched alignment tokens
- #--------------------------------------------------------------
-
- # flag to allow skipping pass 2 - not currently used
- my $saw_large_group;
-
- my $has_terminal_match = $rnew_lines->[-1]->{'j_terminal_match'};
-
- foreach my $item ( @{$rsubgroups} ) {
- my ( $jbeg, $jend ) = @{$item};
-
- my $nlines = $jend - $jbeg + 1;
-
- #---------------------------------------------------
- # Look for complete if/elsif/else and ternary blocks
- #---------------------------------------------------
-
- # We are looking for a common '$dividing_token' like these:
-
- # if ( $b and $s ) { $p->{'type'} = 'a'; }
- # elsif ($b) { $p->{'type'} = 'b'; }
- # elsif ($s) { $p->{'type'} = 's'; }
- # else { $p->{'type'} = ''; }
- # ^----------- dividing_token
-
- # my $severity =
- # !$routine ? '[PFX]'
- # : $routine =~ /warn.*_d\z/ ? '[DS]'
- # : $routine =~ /ck_warn/ ? 'W'
- # : $routine =~ /ckWARN\d*reg_d/ ? 'S'
- # : $routine =~ /ckWARN\d*reg/ ? 'W'
- # : $routine =~ /vWARN\d/ ? '[WDS]'
- # : '[PFX]';
- # ^----------- dividing_token
-
- # Only look for groups which are more than 2 lines long. Two lines
- # can get messed up doing this, probably due to the various
- # two-line rules.
-
- my $dividing_token;
- my %token_line_count;
- if ( $nlines > 2 ) {
-
- foreach my $jj ( $jbeg .. $jend ) {
- my %seen;
- my $line = $rnew_lines->[$jj];
- my $rtokens = $line->{'rtokens'};
- foreach my $tok ( @{$rtokens} ) {
- if ( !$seen{$tok} ) {
- $seen{$tok}++;
- $token_line_count{$tok}++;
- }
+ #--------------------------------------------------------------
+ # Main loop over subgroups to remove unmatched alignment tokens
+ #--------------------------------------------------------------
+
+ # flag to allow skipping pass 2 - not currently used
+ my $saw_large_group;
+
+ my $has_terminal_match = $rnew_lines->[-1]->{'j_terminal_match'};
+
+ foreach my $item ( @{$rsubgroups} ) {
+ my ( $jbeg, $jend ) = @{$item};
+
+ my $nlines = $jend - $jbeg + 1;
+
+ #---------------------------------------------------
+ # Look for complete if/elsif/else and ternary blocks
+ #---------------------------------------------------
+
+ # We are looking for a common '$dividing_token' like these:
+
+ # if ( $b and $s ) { $p->{'type'} = 'a'; }
+ # elsif ($b) { $p->{'type'} = 'b'; }
+ # elsif ($s) { $p->{'type'} = 's'; }
+ # else { $p->{'type'} = ''; }
+ # ^----------- dividing_token
+
+ # my $severity =
+ # !$routine ? '[PFX]'
+ # : $routine =~ /warn.*_d\z/ ? '[DS]'
+ # : $routine =~ /ck_warn/ ? 'W'
+ # : $routine =~ /ckWARN\d*reg_d/ ? 'S'
+ # : $routine =~ /ckWARN\d*reg/ ? 'W'
+ # : $routine =~ /vWARN\d/ ? '[WDS]'
+ # : '[PFX]';
+ # ^----------- dividing_token
+
+ # Only look for groups which are more than 2 lines long. Two lines
+ # can get messed up doing this, probably due to the various
+ # two-line rules.
+
+ my $dividing_token;
+ my %token_line_count;
+ if ( $nlines > 2 ) {
+
+ foreach my $jj ( $jbeg .. $jend ) {
+ my %seen;
+ my $line = $rnew_lines->[$jj];
+ my $rtokens = $line->{'rtokens'};
+ foreach my $tok ( @{$rtokens} ) {
+ if ( !$seen{$tok} ) {
+ $seen{$tok}++;
+ $token_line_count{$tok}++;
}
}
+ }
- foreach my $tok ( keys %token_line_count ) {
- if ( $token_line_count{$tok} == $nlines ) {
- if ( substr( $tok, 0, 1 ) eq '?'
- || substr( $tok, 0, 1 ) eq '{'
- && $tok =~ /^\{\d+if/ )
- {
- $dividing_token = $tok;
- last;
- }
+ foreach my $tok ( keys %token_line_count ) {
+ if ( $token_line_count{$tok} == $nlines ) {
+ if ( substr( $tok, 0, 1 ) eq '?'
+ || substr( $tok, 0, 1 ) eq '{' && $tok =~ /^\{\d+if/ )
+ {
+ $dividing_token = $tok;
+ last;
}
}
}
+ }
- #-------------------------------------------------------------
- # Loop over subgroup lines to remove unwanted alignment tokens
- #-------------------------------------------------------------
- foreach my $jj ( $jbeg .. $jend ) {
- my $line = $rnew_lines->[$jj];
- my $rtokens = $line->{'rtokens'};
- my $rhash = $rline_hashes->[$jj];
- my $i_eq = $requals_info->[$jj]->[0];
- my @idel;
- my $imax = @{$rtokens} - 2;
- my $delete_above_level;
- my $deleted_assignment_token;
-
- my $saw_dividing_token = EMPTY_STRING;
- $saw_large_group ||= $nlines > 2 && $imax > 1;
-
- # Loop over all alignment tokens
- foreach my $i ( 0 .. $imax ) {
- my $tok = $rtokens->[$i];
- next if ( $tok eq '#' ); # shouldn't happen
- my ( $iii, $il, $ir, $raw_tok, $lev, $tag, $tok_count ) =
- @{ $rhash->{$tok} };
-
- #------------------------------------------------------
- # Here is the basic RULE: remove an unmatched alignment
- # which does not occur in the surrounding lines.
- #------------------------------------------------------
- my $delete_me = !defined($il) && !defined($ir);
-
- # Apply any user controls. Note that not all lines pass
- # this way so they have to be applied elsewhere too.
- my $align_ok = 1;
- if (%valign_control_hash) {
- $align_ok = $valign_control_hash{$raw_tok};
- $align_ok = $valign_control_default
- unless defined($align_ok);
- $delete_me ||= !$align_ok;
- }
+ #-------------------------------------------------------------
+ # Loop over subgroup lines to remove unwanted alignment tokens
+ #-------------------------------------------------------------
+ foreach my $jj ( $jbeg .. $jend ) {
+ my $line = $rnew_lines->[$jj];
+ my $rtokens = $line->{'rtokens'};
+ my $rhash = $rline_hashes->[$jj];
+ my $i_eq = $requals_info->[$jj]->[0];
+ my @idel;
+ my $imax = @{$rtokens} - 2;
+ my $delete_above_level;
+ my $deleted_assignment_token;
- # But now we modify this with exceptions...
+ my $saw_dividing_token = EMPTY_STRING;
+ $saw_large_group ||= $nlines > 2 && $imax > 1;
- # EXCEPTION 1: If we are in a complete ternary or
- # if/elsif/else group, and this token is not on every line
- # of the group, should we delete it to preserve overall
- # alignment?
- if ($dividing_token) {
- if ( $token_line_count{$tok} >= $nlines ) {
- $saw_dividing_token ||= $tok eq $dividing_token;
- }
- else {
+ # Loop over all alignment tokens
+ foreach my $i ( 0 .. $imax ) {
+ my $tok = $rtokens->[$i];
+ next if ( $tok eq '#' ); # shouldn't happen
+ my ( $iii_uu, $il, $ir, $raw_tok, $lev, $tag_uu, $tok_count ) =
+ @{ $rhash->{$tok} };
+
+ #------------------------------------------------------
+ # Here is the basic RULE: remove an unmatched alignment
+ # which does not occur in the surrounding lines.
+ #------------------------------------------------------
+ my $delete_me = !defined($il) && !defined($ir);
+
+ # Apply any user controls. Note that not all lines pass
+ # this way so they have to be applied elsewhere too.
+ my $align_ok = 1;
+ if (%valign_control_hash) {
+ $align_ok = $valign_control_hash{$raw_tok};
+ $align_ok = $valign_control_default
+ unless defined($align_ok);
+ $delete_me ||= !$align_ok;
+ }
- # For shorter runs, delete toks to save alignment.
- # For longer runs, keep toks after the '{' or '?'
- # to allow sub-alignments within braces. The
- # number 5 lines is arbitrary but seems to work ok.
- $delete_me ||=
- ( $nlines < 5 || !$saw_dividing_token );
- }
+ # But now we modify this with exceptions...
+
+ # EXCEPTION 1: If we are in a complete ternary or
+ # if/elsif/else group, and this token is not on every line
+ # of the group, should we delete it to preserve overall
+ # alignment?
+ if ($dividing_token) {
+ if ( $token_line_count{$tok} >= $nlines ) {
+ $saw_dividing_token ||= $tok eq $dividing_token;
}
+ else {
- # EXCEPTION 2: Remove all tokens above a certain level
- # following a previous deletion. For example, we have to
- # remove tagged higher level alignment tokens following a
- # '=>' deletion because the tags of higher level tokens
- # will now be incorrect. For example, this will prevent
- # aligning commas as follows after deleting the second '=>'
- # $w->insert(
- # ListBox => origin => [ 270, 160 ],
- # size => [ 200, 55 ],
- # );
- if ( defined($delete_above_level) ) {
- if ( $lev > $delete_above_level ) {
- $delete_me ||= 1;
- }
- else { $delete_above_level = undef }
+ # For shorter runs, delete toks to save alignment.
+ # For longer runs, keep toks after the '{' or '?'
+ # to allow sub-alignments within braces. The
+ # number 5 lines is arbitrary but seems to work ok.
+ $delete_me ||= ( $nlines < 5 || !$saw_dividing_token );
}
+ }
- # EXCEPTION 3: Remove all but certain tokens after an
- # assignment deletion.
- if (
- $deleted_assignment_token
- && ( $lev > $group_level
- || !$keep_after_deleted_assignment{$raw_tok} )
- )
- {
+ # EXCEPTION 2: Remove all tokens above a certain level
+ # following a previous deletion. For example, we have to
+ # remove tagged higher level alignment tokens following a
+ # '=>' deletion because the tags of higher level tokens
+ # will now be incorrect. For example, this will prevent
+ # aligning commas as follows after deleting the second '=>'
+ # $w->insert(
+ # ListBox => origin => [ 270, 160 ],
+ # size => [ 200, 55 ],
+ # );
+ if ( defined($delete_above_level) ) {
+ if ( $lev > $delete_above_level ) {
$delete_me ||= 1;
}
+ else { $delete_above_level = undef }
+ }
- # EXCEPTION 4: Do not touch the first line of a 2 line
- # terminal match, such as below, because j_terminal has
- # already been set.
- # if ($tag) { $tago = "<$tag>"; $tagc = "</$tag>"; }
- # else { $tago = $tagc = ''; }
- # But see snippets 'else1.t' and 'else2.t'
- $delete_me = 0
- if ( $jj == $jbeg
- && $has_terminal_match
- && $nlines == 2 );
+ # EXCEPTION 3: Remove all but certain tokens after an
+ # assignment deletion.
+ if (
+ $deleted_assignment_token
+ && ( $lev > $group_level
+ || !$is_if_or{$raw_tok} )
+ )
+ {
+ $delete_me ||= 1;
+ }
- # EXCEPTION 5: misc additional rules for commas and equals
- if ( $delete_me && $tok_count == 1 ) {
+ # EXCEPTION 4: Do not touch the first line of a 2 line
+ # terminal match, such as below, because j_terminal has
+ # already been set.
+ # if ($tag) { $tago = "<$tag>"; $tagc = "</$tag>"; }
+ # else { $tago = $tagc = ''; }
+ # But see snippets 'else1.t' and 'else2.t'
+ $delete_me = 0
+ if ( $jj == $jbeg
+ && $has_terminal_match
+ && $nlines == 2 );
- # okay to delete second and higher copies of a token
+ # EXCEPTION 5: misc additional rules for commas and equals
+ if ( $delete_me && $tok_count == 1 ) {
- # for a comma...
- if ( $raw_tok eq ',' ) {
+ # okay to delete second and higher copies of a token
- # Do not delete commas before an equals
- $delete_me = 0
- if ( defined($i_eq) && $i < $i_eq );
+ # for a comma...
+ if ( $raw_tok eq ',' ) {
- # Do not delete line-level commas
- $delete_me = 0 if ( $lev <= $group_level );
- }
+ # Do not delete commas before an equals
+ $delete_me = 0
+ if ( defined($i_eq) && $i < $i_eq );
- # For an assignment at group level..
- if ( $is_assignment{$raw_tok}
- && $lev == $group_level )
- {
+ # Do not delete line-level commas
+ $delete_me = 0 if ( $lev <= $group_level );
+ }
- # Do not delete if it is the last alignment of
- # multiple tokens; this will prevent some
- # undesirable alignments
- if ( $imax > 0 && $i == $imax ) {
- $delete_me = 0;
- }
+ # For an assignment at group level..
+ if ( $is_assignment{$raw_tok}
+ && $lev == $group_level )
+ {
- # Otherwise, set a flag to delete most
- # remaining tokens
- else { $deleted_assignment_token = $raw_tok }
+ # Do not delete if it is the last alignment of
+ # multiple tokens; this will prevent some
+ # undesirable alignments
+ if ( $imax > 0 && $i == $imax ) {
+ $delete_me = 0;
}
+
+ # Otherwise, set a flag to delete most
+ # remaining tokens
+ else { $deleted_assignment_token = $raw_tok }
}
+ }
- # Do not let a user exclusion be reactivated by above rules
- $delete_me ||= !$align_ok;
+ # Do not let a user exclusion be reactivated by above rules
+ $delete_me ||= !$align_ok;
- #------------------------------------
- # Add this token to the deletion list
- #------------------------------------
- if ($delete_me) {
- push @idel, $i;
+ #------------------------------------
+ # Add this token to the deletion list
+ #------------------------------------
+ if ($delete_me) {
+ push @idel, $i;
- # update deletion propagation flags
- if ( !defined($delete_above_level)
- || $lev < $delete_above_level )
- {
+ # update deletion propagation flags
+ if ( !defined($delete_above_level)
+ || $lev < $delete_above_level )
+ {
- # delete all following higher level alignments
- $delete_above_level = $lev;
+ # delete all following higher level alignments
+ $delete_above_level = $lev;
- # but keep deleting after => to next lower level
- # to avoid some bizarre alignments
- if ( $raw_tok eq '=>' ) {
- $delete_above_level = $lev - 1;
- }
+ # but keep deleting after => to next lower level
+ # to avoid some bizarre alignments
+ if ( $raw_tok eq '=>' ) {
+ $delete_above_level = $lev - 1;
}
}
- } # End loop over alignment tokens
-
- # Process all deletion requests for this line
- if (@idel) {
- delete_selected_tokens( $line, \@idel );
}
- } # End loopover lines
- } ## end main loop over subgroups
+ } # End loop over alignment tokens
- return;
- } ## end sub delete_unmatched_tokens_main_loop
-}
+ # Process all deletion requests for this line
+ if (@idel) {
+ delete_selected_tokens( $line, \@idel );
+ }
+ } # End loop over lines
+ } ## end main loop over subgroups
+
+ return;
+} ## end sub delete_unmatched_tokens_main_loop
sub match_line_pairs {
my ( $rlines, $rnew_lines, $rsubgroups, $group_level ) = @_;
# Compare each pair of lines and save information about common matches
- # $rlines = list of lines including hanging side comments
- # $rnew_lines = list of lines without any hanging side comments
- # $rsubgroups = list of subgroups of the new lines
+
+ # Given:
+ # $rlines = list of lines including hanging side comments
+ # $rnew_lines = list of lines without any hanging side comments
+ # $rsubgroups = list of subgroups of the new lines
+ # Return:
+ # $saw_signed_number = true if a field has a signed number
+ # (needed for --valign-signed-numbers)
# TODO:
# Maybe change: imax_pair => pair_match_info = ref to array
my ( $line, $rtokens, $rpatterns, $rfield_lengths, $imax, $list_type,
$ci_level );
+ # Return parameter to avoid calls to sub pad_signed_number_columns
+ my $saw_signed_number;
+
# loop over subgroups
foreach my $item ( @{$rsubgroups} ) {
my ( $jbeg, $jend ) = @{$item};
my $nlines = $jend - $jbeg + 1;
- next unless ( $nlines > 1 );
+ next if ( $nlines <= 1 );
# loop over lines in a subgroup
foreach my $jj ( $jbeg .. $jend ) {
$list_type = $line->{'list_type'};
$ci_level = $line->{'ci_level'};
+ # Quick approximate check for signed numbers in this line.
+ # This speeds up large runs by about 0.5%
+ if ( !$saw_signed_number ) {
+
+ my $rfields = $line->{'rfields'};
+ foreach my $i ( 0 .. $imax + 1 ) {
+ next if ( index( $rpatterns->[$i], 'n' ) < 0 );
+ my $field = $rfields->[$i];
+ if ( index( $field, '-' ) >= 0
+ || index( $field, '+' ) >= 0 )
+ {
+ $saw_signed_number = 1;
+ last;
+ }
+ }
+ }
+
# nothing to do for first line
next if ( $jj == $jbeg );
my $pat = $rpatterns->[$i];
my $pat_m = $rpatterns_m->[$i];
+ # VSN PATCH: allow numbers to match quotes
+ if ( $pat_m ne $pat ) {
+ $pat =~ tr/n/Q/;
+ $pat_m =~ tr/n/Q/;
+ }
+
# If patterns don't match, we have to be careful...
if ( $pat_m ne $pat ) {
my $pad =
$rfield_lengths->[$i] - $rfield_lengths_m->[$i];
- my ( $match_code, $rmsg ) =
- compare_patterns( $group_level,
- $tok, $tok_m, $pat, $pat_m, $pad );
+ my $match_code = compare_patterns(
+ {
+ group_level => $group_level,
+ tok => $tok,
+ tok_m => $tok_m,
+ pat => $pat,
+ pat_m => $pat_m,
+ pad => $pad,
+ }
+ );
if ($match_code) {
if ( $match_code == 1 ) { $i_nomatch = $i }
elsif ( $match_code == 2 ) { $i_nomatch = 0 }
+ else { } ##ok
last;
}
}
# so that lines can just look back one line for their pair info.
if ( @{$rlines} > @{$rnew_lines} ) {
my $last_pair_info = -1;
- foreach my $line ( @{$rlines} ) {
- if ( $line->{'is_hanging_side_comment'} ) {
- $line->{'imax_pair'} = $last_pair_info;
+ foreach my $line_t ( @{$rlines} ) {
+ if ( $line_t->{'is_hanging_side_comment'} ) {
+ $line_t->{'imax_pair'} = $last_pair_info;
}
else {
- $last_pair_info = $line->{'imax_pair'};
+ $last_pair_info = $line_t->{'imax_pair'};
}
}
}
- return;
+ return $saw_signed_number;
} ## end sub match_line_pairs
sub compare_patterns {
- my ( $group_level, $tok, $tok_m, $pat, $pat_m, $pad ) = @_;
+ my ($rcall_hash) = @_;
+
+ my $group_level = $rcall_hash->{group_level};
+ my $tok = $rcall_hash->{tok};
+## my $tok_m = $rcall_hash->{tok_m};
+ my $pat = $rcall_hash->{pat};
+ my $pat_m = $rcall_hash->{pat_m};
+ my $pad = $rcall_hash->{pad};
- # helper routine for sub match_line_pairs to decide if patterns in two
- # lines match well enough..Given
+ # This is a helper routine for sub match_line_pairs to decide if patterns
+ # in two lines match well enough
+ # Given:
# $tok_m, $pat_m = token and pattern of first line
# $tok, $pat = token and pattern of second line
# $pad = 0 if no padding is needed, !=0 otherwise
- # return code:
+ # Return code:
# 0 = patterns match, continue
# 1 = no match
# 2 = no match, and lines do not match at all
use constant EXPLAIN_COMPARE_PATTERNS => 0;
- my ( $alignment_token, $lev, $tag, $tok_count ) =
+ my ( $alignment_token, $lev, $tag_uu, $tok_count_uu ) =
decode_alignment_token($tok);
# We have to be very careful about aligning commas
EXPLAIN_COMPARE_PATTERNS
&& $return_code
- && print STDERR "no match because $GoToMsg\n";
+ && print {*STDOUT} "no match because $GoToMsg\n";
- return ( $return_code, \$GoToMsg );
+ return $return_code;
} ## end sub compare_patterns
sub fat_comma_to_comma {
my ($str) = @_;
- # We are changing '=>' to ',' and removing any trailing decimal count
- # because currently fat commas have a count and commas do not.
- # For example, we will change '=>2+{-3.2' into ',2+{-3'
+ # Given:
+ # $str = a decorated fat comma alignment token
+
+ # Change '=>' to ','
+ # and remove any trailing decimal count because currently fat commas have a
+ # count and commas do not.
+
+ # For example, change '=>2+{-3.2' into ',2+{-3'
if ( $str =~ /^=>([^\.]*)/ ) { $str = ',' . $1 }
return $str;
} ## end sub fat_comma_to_comma
sub get_line_token_info {
- # scan lines of tokens and return summary information about the range of
- # levels and patterns.
my ($rlines) = @_;
+ # Given:
+ # $rlines = ref to array of lines in this group
+
+ # Scan lines of tokens and return summary information about the range of
+ # levels and patterns.
+
# First scan to check monotonicity. Here is an example of several
# lines which are monotonic. The = is the lowest level, and
# the commas are all one level deeper. So this is not nonmonotonic.
{
$imax = $ii;
$ii--;
- }
+ } ## end while ( $ii >= 0 && fat_comma_to_comma...)
}
# make a first pass to find level range
$i++;
last if ( $i > $imax );
last if ( $tok eq '#' );
- my ( $raw_tok, $lev, $tag, $tok_count ) =
+ my ( $raw_tok_uu, $lev, $tag_uu, $tok_count_uu ) =
@{ $all_token_info[$jj]->[$i] };
last if ( $tok eq '#' );
# handle no levels
my $rtoken_patterns = {};
my $rtoken_indexes = {};
- my @levs = sort keys %saw_level;
+ my @levs = sort { $a <=> $b } keys %saw_level;
if ( !defined($lev_min) ) {
$lev_min = -1;
$lev_max = -1;
$rtoken_patterns->{$lev_max} = $token_pattern_max;
$rtoken_indexes->{$lev_max} = [ ( 0 .. $imax ) ];
- my $lev_top = pop @levs; # alread did max level
+ my $lev_top = pop @levs; # already did max level
my $itok = -1;
foreach my $tok ( @{$rtokens} ) {
$itok++;
last if ( $itok > $imax );
- my ( $raw_tok, $lev, $tag, $tok_count ) =
+ my ( $raw_tok, $lev, $tag_uu, $tok_count_uu ) =
@{ $all_token_info[$jj]->[$itok] };
last if ( $raw_tok eq '#' );
foreach my $lev_test (@levs) {
sub prune_alignment_tree {
my ($rlines) = @_;
+
+ # Given:
+ # $rlines = ref to array of lines in this group
+
+ # Prune the tree of alignments to limit depth of alignments
+
my $jmax = @{$rlines} - 1;
- return unless $jmax > 0;
+ return if ( $jmax <= 0 );
# Vertical alignment in perltidy is done as an iterative process. The
# starting point is to mark all possible alignment tokens ('=', ',', '=>',
];
return;
- }; ## end sub end_node
+ }; ## end $end_node = sub
#-----------------------------------------------------
# Prune Tree Step 2. Loop to form the tree of matches.
my $jm = $jp - 1;
# Pull out needed values for the next line
- my ( $lev_min, $lev_max, $rtoken_patterns, $rlevs, $rtoken_indexes,
- $is_monotonic, $imax_true, $imax )
+ my ( $lev_min_uu, $lev_max_uu, $rtoken_patterns, $rlevs,
+ $rtoken_indexes, $is_monotonic_uu, $imax_true_uu, $imax_uu )
= @{ $rline_values->[$jp] };
# Transfer levels and patterns for this line to the working arrays.
# Otherwise see if anything changed and update the tree if so
else {
- foreach my $depth ( 0 .. $MAX_DEPTH ) {
+ foreach my $dep ( 0 .. $MAX_DEPTH ) {
- my $def_current = defined( $token_patterns_current[$depth] );
- my $def_next = defined( $token_patterns_next[$depth] );
- last unless ( $def_current || $def_next );
+ my $def_current = defined( $token_patterns_current[$dep] );
+ my $def_next = defined( $token_patterns_next[$dep] );
+ last if ( !$def_current && !$def_next );
if ( !$def_current
|| !$def_next
- || $token_patterns_current[$depth] ne
- $token_patterns_next[$depth] )
+ || $token_patterns_current[$dep] ne
+ $token_patterns_next[$dep] )
{
my $n_parent;
- if ( $depth > 0 && defined( $match_tree[ $depth - 1 ] ) ) {
- $n_parent = @{ $match_tree[ $depth - 1 ] } - 1;
+ if ( $dep > 0 && defined( $match_tree[ $dep - 1 ] ) ) {
+ $n_parent = @{ $match_tree[ $dep - 1 ] } - 1;
}
- $end_node->( $depth, $jm, $n_parent );
+ $end_node->( $dep, $jm, $n_parent );
last;
}
}
# $level_keep is the minimum level to keep
my @delete_list;
- # Not currently used:
- # Groups with ending comma lists and their range of sizes:
- # $ragged_comma_group{$id} = [ imax_group_min, imax_group_max ]
- ## my %ragged_comma_group;
-
# We work with a list of nodes to visit at the next deeper depth.
my @todo_list;
if ( defined( $match_tree[0] ) ) {
}
foreach my $depth ( 0 .. $MAX_DEPTH ) {
- last unless (@todo_list);
+ last if ( !@todo_list );
my @todo_next;
foreach my $np (@todo_list) {
- my ( $jbeg_p, $jend_p, $np_p, $lev_p, $pat_p, $nc_beg_p, $nc_end_p,
- $rindexes_p )
+ my ( $jbeg_p, $jend_p, $np_p_uu, $lev_p, $pat_p_uu, $nc_beg_p,
+ $nc_end_p, $rindexes_p_uu )
= @{ $match_tree[$depth]->[$np] };
my $nlines_p = $jend_p - $jbeg_p + 1;
# loop to keep or delete each child node
foreach my $nc ( $nc_beg_p .. $nc_end_p ) {
- my ( $jbeg_c, $jend_c, $np_c, $lev_c, $pat_c, $nc_beg_c,
- $nc_end_c )
+ my ( $jbeg_c, $jend_c, $np_c_uu, $lev_c_uu, $pat_c_uu,
+ $nc_beg_c_uu, $nc_end_c_uu )
= @{ $match_tree[ $depth + 1 ]->[$nc] };
my $nlines_c = $jend_c - $jbeg_c + 1;
my $is_monotonic = $rline_values->[$jbeg_c]->[5];
my $imax = @{$rtokens} - 2;
foreach my $i ( 0 .. $imax ) {
my $tok = $rtokens->[$i];
- my ( $raw_tok, $lev, $tag, $tok_count ) =
+ my ( $raw_tok_uu, $lev, $tag_uu, $tok_count_uu ) =
decode_alignment_token($tok);
if ( $lev > $level_keep ) {
push @idel, $i;
local $LIST_SEPARATOR = ')(';
foreach my $item ( @{$rgroup} ) {
my @fix = @{$item};
- foreach my $val (@fix) { $val = "undef" unless defined $val; }
+ foreach my $val (@fix) { $val = "undef" unless defined($val); }
$fix[4] = "...";
print "(@fix)\n";
}
return;
} ## end sub Dump_tree_groups
-{ ## closure for sub is_marginal_match
-
- my %is_if_or;
- my %is_assignment;
- my %is_good_alignment;
-
- # This test did not give sufficiently better results to use as an update,
- # but the flag is worth keeping as a starting point for future testing.
- use constant TEST_MARGINAL_EQ_ALIGNMENT => 0;
+# This test did not give sufficiently better results to use as an update,
+# but the flag is kept as a starting point for future testing.
+use constant TEST_MARGINAL_EQ_ALIGNMENT => 0;
- BEGIN {
-
- my @q = qw(
- if unless or ||
- );
- @is_if_or{@q} = (1) x scalar(@q);
-
- @q = qw(
- = **= += *= &= <<= &&=
- -= /= |= >>= ||= //=
- .= %= ^=
- x=
- );
- @is_assignment{@q} = (1) x scalar(@q);
-
- # Vertically aligning on certain "good" tokens is usually okay
- # so we can be less restrictive in marginal cases.
- @q = qw( { ? => = );
- push @q, (',');
- @is_good_alignment{@q} = (1) x scalar(@q);
- } ## end BEGIN
+sub is_marginal_match {
- sub is_marginal_match {
+ my ( $line_0, $line_1, $group_level, $imax_align, $imax_prev ) = @_;
- my ( $line_0, $line_1, $group_level, $imax_align, $imax_prev ) = @_;
+ # Decide if we should undo some or all of the common alignments of a
+ # group of just two lines.
- # Decide if we should undo some or all of the common alignments of a
- # group of just two lines.
+ # Given:
+ # $line_0 and $line_1 - the two lines
+ # $group_level = the indentation level of the group being processed
+ # $imax_align = the maximum index of the common alignment tokens
+ # of the two lines
+ # $imax_prev = the maximum index of the common alignment tokens
+ # with the line before $line_0 (=-1 of does not exist)
- # Given:
- # $line_0 and $line_1 - the two lines
- # $group_level = the indentation level of the group being processed
- # $imax_align = the maximum index of the common alignment tokens
- # of the two lines
- # $imax_prev = the maximum index of the common alignment tokens
- # with the line before $line_0 (=-1 of does not exist)
+ # Return:
+ # $is_marginal = true if the two lines should NOT be fully aligned
+ # = false if the two lines can remain fully aligned
+ # $imax_align = the index of the highest alignment token shared by
+ # these two lines to keep if the match is marginal.
- # Return:
- # $is_marginal = true if the two lines should NOT be fully aligned
- # = false if the two lines can remain fully aligned
- # $imax_align = the index of the highest alignment token shared by
- # these two lines to keep if the match is marginal.
+ # When we have an alignment group of just two lines like this, we are
+ # working in the twilight zone of what looks good and what looks bad.
+ # This routine is a collection of rules which work have been found to
+ # work fairly well, but it will need to be updated from time to time.
- # When we have an alignment group of just two lines like this, we are
- # working in the twilight zone of what looks good and what looks bad.
- # This routine is a collection of rules which work have been found to
- # work fairly well, but it will need to be updated from time to time.
+ my $is_marginal = 0;
- my $is_marginal = 0;
+ #---------------------------------------
+ # Always align certain special cases ...
+ #---------------------------------------
+ if (
- #---------------------------------------
- # Always align certain special cases ...
- #---------------------------------------
- if (
+ # always keep alignments of a terminal else or ternary
+ defined( $line_1->{'j_terminal_match'} )
- # always keep alignments of a terminal else or ternary
- defined( $line_1->{'j_terminal_match'} )
+ # always align lists
+ || $line_0->{'list_type'}
- # always align lists
- || $line_0->{'list_type'}
+ # always align hanging side comments
+ || $line_1->{'is_hanging_side_comment'}
- # always align hanging side comments
- || $line_1->{'is_hanging_side_comment'}
+ )
+ {
+ return ( $is_marginal, $imax_align );
+ }
- )
- {
- return ( $is_marginal, $imax_align );
- }
-
- my $jmax_0 = $line_0->{'jmax'};
- my $jmax_1 = $line_1->{'jmax'};
- my $rtokens_1 = $line_1->{'rtokens'};
- my $rtokens_0 = $line_0->{'rtokens'};
- my $rfield_lengths_0 = $line_0->{'rfield_lengths'};
- my $rfield_lengths_1 = $line_1->{'rfield_lengths'};
- my $rpatterns_0 = $line_0->{'rpatterns'};
- my $rpatterns_1 = $line_1->{'rpatterns'};
- my $imax_next = $line_1->{'imax_pair'};
-
- # We will scan the alignment tokens and set a flag '$is_marginal' if
- # it seems that the an alignment would look bad.
- my $max_pad = 0;
- my $saw_good_alignment = 0;
- my $saw_if_or; # if we saw an 'if' or 'or' at group level
- my $raw_tokb = EMPTY_STRING; # first token seen at group level
- my $jfirst_bad;
- my $line_ending_fat_comma; # is last token just a '=>' ?
- my $j0_eq_pad;
- my $j0_max_pad = 0;
-
- foreach my $j ( 0 .. $jmax_1 - 2 ) {
- my ( $raw_tok, $lev, $tag, $tok_count ) =
- decode_alignment_token( $rtokens_1->[$j] );
- if ( $raw_tok && $lev == $group_level ) {
- if ( !$raw_tokb ) { $raw_tokb = $raw_tok }
- $saw_if_or ||= $is_if_or{$raw_tok};
- }
+ my $jmax_0 = $line_0->{'jmax'};
+ my $jmax_1 = $line_1->{'jmax'};
+ my $rtokens_1 = $line_1->{'rtokens'};
+## my $rtokens_0 = $line_0->{'rtokens'};
+ my $rfield_lengths_0 = $line_0->{'rfield_lengths'};
+ my $rfield_lengths_1 = $line_1->{'rfield_lengths'};
+ my $rpatterns_0 = $line_0->{'rpatterns'};
+ my $rpatterns_1 = $line_1->{'rpatterns'};
+ my $imax_next = $line_1->{'imax_pair'};
+
+ # We will scan the alignment tokens and set a flag '$is_marginal' if
+ # it seems that the an alignment would look bad.
+ my $max_pad = 0;
+ my $saw_good_alignment = 0;
+ my $saw_if_or; # if we saw an 'if' or 'or' at group level
+ my $raw_tokb = EMPTY_STRING; # first token seen at group level
+ my $jfirst_bad;
+ my $line_ending_fat_comma; # is last token just a '=>' ?
+ my $j0_eq_pad;
+ my $j0_max_pad = 0;
+
+ foreach my $j ( 0 .. $jmax_1 - 2 ) {
+ my ( $raw_tok, $lev, $tag_uu, $tok_count_uu ) =
+ decode_alignment_token( $rtokens_1->[$j] );
+ if ( $raw_tok && $lev == $group_level ) {
+ if ( !$raw_tokb ) { $raw_tokb = $raw_tok }
+ $saw_if_or ||= $is_if_or{$raw_tok};
+ }
- # When the first of the two lines ends in a bare '=>' this will
- # probably be marginal match. (For a bare =>, the next field length
- # will be 2 or 3, depending on side comment)
- $line_ending_fat_comma =
- $j == $jmax_1 - 2
- && $raw_tok eq '=>'
- && $rfield_lengths_0->[ $j + 1 ] <= 3;
-
- my $pad = $rfield_lengths_1->[$j] - $rfield_lengths_0->[$j];
- if ( $j == 0 ) {
- $pad += $line_1->{'leading_space_count'} -
- $line_0->{'leading_space_count'};
-
- # Remember the pad at a leading equals
- if ( $raw_tok eq '=' && $lev == $group_level ) {
- $j0_eq_pad = $pad;
- $j0_max_pad =
- 0.5 * ( $rfield_lengths_1->[0] + $rfield_lengths_0->[0] );
- $j0_max_pad = 4 if ( $j0_max_pad < 4 );
- }
- }
+ # When the first of the two lines ends in a bare '=>' this will
+ # probably be marginal match. (For a bare =>, the next field length
+ # will be 2 or 3, depending on side comment)
+ $line_ending_fat_comma =
+ $j == $jmax_1 - 2
+ && $raw_tok eq '=>'
+ && $rfield_lengths_0->[ $j + 1 ] <= 3;
- if ( $pad < 0 ) { $pad = -$pad }
- if ( $pad > $max_pad ) { $max_pad = $pad }
- if ( $is_good_alignment{$raw_tok} && !$line_ending_fat_comma ) {
- $saw_good_alignment = 1;
- }
- else {
- $jfirst_bad = $j unless defined($jfirst_bad);
- }
- if ( $rpatterns_0->[$j] ne $rpatterns_1->[$j] ) {
-
- # Flag this as a marginal match since patterns differ.
- # Normally, we will not allow just two lines to match if
- # marginal. But we can allow matching in some specific cases.
-
- $jfirst_bad = $j if ( !defined($jfirst_bad) );
- $is_marginal = 1 if ( $is_marginal == 0 );
- if ( $raw_tok eq '=' ) {
-
- # Here is an example of a marginal match:
- # $done{$$op} = 1;
- # $op = compile_bblock($op);
- # The left tokens are both identifiers, but
- # one accesses a hash and the other doesn't.
- # We'll let this be a tentative match and undo
- # it later if we don't find more than 2 lines
- # in the group.
- $is_marginal = 2;
- }
+ my $pad = $rfield_lengths_1->[$j] - $rfield_lengths_0->[$j];
+ if ( $j == 0 ) {
+ $pad += $line_1->{'leading_space_count'} -
+ $line_0->{'leading_space_count'};
+
+ # Remember the pad at a leading equals
+ if ( $raw_tok eq '=' && $lev == $group_level ) {
+ $j0_eq_pad = $pad;
+ $j0_max_pad =
+ 0.5 * ( $rfield_lengths_1->[0] + $rfield_lengths_0->[0] );
+ $j0_max_pad = 4 if ( $j0_max_pad < 4 );
}
}
- $is_marginal = 1 if ( $is_marginal == 0 && $line_ending_fat_comma );
-
- # Turn off the "marginal match" flag in some cases...
- # A "marginal match" occurs when the alignment tokens agree
- # but there are differences in the other tokens (patterns).
- # If we leave the marginal match flag set, then the rule is that we
- # will align only if there are more than two lines in the group.
- # We will turn of the flag if we almost have a match
- # and either we have seen a good alignment token or we
- # just need a small pad (2 spaces) to fit. These rules are
- # the result of experimentation. Tokens which misaligned by just
- # one or two characters are annoying. On the other hand,
- # large gaps to less important alignment tokens are also annoying.
- if ( $is_marginal == 1
- && ( $saw_good_alignment || $max_pad < 3 ) )
+ if ( $pad < 0 ) { $pad = -$pad }
+ if ( $pad > $max_pad ) { $max_pad = $pad }
+ if ( $is_good_marginal_alignment{$raw_tok}
+ && !$line_ending_fat_comma )
{
- $is_marginal = 0;
- }
-
- # We will use the line endings to help decide on alignments...
- # See if the lines end with semicolons...
- my $sc_term0;
- my $sc_term1;
- if ( $jmax_0 < 1 || $jmax_1 < 1 ) {
-
- # shouldn't happen
+ $saw_good_alignment = 1;
}
else {
- my $pat0 = $rpatterns_0->[ $jmax_0 - 1 ];
- my $pat1 = $rpatterns_1->[ $jmax_1 - 1 ];
- $sc_term0 = $pat0 =~ /;b?$/;
- $sc_term1 = $pat1 =~ /;b?$/;
+ $jfirst_bad = $j unless defined($jfirst_bad);
}
-
- if ( !$is_marginal && !$sc_term0 ) {
-
- # First line of assignment should be semicolon terminated.
- # For example, do not align here:
- # $$href{-NUM_TEXT_FILES} = $$href{-NUM_BINARY_FILES} =
- # $$href{-NUM_DIRS} = 0;
- if ( $is_assignment{$raw_tokb} ) {
- $is_marginal = 1;
+ my $pat_0 = $rpatterns_0->[$j];
+ my $pat_1 = $rpatterns_1->[$j];
+ if ( $pat_0 ne $pat_1 && length($pat_0) eq length($pat_1) ) {
+ $pat_0 =~ tr/n/Q/;
+ $pat_1 =~ tr/n/Q/;
+ }
+ if ( $pat_0 ne $pat_1 ) {
+
+ # Flag this as a marginal match since patterns differ.
+ # Normally, we will not allow just two lines to match if
+ # marginal. But we can allow matching in some specific cases.
+
+ $jfirst_bad = $j if ( !defined($jfirst_bad) );
+ $is_marginal = 1 if ( $is_marginal == 0 );
+ if ( $raw_tok eq '=' ) {
+
+ # Here is an example of a marginal match:
+ # $done{$$op} = 1;
+ # $op = compile_bblock($op);
+ # The left tokens are both identifiers, but
+ # one accesses a hash and the other doesn't.
+ # We'll let this be a tentative match and undo
+ # it later if we don't find more than 2 lines
+ # in the group.
+ $is_marginal = 2;
}
}
+ }
- # Try to avoid some undesirable alignments of opening tokens
- # for example, the space between grep and { here:
- # return map { ( $_ => $_ ) }
- # grep { /$handles/ } $self->_get_delegate_method_list;
- $is_marginal ||=
- ( $raw_tokb eq '(' || $raw_tokb eq '{' )
- && $jmax_1 == 2
- && $sc_term0 ne $sc_term1;
-
- #---------------------------------------
- # return if this is not a marginal match
- #---------------------------------------
- if ( !$is_marginal ) {
- return ( $is_marginal, $imax_align );
- }
-
- # Undo the marginal match flag in certain cases,
-
- # Two lines with a leading equals-like operator are allowed to
- # align if the patterns to the left of the equals are the same.
- # For example the following two lines are a marginal match but have
- # the same left side patterns, so we will align the equals.
- # my $orig = my $format = "^<<<<< ~~\n";
- # my $abc = "abc";
- # But these have a different left pattern so they will not be
- # aligned
- # $xmldoc .= $`;
- # $self->{'leftovers'} .= "<bx-seq:seq" . $';
-
- # First line semicolon terminated but second not, usually ok:
- # my $want = "'ab', 'a', 'b'";
- # my $got = join( ", ",
- # map { defined($_) ? "'$_'" : "undef" }
- # @got );
- # First line not semicolon terminated, Not OK to match:
- # $$href{-NUM_TEXT_FILES} = $$href{-NUM_BINARY_FILES} =
- # $$href{-NUM_DIRS} = 0;
- my $pat0 = $rpatterns_0->[0];
- my $pat1 = $rpatterns_1->[0];
+ $is_marginal = 1 if ( $is_marginal == 0 && $line_ending_fat_comma );
+
+ # Turn off the "marginal match" flag in some cases...
+ # A "marginal match" occurs when the alignment tokens agree
+ # but there are differences in the other tokens (patterns).
+ # If we leave the marginal match flag set, then the rule is that we
+ # will align only if there are more than two lines in the group.
+ # We will turn of the flag if we almost have a match
+ # and either we have seen a good alignment token or we
+ # just need a small pad (2 spaces) to fit. These rules are
+ # the result of experimentation. Tokens which misaligned by just
+ # one or two characters are annoying. On the other hand,
+ # large gaps to less important alignment tokens are also annoying.
+ if ( $is_marginal == 1
+ && ( $saw_good_alignment || $max_pad < 3 ) )
+ {
+ $is_marginal = 0;
+ }
- #---------------------------------------------------------
- # Turn off the marginal flag for some types of assignments
- #---------------------------------------------------------
- if ( $is_assignment{$raw_tokb} ) {
+ # We will use the line endings to help decide on alignments...
+ # See if the lines end with semicolons...
+ my $sc_term0;
+ my $sc_term1;
+ if ( $jmax_0 < 1 || $jmax_1 < 1 ) {
- # undo marginal flag if first line is semicolon terminated
- # and leading patters match
- if ($sc_term0) { # && $sc_term1) {
- $is_marginal = $pat0 ne $pat1;
- }
+ # shouldn't happen
+ }
+ else {
+ my $pat0 = $rpatterns_0->[ $jmax_0 - 1 ];
+ my $pat1 = $rpatterns_1->[ $jmax_1 - 1 ];
+ $sc_term0 = $pat0 =~ /;b?$/;
+ $sc_term1 = $pat1 =~ /;b?$/;
+ }
+
+ if ( !$is_marginal && !$sc_term0 ) {
+
+ # First line of assignment should be semicolon terminated.
+ # For example, do not align here:
+ # $$href{-NUM_TEXT_FILES} = $$href{-NUM_BINARY_FILES} =
+ # $$href{-NUM_DIRS} = 0;
+ if ( $is_assignment{$raw_tokb} ) {
+ $is_marginal = 1;
}
- elsif ( $raw_tokb eq '=>' ) {
+ }
+
+ # Try to avoid some undesirable alignments of opening tokens
+ # for example, the space between grep and { here:
+ # return map { ( $_ => $_ ) }
+ # grep { /$handles/ } $self->_get_delegate_method_list;
+ $is_marginal ||=
+ ( $raw_tokb eq '(' || $raw_tokb eq '{' )
+ && $jmax_1 == 2
+ && $sc_term0 ne $sc_term1;
+
+ #---------------------------------------
+ # return if this is not a marginal match
+ #---------------------------------------
+ if ( !$is_marginal ) {
+ return ( $is_marginal, $imax_align );
+ }
+
+ # Undo the marginal match flag in certain cases,
+
+ # Two lines with a leading equals-like operator are allowed to
+ # align if the patterns to the left of the equals are the same.
+ # For example the following two lines are a marginal match but have
+ # the same left side patterns, so we will align the equals.
+ # my $orig = my $format = "^<<<<< ~~\n";
+ # my $abc = "abc";
+ # But these have a different left pattern so they will not be
+ # aligned
+ # $xmldoc .= $`;
+ # $self->{'leftovers'} .= "<bx-seq:seq" . $';
+
+ # First line semicolon terminated but second not, usually ok:
+ # my $want = "'ab', 'a', 'b'";
+ # my $got = join( ", ",
+ # map { defined($_) ? "'$_'" : "undef" }
+ # @got );
+ # First line not semicolon terminated, Not OK to match:
+ # $$href{-NUM_TEXT_FILES} = $$href{-NUM_BINARY_FILES} =
+ # $$href{-NUM_DIRS} = 0;
+ my $pat0 = $rpatterns_0->[0];
+ my $pat1 = $rpatterns_1->[0];
+
+ #---------------------------------------------------------
+ # Turn off the marginal flag for some types of assignments
+ #---------------------------------------------------------
+ if ( $is_assignment{$raw_tokb} ) {
- # undo marginal flag if patterns match
- $is_marginal = $pat0 ne $pat1 || $line_ending_fat_comma;
+ # undo marginal flag if first line is semicolon terminated
+ # and leading patters match
+ if ($sc_term0) { # && $sc_term1) {
+ $is_marginal = $pat0 ne $pat1;
}
- elsif ( $raw_tokb eq '=~' ) {
+ }
+ elsif ( $raw_tokb eq '=>' ) {
- # undo marginal flag if both lines are semicolon terminated
- # and leading patters match
- if ( $sc_term1 && $sc_term0 ) {
- $is_marginal = $pat0 ne $pat1;
- }
+ # undo marginal flag if patterns match
+ $is_marginal = $pat0 ne $pat1 || $line_ending_fat_comma;
+ }
+ elsif ( $raw_tokb eq '=~' ) {
+
+ # undo marginal flag if both lines are semicolon terminated
+ # and leading patters match
+ if ( $sc_term1 && $sc_term0 ) {
+ $is_marginal = $pat0 ne $pat1;
}
+ }
+ else {
+ ##ok: (none of the above)
+ }
- #-----------------------------------------------------
- # Turn off the marginal flag if we saw an 'if' or 'or'
- #-----------------------------------------------------
+ #-----------------------------------------------------
+ # Turn off the marginal flag if we saw an 'if' or 'or'
+ #-----------------------------------------------------
- # A trailing 'if' and 'or' often gives a good alignment
- # For example, we can align these:
- # return -1 if $_[0] =~ m/^CHAPT|APPENDIX/;
- # return $1 + 0 if $_[0] =~ m/^SECT(\d*)$/;
+ # A trailing 'if' and 'or' often gives a good alignment
+ # For example, we can align these:
+ # return -1 if $_[0] =~ m/^CHAPT|APPENDIX/;
+ # return $1 + 0 if $_[0] =~ m/^SECT(\d*)$/;
- # or
- # $d_in_m[2] = 29 if ( &Date_LeapYear($y) );
- # $d = $d_in_m[$m] if ( $d > $d_in_m[$m] );
+ # or
+ # $d_in_m[2] = 29 if ( &Date_LeapYear($y) );
+ # $d = $d_in_m[$m] if ( $d > $d_in_m[$m] );
- if ($saw_if_or) {
+ if ($saw_if_or) {
- # undo marginal flag if both lines are semicolon terminated
- if ( $sc_term0 && $sc_term1 ) {
- $is_marginal = 0;
- }
+ # undo marginal flag if both lines are semicolon terminated
+ if ( $sc_term0 && $sc_term1 ) {
+ $is_marginal = 0;
}
+ }
- # For a marginal match, only keep matches before the first 'bad' match
- if ( $is_marginal
- && defined($jfirst_bad)
- && $imax_align > $jfirst_bad - 1 )
- {
- $imax_align = $jfirst_bad - 1;
- }
+ # For a marginal match, only keep matches before the first 'bad' match
+ if ( $is_marginal
+ && defined($jfirst_bad)
+ && $imax_align > $jfirst_bad - 1 )
+ {
+ $imax_align = $jfirst_bad - 1;
+ }
- #----------------------------------------------------------
- # Allow sweep to match lines with leading '=' in some cases
- #----------------------------------------------------------
- if ( $imax_align < 0 && defined($j0_eq_pad) ) {
+ #----------------------------------------------------------
+ # Allow sweep to match lines with leading '=' in some cases
+ #----------------------------------------------------------
+ if ( $imax_align < 0 && defined($j0_eq_pad) ) {
- if (
+ if (
- # If there is a following line with leading equals, or
- # preceding line with leading equals, then let the sweep align
- # them without restriction. For example, the first two lines
- # here are a marginal match, but they are followed by a line
- # with leading equals, so the sweep-lr logic can align all of
- # the lines:
-
- # $date[1] = $month_to_num{ $date[1] }; # <--line_0
- # @xdate = split( /[:\/\s]/, $log->field('t') ); # <--line_1
- # $day = sprintf( "%04d/%02d/%02d", @date[ 2, 1, 0 ] );
- # $time = sprintf( "%02d:%02d:%02d", @date[ 3 .. 5 ] );
-
- # Likewise, if we reverse the two pairs we want the same result
-
- # $day = sprintf( "%04d/%02d/%02d", @date[ 2, 1, 0 ] );
- # $time = sprintf( "%02d:%02d:%02d", @date[ 3 .. 5 ] );
- # $date[1] = $month_to_num{ $date[1] }; # <--line_0
- # @xdate = split( /[:\/\s]/, $log->field('t') ); # <--line_1
-
- (
- $imax_next >= 0
- || $imax_prev >= 0
- || TEST_MARGINAL_EQ_ALIGNMENT
- )
- && $j0_eq_pad >= -$j0_max_pad
- && $j0_eq_pad <= $j0_max_pad
- )
- {
+ # If there is a following line with leading equals, or
+ # preceding line with leading equals, then let the sweep align
+ # them without restriction. For example, the first two lines
+ # here are a marginal match, but they are followed by a line
+ # with leading equals, so the sweep-lr logic can align all of
+ # the lines:
- # But do not do this if there is a comma before the '='.
- # For example, the first two lines below have commas and
- # therefore are not allowed to align with lines 3 & 4:
+ # $date[1] = $month_to_num{ $date[1] }; # <--line_0
+ # @xdate = split( /[:\/\s]/, $log->field('t') ); # <--line_1
+ # $day = sprintf( "%04d/%02d/%02d", @date[ 2, 1, 0 ] );
+ # $time = sprintf( "%02d:%02d:%02d", @date[ 3 .. 5 ] );
- # my ( $x, $y ) = $self->Size(); #<--line_0
- # my ( $left, $top, $right, $bottom ) = $self->Window(); #<--l_1
- # my $vx = $right - $left;
- # my $vy = $bottom - $top;
+ # Likewise, if we reverse the two pairs we want the same result
- if ( $rpatterns_0->[0] !~ /,/ && $rpatterns_1->[0] !~ /,/ ) {
- $imax_align = 0;
- }
+ # $day = sprintf( "%04d/%02d/%02d", @date[ 2, 1, 0 ] );
+ # $time = sprintf( "%02d:%02d:%02d", @date[ 3 .. 5 ] );
+ # $date[1] = $month_to_num{ $date[1] }; # <--line_0
+ # @xdate = split( /[:\/\s]/, $log->field('t') ); # <--line_1
+
+ (
+ $imax_next >= 0
+ || $imax_prev >= 0
+ || TEST_MARGINAL_EQ_ALIGNMENT
+ )
+ && $j0_eq_pad >= -$j0_max_pad
+ && $j0_eq_pad <= $j0_max_pad
+ )
+ {
+
+ # But do not do this if there is a comma before the '='.
+ # For example, the first two lines below have commas and
+ # therefore are not allowed to align with lines 3 & 4:
+
+ # my ( $x, $y ) = $self->Size(); #<--line_0
+ # my ( $left, $top, $right, $bottom ) = $self->Window(); #<--l_1
+ # my $vx = $right - $left;
+ # my $vy = $bottom - $top;
+
+ if ( $rpatterns_0->[0] !~ /,/ && $rpatterns_1->[0] !~ /,/ ) {
+ $imax_align = 0;
}
}
+ }
- return ( $is_marginal, $imax_align );
- } ## end sub is_marginal_match
-} ## end closure for sub is_marginal_match
+ return ( $is_marginal, $imax_align );
+} ## end sub is_marginal_match
sub get_extra_leading_spaces {
# lines of a list are back together again.
#----------------------------------------------------------
- return 0 unless ( @{$rlines} && @{$rgroups} );
+ return 0 if ( !@{$rlines} || !@{$rgroups} );
my $object = $rlines->[0]->{'indentation'};
- return 0 unless ( ref($object) );
+ return 0 if ( !ref($object) );
my $extra_leading_spaces = 0;
my $extra_indentation_spaces_wanted = get_recoverable_spaces($object);
- return ($extra_leading_spaces) unless ($extra_indentation_spaces_wanted);
+ return ($extra_leading_spaces) if ( !$extra_indentation_spaces_wanted );
my $min_spaces = $extra_indentation_spaces_wanted;
if ( $min_spaces > 0 ) { $min_spaces = 0 }
# a previous side comment should be forgotten. This involves
# checking several rules.
- # Return true to KEEP old comment location
- # Return false to FORGET old comment location
+ # Given:
+ # $line = ref to info hash for the line of interest
+ # $line_number = number of this line in the output stream
+ # $level = indentation level of this line
+ # $num5 = ..see comments below
+
+ # Return:
+ # true to KEEP old comment location
+ # false to FORGET old comment location
my $KEEP = 1;
my $FORGET = 0;
# '$num5' is the number of comments in the first 5 lines after the first
# comment. It is needed to keep a compact group of side comments from
# being influenced by a more distant side comment.
- $num5 = 1 unless ($num5);
+ $num5 = 1 if ( !$num5 );
# Some values:
return $FORGET
if ( $line_diff > $short_diff
- || !$self->[_rOpts_valign_side_comments_] );
+ || !$rOpts_valign_side_comments );
# RULE3: Forget a side comment if this line is at lower level and
# ends a block
my $line = $rlines->[$jj];
my $jmax = $line->{'jmax'};
my $sc_len = $line->{'rfield_lengths'}->[$jmax];
- next unless ($sc_len);
+ next if ( !$sc_len );
$num5++;
}
# Loop over the groups with side comments
my $column_limit;
- foreach my $ng (@todo) {
- my ( $jbeg, $jend ) = @{ $rgroups->[$ng] };
+ foreach my $ngr (@todo) {
+ my ( $jbeg, $jend_uu ) = @{ $rgroups->[$ngr] };
# Note that since all lines in a group have common alignments, we
# just have to work on one of the lines (the first line).
next if ( $jmax <= 0 );
# but if this doesn't work, give up and use the minimum space
- my $min_move = $self->[_rOpts_minimum_space_to_comment_] - 1;
+ my $min_move = $rOpts_minimum_space_to_comment - 1;
if ( $move > $avail ) {
$move = $min_move;
}
return;
} ## end sub align_side_comments
-###############################
-# CODE SECTION 6: Output Step A
-###############################
+###########################################
+# CODE SECTION 6: Pad Signed Number Columns
+###########################################
-sub valign_output_step_A {
+use constant DEBUG_VSN => 0;
- #------------------------------------------------------------
- # This is Step A in writing vertically aligned lines.
- # The line is prepared according to the alignments which have
- # been found. Then it is shipped to the next step.
- #------------------------------------------------------------
+my %is_leading_sign_pattern;
- my ( $self, $rinput_hash ) = @_;
+BEGIN {
- my $line = $rinput_hash->{line};
- my $min_ci_gap = $rinput_hash->{min_ci_gap};
- my $do_not_align = $rinput_hash->{do_not_align};
- my $group_leader_length = $rinput_hash->{group_leader_length};
- my $extra_leading_spaces = $rinput_hash->{extra_leading_spaces};
- my $level = $rinput_hash->{level};
- my $maximum_line_length = $rinput_hash->{maximum_line_length};
+ # PATTERNS: A pattern is basically the concatenation of all token types in
+ # the field, with keywords converted to their actual text. The formatter
+ # has changed things like 'print' to 'priNt' so that all 'n's are numbers.
+ # The following patterns 'n' can match a signed number of interest.
+ # Thus 'n'=a signed or unsigned number, 'b'=a space, '}'=one of ) ] }
+ my @q = ( 'n,', 'n,b', 'nb', 'nb}', 'nb},', 'n},', 'n};' );
- my $rfields = $line->{'rfields'};
- my $rfield_lengths = $line->{'rfield_lengths'};
- my $leading_space_count = $line->{'leading_space_count'};
- my $outdent_long_lines = $line->{'outdent_long_lines'};
- my $maximum_field_index = $line->{'jmax'};
- my $rvertical_tightness_flags = $line->{'rvertical_tightness_flags'};
- my $Kend = $line->{'Kend'};
- my $level_end = $line->{'level_end'};
+ @is_leading_sign_pattern{@q} = (1) x scalar(@q);
- # Check for valid hash keys at end of lifetime of $line during development
- DEVEL_MODE
- && check_keys( $line, \%valid_LINE_keys,
- "Checking line keys at valign_output_step_A", 1 );
+}
- # add any extra spaces
- if ( $leading_space_count > $group_leader_length ) {
- $leading_space_count += $min_ci_gap;
- }
+sub min_max_median {
+ my ($rvalues) = @_;
- my $str = $rfields->[0];
- my $str_len = $rfield_lengths->[0];
+ # Given: $rvalues = ref to an array of numbers
+ # Return: the min, max, and median
+ my $num = @{$rvalues};
+ return unless ($num);
- my @alignments = @{ $line->{'ralignments'} };
- if ( @alignments != $maximum_field_index + 1 ) {
+ my @sorted = sort { $a <=> $b } @{$rvalues};
- # Shouldn't happen: sub install_new_alignments makes jmax alignments
- my $jmax_alignments = @alignments - 1;
- if (DEVEL_MODE) {
- Fault(
-"alignment jmax=$jmax_alignments should equal $maximum_field_index\n"
- );
+ my $min = $sorted[0];
+ my $max = $sorted[-1];
+ my $imid = int( $num / 2 );
+ my $median =
+ @sorted % 2
+ ? $sorted[$imid]
+ : ( $sorted[ $imid - 1 ] + $sorted[$imid] ) / 2;
+
+ return ( $min, $max, $median );
+} ## end sub min_max_median
+
+sub end_signed_number_column {
+ my ( $rgroup_lines, $rcol_hash, $ix_last ) = @_;
+
+ # Finish formatting a column of unsigned numbers
+ # Given:
+ # $rgroup_lines - the current vertical alignment group of lines
+ # $rcol_hash - a hash of information about this vertical column
+ # $ix_last - index of the last line of this vertical column
+ # Task:
+ # If this is a mixture of signed and unsigned numbers, then add a
+ # single space before the unsigned numbers to improve appearance.
+ return unless ($rcol_hash);
+ my $jcol = $rcol_hash->{jcol};
+ my $unsigned = $rcol_hash->{unsigned_count};
+ my $signed = $rcol_hash->{signed_count};
+ my $rsigned_lines = $rcol_hash->{rsigned_lines};
+
+ if ( !$signed && $unsigned ) {
+ DEVEL_MODE
+ && Fault("avoid calling without mixed signed and unsigned\n");
+ return;
+ }
+
+ my $pos_start_number = $rcol_hash->{pos_start_number};
+ my $char_end_part1 = $rcol_hash->{char_end_part1};
+ my $ix_first = $rcol_hash->{ix_first};
+ my $nlines = $ix_last - $ix_first + 1;
+
+ # check for skipped lines, shouldn't happen
+ if ( $signed + $unsigned != $nlines ) {
+ my $line = $rgroup_lines->[$ix_last];
+ my $rfields = $line->{'rfields'};
+ my $text = join EMPTY_STRING, @{$rfields};
+ DEVEL_MODE && Fault(<<EOM);
+We seem to have miscounted lines, please check:
+signed=$signed
+j=$jcol
+unsigned=$unsigned
+ix_first=$ix_first
+ix_last=$ix_last
+nlines=$nlines
+text=$text
+EOM
+ return;
+ }
+
+ #-----------------------------------------------------------------
+ # Form groups of unsigned numbers from the list of signed numbers.
+ #-----------------------------------------------------------------
+ my @unsigned_subgroups;
+ my $ix_last_negative = $ix_first - 1;
+ my %is_signed;
+ foreach my $ix ( @{$rsigned_lines} ) {
+ $is_signed{$ix} = 1;
+ my $Nu = $ix - $ix_last_negative - 1;
+ if ( $Nu > 0 && $Nu <= $rOpts_valign_signed_numbers_limit ) {
+ push @unsigned_subgroups, [ $ix_last_negative + 1, $ix - 1 ];
}
- $do_not_align = 1;
+ $ix_last_negative = $ix;
}
- # loop to concatenate all fields of this line and needed padding
- my $total_pad_count = 0;
- for my $j ( 1 .. $maximum_field_index ) {
+ # Exclude groups with more than about 20 consecutive numbers. Little
+ # visual improvement is gained by padding more than this, and this avoids
+ # large numbers of differences in a file when a single line is changed.
+ my $Nu = $ix_last - $ix_last_negative;
+ if ( $Nu > 0 && $Nu <= $rOpts_valign_signed_numbers_limit ) {
+ push @unsigned_subgroups, [ $ix_last_negative + 1, $ix_last ];
+ }
- # skip zero-length side comments
- last
- if (
- ( $j == $maximum_field_index )
- && ( !defined( $rfields->[$j] )
- || ( $rfield_lengths->[$j] == 0 ) )
- );
+ if ( !@unsigned_subgroups ) { return } # shouldn't happen
+
+ #--------------------------------------------
+ # Find number lengths for irregularity checks
+ #--------------------------------------------
+ # Padding signed numbers looks best when the numbers, excluding signs,
+ # all have about the same length. When the lengths are irregular, with
+ # mostly longer unsigned numbers, it doesn't look good to do this. So
+ # we need to filter out these bad-looking cases.
+
+ # The 'field_lengths' are unreliable because they may include some
+ # arbitrary trailing text; see 'substr.t' So we must look for the end of
+ # the number at a space, comma, or closing container token. Note that these
+ # lengths include the length of any signs.
+ my @len_unsigned;
+ my @len_signed;
+ my @lengths;
+ foreach my $ix ( $ix_first .. $ix_last ) {
+ my $line = $rgroup_lines->[$ix];
+ my $rfield = $line->{'rfields'};
+ my $str = substr( $rfield->[$jcol], $pos_start_number );
+ if ( $str =~ /^([^\s\,\)\]\}]*)/ ) { $str = $1 }
+ my $len = length($str);
+ if ( $is_signed{$ix} ) { push @len_signed, $len }
+ else { push @len_unsigned, $len }
+ push @lengths, [ $len, $ix ];
+ }
- # compute spaces of padding before this field
- my $col = $alignments[ $j - 1 ]->{'column'};
- my $pad = $col - ( $str_len + $leading_space_count );
+ my ( $min_unsigned_length, $max_unsigned_length, $median_unsigned_length )
+ = min_max_median( \@len_unsigned );
+ my ( $min_signed_length_uu, $max_signed_length, $median_signed_length ) =
+ min_max_median( \@len_signed );
- if ($do_not_align) {
- $pad =
- ( $j < $maximum_field_index )
- ? 0
- : $self->[_rOpts_minimum_space_to_comment_] - 1;
- }
+ # Skip padding if no signed numbers exceed unsigned numbers in length
+ if ( $max_signed_length <= $min_unsigned_length ) {
+ return;
+ }
- # if the -fpsc flag is set, move the side comment to the selected
- # column if and only if it is possible, ignoring constraints on
- # line length and minimum space to comment
- if ( $self->[_rOpts_fixed_position_side_comment_]
- && $j == $maximum_field_index )
- {
- my $newpad =
- $pad + $self->[_rOpts_fixed_position_side_comment_] - $col - 1;
- if ( $newpad >= 0 ) { $pad = $newpad; }
- }
+ # If max signed length is greatest - all unsigned values can be padded
+ elsif ( $max_signed_length > $max_unsigned_length ) {
+
+ # Example:
+ # %wind_dir = (
+ # 'n' => [ 1, 0 ],
+ # 'ne' => [ 1, 1 ],
+ # 'e' => [ 0, 1 ],
+ # 'se' => [ -1, 1 ],
+ # 's' => [ -1, 0 ],
+ # 'sw' => [ -1, -1 ],
+ # 'w' => [ 0, -1 ],
+ # 'nw' => [ 1, -1 ],
+ # '' => [ 0, 0 ],
+ # );
+
+ # This is the ideal case - ok to continue and pad
+ }
- # accumulate the padding
- if ( $pad > 0 ) { $total_pad_count += $pad; }
+ # intermediate case: some signed numbers cannot be padded ...
+ else {
- # only add padding when we have a finite field;
- # this avoids extra terminal spaces if we have empty fields
- if ( $rfield_lengths->[$j] > 0 ) {
- $str .= SPACE x $total_pad_count;
- $str_len += $total_pad_count;
- $total_pad_count = 0;
- $str .= $rfields->[$j];
- $str_len += $rfield_lengths->[$j];
- }
- else {
- $total_pad_count = 0;
+ # We have to take a closer look.
+ # Here is an example which looks bad if we do padding like this:
+ # my %hash = (
+ # X0 => -12867.098241163,
+ # X1 => 2.31694338671684, # unsigned w/ excess>0
+ # X2 => 0.0597726714860419, # max length => excess=0
+ # Y0 => 30043.1335503155, # unsigned w/ excess>0
+ # Y1 => 0.0525784981597044, # max length => excess=0
+ # Y2 => -2.32447131600783,
+ # );
+
+ # To decide what looks okay, we count 'good' and 'bad' line interfaces:
+ # X0 - X1 = good (X0 is signed and X1 can move)
+ # X1 - X2 = bad (x1 can move but x2 cannot)
+ # X2 - Y0 = bad (x2 cannot move but Y0 can move)
+ # Y0 - Y1 = bad (Y0 can move but Y1 cannot move)
+ # Y1 - Y2 = bad (Y1 cannot move and Y2 is signed)
+ # Result: 4 bad interfaces and 1 good => so we will skip this
+ my $good_count = 0;
+ my $bad_count = 0;
+ foreach my $item (@lengths) {
+ $item->[0] = $max_unsigned_length - $item->[0];
}
- }
+ my $item0 = shift @lengths;
+ my ( $excess, $ix ) = @{$item0};
+ my $immobile_count = $excess ? 0 : 1;
+ foreach my $item (@lengths) {
+ my $excess_m = $excess;
+ my $ix_m = $ix;
+ ( $excess, $ix ) = @{$item};
+ if ( !$excess ) { $immobile_count++ }
+
+ if ( $is_signed{$ix_m} ) {
+
+ # signed-unsigned interface
+ if ( !$is_signed{$ix} ) {
+ if ($excess) { $good_count++ }
+ else { $bad_count++ }
+ }
- my $side_comment_length = $rfield_lengths->[$maximum_field_index];
+ # signed-signed: ok, not good or bad
+ }
+ else {
- # ship this line off
- $self->valign_output_step_B(
+ # unsigned-signed interface
+ if ( $is_signed{$ix} ) {
+ if ($excess_m) { $good_count++ }
+ else { $bad_count++ }
+ }
+
+ # unsigned-unsigned: bad if different
+ else {
+ if ( $excess_m xor $excess ) {
+ $bad_count++;
+ }
+ }
+ }
+ }
+
+ # Filter 1: skip if more interfaces are 'bad' than 'good'
+ if ( $bad_count > $good_count ) {
+ return;
+ }
+
+ # Filter 2: skip in a table with multiple 'bad' interfaces and where
+ # 'most' of the unsigned lengths are shorter than the signed lengths.
+ # Using the median value makes this insensitive to small changes.
+ if ( $median_unsigned_length >= $median_signed_length
+ && $bad_count > 1
+ && $immobile_count > 1 )
+ {
+ return;
+ }
+
+ # Anything that gets past these filters should look ok if padded
+ }
+
+ #---------------------------------------------
+ # Compute actual available space for each line
+ #---------------------------------------------
+ my %excess_space;
+ my $movable_count = 0;
+ foreach my $item (@unsigned_subgroups) {
+ my ( $ix_min, $ix_max ) = @{$item};
+ foreach my $ix ( $ix_min .. $ix_max ) {
+ my $line = $rgroup_lines->[$ix];
+ my $leading_space_count = $line->{'leading_space_count'};
+ my $jmax = $line->{'jmax'};
+ my $rfield_lengths = $line->{'rfield_lengths'};
+ if ( $jcol >= $jmax ) {
+
+ # shouldn't happen
+ DEVEL_MODE && Fault("jcol=$jcol >= jmax=$jmax\n");
+ return;
+ }
+ my @alignments = @{ $line->{'ralignments'} };
+ my $col = $alignments[$jcol]->{'column'};
+ my $col_start =
+ $jcol == 0
+ ? $leading_space_count
+ : $alignments[ $jcol - 1 ]->{'column'};
+ my $avail = $col - $col_start;
+ my $field_length = $rfield_lengths->[$jcol];
+ my $excess = $avail - $field_length;
+ $excess_space{$ix} = $excess;
+ if ( $excess > 0 ) { $movable_count++ }
+ }
+ }
+
+ return unless ($movable_count);
+
+ # Count the number of signed-unsigned interfaces that would change
+ # if we do the padding
+ my $Nc = 0;
+ foreach my $item (@unsigned_subgroups) {
+ my ( $ix_min, $ix_max ) = @{$item};
+ $Nc++ if ( $excess_space{$ix_min} > 0 && $ix_min != $ix_first );
+ $Nc++ if ( $excess_space{$ix_max} > 0 && $ix_max != $ix_last );
+ }
+
+ #--------------------------------------------------------------------
+ # Sparsity check:
+ # Give up if the number of interface changes will be below the cutoff
+ #--------------------------------------------------------------------
+ if ( $unsigned > $Nc * $rOpts_valign_signed_numbers_limit ) {
+ return;
+ }
+
+ #------------------------------------------------------------------------
+ # Insert an extra space before the unsigned numbers if space is available
+ #------------------------------------------------------------------------
+ foreach my $item (@unsigned_subgroups) {
+ my ( $ix_min, $ix_max ) = @{$item};
+
+ foreach my $ix ( $ix_min .. $ix_max ) {
+ next if ( $excess_space{$ix} <= 0 );
+ my $line = $rgroup_lines->[$ix];
+ my $rfields = $line->{'rfields'};
+ my $rfield_lengths = $line->{'rfield_lengths'};
+ pad_signed_field(
+ \$rfields->[$jcol], \$rfield_lengths->[$jcol],
+ $pos_start_number, $char_end_part1
+ );
+ }
+ }
+ return;
+} ## end sub end_signed_number_column
+
+sub pad_signed_field {
+ my ( $rstr, $rstr_len, $pos_start_number, $char_end_part1 ) = @_;
+
+ # Insert an extra space before a number to highlight algebraic signs
+ # in a column of numbers.
+ # Given:
+ # $rstr = ref to string
+ # $rstr_len = ref to display width of string (could include wide chars)
+ # $pos_start_number = string position of the leading digit
+ # $char_end_part1 = character at $pos_start_number - 1
+ # Task: update $rstr and $rstr_len with a single space
+
+ # First partition the string into $part1 and $part2, so that the
+ # number starts at the beginning of part2.
+ my $part1 = EMPTY_STRING;
+ my $part2 = ${$rstr};
+ my $str = ${$rstr};
+ if ( $pos_start_number > 0 ) {
+ my $len = length($str);
+ if ( $pos_start_number >= $len ) {
+ DEVEL_MODE && Fault(<<EOM);
+Expection position '$pos_start_number' < length $len of string '$str'
+EOM
+ return;
+ }
+ $part1 = substr( $str, 0, $pos_start_number );
+ $part2 = substr( $str, $pos_start_number );
+
+ # VERIFY that we are inserting a new space after either
+ # (1) an existing space or
+ # (2) an opening token.
+ # Otherwise disaster can occur. An error here implies a programming
+ # error in defining '$pos_start_number'.
+
+ my $test_char1 = substr( $part1, -1, 1 );
+ if ( $test_char1 ne $char_end_part1 ) {
+ DEVEL_MODE && Fault(<<EOM);
+Expecting '$char_end_part1' but saw '$test_char1' in string '$str'
+Probably bad position '$pos_start_number'
+EOM
+ return;
+ }
+ }
+
+ # VERIFY we are inserting a space before a digit character
+ my $test_char2 = substr( $part2, 0, 1 );
+ if ( $is_digit_char{$test_char2} ) {
+ ${$rstr} = $part1 . SPACE . $part2;
+ ${$rstr_len} += 1;
+ }
+ else {
+ DEVEL_MODE && Fault(<<EOM);
+Expecting test char2 as leading digit but saw '$test_char2' in string '$str'
+May be bad position '$pos_start_number'
+EOM
+ }
+ return;
+} ## end sub pad_signed_field
+
+sub split_field {
+ my ( $pat1, $field, $pattern ) = @_;
+
+ # Given;
+ # $pat1 = first part of a pattern before a numeric type 'n'
+ # $field = corresponding text field
+ # $pattern = full pattern
+ # Return:
+ # $pos_start_number = positiion in $field where the number should start
+ # = 0 if cannot find
+ # $char_end_part1 = the character preceding $pos_start_number
+ # $ch_opening = the preceding opening container character, if any
+
+ # We have to find where the possible number starts in this field.
+ # The safe thing to do is return @fail if anything does not look right.
+
+ my $pos_start_number = 0;
+ my $char_end_part1 = EMPTY_STRING;
+ my $ch_opening = EMPTY_STRING;
+ my @fail = ( $pos_start_number, $char_end_part1, $ch_opening );
+
+ # Be sure there is just 'n' in the pattern. Multiple terms can occur
+ # when fields are joined, but since we are jumping into the middle
+ # of a field it is safest not to try to handle them.
+ my $n_count = ( $pattern =~ tr/n/n/ );
+ if ( $n_count && $n_count > 1 ) {
+ return @fail;
+ }
+
+ # Same thing for commas
+ my $comma_count = ( $pattern =~ tr/,/,/ );
+ if ( $comma_count && $comma_count > 1 ) {
+ return @fail;
+ }
+
+ # Require 0 or 1 braces
+ my $len_field = length($field);
+ my $len_pat1 = length($pat1);
+ return @fail unless ( $len_pat1 && $len_field );
+
+ # Look at the pattern ending
+ my $ending_b = 0;
+ my $ch = substr( $pat1, -1, 1 );
+ if ( $ch eq 'b' ) {
+ $ending_b = 1;
+ $ch = substr( $pat1, -2, 1 );
+ $char_end_part1 = SPACE;
+ }
+
+ # handle either '{b' or '{'
+ if ( $ch eq '{' ) {
+
+ # Only one brace
+ my $brace_count = ( $pat1 =~ tr/\{/\{/ );
+ return @fail if ( $brace_count != 1 );
+
+ my $i_paren = index( $field, '(' );
+ my $i_bracket = index( $field, '[' );
+ my $i_brace = index( $field, '{' );
+ my $i_opening = length($field);
+ if ( $i_paren >= 0 ) {
+ $i_opening = $i_paren;
+ $ch_opening = '(';
+ }
+ if ( $i_bracket >= 0
+ && $i_bracket < $i_opening )
+ {
+ $i_opening = $i_bracket;
+ $ch_opening = '[';
+ }
+ if ( $i_brace >= 0 && $i_brace < $i_opening ) {
+ $i_opening = $i_brace;
+ $ch_opening = '{';
+ }
+ if ( $i_opening >= 0
+ && $i_opening < length($field) - 1 )
+ {
+ $pos_start_number = $i_opening + 1 + $ending_b;
+ $char_end_part1 = $ch_opening
+ if ( !$ending_b );
+ }
+ else {
+ # strange - could not find the opening token
+ }
+ }
+
+ # no braces: maybe '=>b'
+ else {
+
+ # looking for patterns ending in '=b' or '=>b'
+ if ( !$ending_b ) { return @fail }
+
+ # find the = in the text
+ my $pos_equals = index( $field, '=' );
+ return @fail if ( $pos_equals < 0 );
+
+ # be sure there are no other '=' in the pattern
+ my $equals_count = ( $pat1 =~ tr/=/=/ );
+ return @fail if ( $equals_count != 1 );
+
+ if ( $len_pat1 >= 2 && substr( $pat1, -2, 2 ) eq '=b' ) {
+ $pos_start_number = $pos_equals + 2;
+ }
+ elsif ( $len_pat1 >= 3 && substr( $pat1, -3, 3 ) eq '=>b' ) {
+ $pos_start_number = $pos_equals + 3;
+ }
+ else {
+
+ # cannot handle this pattern
+ return @fail;
+ }
+ }
+
+ if ( $pos_start_number <= 0 || $pos_start_number >= $len_field ) {
+ return @fail;
+ }
+
+ return ( $pos_start_number, $char_end_part1, $ch_opening );
+} ## end sub split_field
+
+sub field_matches_end_pattern {
+ my ( $field2, $pat2 ) = @_;
+
+ # Check that a possible numeric field matches the ending pattern
+
+ # Given:
+ # $field2 = the rest of the field after removing any sign
+ # $pat2 = the end pattern of this field
+ # Return:
+ # false if field is definitely non-numeric
+ # true otherwise
+
+ my $next_char = substr( $pat2, 1, 1 );
+ my $field2_trim = EMPTY_STRING;
+
+ # if pattern is one of: 'n,', 'n,b'
+ if ( $next_char eq ',' ) {
+ my $icomma = index( $field2, ',' );
+ if ( $icomma >= 0 ) {
+ $field2_trim = substr( $field2, 0, $icomma );
+ }
+ }
+
+ # if pattern is one of: 'nb', 'nb}', 'nb},'
+ elsif ( $next_char eq 'b' ) {
+ my $ispace = index( $field2, SPACE );
+ if ( $ispace >= 0 ) {
+ $field2_trim = substr( $field2, 0, $ispace );
+ }
+ }
+
+ # if pattern is one of 'n},', 'n};'
+ elsif ( $next_char eq '}' ) {
+ if ( $field2 =~ /^([^\)\}\]]+)/ ) {
+ $field2_trim = $1;
+ }
+ }
+
+ # unrecognized pattern
+ else {
+ DEVEL_MODE && Fault(<<EOM);
+Unexpected ending pattern '$pat2' next='$next_char' field2='$field2'
+The hash 'is_leading_sign_pattern' seems to have changed but the code
+has not been updated to handle it. Please fix.
+EOM
+ return;
+ }
+
+ if ( !length($field2_trim) ) {
+ DEVEL_MODE
+ && Fault(
+ "STRANGE: cannot find end of field=$field2 for pat=$pat2 \n");
+ return;
+ }
+
+ # Reject obviously non-numeric fields just to be sure we did not
+ # jump into a quote of some kind
+ if ( $field2_trim !~ /^[\d\.\+\-abcdefpx_]+$/i ) {
+ DEBUG_VSN
+ && print {*STDERR}
+"Rejecting match to pat2='$pat2' with next=$next_char field2=$field2 trimmed='$field2_trim'\n";
+ return;
+ }
+ return 1;
+} ## end sub field_matches_end_pattern
+
+sub pad_signed_number_columns {
+ my ($rgroup_lines) = @_;
+
+ # Given:
+ # $rgroup_lines = the current vertical alignment group of lines
+ # Task:
+ # Look for columns of aligned numeric values, some of whose numbers
+ # have algebraic signs. Add a leading space to the unsigned
+ # numbers, if possible, so that the just the signs appear as the first
+ # character. Example of what we want to do:
+
+ # my @correct = (
+ # [ 123456.79, 86753090000.868, 11 ],
+ # [ -123456.79, -86753090000.868, -11 ],
+ # [ 123456.001, 80.080, 10 ],
+ # [ -123456.001, -80.080, 0 ],
+ # [ 10.9, 10.9, 11 ],
+ # );
+
+ # The logic here is complex because we are working with bits of text
+ # which have been broken into patterns which are convenient for the
+ # vertical aligner, but we no longer have the original tokenization
+ # which would have indicated the precise bounds of numbers. So we
+ # have to procede very carefully with lots of checks. There are
+ # more checks than really necessary now because originally numbers
+ # and quotes were both indicated with pattern 'Q'. But now numbers are
+ # uniquely marked as pattern 'n', so there is less risk of an error.
+ # The extra checks take very little time so they are retained.
+
+ return unless ($rOpts_valign_signed_numbers);
+
+ my %column_info;
+ my @columns;
+
+ #----------------
+ # loop over lines
+ #----------------
+ my $ix_line = -1;
+ my $jmax = -1;
+ foreach my $line ( @{$rgroup_lines} ) {
+ $ix_line++;
+ my $jmax_last = $jmax;
+ $jmax = $line->{'jmax'};
+ my $jmax_change = $jmax ne $jmax_last;
+
+ my @alignments = @{ $line->{'ralignments'} };
+ my $rfields = $line->{'rfields'};
+ my $rpatterns = $line->{'rpatterns'};
+ my $rtokens = $line->{'rtokens'};
+
+ #-----------------------------------------------
+ # Check for a reduction in the number of columns
+ #-----------------------------------------------
+ if ( $jmax < $jmax_last ) {
+
+ foreach my $jcol ( keys %column_info ) {
+
+ # end any stranded columns on the right
+ next if ( $jcol < $jmax );
+ my $rcol_hash = $column_info{$jcol};
+ next unless ($rcol_hash);
+ if ( $rcol_hash->{signed_count}
+ && $rcol_hash->{unsigned_count} )
+ {
+ end_signed_number_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol};
+ }
+
+ # Try to keep the end data column running; test case 'rfc.in'
+ # The last item in a list will still need a trailing comma.
+ my $jcol = $jmax - 1;
+ if ( $jcol >= 0 && $column_info{$jcol} ) {
+ my $alignment = $alignments[$jcol];
+ my $old_col = $columns[$jcol];
+ my $col = $alignment->{column};
+
+ if (
+ $col < $old_col
+
+ # only do this if the text has a leading digit
+ && $rfields->[$jcol] =~ /^([+-]?)\d/
+
+ # and a signed number has been seen - issue c375
+ && ( $1 || $column_info{$jcol}->{signed_count} )
+ )
+ {
+ my $spaces_needed = $old_col - $col;
+ my $spaces_available =
+ $line->get_available_space_on_right();
+ if ( $spaces_available >= $spaces_needed ) {
+ $line->increase_field_width( $jcol, $spaces_needed );
+ }
+ }
+ }
+ }
+
+ #--------------------------------------------
+ # Loop over fields except last (side comment)
+ #--------------------------------------------
+ for my $jcol ( 0 .. $jmax - 1 ) {
+
+ #-----------------------------------------
+ # Decide if this is a new alignment column
+ #-----------------------------------------
+ my $alignment = $alignments[$jcol];
+ my $old_col = $columns[$jcol];
+ my $col = $alignment->{column};
+ $columns[$jcol] = $col;
+ if ( defined($old_col) && $old_col != $col ) {
+ foreach my $jcol_old ( keys %column_info ) {
+ next if ( $jcol_old < $jcol );
+ my $rcol_hash = $column_info{$jcol_old};
+ if ( $rcol_hash->{signed_count}
+ && $rcol_hash->{unsigned_count} )
+ {
+ end_signed_number_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol_old};
+ }
+ }
+
+ # A new padded sign column can only start at an alignment change
+ my $rcol_hash = $column_info{$jcol};
+
+ #------------------------------------------------------------
+ # Examine this field, looking for signed and unsigned numbers
+ #------------------------------------------------------------
+ my $field = $rfields->[$jcol];
+ my $pattern = $rpatterns->[$jcol];
+
+ my $is_signed_number = 0;
+ my $is_unsigned_number = 0;
+
+ #--------------------------------------------------------
+ # set $pos_start_number = index in field of digit or sign
+ #--------------------------------------------------------
+ my $pos_start_number = 0;
+ my $char_end_part1 = EMPTY_STRING;
+ my $ch_opening = EMPTY_STRING;
+
+ # Set $field_ok to false on encountering any problem
+ # Do not pad signed and unsigned hash keys
+ my $field_ok = length($field) > 0
+ && substr( $rtokens->[$jcol], 0, 2 ) ne '=>';
+
+ if ( $field_ok && $pattern ) {
+
+ # Split the pattern at the first 'n'
+ # $pat1 = pattern before the 'n' (if any)
+ # $pat2 = pattern starting at the 'n'
+ my ( $pat1, $pat2 );
+ my $posq = index( $pattern, 'n' );
+ if ( $posq < 0 ) {
+ $field_ok = 0;
+ }
+ else {
+ # Just look at up to 3 of the pattern characters
+ # We require $pat2 to have one of the known patterns
+ $pat1 = substr( $pattern, 0, $posq );
+ $pat2 = substr( $pattern, $posq, 3 );
+ $field_ok = $is_leading_sign_pattern{$pat2};
+ }
+
+ if ($field_ok) {
+
+ # If the number starts within the field then we must
+ # find its offset position.
+ if ($pat1) {
+
+ # Note: an optimization would be to remember previous
+ # calls for each column and use them if possible, but
+ # benchmarking shows that this is not necessary.
+ # See .ba54 for example coding.
+ ( $pos_start_number, $char_end_part1, $ch_opening ) =
+ split_field( $pat1, $field, $pattern );
+
+ $field_ok ||= $pos_start_number;
+ }
+
+ if ($field_ok) {
+
+ # look for an optional + or - sign
+ my $test_char = substr( $field, $pos_start_number, 1 );
+ my $sign;
+ if ( $is_plus_or_minus{$test_char} ) {
+ $sign = $test_char;
+ $test_char =
+ substr( $field, $pos_start_number + 1, 1 );
+ }
+
+ # and a digit
+ if ( $is_digit_char{$test_char} ) {
+ my $field2;
+ if ($sign) {
+ $is_signed_number = 1;
+ $field2 =
+ substr( $field, $pos_start_number + 1 );
+ }
+ else {
+ $is_unsigned_number = 1;
+ $field2 =
+ $pos_start_number
+ ? substr( $field, $pos_start_number )
+ : $field;
+ }
+
+ # Check for match to ending pattern
+ $field_ok =
+ field_matches_end_pattern( $field2, $pat2 );
+ }
+ else {
+ $field_ok = 0;
+ }
+ }
+ }
+ }
+
+ #----------------------
+ # Figure out what to do
+ #----------------------
+
+ # we require a signed or unsigned number field
+ # which is not a hash key
+ $field_ok &&= ( $is_signed_number || $is_unsigned_number );
+
+ # if a column has not started..
+ if ( !$rcol_hash ) {
+
+ # give up if this is cannot start a new column
+ next if ( !$field_ok );
+
+ # otherwise continue on to start a new column
+
+ }
+
+ # if a column has been started...
+ else {
+
+ # and this cannot be added to it
+ if ( !$field_ok
+ || $rcol_hash->{pos_start_number} ne $pos_start_number
+ || $rcol_hash->{char_end_part1} ne $char_end_part1
+ || $rcol_hash->{col} ne $col )
+ {
+
+ # then end the current column and start over
+ if ( $rcol_hash->{signed_count}
+ && $rcol_hash->{unsigned_count} )
+ {
+ end_signed_number_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol};
+ $rcol_hash = undef;
+ }
+ }
+
+ if (DEBUG_VSN) {
+ my $exists = defined($rcol_hash);
+ print
+"VSN: line=$ix_line change=$jmax_change jcol=$jcol field=$field exists?=$exists unsigned?=$is_unsigned_number signed?=$is_signed_number\n";
+ }
+
+ #---------------------------------------
+ # Either start a new column, if possible
+ #---------------------------------------
+ if ( !defined($rcol_hash) ) {
+
+ next if ( !$field_ok );
+
+ my $rsigned_lines = $is_signed_number ? [$ix_line] : [];
+ $column_info{$jcol} = {
+ unsigned_count => $is_unsigned_number,
+ signed_count => $is_signed_number,
+ pos_start_number => $pos_start_number,
+ char_end_part1 => $char_end_part1,
+ ix_first => $ix_line,
+ col => $col,
+ jcol => $jcol,
+ rsigned_lines => $rsigned_lines,
+ };
+ }
+
+ #------------------------------
+ # or extend the existing column
+ #------------------------------
+ else {
+ $rcol_hash->{unsigned_count} += $is_unsigned_number;
+ $rcol_hash->{signed_count} += $is_signed_number;
+ if ($is_signed_number) {
+ push @{ $rcol_hash->{rsigned_lines} }, $ix_line;
+ }
+ }
+ }
+ }
+
+ #-------------------------------------
+ # Loop to finish any remaining columns
+ #-------------------------------------
+ foreach my $jcol ( keys %column_info ) {
+ my $rcol_hash = $column_info{$jcol};
+ if ( $rcol_hash->{signed_count} && $rcol_hash->{unsigned_count} ) {
+ end_signed_number_column( $rgroup_lines, $rcol_hash, $ix_line );
+ }
+ }
+ return;
+} ## end sub pad_signed_number_columns
+
+#########################################
+# CODE SECTION 7: Pad Wide Equals Columns
+#########################################
+
+use constant DEBUG_WEC => 0;
+
+sub end_wide_equals_column {
+ my ( $rgroup_lines, $rcol_hash, $ix_last ) = @_;
+
+ # Finish formatting a column of wide equals
+ # Given:
+ # $rgroup_lines - the current vertical alignment group of lines
+ # $rcol_hash - a hash of information about this vertical column
+ # $ix_last - index of the last line of this vertical column
+
+ return unless ($rcol_hash);
+ my $jcol = $rcol_hash->{jcol};
+ my $col = $rcol_hash->{col};
+ my $min_width = $rcol_hash->{min_width};
+ my $max_width = $rcol_hash->{max_width};
+ my $rwidths = $rcol_hash->{rwidths};
+ my $ix_first = $rcol_hash->{ix_first};
+
+ # check for skipped lines, shouldn't happen
+ my $nlines = $ix_last - $ix_first + 1;
+ my $num = @{$rwidths};
+ if ( $num != $nlines ) {
+ my $line = $rgroup_lines->[$ix_last];
+ my $rfields = $line->{'rfields'};
+ my $text = join EMPTY_STRING, @{$rfields};
+ DEVEL_MODE && Fault(<<EOM);
+We seem to have miscounted lines, please check:
+nlines=$nlines
+num saved=$num
+min width=$min_width
+max width=$max_width
+j=$jcol
+ix_first=$ix_first
+ix_last=$ix_last
+text=$text
+EOM
+ return;
+ }
+
+ #------------------------------------------------------
+ # loop over all lines of this vertical alignment column
+ #------------------------------------------------------
+
+ my (
+ $current_alignment, $starting_colp,
+ $current_line, @previous_linked_lines
+ );
+ foreach my $item ( @{$rwidths} ) {
+ my ( $ix, $width ) = @{$item};
+ my $line = $rgroup_lines->[$ix];
+
+ # add leading spaces to the shorter equality tokens to get
+ # vertical alignment of the '=' signs
+ my $jmax = $line->{'jmax'};
+ my $jcolp = $jcol + 1;
+
+ my @alignments = @{ $line->{'ralignments'} };
+ my $alignment = $alignments[$jcolp];
+ my $colp = $alignment->{column};
+
+ #------------------------------------------------------------
+ # Transfer column width changes between equivalent alignments
+ #------------------------------------------------------------
+
+ # This step keeps alignments to the right correct in case the
+ # alignment object changes but the actual alignment col does not.
+ # It is extremely rare for this to occur. Issue c353.
+
+ # nothing to do if no more real alignments on right
+ if ( $jcolp >= $jmax - 1 ) {
+ $current_alignment = undef;
+ $current_line = undef;
+ @previous_linked_lines = ();
+ }
+
+ # handle new rhs alignment
+ elsif ( !$current_alignment ) {
+ $current_alignment = $alignment;
+ $current_line = $line;
+ $starting_colp = $colp;
+ @previous_linked_lines = ();
+ }
+
+ # handle change in existing alignment
+ elsif ( refaddr($alignment) != refaddr($current_alignment) ) {
+
+ # change rhs alignment column - new vertical group on right
+ if ( $starting_colp != $colp ) {
+ $starting_colp = $colp;
+ @previous_linked_lines = ();
+ }
+ else {
+
+ # Same starting alignment col on right, but different alignment
+ # object. See if we must increase width of this new alignment
+ # object.
+ my $current_colp = $current_alignment->{column};
+ if ( $current_colp > $colp ) {
+ my $excess = $current_colp - $colp;
+ my $padding_available =
+ $line->get_available_space_on_right();
+ if ( $excess <= $padding_available ) {
+ $line->increase_field_width( $jcolp, $excess );
+ $colp = $alignment->{column};
+ }
+ }
+
+ # remember the previous line in case we have to go back and
+ # increasse its width
+ push @previous_linked_lines, $current_line;
+ }
+ $current_alignment = $alignment;
+ $current_line = $line;
+ }
+ else {
+ # continuing with same alignment
+ }
+
+ #-----------------------
+ # add any needed padding
+ #-----------------------
+ my $pad = $max_width - $width;
+ if ( $pad > 0 ) {
+
+ my $rfields = $line->{'rfields'};
+ my $rfield_lengths = $line->{'rfield_lengths'};
+
+ my $lenp = $rfield_lengths->[$jcolp];
+ my $avail = $colp - $col;
+ my $excess = $lenp + $pad - $avail;
+
+ if ( $excess > 0 ) {
+
+ my $padding_available = $line->get_available_space_on_right();
+ if ( $excess <= $padding_available ) {
+ $line->increase_field_width( $jcolp, $excess );
+
+ # Increase space of any previous linked lines
+ foreach my $line_prev (@previous_linked_lines) {
+ $padding_available =
+ $line_prev->get_available_space_on_right();
+ if ( $excess <= $padding_available ) {
+ $line_prev->increase_field_width( $jcolp, $excess );
+ }
+ else {
+ last;
+ }
+ }
+ }
+ else {
+ $pad = 0;
+ }
+
+ }
+
+ # Add spaces
+ $rfields->[$jcolp] = ( SPACE x $pad ) . $rfields->[$jcolp];
+ $rfield_lengths->[$jcolp] += $pad;
+ }
+ }
+ return;
+} ## end sub end_wide_equals_column
+
+sub pad_wide_equals_columns {
+ my ($rgroup_lines) = @_;
+
+ # Given:
+ # $rgroup_lines = the current vertical alignment group of lines
+ # Task:
+ # Look for columns of aligned equals tokens, some of which may be
+ # things like '-=', '&&=', etc. Increase the field length of the
+ # previous field by 1 or 2 spaces where necessary and possible so
+ # that alignment of all '=' occurs. For example, given
+
+ # $j /= 2;
+ # $pow2 = $pow2 * $pow2;
+
+ # In this case we want to add a leading space '=' term to get
+ # $j /= 2;
+ # $pow2 = $pow2 * $pow2;
+
+ # The logic here is somewhat similar to sub pad_signed_number_columns
+
+ return unless ($rOpts_valign_wide_equals);
+
+ my %column_info;
+ my @columns;
+
+ #----------------
+ # loop over lines
+ #----------------
+ my $ix_line = -1;
+ my $jmax = -1;
+ foreach my $line ( @{$rgroup_lines} ) {
+ $ix_line++;
+ my $jmax_last = $jmax;
+ $jmax = $line->{'jmax'};
+ my $jmax_change = $jmax ne $jmax_last;
+
+ my @alignments = @{ $line->{'ralignments'} };
+ my $rfields = $line->{'rfields'};
+ my $rtokens = $line->{'rtokens'};
+
+ #-----------------------------------------------
+ # Check for a reduction in the number of columns
+ #-----------------------------------------------
+ if ( $jmax < $jmax_last ) {
+
+ foreach my $jcol ( keys %column_info ) {
+
+ # end any stranded columns on the right
+ next if ( $jcol < $jmax );
+ my $rcol_hash = $column_info{$jcol};
+ next unless ($rcol_hash);
+ if ( $rcol_hash->{max_width} > $rcol_hash->{min_width} ) {
+ end_wide_equals_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol};
+ }
+ }
+
+ #--------------------------------------------------
+ # Loop over fields except last field (side comment)
+ #--------------------------------------------------
+ for my $jcol ( 0 .. $jmax - 1 ) {
+
+ #-----------------------------------------
+ # Decide if this is a new alignment column
+ #-----------------------------------------
+ my $alignment = $alignments[$jcol];
+ my $old_col = $columns[$jcol];
+ my $col = $alignment->{column};
+ $columns[$jcol] = $col;
+ if ( defined($old_col) && $old_col != $col ) {
+ foreach my $jcol_old ( keys %column_info ) {
+ next if ( $jcol_old < $jcol );
+ my $rcol_hash = $column_info{$jcol_old};
+ if ( $rcol_hash->{max_width} > $rcol_hash->{min_width} ) {
+ end_wide_equals_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol_old};
+ }
+ }
+
+ # A new wide equals column can only start at an alignment change
+ my $rcol_hash = $column_info{$jcol};
+
+ #------------------------------------------------------
+ # Examine this field, looking for equals or wide equals
+ #------------------------------------------------------
+ my $field_next = $rfields->[ $jcol + 1 ];
+ my $token = $rtokens->[$jcol];
+
+ # See if this is an equals alignment group;
+ # indicated by alignment token of '=' followed by a digit
+ my $len_equals_symbol = 0;
+ if ( length($token) > 1
+ && substr( $token, 0, 1 ) eq '='
+ && $is_digit_char{ substr( $token, 1, 1 ) } )
+ {
+
+ # find the actual equality symbol which starts the next field
+ # i.e. '=' or '**=' or '-=' etc. We just need its length.
+ my $pos = index( $field_next, '=' );
+ if ( $pos >= 0 && $pos <= 2 ) {
+ $len_equals_symbol = $pos + 1;
+ }
+ }
+
+ # if a column has not started..
+ if ( !$rcol_hash ) {
+
+ # give up if this is cannot start a new column
+ next if ( !$len_equals_symbol );
+
+ # otherwise continue on to start a new column
+
+ }
+
+ # if a column has been started...
+ else {
+
+ # and this cannot be added to it
+ if ( !$len_equals_symbol || $rcol_hash->{col} ne $col ) {
+
+ # then end the current column and start over
+ if ( $rcol_hash->{max_width} > $rcol_hash->{min_width} ) {
+ end_wide_equals_column( $rgroup_lines, $rcol_hash,
+ $ix_line - 1 );
+ }
+ delete $column_info{$jcol};
+ $rcol_hash = undef;
+ }
+ }
+
+ if (DEBUG_WEC) {
+ my $exists = defined($rcol_hash);
+ print
+"WEA: line=$ix_line change=$jmax_change jcol=$jcol field=$field_next exists?=$exists equals?=$len_equals_symbol\n";
+ }
+
+ #---------------------------------------
+ # Either start a new column, if possible
+ #---------------------------------------
+ if ( !defined($rcol_hash) ) {
+
+ next if ( !$len_equals_symbol );
+
+ $column_info{$jcol} = {
+ ix_first => $ix_line,
+ col => $col,
+ jcol => $jcol,
+ min_width => $len_equals_symbol,
+ max_width => $len_equals_symbol,
+ rwidths => [ [ $ix_line, $len_equals_symbol ] ],
+ };
+ }
+
+ #------------------------------
+ # or extend the existing column
+ #------------------------------
+ else {
+ if ( $len_equals_symbol > $rcol_hash->{max_width} ) {
+ $rcol_hash->{max_width} = $len_equals_symbol;
+ }
+ if ( $len_equals_symbol < $rcol_hash->{min_width} ) {
+ $rcol_hash->{min_width} = $len_equals_symbol;
+ }
+ push @{ $rcol_hash->{rwidths} },
+ [ $ix_line, $len_equals_symbol ];
+ }
+ }
+ }
+
+ #-------------------------------------
+ # Loop to finish any remaining columns
+ #-------------------------------------
+ foreach my $jcol ( keys %column_info ) {
+ my $rcol_hash = $column_info{$jcol};
+ if ( $rcol_hash->{max_width} > $rcol_hash->{min_width} ) {
+ end_wide_equals_column( $rgroup_lines, $rcol_hash, $ix_line );
+ }
+ }
+ return;
+} ## end sub pad_wide_equals_columns
+
+###############################
+# CODE SECTION 8: Output Step A
+###############################
+
+sub valign_output_step_A {
+
+ #------------------------------------------------------------
+ # This is Step A in writing vertically aligned lines.
+ # The line is prepared according to the alignments which have
+ # been found. Then it is shipped to the next step.
+ #------------------------------------------------------------
+
+ my ( $self, $rinput_hash ) = @_;
+
+ my $line = $rinput_hash->{line};
+ my $min_ci_gap = $rinput_hash->{min_ci_gap};
+ my $do_not_align = $rinput_hash->{do_not_align};
+ my $group_leader_length = $rinput_hash->{group_leader_length};
+ my $extra_leading_spaces = $rinput_hash->{extra_leading_spaces};
+ my $level = $rinput_hash->{level};
+ my $maximum_line_length = $rinput_hash->{maximum_line_length};
+
+ my $rfields = $line->{'rfields'};
+ my $rfield_lengths = $line->{'rfield_lengths'};
+ my $leading_space_count = $line->{'leading_space_count'};
+ my $outdent_long_lines = $line->{'outdent_long_lines'};
+ my $maximum_field_index = $line->{'jmax'};
+ my $rvertical_tightness_flags = $line->{'rvertical_tightness_flags'};
+ my $Kend = $line->{'Kend'};
+ my $level_end = $line->{'level_end'};
+
+ # Check for valid hash keys at end of lifetime of $line during development
+ DEVEL_MODE
+ && check_keys( $line, \%valid_LINE_keys,
+ "Checking line keys at valign_output_step_A", 1 );
+
+ # add any extra spaces
+ if ( $leading_space_count > $group_leader_length ) {
+ $leading_space_count += $min_ci_gap;
+ }
+
+ my $str = $rfields->[0];
+ my $str_len = $rfield_lengths->[0];
+
+ my @alignments = @{ $line->{'ralignments'} };
+ if ( @alignments != $maximum_field_index + 1 ) {
+
+ # Shouldn't happen: sub install_new_alignments makes jmax alignments
+ my $jmax_alignments = @alignments - 1;
+ if (DEVEL_MODE) {
+ Fault(
+"alignment jmax=$jmax_alignments should equal $maximum_field_index\n"
+ );
+ }
+ $do_not_align = 1;
+ }
+
+ # loop to concatenate all fields of this line and needed padding
+ my $total_pad_count = 0;
+ for my $j ( 1 .. $maximum_field_index ) {
+
+ # skip zero-length side comments
+ last
+ if (
+ ( $j == $maximum_field_index )
+ && ( !defined( $rfields->[$j] )
+ || ( $rfield_lengths->[$j] == 0 ) )
+ );
+
+ # compute spaces of padding before this field
+ my $col = $alignments[ $j - 1 ]->{'column'};
+ my $pad = $col - ( $str_len + $leading_space_count );
+
+ if ($do_not_align) {
+ $pad =
+ ( $j < $maximum_field_index )
+ ? 0
+ : $rOpts_minimum_space_to_comment - 1;
+ }
+
+ # if the -fpsc flag is set, move the side comment to the selected
+ # column if and only if it is possible, ignoring constraints on
+ # line length and minimum space to comment
+ if ( $rOpts_fixed_position_side_comment
+ && $j == $maximum_field_index )
+ {
+ my $newpad = $pad + $rOpts_fixed_position_side_comment - $col - 1;
+ if ( $newpad >= 0 ) { $pad = $newpad; }
+ }
+
+ # accumulate the padding
+ if ( $pad > 0 ) { $total_pad_count += $pad; }
+
+ # only add padding when we have a finite field;
+ # this avoids extra terminal spaces if we have empty fields
+ if ( $rfield_lengths->[$j] > 0 ) {
+ $str .= SPACE x $total_pad_count;
+ $str_len += $total_pad_count;
+ $total_pad_count = 0;
+ $str .= $rfields->[$j];
+ $str_len += $rfield_lengths->[$j];
+ }
+ else {
+ $total_pad_count = 0;
+ }
+ }
+
+ my $side_comment_length = $rfield_lengths->[$maximum_field_index];
+
+ # ship this line off
+ $self->valign_output_step_B(
{
leading_space_count => $leading_space_count + $extra_leading_spaces,
line => $str,
sub combine_fields {
+ my ( $line_0, $line_1, $imax_align ) = @_;
+
+ # Given:
+ # $line_0, $line_1 = two adjacent lines
+ # $imax_align = index of last alignment wanted
+
+ # Task:
# We have a group of two lines for which we do not want to align tokens
# between index $imax_align and the side comment. So we will delete fields
# between $imax_align and the side comment. Alignments have already
# been set so we have to adjust them.
- my ( $line_0, $line_1, $imax_align ) = @_;
-
if ( !defined($imax_align) ) { $imax_align = -1 }
# First delete the unwanted tokens
my $jmax_old = $line_0->{'jmax'};
my @idel = ( $imax_align + 1 .. $jmax_old - 2 );
- return unless (@idel);
+ return if ( !@idel );
# Get old alignments before any changes are made
my @old_alignments = @{ $line_0->{'ralignments'} };
sub get_output_line_number {
+ # Return the output line number to external modules.
# The output line number reported to a caller =
# the number of items still in the buffer +
# the number of items written.
- return $_[0]->group_line_count() +
- $_[0]->[_file_writer_object_]->get_output_line_number();
+ my $self = shift;
+ return $self->group_line_count() +
+ $self->[_file_writer_object_]->get_output_line_number();
} ## end sub get_output_line_number
###############################
-# CODE SECTION 7: Output Step B
+# CODE SECTION 9: Output Step B
###############################
{ ## closure for sub valign_output_step_B
my ( $self, $rinput, $leading_string, $leading_string_length ) = @_;
+ # handle a cached line ..
+ # either append the current line to it or write it out
+
# The cached line will either be:
# - passed along to step_C, or
# - or combined with the current line
my $level_end = $rinput->{level_end};
my $maximum_line_length = $rinput->{maximum_line_length};
- my ( $open_or_close, $opening_flag, $closing_flag, $seqno, $valid,
- $seqno_beg, $seqno_end );
+ my ( $open_or_close, $seqno_beg );
if ($rvertical_tightness_flags) {
$open_or_close = $rvertical_tightness_flags->{_vt_type};
}
}
return ( $str, $str_length, $leading_string, $leading_string_length,
- $leading_space_count, $level, $maximum_line_length, );
+ $leading_space_count, $level, $maximum_line_length );
} ## end sub handle_cached_line
my $outdent_long_lines = $rinput->{outdent_long_lines};
my $rvertical_tightness_flags = $rinput->{rvertical_tightness_flags};
my $level = $rinput->{level};
- my $level_end = $rinput->{level_end};
- my $Kend = $rinput->{Kend};
- my $maximum_line_length = $rinput->{maximum_line_length};
+## my $level_end = $rinput->{level_end};
+ my $Kend = $rinput->{Kend};
+ my $maximum_line_length = $rinput->{maximum_line_length};
# Useful -gcs test cases for wide characters are
# perl527/(method.t.2, reg_mesg.t, mime-header.t)
$self->[_last_outdented_line_at_] = $last_outdented_line_at;
my $outdented_line_count = $self->[_outdented_line_count_];
- unless ($outdented_line_count) {
+ if ( !$outdented_line_count ) {
$self->[_first_outdented_line_at_] =
$last_outdented_line_at;
}
} ## end sub valign_output_step_B
}
-###############################
-# CODE SECTION 8: Output Step C
-###############################
+################################
+# CODE SECTION 10: Output Step C
+################################
{ ## closure for sub valign_output_step_C
}
###############################
-# CODE SECTION 9: Output Step D
+# CODE SECTION 11: Output Step D
###############################
-sub valign_output_step_D {
+sub add_leading_tabs {
- #----------------------------------------------------------------
- # This is Step D in writing vertically aligned lines.
- # It is the end of the vertical alignment pipeline.
- # Write one vertically aligned line of code to the output object.
- #----------------------------------------------------------------
+ my ( $line, $leading_space_count, $level ) = @_;
- my ( $self, $line, $leading_space_count, $level, $Kend ) = @_;
+ # Convert leading whitespace to use tabs if -et or -t are set
+
+ # Given:
+ # $line = the line of text to be written, without any tabs
+ # $leading_whitespace = expected number of leading blank spaces
+ # $level = indentation level (needed for -t)
- # The line is currently correct if there is no tabbing (recommended!)
- # We may have to lop off some leading spaces and replace with tabs.
- if ( $leading_space_count > 0 ) {
+ # Return:
+ # $line = the line with possible leading tabs
- my $rOpts_indent_columns = $self->[_rOpts_indent_columns_];
- my $rOpts_tabs = $self->[_rOpts_tabs_];
- my $rOpts_entab_leading_whitespace =
- $self->[_rOpts_entab_leading_whitespace_];
+ my $trimmed_line = $line;
+ $trimmed_line =~ s/^ [^\S\n]+ //gxm;
- # Nothing to do if no tabs
- if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
- || $rOpts_indent_columns <= 0 )
- {
+ # Check for discrepancy in actual leading white spaces with estimate
+ if ( length($line) != length($trimmed_line) + $leading_space_count ) {
- # nothing to do
+ # If $leading_space_count is zero, then this routine must not
+ # be called because we might be in a quote of some kind
+ if ( $leading_space_count <= 0 ) {
+ DEVEL_MODE && Fault(<<EOM);
+should not be here with leading space count = $leading_space_count
+EOM
+ return $line;
}
- # Handle entab option
- elsif ($rOpts_entab_leading_whitespace) {
+ my $leading_space_count_test = length($line) - length($trimmed_line);
- # Patch 12-nov-2018 based on report from Glenn. Extra padding was
- # not correctly entabbed, nor were side comments: Increase leading
- # space count for a padded line to get correct tabbing
- if ( $line =~ /^(\s+)(.*)$/ ) {
- my $spaces = length($1);
- if ( $spaces > $leading_space_count ) {
- $leading_space_count = $spaces;
- }
- }
+ # Skip tabbing if actual whitespace is less than expected
+ if ( $leading_space_count_test < $leading_space_count ) {
+ DEBUG_TABS
+ && warning(<<EOM);
+Error entabbing: expected count=$leading_space_count but only found $leading_space_count_test for line:
+'$line'
+EOM
+ return $line;
+ }
- my $space_count =
- $leading_space_count % $rOpts_entab_leading_whitespace;
- my $tab_count =
- int( $leading_space_count / $rOpts_entab_leading_whitespace );
- my $leading_string = "\t" x $tab_count . SPACE x $space_count;
- if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
- substr( $line, 0, $leading_space_count ) = $leading_string;
- }
- else {
+ # Use actual whitespace if it exceeds prediction. This mainly
+ # occurs at hanging side comments.
+ $leading_space_count = $leading_space_count_test;
+ }
- # shouldn't happen - program error counting whitespace
- # - skip entabbing
- DEBUG_TABS
- && warning(
-"Error entabbing in valign_output_step_D: expected count=$leading_space_count\n"
- );
- }
- }
+ #----------------------------------
+ # Handle --entab-leading-whitespace
+ #----------------------------------
+ if ($rOpts_entab_leading_whitespace) {
+
+ my $space_count =
+ $leading_space_count % $rOpts_entab_leading_whitespace;
+ my $tab_count =
+ int( $leading_space_count / $rOpts_entab_leading_whitespace );
+ my $leading_string = "\t" x $tab_count . SPACE x $space_count;
+ $line = $leading_string . $trimmed_line;
+ }
- # Handle option of one tab per level
- else {
- my $leading_string = ( "\t" x $level );
- my $space_count =
- $leading_space_count - $level * $rOpts_indent_columns;
+ #-----------------------------------------------
+ # Handle -t (one tab per level; not recommended)
+ #-----------------------------------------------
+ elsif ( $rOpts_tabs && $level ) {
- # shouldn't happen:
- if ( $space_count < 0 ) {
+ my $leading_string = ( "\t" x $level );
+ my $space_count = $leading_space_count - $level * $rOpts_indent_columns;
- # But it could be an outdented comment
- if ( $line !~ /^\s*#/ ) {
- DEBUG_TABS
- && warning(
-"Error entabbing in valign_output_step_D: for level=$level count=$leading_space_count\n"
- );
- }
- $leading_string = ( SPACE x $leading_space_count );
- }
- else {
- $leading_string .= ( SPACE x $space_count );
- }
- if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
- substr( $line, 0, $leading_space_count ) = $leading_string;
- }
- else {
+ # shouldn't happen:
+ if ( $space_count < 0 ) {
- # shouldn't happen - program error counting whitespace
- # we'll skip entabbing
+ # But it could be an outdented comment
+ if ( $line !~ /^\s*#/ ) {
DEBUG_TABS
&& warning(
-"Error entabbing in valign_output_step_D: expected count=$leading_space_count\n"
+"Error entabbing in valign_output_step_D: for level=$level count=$leading_space_count\n"
);
}
+ $leading_string = ( SPACE x $leading_space_count );
}
- }
- my $file_writer_object = $self->[_file_writer_object_];
- $file_writer_object->write_code_line( $line . "\n", $Kend );
-
- return;
-} ## end sub valign_output_step_D
-
-{ ## closure for sub get_leading_string
-
- my @leading_string_cache;
-
- sub initialize_leading_string_cache {
- @leading_string_cache = ();
- return;
- }
-
- sub get_leading_string {
-
- # define the leading whitespace string for this line..
- my ( $self, $leading_whitespace_count, $group_level ) = @_;
-
- # Handle case of zero whitespace, which includes multi-line quotes
- # (which may have a finite level; this prevents tab problems)
- if ( $leading_whitespace_count <= 0 ) {
- return EMPTY_STRING;
+ else {
+ $leading_string .= ( SPACE x $space_count );
}
+ $line = $leading_string . $trimmed_line;
+ }
- # look for previous result
- elsif ( $leading_string_cache[$leading_whitespace_count] ) {
- return $leading_string_cache[$leading_whitespace_count];
+ # nothing to do; we should have skipped a call to this sub
+ else {
+ if (DEVEL_MODE) {
+ Fault(
+"in tab sub but neither -t nor -et set: check flag 'require_tabs'\n"
+ );
}
+ }
+ return $line;
+} ## end sub add_leading_tabs
- # must compute a string for this number of spaces
- my $leading_string;
-
- # Handle simple case of no tabs
- my $rOpts_indent_columns = $self->[_rOpts_indent_columns_];
- my $rOpts_tabs = $self->[_rOpts_tabs_];
- my $rOpts_entab_leading_whitespace =
- $self->[_rOpts_entab_leading_whitespace_];
+sub valign_output_step_D {
- if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
- || $rOpts_indent_columns <= 0 )
- {
- $leading_string = ( SPACE x $leading_whitespace_count );
- }
+ #----------------------------------------------------------------
+ # This is Step D in writing vertically aligned lines.
+ # It is the end of the vertical alignment pipeline.
+ # Write one vertically aligned line of code to the output object.
+ #----------------------------------------------------------------
- # Handle entab option
- elsif ($rOpts_entab_leading_whitespace) {
- my $space_count =
- $leading_whitespace_count % $rOpts_entab_leading_whitespace;
- my $tab_count = int(
- $leading_whitespace_count / $rOpts_entab_leading_whitespace );
- $leading_string = "\t" x $tab_count . SPACE x $space_count;
- }
+ my ( $self, $line, $leading_space_count, $level, $Kend ) = @_;
- # Handle option of one tab per level
- else {
- $leading_string = ( "\t" x $group_level );
- my $space_count =
- $leading_whitespace_count - $group_level * $rOpts_indent_columns;
+ # Convert leading whitespace to use tabs if requested.
+ if ( $require_tabs && $leading_space_count > 0 ) {
+ $line = add_leading_tabs( $line, $leading_space_count, $level );
+ }
- # shouldn't happen:
- if ( $space_count < 0 ) {
- DEBUG_TABS
- && warning(
-"Error in get_leading_string: for level=$group_level count=$leading_whitespace_count\n"
- );
+ my $file_writer_object = $self->[_file_writer_object_];
+ $file_writer_object->write_code_line( $line . "\n", $Kend );
- # -- skip entabbing
- $leading_string = ( SPACE x $leading_whitespace_count );
- }
- else {
- $leading_string .= ( SPACE x $space_count );
- }
- }
- $leading_string_cache[$leading_whitespace_count] = $leading_string;
- return $leading_string;
- } ## end sub get_leading_string
-} ## end get_leading_string
+ return;
+} ## end sub valign_output_step_D
##########################
-# CODE SECTION 10: Summary
+# CODE SECTION 12: Summary
##########################
sub report_anything_unusual {
}
return;
} ## end sub report_anything_unusual
+
+} ## end package Perl::Tidy::VerticalAligner
1;
# on a single column being aligned
#
#####################################################################
+
package Perl::Tidy::VerticalAligner::Alignment;
use strict;
use warnings;
-{ #<<< A non-indenting brace
-
-our $VERSION = '20230309';
+our $VERSION = '20250105';
sub new {
my ( $class, $rarg ) = @_;
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
EOM
exit 1;
-}
+} ## end sub AUTOLOAD
sub DESTROY {
}
sub get_column {
- return $_[0]->{'column'};
+ my $self = shift;
+ return $self->{'column'};
}
sub increment_column {
- $_[0]->{'column'} += $_[1];
-
+ my ( $self, $pad ) = @_;
+ $self->{'column'} += $pad;
return;
}
sub save_column {
- $_[0]->{'saved_column'} = $_[0]->{'column'};
+ my $self = shift;
+ $self->{'saved_column'} = $self->{'column'};
return;
}
sub restore_column {
- $_[0]->{'column'} = $_[0]->{'saved_column'};
+ my $self = shift;
+ $self->{'column'} = $self->{'saved_column'};
return;
}
-} ## end of package VerticalAligner::Alignment
1;
package Perl::Tidy::VerticalAligner::Line;
use strict;
use warnings;
+
+our $VERSION = '20250105';
use English qw( -no_match_vars );
-our $VERSION = '20230309';
sub AUTOLOAD {
return if ( $AUTOLOAD =~ /\bDESTROY$/ );
my ( $pkg, $fname, $lno ) = caller();
my $my_package = __PACKAGE__;
- print STDERR <<EOM;
+ print {*STDERR} <<EOM;
======================================================================
Error detected in package '$my_package', version $VERSION
Received unexpected AUTOLOAD call for sub '$AUTOLOAD'
-Called from package: '$pkg'
+Called from package: '$pkg'
Called from File '$fname' at line '$lno'
This error is probably due to a recent programming change
======================================================================
exit 1;
} ## end sub AUTOLOAD
-{
+sub DESTROY {
- # Constructor may be called as a class method
- sub new {
- my ( $class, $ri ) = @_;
- my $self = bless $ri, $class;
- return $self;
- }
+ # required to avoid call to AUTOLOAD in some versions of perl
+}
- sub get_column {
- my ( $self, $j ) = @_;
- my $alignment = $self->{ralignments}->[$j];
- return unless defined($alignment);
- return $alignment->get_column();
- } ## end sub get_column
+# Constructor may be called as a class method
+sub new {
+ my ( $class, $ri ) = @_;
+ my $self = bless $ri, $class;
+ return $self;
+}
- sub current_field_width {
- my ( $self, $j ) = @_;
- my $col_j = 0;
- my $col_jm = 0;
+sub get_column {
+ my ( $self, $j ) = @_;
+ my $alignment = $self->{ralignments}->[$j];
+ return unless defined($alignment);
+ return $alignment->get_column();
+} ## end sub get_column
- my $alignment_j = $self->{ralignments}->[$j];
- $col_j = $alignment_j->get_column() if defined($alignment_j);
+sub current_field_width {
+ my ( $self, $j ) = @_;
- if ( $j > 0 ) {
- my $alignment_jm = $self->{ralignments}->[ $j - 1 ];
- $col_jm = $alignment_jm->get_column() if defined($alignment_jm);
- }
- return $col_j - $col_jm;
- } ## end sub current_field_width
-
- sub increase_field_width {
-
- my ( $self, $j, $pad ) = @_;
- my $jmax = $self->{jmax};
- foreach ( $j .. $jmax ) {
- my $alignment = $self->{ralignments}->[$_];
- if ( defined($alignment) ) {
- $alignment->increment_column($pad);
- }
- }
- return;
- } ## end sub increase_field_width
+ # Return number of columns of space between alignments $j and $j-1
+
+ my $alignment_j = $self->{ralignments}->[$j];
+ my $col_j = defined($alignment_j) ? $alignment_j->get_column() : 0;
+ return $col_j if ( $j == 0 );
- sub get_available_space_on_right {
- my $jmax = $_[0]->{jmax};
- return $_[0]->{maximum_line_length} - $_[0]->get_column($jmax);
+ my $alignment_jm = $self->{ralignments}->[ $j - 1 ];
+ my $col_jm = defined($alignment_jm) ? $alignment_jm->get_column() : 0;
+ return $col_j - $col_jm;
+
+} ## end sub current_field_width
+
+sub increase_field_width {
+
+ my ( $self, $j, $pad ) = @_;
+
+ # Increase the width of alignment field $j by $pad spaces
+ my $jmax = $self->{jmax};
+ foreach ( $j .. $jmax ) {
+ my $alignment = $self->{ralignments}->[$_];
+ if ( defined($alignment) ) {
+ $alignment->increment_column($pad);
+ }
}
-}
+ return;
+} ## end sub increase_field_width
+sub get_available_space_on_right {
+ my $self = shift;
+ my $jmax = $self->{jmax};
+ return $self->{maximum_line_length} - $self->get_column($jmax);
+}
1;
#!/usr/bin/env perl
use strict;
use Getopt::Long;
+use IO::File;
my $usage = <<EOM;
my @modules = qw(
lib/Perl/Tidy.pm
lib/Perl/Tidy/Debugger.pm
- lib/Perl/Tidy/DevNull.pm
lib/Perl/Tidy/Diagnostics.pm
lib/Perl/Tidy/FileWriter.pm
lib/Perl/Tidy/Formatter.pm
lib/Perl/Tidy/IOScalar.pm
lib/Perl/Tidy/IOScalarArray.pm
lib/Perl/Tidy/IndentationItem.pm
- lib/Perl/Tidy/LineBuffer.pm
- lib/Perl/Tidy/LineSink.pm
- lib/Perl/Tidy/LineSource.pm
lib/Perl/Tidy/Logger.pm
lib/Perl/Tidy/Tokenizer.pm
lib/Perl/Tidy/VerticalAligner.pm
# then copy all modules
my $changed_count;
foreach my $module (@modules) {
+ my @non_ascii;
+ my $lno=0;
my $fh_module;
open( $fh_module, '<', $module )
or die "can't open my module file '$module' : $!\n";
while (<$fh_module>) {
+ $lno++;
last if /^\s*__END__\s*$/;
my $line = $_;
if ( $Opts{'D'}
}
$fh_out->print($line) unless $line =~ /^use Perl::Tidy/;
+
+ if ( $line =~ /[^[:ascii:]]/g ) {
+ my $pos = pos($line);
+ push @non_ascii, [ $lno, $pos ];
+ }
}
$fh_module->close();
+
+ if (@non_ascii) {
+ my $num = @non_ascii;
+ my ( $lno_first, $pos_first ) = @{ $non_ascii[0] };
+ print STDERR <<EOM;
+==============================================================================
+Warning: Found $num non-ascii characters in module: $module
+First at line $lno_first near character $pos_first
+Please avoid non-ascii characters in the perltidy source.
+==============================================================================
+EOM
+ }
}
# then, copy the rest of the script except for the 'use PerlTidy' statement
my $output;
my $stderr_string;
my $errorfile_string;
-my $params = "";
+# -ssp=2 is needed to keep formatting unchanged with new -ssp parameter
+my $params = "-ssp=2";
my $err = Perl::Tidy::perltidy(
#argv => '-npro', # fix for RT#127679, avoid reading unwanted .perltidyrc
? $d1
: $d2
: ($e) ? $e1
- : $e2
+ : $e2
: ($f) ? ($g)
? $g1
: $g2
sub a::that {
$p't'u = "wwoo\n";
- return sub { print $p't'u}
+ return sub { print $p't'u }
}
$a'that = a'that();
$a'that->(); # print "wwoo"
# with -kgb, do no break after last my
sub next_sibling {
my $self = shift;
- my $parent = $_PARENT{ refaddr $self} or return '';
+ my $parent = $_PARENT{ refaddr $self } or return '';
my $key = refaddr $self;
my $elements = $parent->{children};
my $position = List::MoreUtils::firstidx {
sub next_sibling {
my $self = shift;
- my $parent = $_PARENT{ refaddr $self} or return '';
+ my $parent = $_PARENT{ refaddr $self } or return '';
my $key = refaddr $self;
my $elements = $parent->{children};
my $position = List::MoreUtils::firstidx {
{
'track.id' => { -ident => 'none_search.id' },
}
-)->as_query;
+ )->as_query;
#6...........
},
-msc=10 -dbc -dp
----------
'comments3' => <<'----------',
-# testing --maximum-consecutive-blank-lines=2 and --indent-spaced-block-comments --no-format-skipping
--mbl=2 -isbc -nfs
+--maximum-consecutive-blank-lines=2 # -mbl=2
+--indent-spaced-block-comments # -isbc
+--no-format-skipping # -nfs
+--ignore-perlcritic-comments # -ipc
----------
'comments4' => <<'----------',
# testing --keep-old-blank-lines=2 [=all] and
1, 4, 6, 4, 1,);
#>>
+ local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+ ## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
#>>
+local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
=pod
my @list = ( 1, 1, 1, 1, 2, 1, 1, 3, 3, 1, 1, 4, 6, 4, 1, );
+local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
__END__
#>>
+local $Test::Builder::Level =
+ $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
1, 4, 6, 4, 1,);
#>>
+local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
#>>
+local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
=pod
# This single line should break into multiple lines, even with -l=0
# sub 'tight_paren_follows' should break the do block
$body =
- SOAP::Data->name('~V:Fault')->attr( { 'xmlns' => $SOAP::Constants::NS_ENV } )
+ SOAP::Data->name('~V:Fault')
+ ->attr( { 'xmlns' => $SOAP::Constants::NS_ENV } )
->value(
\SOAP::Data->set_value(
SOAP::Data->name(
# git22: Preserve function signature on a single line
# This behavior is controlled by 'sub weld_signature_parens'
-sub foo ( $x, $y = "abcd" ) {
+sub foo( $x, $y = "abcd" ) {
$x . $y;
}
# do not break after closing do brace
-sub foo ( $x, $y = do { {} }, $z = 42, $w = do { "abcd" } ) {
+sub foo( $x, $y = do { {} }, $z = 42, $w = do { "abcd" } ) {
$x . $y . $z;
}
# testing --delete-side-comments and --nostatic-block-comments
-dsc -nsbc
----------
- 'csc1' => "-csc -csci=2 -ncscb",
+ 'csc1' => "-csc -csci=2 -ncscb -cscxl=asub",
'csc2' => "-dcsc",
'def' => "",
'iob' => "-iob",
1, 4, 6, 4, 1,);
#>>
+ local $Test::Builder::Level = $Test::Builder::Level + 1; ## no critic (Variables::ProhibitPackageVars)
+
+ ## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
print( $_[0], "\n" );
}
} ## end sub message
+
+ my $message =sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ }
+ else {
+ print( $_[0], "\n" );
+ }
+ };
----------
'iob' => <<'----------',
#>>
+local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+## no critic (ValuesAndExpressions::RequireInterpolationOfMetachars)
+
# some blank lines follow
=pod
print( $_[0], "\n" );
} ## end else [ if ( !defined( $_[0] ))
} ## end sub message
+
+ my $message = sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ } ## end if ( !defined( $_[0] ))
+ else {
+ print( $_[0], "\n" );
+ } ## end else [ if ( !defined( $_[0] ))
+ };
#9...........
},
print( $_[0], "\n" );
}
}
+
+ my $message = sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ }
+ else {
+ print( $_[0], "\n" );
+ }
+ };
#10...........
},
print( $_[0], "\n" );
}
} ## end sub message
+
+ my $message = sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ }
+ else {
+ print( $_[0], "\n" );
+ }
+ };
#11...........
},
|| $self->{MIH_DefaultType}
|| 'text/plain' );
+if (1) { ... }
+
# Padding can also remove spaces; here the space after the '(' is lost:
-elsif ( $statement_type =~ /^sub\b/
+elsif ($statement_type =~ /^sub\b/
|| $paren_type[$paren_depth] =~ /^sub\b/ )
+{
+}
----------
'outdent' => <<'----------',
|| $self->{MIH_DefaultType}
|| 'text/plain' );
+if (1) { ... }
+
# Padding can also remove spaces; here the space after the '(' is lost:
elsif ($statement_type =~ /^sub\b/
|| $paren_type[$paren_depth] =~ /^sub\b/ )
+{
+}
#19...........
},
};
|| $self->{MIH_DefaultType}
|| 'text/plain' );
+if (1) { ... }
+
# Padding can also remove spaces; here the space after the '(' is lost:
-elsif ( $statement_type =~ /^sub\b/
+elsif ($statement_type =~ /^sub\b/
|| $paren_type[$paren_depth] =~ /^sub\b/ )
+{
+}
----------
'nib' => <<'----------',
|| $self->{MIH_DefaultType}
|| 'text/plain' );
+if (1) { ... }
+
# Padding can also remove spaces; here the space after the '(' is lost:
elsif ( $statement_type =~ /^sub\b/
|| $paren_type[$paren_depth] =~ /^sub\b/ )
+{
+}
#1...........
},
--code-skipping-end='#>>V'
----------
'def' => "",
- 'drc' => "-drc",
+ 'drc' => "-ndrc",
'git106' => "-xlp -gnu -xci",
'git108' => "-wn -wfc",
'git93' => <<'----------',
source => "drc",
params => "def",
expect => <<'#9...........',
-ignoreSpec( $file, "file",, \%spec,,, \%Rspec );
+ignoreSpec( $file, "file", \%spec, \%Rspec );
#9...........
},
source => "drc",
params => "drc",
expect => <<'#10...........',
-ignoreSpec( $file, "file", \%spec, \%Rspec );
+ignoreSpec( $file, "file",, \%spec,,, \%Rspec );
#10...........
},
$q = 201 ;
print '-' x 79, "\n" ;
$g = (
- $f ^ ( $w = ( $z = $m . $e ) ^ substr $e, $q )
+ $f ^ ( $w = ( $z = $m . $e ) ^ substr $e, $q )
^ ( $n = $b ^ $d | $a ^ $l )
) & ( $w | $z ^ $f ^ $n ) & ( $l | $g )
)
my $no_index_1_1 =
{ 'map' =>
- { ':key' => { name => \&string, list => { value => \&string }, }, }, };
+ { ':key' => { name => \&string, list => { value => \&string, }, }, }, };
#2...........
},
expect => <<'#17...........',
# some tests for default setting --use-feature=class, rt145706
class Example::Subclass1 : isa(Example::Base) { ... }
+
class Example::Subclass2 : isa(Example::Base 2.345) { ... }
+
class Example::Subclass3 : isa(Example::Base) 1.345 { ... }
field $y : param(the_y_value);
+
class Pointer 2.0 {
field $x : param;
field $y : param;
#3 recombine6.def
#4 recombine7.def
#5 recombine8.def
+#6 git116.def
+#7 git116.git116
+#8 xbt.def
+#9 xbt.xbt1
+#10 xbt.xbt2
+#11 xbt.xbt3
+#12 lrt.def
+#13 lrt.lrt
+#14 ame.ame
+#15 ame.def
+#16 git124.def
+#17 c269.c269
+#18 c269.def
+#19 git125.def
# To locate test #13 you can search for its name or the string '#13'
# BEGIN SECTION 1: Parameter combinations #
###########################################
$rparams = {
+ 'ame' => <<'----------',
+--add-missing-else
+--add-missing-else-comment="NEED COMMENT"
+----------
+ 'c269' => "-ame",
'def' => "",
+ 'git116' => "-viu",
+ 'lrt' => "--line-range-tidy=2:3",
'olbxl2' => <<'----------',
-olbxl='*'
+----------
+ 'xbt1' => "-xbt",
+ 'xbt2' => "-xbt -xbtl=kt",
+ 'xbt3' => <<'----------',
+-xbt -bbt=2 -xbtl="print say t"
----------
};
############################
$rsources = {
+ 'ame' => <<'----------',
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+----------
+
+ 'c269' => <<'----------',
+if ($xxxxx) {
+ $file = "$xxxxx";
+}
+elsif ($yyyyyy) {
+ $file = "$yyyyy";
+} # side comment
+ # hanging side comment
+elsif ($zzzzz) {
+
+ # comment
+}
+----------
+
+ 'git116' => <<'----------',
+print "Tried to add: @ResolveRPM\n" if ( @ResolveRPM and !$Quiet );
+print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+print "RPM Output:\n" unless $Quiet;
+print join( "\n", @RPMOutput ) . "\n" unless $Quiet;
+----------
+
+ 'git124' => <<'----------',
+sub git124 {
+ return [
+ gather while ( my $foo = $bar->foobar )
+ {
+ ...;
+ }
+ ];
+}
+----------
+
+ 'git125' => <<'----------',
+sub Add ( $x, $y );
+sub Sub( $x, $y );
+----------
+
+ 'lrt' => <<'----------',
+=pod
+sub hello{ print
+"Hello World!"}
+=cut
+----------
+
'olbxl' => <<'----------',
eval {
require Ace };
'recombine8' => <<'----------',
# recombine uses normal forward mode
$v_gb = -1*(eval($pmt_gb))*(-1+((((-1+(1/((eval($i_gb)/100)+1))** ((eval($n_gb)-1)))))/(eval($i_gb)/100)));
+----------
+
+ 'xbt' => <<'----------',
+print {*STDERR} ${$data_sref};
+say {*STDERR} dump $c->{cookies};
+$rc = system {"lskdfj"} "lskdfj";
+test !eval { exec { 'notaint' } $TAINT }, 'exec';
+delete ${"$ {dest}::"}{$name};
+my @matches = @{$nodes_ref} > 1 ? @{$nodes_ref}[ 1 .. $#{$nodes_ref} ] : ();
+%{$self} = %{$project};
+*{$name} = $sub;
+grep { defined &{ ${ "${class}::" }{$_} } }
+&{"${class}::Clear"}();
----------
};
);
#5...........
},
+
+ 'git116.def' => {
+ source => "git116",
+ params => "def",
+ expect => <<'#6...........',
+print "Tried to add: @ResolveRPM\n" if ( @ResolveRPM and !$Quiet );
+print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+print "RPM Output:\n" unless $Quiet;
+print join( "\n", @RPMOutput ) . "\n" unless $Quiet;
+#6...........
+ },
+
+ 'git116.git116' => {
+ source => "git116",
+ params => "git116",
+ expect => <<'#7...........',
+print "Tried to add: @ResolveRPM\n" if ( @ResolveRPM and !$Quiet );
+print "Would need: @DepList\n" if ( @DepList and !$Quiet );
+print "RPM Output:\n" unless $Quiet;
+print join( "\n", @RPMOutput ) . "\n" unless $Quiet;
+#7...........
+ },
+
+ 'xbt.def' => {
+ source => "xbt",
+ params => "def",
+ expect => <<'#8...........',
+print {*STDERR} ${$data_sref};
+say {*STDERR} dump $c->{cookies};
+$rc = system {"lskdfj"} "lskdfj";
+test !eval { exec {'notaint'} $TAINT }, 'exec';
+delete ${"$ {dest}::"}{$name};
+my @matches = @{$nodes_ref} > 1 ? @{$nodes_ref}[ 1 .. $#{$nodes_ref} ] : ();
+%{$self} = %{$project};
+*{$name} = $sub;
+grep { defined &{ ${"${class}::"}{$_} } } &{"${class}::Clear"}();
+#8...........
+ },
+
+ 'xbt.xbt1' => {
+ source => "xbt",
+ params => "xbt1",
+ expect => <<'#9...........',
+print { *STDERR } ${$data_sref};
+say { *STDERR } dump $c->{cookies};
+$rc = system { "lskdfj" } "lskdfj";
+test !eval { exec { 'notaint' } $TAINT }, 'exec';
+delete ${"$ {dest}::"}{$name};
+my @matches = @{$nodes_ref} > 1 ? @{$nodes_ref}[ 1 .. $#{$nodes_ref} ] : ();
+%{$self} = %{$project};
+*{$name} = $sub;
+grep { defined &{ ${"${class}::"}{$_} } } &{"${class}::Clear"}();
+#9...........
+ },
+
+ 'xbt.xbt2' => {
+ source => "xbt",
+ params => "xbt2",
+ expect => <<'#10...........',
+print { *STDERR } ${ $data_sref };
+say { *STDERR } dump $c->{cookies};
+$rc = system { "lskdfj" } "lskdfj";
+test !eval { exec { 'notaint' } $TAINT }, 'exec';
+delete ${ "$ {dest}::" }{$name};
+my @matches =
+ @{ $nodes_ref } > 1 ? @{ $nodes_ref }[ 1 .. $#{ $nodes_ref } ] : ();
+%{ $self } = %{ $project };
+*{ $name } = $sub;
+grep { defined &{ ${ "${class}::" }{$_} } } &{ "${class}::Clear" }();
+#10...........
+ },
+
+ 'xbt.xbt3' => {
+ source => "xbt",
+ params => "xbt3",
+ expect => <<'#11...........',
+print {*STDERR} ${$data_sref};
+say {*STDERR} dump $c->{cookies};
+$rc = system {"lskdfj"} "lskdfj";
+test !eval {exec {'notaint'} $TAINT}, 'exec';
+delete ${"$ {dest}::"}{$name};
+my @matches = @{$nodes_ref} > 1 ? @{$nodes_ref}[ 1 .. $#{$nodes_ref} ] : ();
+%{$self} = %{$project};
+*{$name} = $sub;
+grep {defined &{${"${class}::"}{$_}}} &{"${class}::Clear"}();
+#11...........
+ },
+
+ 'lrt.def' => {
+ source => "lrt",
+ params => "def",
+ expect => <<'#12...........',
+
+=pod
+sub hello{ print
+"Hello World!"}
+=cut
+#12...........
+ },
+
+ 'lrt.lrt' => {
+ source => "lrt",
+ params => "lrt",
+ expect => <<'#13...........',
+=pod
+sub hello {
+ print "Hello World!";
+}
+=cut
+#13...........
+ },
+
+ 'ame.ame' => {
+ source => "ame",
+ params => "ame",
+ expect => <<'#14...........',
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+ else {
+ #NEED COMMENT
+ }
+#14...........
+ },
+
+ 'ame.def' => {
+ source => "ame",
+ params => "def",
+ expect => <<'#15...........',
+ if ( $level == 3 ) { $val = $global{'section'} }
+ elsif ( $level == 2 ) { $val = $global{'chapter'} }
+#15...........
+ },
+
+ 'git124.def' => {
+ source => "git124",
+ params => "def",
+ expect => <<'#16...........',
+sub git124 {
+ return [
+ gather while ( my $foo = $bar->foobar )
+ {
+ ...;
+ }
+ ];
+}
+#16...........
+ },
+
+ 'c269.c269' => {
+ source => "c269",
+ params => "c269",
+ expect => <<'#17...........',
+if ($xxxxx) {
+ $file = "$xxxxx";
+}
+elsif ($yyyyyy) {
+ $file = "$yyyyy";
+} # side comment
+ # hanging side comment
+elsif ($zzzzz) {
+
+ # comment
+}
+else {
+ ##FIXME - added with perltidy -ame
+}
+#17...........
+ },
+
+ 'c269.def' => {
+ source => "c269",
+ params => "def",
+ expect => <<'#18...........',
+if ($xxxxx) {
+ $file = "$xxxxx";
+}
+elsif ($yyyyyy) {
+ $file = "$yyyyy";
+} # side comment
+ # hanging side comment
+elsif ($zzzzz) {
+
+ # comment
+}
+#18...........
+ },
+
+ 'git125.def' => {
+ source => "git125",
+ params => "def",
+ expect => <<'#19...........',
+sub Add ( $x, $y );
+sub Sub( $x, $y );
+#19...........
+ },
};
my $ntests = 0 + keys %{$rtests};
--- /dev/null
+# Created with: ./make_t.pl
+
+# Contents:
+#1 git125.git125
+#2 vsn.def
+#3 vsn.vsn1
+#4 vsn.vsn2
+#5 dia.def
+#6 dia.dia1
+#7 dia.dia2
+#8 dia.dia3
+#9 git134.def
+#10 git135.def
+#11 git135.git135
+#12 c352.def
+#13 c353.c353
+#14 c353.def
+#15 git137.def
+#16 git137.git137
+#17 git138.def
+#18 git138.git138
+#19 vsn.vsn3
+
+# To locate test #13 you can search for its name or the string '#13'
+
+use strict;
+use Test::More;
+use Carp;
+use Perl::Tidy;
+my $rparams;
+my $rsources;
+my $rtests;
+
+BEGIN {
+
+ ###########################################
+ # BEGIN SECTION 1: Parameter combinations #
+ ###########################################
+ $rparams = {
+ 'c353' => <<'----------',
+--valign-wide-equals
+--valign-if-unless
+----------
+ 'def' => "",
+ 'dia1' => "-dia",
+ 'dia2' => "-aia",
+ 'dia3' => <<'----------',
+-dia -aia -iac=2
+-ias='][ }->{ ]->{ }->['
+----------
+ 'git125' => "-ssp=0",
+ 'git135' => "--valign-wide-equals",
+ 'git137' => "-mci -nolq -ci=4",
+ 'git138' => <<'----------',
+-xlp
+-vt=2
+----------
+ 'vsn1' => <<'----------',
+-vsn
+-gnu
+----------
+ 'vsn2' => <<'----------',
+# turn off vsn with -vsnl
+-vsn
+-vsnl=0
+----------
+ 'vsn3' => <<'----------',
+# turn off vsn
+-nvsn
+----------
+ };
+
+ ############################
+ # BEGIN SECTION 2: Sources #
+ ############################
+ $rsources = {
+
+ 'c352' => <<'----------',
+ # issue c352, recombine a small terminal quote
+ $text .= filter_blocks( $code, line( substr( $source, 0, $pos[0] ), $line ) ) . ")";
+ print( ( $Pipe->ResizeBuffer($NewSize) == $NewSize ) ? "Successful" : "Unsucessful" ) . "!\n\n";
+ my $func = 'encode_utf8(' . ( !defined $str ? 'undef' : is_utf8($str) ? '$utf8_str' : '$ascii_str' ) . ')';
+----------
+
+ 'c353' => <<'----------',
+@ns = split( /, ?/, join( ',', @ns ) );
+@cnames = split( /, ?/, join( ',', @cnames ) );
+$recurse = 1 unless ( defined $recurse );
+$port = 53 unless ( defined $port );
+$srcport = 0 unless ( defined $srcport );
+$ttl = 30 * 60 unless ( defined $ttl );
+$hash = 32 if ( defined $hash && $hash <= 0 );
+$hash = 63 if ( defined $hash && $hash > 63 );
+$unique = 1 unless ( defined $hash || defined $unique );
+$unique ||= $hash if (1);
+
+$basepath = $CWD unless length($basepath);
+$basepath .= '/' if -d $basepath && $basepath !~ m#/$#;
+
+$$hr = $1 || $5 || $9 || 0; # 9 is undef, but 5 is defined..
+$$mr = $2 || $6 || 0;
+$$sr = $3 || $7 || 0;
+$ampm = $4 || $8 || $10;
+$$tzr = $11;
+$$hr += 12 if $ampm and "\U$ampm" eq "PM" && $$hr != 12;
+$$hr = 0 if $$hr == 12 && "\U$ampm" eq "AM";
+$$hr = 0 if $$hr == 24;
+
+$map = $DEFAULT_MAP unless defined $map;
+$map = $DEFAULT_PATH . "/" . $map unless $map =~ m|/|;
+$map .= $DEFAULT_EXT unless $map =~ m|/[^/]+\.[^/]+$|;
+----------
+
+ 'dia' => <<'----------',
+return $this->{'content'}[$row][$col];
+return $this->{'content'}->[$row]->[$col];
+return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+$names->{'strings'}[ $featureEntry->{'settings'}{$setting} ][1][0]{0};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ]->[1]->[0]->{0};
+$this->{'hline_color'}[ $last_drawn_row + 1 ][$col];
+$this->{'hline_color'}->[ $last_drawn_row + 1 ]->[$col];
+@{ $table{$file}{$subname}{$pack}{ $type . $name }->{$event} };
+$tagslib->{ $fields[$x_i]->tag() }{ $subf[$i][0] }{tab};
+$m2_results{ $modlog->{uid} }->{m2_count}{ $_->{uid} }++;
+$self->_get_meta_data_hash_ref()->{ $p_object->get_key() }->[$p_offset];
+my $v2 = [ 1, 2, [ 3, 4 ] ]->[2]->[0];
+my $val1 = ${$x}[1];
+my $val2 = $${$x}->[1];
+----------
+
+ 'git125' => <<'----------',
+sub Add ( $x, $y );
+sub Sub( $x, $y );
+----------
+
+ 'git134' => <<'----------',
+# test padding at '[ $clientKey,' by sub pad_broken_list, see git #134
+sub foo {
+ my ( $self, $clientKey, $systemKey ) = @_;
+
+ $q->do(
+ q{
+ Something
+ },
+ [ $clientKey,
+ $systemKey,
+ ],
+ );
+
+ return;
+}
+----------
+
+ 'git135' => <<'----------',
+# simple alignments
+$j /= 2;
+$pow2 = $pow2 * $pow2;
+
+@tmp = reverse split( /\//, $date );
+$tmp[1] -= 1;
+$tmp[2] -= 1900;
+$epoch = timelocal( 0, 0, 0, @tmp );
+$$dow_ref{$date} = $day = ( localtime $epoch )[6];
+
+$state{rate} *= $rate_multiplier;
+$state{catalyst} = $catalyst;
+$state{_trail} .= ", catalysed ($file:$line)";
+
+# trailing alignments
+$self->{'_expect'} ||= $e || 'UNKNOWN';
+$self->{'_s'} = $s || 'UNKNOWN';
+$self->{'_word_size'} = $w || 'UNKNOWN';
+$self->{'_t1'} = $t1 || 'UNKNOWN';
+
+$v = "'" . $v unless $v =~ /^'/;
+$v .= "'" unless $v =~ /'$/;
+$hex .= " $v";
+
+$mask |= $catmask;
+$mask |= $DeadBits{$word} if $fatal;
+$mask = ~( ~$mask | $DeadBits{$word} ) if $no_fatal;
+
+{{{
+ # line limit exceeded if we align final '=' and 'if'
+ my $row = $list_count;
+ $row /= 2 if $main::config_parms{html_category_cols} == 2;
+ $height = $row * 25 if $row * 25 < $height;
+}}}
+
+----------
+
+ 'git137' => <<'----------',
+generate_error( msg =>
+"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
+);
+
+subtype 'USState' => as Str => where {
+ ( exists $STATES->{code2state}{ uc($_) }
+ || exists $STATES->{state2code}{ uc($_) } );
+};
+
+$self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+----------
+
+ 'git138' => <<'----------',
+my $sth = $dbh->prepare( "select * from accountlines
+ where (borrowernumber = ?) and (amountoutstanding<>0)
+ order by date"
+);
+$VAR1 = [ 'method',
+ 1,
+ 'prepare',
+ 'SELECT table_name, table_owner, num_rows FROM iitables
+ where table_owner != \'$ingres\' and table_owner != \'DBA\''
+];
+----------
+
+ 'vsn' => <<'----------',
+@data = (
+ [ "1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th" ],
+ [ 1, 2, 5, 6, 3, 1.5, -1, -3, -4 ],
+ [ -4, -3, 1, 1, -3, -1.5, -2, -1, 0 ],
+ [ 9, 8, 9, 8.4, 7.1, 7.5, 8, 3, -3 ],
+ [ 0.1, 0.2, 0.5, 0.4, 0.3, 0.5, 0.1, 0, 0.4 ],
+);
+
+$s->drawLine( 35, 0 );
+$s->drawLine( 0, 10 );
+$s->drawLine( -35, 0 );
+$s->drawLine( 0, -10 );
+----------
+ };
+
+ ####################################
+ # BEGIN SECTION 3: Expected output #
+ ####################################
+ $rtests = {
+
+ 'git125.git125' => {
+ source => "git125",
+ params => "git125",
+ expect => <<'#1...........',
+sub Add( $x, $y );
+sub Sub( $x, $y );
+#1...........
+ },
+
+ 'vsn.def' => {
+ source => "vsn",
+ params => "def",
+ expect => <<'#2...........',
+@data = (
+ [ "1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th" ],
+ [ 1, 2, 5, 6, 3, 1.5, -1, -3, -4 ],
+ [ -4, -3, 1, 1, -3, -1.5, -2, -1, 0 ],
+ [ 9, 8, 9, 8.4, 7.1, 7.5, 8, 3, -3 ],
+ [ 0.1, 0.2, 0.5, 0.4, 0.3, 0.5, 0.1, 0, 0.4 ],
+);
+
+$s->drawLine( 35, 0 );
+$s->drawLine( 0, 10 );
+$s->drawLine( -35, 0 );
+$s->drawLine( 0, -10 );
+#2...........
+ },
+
+ 'vsn.vsn1' => {
+ source => "vsn",
+ params => "vsn1",
+ expect => <<'#3...........',
+@data = (
+ ["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th"],
+ [ 1, 2, 5, 6, 3, 1.5, -1, -3, -4],
+ [-4, -3, 1, 1, -3, -1.5, -2, -1, 0],
+ [ 9, 8, 9, 8.4, 7.1, 7.5, 8, 3, -3],
+ [ 0.1, 0.2, 0.5, 0.4, 0.3, 0.5, 0.1, 0, 0.4],
+ );
+
+$s->drawLine( 35, 0);
+$s->drawLine( 0, 10);
+$s->drawLine(-35, 0);
+$s->drawLine( 0, -10);
+#3...........
+ },
+
+ 'vsn.vsn2' => {
+ source => "vsn",
+ params => "vsn2",
+ expect => <<'#4...........',
+@data = (
+ [ "1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th" ],
+ [ 1, 2, 5, 6, 3, 1.5, -1, -3, -4 ],
+ [ -4, -3, 1, 1, -3, -1.5, -2, -1, 0 ],
+ [ 9, 8, 9, 8.4, 7.1, 7.5, 8, 3, -3 ],
+ [ 0.1, 0.2, 0.5, 0.4, 0.3, 0.5, 0.1, 0, 0.4 ],
+);
+
+$s->drawLine( 35, 0 );
+$s->drawLine( 0, 10 );
+$s->drawLine( -35, 0 );
+$s->drawLine( 0, -10 );
+#4...........
+ },
+
+ 'dia.def' => {
+ source => "dia",
+ params => "def",
+ expect => <<'#5...........',
+return $this->{'content'}[$row][$col];
+return $this->{'content'}->[$row]->[$col];
+return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+$names->{'strings'}[ $featureEntry->{'settings'}{$setting} ][1][0]{0};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ]->[1]->[0]->{0};
+$this->{'hline_color'}[ $last_drawn_row + 1 ][$col];
+$this->{'hline_color'}->[ $last_drawn_row + 1 ]->[$col];
+@{ $table{$file}{$subname}{$pack}{ $type . $name }->{$event} };
+$tagslib->{ $fields[$x_i]->tag() }{ $subf[$i][0] }{tab};
+$m2_results{ $modlog->{uid} }->{m2_count}{ $_->{uid} }++;
+$self->_get_meta_data_hash_ref()->{ $p_object->get_key() }->[$p_offset];
+my $v2 = [ 1, 2, [ 3, 4 ] ]->[2]->[0];
+my $val1 = ${$x}[1];
+my $val2 = $${$x}->[1];
+#5...........
+ },
+
+ 'dia.dia1' => {
+ source => "dia",
+ params => "dia1",
+ expect => <<'#6...........',
+return $this->{'content'}[$row][$col];
+return $this->{'content'}[$row][$col];
+return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+return $self->{'commandline'}{'arg_list'}[0][0]{'hostgroups'};
+$names->{'strings'}[ $featureEntry->{'settings'}{$setting} ][1][0]{0};
+$names->{'strings'}[ $featureEntry->{'settings'}{$setting} ]->[1][0]{0};
+$this->{'hline_color'}[ $last_drawn_row + 1 ][$col];
+$this->{'hline_color'}[ $last_drawn_row + 1 ][$col];
+@{ $table{$file}{$subname}{$pack}{ $type . $name }{$event} };
+$tagslib->{ $fields[$x_i]->tag() }{ $subf[$i][0] }{tab};
+$m2_results{ $modlog->{uid} }->{m2_count}{ $_->{uid} }++;
+$self->_get_meta_data_hash_ref()->{ $p_object->get_key() }->[$p_offset];
+my $v2 = [ 1, 2, [ 3, 4 ] ]->[2][0];
+my $val1 = ${$x}[1];
+my $val2 = $${$x}->[1];
+#6...........
+ },
+
+ 'dia.dia2' => {
+ source => "dia",
+ params => "dia2",
+ expect => <<'#7...........',
+return $this->{'content'}->[$row]->[$col];
+return $this->{'content'}->[$row]->[$col];
+return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+return $self->{'commandline'}->{'arg_list'}->[0]->[0]->{'hostgroups'};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ][1]->[0]->{0};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ]->[1]->[0]->{0};
+$this->{'hline_color'}->[ $last_drawn_row + 1 ]->[$col];
+$this->{'hline_color'}->[ $last_drawn_row + 1 ]->[$col];
+@{ $table{$file}->{$subname}->{$pack}->{ $type . $name }->{$event} };
+$tagslib->{ $fields[$x_i]->tag() }{ $subf[$i]->[0] }{tab};
+$m2_results{ $modlog->{uid} }->{m2_count}->{ $_->{uid} }++;
+$self->_get_meta_data_hash_ref()->{ $p_object->get_key() }->[$p_offset];
+my $v2 = [ 1, 2, [ 3, 4 ] ]->[2]->[0];
+my $val1 = ${$x}[1];
+my $val2 = $${$x}->[1];
+#7...........
+ },
+
+ 'dia.dia3' => {
+ source => "dia",
+ params => "dia3",
+ expect => <<'#8...........',
+return $this->{'content'}->[$row][$col];
+return $this->{'content'}->[$row][$col];
+return $self->{'commandline'}->{'arg_list'}->[0][0]->{'hostgroups'};
+return $self->{'commandline'}->{'arg_list'}->[0][0]->{'hostgroups'};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ][1][0]->{0};
+$names->{'strings'}->[ $featureEntry->{'settings'}->{$setting} ][1][0]->{0};
+$this->{'hline_color'}->[ $last_drawn_row + 1 ][$col];
+$this->{'hline_color'}->[ $last_drawn_row + 1 ][$col];
+@{ $table{$file}->{$subname}->{$pack}->{ $type . $name }->{$event} };
+$tagslib->{ $fields[$x_i]->tag() }->{ $subf[$i][0] }->{tab};
+$m2_results{ $modlog->{uid} }->{m2_count}->{ $_->{uid} }++;
+$self->_get_meta_data_hash_ref()->{ $p_object->get_key() }->[$p_offset];
+my $v2 = [ 1, 2, [ 3, 4 ] ]->[2][0];
+my $val1 = ${$x}[1];
+my $val2 = $${$x}->[1];
+#8...........
+ },
+
+ 'git134.def' => {
+ source => "git134",
+ params => "def",
+ expect => <<'#9...........',
+# test padding at '[ $clientKey,' by sub pad_broken_list, see git #134
+sub foo {
+ my ( $self, $clientKey, $systemKey ) = @_;
+
+ $q->do(
+ q{
+ Something
+ },
+ [ $clientKey,
+ $systemKey,
+ ],
+ );
+
+ return;
+}
+#9...........
+ },
+
+ 'git135.def' => {
+ source => "git135",
+ params => "def",
+ expect => <<'#10...........',
+# simple alignments
+$j /= 2;
+$pow2 = $pow2 * $pow2;
+
+@tmp = reverse split( /\//, $date );
+$tmp[1] -= 1;
+$tmp[2] -= 1900;
+$epoch = timelocal( 0, 0, 0, @tmp );
+$$dow_ref{$date} = $day = ( localtime $epoch )[6];
+
+$state{rate} *= $rate_multiplier;
+$state{catalyst} = $catalyst;
+$state{_trail} .= ", catalysed ($file:$line)";
+
+# trailing alignments
+$self->{'_expect'} ||= $e || 'UNKNOWN';
+$self->{'_s'} = $s || 'UNKNOWN';
+$self->{'_word_size'} = $w || 'UNKNOWN';
+$self->{'_t1'} = $t1 || 'UNKNOWN';
+
+$v = "'" . $v unless $v =~ /^'/;
+$v .= "'" unless $v =~ /'$/;
+$hex .= " $v";
+
+$mask |= $catmask;
+$mask |= $DeadBits{$word} if $fatal;
+$mask = ~( ~$mask | $DeadBits{$word} ) if $no_fatal;
+
+{
+ {
+ {
+ # line limit exceeded if we align final '=' and 'if'
+ my $row = $list_count;
+ $row /= 2 if $main::config_parms{html_category_cols} == 2;
+ $height = $row * 25 if $row * 25 < $height;
+ }
+ }
+}
+
+#10...........
+ },
+
+ 'git135.git135' => {
+ source => "git135",
+ params => "git135",
+ expect => <<'#11...........',
+# simple alignments
+$j /= 2;
+$pow2 = $pow2 * $pow2;
+
+@tmp = reverse split( /\//, $date );
+$tmp[1] -= 1;
+$tmp[2] -= 1900;
+$epoch = timelocal( 0, 0, 0, @tmp );
+$$dow_ref{$date} = $day = ( localtime $epoch )[6];
+
+$state{rate} *= $rate_multiplier;
+$state{catalyst} = $catalyst;
+$state{_trail} .= ", catalysed ($file:$line)";
+
+# trailing alignments
+$self->{'_expect'} ||= $e || 'UNKNOWN';
+$self->{'_s'} = $s || 'UNKNOWN';
+$self->{'_word_size'} = $w || 'UNKNOWN';
+$self->{'_t1'} = $t1 || 'UNKNOWN';
+
+$v = "'" . $v unless $v =~ /^'/;
+$v .= "'" unless $v =~ /'$/;
+$hex .= " $v";
+
+$mask |= $catmask;
+$mask |= $DeadBits{$word} if $fatal;
+$mask = ~( ~$mask | $DeadBits{$word} ) if $no_fatal;
+
+{
+ {
+ {
+ # line limit exceeded if we align final '=' and 'if'
+ my $row = $list_count;
+ $row /= 2 if $main::config_parms{html_category_cols} == 2;
+ $height = $row * 25 if $row * 25 < $height;
+ }
+ }
+}
+
+#11...........
+ },
+
+ 'c352.def' => {
+ source => "c352",
+ params => "def",
+ expect => <<'#12...........',
+ # issue c352, recombine a small terminal quote
+ $text .= filter_blocks( $code,
+ line( substr( $source, 0, $pos[0] ), $line ) ) . ")";
+ print( ( $Pipe->ResizeBuffer($NewSize) == $NewSize )
+ ? "Successful"
+ : "Unsucessful" ) . "!\n\n";
+ my $func =
+ 'encode_utf8('
+ . ( !defined $str ? 'undef'
+ : is_utf8($str) ? '$utf8_str'
+ : '$ascii_str' ) . ')';
+#12...........
+ },
+
+ 'c353.c353' => {
+ source => "c353",
+ params => "c353",
+ expect => <<'#13...........',
+@ns = split( /, ?/, join( ',', @ns ) );
+@cnames = split( /, ?/, join( ',', @cnames ) );
+$recurse = 1 unless ( defined $recurse );
+$port = 53 unless ( defined $port );
+$srcport = 0 unless ( defined $srcport );
+$ttl = 30 * 60 unless ( defined $ttl );
+$hash = 32 if ( defined $hash && $hash <= 0 );
+$hash = 63 if ( defined $hash && $hash > 63 );
+$unique = 1 unless ( defined $hash || defined $unique );
+$unique ||= $hash if (1);
+
+$basepath = $CWD unless length($basepath);
+$basepath .= '/' if -d $basepath && $basepath !~ m#/$#;
+
+$$hr = $1 || $5 || $9 || 0; # 9 is undef, but 5 is defined..
+$$mr = $2 || $6 || 0;
+$$sr = $3 || $7 || 0;
+$ampm = $4 || $8 || $10;
+$$tzr = $11;
+$$hr += 12 if $ampm and "\U$ampm" eq "PM" && $$hr != 12;
+$$hr = 0 if $$hr == 12 && "\U$ampm" eq "AM";
+$$hr = 0 if $$hr == 24;
+
+$map = $DEFAULT_MAP unless defined $map;
+$map = $DEFAULT_PATH . "/" . $map unless $map =~ m|/|;
+$map .= $DEFAULT_EXT unless $map =~ m|/[^/]+\.[^/]+$|;
+#13...........
+ },
+
+ 'c353.def' => {
+ source => "c353",
+ params => "def",
+ expect => <<'#14...........',
+@ns = split( /, ?/, join( ',', @ns ) );
+@cnames = split( /, ?/, join( ',', @cnames ) );
+$recurse = 1 unless ( defined $recurse );
+$port = 53 unless ( defined $port );
+$srcport = 0 unless ( defined $srcport );
+$ttl = 30 * 60 unless ( defined $ttl );
+$hash = 32 if ( defined $hash && $hash <= 0 );
+$hash = 63 if ( defined $hash && $hash > 63 );
+$unique = 1 unless ( defined $hash || defined $unique );
+$unique ||= $hash if (1);
+
+$basepath = $CWD unless length($basepath);
+$basepath .= '/' if -d $basepath && $basepath !~ m#/$#;
+
+$$hr = $1 || $5 || $9 || 0; # 9 is undef, but 5 is defined..
+$$mr = $2 || $6 || 0;
+$$sr = $3 || $7 || 0;
+$ampm = $4 || $8 || $10;
+$$tzr = $11;
+$$hr += 12 if $ampm and "\U$ampm" eq "PM" && $$hr != 12;
+$$hr = 0 if $$hr == 12 && "\U$ampm" eq "AM";
+$$hr = 0 if $$hr == 24;
+
+$map = $DEFAULT_MAP unless defined $map;
+$map = $DEFAULT_PATH . "/" . $map unless $map =~ m|/|;
+$map .= $DEFAULT_EXT unless $map =~ m|/[^/]+\.[^/]+$|;
+#14...........
+ },
+
+ 'git137.def' => {
+ source => "git137",
+ params => "def",
+ expect => <<'#15...........',
+generate_error( msg =>
+"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
+);
+
+subtype 'USState' => as Str => where {
+ ( exists $STATES->{code2state}{ uc($_) }
+ || exists $STATES->{state2code}{ uc($_) } );
+};
+
+$self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+#15...........
+ },
+
+ 'git137.git137' => {
+ source => "git137",
+ params => "git137",
+ expect => <<'#16...........',
+generate_error( msg =>
+ "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
+);
+
+subtype 'USState' => as Str => where {
+ ( exists $STATES->{code2state}{ uc($_) }
+ || exists $STATES->{state2code}{ uc($_) } );
+};
+
+$self->blurt( "Error: No INPUT definition for type '$type', typekind '"
+ . $type->xstype
+ . "' found" );
+#16...........
+ },
+
+ 'git138.def' => {
+ source => "git138",
+ params => "def",
+ expect => <<'#17...........',
+my $sth = $dbh->prepare(
+ "select * from accountlines
+ where (borrowernumber = ?) and (amountoutstanding<>0)
+ order by date"
+);
+$VAR1 = [
+ 'method',
+ 1,
+ 'prepare',
+ 'SELECT table_name, table_owner, num_rows FROM iitables
+ where table_owner != \'$ingres\' and table_owner != \'DBA\''
+];
+#17...........
+ },
+
+ 'git138.git138' => {
+ source => "git138",
+ params => "git138",
+ expect => <<'#18...........',
+my $sth = $dbh->prepare( "select * from accountlines
+ where (borrowernumber = ?) and (amountoutstanding<>0)
+ order by date"
+);
+$VAR1 = [ 'method',
+ 1,
+ 'prepare',
+ 'SELECT table_name, table_owner, num_rows FROM iitables
+ where table_owner != \'$ingres\' and table_owner != \'DBA\''
+];
+#18...........
+ },
+
+ 'vsn.vsn3' => {
+ source => "vsn",
+ params => "vsn3",
+ expect => <<'#19...........',
+@data = (
+ [ "1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th" ],
+ [ 1, 2, 5, 6, 3, 1.5, -1, -3, -4 ],
+ [ -4, -3, 1, 1, -3, -1.5, -2, -1, 0 ],
+ [ 9, 8, 9, 8.4, 7.1, 7.5, 8, 3, -3 ],
+ [ 0.1, 0.2, 0.5, 0.4, 0.3, 0.5, 0.1, 0, 0.4 ],
+);
+
+$s->drawLine( 35, 0 );
+$s->drawLine( 0, 10 );
+$s->drawLine( -35, 0 );
+$s->drawLine( 0, -10 );
+#19...........
+ },
+ };
+
+ my $ntests = 0 + keys %{$rtests};
+ plan tests => $ntests;
+}
+
+###############
+# EXECUTE TESTS
+###############
+
+foreach my $key ( sort keys %{$rtests} ) {
+ my $output;
+ my $sname = $rtests->{$key}->{source};
+ my $expect = $rtests->{$key}->{expect};
+ my $pname = $rtests->{$key}->{params};
+ my $source = $rsources->{$sname};
+ my $params = defined($pname) ? $rparams->{$pname} : "";
+ my $stderr_string;
+ my $errorfile_string;
+ my $err = Perl::Tidy::perltidy(
+ source => \$source,
+ destination => \$output,
+ perltidyrc => \$params,
+ argv => '', # for safety; hide any ARGV from perltidy
+ stderr => \$stderr_string,
+ errorfile => \$errorfile_string, # not used when -se flag is set
+ );
+ if ( $err || $stderr_string || $errorfile_string ) {
+ print STDERR "Error output received for test '$key'\n";
+ if ($err) {
+ print STDERR "An error flag '$err' was returned\n";
+ ok( !$err );
+ }
+ if ($stderr_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<STDERR>>\n$stderr_string\n";
+ print STDERR "---------------------\n";
+ ok( !$stderr_string );
+ }
+ if ($errorfile_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<.ERR file>>\n$errorfile_string\n";
+ print STDERR "---------------------\n";
+ ok( !$errorfile_string );
+ }
+ }
+ else {
+ if ( !is( $output, $expect, $key ) ) {
+ my $leno = length($output);
+ my $lene = length($expect);
+ if ( $leno == $lene ) {
+ print STDERR
+"#> Test '$key' gave unexpected output. Strings differ but both have length $leno\n";
+ }
+ else {
+ print STDERR
+"#> Test '$key' gave unexpected output. String lengths differ: output=$leno, expected=$lene\n";
+ }
+ }
+ }
+}
+
0.1
*
-yvals$a;
+yvals
+$a
+;
# keep paren separate here:
use
Foo::Bar (
--- /dev/null
+# Created with: ./make_t.pl
+
+# Contents:
+#1 git143.def
+#2 git143.git143
+#3 git146.def
+#4 git146.git146
+#5 altc.altc1
+#6 altc.altc2
+#7 altc.def
+#8 dltc.def
+#9 dltc.dltc1
+#10 dltc.dltc2
+#11 logical_xor.def
+#12 csc.csc3
+#13 git159.def
+#14 git159.git159
+#15 git162.def
+#16 git162.git162
+#17 qwaf.def
+#18 qwaf.qwaf
+#19 btct.btct1
+
+# To locate test #13 you can search for its name or the string '#13'
+
+use strict;
+use Test::More;
+use Carp;
+use Perl::Tidy;
+my $rparams;
+my $rsources;
+my $rtests;
+
+BEGIN {
+
+ ###########################################
+ # BEGIN SECTION 1: Parameter combinations #
+ ###########################################
+ $rparams = {
+ 'altc1' => "-naltc -atc -wtc=m",
+ 'altc2' => "-altc -atc -wtc=m",
+ 'btct1' => "-btct=1",
+ 'csc3' => "-csc -csci=2 -ncscb",
+ 'def' => "",
+ 'dltc1' => "-dtc -wtc=0",
+ 'dltc2' => "-dtc -wtc=0 -ndltc",
+ 'git143' => "-atc -wtc=h",
+ 'git146' => <<'----------',
+# testing three dash parameters
+---add-trailing-commas
+---unknown-future-option
+---wtc=h
+----------
+ 'git159' => "-bl -nsbl",
+ 'git162' => "-nwrs=A",
+ 'qwaf' => <<'----------',
+# git164
+-qwaf
+-sfp
+----------
+ };
+
+ ############################
+ # BEGIN SECTION 2: Sources #
+ ############################
+ $rsources = {
+
+ 'altc' => <<'----------',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version
+ }
+);
+----------
+
+ 'btct' => <<'----------',
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind( '<Page_Down>' => xx,);
+
+$w->bind(
+ '<Page_Down>' => xx
+);
+
+$w->bind( '<Page_Down>' => xx);
+
+$lut = byte [ [ 0, 0, 0 ], [ 10, 1, 10 ], [ 2, 20, 20 ], [ 30, 30, 3 ], ];
+----------
+
+ 'csc' => <<'----------',
+ sub message {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ }
+ else {
+ print( $_[0], "\n" );
+ }
+ } ## end sub message
+
+ my $message =sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ }
+ else {
+ print( $_[0], "\n" );
+ }
+ };
+----------
+
+ 'dltc' => <<'----------',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version,
+ },
+);
+----------
+
+ 'git143' => <<'----------',
+# include '=>' in comma count to allow adding trailing comma here
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ]
+ );
+----------
+
+ 'git146' => <<'----------',
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ]
+ );
+----------
+
+ 'git159' => <<'----------',
+sub example {
+ my $ex = 0;
+ if ($ex)
+ {
+ print "yay\n";
+ }
+}
+----------
+
+ 'git162' => <<'----------',
+if ( $x in : eq @some_strings ) {
+ say "x is one of the given strings";
+}
+
+match( $n : == ) {
+ case (1) { say "It's one" }
+ case (2) { say "It's two" }
+ case (3) { say "It's three" }
+ case (4), case (5) { say "It's four or five" }
+ case if ( $n < 10 ) { say "It's less than ten" }
+ default { say "It's something else" }
+}
+----------
+
+ 'logical_xor' => <<'----------',
+$x^^$y and say "One of x or y is true, but not both";
+----------
+
+ 'qwaf' => <<'----------',
+use Digest::MD5 qw( md5_hex );
+
+@fields = qw( $st_dev $st_ino $st_mode
+ $st_nlink $st_uid $st_gid
+ $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks
+);
+
+@hdr_colors = qw(
+ CadetBlue1
+ MediumPurple1
+ turquoise1
+ PaleTurquoise1
+ SlateBlue1
+ );
+
+# has blank line, so keep line breaks
+@hdr_colors = qw(
+
+ CadetBlue1
+ MediumPurple1
+ turquoise1
+ PaleTurquoise1
+ SlateBlue1
+ );
+
+@list = qw( \ );
+----------
+ };
+
+ ####################################
+ # BEGIN SECTION 3: Expected output #
+ ####################################
+ $rtests = {
+
+ 'git143.def' => {
+ source => "git143",
+ params => "def",
+ expect => <<'#1...........',
+ # include '=>' in comma count to allow adding trailing comma here
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ]
+ );
+#1...........
+ },
+
+ 'git143.git143' => {
+ source => "git143",
+ params => "git143",
+ expect => <<'#2...........',
+ # include '=>' in comma count to allow adding trailing comma here
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ],
+ );
+#2...........
+ },
+
+ 'git146.def' => {
+ source => "git146",
+ params => "def",
+ expect => <<'#3...........',
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ]
+ );
+#3...........
+ },
+
+ 'git146.git146' => {
+ source => "git146",
+ params => "git146",
+ expect => <<'#4...........',
+ my %strips = (
+ 1 => [
+ [ [ 1750, 150, ], [ 1850, 150, ], ],
+ [ [ 1950, 150, ], [ 2050, 150, ], ],
+ ],
+ );
+#4...........
+ },
+
+ 'altc.altc1' => {
+ source => "altc",
+ params => "altc1",
+ expect => <<'#5...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version,
+ }
+);
+#5...........
+ },
+
+ 'altc.altc2' => {
+ source => "altc",
+ params => "altc2",
+ expect => <<'#6...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version,
+ },
+);
+#6...........
+ },
+
+ 'altc.def' => {
+ source => "altc",
+ params => "def",
+ expect => <<'#7...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version
+ }
+);
+#7...........
+ },
+
+ 'dltc.def' => {
+ source => "dltc",
+ params => "def",
+ expect => <<'#8...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version,
+ },
+);
+#8...........
+ },
+
+ 'dltc.dltc1' => {
+ source => "dltc",
+ params => "dltc1",
+ expect => <<'#9...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version
+ }
+);
+#9...........
+ },
+
+ 'dltc.dltc2' => {
+ source => "dltc",
+ params => "dltc2",
+ expect => <<'#10...........',
+$self->make_grammar(
+ {
+ iterator => $self->_iterator,
+ parser => $self,
+ version => $self->version
+ },
+);
+#10...........
+ },
+
+ 'logical_xor.def' => {
+ source => "logical_xor",
+ params => "def",
+ expect => <<'#11...........',
+$x ^^ $y and say "One of x or y is true, but not both";
+#11...........
+ },
+
+ 'csc.csc3' => {
+ source => "csc",
+ params => "csc3",
+ expect => <<'#12...........',
+ sub message {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ } ## end if ( !defined( $_[0] ))
+ else {
+ print( $_[0], "\n" );
+ } ## end else [ if ( !defined( $_[0] ))
+ } ## end sub message
+
+ my $message = sub {
+ if ( !defined( $_[0] ) ) {
+ print("Hello, World\n");
+ } ## end if ( !defined( $_[0] ))
+ else {
+ print( $_[0], "\n" );
+ } ## end else [ if ( !defined( $_[0] ))
+ }; ## end $message = sub
+#12...........
+ },
+
+ 'git159.def' => {
+ source => "git159",
+ params => "def",
+ expect => <<'#13...........',
+sub example {
+ my $ex = 0;
+ if ($ex) {
+ print "yay\n";
+ }
+}
+#13...........
+ },
+
+ 'git159.git159' => {
+ source => "git159",
+ params => "git159",
+ expect => <<'#14...........',
+sub example {
+ my $ex = 0;
+ if ($ex)
+ {
+ print "yay\n";
+ }
+}
+#14...........
+ },
+
+ 'git162.def' => {
+ source => "git162",
+ params => "def",
+ expect => <<'#15...........',
+if ( $x in : eq @some_strings ) {
+ say "x is one of the given strings";
+}
+
+match( $n : == ) {
+ case (1) { say "It's one" }
+ case (2) { say "It's two" }
+ case (3) { say "It's three" }
+ case (4), case (5) { say "It's four or five" }
+ case if ( $n < 10 ) { say "It's less than ten" }
+ default { say "It's something else" }
+}
+#15...........
+ },
+
+ 'git162.git162' => {
+ source => "git162",
+ params => "git162",
+ expect => <<'#16...........',
+if ( $x in :eq @some_strings ) {
+ say "x is one of the given strings";
+}
+
+match( $n :== ) {
+ case (1) { say "It's one" }
+ case (2) { say "It's two" }
+ case (3) { say "It's three" }
+ case (4), case (5) { say "It's four or five" }
+ case if ( $n < 10 ) { say "It's less than ten" }
+ default { say "It's something else" }
+}
+#16...........
+ },
+
+ 'qwaf.def' => {
+ source => "qwaf",
+ params => "def",
+ expect => <<'#17...........',
+use Digest::MD5 qw( md5_hex );
+
+@fields = qw( $st_dev $st_ino $st_mode
+ $st_nlink $st_uid $st_gid
+ $st_rdev $st_size
+ $st_atime $st_mtime $st_ctime
+ $st_blksize $st_blocks
+);
+
+@hdr_colors = qw(
+ CadetBlue1
+ MediumPurple1
+ turquoise1
+ PaleTurquoise1
+ SlateBlue1
+);
+
+# has blank line, so keep line breaks
+@hdr_colors = qw(
+
+ CadetBlue1
+ MediumPurple1
+ turquoise1
+ PaleTurquoise1
+ SlateBlue1
+);
+
+@list = qw( \ );
+#17...........
+ },
+
+ 'qwaf.qwaf' => {
+ source => "qwaf",
+ params => "qwaf",
+ expect => <<'#18...........',
+use Digest::MD5 qw(md5_hex);
+
+@fields = qw(
+ $st_dev $st_ino $st_mode $st_nlink $st_uid $st_gid
+ $st_rdev $st_size $st_atime $st_mtime $st_ctime $st_blksize
+ $st_blocks
+);
+
+@hdr_colors =
+ qw( CadetBlue1 MediumPurple1 turquoise1 PaleTurquoise1 SlateBlue1 );
+
+# has blank line, so keep line breaks
+@hdr_colors = qw(
+
+ CadetBlue1
+ MediumPurple1
+ turquoise1
+ PaleTurquoise1
+ SlateBlue1
+);
+
+@list = qw( \ );
+#18...........
+ },
+
+ 'btct.btct1' => {
+ source => "btct",
+ params => "btct1",
+ expect => <<'#19...........',
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind( '<Page_Down>' => xx );
+
+$w->bind( '<Page_Down>' => xx );
+
+$lut = byte [
+ [ 0, 0, 0 ], [ 10, 1, 10 ], [ 2, 20, 20 ], [ 30, 30, 3 ],
+];
+#19...........
+ },
+ };
+
+ my $ntests = 0 + keys %{$rtests};
+ plan tests => $ntests;
+}
+
+###############
+# EXECUTE TESTS
+###############
+
+foreach my $key ( sort keys %{$rtests} ) {
+ my $output;
+ my $sname = $rtests->{$key}->{source};
+ my $expect = $rtests->{$key}->{expect};
+ my $pname = $rtests->{$key}->{params};
+ my $source = $rsources->{$sname};
+ my $params = defined($pname) ? $rparams->{$pname} : "";
+ my $stderr_string;
+ my $errorfile_string;
+ my $err = Perl::Tidy::perltidy(
+ source => \$source,
+ destination => \$output,
+ perltidyrc => \$params,
+ argv => '', # for safety; hide any ARGV from perltidy
+ stderr => \$stderr_string,
+ errorfile => \$errorfile_string, # not used when -se flag is set
+ );
+ if ( $err || $stderr_string || $errorfile_string ) {
+ print STDERR "Error output received for test '$key'\n";
+ if ($err) {
+ print STDERR "An error flag '$err' was returned\n";
+ ok( !$err );
+ }
+ if ($stderr_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<STDERR>>\n$stderr_string\n";
+ print STDERR "---------------------\n";
+ ok( !$stderr_string );
+ }
+ if ($errorfile_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<.ERR file>>\n$errorfile_string\n";
+ print STDERR "---------------------\n";
+ ok( !$errorfile_string );
+ }
+ }
+ else {
+ if ( !is( $output, $expect, $key ) ) {
+ my $leno = length($output);
+ my $lene = length($expect);
+ if ( $leno == $lene ) {
+ print STDERR
+"#> Test '$key' gave unexpected output. Strings differ but both have length $leno\n";
+ }
+ else {
+ print STDERR
+"#> Test '$key' gave unexpected output. String lengths differ: output=$leno, expected=$lene\n";
+ }
+ }
+ }
+}
--- /dev/null
+# Created with: ./make_t.pl
+
+# Contents:
+#1 btct.btct2
+#2 btct.btct3
+#3 btct.def
+#4 c424.c424
+#5 c424.def
+#6 ils.def
+#7 ils.ils
+#8 mutt.def
+#9 mutt.mutt1
+
+# To locate test #13 you can search for its name or the string '#13'
+
+use strict;
+use Test::More;
+use Carp;
+use Perl::Tidy;
+my $rparams;
+my $rsources;
+my $rtests;
+
+BEGIN {
+
+ ###########################################
+ # BEGIN SECTION 1: Parameter combinations #
+ ###########################################
+ $rparams = {
+ 'btct2' => "-btct=1 -atc -wtc=1",
+ 'btct3' => "-btct=1 -atc -wtc=1",
+ 'c424' => "-naws -qwaf",
+ 'def' => "",
+ 'ils' => "-nils -bos",
+ 'mutt1' => <<'----------',
+-mutt='q*'
+----------
+ };
+
+ ############################
+ # BEGIN SECTION 2: Sources #
+ ############################
+ $rsources = {
+
+ 'btct' => <<'----------',
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind( '<Page_Down>' => xx,);
+
+$w->bind(
+ '<Page_Down>' => xx
+);
+
+$w->bind( '<Page_Down>' => xx);
+
+$lut = byte [ [ 0, 0, 0 ], [ 10, 1, 10 ], [ 2, 20, 20 ], [ 30, 30, 3 ], ];
+----------
+
+ 'c424' => <<'----------',
+my @chars = qw( | / - \ | / - \ );
+my @chars = qw(| / - \ | / - \ );
+----------
+
+ 'ils' => <<'----------',
+$z = sqrt( $x**2 + $y**2 )
+;
+----------
+
+ 'mutt' => <<'----------',
+my $rlist = [qw(alpha beta gamma)];
+$aqx->appendChild(
+ $parser->parse_balanced_chunk(qq(<param name="skv">$skv</param>)) );
+----------
+ };
+
+ ####################################
+ # BEGIN SECTION 3: Expected output #
+ ####################################
+ $rtests = {
+
+ 'btct.btct2' => {
+ source => "btct",
+ params => "btct2",
+ expect => <<'#1...........',
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$lut = byte [
+ [
+ 0, 0, 0,
+ ], [
+ 10, 1, 10,
+ ], [
+ 2, 20, 20,
+ ], [
+ 30, 30, 3,
+ ],
+];
+#1...........
+ },
+
+ 'btct.btct3' => {
+ source => "btct",
+ params => "btct3",
+ expect => <<'#2...........',
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$w->bind(
+ '<Page_Down>' => xx,
+);
+
+$lut = byte [
+ [
+ 0, 0, 0,
+ ], [
+ 10, 1, 10,
+ ], [
+ 2, 20, 20,
+ ], [
+ 30, 30, 3,
+ ],
+];
+#2...........
+ },
+
+ 'btct.def' => {
+ source => "btct",
+ params => "def",
+ expect => <<'#3...........',
+$w->bind( '<Page_Down>' => xx, );
+
+$w->bind( '<Page_Down>' => xx, );
+
+$w->bind( '<Page_Down>' => xx );
+
+$w->bind( '<Page_Down>' => xx );
+
+$lut = byte [ [ 0, 0, 0 ], [ 10, 1, 10 ], [ 2, 20, 20 ], [ 30, 30, 3 ], ];
+#3...........
+ },
+
+ 'c424.c424' => {
+ source => "c424",
+ params => "c424",
+ expect => <<'#4...........',
+my @chars = qw( | / - \ | / - \ );
+my @chars = qw(| / - \ | / - \ );
+#4...........
+ },
+
+ 'c424.def' => {
+ source => "c424",
+ params => "def",
+ expect => <<'#5...........',
+my @chars = qw( | / - \ | / - \ );
+my @chars = qw(| / - \ | / - \ );
+#5...........
+ },
+
+ 'ils.def' => {
+ source => "ils",
+ params => "def",
+ expect => <<'#6...........',
+$z = sqrt( $x**2 + $y**2 );
+#6...........
+ },
+
+ 'ils.ils' => {
+ source => "ils",
+ params => "ils",
+ expect => <<'#7...........',
+$z = sqrt( $x**2 + $y**2 )
+;
+#7...........
+ },
+
+ 'mutt.def' => {
+ source => "mutt",
+ params => "def",
+ expect => <<'#8...........',
+my $rlist = [qw(alpha beta gamma)];
+$aqx->appendChild(
+ $parser->parse_balanced_chunk(qq(<param name="skv">$skv</param>)) );
+#8...........
+ },
+
+ 'mutt.mutt1' => {
+ source => "mutt",
+ params => "mutt1",
+ expect => <<'#9...........',
+my $rlist = [ qw(alpha beta gamma) ];
+$aqx->appendChild(
+ $parser->parse_balanced_chunk( qq(<param name="skv">$skv</param>) ) );
+#9...........
+ },
+ };
+
+ my $ntests = 0 + keys %{$rtests};
+ plan tests => $ntests;
+}
+
+###############
+# EXECUTE TESTS
+###############
+
+foreach my $key ( sort keys %{$rtests} ) {
+ my $output;
+ my $sname = $rtests->{$key}->{source};
+ my $expect = $rtests->{$key}->{expect};
+ my $pname = $rtests->{$key}->{params};
+ my $source = $rsources->{$sname};
+ my $params = defined($pname) ? $rparams->{$pname} : "";
+ my $stderr_string;
+ my $errorfile_string;
+ my $err = Perl::Tidy::perltidy(
+ source => \$source,
+ destination => \$output,
+ perltidyrc => \$params,
+ argv => '', # for safety; hide any ARGV from perltidy
+ stderr => \$stderr_string,
+ errorfile => \$errorfile_string, # not used when -se flag is set
+ );
+ if ( $err || $stderr_string || $errorfile_string ) {
+ print STDERR "Error output received for test '$key'\n";
+ if ($err) {
+ print STDERR "An error flag '$err' was returned\n";
+ ok( !$err );
+ }
+ if ($stderr_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<STDERR>>\n$stderr_string\n";
+ print STDERR "---------------------\n";
+ ok( !$stderr_string );
+ }
+ if ($errorfile_string) {
+ print STDERR "---------------------\n";
+ print STDERR "<<.ERR file>>\n$errorfile_string\n";
+ print STDERR "---------------------\n";
+ ok( !$errorfile_string );
+ }
+ }
+ else {
+ if ( !is( $output, $expect, $key ) ) {
+ my $leno = length($output);
+ my $lene = length($expect);
+ if ( $leno == $lene ) {
+ print STDERR
+"#> Test '$key' gave unexpected output. Strings differ but both have length $leno\n";
+ }
+ else {
+ print STDERR
+"#> Test '$key' gave unexpected output. String lengths differ: output=$leno, expected=$lene\n";
+ }
+ }
+ }
+}
source => "html1",
params => "html",
expect => <<'#14...........',
-<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by perltidy -->
<html xmlns="http://www.w3.org/1999/xhtml">
<!--
/* default style sheet generated by perltidy */
body {background: #FFFFFF; color: #000000}
-pre { color: #000000;
+pre { color: #000000;
background: #FFFFFF;
font-family: courier;
- }
+ }
.c { color: #228B22;} /* comment */
.cm { color: #000000;} /* comma */
expect => <<'#12...........',
my $xyz_shield = [
[ -0.060, -0.060, 0. ],
- [ 0.060, -0.060, 0. ],
- [ 0.060, 0.060, 0. ],
- [ -0.060, 0.060, 0. ],
+ [ 0.060, -0.060, 0. ],
+ [ 0.060, 0.060, 0. ],
+ [ -0.060, 0.060, 0. ],
[ -0.0925, -0.0925, 0.092 ],
- [ 0.0925, -0.0925, 0.092 ],
- [ 0.0925, 0.0925, 0.092 ],
- [ -0.0925, 0.0925, 0.092 ],
+ [ 0.0925, -0.0925, 0.092 ],
+ [ 0.0925, 0.0925, 0.092 ],
+ [ -0.0925, 0.0925, 0.092 ],
];
#12...........
},
when => time(),
message => 'abc'
};
-my $json2 = encode_json + {
+my $json2 = encode_json +{
when => time(),
message => 'abc'
};
# try to work around problem where sub Test::More::note does not exist
# in older versions of perl
- if ($] >= 5.010) {
+ if ($] > 5.010) {
note($msg);
}
return;
# work around problem where sub Test::More::note does not exist
# in older versions of perl
- if ($] >= 5.010) {
+ if ($] > 5.010) {
note($msg);
}
return;
--- /dev/null
+use strict;
+use Test;
+use Carp;
+BEGIN {plan tests => 1}
+use Perl::Tidy;
+
+#-----------------------------------------------------------
+# test formatting a single character '0' with no line ending
+#-----------------------------------------------------------
+my $source = '0';
+my $perltidyrc = <<'EOM';
+-noadd-terminal-newline
+EOM
+
+my $output;
+
+Perl::Tidy::perltidy(
+ source => \$source,
+ destination => \$output,
+ perltidyrc => \$perltidyrc,
+ argv => '',
+);
+
+my $expected_output='0';
+ok($output, $expected_output);