1 ############################################################
3 # perltidy - a perl script indenter and formatter
5 # Copyright (c) 2000-2007 by Steve Hancock
6 # Distributed under the GPL license agreement; see file COPYING
8 # This program is free software; you can redistribute it and/or modify
9 # it under the terms of the GNU General Public License as published by
10 # the Free Software Foundation; either version 2 of the License, or
11 # (at your option) any later version.
13 # This program is distributed in the hope that it will be useful,
14 # but WITHOUT ANY WARRANTY; without even the implied warranty of
15 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
16 # GNU General Public License for more details.
18 # You should have received a copy of the GNU General Public License
19 # along with this program; if not, write to the Free Software
20 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
22 # For brief instructions instructions, try 'perltidy -h'.
23 # For more complete documentation, try 'man perltidy'
24 # or visit http://perltidy.sourceforge.net
26 # This script is an example of the default style. It was formatted with:
31 # Michael Cartmell supplied code for adaptation to VMS and helped with
33 # Hugh S. Myers supplied sub streamhandle and the supporting code to
34 # create a Perl::Tidy module which can operate on strings, arrays, etc.
35 # Yves Orton supplied coding to help detect Windows versions.
36 # Axel Rose supplied a patch for MacPerl.
37 # Sebastien Aperghis-Tramoni supplied a patch for the defined or operator.
38 # Dan Tyrell sent a patch for binary I/O.
39 # Many others have supplied key ideas, suggestions, and bug reports;
40 # see the CHANGES file.
42 ############################################################
45 use 5.004; # need IO::File from 5.004 or later
46 BEGIN { $^W = 1; } # turn on warnings
60 @ISA = qw( Exporter );
61 @EXPORT = qw( &perltidy );
67 ( $VERSION = q($Id: Tidy.pm,v 1.64 2007/05/08 20:01:45 perltidy Exp $) ) =~ s/^.*\s+(\d+)\/(\d+)\/(\d+).*$/$1$2$3/; # all one line for MakeMaker
72 # given filename and mode (r or w), create an object which:
73 # has a 'getline' method if mode='r', and
74 # has a 'print' method if mode='w'.
75 # The objects also need a 'close' method.
77 # How the object is made:
79 # if $filename is: Make object using:
80 # ---------------- -----------------
81 # '-' (STDIN if mode = 'r', STDOUT if mode='w')
83 # ARRAY ref Perl::Tidy::IOScalarArray (formerly IO::ScalarArray)
84 # STRING ref Perl::Tidy::IOScalar (formerly IO::Scalar)
86 # (check for 'print' method for 'w' mode)
87 # (check for 'getline' method for 'r' mode)
88 my $ref = ref( my $filename = shift );
95 if ( $ref eq 'ARRAY' ) {
96 $New = sub { Perl::Tidy::IOScalarArray->new(@_) };
98 elsif ( $ref eq 'SCALAR' ) {
99 $New = sub { Perl::Tidy::IOScalar->new(@_) };
103 # Accept an object with a getline method for reading. Note:
104 # IO::File is built-in and does not respond to the defined
105 # operator. If this causes trouble, the check can be
106 # skipped and we can just let it crash if there is no
108 if ( $mode =~ /[rR]/ ) {
109 if ( $ref eq 'IO::File' || defined &{ $ref . "::getline" } ) {
110 $New = sub { $filename };
113 $New = sub { undef };
115 ------------------------------------------------------------------------
116 No 'getline' method is defined for object of class $ref
117 Please check your call to Perl::Tidy::perltidy. Trace follows.
118 ------------------------------------------------------------------------
123 # Accept an object with a print method for writing.
124 # See note above about IO::File
125 if ( $mode =~ /[wW]/ ) {
126 if ( $ref eq 'IO::File' || defined &{ $ref . "::print" } ) {
127 $New = sub { $filename };
130 $New = sub { undef };
132 ------------------------------------------------------------------------
133 No 'print' method is defined for object of class $ref
134 Please check your call to Perl::Tidy::perltidy. Trace follows.
135 ------------------------------------------------------------------------
144 if ( $filename eq '-' ) {
145 $New = sub { $mode eq 'w' ? *STDOUT : *STDIN }
148 $New = sub { IO::File->new(@_) };
151 $fh = $New->( $filename, $mode )
152 or warn "Couldn't open file:$filename in mode:$mode : $!\n";
153 return $fh, ( $ref or $filename );
156 sub find_input_line_ending {
158 # Peek at a file and return first line ending character.
159 # Quietly return undef in case of any trouble.
160 my ($input_file) = @_;
163 # silently ignore input from object or stdin
164 if ( ref($input_file) || $input_file eq '-' ) {
167 open( INFILE, $input_file ) || return $ending;
171 read( INFILE, $buf, 1024 );
173 if ( $buf && $buf =~ /([\012\015]+)/ ) {
177 if ( $test =~ /^(\015\012)+$/ ) { $ending = "\015\012" }
180 elsif ( $test =~ /^\015+$/ ) { $ending = "\015" }
183 elsif ( $test =~ /^\012+$/ ) { $ending = "\012" }
197 # concatenate a path and file basename
198 # returns undef in case of error
200 BEGIN { eval "require File::Spec"; $missing_file_spec = $@; }
202 # use File::Spec if we can
203 unless ($missing_file_spec) {
204 return File::Spec->catfile(@_);
207 # Perl 5.004 systems may not have File::Spec so we'll make
208 # a simple try. We assume File::Basename is available.
209 # return undef if not successful.
211 my $path = join '/', @_;
212 my $test_file = $path . $name;
213 my ( $test_name, $test_path ) = fileparse($test_file);
214 return $test_file if ( $test_name eq $name );
215 return undef if ( $^O eq 'VMS' );
217 # this should work at least for Windows and Unix:
218 $test_file = $path . '/' . $name;
219 ( $test_name, $test_path ) = fileparse($test_file);
220 return $test_file if ( $test_name eq $name );
224 sub make_temporary_filename {
226 # Make a temporary filename.
228 # The POSIX tmpnam() function tends to be unreliable for non-unix
229 # systems (at least for the win32 systems that I've tested), so use
230 # a pre-defined name. A slight disadvantage of this is that two
231 # perltidy runs in the same working directory may conflict.
232 # However, the chance of that is small and managable by the user.
233 # An alternative would be to check for the file's existance and use,
234 # say .TMP0, .TMP1, etc, but that scheme has its own problems. So,
236 my $name = "perltidy.TMP";
237 if ( $^O =~ /win32|dos/i || $^O eq 'VMS' || $^O eq 'MacOs' ) {
240 eval "use POSIX qw(tmpnam)";
241 if ($@) { return $name }
244 # just make a couple of tries before giving up and using the default
246 my $tmpname = tmpnam();
247 my $fh = IO::File->new( $tmpname, O_RDWR | O_CREAT | O_EXCL );
257 # Here is a map of the flow of data from the input source to the output
260 # LineSource-->Tokenizer-->Formatter-->VerticalAligner-->FileWriter-->
261 # input groups output
262 # lines tokens lines of lines lines
265 # The names correspond to the package names responsible for the unit processes.
267 # The overall process is controlled by the "main" package.
269 # LineSource is the stream of input lines
271 # Tokenizer analyzes a line and breaks it into tokens, peeking ahead
272 # if necessary. A token is any section of the input line which should be
273 # manipulated as a single entity during formatting. For example, a single
274 # ',' character is a token, and so is an entire side comment. It handles
275 # the complexities of Perl syntax, such as distinguishing between '<<' as
276 # a shift operator and as a here-document, or distinguishing between '/'
277 # as a divide symbol and as a pattern delimiter.
279 # Formatter inserts and deletes whitespace between tokens, and breaks
280 # sequences of tokens at appropriate points as output lines. It bases its
281 # decisions on the default rules as modified by any command-line options.
283 # VerticalAligner collects groups of lines together and tries to line up
284 # certain tokens, such as '=>', '#', and '=' by adding whitespace.
286 # FileWriter simply writes lines to the output stream.
288 # The Logger package, not shown, records significant events and warning
289 # messages. It writes a .LOG file, which may be saved with a
290 # '-log' or a '-g' flag.
294 # variables needed by interrupt handler:
298 # this routine may be called to give a status report if interrupted. If a
299 # parameter is given, it will call exit with that parameter. This is no
300 # longer used because it works under Unix but not under Windows.
301 sub interrupt_handler {
303 my $exit_flag = shift;
304 print STDERR "perltidy interrupted";
306 my $input_line_number =
307 Perl::Tidy::Tokenizer::get_input_line_number();
308 print STDERR " at line $input_line_number";
312 if ( ref $input_file ) { print STDERR " of reference to:" }
313 else { print STDERR " of file:" }
314 print STDERR " $input_file";
317 exit $exit_flag if defined($exit_flag);
324 destination => undef,
331 dump_options => undef,
332 dump_options_type => undef,
333 dump_getopt_flags => undef,
334 dump_options_category => undef,
335 dump_options_range => undef,
336 dump_abbreviations => undef,
339 # don't overwrite callers ARGV
344 if ( my @bad_keys = grep { !exists $defaults{$_} } keys %input_hash ) {
346 my @good_keys = sort keys %defaults;
347 @bad_keys = sort @bad_keys;
349 ------------------------------------------------------------------------
350 Unknown perltidy parameter : (@bad_keys)
351 perltidy only understands : (@good_keys)
352 ------------------------------------------------------------------------
357 my $get_hash_ref = sub {
359 my $hash_ref = $input_hash{$key};
360 if ( defined($hash_ref) ) {
361 unless ( ref($hash_ref) eq 'HASH' ) {
362 my $what = ref($hash_ref);
364 $what ? "but is ref to $what" : "but is not a reference";
366 ------------------------------------------------------------------------
367 error in call to perltidy:
368 -$key must be reference to HASH $but_is
369 ------------------------------------------------------------------------
376 %input_hash = ( %defaults, %input_hash );
377 my $argv = $input_hash{'argv'};
378 my $destination_stream = $input_hash{'destination'};
379 my $errorfile_stream = $input_hash{'errorfile'};
380 my $logfile_stream = $input_hash{'logfile'};
381 my $perltidyrc_stream = $input_hash{'perltidyrc'};
382 my $source_stream = $input_hash{'source'};
383 my $stderr_stream = $input_hash{'stderr'};
384 my $user_formatter = $input_hash{'formatter'};
386 # various dump parameters
387 my $dump_options_type = $input_hash{'dump_options_type'};
388 my $dump_options = $get_hash_ref->('dump_options');
389 my $dump_getopt_flags = $get_hash_ref->('dump_getopt_flags');
390 my $dump_options_category = $get_hash_ref->('dump_options_category');
391 my $dump_abbreviations = $get_hash_ref->('dump_abbreviations');
392 my $dump_options_range = $get_hash_ref->('dump_options_range');
394 # validate dump_options_type
395 if ( defined($dump_options) ) {
396 unless ( defined($dump_options_type) ) {
397 $dump_options_type = 'perltidyrc';
399 unless ( $dump_options_type =~ /^(perltidyrc|full)$/ ) {
401 ------------------------------------------------------------------------
402 Please check value of -dump_options_type in call to perltidy;
403 saw: '$dump_options_type'
404 expecting: 'perltidyrc' or 'full'
405 ------------------------------------------------------------------------
411 $dump_options_type = "";
414 if ($user_formatter) {
416 # if the user defines a formatter, there is no output stream,
417 # but we need a null stream to keep coding simple
418 $destination_stream = Perl::Tidy::DevNull->new();
421 # see if ARGV is overridden
422 if ( defined($argv) ) {
424 my $rargv = ref $argv;
425 if ( $rargv eq 'SCALAR' ) { $argv = $$argv; $rargv = undef }
429 if ( $rargv eq 'ARRAY' ) {
434 ------------------------------------------------------------------------
435 Please check value of -argv in call to perltidy;
436 it must be a string or ref to ARRAY but is: $rargv
437 ------------------------------------------------------------------------
444 my ( $rargv, $msg ) = parse_args($argv);
447 Error parsing this string passed to to perltidy with 'argv':
455 # redirect STDERR if requested
456 if ($stderr_stream) {
457 my ( $fh_stderr, $stderr_file ) =
458 Perl::Tidy::streamhandle( $stderr_stream, 'w' );
459 if ($fh_stderr) { *STDERR = $fh_stderr }
462 ------------------------------------------------------------------------
463 Unable to redirect STDERR to $stderr_stream
464 Please check value of -stderr in call to perltidy
465 ------------------------------------------------------------------------
470 my $rpending_complaint;
471 $$rpending_complaint = "";
472 my $rpending_logfile_message;
473 $$rpending_logfile_message = "";
475 my ( $is_Windows, $Windows_type ) =
476 look_for_Windows($rpending_complaint);
478 # VMS file names are restricted to a 40.40 format, so we append _tdy
479 # instead of .tdy, etc. (but see also sub check_vms_filename)
482 if ( $^O eq 'VMS' ) {
488 $dot_pattern = '\.'; # must escape for use in regex
491 # handle command line options
492 my ( $rOpts, $config_file, $rraw_options, $saw_extrude, $roption_string,
493 $rexpansion, $roption_category, $roption_range )
494 = process_command_line(
495 $perltidyrc_stream, $is_Windows, $Windows_type,
496 $rpending_complaint, $dump_options_type,
499 # return or exit immediately after all dumps
502 # Getopt parameters and their flags
503 if ( defined($dump_getopt_flags) ) {
505 foreach my $op ( @{$roption_string} ) {
514 if ( $opt =~ /(.*)(!|=.*|:.*)$/ ) {
518 $dump_getopt_flags->{$opt} = $flag;
522 if ( defined($dump_options_category) ) {
524 %{$dump_options_category} = %{$roption_category};
527 if ( defined($dump_options_range) ) {
529 %{$dump_options_range} = %{$roption_range};
532 if ( defined($dump_abbreviations) ) {
534 %{$dump_abbreviations} = %{$rexpansion};
537 if ( defined($dump_options) ) {
539 %{$dump_options} = %{$rOpts};
542 return if ($quit_now);
544 # dump from command line
545 if ( $rOpts->{'dump-options'} ) {
546 dump_options( $rOpts, $roption_string );
550 check_options( $rOpts, $is_Windows, $Windows_type,
551 $rpending_complaint );
553 if ($user_formatter) {
554 $rOpts->{'format'} = 'user';
557 # there must be one entry here for every possible format
558 my %default_file_extension = (
564 # be sure we have a valid output format
565 unless ( exists $default_file_extension{ $rOpts->{'format'} } ) {
566 my $formats = join ' ',
567 sort map { "'" . $_ . "'" } keys %default_file_extension;
568 my $fmt = $rOpts->{'format'};
569 die "-format='$fmt' but must be one of: $formats\n";
572 my $output_extension =
573 make_extension( $rOpts->{'output-file-extension'},
574 $default_file_extension{ $rOpts->{'format'} }, $dot );
576 my $backup_extension =
577 make_extension( $rOpts->{'backup-file-extension'}, 'bak', $dot );
579 my $html_toc_extension =
580 make_extension( $rOpts->{'html-toc-extension'}, 'toc', $dot );
582 my $html_src_extension =
583 make_extension( $rOpts->{'html-src-extension'}, 'src', $dot );
585 # check for -b option;
586 my $in_place_modify = $rOpts->{'backup-and-modify-in-place'}
587 && $rOpts->{'format'} eq 'tidy' # silently ignore unless beautify mode
588 && @ARGV > 0; # silently ignore if standard input;
589 # this allows -b to be in a .perltidyrc file
590 # without error messages when running from an editor
592 # turn off -b with warnings in case of conflicts with other options
593 if ($in_place_modify) {
594 if ( $rOpts->{'standard-output'} ) {
595 warn "Ignoring -b; you may not use -b and -st together\n";
596 $in_place_modify = 0;
598 if ($destination_stream) {
600 "Ignoring -b; you may not specify a destination array and -b together\n";
601 $in_place_modify = 0;
603 if ($source_stream) {
605 "Ignoring -b; you may not specify a source array and -b together\n";
606 $in_place_modify = 0;
608 if ( $rOpts->{'outfile'} ) {
609 warn "Ignoring -b; you may not use -b and -o together\n";
610 $in_place_modify = 0;
612 if ( defined( $rOpts->{'output-path'} ) ) {
613 warn "Ignoring -b; you may not use -b and -opath together\n";
614 $in_place_modify = 0;
618 Perl::Tidy::Formatter::check_options($rOpts);
619 if ( $rOpts->{'format'} eq 'html' ) {
620 Perl::Tidy::HtmlWriter->check_options($rOpts);
623 # make the pattern of file extensions that we shouldn't touch
624 my $forbidden_file_extensions = "(($dot_pattern)(LOG|DEBUG|ERR|TEE)";
625 if ($output_extension) {
626 my $ext = quotemeta($output_extension);
627 $forbidden_file_extensions .= "|$ext";
629 if ( $in_place_modify && $backup_extension ) {
630 my $ext = quotemeta($backup_extension);
631 $forbidden_file_extensions .= "|$ext";
633 $forbidden_file_extensions .= ')$';
635 # Create a diagnostics object if requested;
636 # This is only useful for code development
637 my $diagnostics_object = undef;
638 if ( $rOpts->{'DIAGNOSTICS'} ) {
639 $diagnostics_object = Perl::Tidy::Diagnostics->new();
642 # no filenames should be given if input is from an array
643 if ($source_stream) {
646 "You may not specify any filenames when a source array is given\n";
649 # we'll stuff the source array into ARGV
650 unshift( @ARGV, $source_stream );
652 # No special treatment for source stream which is a filename.
653 # This will enable checks for binary files and other bad stuff.
654 $source_stream = undef unless ref($source_stream);
657 # use stdin by default if no source array and no args
659 unshift( @ARGV, '-' ) unless @ARGV;
662 # loop to process all files in argument list
663 my $number_of_files = @ARGV;
664 my $formatter = undef;
666 while ( $input_file = shift @ARGV ) {
668 my $input_file_permissions;
670 #---------------------------------------------------------------
671 # determine the input file name
672 #---------------------------------------------------------------
673 if ($source_stream) {
674 $fileroot = "perltidy";
676 elsif ( $input_file eq '-' ) { # '-' indicates input from STDIN
677 $fileroot = "perltidy"; # root name to use for .ERR, .LOG, etc
678 $in_place_modify = 0;
681 $fileroot = $input_file;
682 unless ( -e $input_file ) {
684 # file doesn't exist - check for a file glob
685 if ( $input_file =~ /([\?\*\[\{])/ ) {
687 # Windows shell may not remove quotes, so do it
688 my $input_file = $input_file;
689 if ( $input_file =~ /^\'(.+)\'$/ ) { $input_file = $1 }
690 if ( $input_file =~ /^\"(.+)\"$/ ) { $input_file = $1 }
691 my $pattern = fileglob_to_re($input_file);
693 if ( !$@ && opendir( DIR, './' ) ) {
695 grep { /$pattern/ && !-d $_ } readdir(DIR);
698 unshift @ARGV, @files;
703 print "skipping file: '$input_file': no matches found\n";
707 unless ( -f $input_file ) {
708 print "skipping file: $input_file: not a regular file\n";
712 unless ( ( -T $input_file ) || $rOpts->{'force-read-binary'} ) {
714 "skipping file: $input_file: Non-text (override with -f)\n";
718 # we should have a valid filename now
719 $fileroot = $input_file;
720 $input_file_permissions = ( stat $input_file )[2] & 07777;
722 if ( $^O eq 'VMS' ) {
723 ( $fileroot, $dot ) = check_vms_filename($fileroot);
726 # add option to change path here
727 if ( defined( $rOpts->{'output-path'} ) ) {
729 my ( $base, $old_path ) = fileparse($fileroot);
730 my $new_path = $rOpts->{'output-path'};
731 unless ( -d $new_path ) {
732 unless ( mkdir $new_path, 0777 ) {
733 die "unable to create directory $new_path: $!\n";
736 my $path = $new_path;
737 $fileroot = catfile( $path, $base );
740 ------------------------------------------------------------------------
741 Problem combining $new_path and $base to make a filename; check -opath
742 ------------------------------------------------------------------------
748 # Skip files with same extension as the output files because
749 # this can lead to a messy situation with files like
750 # script.tdy.tdy.tdy ... or worse problems ... when you
751 # rerun perltidy over and over with wildcard input.
754 && ( $input_file =~ /$forbidden_file_extensions/o
755 || $input_file eq 'DIAGNOSTICS' )
758 print "skipping file: $input_file: wrong extension\n";
762 # the 'source_object' supplies a method to read the input file
764 Perl::Tidy::LineSource->new( $input_file, $rOpts,
765 $rpending_logfile_message );
766 next unless ($source_object);
768 # register this file name with the Diagnostics package
769 $diagnostics_object->set_input_file($input_file)
770 if $diagnostics_object;
772 #---------------------------------------------------------------
773 # determine the output file name
774 #---------------------------------------------------------------
775 my $output_file = undef;
776 my $actual_output_extension;
778 if ( $rOpts->{'outfile'} ) {
780 if ( $number_of_files <= 1 ) {
782 if ( $rOpts->{'standard-output'} ) {
783 die "You may not use -o and -st together\n";
785 elsif ($destination_stream) {
787 "You may not specify a destination array and -o together\n";
789 elsif ( defined( $rOpts->{'output-path'} ) ) {
790 die "You may not specify -o and -opath together\n";
792 elsif ( defined( $rOpts->{'output-file-extension'} ) ) {
793 die "You may not specify -o and -oext together\n";
795 $output_file = $rOpts->{outfile};
797 # make sure user gives a file name after -o
798 if ( $output_file =~ /^-/ ) {
799 die "You must specify a valid filename after -o\n";
802 # do not overwrite input file with -o
803 if ( defined($input_file_permissions)
804 && ( $output_file eq $input_file ) )
807 "Use 'perltidy -b $input_file' to modify in-place\n";
811 die "You may not use -o with more than one input file\n";
814 elsif ( $rOpts->{'standard-output'} ) {
815 if ($destination_stream) {
817 "You may not specify a destination array and -st together\n";
821 if ( $number_of_files <= 1 ) {
824 die "You may not use -st with more than one input file\n";
827 elsif ($destination_stream) {
828 $output_file = $destination_stream;
830 elsif ($source_stream) { # source but no destination goes to stdout
833 elsif ( $input_file eq '-' ) {
837 if ($in_place_modify) {
838 $output_file = IO::File->new_tmpfile()
839 or die "cannot open temp file for -b option: $!\n";
842 $actual_output_extension = $output_extension;
843 $output_file = $fileroot . $output_extension;
847 # the 'sink_object' knows how to write the output file
848 my $tee_file = $fileroot . $dot . "TEE";
850 my $line_separator = $rOpts->{'output-line-ending'};
851 if ( $rOpts->{'preserve-line-endings'} ) {
852 $line_separator = find_input_line_ending($input_file);
855 # Eventually all I/O may be done with binmode, but for now it is
856 # only done when a user requests a particular line separator
857 # through the -ple or -ole flags
859 if ( defined($line_separator) ) { $binmode = 1 }
860 else { $line_separator = "\n" }
863 Perl::Tidy::LineSink->new( $output_file, $tee_file,
864 $line_separator, $rOpts, $rpending_logfile_message, $binmode );
866 #---------------------------------------------------------------
867 # initialize the error logger
868 #---------------------------------------------------------------
869 my $warning_file = $fileroot . $dot . "ERR";
870 if ($errorfile_stream) { $warning_file = $errorfile_stream }
871 my $log_file = $fileroot . $dot . "LOG";
872 if ($logfile_stream) { $log_file = $logfile_stream }
875 Perl::Tidy::Logger->new( $rOpts, $log_file, $warning_file,
877 write_logfile_header(
878 $rOpts, $logger_object, $config_file,
879 $rraw_options, $Windows_type
881 if ($$rpending_logfile_message) {
882 $logger_object->write_logfile_entry($$rpending_logfile_message);
884 if ($$rpending_complaint) {
885 $logger_object->complain($$rpending_complaint);
888 #---------------------------------------------------------------
889 # initialize the debug object, if any
890 #---------------------------------------------------------------
891 my $debugger_object = undef;
892 if ( $rOpts->{DEBUG} ) {
894 Perl::Tidy::Debugger->new( $fileroot . $dot . "DEBUG" );
897 #---------------------------------------------------------------
898 # create a formatter for this file : html writer or pretty printer
899 #---------------------------------------------------------------
901 # we have to delete any old formatter because, for safety,
902 # the formatter will check to see that there is only one.
905 if ($user_formatter) {
906 $formatter = $user_formatter;
908 elsif ( $rOpts->{'format'} eq 'html' ) {
910 Perl::Tidy::HtmlWriter->new( $fileroot, $output_file,
911 $actual_output_extension, $html_toc_extension,
912 $html_src_extension );
914 elsif ( $rOpts->{'format'} eq 'tidy' ) {
915 $formatter = Perl::Tidy::Formatter->new(
916 logger_object => $logger_object,
917 diagnostics_object => $diagnostics_object,
918 sink_object => $sink_object,
922 die "I don't know how to do -format=$rOpts->{'format'}\n";
925 unless ($formatter) {
926 die "Unable to continue with $rOpts->{'format'} formatting\n";
929 #---------------------------------------------------------------
930 # create the tokenizer for this file
931 #---------------------------------------------------------------
932 $tokenizer = undef; # must destroy old tokenizer
933 $tokenizer = Perl::Tidy::Tokenizer->new(
934 source_object => $source_object,
935 logger_object => $logger_object,
936 debugger_object => $debugger_object,
937 diagnostics_object => $diagnostics_object,
938 starting_level => $rOpts->{'starting-indentation-level'},
939 tabs => $rOpts->{'tabs'},
940 indent_columns => $rOpts->{'indent-columns'},
941 look_for_hash_bang => $rOpts->{'look-for-hash-bang'},
942 look_for_autoloader => $rOpts->{'look-for-autoloader'},
943 look_for_selfloader => $rOpts->{'look-for-selfloader'},
944 trim_qw => $rOpts->{'trim-qw'},
947 #---------------------------------------------------------------
949 #---------------------------------------------------------------
950 process_this_file( $tokenizer, $formatter );
952 #---------------------------------------------------------------
953 # close the input source and report errors
954 #---------------------------------------------------------------
955 $source_object->close_input_file();
957 # get file names to use for syntax check
958 my $ifname = $source_object->get_input_file_copy_name();
959 my $ofname = $sink_object->get_output_file_copy();
961 #---------------------------------------------------------------
962 # handle the -b option (backup and modify in-place)
963 #---------------------------------------------------------------
964 if ($in_place_modify) {
965 unless ( -f $input_file ) {
967 # oh, oh, no real file to backup ..
968 # shouldn't happen because of numerous preliminary checks
970 "problem with -b backing up input file '$input_file': not a file\n";
972 my $backup_name = $input_file . $backup_extension;
973 if ( -f $backup_name ) {
976 "unable to remove previous '$backup_name' for -b option; check permissions: $!\n";
978 rename( $input_file, $backup_name )
980 "problem renaming $input_file to $backup_name for -b option: $!\n";
981 $ifname = $backup_name;
983 seek( $output_file, 0, 0 )
984 or die "unable to rewind tmp file for -b option: $!\n";
986 my $fout = IO::File->new("> $input_file")
988 "problem opening $input_file for write for -b option; check directory permissions: $!\n";
991 while ( $line = $output_file->getline() ) {
995 $output_file = $input_file;
996 $ofname = $input_file;
999 #---------------------------------------------------------------
1000 # clean up and report errors
1001 #---------------------------------------------------------------
1002 $sink_object->close_output_file() if $sink_object;
1003 $debugger_object->close_debug_file() if $debugger_object;
1005 my $infile_syntax_ok = 0; # -1 no 0=don't know 1 yes
1008 if ($input_file_permissions) {
1010 # give output script same permissions as input script, but
1011 # make it user-writable or else we can't run perltidy again.
1012 # Thus we retain whatever executable flags were set.
1013 if ( $rOpts->{'format'} eq 'tidy' ) {
1014 chmod( $input_file_permissions | 0600, $output_file );
1017 # else use default permissions for html and any other format
1020 if ( $logger_object && $rOpts->{'check-syntax'} ) {
1022 check_syntax( $ifname, $ofname, $logger_object, $rOpts );
1026 $logger_object->finish( $infile_syntax_ok, $formatter )
1028 } # end of loop to process all files
1029 } # end of main program
1032 sub fileglob_to_re {
1034 # modified (corrected) from version in find2perl
1036 $x =~ s#([./^\$()])#\\$1#g; # escape special characters
1037 $x =~ s#\*#.*#g; # '*' -> '.*'
1038 $x =~ s#\?#.#g; # '?' -> '.'
1039 "^$x\\z"; # match whole word
1042 sub make_extension {
1044 # Make a file extension, including any leading '.' if necessary
1045 # The '.' may actually be an '_' under VMS
1046 my ( $extension, $default, $dot ) = @_;
1048 # Use the default if none specified
1049 $extension = $default unless ($extension);
1051 # Only extensions with these leading characters get a '.'
1052 # This rule gives the user some freedom
1053 if ( $extension =~ /^[a-zA-Z0-9]/ ) {
1054 $extension = $dot . $extension;
1059 sub write_logfile_header {
1060 my ( $rOpts, $logger_object, $config_file, $rraw_options, $Windows_type ) =
1062 $logger_object->write_logfile_entry(
1063 "perltidy version $VERSION log file on a $^O system, OLD_PERL_VERSION=$]\n"
1065 if ($Windows_type) {
1066 $logger_object->write_logfile_entry("Windows type is $Windows_type\n");
1068 my $options_string = join( ' ', @$rraw_options );
1071 $logger_object->write_logfile_entry(
1072 "Found Configuration File >>> $config_file \n");
1074 $logger_object->write_logfile_entry(
1075 "Configuration and command line parameters for this run:\n");
1076 $logger_object->write_logfile_entry("$options_string\n");
1078 if ( $rOpts->{'DEBUG'} || $rOpts->{'show-options'} ) {
1079 $rOpts->{'logfile'} = 1; # force logfile to be saved
1080 $logger_object->write_logfile_entry(
1081 "Final parameter set for this run\n");
1082 $logger_object->write_logfile_entry(
1083 "------------------------------------\n");
1085 foreach ( keys %{$rOpts} ) {
1086 $logger_object->write_logfile_entry( '--' . "$_=$rOpts->{$_}\n" );
1088 $logger_object->write_logfile_entry(
1089 "------------------------------------\n");
1091 $logger_object->write_logfile_entry(
1092 "To find error messages search for 'WARNING' with your editor\n");
1095 sub generate_options {
1097 ######################################################################
1098 # Generate and return references to:
1099 # @option_string - the list of options to be passed to Getopt::Long
1100 # @defaults - the list of default options
1101 # %expansion - a hash showing how all abbreviations are expanded
1102 # %category - a hash giving the general category of each option
1103 # %option_range - a hash giving the valid ranges of certain options
1105 # Note: a few options are not documented in the man page and usage
1106 # message. This is because these are experimental or debug options and
1107 # may or may not be retained in future versions.
1109 # Here are the undocumented flags as far as I know. Any of them
1110 # may disappear at any time. They are mainly for fine-tuning
1113 # fll --> fuzzy-line-length # a trivial parameter which gets
1114 # turned off for the extrude option
1115 # which is mainly for debugging
1116 # chk --> check-multiline-quotes # check for old bug; to be deleted
1117 # scl --> short-concatenation-item-length # helps break at '.'
1118 # recombine # for debugging line breaks
1119 # valign # for debugging vertical alignment
1120 # I --> DIAGNOSTICS # for debugging
1121 ######################################################################
1123 # here is a summary of the Getopt codes:
1124 # <none> does not take an argument
1125 # =s takes a mandatory string
1126 # :s takes an optional string (DO NOT USE - filenames will get eaten up)
1127 # =i takes a mandatory integer
1128 # :i takes an optional integer (NOT RECOMMENDED - can cause trouble)
1129 # ! does not take an argument and may be negated
1130 # i.e., -foo and -nofoo are allowed
1131 # a double dash signals the end of the options list
1133 #---------------------------------------------------------------
1134 # Define the option string passed to GetOptions.
1135 #---------------------------------------------------------------
1137 my @option_string = ();
1139 my %option_category = ();
1140 my %option_range = ();
1141 my $rexpansion = \%expansion;
1143 # names of categories in manual
1144 # leading integers will allow sorting
1145 my @category_name = (
1147 '1. Basic formatting options',
1148 '2. Code indentation control',
1149 '3. Whitespace control',
1150 '4. Comment controls',
1151 '5. Linebreak controls',
1152 '6. Controlling list formatting',
1153 '7. Retaining or ignoring existing line breaks',
1154 '8. Blank line control',
1155 '9. Other controls',
1157 '11. pod2html options',
1158 '12. Controlling HTML properties',
1162 # These options are parsed directly by perltidy:
1165 # However, they are included in the option set so that they will
1166 # be seen in the options dump.
1168 # These long option names have no abbreviations or are treated specially
1169 @option_string = qw(
1178 my $category = 13; # Debugging
1179 foreach (@option_string) {
1180 my $opt = $_; # must avoid changing the actual flag
1182 $option_category{$opt} = $category_name[$category];
1185 $category = 11; # HTML
1186 $option_category{html} = $category_name[$category];
1188 # routine to install and check options
1189 my $add_option = sub {
1190 my ( $long_name, $short_name, $flag ) = @_;
1191 push @option_string, $long_name . $flag;
1192 $option_category{$long_name} = $category_name[$category];
1194 if ( $expansion{$short_name} ) {
1195 my $existing_name = $expansion{$short_name}[0];
1197 "redefining abbreviation $short_name for $long_name; already used for $existing_name\n";
1199 $expansion{$short_name} = [$long_name];
1200 if ( $flag eq '!' ) {
1201 my $nshort_name = 'n' . $short_name;
1202 my $nolong_name = 'no' . $long_name;
1203 if ( $expansion{$nshort_name} ) {
1204 my $existing_name = $expansion{$nshort_name}[0];
1206 "attempting to redefine abbreviation $nshort_name for $nolong_name; already used for $existing_name\n";
1208 $expansion{$nshort_name} = [$nolong_name];
1213 # Install long option names which have a simple abbreviation.
1214 # Options with code '!' get standard negation ('no' for long names,
1215 # 'n' for abbreviations). Categories follow the manual.
1217 ###########################
1218 $category = 0; # I/O_Control
1219 ###########################
1220 $add_option->( 'backup-and-modify-in-place', 'b', '!' );
1221 $add_option->( 'backup-file-extension', 'bext', '=s' );
1222 $add_option->( 'force-read-binary', 'f', '!' );
1223 $add_option->( 'format', 'fmt', '=s' );
1224 $add_option->( 'logfile', 'log', '!' );
1225 $add_option->( 'logfile-gap', 'g', ':i' );
1226 $add_option->( 'outfile', 'o', '=s' );
1227 $add_option->( 'output-file-extension', 'oext', '=s' );
1228 $add_option->( 'output-path', 'opath', '=s' );
1229 $add_option->( 'profile', 'pro', '=s' );
1230 $add_option->( 'quiet', 'q', '!' );
1231 $add_option->( 'standard-error-output', 'se', '!' );
1232 $add_option->( 'standard-output', 'st', '!' );
1233 $add_option->( 'warning-output', 'w', '!' );
1235 # options which are both toggle switches and values moved here
1236 # to hide from tidyview (which does not show category 0 flags):
1237 # -ole moved here from category 1
1238 # -sil moved here from category 2
1239 $add_option->( 'output-line-ending', 'ole', '=s' );
1240 $add_option->( 'starting-indentation-level', 'sil', '=i' );
1242 ########################################
1243 $category = 1; # Basic formatting options
1244 ########################################
1245 $add_option->( 'check-syntax', 'syn', '!' );
1246 $add_option->( 'entab-leading-whitespace', 'et', '=i' );
1247 $add_option->( 'indent-columns', 'i', '=i' );
1248 $add_option->( 'maximum-line-length', 'l', '=i' );
1249 $add_option->( 'perl-syntax-check-flags', 'pscf', '=s' );
1250 $add_option->( 'preserve-line-endings', 'ple', '!' );
1251 $add_option->( 'tabs', 't', '!' );
1253 ########################################
1254 $category = 2; # Code indentation control
1255 ########################################
1256 $add_option->( 'continuation-indentation', 'ci', '=i' );
1257 $add_option->( 'line-up-parentheses', 'lp', '!' );
1258 $add_option->( 'outdent-keyword-list', 'okwl', '=s' );
1259 $add_option->( 'outdent-keywords', 'okw', '!' );
1260 $add_option->( 'outdent-labels', 'ola', '!' );
1261 $add_option->( 'outdent-long-quotes', 'olq', '!' );
1262 $add_option->( 'indent-closing-brace', 'icb', '!' );
1263 $add_option->( 'closing-token-indentation', 'cti', '=i' );
1264 $add_option->( 'closing-paren-indentation', 'cpi', '=i' );
1265 $add_option->( 'closing-brace-indentation', 'cbi', '=i' );
1266 $add_option->( 'closing-square-bracket-indentation', 'csbi', '=i' );
1267 $add_option->( 'brace-left-and-indent', 'bli', '!' );
1268 $add_option->( 'brace-left-and-indent-list', 'blil', '=s' );
1270 ########################################
1271 $category = 3; # Whitespace control
1272 ########################################
1273 $add_option->( 'add-semicolons', 'asc', '!' );
1274 $add_option->( 'add-whitespace', 'aws', '!' );
1275 $add_option->( 'block-brace-tightness', 'bbt', '=i' );
1276 $add_option->( 'brace-tightness', 'bt', '=i' );
1277 $add_option->( 'delete-old-whitespace', 'dws', '!' );
1278 $add_option->( 'delete-semicolons', 'dsm', '!' );
1279 $add_option->( 'nospace-after-keyword', 'nsak', '=s' );
1280 $add_option->( 'nowant-left-space', 'nwls', '=s' );
1281 $add_option->( 'nowant-right-space', 'nwrs', '=s' );
1282 $add_option->( 'paren-tightness', 'pt', '=i' );
1283 $add_option->( 'space-after-keyword', 'sak', '=s' );
1284 $add_option->( 'space-for-semicolon', 'sfs', '!' );
1285 $add_option->( 'space-function-paren', 'sfp', '!' );
1286 $add_option->( 'space-keyword-paren', 'skp', '!' );
1287 $add_option->( 'space-terminal-semicolon', 'sts', '!' );
1288 $add_option->( 'square-bracket-tightness', 'sbt', '=i' );
1289 $add_option->( 'square-bracket-vertical-tightness', 'sbvt', '=i' );
1290 $add_option->( 'square-bracket-vertical-tightness-closing', 'sbvtc', '=i' );
1291 $add_option->( 'trim-qw', 'tqw', '!' );
1292 $add_option->( 'want-left-space', 'wls', '=s' );
1293 $add_option->( 'want-right-space', 'wrs', '=s' );
1295 ########################################
1296 $category = 4; # Comment controls
1297 ########################################
1298 $add_option->( 'closing-side-comment-else-flag', 'csce', '=i' );
1299 $add_option->( 'closing-side-comment-interval', 'csci', '=i' );
1300 $add_option->( 'closing-side-comment-list', 'cscl', '=s' );
1301 $add_option->( 'closing-side-comment-maximum-text', 'csct', '=i' );
1302 $add_option->( 'closing-side-comment-prefix', 'cscp', '=s' );
1303 $add_option->( 'closing-side-comment-warnings', 'cscw', '!' );
1304 $add_option->( 'closing-side-comments', 'csc', '!' );
1305 $add_option->( 'format-skipping', 'fs', '!' );
1306 $add_option->( 'format-skipping-begin', 'fsb', '=s' );
1307 $add_option->( 'format-skipping-end', 'fse', '=s' );
1308 $add_option->( 'hanging-side-comments', 'hsc', '!' );
1309 $add_option->( 'indent-block-comments', 'ibc', '!' );
1310 $add_option->( 'indent-spaced-block-comments', 'isbc', '!' );
1311 $add_option->( 'minimum-space-to-comment', 'msc', '=i' );
1312 $add_option->( 'outdent-long-comments', 'olc', '!' );
1313 $add_option->( 'outdent-static-block-comments', 'osbc', '!' );
1314 $add_option->( 'static-block-comment-prefix', 'sbcp', '=s' );
1315 $add_option->( 'static-block-comments', 'sbc', '!' );
1316 $add_option->( 'static-side-comment-prefix', 'sscp', '=s' );
1317 $add_option->( 'static-side-comments', 'ssc', '!' );
1319 ########################################
1320 $category = 5; # Linebreak controls
1321 ########################################
1322 $add_option->( 'add-newlines', 'anl', '!' );
1323 $add_option->( 'block-brace-vertical-tightness', 'bbvt', '=i' );
1324 $add_option->( 'block-brace-vertical-tightness-list', 'bbvtl', '=s' );
1325 $add_option->( 'brace-vertical-tightness', 'bvt', '=i' );
1326 $add_option->( 'brace-vertical-tightness-closing', 'bvtc', '=i' );
1327 $add_option->( 'cuddled-else', 'ce', '!' );
1328 $add_option->( 'delete-old-newlines', 'dnl', '!' );
1329 $add_option->( 'opening-brace-always-on-right', 'bar', '!' );
1330 $add_option->( 'opening-brace-on-new-line', 'bl', '!' );
1331 $add_option->( 'opening-hash-brace-right', 'ohbr', '!' );
1332 $add_option->( 'opening-paren-right', 'opr', '!' );
1333 $add_option->( 'opening-square-bracket-right', 'osbr', '!' );
1334 $add_option->( 'opening-sub-brace-on-new-line', 'sbl', '!' );
1335 $add_option->( 'paren-vertical-tightness', 'pvt', '=i' );
1336 $add_option->( 'paren-vertical-tightness-closing', 'pvtc', '=i' );
1337 $add_option->( 'stack-closing-hash-brace', 'schb', '!' );
1338 $add_option->( 'stack-closing-paren', 'scp', '!' );
1339 $add_option->( 'stack-closing-square-bracket', 'scsb', '!' );
1340 $add_option->( 'stack-opening-hash-brace', 'sohb', '!' );
1341 $add_option->( 'stack-opening-paren', 'sop', '!' );
1342 $add_option->( 'stack-opening-square-bracket', 'sosb', '!' );
1343 $add_option->( 'vertical-tightness', 'vt', '=i' );
1344 $add_option->( 'vertical-tightness-closing', 'vtc', '=i' );
1345 $add_option->( 'want-break-after', 'wba', '=s' );
1346 $add_option->( 'want-break-before', 'wbb', '=s' );
1348 ########################################
1349 $category = 6; # Controlling list formatting
1350 ########################################
1351 $add_option->( 'break-at-old-comma-breakpoints', 'boc', '!' );
1352 $add_option->( 'comma-arrow-breakpoints', 'cab', '=i' );
1353 $add_option->( 'maximum-fields-per-table', 'mft', '=i' );
1355 ########################################
1356 $category = 7; # Retaining or ignoring existing line breaks
1357 ########################################
1358 $add_option->( 'break-at-old-keyword-breakpoints', 'bok', '!' );
1359 $add_option->( 'break-at-old-logical-breakpoints', 'bol', '!' );
1360 $add_option->( 'break-at-old-ternary-breakpoints', 'bot', '!' );
1361 $add_option->( 'ignore-old-breakpoints', 'iob', '!' );
1363 ########################################
1364 $category = 8; # Blank line control
1365 ########################################
1366 $add_option->( 'blanks-before-blocks', 'bbb', '!' );
1367 $add_option->( 'blanks-before-comments', 'bbc', '!' );
1368 $add_option->( 'blanks-before-subs', 'bbs', '!' );
1369 $add_option->( 'long-block-line-count', 'lbl', '=i' );
1370 $add_option->( 'maximum-consecutive-blank-lines', 'mbl', '=i' );
1371 $add_option->( 'swallow-optional-blank-lines', 'sob', '!' );
1373 ########################################
1374 $category = 9; # Other controls
1375 ########################################
1376 $add_option->( 'delete-block-comments', 'dbc', '!' );
1377 $add_option->( 'delete-closing-side-comments', 'dcsc', '!' );
1378 $add_option->( 'delete-pod', 'dp', '!' );
1379 $add_option->( 'delete-side-comments', 'dsc', '!' );
1380 $add_option->( 'tee-block-comments', 'tbc', '!' );
1381 $add_option->( 'tee-pod', 'tp', '!' );
1382 $add_option->( 'tee-side-comments', 'tsc', '!' );
1383 $add_option->( 'look-for-autoloader', 'lal', '!' );
1384 $add_option->( 'look-for-hash-bang', 'x', '!' );
1385 $add_option->( 'look-for-selfloader', 'lsl', '!' );
1386 $add_option->( 'pass-version-line', 'pvl', '!' );
1388 ########################################
1389 $category = 13; # Debugging
1390 ########################################
1391 $add_option->( 'DEBUG', 'D', '!' );
1392 $add_option->( 'DIAGNOSTICS', 'I', '!' );
1393 $add_option->( 'check-multiline-quotes', 'chk', '!' );
1394 $add_option->( 'dump-defaults', 'ddf', '!' );
1395 $add_option->( 'dump-long-names', 'dln', '!' );
1396 $add_option->( 'dump-options', 'dop', '!' );
1397 $add_option->( 'dump-profile', 'dpro', '!' );
1398 $add_option->( 'dump-short-names', 'dsn', '!' );
1399 $add_option->( 'dump-token-types', 'dtt', '!' );
1400 $add_option->( 'dump-want-left-space', 'dwls', '!' );
1401 $add_option->( 'dump-want-right-space', 'dwrs', '!' );
1402 $add_option->( 'fuzzy-line-length', 'fll', '!' );
1403 $add_option->( 'help', 'h', '' );
1404 $add_option->( 'short-concatenation-item-length', 'scl', '=i' );
1405 $add_option->( 'show-options', 'opt', '!' );
1406 $add_option->( 'version', 'v', '' );
1408 #---------------------------------------------------------------------
1410 # The Perl::Tidy::HtmlWriter will add its own options to the string
1411 Perl::Tidy::HtmlWriter->make_getopt_long_names( \@option_string );
1413 ########################################
1414 # Set categories 10, 11, 12
1415 ########################################
1416 # Based on their known order
1417 $category = 12; # HTML properties
1418 foreach my $opt (@option_string) {
1419 my $long_name = $opt;
1420 $long_name =~ s/(!|=.*|:.*)$//;
1421 unless ( defined( $option_category{$long_name} ) ) {
1422 if ( $long_name =~ /^html-linked/ ) {
1423 $category = 10; # HTML options
1425 elsif ( $long_name =~ /^pod2html/ ) {
1426 $category = 11; # Pod2html
1428 $option_category{$long_name} = $category_name[$category];
1432 #---------------------------------------------------------------
1433 # Assign valid ranges to certain options
1434 #---------------------------------------------------------------
1435 # In the future, these may be used to make preliminary checks
1436 # hash keys are long names
1437 # If key or value is undefined:
1438 # strings may have any value
1439 # integer ranges are >=0
1440 # If value is defined:
1441 # value is [qw(any valid words)] for strings
1442 # value is [min, max] for integers
1443 # if min is undefined, there is no lower limit
1444 # if max is undefined, there is no upper limit
1445 # Parameters not listed here have defaults
1447 'format' => [ 'tidy', 'html', 'user' ],
1448 'output-line-ending' => [ 'dos', 'win', 'mac', 'unix' ],
1450 'block-brace-tightness' => [ 0, 2 ],
1451 'brace-tightness' => [ 0, 2 ],
1452 'paren-tightness' => [ 0, 2 ],
1453 'square-bracket-tightness' => [ 0, 2 ],
1455 'block-brace-vertical-tightness' => [ 0, 2 ],
1456 'brace-vertical-tightness' => [ 0, 2 ],
1457 'brace-vertical-tightness-closing' => [ 0, 2 ],
1458 'paren-vertical-tightness' => [ 0, 2 ],
1459 'paren-vertical-tightness-closing' => [ 0, 2 ],
1460 'square-bracket-vertical-tightness' => [ 0, 2 ],
1461 'square-bracket-vertical-tightness-closing' => [ 0, 2 ],
1462 'vertical-tightness' => [ 0, 2 ],
1463 'vertical-tightness-closing' => [ 0, 2 ],
1465 'closing-brace-indentation' => [ 0, 3 ],
1466 'closing-paren-indentation' => [ 0, 3 ],
1467 'closing-square-bracket-indentation' => [ 0, 3 ],
1468 'closing-token-indentation' => [ 0, 3 ],
1470 'closing-side-comment-else-flag' => [ 0, 2 ],
1471 'comma-arrow-breakpoints' => [ 0, 3 ],
1474 # Note: we could actually allow negative ci if someone really wants it:
1475 # $option_range{'continuation-indentation'} = [ undef, undef ];
1477 #---------------------------------------------------------------
1478 # Assign default values to the above options here, except
1479 # for 'outfile' and 'help'.
1480 # These settings should approximate the perlstyle(1) suggestions.
1481 #---------------------------------------------------------------
1486 blanks-before-blocks
1487 blanks-before-comments
1489 block-brace-tightness=0
1490 block-brace-vertical-tightness=0
1492 brace-vertical-tightness-closing=0
1493 brace-vertical-tightness=0
1494 break-at-old-logical-breakpoints
1495 break-at-old-ternary-breakpoints
1496 break-at-old-keyword-breakpoints
1497 comma-arrow-breakpoints=1
1499 closing-side-comment-interval=6
1500 closing-side-comment-maximum-text=20
1501 closing-side-comment-else-flag=0
1502 closing-paren-indentation=0
1503 closing-brace-indentation=0
1504 closing-square-bracket-indentation=0
1505 continuation-indentation=2
1509 hanging-side-comments
1510 indent-block-comments
1512 long-block-line-count=8
1515 maximum-consecutive-blank-lines=1
1516 maximum-fields-per-table=0
1517 maximum-line-length=80
1518 minimum-space-to-comment=4
1519 nobrace-left-and-indent
1521 nodelete-old-whitespace
1526 nostatic-side-comments
1527 noswallow-optional-blank-lines
1532 outdent-long-comments
1534 paren-vertical-tightness-closing=0
1535 paren-vertical-tightness=0
1539 short-concatenation-item-length=8
1541 square-bracket-tightness=1
1542 square-bracket-vertical-tightness-closing=0
1543 square-bracket-vertical-tightness=0
1544 static-block-comments
1547 backup-file-extension=bak
1551 html-table-of-contents
1555 push @defaults, "perl-syntax-check-flags=-c -T";
1557 #---------------------------------------------------------------
1558 # Define abbreviations which will be expanded into the above primitives.
1559 # These may be defined recursively.
1560 #---------------------------------------------------------------
1563 'freeze-newlines' => [qw(noadd-newlines nodelete-old-newlines)],
1564 'fnl' => [qw(freeze-newlines)],
1565 'freeze-whitespace' => [qw(noadd-whitespace nodelete-old-whitespace)],
1566 'fws' => [qw(freeze-whitespace)],
1567 'indent-only' => [qw(freeze-newlines freeze-whitespace)],
1568 'outdent-long-lines' => [qw(outdent-long-quotes outdent-long-comments)],
1569 'nooutdent-long-lines' =>
1570 [qw(nooutdent-long-quotes nooutdent-long-comments)],
1571 'noll' => [qw(nooutdent-long-lines)],
1572 'io' => [qw(indent-only)],
1573 'delete-all-comments' =>
1574 [qw(delete-block-comments delete-side-comments delete-pod)],
1575 'nodelete-all-comments' =>
1576 [qw(nodelete-block-comments nodelete-side-comments nodelete-pod)],
1577 'dac' => [qw(delete-all-comments)],
1578 'ndac' => [qw(nodelete-all-comments)],
1579 'gnu' => [qw(gnu-style)],
1580 'pbp' => [qw(perl-best-practices)],
1581 'tee-all-comments' =>
1582 [qw(tee-block-comments tee-side-comments tee-pod)],
1583 'notee-all-comments' =>
1584 [qw(notee-block-comments notee-side-comments notee-pod)],
1585 'tac' => [qw(tee-all-comments)],
1586 'ntac' => [qw(notee-all-comments)],
1587 'html' => [qw(format=html)],
1588 'nhtml' => [qw(format=tidy)],
1589 'tidy' => [qw(format=tidy)],
1591 'break-after-comma-arrows' => [qw(cab=0)],
1592 'nobreak-after-comma-arrows' => [qw(cab=1)],
1593 'baa' => [qw(cab=0)],
1594 'nbaa' => [qw(cab=1)],
1596 'break-at-old-trinary-breakpoints' => [qw(bot)],
1598 'cti=0' => [qw(cpi=0 cbi=0 csbi=0)],
1599 'cti=1' => [qw(cpi=1 cbi=1 csbi=1)],
1600 'cti=2' => [qw(cpi=2 cbi=2 csbi=2)],
1601 'icp' => [qw(cpi=2 cbi=2 csbi=2)],
1602 'nicp' => [qw(cpi=0 cbi=0 csbi=0)],
1604 'closing-token-indentation=0' => [qw(cpi=0 cbi=0 csbi=0)],
1605 'closing-token-indentation=1' => [qw(cpi=1 cbi=1 csbi=1)],
1606 'closing-token-indentation=2' => [qw(cpi=2 cbi=2 csbi=2)],
1607 'indent-closing-paren' => [qw(cpi=2 cbi=2 csbi=2)],
1608 'noindent-closing-paren' => [qw(cpi=0 cbi=0 csbi=0)],
1610 'vt=0' => [qw(pvt=0 bvt=0 sbvt=0)],
1611 'vt=1' => [qw(pvt=1 bvt=1 sbvt=1)],
1612 'vt=2' => [qw(pvt=2 bvt=2 sbvt=2)],
1614 'vertical-tightness=0' => [qw(pvt=0 bvt=0 sbvt=0)],
1615 'vertical-tightness=1' => [qw(pvt=1 bvt=1 sbvt=1)],
1616 'vertical-tightness=2' => [qw(pvt=2 bvt=2 sbvt=2)],
1618 'vtc=0' => [qw(pvtc=0 bvtc=0 sbvtc=0)],
1619 'vtc=1' => [qw(pvtc=1 bvtc=1 sbvtc=1)],
1620 'vtc=2' => [qw(pvtc=2 bvtc=2 sbvtc=2)],
1622 'vertical-tightness-closing=0' => [qw(pvtc=0 bvtc=0 sbvtc=0)],
1623 'vertical-tightness-closing=1' => [qw(pvtc=1 bvtc=1 sbvtc=1)],
1624 'vertical-tightness-closing=2' => [qw(pvtc=2 bvtc=2 sbvtc=2)],
1626 'otr' => [qw(opr ohbr osbr)],
1627 'opening-token-right' => [qw(opr ohbr osbr)],
1628 'notr' => [qw(nopr nohbr nosbr)],
1629 'noopening-token-right' => [qw(nopr nohbr nosbr)],
1631 'sot' => [qw(sop sohb sosb)],
1632 'nsot' => [qw(nsop nsohb nsosb)],
1633 'stack-opening-tokens' => [qw(sop sohb sosb)],
1634 'nostack-opening-tokens' => [qw(nsop nsohb nsosb)],
1636 'sct' => [qw(scp schb scsb)],
1637 'stack-closing-tokens' => => [qw(scp schb scsb)],
1638 'nsct' => [qw(nscp nschb nscsb)],
1639 'nostack-opening-tokens' => [qw(nscp nschb nscsb)],
1641 # 'mangle' originally deleted pod and comments, but to keep it
1642 # reversible, it no longer does. But if you really want to
1643 # delete them, just use:
1646 # An interesting use for 'mangle' is to do this:
1647 # perltidy -mangle myfile.pl -st | perltidy -o myfile.pl.new
1648 # which will form as many one-line blocks as possible
1654 delete-old-whitespace
1657 maximum-consecutive-blank-lines=0
1658 maximum-line-length=100000
1662 noblanks-before-blocks
1663 noblanks-before-subs
1668 # 'extrude' originally deleted pod and comments, but to keep it
1669 # reversible, it no longer does. But if you really want to
1670 # delete them, just use
1673 # An interesting use for 'extrude' is to do this:
1674 # perltidy -extrude myfile.pl -st | perltidy -o myfile.pl.new
1675 # which will break up all one-line blocks.
1682 delete-old-whitespace
1685 maximum-consecutive-blank-lines=0
1686 maximum-line-length=1
1689 noblanks-before-blocks
1690 noblanks-before-subs
1697 # this style tries to follow the GNU Coding Standards (which do
1698 # not really apply to perl but which are followed by some perl
1702 lp bl noll pt=2 bt=2 sbt=2 cpi=1 csbi=1 cbi=1
1706 # Style suggested in Damian Conway's Perl Best Practices
1707 'perl-best-practices' => [
1708 qw(l=78 i=4 ci=4 st se vt=2 cti=0 pt=1 bt=1 sbt=1 bbt=1 nsfs nolq),
1709 q(wbb=% + - * / x != == >= <= =~ !~ < > | & = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=)
1712 # Additional styles can be added here
1715 Perl::Tidy::HtmlWriter->make_abbreviated_names( \%expansion );
1717 # Uncomment next line to dump all expansions for debugging:
1718 # dump_short_names(\%expansion);
1720 \@option_string, \@defaults, \%expansion,
1721 \%option_category, \%option_range
1724 } # end of generate_options
1726 sub process_command_line {
1729 $perltidyrc_stream, $is_Windows, $Windows_type,
1730 $rpending_complaint, $dump_options_type
1736 $roption_string, $rdefaults, $rexpansion,
1737 $roption_category, $roption_range
1738 ) = generate_options();
1740 #---------------------------------------------------------------
1741 # set the defaults by passing the above list through GetOptions
1742 #---------------------------------------------------------------
1748 # do not load the defaults if we are just dumping perltidyrc
1749 unless ( $dump_options_type eq 'perltidyrc' ) {
1750 for $i (@$rdefaults) { push @ARGV, "--" . $i }
1753 # Patch to save users Getopt::Long configuration
1754 # and set to Getopt::Long defaults. Use eval to avoid
1755 # breaking old versions of Perl without these routines.
1757 eval { $glc = Getopt::Long::Configure() };
1759 eval { Getopt::Long::ConfigDefaults() };
1761 else { $glc = undef }
1763 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1764 die "Programming Bug: error in setting default options";
1767 # Patch to put the previous Getopt::Long configuration back
1768 eval { Getopt::Long::Configure($glc) } if defined $glc;
1772 my @raw_options = ();
1773 my $config_file = "";
1774 my $saw_ignore_profile = 0;
1775 my $saw_extrude = 0;
1776 my $saw_dump_profile = 0;
1779 #---------------------------------------------------------------
1780 # Take a first look at the command-line parameters. Do as many
1781 # immediate dumps as possible, which can avoid confusion if the
1782 # perltidyrc file has an error.
1783 #---------------------------------------------------------------
1784 foreach $i (@ARGV) {
1787 if ( $i =~ /^-(npro|noprofile|no-profile)$/ ) {
1788 $saw_ignore_profile = 1;
1791 # note: this must come before -pro and -profile, below:
1792 elsif ( $i =~ /^-(dump-profile|dpro)$/ ) {
1793 $saw_dump_profile = 1;
1795 elsif ( $i =~ /^-(pro|profile)=(.+)/ ) {
1798 "Only one -pro=filename allowed, using '$2' instead of '$config_file'\n";
1801 unless ( -e $config_file ) {
1802 warn "cannot find file given with -pro=$config_file: $!\n";
1806 elsif ( $i =~ /^-(pro|profile)=?$/ ) {
1807 die "usage: -pro=filename or --profile=filename, no spaces\n";
1809 elsif ( $i =~ /^-extrude$/ ) {
1812 elsif ( $i =~ /^-(help|h|HELP|H)$/ ) {
1816 elsif ( $i =~ /^-(version|v)$/ ) {
1820 elsif ( $i =~ /^-(dump-defaults|ddf)$/ ) {
1821 dump_defaults(@$rdefaults);
1824 elsif ( $i =~ /^-(dump-long-names|dln)$/ ) {
1825 dump_long_names(@$roption_string);
1828 elsif ( $i =~ /^-(dump-short-names|dsn)$/ ) {
1829 dump_short_names($rexpansion);
1832 elsif ( $i =~ /^-(dump-token-types|dtt)$/ ) {
1833 Perl::Tidy::Tokenizer->dump_token_types(*STDOUT);
1838 if ( $saw_dump_profile && $saw_ignore_profile ) {
1839 warn "No profile to dump because of -npro\n";
1843 #---------------------------------------------------------------
1844 # read any .perltidyrc configuration file
1845 #---------------------------------------------------------------
1846 unless ($saw_ignore_profile) {
1848 # resolve possible conflict between $perltidyrc_stream passed
1849 # as call parameter to perltidy and -pro=filename on command
1851 if ($perltidyrc_stream) {
1854 Conflict: a perltidyrc configuration file was specified both as this
1855 perltidy call parameter: $perltidyrc_stream
1856 and with this -profile=$config_file.
1857 Using -profile=$config_file.
1861 $config_file = $perltidyrc_stream;
1865 # look for a config file if we don't have one yet
1866 my $rconfig_file_chatter;
1867 $$rconfig_file_chatter = "";
1869 find_config_file( $is_Windows, $Windows_type, $rconfig_file_chatter,
1870 $rpending_complaint )
1871 unless $config_file;
1873 # open any config file
1876 ( $fh_config, $config_file ) =
1877 Perl::Tidy::streamhandle( $config_file, 'r' );
1878 unless ($fh_config) {
1879 $$rconfig_file_chatter .=
1880 "# $config_file exists but cannot be opened\n";
1884 if ($saw_dump_profile) {
1885 if ($saw_dump_profile) {
1886 dump_config_file( $fh_config, $config_file,
1887 $rconfig_file_chatter );
1894 my ( $rconfig_list, $death_message ) =
1895 read_config_file( $fh_config, $config_file, $rexpansion );
1896 die $death_message if ($death_message);
1898 # process any .perltidyrc parameters right now so we can
1900 if (@$rconfig_list) {
1901 local @ARGV = @$rconfig_list;
1903 expand_command_abbreviations( $rexpansion, \@raw_options,
1906 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1908 "Error in this config file: $config_file \nUse -npro to ignore this file, -h for help'\n";
1911 # Anything left in this local @ARGV is an error and must be
1912 # invalid bare words from the configuration file. We cannot
1913 # check this earlier because bare words may have been valid
1914 # values for parameters. We had to wait for GetOptions to have
1918 my $str = "\'" . pop(@ARGV) . "\'";
1919 while ( my $param = pop(@ARGV) ) {
1920 if ( length($str) < 70 ) {
1921 $str .= ", '$param'";
1929 There are $count unrecognized values in the configuration file '$config_file':
1931 Use leading dashes for parameters. Use -npro to ignore this file.
1935 # Undo any options which cause premature exit. They are not
1936 # appropriate for a config file, and it could be hard to
1937 # diagnose the cause of the premature exit.
1946 dump-want-left-space
1947 dump-want-right-space
1955 if ( defined( $Opts{$_} ) ) {
1957 warn "ignoring --$_ in config file: $config_file\n";
1964 #---------------------------------------------------------------
1965 # now process the command line parameters
1966 #---------------------------------------------------------------
1967 expand_command_abbreviations( $rexpansion, \@raw_options, $config_file );
1969 if ( !GetOptions( \%Opts, @$roption_string ) ) {
1970 die "Error on command line; for help try 'perltidy -h'\n";
1973 return ( \%Opts, $config_file, \@raw_options, $saw_extrude, $roption_string,
1974 $rexpansion, $roption_category, $roption_range );
1975 } # end of process_command_line
1979 my ( $rOpts, $is_Windows, $Windows_type, $rpending_complaint ) = @_;
1981 #---------------------------------------------------------------
1982 # check and handle any interactions among the basic options..
1983 #---------------------------------------------------------------
1985 # Since -vt, -vtc, and -cti are abbreviations, but under
1986 # msdos, an unquoted input parameter like vtc=1 will be
1987 # seen as 2 parameters, vtc and 1, so the abbreviations
1988 # won't be seen. Therefore, we will catch them here if
1991 if ( defined $rOpts->{'vertical-tightness'} ) {
1992 my $vt = $rOpts->{'vertical-tightness'};
1993 $rOpts->{'paren-vertical-tightness'} = $vt;
1994 $rOpts->{'square-bracket-vertical-tightness'} = $vt;
1995 $rOpts->{'brace-vertical-tightness'} = $vt;
1998 if ( defined $rOpts->{'vertical-tightness-closing'} ) {
1999 my $vtc = $rOpts->{'vertical-tightness-closing'};
2000 $rOpts->{'paren-vertical-tightness-closing'} = $vtc;
2001 $rOpts->{'square-bracket-vertical-tightness-closing'} = $vtc;
2002 $rOpts->{'brace-vertical-tightness-closing'} = $vtc;
2005 if ( defined $rOpts->{'closing-token-indentation'} ) {
2006 my $cti = $rOpts->{'closing-token-indentation'};
2007 $rOpts->{'closing-square-bracket-indentation'} = $cti;
2008 $rOpts->{'closing-brace-indentation'} = $cti;
2009 $rOpts->{'closing-paren-indentation'} = $cti;
2012 # In quiet mode, there is no log file and hence no way to report
2013 # results of syntax check, so don't do it.
2014 if ( $rOpts->{'quiet'} ) {
2015 $rOpts->{'check-syntax'} = 0;
2018 # can't check syntax if no output
2019 if ( $rOpts->{'format'} ne 'tidy' ) {
2020 $rOpts->{'check-syntax'} = 0;
2023 # Never let Windows 9x/Me systems run syntax check -- this will prevent a
2024 # wide variety of nasty problems on these systems, because they cannot
2025 # reliably run backticks. Don't even think about changing this!
2026 if ( $rOpts->{'check-syntax'}
2028 && ( !$Windows_type || $Windows_type =~ /^(9|Me)/ ) )
2030 $rOpts->{'check-syntax'} = 0;
2033 # It's really a bad idea to check syntax as root unless you wrote
2034 # the script yourself. FIXME: not sure if this works with VMS
2035 unless ($is_Windows) {
2037 if ( $< == 0 && $rOpts->{'check-syntax'} ) {
2038 $rOpts->{'check-syntax'} = 0;
2039 $$rpending_complaint .=
2040 "Syntax check deactivated for safety; you shouldn't run this as root\n";
2044 # see if user set a non-negative logfile-gap
2045 if ( defined( $rOpts->{'logfile-gap'} ) && $rOpts->{'logfile-gap'} >= 0 ) {
2047 # a zero gap will be taken as a 1
2048 if ( $rOpts->{'logfile-gap'} == 0 ) {
2049 $rOpts->{'logfile-gap'} = 1;
2052 # setting a non-negative logfile gap causes logfile to be saved
2053 $rOpts->{'logfile'} = 1;
2056 # not setting logfile gap, or setting it negative, causes default of 50
2058 $rOpts->{'logfile-gap'} = 50;
2061 # set short-cut flag when only indentation is to be done.
2062 # Note that the user may or may not have already set the
2064 if ( !$rOpts->{'add-whitespace'}
2065 && !$rOpts->{'delete-old-whitespace'}
2066 && !$rOpts->{'add-newlines'}
2067 && !$rOpts->{'delete-old-newlines'} )
2069 $rOpts->{'indent-only'} = 1;
2072 # -isbc implies -ibc
2073 if ( $rOpts->{'indent-spaced-block-comments'} ) {
2074 $rOpts->{'indent-block-comments'} = 1;
2077 # -bli flag implies -bl
2078 if ( $rOpts->{'brace-left-and-indent'} ) {
2079 $rOpts->{'opening-brace-on-new-line'} = 1;
2082 if ( $rOpts->{'opening-brace-always-on-right'}
2083 && $rOpts->{'opening-brace-on-new-line'} )
2086 Conflict: you specified both 'opening-brace-always-on-right' (-bar) and
2087 'opening-brace-on-new-line' (-bl). Ignoring -bl.
2089 $rOpts->{'opening-brace-on-new-line'} = 0;
2092 # it simplifies things if -bl is 0 rather than undefined
2093 if ( !defined( $rOpts->{'opening-brace-on-new-line'} ) ) {
2094 $rOpts->{'opening-brace-on-new-line'} = 0;
2097 # -sbl defaults to -bl if not defined
2098 if ( !defined( $rOpts->{'opening-sub-brace-on-new-line'} ) ) {
2099 $rOpts->{'opening-sub-brace-on-new-line'} =
2100 $rOpts->{'opening-brace-on-new-line'};
2103 # set shortcut flag if no blanks to be written
2104 unless ( $rOpts->{'maximum-consecutive-blank-lines'} ) {
2105 $rOpts->{'swallow-optional-blank-lines'} = 1;
2108 if ( $rOpts->{'entab-leading-whitespace'} ) {
2109 if ( $rOpts->{'entab-leading-whitespace'} < 0 ) {
2110 warn "-et=n must use a positive integer; ignoring -et\n";
2111 $rOpts->{'entab-leading-whitespace'} = undef;
2114 # entab leading whitespace has priority over the older 'tabs' option
2115 if ( $rOpts->{'tabs'} ) { $rOpts->{'tabs'} = 0; }
2119 sub expand_command_abbreviations {
2121 # go through @ARGV and expand any abbreviations
2123 my ( $rexpansion, $rraw_options, $config_file ) = @_;
2126 # set a pass limit to prevent an infinite loop;
2127 # 10 should be plenty, but it may be increased to allow deeply
2128 # nested expansions.
2129 my $max_passes = 10;
2132 # keep looping until all expansions have been converted into actual
2134 for ( my $pass_count = 0 ; $pass_count <= $max_passes ; $pass_count++ ) {
2136 my $abbrev_count = 0;
2138 # loop over each item in @ARGV..
2139 foreach $word (@ARGV) {
2141 # convert any leading 'no-' to just 'no'
2142 if ( $word =~ /^(-[-]?no)-(.*)/ ) { $word = $1 . $2 }
2144 # if it is a dash flag (instead of a file name)..
2145 if ( $word =~ /^-[-]?([\w\-]+)(.*)/ ) {
2150 # save the raw input for debug output in case of circular refs
2151 if ( $pass_count == 0 ) {
2152 push( @$rraw_options, $word );
2155 # recombine abbreviation and flag, if necessary,
2156 # to allow abbreviations with arguments such as '-vt=1'
2157 if ( $rexpansion->{ $abr . $flags } ) {
2158 $abr = $abr . $flags;
2162 # if we see this dash item in the expansion hash..
2163 if ( $rexpansion->{$abr} ) {
2166 # stuff all of the words that it expands to into the
2167 # new arg list for the next pass
2168 foreach my $abbrev ( @{ $rexpansion->{$abr} } ) {
2169 next unless $abbrev; # for safety; shouldn't happen
2170 push( @new_argv, '--' . $abbrev . $flags );
2174 # not in expansion hash, must be actual long name
2176 push( @new_argv, $word );
2180 # not a dash item, so just save it for the next pass
2182 push( @new_argv, $word );
2184 } # end of this pass
2186 # update parameter list @ARGV to the new one
2188 last unless ( $abbrev_count > 0 );
2190 # make sure we are not in an infinite loop
2191 if ( $pass_count == $max_passes ) {
2193 "I'm tired. We seem to be in an infinite loop trying to expand aliases.\n";
2194 print STDERR "Here are the raw options\n";
2196 print STDERR "(@$rraw_options)\n";
2197 my $num = @new_argv;
2200 print STDERR "After $max_passes passes here is ARGV\n";
2201 print STDERR "(@new_argv)\n";
2204 print STDERR "After $max_passes passes ARGV has $num entries\n";
2209 Please check your configuration file $config_file for circular-references.
2210 To deactivate it, use -npro.
2215 Program bug - circular-references in the %expansion hash, probably due to
2216 a recent program change.
2219 } # end of check for circular references
2220 } # end of loop over all passes
2223 # Debug routine -- this will dump the expansion hash
2224 sub dump_short_names {
2225 my $rexpansion = shift;
2227 List of short names. This list shows how all abbreviations are
2228 translated into other abbreviations and, eventually, into long names.
2229 New abbreviations may be defined in a .perltidyrc file.
2230 For a list of all long names, use perltidy --dump-long-names (-dln).
2231 --------------------------------------------------------------------------
2233 foreach my $abbrev ( sort keys %$rexpansion ) {
2234 my @list = @{ $$rexpansion{$abbrev} };
2235 print STDOUT "$abbrev --> @list\n";
2239 sub check_vms_filename {
2241 # given a valid filename (the perltidy input file)
2242 # create a modified filename and separator character
2245 # Contributed by Michael Cartmell
2247 my ( $base, $path ) = fileparse( $_[0] );
2249 # remove explicit ; version
2250 $base =~ s/;-?\d*$//
2252 # remove explicit . version ie two dots in filename NB ^ escapes a dot
2253 or $base =~ s/( # begin capture $1
2254 (?:^|[^^])\. # match a dot not preceded by a caret
2255 (?: # followed by nothing
2257 .*[^^] # anything ending in a non caret
2260 \.-?\d*$ # match . version number
2263 # normalise filename, if there are no unescaped dots then append one
2264 $base .= '.' unless $base =~ /(?:^|[^^])\./;
2266 # if we don't already have an extension then we just append the extention
2267 my $separator = ( $base =~ /\.$/ ) ? "" : "_";
2268 return ( $path . $base, $separator );
2273 # TODO: are these more standard names?
2274 # Win32s Win95 Win98 WinMe WinNT3.51 WinNT4 Win2000 WinXP/.Net Win2003
2276 # Returns a string that determines what MS OS we are on.
2277 # Returns win32s,95,98,Me,NT3.51,NT4,2000,XP/.Net,Win2003
2278 # Returns blank string if not an MS system.
2279 # Original code contributed by: Yves Orton
2280 # We need to know this to decide where to look for config files
2282 my $rpending_complaint = shift;
2284 return $os unless $^O =~ /win32|dos/i; # is it a MS box?
2286 # Systems built from Perl source may not have Win32.pm
2287 # But probably have Win32::GetOSVersion() anyway so the
2288 # following line is not 'required':
2289 # return $os unless eval('require Win32');
2291 # Use the standard API call to determine the version
2292 my ( $undef, $major, $minor, $build, $id );
2293 eval { ( $undef, $major, $minor, $build, $id ) = Win32::GetOSVersion() };
2296 # NAME ID MAJOR MINOR
2297 # Windows NT 4 2 4 0
2298 # Windows 2000 2 5 0
2300 # Windows Server 2003 2 5 2
2302 return "win32s" unless $id; # If id==0 then its a win32s box.
2303 $os = { # Magic numbers from MSDN
2304 # documentation of GetOSVersion
2311 0 => "2000", # or NT 4, see below
2318 # If $os is undefined, the above code is out of date. Suggested updates
2320 unless ( defined $os ) {
2322 $$rpending_complaint .= <<EOS;
2323 Error trying to discover Win_OS_Type: $id:$major:$minor Has no name of record!
2324 We won't be able to look for a system-wide config file.
2328 # Unfortunately the logic used for the various versions isnt so clever..
2329 # so we have to handle an outside case.
2330 return ( $os eq "2000" && $major != 5 ) ? "NT4" : $os;
2334 return ( $^O !~ /win32|dos/i )
2337 && ( $^O ne 'MacOS' );
2340 sub look_for_Windows {
2342 # determine Windows sub-type and location of
2343 # system-wide configuration files
2344 my $rpending_complaint = shift;
2345 my $is_Windows = ( $^O =~ /win32|dos/i );
2346 my $Windows_type = Win_OS_Type($rpending_complaint) if $is_Windows;
2347 return ( $is_Windows, $Windows_type );
2350 sub find_config_file {
2352 # look for a .perltidyrc configuration file
2353 my ( $is_Windows, $Windows_type, $rconfig_file_chatter,
2354 $rpending_complaint ) = @_;
2356 $$rconfig_file_chatter .= "# Config file search...system reported as:";
2358 $$rconfig_file_chatter .= "Windows $Windows_type\n";
2361 $$rconfig_file_chatter .= " $^O\n";
2364 # sub to check file existance and record all tests
2365 my $exists_config_file = sub {
2366 my $config_file = shift;
2367 return 0 unless $config_file;
2368 $$rconfig_file_chatter .= "# Testing: $config_file\n";
2369 return -f $config_file;
2374 # look in current directory first
2375 $config_file = ".perltidyrc";
2376 return $config_file if $exists_config_file->($config_file);
2378 # Default environment vars.
2379 my @envs = qw(PERLTIDY HOME);
2381 # Check the NT/2k/XP locations, first a local machine def, then a
2383 push @envs, qw(USERPROFILE HOMESHARE) if $^O =~ /win32/i;
2385 # Now go through the enviornment ...
2386 foreach my $var (@envs) {
2387 $$rconfig_file_chatter .= "# Examining: \$ENV{$var}";
2388 if ( defined( $ENV{$var} ) ) {
2389 $$rconfig_file_chatter .= " = $ENV{$var}\n";
2391 # test ENV{ PERLTIDY } as file:
2392 if ( $var eq 'PERLTIDY' ) {
2393 $config_file = "$ENV{$var}";
2394 return $config_file if $exists_config_file->($config_file);
2397 # test ENV as directory:
2398 $config_file = catfile( $ENV{$var}, ".perltidyrc" );
2399 return $config_file if $exists_config_file->($config_file);
2402 $$rconfig_file_chatter .= "\n";
2406 # then look for a system-wide definition
2407 # where to look varies with OS
2410 if ($Windows_type) {
2411 my ( $os, $system, $allusers ) =
2412 Win_Config_Locs( $rpending_complaint, $Windows_type );
2414 # Check All Users directory, if there is one.
2416 $config_file = catfile( $allusers, ".perltidyrc" );
2417 return $config_file if $exists_config_file->($config_file);
2420 # Check system directory.
2421 $config_file = catfile( $system, ".perltidyrc" );
2422 return $config_file if $exists_config_file->($config_file);
2426 # Place to add customization code for other systems
2427 elsif ( $^O eq 'OS2' ) {
2429 elsif ( $^O eq 'MacOS' ) {
2431 elsif ( $^O eq 'VMS' ) {
2434 # Assume some kind of Unix
2437 $config_file = "/usr/local/etc/perltidyrc";
2438 return $config_file if $exists_config_file->($config_file);
2440 $config_file = "/etc/perltidyrc";
2441 return $config_file if $exists_config_file->($config_file);
2444 # Couldn't find a config file
2448 sub Win_Config_Locs {
2450 # In scalar context returns the OS name (95 98 ME NT3.51 NT4 2000 XP),
2451 # or undef if its not a win32 OS. In list context returns OS, System
2452 # Directory, and All Users Directory. All Users will be empty on a
2453 # 9x/Me box. Contributed by: Yves Orton.
2455 my $rpending_complaint = shift;
2456 my $os = (@_) ? shift : Win_OS_Type();
2462 if ( $os =~ /9[58]|Me/ ) {
2463 $system = "C:/Windows";
2465 elsif ( $os =~ /NT|XP|200?/ ) {
2466 $system = ( $os =~ /XP/ ) ? "C:/Windows/" : "C:/WinNT/";
2469 ? "C:/WinNT/profiles/All Users/"
2470 : "C:/Documents and Settings/All Users/";
2474 # This currently would only happen on a win32s computer. I dont have
2475 # one to test, so I am unsure how to proceed. Suggestions welcome!
2476 $$rpending_complaint .=
2477 "I dont know a sensible place to look for config files on an $os system.\n";
2480 return wantarray ? ( $os, $system, $allusers ) : $os;
2483 sub dump_config_file {
2485 my $config_file = shift;
2486 my $rconfig_file_chatter = shift;
2487 print STDOUT "$$rconfig_file_chatter";
2489 print STDOUT "# Dump of file: '$config_file'\n";
2490 while ( my $line = $fh->getline() ) { print STDOUT $line }
2491 eval { $fh->close() };
2494 print STDOUT "# ...no config file found\n";
2498 sub read_config_file {
2500 my ( $fh, $config_file, $rexpansion ) = @_;
2501 my @config_list = ();
2503 # file is bad if non-empty $death_message is returned
2504 my $death_message = "";
2508 while ( my $line = $fh->getline() ) {
2511 next if $line =~ /^\s*#/; # skip full-line comment
2512 ( $line, $death_message ) =
2513 strip_comment( $line, $config_file, $line_no );
2514 last if ($death_message);
2515 $line =~ s/^\s*(.*?)\s*$/$1/; # trim both ends
2518 # look for something of the general form
2523 if ( $line =~ /^((\w+)\s*\{)?([^}]*)(\})?$/ ) {
2524 my ( $newname, $body, $curly ) = ( $2, $3, $4 );
2526 # handle a new alias definition
2530 "No '}' seen after $name and before $newname in config file $config_file line $.\n";
2535 if ( ${$rexpansion}{$name} ) {
2537 my @names = sort keys %$rexpansion;
2539 "Here is a list of all installed aliases\n(@names)\n"
2540 . "Attempting to redefine alias ($name) in config file $config_file line $.\n";
2543 ${$rexpansion}{$name} = [];
2549 my ( $rbody_parts, $msg ) = parse_args($body);
2551 $death_message = <<EOM;
2552 Error reading file '$config_file' at line number $line_no.
2554 Please fix this line or use -npro to avoid reading this file
2561 # remove leading dashes if this is an alias
2562 foreach (@$rbody_parts) { s/^\-+//; }
2563 push @{ ${$rexpansion}{$name} }, @$rbody_parts;
2566 push( @config_list, @$rbody_parts );
2573 "Unexpected '}' seen in config file $config_file line $.\n";
2580 eval { $fh->close() };
2581 return ( \@config_list, $death_message );
2586 my ( $instr, $config_file, $line_no ) = @_;
2589 # nothing to do if no comments
2590 if ( $instr !~ /#/ ) {
2591 return ( $instr, $msg );
2594 # use simple method of no quotes
2595 elsif ( $instr !~ /['"]/ ) {
2596 $instr =~ s/\s*\#.*$//; # simple trim
2597 return ( $instr, $msg );
2600 # handle comments and quotes
2602 my $quote_char = "";
2605 # looking for ending quote character
2607 if ( $instr =~ /\G($quote_char)/gc ) {
2611 elsif ( $instr =~ /\G(.)/gc ) {
2615 # error..we reached the end without seeing the ending quote char
2618 Error reading file $config_file at line number $line_no.
2619 Did not see ending quote character <$quote_char> in this text:
2621 Please fix this line or use -npro to avoid reading this file
2627 # accumulating characters and looking for start of a quoted string
2629 if ( $instr =~ /\G([\"\'])/gc ) {
2633 elsif ( $instr =~ /\G#/gc ) {
2636 elsif ( $instr =~ /\G(.)/gc ) {
2644 return ( $outstr, $msg );
2649 # Parse a command string containing multiple string with possible
2650 # quotes, into individual commands. It might look like this, for example:
2652 # -wba=" + - " -some-thing -wbb='. && ||'
2654 # There is no need, at present, to handle escaped quote characters.
2655 # (They are not perltidy tokens, so needn't be in strings).
2658 my @body_parts = ();
2659 my $quote_char = "";
2664 # looking for ending quote character
2666 if ( $body =~ /\G($quote_char)/gc ) {
2669 elsif ( $body =~ /\G(.)/gc ) {
2673 # error..we reached the end without seeing the ending quote char
2675 if ( length($part) ) { push @body_parts, $part; }
2677 Did not see ending quote character <$quote_char> in this text:
2684 # accumulating characters and looking for start of a quoted string
2686 if ( $body =~ /\G([\"\'])/gc ) {
2689 elsif ( $body =~ /\G(\s+)/gc ) {
2690 if ( length($part) ) { push @body_parts, $part; }
2693 elsif ( $body =~ /\G(.)/gc ) {
2697 if ( length($part) ) { push @body_parts, $part; }
2702 return ( \@body_parts, $msg );
2705 sub dump_long_names {
2707 my @names = sort @_;
2709 # Command line long names (passed to GetOptions)
2710 #---------------------------------------------------------------
2711 # here is a summary of the Getopt codes:
2712 # <none> does not take an argument
2713 # =s takes a mandatory string
2714 # :s takes an optional string
2715 # =i takes a mandatory integer
2716 # :i takes an optional integer
2717 # ! does not take an argument and may be negated
2718 # i.e., -foo and -nofoo are allowed
2719 # a double dash signals the end of the options list
2721 #---------------------------------------------------------------
2724 foreach (@names) { print STDOUT "$_\n" }
2728 my @defaults = sort @_;
2729 print STDOUT "Default command line options:\n";
2730 foreach (@_) { print STDOUT "$_\n" }
2735 # write the options back out as a valid .perltidyrc file
2736 my ( $rOpts, $roption_string ) = @_;
2738 my $rGetopt_flags = \%Getopt_flags;
2739 foreach my $opt ( @{$roption_string} ) {
2741 if ( $opt =~ /(.*)(!|=.*)$/ ) {
2745 if ( defined( $rOpts->{$opt} ) ) {
2746 $rGetopt_flags->{$opt} = $flag;
2749 print STDOUT "# Final parameter set for this run:\n";
2750 foreach my $key ( sort keys %{$rOpts} ) {
2751 my $flag = $rGetopt_flags->{$key};
2752 my $value = $rOpts->{$key};
2756 if ( $flag =~ /^=/ ) {
2757 if ( $value !~ /^\d+$/ ) { $value = '"' . $value . '"' }
2758 $suffix = "=" . $value;
2760 elsif ( $flag =~ /^!/ ) {
2761 $prefix .= "no" unless ($value);
2767 "# ERROR in dump_options: unrecognized flag $flag for $key\n";
2770 print STDOUT $prefix . $key . $suffix . "\n";
2776 This is perltidy, v$VERSION
2778 Copyright 2000-2007, Steve Hancock
2780 Perltidy is free software and may be copied under the terms of the GNU
2781 General Public License, which is included in the distribution files.
2783 Complete documentation for perltidy can be found using 'man perltidy'
2784 or on the internet at http://perltidy.sourceforge.net.
2791 This is perltidy version $VERSION, a perl script indenter. Usage:
2793 perltidy [ options ] file1 file2 file3 ...
2794 (output goes to file1.tdy, file2.tdy, file3.tdy, ...)
2795 perltidy [ options ] file1 -o outfile
2796 perltidy [ options ] file1 -st >outfile
2797 perltidy [ options ] <infile >outfile
2799 Options have short and long forms. Short forms are shown; see
2800 man pages for long forms. Note: '=s' indicates a required string,
2801 and '=n' indicates a required integer.
2805 -o=file name of the output file (only if single input file)
2806 -oext=s change output extension from 'tdy' to s
2807 -opath=path change path to be 'path' for output files
2808 -b backup original to .bak and modify file in-place
2809 -bext=s change default backup extension from 'bak' to s
2810 -q deactivate error messages (for running under editor)
2811 -w include non-critical warning messages in the .ERR error output
2812 -syn run perl -c to check syntax (default under unix systems)
2813 -log save .LOG file, which has useful diagnostics
2814 -f force perltidy to read a binary file
2815 -g like -log but writes more detailed .LOG file, for debugging scripts
2816 -opt write the set of options actually used to a .LOG file
2817 -npro ignore .perltidyrc configuration command file
2818 -pro=file read configuration commands from file instead of .perltidyrc
2819 -st send output to standard output, STDOUT
2820 -se send error output to standard error output, STDERR
2821 -v display version number to standard output and quit
2824 -i=n use n columns per indentation level (default n=4)
2825 -t tabs: use one tab character per indentation level, not recommeded
2826 -nt no tabs: use n spaces per indentation level (default)
2827 -et=n entab leading whitespace n spaces per tab; not recommended
2828 -io "indent only": just do indentation, no other formatting.
2829 -sil=n set starting indentation level to n; use if auto detection fails
2830 -ole=s specify output line ending (s=dos or win, mac, unix)
2831 -ple keep output line endings same as input (input must be filename)
2834 -fws freeze whitespace; this disables all whitespace changes
2835 and disables the following switches:
2836 -bt=n sets brace tightness, n= (0 = loose, 1=default, 2 = tight)
2837 -bbt same as -bt but for code block braces; same as -bt if not given
2838 -bbvt block braces vertically tight; use with -bl or -bli
2839 -bbvtl=s make -bbvt to apply to selected list of block types
2840 -pt=n paren tightness (n=0, 1 or 2)
2841 -sbt=n square bracket tightness (n=0, 1, or 2)
2842 -bvt=n brace vertical tightness,
2843 n=(0=open, 1=close unless multiple steps on a line, 2=always close)
2844 -pvt=n paren vertical tightness (see -bvt for n)
2845 -sbvt=n square bracket vertical tightness (see -bvt for n)
2846 -bvtc=n closing brace vertical tightness:
2847 n=(0=open, 1=sometimes close, 2=always close)
2848 -pvtc=n closing paren vertical tightness, see -bvtc for n.
2849 -sbvtc=n closing square bracket vertical tightness, see -bvtc for n.
2850 -ci=n sets continuation indentation=n, default is n=2 spaces
2851 -lp line up parentheses, brackets, and non-BLOCK braces
2852 -sfs add space before semicolon in for( ; ; )
2853 -aws allow perltidy to add whitespace (default)
2854 -dws delete all old non-essential whitespace
2855 -icb indent closing brace of a code block
2856 -cti=n closing indentation of paren, square bracket, or non-block brace:
2857 n=0 none, =1 align with opening, =2 one full indentation level
2858 -icp equivalent to -cti=2
2859 -wls=s want space left of tokens in string; i.e. -nwls='+ - * /'
2860 -wrs=s want space right of tokens in string;
2861 -sts put space before terminal semicolon of a statement
2862 -sak=s put space between keywords given in s and '(';
2863 -nsak=s no space between keywords in s and '('; i.e. -nsak='my our local'
2866 -fnl freeze newlines; this disables all line break changes
2867 and disables the following switches:
2868 -anl add newlines; ok to introduce new line breaks
2869 -bbs add blank line before subs and packages
2870 -bbc add blank line before block comments
2871 -bbb add blank line between major blocks
2872 -sob swallow optional blank lines
2873 -ce cuddled else; use this style: '} else {'
2874 -dnl delete old newlines (default)
2875 -mbl=n maximum consecutive blank lines (default=1)
2876 -l=n maximum line length; default n=80
2877 -bl opening brace on new line
2878 -sbl opening sub brace on new line. value of -bl is used if not given.
2879 -bli opening brace on new line and indented
2880 -bar opening brace always on right, even for long clauses
2881 -vt=n vertical tightness (requires -lp); n controls break after opening
2882 token: 0=never 1=no break if next line balanced 2=no break
2883 -vtc=n vertical tightness of closing container; n controls if closing
2884 token starts new line: 0=always 1=not unless list 1=never
2885 -wba=s want break after tokens in string; i.e. wba=': .'
2886 -wbb=s want break before tokens in string
2888 Following Old Breakpoints
2889 -boc break at old comma breaks: turns off all automatic list formatting
2890 -bol break at old logical breakpoints: or, and, ||, && (default)
2891 -bok break at old list keyword breakpoints such as map, sort (default)
2892 -bot break at old conditional (ternary ?:) operator breakpoints (default)
2893 -cab=n break at commas after a comma-arrow (=>):
2894 n=0 break at all commas after =>
2895 n=1 stable: break unless this breaks an existing one-line container
2896 n=2 break only if a one-line container cannot be formed
2897 n=3 do not treat commas after => specially at all
2900 -ibc indent block comments (default)
2901 -isbc indent spaced block comments; may indent unless no leading space
2902 -msc=n minimum desired spaces to side comment, default 4
2903 -csc add or update closing side comments after closing BLOCK brace
2904 -dcsc delete closing side comments created by a -csc command
2905 -cscp=s change closing side comment prefix to be other than '## end'
2906 -cscl=s change closing side comment to apply to selected list of blocks
2907 -csci=n minimum number of lines needed to apply a -csc tag, default n=6
2908 -csct=n maximum number of columns of appended text, default n=20
2909 -cscw causes warning if old side comment is overwritten with -csc
2911 -sbc use 'static block comments' identified by leading '##' (default)
2912 -sbcp=s change static block comment identifier to be other than '##'
2913 -osbc outdent static block comments
2915 -ssc use 'static side comments' identified by leading '##' (default)
2916 -sscp=s change static side comment identifier to be other than '##'
2918 Delete selected text
2919 -dac delete all comments AND pod
2920 -dbc delete block comments
2921 -dsc delete side comments
2924 Send selected text to a '.TEE' file
2925 -tac tee all comments AND pod
2926 -tbc tee block comments
2927 -tsc tee side comments
2931 -olq outdent long quoted strings (default)
2932 -olc outdent a long block comment line
2933 -ola outdent statement labels
2934 -okw outdent control keywords (redo, next, last, goto, return)
2935 -okwl=s specify alternative keywords for -okw command
2938 -mft=n maximum fields per table; default n=40
2939 -x do not format lines before hash-bang line (i.e., for VMS)
2940 -asc allows perltidy to add a ';' when missing (default)
2941 -dsm allows perltidy to delete an unnecessary ';' (default)
2943 Combinations of other parameters
2944 -gnu attempt to follow GNU Coding Standards as applied to perl
2945 -mangle remove as many newlines as possible (but keep comments and pods)
2946 -extrude insert as many newlines as possible
2948 Dump and die, debugging
2949 -dop dump options used in this run to standard output and quit
2950 -ddf dump default options to standard output and quit
2951 -dsn dump all option short names to standard output and quit
2952 -dln dump option long names to standard output and quit
2953 -dpro dump whatever configuration file is in effect to standard output
2954 -dtt dump all token types to standard output and quit
2957 -html write an html file (see 'man perl2web' for many options)
2958 Note: when -html is used, no indentation or formatting are done.
2959 Hint: try perltidy -html -css=mystyle.css filename.pl
2960 and edit mystyle.css to change the appearance of filename.html.
2961 -nnn gives line numbers
2962 -pre only writes out <pre>..</pre> code section
2963 -toc places a table of contents to subs at the top (default)
2964 -pod passes pod text through pod2html (default)
2965 -frm write html as a frame (3 files)
2966 -text=s extra extension for table of contents if -frm, default='toc'
2967 -sext=s extra extension for file content if -frm, default='src'
2969 A prefix of "n" negates short form toggle switches, and a prefix of "no"
2970 negates the long forms. For example, -nasc means don't add missing
2973 If you are unable to see this entire text, try "perltidy -h | more"
2974 For more detailed information, and additional options, try "man perltidy",
2975 or go to the perltidy home page at http://perltidy.sourceforge.net
2980 sub process_this_file {
2982 my ( $truth, $beauty ) = @_;
2984 # loop to process each line of this file
2985 while ( my $line_of_tokens = $truth->get_line() ) {
2986 $beauty->write_line($line_of_tokens);
2990 eval { $beauty->finish_formatting() };
2991 $truth->report_tokenization_errors();
2996 # Use 'perl -c' to make sure that we did not create bad syntax
2997 # This is a very good independent check for programming errors
2999 # Given names of the input and output files, ($ifname, $ofname),
3000 # we do the following:
3001 # - check syntax of the input file
3002 # - if bad, all done (could be an incomplete code snippet)
3003 # - if infile syntax ok, then check syntax of the output file;
3004 # - if outfile syntax bad, issue warning; this implies a code bug!
3005 # - set and return flag "infile_syntax_ok" : =-1 bad 0 unknown 1 good
3007 my ( $ifname, $ofname, $logger_object, $rOpts ) = @_;
3008 my $infile_syntax_ok = 0;
3009 my $line_of_dashes = '-' x 42 . "\n";
3011 my $flags = $rOpts->{'perl-syntax-check-flags'};
3013 # be sure we invoke perl with -c
3014 # note: perl will accept repeated flags like '-c -c'. It is safest
3015 # to append another -c than try to find an interior bundled c, as
3016 # in -Tc, because such a 'c' might be in a quoted string, for example.
3017 if ( $flags !~ /(^-c|\s+-c)/ ) { $flags .= " -c" }
3019 # be sure we invoke perl with -x if requested
3020 # same comments about repeated parameters applies
3021 if ( $rOpts->{'look-for-hash-bang'} ) {
3022 if ( $flags !~ /(^-x|\s+-x)/ ) { $flags .= " -x" }
3025 # this shouldn't happen unless a termporary file couldn't be made
3026 if ( $ifname eq '-' ) {
3027 $logger_object->write_logfile_entry(
3028 "Cannot run perl -c on STDIN and STDOUT\n");
3029 return $infile_syntax_ok;
3032 $logger_object->write_logfile_entry(
3033 "checking input file syntax with perl $flags\n");
3034 $logger_object->write_logfile_entry($line_of_dashes);
3036 # Not all operating systems/shells support redirection of the standard
3038 my $error_redirection = ( $^O eq 'VMS' ) ? "" : '2>&1';
3040 my $perl_output = do_syntax_check( $ifname, $flags, $error_redirection );
3041 $logger_object->write_logfile_entry("$perl_output\n");
3043 if ( $perl_output =~ /syntax\s*OK/ ) {
3044 $infile_syntax_ok = 1;
3045 $logger_object->write_logfile_entry($line_of_dashes);
3046 $logger_object->write_logfile_entry(
3047 "checking output file syntax with perl $flags ...\n");
3048 $logger_object->write_logfile_entry($line_of_dashes);
3051 do_syntax_check( $ofname, $flags, $error_redirection );
3052 $logger_object->write_logfile_entry("$perl_output\n");
3054 unless ( $perl_output =~ /syntax\s*OK/ ) {
3055 $logger_object->write_logfile_entry($line_of_dashes);
3056 $logger_object->warning(
3057 "The output file has a syntax error when tested with perl $flags $ofname !\n"
3059 $logger_object->warning(
3060 "This implies an error in perltidy; the file $ofname is bad\n");
3061 $logger_object->report_definite_bug();
3063 # the perl version number will be helpful for diagnosing the problem
3064 $logger_object->write_logfile_entry(
3065 qx/perl -v $error_redirection/ . "\n" );
3070 # Only warn of perl -c syntax errors. Other messages,
3071 # such as missing modules, are too common. They can be
3072 # seen by running with perltidy -w
3073 $logger_object->complain("A syntax check using perl $flags gives: \n");
3074 $logger_object->complain($line_of_dashes);
3075 $logger_object->complain("$perl_output\n");
3076 $logger_object->complain($line_of_dashes);
3077 $infile_syntax_ok = -1;
3078 $logger_object->write_logfile_entry($line_of_dashes);
3079 $logger_object->write_logfile_entry(
3080 "The output file will not be checked because of input file problems\n"
3083 return $infile_syntax_ok;
3086 sub do_syntax_check {
3087 my ( $fname, $flags, $error_redirection ) = @_;
3089 # We have to quote the filename in case it has unusual characters
3090 # or spaces. Example: this filename #CM11.pm# gives trouble.
3091 $fname = '"' . $fname . '"';
3093 # Under VMS something like -T will become -t (and an error) so we
3094 # will put quotes around the flags. Double quotes seem to work on
3095 # Unix/Windows/VMS, but this may not work on all systems. (Single
3096 # quotes do not work under Windows). It could become necessary to
3097 # put double quotes around each flag, such as: -"c" -"T"
3098 # We may eventually need some system-dependent coding here.
3099 $flags = '"' . $flags . '"';
3101 # now wish for luck...
3102 return qx/perl $flags $fname $error_redirection/;
3105 #####################################################################
3107 # This is a stripped down version of IO::Scalar
3108 # Given a reference to a scalar, it supplies either:
3109 # a getline method which reads lines (mode='r'), or
3110 # a print method which reads lines (mode='w')
3112 #####################################################################
3113 package Perl::Tidy::IOScalar;
3117 my ( $package, $rscalar, $mode ) = @_;
3118 my $ref = ref $rscalar;
3119 if ( $ref ne 'SCALAR' ) {
3121 ------------------------------------------------------------------------
3122 expecting ref to SCALAR but got ref to ($ref); trace follows:
3123 ------------------------------------------------------------------------
3127 if ( $mode eq 'w' ) {
3129 return bless [ $rscalar, $mode ], $package;
3131 elsif ( $mode eq 'r' ) {
3133 # Convert a scalar to an array.
3134 # This avoids looking for "\n" on each call to getline
3135 my @array = map { $_ .= "\n" } split /\n/, ${$rscalar};
3137 return bless [ \@array, $mode, $i_next ], $package;
3141 ------------------------------------------------------------------------
3142 expecting mode = 'r' or 'w' but got mode ($mode); trace follows:
3143 ------------------------------------------------------------------------
3150 my $mode = $self->[1];
3151 if ( $mode ne 'r' ) {
3153 ------------------------------------------------------------------------
3154 getline call requires mode = 'r' but mode = ($mode); trace follows:
3155 ------------------------------------------------------------------------
3158 my $i = $self->[2]++;
3159 ##my $line = $self->[0]->[$i];
3160 return $self->[0]->[$i];
3165 my $mode = $self->[1];
3166 if ( $mode ne 'w' ) {
3168 ------------------------------------------------------------------------
3169 print call requires mode = 'w' but mode = ($mode); trace follows:
3170 ------------------------------------------------------------------------
3173 ${ $self->[0] } .= $_[0];
3175 sub close { return }
3177 #####################################################################
3179 # This is a stripped down version of IO::ScalarArray
3180 # Given a reference to an array, it supplies either:
3181 # a getline method which reads lines (mode='r'), or
3182 # a print method which reads lines (mode='w')
3184 # NOTE: this routine assumes that that there aren't any embedded
3185 # newlines within any of the array elements. There are no checks
3188 #####################################################################
3189 package Perl::Tidy::IOScalarArray;
3193 my ( $package, $rarray, $mode ) = @_;
3194 my $ref = ref $rarray;
3195 if ( $ref ne 'ARRAY' ) {
3197 ------------------------------------------------------------------------
3198 expecting ref to ARRAY but got ref to ($ref); trace follows:
3199 ------------------------------------------------------------------------
3203 if ( $mode eq 'w' ) {
3205 return bless [ $rarray, $mode ], $package;
3207 elsif ( $mode eq 'r' ) {
3209 return bless [ $rarray, $mode, $i_next ], $package;
3213 ------------------------------------------------------------------------
3214 expecting mode = 'r' or 'w' but got mode ($mode); trace follows:
3215 ------------------------------------------------------------------------
3222 my $mode = $self->[1];
3223 if ( $mode ne 'r' ) {
3225 ------------------------------------------------------------------------
3226 getline requires mode = 'r' but mode = ($mode); trace follows:
3227 ------------------------------------------------------------------------
3230 my $i = $self->[2]++;
3231 ##my $line = $self->[0]->[$i];
3232 return $self->[0]->[$i];
3237 my $mode = $self->[1];
3238 if ( $mode ne 'w' ) {
3240 ------------------------------------------------------------------------
3241 print requires mode = 'w' but mode = ($mode); trace follows:
3242 ------------------------------------------------------------------------
3245 push @{ $self->[0] }, $_[0];
3247 sub close { return }
3249 #####################################################################
3251 # the Perl::Tidy::LineSource class supplies an object with a 'get_line()' method
3252 # which returns the next line to be parsed
3254 #####################################################################
3256 package Perl::Tidy::LineSource;
3260 my ( $class, $input_file, $rOpts, $rpending_logfile_message ) = @_;
3261 my $input_file_copy = undef;
3264 my $input_line_ending;
3265 if ( $rOpts->{'preserve-line-endings'} ) {
3266 $input_line_ending = Perl::Tidy::find_input_line_ending($input_file);
3269 ( my $fh, $input_file ) = Perl::Tidy::streamhandle( $input_file, 'r' );
3270 return undef unless $fh;
3272 # in order to check output syntax when standard output is used,
3273 # or when it is an object, we have to make a copy of the file
3274 if ( ( $input_file eq '-' || ref $input_file ) && $rOpts->{'check-syntax'} )
3277 # Turning off syntax check when input output is used.
3278 # The reason is that temporary files cause problems on
3280 $rOpts->{'check-syntax'} = 0;
3281 $input_file_copy = '-';
3283 $$rpending_logfile_message .= <<EOM;
3284 Note: --syntax check will be skipped because standard input is used
3291 _fh_copy => $fh_copy,
3292 _filename => $input_file,
3293 _input_file_copy => $input_file_copy,
3294 _input_line_ending => $input_line_ending,
3295 _rinput_buffer => [],
3300 sub get_input_file_copy_name {
3302 my $ifname = $self->{_input_file_copy};
3304 $ifname = $self->{_filename};
3309 sub close_input_file {
3311 eval { $self->{_fh}->close() };
3312 eval { $self->{_fh_copy}->close() } if $self->{_fh_copy};
3318 my $fh = $self->{_fh};
3319 my $fh_copy = $self->{_fh_copy};
3320 my $rinput_buffer = $self->{_rinput_buffer};
3322 if ( scalar(@$rinput_buffer) ) {
3323 $line = shift @$rinput_buffer;
3326 $line = $fh->getline();
3328 # patch to read raw mac files under unix, dos
3329 # see if the first line has embedded \r's
3330 if ( $line && !$self->{_started} ) {
3331 if ( $line =~ /[\015][^\015\012]/ ) {
3333 # found one -- break the line up and store in a buffer
3334 @$rinput_buffer = map { $_ . "\n" } split /\015/, $line;
3335 my $count = @$rinput_buffer;
3336 $line = shift @$rinput_buffer;
3338 $self->{_started}++;
3341 if ( $line && $fh_copy ) { $fh_copy->print($line); }
3348 my $fh = $self->{_fh};
3349 my $fh_copy = $self->{_fh_copy};
3350 $line = $fh->getline();
3351 if ( $line && $fh_copy ) { $fh_copy->print($line); }
3355 #####################################################################
3357 # the Perl::Tidy::LineSink class supplies a write_line method for
3358 # actual file writing
3360 #####################################################################
3362 package Perl::Tidy::LineSink;
3366 my ( $class, $output_file, $tee_file, $line_separator, $rOpts,
3367 $rpending_logfile_message, $binmode )
3370 my $fh_copy = undef;
3372 my $output_file_copy = "";
3373 my $output_file_open = 0;
3375 if ( $rOpts->{'format'} eq 'tidy' ) {
3376 ( $fh, $output_file ) = Perl::Tidy::streamhandle( $output_file, 'w' );
3377 unless ($fh) { die "Cannot write to output stream\n"; }
3378 $output_file_open = 1;
3380 if ( ref($fh) eq 'IO::File' ) {
3383 if ( $output_file eq '-' ) { binmode STDOUT }
3387 # in order to check output syntax when standard output is used,
3388 # or when it is an object, we have to make a copy of the file
3389 if ( $output_file eq '-' || ref $output_file ) {
3390 if ( $rOpts->{'check-syntax'} ) {
3392 # Turning off syntax check when standard output is used.
3393 # The reason is that temporary files cause problems on
3395 $rOpts->{'check-syntax'} = 0;
3396 $output_file_copy = '-';
3397 $$rpending_logfile_message .= <<EOM;
3398 Note: --syntax check will be skipped because standard output is used
3406 _fh_copy => $fh_copy,
3408 _output_file => $output_file,
3409 _output_file_open => $output_file_open,
3410 _output_file_copy => $output_file_copy,
3412 _tee_file => $tee_file,
3413 _tee_file_opened => 0,
3414 _line_separator => $line_separator,
3415 _binmode => $binmode,
3422 my $fh = $self->{_fh};
3423 my $fh_copy = $self->{_fh_copy};
3425 my $output_file_open = $self->{_output_file_open};
3427 $_[0] .= $self->{_line_separator};
3429 $fh->print( $_[0] ) if ( $self->{_output_file_open} );
3430 print $fh_copy $_[0] if ( $fh_copy && $self->{_output_file_copy} );
3432 if ( $self->{_tee_flag} ) {
3433 unless ( $self->{_tee_file_opened} ) { $self->really_open_tee_file() }
3434 my $fh_tee = $self->{_fh_tee};
3435 print $fh_tee $_[0];
3439 sub get_output_file_copy {
3441 my $ofname = $self->{_output_file_copy};
3443 $ofname = $self->{_output_file};
3450 $self->{_tee_flag} = 1;
3455 $self->{_tee_flag} = 0;
3458 sub really_open_tee_file {
3460 my $tee_file = $self->{_tee_file};
3462 $fh_tee = IO::File->new(">$tee_file")
3463 or die("couldn't open TEE file $tee_file: $!\n");
3464 binmode $fh_tee if $self->{_binmode};
3465 $self->{_tee_file_opened} = 1;
3466 $self->{_fh_tee} = $fh_tee;
3469 sub close_output_file {
3471 eval { $self->{_fh}->close() } if $self->{_output_file_open};
3472 eval { $self->{_fh_copy}->close() } if ( $self->{_output_file_copy} );
3473 $self->close_tee_file();
3476 sub close_tee_file {
3479 if ( $self->{_tee_file_opened} ) {
3480 eval { $self->{_fh_tee}->close() };
3481 $self->{_tee_file_opened} = 0;
3485 #####################################################################
3487 # The Perl::Tidy::Diagnostics class writes the DIAGNOSTICS file, which is
3488 # useful for program development.
3490 # Only one such file is created regardless of the number of input
3491 # files processed. This allows the results of processing many files
3492 # to be summarized in a single file.
3494 #####################################################################
3496 package Perl::Tidy::Diagnostics;
3502 _write_diagnostics_count => 0,
3503 _last_diagnostic_file => "",
3509 sub set_input_file {
3511 $self->{_input_file} = $_[0];
3514 # This is a diagnostic routine which is useful for program development.
3515 # Output from debug messages go to a file named DIAGNOSTICS, where
3516 # they are labeled by file and line. This allows many files to be
3517 # scanned at once for some particular condition of interest.
3518 sub write_diagnostics {
3521 unless ( $self->{_write_diagnostics_count} ) {
3522 open DIAGNOSTICS, ">DIAGNOSTICS"
3523 or death("couldn't open DIAGNOSTICS: $!\n");
3526 my $last_diagnostic_file = $self->{_last_diagnostic_file};
3527 my $input_file = $self->{_input_file};
3528 if ( $last_diagnostic_file ne $input_file ) {
3529 print DIAGNOSTICS "\nFILE:$input_file\n";
3531 $self->{_last_diagnostic_file} = $input_file;
3532 my $input_line_number = Perl::Tidy::Tokenizer::get_input_line_number();
3533 print DIAGNOSTICS "$input_line_number:\t@_";
3534 $self->{_write_diagnostics_count}++;
3537 #####################################################################
3539 # The Perl::Tidy::Logger class writes the .LOG and .ERR files
3541 #####################################################################
3543 package Perl::Tidy::Logger;
3548 my ( $rOpts, $log_file, $warning_file, $saw_extrude ) = @_;
3550 # remove any old error output file
3551 unless ( ref($warning_file) ) {
3552 if ( -e $warning_file ) { unlink($warning_file) }
3556 _log_file => $log_file,
3557 _fh_warnings => undef,
3559 _fh_warnings => undef,
3560 _last_input_line_written => 0,
3561 _at_end_of_file => 0,
3563 _block_log_output => 0,
3564 _line_of_tokens => undef,
3565 _output_line_number => undef,
3566 _wrote_line_information_string => 0,
3567 _wrote_column_headings => 0,
3568 _warning_file => $warning_file,
3569 _warning_count => 0,
3570 _complaint_count => 0,
3571 _saw_code_bug => -1, # -1=no 0=maybe 1=for sure
3572 _saw_brace_error => 0,
3573 _saw_extrude => $saw_extrude,
3574 _output_array => [],
3578 sub close_log_file {
3581 if ( $self->{_fh_warnings} ) {
3582 eval { $self->{_fh_warnings}->close() };
3583 $self->{_fh_warnings} = undef;
3587 sub get_warning_count {
3589 return $self->{_warning_count};
3592 sub get_use_prefix {
3594 return $self->{_use_prefix};
3597 sub block_log_output {
3599 $self->{_block_log_output} = 1;
3602 sub unblock_log_output {
3604 $self->{_block_log_output} = 0;
3607 sub interrupt_logfile {
3609 $self->{_use_prefix} = 0;
3610 $self->warning("\n");
3611 $self->write_logfile_entry( '#' x 24 . " WARNING " . '#' x 25 . "\n" );
3614 sub resume_logfile {
3616 $self->write_logfile_entry( '#' x 60 . "\n" );
3617 $self->{_use_prefix} = 1;
3620 sub we_are_at_the_last_line {
3622 unless ( $self->{_wrote_line_information_string} ) {
3623 $self->write_logfile_entry("Last line\n\n");
3625 $self->{_at_end_of_file} = 1;
3628 # record some stuff in case we go down in flames
3631 my ( $line_of_tokens, $output_line_number ) = @_;
3632 my $input_line = $line_of_tokens->{_line_text};
3633 my $input_line_number = $line_of_tokens->{_line_number};
3635 # save line information in case we have to write a logfile message
3636 $self->{_line_of_tokens} = $line_of_tokens;
3637 $self->{_output_line_number} = $output_line_number;
3638 $self->{_wrote_line_information_string} = 0;
3640 my $last_input_line_written = $self->{_last_input_line_written};
3641 my $rOpts = $self->{_rOpts};
3644 ( $input_line_number - $last_input_line_written ) >=
3645 $rOpts->{'logfile-gap'}
3647 || ( $input_line =~ /^\s*(sub|package)\s+(\w+)/ )
3650 my $rlevels = $line_of_tokens->{_rlevels};
3651 my $structural_indentation_level = $$rlevels[0];
3652 $self->{_last_input_line_written} = $input_line_number;
3653 ( my $out_str = $input_line ) =~ s/^\s*//;
3656 $out_str = ( '.' x $structural_indentation_level ) . $out_str;
3658 if ( length($out_str) > 35 ) {
3659 $out_str = substr( $out_str, 0, 35 ) . " ....";
3661 $self->logfile_output( "", "$out_str\n" );
3665 sub write_logfile_entry {
3668 # add leading >>> to avoid confusing error mesages and code
3669 $self->logfile_output( ">>>", "@_" );
3672 sub write_column_headings {
3675 $self->{_wrote_column_headings} = 1;
3676 my $routput_array = $self->{_output_array};
3677 push @{$routput_array}, <<EOM;
3678 The nesting depths in the table below are at the start of the lines.
3679 The indicated output line numbers are not always exact.
3680 ci = levels of continuation indentation; bk = 1 if in BLOCK, 0 if not.
3682 in:out indent c b nesting code + messages; (messages begin with >>>)
3683 lines levels i k (code begins with one '.' per indent level)
3684 ------ ----- - - -------- -------------------------------------------
3688 sub make_line_information_string {
3690 # make columns of information when a logfile message needs to go out
3692 my $line_of_tokens = $self->{_line_of_tokens};
3693 my $input_line_number = $line_of_tokens->{_line_number};
3694 my $line_information_string = "";
3695 if ($input_line_number) {
3697 my $output_line_number = $self->{_output_line_number};
3698 my $brace_depth = $line_of_tokens->{_curly_brace_depth};
3699 my $paren_depth = $line_of_tokens->{_paren_depth};
3700 my $square_bracket_depth = $line_of_tokens->{_square_bracket_depth};
3701 my $python_indentation_level =
3702 $line_of_tokens->{_python_indentation_level};
3703 my $rlevels = $line_of_tokens->{_rlevels};
3704 my $rnesting_tokens = $line_of_tokens->{_rnesting_tokens};
3705 my $rci_levels = $line_of_tokens->{_rci_levels};
3706 my $rnesting_blocks = $line_of_tokens->{_rnesting_blocks};
3708 my $structural_indentation_level = $$rlevels[0];
3710 $self->write_column_headings() unless $self->{_wrote_column_headings};
3712 # keep logfile columns aligned for scripts up to 999 lines;
3713 # for longer scripts it doesn't really matter
3714 my $extra_space = "";
3716 ( $input_line_number < 10 ) ? " "
3717 : ( $input_line_number < 100 ) ? " "
3720 ( $output_line_number < 10 ) ? " "
3721 : ( $output_line_number < 100 ) ? " "
3724 # there are 2 possible nesting strings:
3725 # the original which looks like this: (0 [1 {2
3726 # the new one, which looks like this: {{[
3727 # the new one is easier to read, and shows the order, but
3728 # could be arbitrarily long, so we use it unless it is too long
3729 my $nesting_string =
3730 "($paren_depth [$square_bracket_depth {$brace_depth";
3731 my $nesting_string_new = $$rnesting_tokens[0];
3733 my $ci_level = $$rci_levels[0];
3734 if ( $ci_level > 9 ) { $ci_level = '*' }
3735 my $bk = ( $$rnesting_blocks[0] =~ /1$/ ) ? '1' : '0';
3737 if ( length($nesting_string_new) <= 8 ) {
3739 $nesting_string_new . " " x ( 8 - length($nesting_string_new) );
3741 if ( $python_indentation_level < 0 ) { $python_indentation_level = 0 }
3742 $line_information_string =
3743 "L$input_line_number:$output_line_number$extra_space i$python_indentation_level:$structural_indentation_level $ci_level $bk $nesting_string";
3745 return $line_information_string;
3748 sub logfile_output {
3750 my ( $prompt, $msg ) = @_;
3751 return if ( $self->{_block_log_output} );
3753 my $routput_array = $self->{_output_array};
3754 if ( $self->{_at_end_of_file} || !$self->{_use_prefix} ) {
3755 push @{$routput_array}, "$msg";
3758 my $line_information_string = $self->make_line_information_string();
3759 $self->{_wrote_line_information_string} = 1;
3761 if ($line_information_string) {
3762 push @{$routput_array}, "$line_information_string $prompt$msg";
3765 push @{$routput_array}, "$msg";
3770 sub get_saw_brace_error {
3772 return $self->{_saw_brace_error};
3775 sub increment_brace_error {
3777 $self->{_saw_brace_error}++;
3782 use constant BRACE_WARNING_LIMIT => 10;
3783 my $saw_brace_error = $self->{_saw_brace_error};
3785 if ( $saw_brace_error < BRACE_WARNING_LIMIT ) {
3789 $self->{_saw_brace_error} = $saw_brace_error;
3791 if ( $saw_brace_error == BRACE_WARNING_LIMIT ) {
3792 $self->warning("No further warnings of this type will be given\n");
3798 # handle non-critical warning messages based on input flag
3800 my $rOpts = $self->{_rOpts};
3802 # these appear in .ERR output only if -w flag is used
3803 if ( $rOpts->{'warning-output'} ) {
3807 # otherwise, they go to the .LOG file
3809 $self->{_complaint_count}++;
3810 $self->write_logfile_entry(@_);
3816 # report errors to .ERR file (or stdout)
3818 use constant WARNING_LIMIT => 50;
3820 my $rOpts = $self->{_rOpts};
3821 unless ( $rOpts->{'quiet'} ) {
3823 my $warning_count = $self->{_warning_count};
3824 unless ($warning_count) {
3825 my $warning_file = $self->{_warning_file};
3827 if ( $rOpts->{'standard-error-output'} ) {
3828 $fh_warnings = *STDERR;
3831 ( $fh_warnings, my $filename ) =
3832 Perl::Tidy::streamhandle( $warning_file, 'w' );
3833 $fh_warnings or die("couldn't open $filename $!\n");
3834 warn "## Please see file $filename\n";
3836 $self->{_fh_warnings} = $fh_warnings;
3839 my $fh_warnings = $self->{_fh_warnings};
3840 if ( $warning_count < WARNING_LIMIT ) {
3841 if ( $self->get_use_prefix() > 0 ) {
3842 my $input_line_number =
3843 Perl::Tidy::Tokenizer::get_input_line_number();
3844 $fh_warnings->print("$input_line_number:\t@_");
3845 $self->write_logfile_entry("WARNING: @_");
3848 $fh_warnings->print(@_);
3849 $self->write_logfile_entry(@_);
3853 $self->{_warning_count} = $warning_count;
3855 if ( $warning_count == WARNING_LIMIT ) {
3856 $fh_warnings->print("No further warnings will be given\n");
3861 # programming bug codes:
3863 # 0 = maybe, not sure.
3865 sub report_possible_bug {
3867 my $saw_code_bug = $self->{_saw_code_bug};
3868 $self->{_saw_code_bug} = ( $saw_code_bug < 0 ) ? 0 : $saw_code_bug;
3871 sub report_definite_bug {
3873 $self->{_saw_code_bug} = 1;
3876 sub ask_user_for_bug_report {
3879 my ( $infile_syntax_ok, $formatter ) = @_;
3880 my $saw_code_bug = $self->{_saw_code_bug};
3881 if ( ( $saw_code_bug == 0 ) && ( $infile_syntax_ok == 1 ) ) {
3882 $self->warning(<<EOM);
3884 You may have encountered a code bug in perltidy. If you think so, and
3885 the problem is not listed in the BUGS file at
3886 http://perltidy.sourceforge.net, please report it so that it can be
3887 corrected. Include the smallest possible script which has the problem,
3888 along with the .LOG file. See the manual pages for contact information.
3893 elsif ( $saw_code_bug == 1 ) {
3894 if ( $self->{_saw_extrude} ) {
3895 $self->warning(<<EOM);
3897 You may have encountered a bug in perltidy. However, since you are using the
3898 -extrude option, the problem may be with perl or one of its modules, which have
3899 occasional problems with this type of file. If you believe that the
3900 problem is with perltidy, and the problem is not listed in the BUGS file at
3901 http://perltidy.sourceforge.net, please report it so that it can be corrected.
3902 Include the smallest possible script which has the problem, along with the .LOG
3903 file. See the manual pages for contact information.
3908 $self->warning(<<EOM);
3910 Oops, you seem to have encountered a bug in perltidy. Please check the
3911 BUGS file at http://perltidy.sourceforge.net. If the problem is not
3912 listed there, please report it so that it can be corrected. Include the
3913 smallest possible script which produces this message, along with the
3914 .LOG file if appropriate. See the manual pages for contact information.
3915 Your efforts are appreciated.
3918 my $added_semicolon_count = 0;
3920 $added_semicolon_count =
3921 $formatter->get_added_semicolon_count();
3923 if ( $added_semicolon_count > 0 ) {
3924 $self->warning(<<EOM);
3926 The log file shows that perltidy added $added_semicolon_count semicolons.
3927 Please rerun with -nasc to see if that is the cause of the syntax error. Even
3928 if that is the problem, please report it so that it can be fixed.
3938 # called after all formatting to summarize errors
3940 my ( $infile_syntax_ok, $formatter ) = @_;
3942 my $rOpts = $self->{_rOpts};
3943 my $warning_count = $self->{_warning_count};
3944 my $saw_code_bug = $self->{_saw_code_bug};
3946 my $save_logfile = ( $saw_code_bug == 0 && $infile_syntax_ok == 1 )
3947 || $saw_code_bug == 1
3948 || $rOpts->{'logfile'};
3949 my $log_file = $self->{_log_file};
3950 if ($warning_count) {
3951 if ($save_logfile) {
3952 $self->block_log_output(); # avoid echoing this to the logfile
3954 "The logfile $log_file may contain useful information\n");
3955 $self->unblock_log_output();
3958 if ( $self->{_complaint_count} > 0 ) {
3960 "To see $self->{_complaint_count} non-critical warnings rerun with -w\n"
3964 if ( $self->{_saw_brace_error}
3965 && ( $rOpts->{'logfile-gap'} > 1 || !$save_logfile ) )
3967 $self->warning("To save a full .LOG file rerun with -g\n");
3970 $self->ask_user_for_bug_report( $infile_syntax_ok, $formatter );
3972 if ($save_logfile) {
3973 my $log_file = $self->{_log_file};
3974 my ( $fh, $filename ) = Perl::Tidy::streamhandle( $log_file, 'w' );
3976 my $routput_array = $self->{_output_array};
3977 foreach ( @{$routput_array} ) { $fh->print($_) }
3978 eval { $fh->close() };
3983 #####################################################################
3985 # The Perl::Tidy::DevNull class supplies a dummy print method
3987 #####################################################################
3989 package Perl::Tidy::DevNull;
3990 sub new { return bless {}, $_[0] }
3991 sub print { return }
3992 sub close { return }
3994 #####################################################################
3996 # The Perl::Tidy::HtmlWriter class writes a copy of the input stream in html
3998 #####################################################################
4000 package Perl::Tidy::HtmlWriter;
4010 %short_to_long_names
4014 $missing_html_entities
4017 # replace unsafe characters with HTML entity representation if HTML::Entities
4019 { eval "use HTML::Entities"; $missing_html_entities = $@; }
4023 my ( $class, $input_file, $html_file, $extension, $html_toc_extension,
4024 $html_src_extension )
4027 my $html_file_opened = 0;
4029 ( $html_fh, my $html_filename ) =
4030 Perl::Tidy::streamhandle( $html_file, 'w' );
4032 warn("can't open $html_file: $!\n");
4035 $html_file_opened = 1;
4037 if ( !$input_file || $input_file eq '-' || ref($input_file) ) {
4038 $input_file = "NONAME";
4041 # write the table of contents to a string
4043 my $html_toc_fh = Perl::Tidy::IOScalar->new( \$toc_string, 'w' );
4046 my @pre_string_stack;
4047 if ( $rOpts->{'html-pre-only'} ) {
4049 # pre section goes directly to the output stream
4050 $html_pre_fh = $html_fh;
4051 $html_pre_fh->print( <<"PRE_END");
4057 # pre section go out to a temporary string
4059 $html_pre_fh = Perl::Tidy::IOScalar->new( \$pre_string, 'w' );
4060 push @pre_string_stack, \$pre_string;
4063 # pod text gets diverted if the 'pod2html' is used
4066 if ( $rOpts->{'pod2html'} ) {
4067 if ( $rOpts->{'html-pre-only'} ) {
4068 undef $rOpts->{'pod2html'};
4071 eval "use Pod::Html";
4074 "unable to find Pod::Html; cannot use pod2html\n-npod disables this message\n";
4075 undef $rOpts->{'pod2html'};
4078 $html_pod_fh = Perl::Tidy::IOScalar->new( \$pod_string, 'w' );
4085 if ( $rOpts->{'frames'} ) {
4086 unless ($extension) {
4088 "cannot use frames without a specified output extension; ignoring -frm\n";
4089 undef $rOpts->{'frames'};
4092 $toc_filename = $input_file . $html_toc_extension . $extension;
4093 $src_filename = $input_file . $html_src_extension . $extension;
4097 # ----------------------------------------------------------
4098 # Output is now directed as follows:
4099 # html_toc_fh <-- table of contents items
4100 # html_pre_fh <-- the <pre> section of formatted code, except:
4101 # html_pod_fh <-- pod goes here with the pod2html option
4102 # ----------------------------------------------------------
4104 my $title = $rOpts->{'title'};
4106 ( $title, my $path ) = fileparse($input_file);
4108 my $toc_item_count = 0;
4109 my $in_toc_package = "";
4112 _input_file => $input_file, # name of input file
4113 _title => $title, # title, unescaped
4114 _html_file => $html_file, # name of .html output file
4115 _toc_filename => $toc_filename, # for frames option
4116 _src_filename => $src_filename, # for frames option
4117 _html_file_opened => $html_file_opened, # a flag
4118 _html_fh => $html_fh, # the output stream
4119 _html_pre_fh => $html_pre_fh, # pre section goes here
4120 _rpre_string_stack => \@pre_string_stack, # stack of pre sections
4121 _html_pod_fh => $html_pod_fh, # pod goes here if pod2html
4122 _rpod_string => \$pod_string, # string holding pod
4123 _pod_cut_count => 0, # how many =cut's?
4124 _html_toc_fh => $html_toc_fh, # fh for table of contents
4125 _rtoc_string => \$toc_string, # string holding toc
4126 _rtoc_item_count => \$toc_item_count, # how many toc items
4127 _rin_toc_package => \$in_toc_package, # package name
4128 _rtoc_name_count => {}, # hash to track unique names
4129 _rpackage_stack => [], # stack to check for package
4131 _rlast_level => \$last_level, # brace indentation level
4137 # Add an item to the html table of contents.
4138 # This is called even if no table of contents is written,
4139 # because we still want to put the anchors in the <pre> text.
4140 # We are given an anchor name and its type; types are:
4141 # 'package', 'sub', '__END__', '__DATA__', 'EOF'
4142 # There must be an 'EOF' call at the end to wrap things up.
4144 my ( $name, $type ) = @_;
4145 my $html_toc_fh = $self->{_html_toc_fh};
4146 my $html_pre_fh = $self->{_html_pre_fh};
4147 my $rtoc_name_count = $self->{_rtoc_name_count};
4148 my $rtoc_item_count = $self->{_rtoc_item_count};
4149 my $rlast_level = $self->{_rlast_level};
4150 my $rin_toc_package = $self->{_rin_toc_package};
4151 my $rpackage_stack = $self->{_rpackage_stack};
4153 # packages contain sublists of subs, so to avoid errors all package
4154 # items are written and finished with the following routines
4155 my $end_package_list = sub {
4156 if ($$rin_toc_package) {
4157 $html_toc_fh->print("</ul>\n</li>\n");
4158 $$rin_toc_package = "";
4162 my $start_package_list = sub {
4163 my ( $unique_name, $package ) = @_;
4164 if ($$rin_toc_package) { $end_package_list->() }
4165 $html_toc_fh->print(<<EOM);
4166 <li><a href=\"#$unique_name\">package $package</a>
4169 $$rin_toc_package = $package;
4172 # start the table of contents on the first item
4173 unless ($$rtoc_item_count) {
4175 # but just quit if we hit EOF without any other entries
4176 # in this case, there will be no toc
4177 return if ( $type eq 'EOF' );
4178 $html_toc_fh->print( <<"TOC_END");
4179 <!-- BEGIN CODE INDEX --><a name="code-index"></a>
4183 $$rtoc_item_count++;
4185 # make a unique anchor name for this location:
4186 # - packages get a 'package-' prefix
4187 # - subs use their names
4188 my $unique_name = $name;
4189 if ( $type eq 'package' ) { $unique_name = "package-$name" }
4191 # append '-1', '-2', etc if necessary to make unique; this will
4192 # be unique because subs and packages cannot have a '-'
4193 if ( my $count = $rtoc_name_count->{ lc $unique_name }++ ) {
4194 $unique_name .= "-$count";
4197 # - all names get terminal '-' if pod2html is used, to avoid
4198 # conflicts with anchor names created by pod2html
4199 if ( $rOpts->{'pod2html'} ) { $unique_name .= '-' }
4201 # start/stop lists of subs
4202 if ( $type eq 'sub' ) {
4203 my $package = $rpackage_stack->[$$rlast_level];
4204 unless ($package) { $package = 'main' }
4206 # if we're already in a package/sub list, be sure its the right
4207 # package or else close it
4208 if ( $$rin_toc_package && $$rin_toc_package ne $package ) {
4209 $end_package_list->();
4212 # start a package/sub list if necessary
4213 unless ($$rin_toc_package) {
4214 $start_package_list->( $unique_name, $package );
4218 # now write an entry in the toc for this item
4219 if ( $type eq 'package' ) {
4220 $start_package_list->( $unique_name, $name );
4222 elsif ( $type eq 'sub' ) {
4223 $html_toc_fh->print("<li><a href=\"#$unique_name\">$name</a></li>\n");
4226 $end_package_list->();
4227 $html_toc_fh->print("<li><a href=\"#$unique_name\">$name</a></li>\n");
4230 # write the anchor in the <pre> section
4231 $html_pre_fh->print("<a name=\"$unique_name\"></a>");
4233 # end the table of contents, if any, on the end of file
4234 if ( $type eq 'EOF' ) {
4235 $html_toc_fh->print( <<"TOC_END");
4237 <!-- END CODE INDEX -->
4244 # This is the official list of tokens which may be identified by the
4245 # user. Long names are used as getopt keys. Short names are
4246 # convenient short abbreviations for specifying input. Short names
4247 # somewhat resemble token type characters, but are often different
4248 # because they may only be alphanumeric, to allow command line
4249 # input. Also, note that because of case insensitivity of html,
4250 # this table must be in a single case only (I've chosen to use all
4252 # When adding NEW_TOKENS: update this hash table
4253 # short names => long names
4254 %short_to_long_names = (
4264 'pu' => 'punctuation',
4265 'i' => 'identifier',
4267 'h' => 'here-doc-target',
4268 'hh' => 'here-doc-text',
4270 'sc' => 'semicolon',
4271 'm' => 'subroutine',
4275 # Now we have to map actual token types into one of the above short
4276 # names; any token types not mapped will get 'punctuation'
4279 # The values of this hash table correspond to the keys of the
4280 # previous hash table.
4281 # The keys of this hash table are token types and can be seen
4282 # by running with --dump-token-types (-dtt).
4284 # When adding NEW_TOKENS: update this hash table
4285 # $type => $short_name
4286 %token_short_names = (
4311 # These token types will all be called identifiers for now
4312 # FIXME: need to separate user defined modules as separate type
4313 my @identifier = qw" i t U C Y Z G :: ";
4314 @token_short_names{@identifier} = ('i') x scalar(@identifier);
4316 # These token types will be called 'structure'
4317 my @structure = qw" { } ";
4318 @token_short_names{@structure} = ('s') x scalar(@structure);
4320 # OLD NOTES: save for reference
4321 # Any of these could be added later if it would be useful.
4322 # For now, they will by default become punctuation
4323 # my @list = qw" L R [ ] ";
4324 # @token_long_names{@list} = ('non-structure') x scalar(@list);
4327 # / /= * *= ** **= + += - -= % %= = ++ -- << <<= >> >>= pp p m mm
4329 # @token_long_names{@list} = ('math') x scalar(@list);
4331 # my @list = qw" & &= ~ ~= ^ ^= | |= ";
4332 # @token_long_names{@list} = ('bit') x scalar(@list);
4334 # my @list = qw" == != < > <= <=> ";
4335 # @token_long_names{@list} = ('numerical-comparison') x scalar(@list);
4337 # my @list = qw" && || ! &&= ||= //= ";
4338 # @token_long_names{@list} = ('logical') x scalar(@list);
4340 # my @list = qw" . .= =~ !~ x x= ";
4341 # @token_long_names{@list} = ('string-operators') x scalar(@list);
4344 # my @list = qw" .. -> <> ... \ ? ";
4345 # @token_long_names{@list} = ('misc-operators') x scalar(@list);
4349 sub make_getopt_long_names {
4351 my ($rgetopt_names) = @_;
4352 while ( my ( $short_name, $name ) = each %short_to_long_names ) {
4353 push @$rgetopt_names, "html-color-$name=s";
4354 push @$rgetopt_names, "html-italic-$name!";
4355 push @$rgetopt_names, "html-bold-$name!";
4357 push @$rgetopt_names, "html-color-background=s";
4358 push @$rgetopt_names, "html-linked-style-sheet=s";
4359 push @$rgetopt_names, "nohtml-style-sheets";
4360 push @$rgetopt_names, "html-pre-only";
4361 push @$rgetopt_names, "html-line-numbers";
4362 push @$rgetopt_names, "html-entities!";
4363 push @$rgetopt_names, "stylesheet";
4364 push @$rgetopt_names, "html-table-of-contents!";
4365 push @$rgetopt_names, "pod2html!";
4366 push @$rgetopt_names, "frames!";
4367 push @$rgetopt_names, "html-toc-extension=s";
4368 push @$rgetopt_names, "html-src-extension=s";
4370 # Pod::Html parameters:
4371 push @$rgetopt_names, "backlink=s";
4372 push @$rgetopt_names, "cachedir=s";
4373 push @$rgetopt_names, "htmlroot=s";
4374 push @$rgetopt_names, "libpods=s";
4375 push @$rgetopt_names, "podpath=s";
4376 push @$rgetopt_names, "podroot=s";
4377 push @$rgetopt_names, "title=s";
4379 # Pod::Html parameters with leading 'pod' which will be removed
4380 # before the call to Pod::Html
4381 push @$rgetopt_names, "podquiet!";
4382 push @$rgetopt_names, "podverbose!";
4383 push @$rgetopt_names, "podrecurse!";
4384 push @$rgetopt_names, "podflush";
4385 push @$rgetopt_names, "podheader!";
4386 push @$rgetopt_names, "podindex!";
4389 sub make_abbreviated_names {
4391 # We're appending things like this to the expansion list:
4392 # 'hcc' => [qw(html-color-comment)],
4393 # 'hck' => [qw(html-color-keyword)],
4396 my ($rexpansion) = @_;
4398 # abbreviations for color/bold/italic properties
4399 while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
4400 ${$rexpansion}{"hc$short_name"} = ["html-color-$long_name"];
4401 ${$rexpansion}{"hb$short_name"} = ["html-bold-$long_name"];
4402 ${$rexpansion}{"hi$short_name"} = ["html-italic-$long_name"];
4403 ${$rexpansion}{"nhb$short_name"} = ["nohtml-bold-$long_name"];
4404 ${$rexpansion}{"nhi$short_name"} = ["nohtml-italic-$long_name"];
4407 # abbreviations for all other html options
4408 ${$rexpansion}{"hcbg"} = ["html-color-background"];
4409 ${$rexpansion}{"pre"} = ["html-pre-only"];
4410 ${$rexpansion}{"toc"} = ["html-table-of-contents"];
4411 ${$rexpansion}{"ntoc"} = ["nohtml-table-of-contents"];
4412 ${$rexpansion}{"nnn"} = ["html-line-numbers"];
4413 ${$rexpansion}{"hent"} = ["html-entities"];
4414 ${$rexpansion}{"nhent"} = ["nohtml-entities"];
4415 ${$rexpansion}{"css"} = ["html-linked-style-sheet"];
4416 ${$rexpansion}{"nss"} = ["nohtml-style-sheets"];
4417 ${$rexpansion}{"ss"} = ["stylesheet"];
4418 ${$rexpansion}{"pod"} = ["pod2html"];
4419 ${$rexpansion}{"npod"} = ["nopod2html"];
4420 ${$rexpansion}{"frm"} = ["frames"];
4421 ${$rexpansion}{"nfrm"} = ["noframes"];
4422 ${$rexpansion}{"text"} = ["html-toc-extension"];
4423 ${$rexpansion}{"sext"} = ["html-src-extension"];
4428 # This will be called once after options have been parsed
4432 # X11 color names for default settings that seemed to look ok
4433 # (these color names are only used for programming clarity; the hex
4434 # numbers are actually written)
4435 use constant ForestGreen => "#228B22";
4436 use constant SaddleBrown => "#8B4513";
4437 use constant magenta4 => "#8B008B";
4438 use constant IndianRed3 => "#CD5555";
4439 use constant DeepSkyBlue4 => "#00688B";
4440 use constant MediumOrchid3 => "#B452CD";
4441 use constant black => "#000000";
4442 use constant white => "#FFFFFF";
4443 use constant red => "#FF0000";
4445 # set default color, bold, italic properties
4446 # anything not listed here will be given the default (punctuation) color --
4447 # these types currently not listed and get default: ws pu s sc cm co p
4448 # When adding NEW_TOKENS: add an entry here if you don't want defaults
4450 # set_default_properties( $short_name, default_color, bold?, italic? );
4451 set_default_properties( 'c', ForestGreen, 0, 0 );
4452 set_default_properties( 'pd', ForestGreen, 0, 1 );
4453 set_default_properties( 'k', magenta4, 1, 0 ); # was SaddleBrown
4454 set_default_properties( 'q', IndianRed3, 0, 0 );
4455 set_default_properties( 'hh', IndianRed3, 0, 1 );
4456 set_default_properties( 'h', IndianRed3, 1, 0 );
4457 set_default_properties( 'i', DeepSkyBlue4, 0, 0 );
4458 set_default_properties( 'w', black, 0, 0 );
4459 set_default_properties( 'n', MediumOrchid3, 0, 0 );
4460 set_default_properties( 'v', MediumOrchid3, 0, 0 );
4461 set_default_properties( 'j', IndianRed3, 1, 0 );
4462 set_default_properties( 'm', red, 1, 0 );
4464 set_default_color( 'html-color-background', white );
4465 set_default_color( 'html-color-punctuation', black );
4467 # setup property lookup tables for tokens based on their short names
4468 # every token type has a short name, and will use these tables
4469 # to do the html markup
4470 while ( my ( $short_name, $long_name ) = each %short_to_long_names ) {
4471 $html_color{$short_name} = $rOpts->{"html-color-$long_name"};
4472 $html_bold{$short_name} = $rOpts->{"html-bold-$long_name"};
4473 $html_italic{$short_name} = $rOpts->{"html-italic-$long_name"};
4476 # write style sheet to STDOUT and die if requested
4477 if ( defined( $rOpts->{'stylesheet'} ) ) {
4478 write_style_sheet_file('-');
4482 # make sure user gives a file name after -css
4483 if ( defined( $rOpts->{'html-linked-style-sheet'} ) ) {
4484 $css_linkname = $rOpts->{'html-linked-style-sheet'};
4485 if ( $css_linkname =~ /^-/ ) {
4486 die "You must specify a valid filename after -css\n";
4490 # check for conflict
4491 if ( $css_linkname && $rOpts->{'nohtml-style-sheets'} ) {
4492 $rOpts->{'nohtml-style-sheets'} = 0;
4493 warning("You can't specify both -css and -nss; -nss ignored\n");
4496 # write a style sheet file if necessary
4497 if ($css_linkname) {
4499 # if the selected filename exists, don't write, because user may
4500 # have done some work by hand to create it; use backup name instead
4501 # Also, this will avoid a potential disaster in which the user
4502 # forgets to specify the style sheet, like this:
4503 # perltidy -html -css myfile1.pl myfile2.pl
4504 # This would cause myfile1.pl to parsed as the style sheet by GetOpts
4505 my $css_filename = $css_linkname;
4506 unless ( -e $css_filename ) {
4507 write_style_sheet_file($css_filename);
4510 $missing_html_entities = 1 unless $rOpts->{'html-entities'};
4513 sub write_style_sheet_file {
4515 my $css_filename = shift;
4517 unless ( $fh = IO::File->new("> $css_filename") ) {
4518 die "can't open $css_filename: $!\n";
4520 write_style_sheet_data($fh);
4521 eval { $fh->close };
4524 sub write_style_sheet_data {
4526 # write the style sheet data to an open file handle
4529 my $bg_color = $rOpts->{'html-color-background'};
4530 my $text_color = $rOpts->{'html-color-punctuation'};
4532 # pre-bgcolor is new, and may not be defined
4533 my $pre_bg_color = $rOpts->{'html-pre-color-background'};
4534 $pre_bg_color = $bg_color unless $pre_bg_color;
4536 $fh->print(<<"EOM");
4537 /* default style sheet generated by perltidy */
4538 body {background: $bg_color; color: $text_color}
4539 pre { color: $text_color;
4540 background: $pre_bg_color;
4541 font-family: courier;
4546 foreach my $short_name ( sort keys %short_to_long_names ) {
4547 my $long_name = $short_to_long_names{$short_name};
4549 my $abbrev = '.' . $short_name;
4550 if ( length($short_name) == 1 ) { $abbrev .= ' ' } # for alignment
4551 my $color = $html_color{$short_name};
4552 if ( !defined($color) ) { $color = $text_color }
4553 $fh->print("$abbrev \{ color: $color;");
4555 if ( $html_bold{$short_name} ) {
4556 $fh->print(" font-weight:bold;");
4559 if ( $html_italic{$short_name} ) {
4560 $fh->print(" font-style:italic;");
4562 $fh->print("} /* $long_name */\n");
4566 sub set_default_color {
4568 # make sure that options hash $rOpts->{$key} contains a valid color
4569 my ( $key, $color ) = @_;
4570 if ( $rOpts->{$key} ) { $color = $rOpts->{$key} }
4571 $rOpts->{$key} = check_RGB($color);
4576 # if color is a 6 digit hex RGB value, prepend a #, otherwise
4577 # assume that it is a valid ascii color name
4579 if ( $color =~ /^[0-9a-fA-F]{6,6}$/ ) { $color = "#$color" }
4583 sub set_default_properties {
4584 my ( $short_name, $color, $bold, $italic ) = @_;
4586 set_default_color( "html-color-$short_to_long_names{$short_name}", $color );
4588 $key = "html-bold-$short_to_long_names{$short_name}";
4589 $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $bold;
4590 $key = "html-italic-$short_to_long_names{$short_name}";
4591 $rOpts->{$key} = ( defined $rOpts->{$key} ) ? $rOpts->{$key} : $italic;
4596 # Use Pod::Html to process the pod and make the page
4597 # then merge the perltidy code sections into it.
4598 # return 1 if success, 0 otherwise
4600 my ( $pod_string, $css_string, $toc_string, $rpre_string_stack ) = @_;
4601 my $input_file = $self->{_input_file};
4602 my $title = $self->{_title};
4603 my $success_flag = 0;
4605 # don't try to use pod2html if no pod
4606 unless ($pod_string) {
4607 return $success_flag;
4610 # Pod::Html requires a real temporary filename
4611 # If we are making a frame, we have a name available
4612 # Otherwise, we have to fine one
4614 if ( $rOpts->{'frames'} ) {
4615 $tmpfile = $self->{_toc_filename};
4618 $tmpfile = Perl::Tidy::make_temporary_filename();
4620 my $fh_tmp = IO::File->new( $tmpfile, 'w' );
4622 warn "unable to open temporary file $tmpfile; cannot use pod2html\n";
4623 return $success_flag;
4626 #------------------------------------------------------------------
4627 # Warning: a temporary file is open; we have to clean up if
4628 # things go bad. From here on all returns should be by going to
4629 # RETURN so that the temporary file gets unlinked.
4630 #------------------------------------------------------------------
4632 # write the pod text to the temporary file
4633 $fh_tmp->print($pod_string);
4636 # Hand off the pod to pod2html.
4637 # Note that we can use the same temporary filename for input and output
4638 # because of the way pod2html works.
4642 push @args, "--infile=$tmpfile", "--outfile=$tmpfile", "--title=$title";
4645 # Flags with string args:
4646 # "backlink=s", "cachedir=s", "htmlroot=s", "libpods=s",
4647 # "podpath=s", "podroot=s"
4648 # Note: -css=s is handled by perltidy itself
4649 foreach $kw (qw(backlink cachedir htmlroot libpods podpath podroot)) {
4650 if ( $rOpts->{$kw} ) { push @args, "--$kw=$rOpts->{$kw}" }
4653 # Toggle switches; these have extra leading 'pod'
4654 # "header!", "index!", "recurse!", "quiet!", "verbose!"
4655 foreach $kw (qw(podheader podindex podrecurse podquiet podverbose)) {
4656 my $kwd = $kw; # allows us to strip 'pod'
4657 if ( $rOpts->{$kw} ) { $kwd =~ s/^pod//; push @args, "--$kwd" }
4658 elsif ( defined( $rOpts->{$kw} ) ) {
4660 push @args, "--no$kwd";
4666 if ( $rOpts->{$kw} ) { $kw =~ s/^pod//; push @args, "--$kw" }
4668 # Must clean up if pod2html dies (it can);
4669 # Be careful not to overwrite callers __DIE__ routine
4670 local $SIG{__DIE__} = sub {
4672 unlink $tmpfile if -e $tmpfile;
4678 $fh_tmp = IO::File->new( $tmpfile, 'r' );
4681 # this error shouldn't happen ... we just used this filename
4682 warn "unable to open temporary file $tmpfile; cannot use pod2html\n";
4686 my $html_fh = $self->{_html_fh};
4691 # This routine will write the html selectively and store the toc
4692 my $html_print = sub {
4694 $html_fh->print($_) unless ($no_print);
4695 if ($in_toc) { push @toc, $_ }
4699 # loop over lines of html output from pod2html and merge in
4700 # the necessary perltidy html sections
4701 my ( $saw_body, $saw_index, $saw_body_end );
4702 while ( my $line = $fh_tmp->getline() ) {
4704 if ( $line =~ /^\s*<html>\s*$/i ) {
4705 my $date = localtime;
4706 $html_print->("<!-- Generated by perltidy on $date -->\n");
4707 $html_print->($line);
4710 # Copy the perltidy css, if any, after <body> tag
4711 elsif ( $line =~ /^\s*<body.*>\s*$/i ) {
4713 $html_print->($css_string) if $css_string;
4714 $html_print->($line);
4716 # add a top anchor and heading
4717 $html_print->("<a name=\"-top-\"></a>\n");
4718 $title = escape_html($title);
4719 $html_print->("<h1>$title</h1>\n");
4721 elsif ( $line =~ /^\s*<!-- INDEX BEGIN -->\s*$/i ) {
4724 # when frames are used, an extra table of contents in the
4725 # contents panel is confusing, so don't print it
4726 $no_print = $rOpts->{'frames'}
4727 || !$rOpts->{'html-table-of-contents'};
4728 $html_print->("<h2>Doc Index:</h2>\n") if $rOpts->{'frames'};
4729 $html_print->($line);
4732 # Copy the perltidy toc, if any, after the Pod::Html toc
4733 elsif ( $line =~ /^\s*<!-- INDEX END -->\s*$/i ) {
4735 $html_print->($line);
4737 $html_print->("<hr />\n") if $rOpts->{'frames'};
4738 $html_print->("<h2>Code Index:</h2>\n");
4739 my @toc = map { $_ .= "\n" } split /\n/, $toc_string;
4740 $html_print->(@toc);
4746 # Copy one perltidy section after each marker
4747 elsif ( $line =~ /^(.*)<!-- pERLTIDY sECTION -->(.*)$/ ) {
4749 $html_print->($1) if $1;
4751 # Intermingle code and pod sections if we saw multiple =cut's.
4752 if ( $self->{_pod_cut_count} > 1 ) {
4753 my $rpre_string = shift(@$rpre_string_stack);
4754 if ($$rpre_string) {
4755 $html_print->('<pre>');
4756 $html_print->($$rpre_string);
4757 $html_print->('</pre>');
4761 # shouldn't happen: we stored a string before writing
4764 "Problem merging html stream with pod2html; order may be wrong\n";
4766 $html_print->($line);
4769 # If didn't see multiple =cut lines, we'll put the pod out first
4770 # and then the code, because it's less confusing.
4773 # since we are not intermixing code and pod, we don't need
4774 # or want any <hr> lines which separated pod and code
4775 $html_print->($line) unless ( $line =~ /^\s*<hr>\s*$/i );
4779 # Copy any remaining code section before the </body> tag
4780 elsif ( $line =~ /^\s*<\/body>\s*$/i ) {
4782 if (@$rpre_string_stack) {
4783 unless ( $self->{_pod_cut_count} > 1 ) {
4784 $html_print->('<hr />');
4786 while ( my $rpre_string = shift(@$rpre_string_stack) ) {
4787 $html_print->('<pre>');
4788 $html_print->($$rpre_string);
4789 $html_print->('</pre>');
4792 $html_print->($line);
4795 $html_print->($line);
4800 unless ($saw_body) {
4801 warn "Did not see <body> in pod2html output\n";
4804 unless ($saw_body_end) {
4805 warn "Did not see </body> in pod2html output\n";
4808 unless ($saw_index) {
4809 warn "Did not find INDEX END in pod2html output\n";
4814 eval { $html_fh->close() };
4816 # note that we have to unlink tmpfile before making frames
4817 # because the tmpfile may be one of the names used for frames
4818 unlink $tmpfile if -e $tmpfile;
4819 if ( $success_flag && $rOpts->{'frames'} ) {
4820 $self->make_frame( \@toc );
4822 return $success_flag;
4827 # Make a frame with table of contents in the left panel
4828 # and the text in the right panel.
4830 # $html_filename contains the no-frames html output
4831 # $rtoc is a reference to an array with the table of contents
4834 my $input_file = $self->{_input_file};
4835 my $html_filename = $self->{_html_file};
4836 my $toc_filename = $self->{_toc_filename};
4837 my $src_filename = $self->{_src_filename};
4838 my $title = $self->{_title};
4839 $title = escape_html($title);
4841 # FUTURE input parameter:
4842 my $top_basename = "";
4844 # We need to produce 3 html files:
4845 # 1. - the table of contents
4846 # 2. - the contents (source code) itself
4847 # 3. - the frame which contains them
4849 # get basenames for relative links
4850 my ( $toc_basename, $toc_path ) = fileparse($toc_filename);
4851 my ( $src_basename, $src_path ) = fileparse($src_filename);
4853 # 1. Make the table of contents panel, with appropriate changes
4854 # to the anchor names
4855 my $src_frame_name = 'SRC';
4857 write_toc_html( $title, $toc_filename, $src_basename, $rtoc,
4860 # 2. The current .html filename is renamed to be the contents panel
4861 rename( $html_filename, $src_filename )
4862 or die "Cannot rename $html_filename to $src_filename:$!\n";
4864 # 3. Then use the original html filename for the frame
4866 $title, $html_filename, $top_basename,
4867 $toc_basename, $src_basename, $src_frame_name
4871 sub write_toc_html {
4873 # write a separate html table of contents file for frames
4874 my ( $title, $toc_filename, $src_basename, $rtoc, $src_frame_name ) = @_;
4875 my $fh = IO::File->new( $toc_filename, 'w' )
4876 or die "Cannot open $toc_filename:$!\n";
4880 <title>$title</title>
4883 <h1><a href=\"$src_basename#-top-" target="$src_frame_name">$title</a></h1>
4887 change_anchor_names( $rtoc, $src_basename, "$src_frame_name" );
4888 $fh->print( join "", @$rtoc );
4897 sub write_frame_html {
4899 # write an html file to be the table of contents frame
4901 $title, $frame_filename, $top_basename,
4902 $toc_basename, $src_basename, $src_frame_name
4905 my $fh = IO::File->new( $frame_filename, 'w' )
4906 or die "Cannot open $toc_basename:$!\n";
4909 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
4910 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
4911 <?xml version="1.0" encoding="iso-8859-1" ?>
4912 <html xmlns="http://www.w3.org/1999/xhtml">
4914 <title>$title</title>
4918 # two left panels, one right, if master index file
4919 if ($top_basename) {
4921 <frameset cols="20%,80%">
4922 <frameset rows="30%,70%">
4923 <frame src = "$top_basename" />
4924 <frame src = "$toc_basename" />
4929 # one left panels, one right, if no master index file
4932 <frameset cols="20%,*">
4933 <frame src = "$toc_basename" />
4937 <frame src = "$src_basename" name = "$src_frame_name" />
4940 <p>If you see this message, you are using a non-frame-capable web client.</p>
4941 <p>This document contains:</p>
4943 <li><a href="$toc_basename">A table of contents</a></li>
4944 <li><a href="$src_basename">The source code</a></li>
4953 sub change_anchor_names {
4955 # add a filename and target to anchors
4956 # also return the first anchor
4957 my ( $rlines, $filename, $target ) = @_;
4959 foreach my $line (@$rlines) {
4961 # We're looking for lines like this:
4962 # <LI><A HREF="#synopsis">SYNOPSIS</A></LI>
4963 # ---- - -------- -----------------
4965 if ( $line =~ /^(.*)<a(.*)href\s*=\s*"([^#]*)#([^"]+)"[^>]*>(.*)$/i ) {
4969 my $href = "$filename#$name";
4970 $line = "$pre<a href=\"$href\" target=\"$target\">$post\n";
4971 unless ($first_anchor) { $first_anchor = $href }
4974 return $first_anchor;
4977 sub close_html_file {
4979 return unless $self->{_html_file_opened};
4981 my $html_fh = $self->{_html_fh};
4982 my $rtoc_string = $self->{_rtoc_string};
4984 # There are 3 basic paths to html output...
4986 # ---------------------------------
4987 # Path 1: finish up if in -pre mode
4988 # ---------------------------------
4989 if ( $rOpts->{'html-pre-only'} ) {
4990 $html_fh->print( <<"PRE_END");
4993 eval { $html_fh->close() };
4998 $self->add_toc_item( 'EOF', 'EOF' );
5000 my $rpre_string_stack = $self->{_rpre_string_stack};
5002 # Patch to darken the <pre> background color in case of pod2html and
5003 # interleaved code/documentation. Otherwise, the distinction
5004 # between code and documentation is blurred.
5005 if ( $rOpts->{pod2html}
5006 && $self->{_pod_cut_count} >= 1
5007 && $rOpts->{'html-color-background'} eq '#FFFFFF' )
5009 $rOpts->{'html-pre-color-background'} = '#F0F0F0';
5012 # put the css or its link into a string, if used
5014 my $fh_css = Perl::Tidy::IOScalar->new( \$css_string, 'w' );
5016 # use css linked to another file
5017 if ( $rOpts->{'html-linked-style-sheet'} ) {
5019 qq(<link rel="stylesheet" href="$css_linkname" type="text/css" />)
5023 # use css embedded in this file
5024 elsif ( !$rOpts->{'nohtml-style-sheets'} ) {
5025 $fh_css->print( <<'ENDCSS');
5026 <style type="text/css">
5029 write_style_sheet_data($fh_css);
5030 $fh_css->print( <<"ENDCSS");
5036 # -----------------------------------------------------------
5037 # path 2: use pod2html if requested
5038 # If we fail for some reason, continue on to path 3
5039 # -----------------------------------------------------------
5040 if ( $rOpts->{'pod2html'} ) {
5041 my $rpod_string = $self->{_rpod_string};
5042 $self->pod_to_html( $$rpod_string, $css_string, $$rtoc_string,
5043 $rpre_string_stack )
5047 # --------------------------------------------------
5048 # path 3: write code in html, with pod only in italics
5049 # --------------------------------------------------
5050 my $input_file = $self->{_input_file};
5051 my $title = escape_html($input_file);
5052 my $date = localtime;
5053 $html_fh->print( <<"HTML_START");
5054 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
5055 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
5056 <!-- Generated by perltidy on $date -->
5057 <html xmlns="http://www.w3.org/1999/xhtml">
5059 <title>$title</title>
5062 # output the css, if used
5064 $html_fh->print($css_string);
5065 $html_fh->print( <<"ENDCSS");
5072 $html_fh->print( <<"HTML_START");
5074 <body bgcolor=\"$rOpts->{'html-color-background'}\" text=\"$rOpts->{'html-color-punctuation'}\">
5078 $html_fh->print("<a name=\"-top-\"></a>\n");
5079 $html_fh->print( <<"EOM");
5083 # copy the table of contents
5085 && !$rOpts->{'frames'}
5086 && $rOpts->{'html-table-of-contents'} )
5088 $html_fh->print($$rtoc_string);
5091 # copy the pre section(s)
5092 my $fname_comment = $input_file;
5093 $fname_comment =~ s/--+/-/g; # protect HTML comment tags
5094 $html_fh->print( <<"END_PRE");
5096 <!-- contents of filename: $fname_comment -->
5100 foreach my $rpre_string (@$rpre_string_stack) {
5101 $html_fh->print($$rpre_string);
5104 # and finish the html page
5105 $html_fh->print( <<"HTML_END");
5110 eval { $html_fh->close() }; # could be object without close method
5112 if ( $rOpts->{'frames'} ) {
5113 my @toc = map { $_ .= "\n" } split /\n/, $$rtoc_string;
5114 $self->make_frame( \@toc );
5120 my ( $rtokens, $rtoken_type, $rlevels ) = @_;
5121 my ( @colored_tokens, $j, $string, $type, $token, $level );
5122 my $rlast_level = $self->{_rlast_level};
5123 my $rpackage_stack = $self->{_rpackage_stack};
5125 for ( $j = 0 ; $j < @$rtoken_type ; $j++ ) {
5126 $type = $$rtoken_type[$j];
5127 $token = $$rtokens[$j];
5128 $level = $$rlevels[$j];
5129 $level = 0 if ( $level < 0 );
5131 #-------------------------------------------------------
5132 # Update the package stack. The package stack is needed to keep
5133 # the toc correct because some packages may be declared within
5134 # blocks and go out of scope when we leave the block.
5135 #-------------------------------------------------------
5136 if ( $level > $$rlast_level ) {
5137 unless ( $rpackage_stack->[ $level - 1 ] ) {
5138 $rpackage_stack->[ $level - 1 ] = 'main';
5140 $rpackage_stack->[$level] = $rpackage_stack->[ $level - 1 ];
5142 elsif ( $level < $$rlast_level ) {
5143 my $package = $rpackage_stack->[$level];
5144 unless ($package) { $package = 'main' }
5146 # if we change packages due to a nesting change, we
5147 # have to make an entry in the toc
5148 if ( $package ne $rpackage_stack->[ $level + 1 ] ) {
5149 $self->add_toc_item( $package, 'package' );
5152 $$rlast_level = $level;
5154 #-------------------------------------------------------
5155 # Intercept a sub name here; split it
5156 # into keyword 'sub' and sub name; and add an
5158 #-------------------------------------------------------
5159 if ( $type eq 'i' && $token =~ /^(sub\s+)(\w.*)$/ ) {
5160 $token = $self->markup_html_element( $1, 'k' );
5161 push @colored_tokens, $token;
5165 # but don't include sub declarations in the toc;
5166 # these wlll have leading token types 'i;'
5167 my $signature = join "", @$rtoken_type;
5168 unless ( $signature =~ /^i;/ ) {
5169 my $subname = $token;
5170 $subname =~ s/[\s\(].*$//; # remove any attributes and prototype
5171 $self->add_toc_item( $subname, 'sub' );
5175 #-------------------------------------------------------
5176 # Intercept a package name here; split it
5177 # into keyword 'package' and name; add to the toc,
5178 # and update the package stack
5179 #-------------------------------------------------------
5180 if ( $type eq 'i' && $token =~ /^(package\s+)(\w.*)$/ ) {
5181 $token = $self->markup_html_element( $1, 'k' );
5182 push @colored_tokens, $token;
5185 $self->add_toc_item( "$token", 'package' );
5186 $rpackage_stack->[$level] = $token;
5189 $token = $self->markup_html_element( $token, $type );
5190 push @colored_tokens, $token;
5192 return ( \@colored_tokens );
5195 sub markup_html_element {
5197 my ( $token, $type ) = @_;
5199 return $token if ( $type eq 'b' ); # skip a blank token
5200 return $token if ( $token =~ /^\s*$/ ); # skip a blank line
5201 $token = escape_html($token);
5203 # get the short abbreviation for this token type
5204 my $short_name = $token_short_names{$type};
5205 if ( !defined($short_name) ) {
5206 $short_name = "pu"; # punctuation is default
5209 # handle style sheets..
5210 if ( !$rOpts->{'nohtml-style-sheets'} ) {
5211 if ( $short_name ne 'pu' ) {
5212 $token = qq(<span class="$short_name">) . $token . "</span>";
5216 # handle no style sheets..
5218 my $color = $html_color{$short_name};
5220 if ( $color && ( $color ne $rOpts->{'html-color-punctuation'} ) ) {
5221 $token = qq(<font color="$color">) . $token . "</font>";
5223 if ( $html_italic{$short_name} ) { $token = "<i>$token</i>" }
5224 if ( $html_bold{$short_name} ) { $token = "<b>$token</b>" }
5232 if ($missing_html_entities) {
5233 $token =~ s/\&/&/g;
5234 $token =~ s/\</</g;
5235 $token =~ s/\>/>/g;
5236 $token =~ s/\"/"/g;
5239 HTML::Entities::encode_entities($token);
5244 sub finish_formatting {
5246 # called after last line
5248 $self->close_html_file();
5255 return unless $self->{_html_file_opened};
5256 my $html_pre_fh = $self->{_html_pre_fh};
5257 my ($line_of_tokens) = @_;
5258 my $line_type = $line_of_tokens->{_line_type};
5259 my $input_line = $line_of_tokens->{_line_text};
5260 my $line_number = $line_of_tokens->{_line_number};
5263 # markup line of code..
5265 if ( $line_type eq 'CODE' ) {
5266 my $rtoken_type = $line_of_tokens->{_rtoken_type};
5267 my $rtokens = $line_of_tokens->{_rtokens};
5268 my $rlevels = $line_of_tokens->{_rlevels};
5270 if ( $input_line =~ /(^\s*)/ ) {
5276 my ($rcolored_tokens) =
5277 $self->markup_tokens( $rtokens, $rtoken_type, $rlevels );
5278 $html_line .= join '', @$rcolored_tokens;
5281 # markup line of non-code..
5284 if ( $line_type eq 'HERE' ) { $line_character = 'H' }
5285 elsif ( $line_type eq 'HERE_END' ) { $line_character = 'h' }
5286 elsif ( $line_type eq 'FORMAT' ) { $line_character = 'H' }
5287 elsif ( $line_type eq 'FORMAT_END' ) { $line_character = 'h' }
5288 elsif ( $line_type eq 'SYSTEM' ) { $line_character = 'c' }
5289 elsif ( $line_type eq 'END_START' ) {
5290 $line_character = 'k';
5291 $self->add_toc_item( '__END__', '__END__' );
5293 elsif ( $line_type eq 'DATA_START' ) {
5294 $line_character = 'k';
5295 $self->add_toc_item( '__DATA__', '__DATA__' );
5297 elsif ( $line_type =~ /^POD/ ) {
5298 $line_character = 'P';
5299 if ( $rOpts->{'pod2html'} ) {
5300 my $html_pod_fh = $self->{_html_pod_fh};
5301 if ( $line_type eq 'POD_START' ) {
5303 my $rpre_string_stack = $self->{_rpre_string_stack};
5304 my $rpre_string = $rpre_string_stack->[-1];
5306 # if we have written any non-blank lines to the
5307 # current pre section, start writing to a new output
5309 if ( $$rpre_string =~ /\S/ ) {
5312 Perl::Tidy::IOScalar->new( \$pre_string, 'w' );
5313 $self->{_html_pre_fh} = $html_pre_fh;
5314 push @$rpre_string_stack, \$pre_string;
5316 # leave a marker in the pod stream so we know
5317 # where to put the pre section we just
5319 my $for_html = '=for html'; # don't confuse pod utils
5320 $html_pod_fh->print(<<EOM);
5323 <!-- pERLTIDY sECTION -->
5328 # otherwise, just clear the current string and start
5332 $html_pod_fh->print("\n");
5335 $html_pod_fh->print( $input_line . "\n" );
5336 if ( $line_type eq 'POD_END' ) {
5337 $self->{_pod_cut_count}++;
5338 $html_pod_fh->print("\n");
5343 else { $line_character = 'Q' }
5344 $html_line = $self->markup_html_element( $input_line, $line_character );
5347 # add the line number if requested
5348 if ( $rOpts->{'html-line-numbers'} ) {
5350 ( $line_number < 10 ) ? " "
5351 : ( $line_number < 100 ) ? " "
5352 : ( $line_number < 1000 ) ? " "
5354 $html_line = $extra_space . $line_number . " " . $html_line;
5358 $html_pre_fh->print("$html_line\n");
5361 #####################################################################
5363 # The Perl::Tidy::Formatter package adds indentation, whitespace, and
5364 # line breaks to the token stream
5366 # WARNING: This is not a real class for speed reasons. Only one
5367 # Formatter may be used.
5369 #####################################################################
5371 package Perl::Tidy::Formatter;
5375 # Caution: these debug flags produce a lot of output
5376 # They should all be 0 except when debugging small scripts
5377 use constant FORMATTER_DEBUG_FLAG_BOND => 0;
5378 use constant FORMATTER_DEBUG_FLAG_BREAK => 0;
5379 use constant FORMATTER_DEBUG_FLAG_CI => 0;
5380 use constant FORMATTER_DEBUG_FLAG_FLUSH => 0;
5381 use constant FORMATTER_DEBUG_FLAG_FORCE => 0;
5382 use constant FORMATTER_DEBUG_FLAG_LIST => 0;
5383 use constant FORMATTER_DEBUG_FLAG_NOBREAK => 0;
5384 use constant FORMATTER_DEBUG_FLAG_OUTPUT => 0;
5385 use constant FORMATTER_DEBUG_FLAG_SPARSE => 0;
5386 use constant FORMATTER_DEBUG_FLAG_STORE => 0;
5387 use constant FORMATTER_DEBUG_FLAG_UNDOBP => 0;
5388 use constant FORMATTER_DEBUG_FLAG_WHITE => 0;
5390 my $debug_warning = sub {
5391 print "FORMATTER_DEBUGGING with key $_[0]\n";
5394 FORMATTER_DEBUG_FLAG_BOND && $debug_warning->('BOND');
5395 FORMATTER_DEBUG_FLAG_BREAK && $debug_warning->('BREAK');
5396 FORMATTER_DEBUG_FLAG_CI && $debug_warning->('CI');
5397 FORMATTER_DEBUG_FLAG_FLUSH && $debug_warning->('FLUSH');
5398 FORMATTER_DEBUG_FLAG_FORCE && $debug_warning->('FORCE');
5399 FORMATTER_DEBUG_FLAG_LIST && $debug_warning->('LIST');
5400 FORMATTER_DEBUG_FLAG_NOBREAK && $debug_warning->('NOBREAK');
5401 FORMATTER_DEBUG_FLAG_OUTPUT && $debug_warning->('OUTPUT');
5402 FORMATTER_DEBUG_FLAG_SPARSE && $debug_warning->('SPARSE');
5403 FORMATTER_DEBUG_FLAG_STORE && $debug_warning->('STORE');
5404 FORMATTER_DEBUG_FLAG_UNDOBP && $debug_warning->('UNDOBP');
5405 FORMATTER_DEBUG_FLAG_WHITE && $debug_warning->('WHITE');
5412 $max_gnu_stack_index
5413 $gnu_position_predictor
5414 $line_start_index_to_go
5415 $last_indentation_written
5416 $last_unadjusted_indentation
5419 $saw_VERSION_in_this_file
5424 $gnu_sequence_number
5425 $last_output_indentation
5431 @type_sequence_to_go
5432 @container_environment_to_go
5433 @bond_strength_to_go
5434 @forced_breakpoint_to_go
5437 @leading_spaces_to_go
5438 @reduced_spaces_to_go
5439 @matching_token_to_go
5441 @nesting_blocks_to_go
5443 @nesting_depth_to_go
5445 @old_breakpoint_to_go
5449 %saved_opening_indentation
5452 $comma_count_in_batch
5453 $old_line_count_in_batch
5454 $last_nonblank_index_to_go
5455 $last_nonblank_type_to_go
5456 $last_nonblank_token_to_go
5457 $last_last_nonblank_index_to_go
5458 $last_last_nonblank_type_to_go
5459 $last_last_nonblank_token_to_go
5460 @nonblank_lines_at_depth
5464 $in_format_skipping_section
5465 $format_skipping_pattern_begin
5466 $format_skipping_pattern_end
5468 $forced_breakpoint_count
5469 $forced_breakpoint_undo_count
5470 @forced_breakpoint_undo_stack
5471 %postponed_breakpoint
5475 $first_embedded_tab_at
5476 $last_embedded_tab_at
5477 $deleted_semicolon_count
5478 $first_deleted_semicolon_at
5479 $last_deleted_semicolon_at
5480 $added_semicolon_count
5481 $first_added_semicolon_at
5482 $last_added_semicolon_at
5483 $first_tabbing_disagreement
5484 $last_tabbing_disagreement
5485 $in_tabbing_disagreement
5486 $tabbing_disagreement_count
5490 $last_line_leading_type
5491 $last_line_leading_level
5492 $last_last_line_leading_level
5495 %block_opening_line_number
5496 $csc_new_statement_ok
5497 $accumulating_text_for_block
5499 $rleading_block_if_elsif_text
5500 $leading_block_text_level
5501 $leading_block_text_length_exceeded
5502 $leading_block_text_line_length
5503 $leading_block_text_line_number
5504 $closing_side_comment_prefix_pattern
5505 $closing_side_comment_list_pattern
5507 $last_nonblank_token
5509 $last_last_nonblank_token
5510 $last_last_nonblank_type
5511 $last_nonblank_block_type
5514 %is_if_brace_follower
5515 %space_after_keyword
5518 %is_last_next_redo_return
5519 %is_other_brace_follower
5520 %is_else_brace_follower
5521 %is_anon_sub_brace_follower
5522 %is_anon_sub_1_brace_follower
5524 %is_sort_map_grep_eval
5525 %is_sort_map_grep_eval_do
5526 %is_block_without_semicolon
5531 %is_if_unless_and_or_last_next_redo_return
5532 %is_until_while_for_if_elsif_else
5538 $is_static_block_comment
5539 $index_start_one_line_block
5540 $semicolons_before_block_self_destruct
5541 $index_max_forced_break
5544 $vertical_aligner_object
5549 $last_line_had_side_comment
5552 $static_block_comment_pattern
5553 $static_side_comment_pattern
5554 %opening_vertical_tightness
5555 %closing_vertical_tightness
5556 %closing_token_indentation
5558 %opening_token_right
5559 %stack_opening_token
5560 %stack_closing_token
5562 $block_brace_vertical_tightness_pattern
5565 $rOpts_add_whitespace
5566 $rOpts_block_brace_tightness
5567 $rOpts_block_brace_vertical_tightness
5568 $rOpts_brace_left_and_indent
5569 $rOpts_comma_arrow_breakpoints
5570 $rOpts_break_at_old_keyword_breakpoints
5571 $rOpts_break_at_old_comma_breakpoints
5572 $rOpts_break_at_old_logical_breakpoints
5573 $rOpts_break_at_old_ternary_breakpoints
5574 $rOpts_closing_side_comment_else_flag
5575 $rOpts_closing_side_comment_maximum_text
5576 $rOpts_continuation_indentation
5578 $rOpts_delete_old_whitespace
5579 $rOpts_fuzzy_line_length
5580 $rOpts_indent_columns
5581 $rOpts_line_up_parentheses
5582 $rOpts_maximum_fields_per_table
5583 $rOpts_maximum_line_length
5584 $rOpts_short_concatenation_item_length
5585 $rOpts_swallow_optional_blank_lines
5586 $rOpts_ignore_old_breakpoints
5587 $rOpts_format_skipping
5588 $rOpts_space_function_paren
5589 $rOpts_space_keyword_paren
5591 $half_maximum_line_length
5595 %is_keyword_returning_list
5599 %right_bond_strength
5616 # default list of block types for which -bli would apply
5617 $bli_list_string = 'if else elsif unless while for foreach do : sub';
5620 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
5621 <= >= == =~ !~ != ++ -- /= x=
5623 @is_digraph{@_} = (1) x scalar(@_);
5625 @_ = qw( ... **= <<= >>= &&= ||= //= <=> );
5626 @is_trigraph{@_} = (1) x scalar(@_);
5629 = **= += *= &= <<= &&=
5630 -= /= |= >>= ||= //=
5634 @is_assignment{@_} = (1) x scalar(@_);
5644 @is_keyword_returning_list{@_} = (1) x scalar(@_);
5646 @_ = qw(is if unless and or err last next redo return);
5647 @is_if_unless_and_or_last_next_redo_return{@_} = (1) x scalar(@_);
5649 # always break after a closing curly of these block types:
5650 @_ = qw(until while for if elsif else);
5651 @is_until_while_for_if_elsif_else{@_} = (1) x scalar(@_);
5653 @_ = qw(last next redo return);
5654 @is_last_next_redo_return{@_} = (1) x scalar(@_);
5656 @_ = qw(sort map grep);
5657 @is_sort_map_grep{@_} = (1) x scalar(@_);
5659 @_ = qw(sort map grep eval);
5660 @is_sort_map_grep_eval{@_} = (1) x scalar(@_);
5662 @_ = qw(sort map grep eval do);
5663 @is_sort_map_grep_eval_do{@_} = (1) x scalar(@_);
5666 @is_if_unless{@_} = (1) x scalar(@_);
5668 @_ = qw(and or err);
5669 @is_and_or{@_} = (1) x scalar(@_);
5671 # Identify certain operators which often occur in chains.
5672 # Note: the minus (-) causes a side effect of padding of the first line in
5673 # something like this (by sub set_logical_padding):
5674 # Checkbutton => 'Transmission checked',
5675 # -variable => \$TRANS
5676 # This usually improves appearance so it seems ok.
5677 @_ = qw(&& || and or : ? . + - * /);
5678 @is_chain_operator{@_} = (1) x scalar(@_);
5680 # We can remove semicolons after blocks preceded by these keywords
5681 @_ = qw(BEGIN END CHECK INIT AUTOLOAD DESTROY continue if elsif else
5682 unless while until for foreach);
5683 @is_block_without_semicolon{@_} = (1) x scalar(@_);
5685 # 'L' is token for opening { at hash key
5687 @is_opening_type{@_} = (1) x scalar(@_);
5689 # 'R' is token for closing } at hash key
5691 @is_closing_type{@_} = (1) x scalar(@_);
5694 @is_opening_token{@_} = (1) x scalar(@_);
5697 @is_closing_token{@_} = (1) x scalar(@_);
5701 use constant WS_YES => 1;
5702 use constant WS_OPTIONAL => 0;
5703 use constant WS_NO => -1;
5705 # Token bond strengths.
5706 use constant NO_BREAK => 10000;
5707 use constant VERY_STRONG => 100;
5708 use constant STRONG => 2.1;
5709 use constant NOMINAL => 1.1;
5710 use constant WEAK => 0.8;
5711 use constant VERY_WEAK => 0.55;
5713 # values for testing indexes in output array
5714 use constant UNDEFINED_INDEX => -1;
5716 # Maximum number of little messages; probably need not be changed.
5717 use constant MAX_NAG_MESSAGES => 6;
5719 # increment between sequence numbers for each type
5720 # For example, ?: pairs might have numbers 7,11,15,...
5721 use constant TYPE_SEQUENCE_INCREMENT => 4;
5725 # methods to count instances
5727 sub get_count { $_count; }
5728 sub _increment_count { ++$_count }
5729 sub _decrement_count { --$_count }
5734 # trim leading and trailing whitespace from a string
5742 # given a string containing words separated by whitespace,
5743 # return the list of words
5748 return split( /\s+/, $str );
5751 # interface to Perl::Tidy::Logger routines
5753 if ($logger_object) {
5754 $logger_object->warning(@_);
5759 if ($logger_object) {
5760 $logger_object->complain(@_);
5764 sub write_logfile_entry {
5765 if ($logger_object) {
5766 $logger_object->write_logfile_entry(@_);
5771 if ($logger_object) {
5772 $logger_object->black_box(@_);
5776 sub report_definite_bug {
5777 if ($logger_object) {
5778 $logger_object->report_definite_bug();
5782 sub get_saw_brace_error {
5783 if ($logger_object) {
5784 $logger_object->get_saw_brace_error();
5788 sub we_are_at_the_last_line {
5789 if ($logger_object) {
5790 $logger_object->we_are_at_the_last_line();
5794 # interface to Perl::Tidy::Diagnostics routine
5795 sub write_diagnostics {
5797 if ($diagnostics_object) {
5798 $diagnostics_object->write_diagnostics(@_);
5802 sub get_added_semicolon_count {
5804 return $added_semicolon_count;
5808 $_[0]->_decrement_count();
5815 # we are given an object with a write_line() method to take lines
5817 sink_object => undef,
5818 diagnostics_object => undef,
5819 logger_object => undef,
5821 my %args = ( %defaults, @_ );
5823 $logger_object = $args{logger_object};
5824 $diagnostics_object = $args{diagnostics_object};
5826 # we create another object with a get_line() and peek_ahead() method
5827 my $sink_object = $args{sink_object};
5828 $file_writer_object =
5829 Perl::Tidy::FileWriter->new( $sink_object, $rOpts, $logger_object );
5831 # initialize the leading whitespace stack to negative levels
5832 # so that we can never run off the end of the stack
5833 $gnu_position_predictor = 0; # where the current token is predicted to be
5834 $max_gnu_stack_index = 0;
5835 $max_gnu_item_index = -1;
5836 $gnu_stack[0] = new_lp_indentation_item( 0, -1, -1, 0, 0 );
5837 @gnu_item_list = ();
5838 $last_output_indentation = 0;
5839 $last_indentation_written = 0;
5840 $last_unadjusted_indentation = 0;
5841 $last_leading_token = "";
5843 $saw_VERSION_in_this_file = !$rOpts->{'pass-version-line'};
5844 $saw_END_or_DATA_ = 0;
5846 @block_type_to_go = ();
5847 @type_sequence_to_go = ();
5848 @container_environment_to_go = ();
5849 @bond_strength_to_go = ();
5850 @forced_breakpoint_to_go = ();
5851 @lengths_to_go = (); # line length to start of ith token
5853 @matching_token_to_go = ();
5854 @mate_index_to_go = ();
5855 @nesting_blocks_to_go = ();
5856 @ci_levels_to_go = ();
5857 @nesting_depth_to_go = (0);
5858 @nobreak_to_go = ();
5859 @old_breakpoint_to_go = ();
5862 @leading_spaces_to_go = ();
5863 @reduced_spaces_to_go = ();
5866 @has_broken_sublist = ();
5867 @want_comma_break = ();
5870 $first_tabbing_disagreement = 0;
5871 $last_tabbing_disagreement = 0;
5872 $tabbing_disagreement_count = 0;
5873 $in_tabbing_disagreement = 0;
5874 $input_line_tabbing = undef;
5876 $last_line_type = "";
5877 $last_last_line_leading_level = 0;
5878 $last_line_leading_level = 0;
5879 $last_line_leading_type = '#';
5881 $last_nonblank_token = ';';
5882 $last_nonblank_type = ';';
5883 $last_last_nonblank_token = ';';
5884 $last_last_nonblank_type = ';';
5885 $last_nonblank_block_type = "";
5886 $last_output_level = 0;
5887 $looking_for_else = 0;
5888 $embedded_tab_count = 0;
5889 $first_embedded_tab_at = 0;
5890 $last_embedded_tab_at = 0;
5891 $deleted_semicolon_count = 0;
5892 $first_deleted_semicolon_at = 0;
5893 $last_deleted_semicolon_at = 0;
5894 $added_semicolon_count = 0;
5895 $first_added_semicolon_at = 0;
5896 $last_added_semicolon_at = 0;
5897 $last_line_had_side_comment = 0;
5898 $is_static_block_comment = 0;
5899 %postponed_breakpoint = ();
5901 # variables for adding side comments
5902 %block_leading_text = ();
5903 %block_opening_line_number = ();
5904 $csc_new_statement_ok = 1;
5906 %saved_opening_indentation = ();
5907 $in_format_skipping_section = 0;
5909 reset_block_text_accumulator();
5911 prepare_for_new_input_lines();
5913 $vertical_aligner_object =
5914 Perl::Tidy::VerticalAligner->initialize( $rOpts, $file_writer_object,
5915 $logger_object, $diagnostics_object );
5917 if ( $rOpts->{'entab-leading-whitespace'} ) {
5918 write_logfile_entry(
5919 "Leading whitespace will be entabbed with $rOpts->{'entab-leading-whitespace'} spaces per tab\n"
5922 elsif ( $rOpts->{'tabs'} ) {
5923 write_logfile_entry("Indentation will be with a tab character\n");
5926 write_logfile_entry(
5927 "Indentation will be with $rOpts->{'indent-columns'} spaces\n");
5930 # This was the start of a formatter referent, but object-oriented
5931 # coding has turned out to be too slow here.
5932 $formatter_self = {};
5934 bless $formatter_self, $class;
5936 # Safety check..this is not a class yet
5937 if ( _increment_count() > 1 ) {
5939 "Attempt to create more than 1 object in $class, which is not a true class yet\n";
5941 return $formatter_self;
5944 sub prepare_for_new_input_lines {
5946 $gnu_sequence_number++; # increment output batch counter
5947 %last_gnu_equals = ();
5948 %gnu_comma_count = ();
5949 %gnu_arrow_count = ();
5950 $line_start_index_to_go = 0;
5951 $max_gnu_item_index = UNDEFINED_INDEX;
5952 $index_max_forced_break = UNDEFINED_INDEX;
5953 $max_index_to_go = UNDEFINED_INDEX;
5954 $last_nonblank_index_to_go = UNDEFINED_INDEX;
5955 $last_nonblank_type_to_go = '';
5956 $last_nonblank_token_to_go = '';
5957 $last_last_nonblank_index_to_go = UNDEFINED_INDEX;
5958 $last_last_nonblank_type_to_go = '';
5959 $last_last_nonblank_token_to_go = '';
5960 $forced_breakpoint_count = 0;
5961 $forced_breakpoint_undo_count = 0;
5962 $rbrace_follower = undef;
5963 $lengths_to_go[0] = 0;
5964 $old_line_count_in_batch = 1;
5965 $comma_count_in_batch = 0;
5966 $starting_in_quote = 0;
5968 destroy_one_line_block();
5974 my ($line_of_tokens) = @_;
5976 my $line_type = $line_of_tokens->{_line_type};
5977 my $input_line = $line_of_tokens->{_line_text};
5979 # _line_type codes are:
5980 # SYSTEM - system-specific code before hash-bang line
5981 # CODE - line of perl code (including comments)
5982 # POD_START - line starting pod, such as '=head'
5983 # POD - pod documentation text
5984 # POD_END - last line of pod section, '=cut'
5985 # HERE - text of here-document
5986 # HERE_END - last line of here-doc (target word)
5987 # FORMAT - format section
5988 # FORMAT_END - last line of format section, '.'
5989 # DATA_START - __DATA__ line
5990 # DATA - unidentified text following __DATA__
5991 # END_START - __END__ line
5992 # END - unidentified text following __END__
5993 # ERROR - we are in big trouble, probably not a perl script
5995 # put a blank line after an =cut which comes before __END__ and __DATA__
5996 # (required by podchecker)
5997 if ( $last_line_type eq 'POD_END' && !$saw_END_or_DATA_ ) {
5998 $file_writer_object->reset_consecutive_blank_lines();
5999 if ( $input_line !~ /^\s*$/ ) { want_blank_line() }
6002 # handle line of code..
6003 if ( $line_type eq 'CODE' ) {
6005 # let logger see all non-blank lines of code
6006 if ( $input_line !~ /^\s*$/ ) {
6007 my $output_line_number =
6008 $vertical_aligner_object->get_output_line_number();
6009 black_box( $line_of_tokens, $output_line_number );
6011 print_line_of_tokens($line_of_tokens);
6014 # handle line of non-code..
6020 if ( $line_type =~ /^POD/ ) {
6022 # Pod docs should have a preceding blank line. But be
6023 # very careful in __END__ and __DATA__ sections, because:
6024 # 1. the user may be using this section for any purpose whatsoever
6025 # 2. the blank counters are not active there
6026 # It should be safe to request a blank line between an
6027 # __END__ or __DATA__ and an immediately following '=head'
6028 # type line, (types END_START and DATA_START), but not for
6029 # any other lines of type END or DATA.
6030 if ( $rOpts->{'delete-pod'} ) { $skip_line = 1; }
6031 if ( $rOpts->{'tee-pod'} ) { $tee_line = 1; }
6033 && $line_type eq 'POD_START'
6034 && $last_line_type !~ /^(END|DATA)$/ )
6040 # leave the blank counters in a predictable state
6041 # after __END__ or __DATA__
6042 elsif ( $line_type =~ /^(END_START|DATA_START)$/ ) {
6043 $file_writer_object->reset_consecutive_blank_lines();
6044 $saw_END_or_DATA_ = 1;
6047 # write unindented non-code line
6048 if ( !$skip_line ) {
6049 if ($tee_line) { $file_writer_object->tee_on() }
6050 write_unindented_line($input_line);
6051 if ($tee_line) { $file_writer_object->tee_off() }
6054 $last_line_type = $line_type;
6057 sub create_one_line_block {
6058 $index_start_one_line_block = $_[0];
6059 $semicolons_before_block_self_destruct = $_[1];
6062 sub destroy_one_line_block {
6063 $index_start_one_line_block = UNDEFINED_INDEX;
6064 $semicolons_before_block_self_destruct = 0;
6067 sub leading_spaces_to_go {
6069 # return the number of indentation spaces for a token in the output stream;
6070 # these were previously stored by 'set_leading_whitespace'.
6072 return get_SPACES( $leading_spaces_to_go[ $_[0] ] );
6078 # return the number of leading spaces associated with an indentation
6079 # variable $indentation is either a constant number of spaces or an object
6080 # with a get_SPACES method.
6081 my $indentation = shift;
6082 return ref($indentation) ? $indentation->get_SPACES() : $indentation;
6085 sub get_RECOVERABLE_SPACES {
6087 # return the number of spaces (+ means shift right, - means shift left)
6088 # that we would like to shift a group of lines with the same indentation
6089 # to get them to line up with their opening parens
6090 my $indentation = shift;
6091 return ref($indentation) ? $indentation->get_RECOVERABLE_SPACES() : 0;
6094 sub get_AVAILABLE_SPACES_to_go {
6096 my $item = $leading_spaces_to_go[ $_[0] ];
6098 # return the number of available leading spaces associated with an
6099 # indentation variable. $indentation is either a constant number of
6100 # spaces or an object with a get_AVAILABLE_SPACES method.
6101 return ref($item) ? $item->get_AVAILABLE_SPACES() : 0;
6104 sub new_lp_indentation_item {
6106 # this is an interface to the IndentationItem class
6107 my ( $spaces, $level, $ci_level, $available_spaces, $align_paren ) = @_;
6109 # A negative level implies not to store the item in the item_list
6111 if ( $level >= 0 ) { $index = ++$max_gnu_item_index; }
6113 my $item = Perl::Tidy::IndentationItem->new(
6115 $ci_level, $available_spaces,
6116 $index, $gnu_sequence_number,
6117 $align_paren, $max_gnu_stack_index,
6118 $line_start_index_to_go,
6121 if ( $level >= 0 ) {
6122 $gnu_item_list[$max_gnu_item_index] = $item;
6128 sub set_leading_whitespace {
6130 # This routine defines leading whitespace
6131 # given: the level and continuation_level of a token,
6132 # define: space count of leading string which would apply if it
6133 # were the first token of a new line.
6135 my ( $level, $ci_level, $in_continued_quote ) = @_;
6137 # modify for -bli, which adds one continuation indentation for
6139 if ( $rOpts_brace_left_and_indent
6140 && $max_index_to_go == 0
6141 && $block_type_to_go[$max_index_to_go] =~ /$bli_pattern/o )
6146 # patch to avoid trouble when input file has negative indentation.
6147 # other logic should catch this error.
6148 if ( $level < 0 ) { $level = 0 }
6150 #-------------------------------------------
6151 # handle the standard indentation scheme
6152 #-------------------------------------------
6153 unless ($rOpts_line_up_parentheses) {
6155 $ci_level * $rOpts_continuation_indentation +
6156 $level * $rOpts_indent_columns;
6158 ( $ci_level == 0 ) ? 0 : $rOpts_continuation_indentation;
6160 if ($in_continued_quote) {
6164 $leading_spaces_to_go[$max_index_to_go] = $space_count;
6165 $reduced_spaces_to_go[$max_index_to_go] = $space_count - $ci_spaces;
6169 #-------------------------------------------------------------
6170 # handle case of -lp indentation..
6171 #-------------------------------------------------------------
6173 # The continued_quote flag means that this is the first token of a
6174 # line, and it is the continuation of some kind of multi-line quote
6175 # or pattern. It requires special treatment because it must have no
6176 # added leading whitespace. So we create a special indentation item
6177 # which is not in the stack.
6178 if ($in_continued_quote) {
6179 my $space_count = 0;
6180 my $available_space = 0;
6181 $level = -1; # flag to prevent storing in item_list
6182 $leading_spaces_to_go[$max_index_to_go] =
6183 $reduced_spaces_to_go[$max_index_to_go] =
6184 new_lp_indentation_item( $space_count, $level, $ci_level,
6185 $available_space, 0 );
6189 # get the top state from the stack
6190 my $space_count = $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6191 my $current_level = $gnu_stack[$max_gnu_stack_index]->get_LEVEL();
6192 my $current_ci_level = $gnu_stack[$max_gnu_stack_index]->get_CI_LEVEL();
6194 my $type = $types_to_go[$max_index_to_go];
6195 my $token = $tokens_to_go[$max_index_to_go];
6196 my $total_depth = $nesting_depth_to_go[$max_index_to_go];
6198 if ( $type eq '{' || $type eq '(' ) {
6200 $gnu_comma_count{ $total_depth + 1 } = 0;
6201 $gnu_arrow_count{ $total_depth + 1 } = 0;
6203 # If we come to an opening token after an '=' token of some type,
6204 # see if it would be helpful to 'break' after the '=' to save space
6205 my $last_equals = $last_gnu_equals{$total_depth};
6206 if ( $last_equals && $last_equals > $line_start_index_to_go ) {
6208 # find the position if we break at the '='
6209 my $i_test = $last_equals;
6210 if ( $types_to_go[ $i_test + 1 ] eq 'b' ) { $i_test++ }
6213 ##my $too_close = ($i_test==$max_index_to_go-1);
6215 my $test_position = total_line_length( $i_test, $max_index_to_go );
6219 # the equals is not just before an open paren (testing)
6222 # if we are beyond the midpoint
6223 $gnu_position_predictor > $half_maximum_line_length
6225 # or we are beyont the 1/4 point and there was an old
6226 # break at the equals
6228 $gnu_position_predictor > $half_maximum_line_length / 2
6230 $old_breakpoint_to_go[$last_equals]
6231 || ( $last_equals > 0
6232 && $old_breakpoint_to_go[ $last_equals - 1 ] )
6233 || ( $last_equals > 1
6234 && $types_to_go[ $last_equals - 1 ] eq 'b'
6235 && $old_breakpoint_to_go[ $last_equals - 2 ] )
6241 # then make the switch -- note that we do not set a real
6242 # breakpoint here because we may not really need one; sub
6243 # scan_list will do that if necessary
6244 $line_start_index_to_go = $i_test + 1;
6245 $gnu_position_predictor = $test_position;
6250 # Check for decreasing depth ..
6251 # Note that one token may have both decreasing and then increasing
6252 # depth. For example, (level, ci) can go from (1,1) to (2,0). So,
6253 # in this example we would first go back to (1,0) then up to (2,0)
6255 if ( $level < $current_level || $ci_level < $current_ci_level ) {
6257 # loop to find the first entry at or completely below this level
6258 my ( $lev, $ci_lev );
6260 if ($max_gnu_stack_index) {
6262 # save index of token which closes this level
6263 $gnu_stack[$max_gnu_stack_index]->set_CLOSED($max_index_to_go);
6265 # Undo any extra indentation if we saw no commas
6266 my $available_spaces =
6267 $gnu_stack[$max_gnu_stack_index]->get_AVAILABLE_SPACES();
6269 my $comma_count = 0;
6270 my $arrow_count = 0;
6271 if ( $type eq '}' || $type eq ')' ) {
6272 $comma_count = $gnu_comma_count{$total_depth};
6273 $arrow_count = $gnu_arrow_count{$total_depth};
6274 $comma_count = 0 unless $comma_count;
6275 $arrow_count = 0 unless $arrow_count;
6277 $gnu_stack[$max_gnu_stack_index]->set_COMMA_COUNT($comma_count);
6278 $gnu_stack[$max_gnu_stack_index]->set_ARROW_COUNT($arrow_count);
6280 if ( $available_spaces > 0 ) {
6282 if ( $comma_count <= 0 || $arrow_count > 0 ) {
6284 my $i = $gnu_stack[$max_gnu_stack_index]->get_INDEX();
6286 $gnu_stack[$max_gnu_stack_index]
6287 ->get_SEQUENCE_NUMBER();
6289 # Be sure this item was created in this batch. This
6290 # should be true because we delete any available
6291 # space from open items at the end of each batch.
6292 if ( $gnu_sequence_number != $seqno
6293 || $i > $max_gnu_item_index )
6296 "Program bug with -lp. seqno=$seqno should be $gnu_sequence_number and i=$i should be less than max=$max_gnu_item_index\n"
6298 report_definite_bug();
6302 if ( $arrow_count == 0 ) {
6304 ->permanently_decrease_AVAILABLE_SPACES(
6309 ->tentatively_decrease_AVAILABLE_SPACES(
6316 $j <= $max_gnu_item_index ;
6321 ->decrease_SPACES($available_spaces);
6328 --$max_gnu_stack_index;
6329 $lev = $gnu_stack[$max_gnu_stack_index]->get_LEVEL();
6330 $ci_lev = $gnu_stack[$max_gnu_stack_index]->get_CI_LEVEL();
6332 # stop when we reach a level at or below the current level
6333 if ( $lev <= $level && $ci_lev <= $ci_level ) {
6335 $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6336 $current_level = $lev;
6337 $current_ci_level = $ci_lev;
6342 # reached bottom of stack .. should never happen because
6343 # only negative levels can get here, and $level was forced
6344 # to be positive above.
6347 "program bug with -lp: stack_error. level=$level; lev=$lev; ci_level=$ci_level; ci_lev=$ci_lev; rerun with -nlp\n"
6349 report_definite_bug();
6355 # handle increasing depth
6356 if ( $level > $current_level || $ci_level > $current_ci_level ) {
6358 # Compute the standard incremental whitespace. This will be
6359 # the minimum incremental whitespace that will be used. This
6360 # choice results in a smooth transition between the gnu-style
6361 # and the standard style.
6362 my $standard_increment =
6363 ( $level - $current_level ) * $rOpts_indent_columns +
6364 ( $ci_level - $current_ci_level ) * $rOpts_continuation_indentation;
6366 # Now we have to define how much extra incremental space
6367 # ("$available_space") we want. This extra space will be
6368 # reduced as necessary when long lines are encountered or when
6369 # it becomes clear that we do not have a good list.
6370 my $available_space = 0;
6371 my $align_paren = 0;
6374 # initialization on empty stack..
6375 if ( $max_gnu_stack_index == 0 ) {
6376 $space_count = $level * $rOpts_indent_columns;
6379 # if this is a BLOCK, add the standard increment
6380 elsif ($last_nonblank_block_type) {
6381 $space_count += $standard_increment;
6384 # if last nonblank token was not structural indentation,
6385 # just use standard increment
6386 elsif ( $last_nonblank_type ne '{' ) {
6387 $space_count += $standard_increment;
6390 # otherwise use the space to the first non-blank level change token
6393 $space_count = $gnu_position_predictor;
6395 my $min_gnu_indentation =
6396 $gnu_stack[$max_gnu_stack_index]->get_SPACES();
6398 $available_space = $space_count - $min_gnu_indentation;
6399 if ( $available_space >= $standard_increment ) {
6400 $min_gnu_indentation += $standard_increment;
6402 elsif ( $available_space > 1 ) {
6403 $min_gnu_indentation += $available_space + 1;
6405 elsif ( $last_nonblank_token =~ /^[\{\[\(]$/ ) {
6406 if ( ( $tightness{$last_nonblank_token} < 2 ) ) {
6407 $min_gnu_indentation += 2;
6410 $min_gnu_indentation += 1;
6414 $min_gnu_indentation += $standard_increment;
6416 $available_space = $space_count - $min_gnu_indentation;
6418 if ( $available_space < 0 ) {
6419 $space_count = $min_gnu_indentation;
6420 $available_space = 0;
6425 # update state, but not on a blank token
6426 if ( $types_to_go[$max_index_to_go] ne 'b' ) {
6428 $gnu_stack[$max_gnu_stack_index]->set_HAVE_CHILD(1);
6430 ++$max_gnu_stack_index;
6431 $gnu_stack[$max_gnu_stack_index] =
6432 new_lp_indentation_item( $space_count, $level, $ci_level,
6433 $available_space, $align_paren );
6435 # If the opening paren is beyond the half-line length, then
6436 # we will use the minimum (standard) indentation. This will
6437 # help avoid problems associated with running out of space
6438 # near the end of a line. As a result, in deeply nested
6439 # lists, there will be some indentations which are limited
6440 # to this minimum standard indentation. But the most deeply
6441 # nested container will still probably be able to shift its
6442 # parameters to the right for proper alignment, so in most
6443 # cases this will not be noticable.
6444 if ( $available_space > 0
6445 && $space_count > $half_maximum_line_length )
6447 $gnu_stack[$max_gnu_stack_index]
6448 ->tentatively_decrease_AVAILABLE_SPACES($available_space);
6453 # Count commas and look for non-list characters. Once we see a
6454 # non-list character, we give up and don't look for any more commas.
6455 if ( $type eq '=>' ) {
6456 $gnu_arrow_count{$total_depth}++;
6458 # tentatively treating '=>' like '=' for estimating breaks
6459 # TODO: this could use some experimentation
6460 $last_gnu_equals{$total_depth} = $max_index_to_go;
6463 elsif ( $type eq ',' ) {
6464 $gnu_comma_count{$total_depth}++;
6467 elsif ( $is_assignment{$type} ) {
6468 $last_gnu_equals{$total_depth} = $max_index_to_go;
6471 # this token might start a new line
6472 # if this is a non-blank..
6473 if ( $type ne 'b' ) {
6478 # this is the first nonblank token of the line
6479 $max_index_to_go == 1 && $types_to_go[0] eq 'b'
6481 # or previous character was one of these:
6482 || $last_nonblank_type_to_go =~ /^([\:\?\,f])$/
6484 # or previous character was opening and this does not close it
6485 || ( $last_nonblank_type_to_go eq '{' && $type ne '}' )
6486 || ( $last_nonblank_type_to_go eq '(' and $type ne ')' )
6488 # or this token is one of these:
6489 || $type =~ /^([\.]|\|\||\&\&)$/
6491 # or this is a closing structure
6492 || ( $last_nonblank_type_to_go eq '}'
6493 && $last_nonblank_token_to_go eq $last_nonblank_type_to_go )
6495 # or previous token was keyword 'return'
6496 || ( $last_nonblank_type_to_go eq 'k'
6497 && ( $last_nonblank_token_to_go eq 'return' && $type ne '{' ) )
6499 # or starting a new line at certain keywords is fine
6501 && $is_if_unless_and_or_last_next_redo_return{$token} )
6503 # or this is after an assignment after a closing structure
6505 $is_assignment{$last_nonblank_type_to_go}
6507 $last_last_nonblank_type_to_go =~ /^[\}\)\]]$/
6509 # and it is significantly to the right
6510 || $gnu_position_predictor > $half_maximum_line_length
6515 check_for_long_gnu_style_lines();
6516 $line_start_index_to_go = $max_index_to_go;
6518 # back up 1 token if we want to break before that type
6519 # otherwise, we may strand tokens like '?' or ':' on a line
6520 if ( $line_start_index_to_go > 0 ) {
6521 if ( $last_nonblank_type_to_go eq 'k' ) {
6523 if ( $want_break_before{$last_nonblank_token_to_go} ) {
6524 $line_start_index_to_go--;
6527 elsif ( $want_break_before{$last_nonblank_type_to_go} ) {
6528 $line_start_index_to_go--;
6534 # remember the predicted position of this token on the output line
6535 if ( $max_index_to_go > $line_start_index_to_go ) {
6536 $gnu_position_predictor =
6537 total_line_length( $line_start_index_to_go, $max_index_to_go );
6540 $gnu_position_predictor = $space_count +
6541 token_sequence_length( $max_index_to_go, $max_index_to_go );
6544 # store the indentation object for this token
6545 # this allows us to manipulate the leading whitespace
6546 # (in case we have to reduce indentation to fit a line) without
6547 # having to change any token values
6548 $leading_spaces_to_go[$max_index_to_go] = $gnu_stack[$max_gnu_stack_index];
6549 $reduced_spaces_to_go[$max_index_to_go] =
6550 ( $max_gnu_stack_index > 0 && $ci_level )
6551 ? $gnu_stack[ $max_gnu_stack_index - 1 ]
6552 : $gnu_stack[$max_gnu_stack_index];
6556 sub check_for_long_gnu_style_lines {
6558 # look at the current estimated maximum line length, and
6559 # remove some whitespace if it exceeds the desired maximum
6561 # this is only for the '-lp' style
6562 return unless ($rOpts_line_up_parentheses);
6564 # nothing can be done if no stack items defined for this line
6565 return if ( $max_gnu_item_index == UNDEFINED_INDEX );
6567 # see if we have exceeded the maximum desired line length
6568 # keep 2 extra free because they are needed in some cases
6569 # (result of trial-and-error testing)
6571 $gnu_position_predictor - $rOpts_maximum_line_length + 2;
6573 return if ( $spaces_needed < 0 );
6575 # We are over the limit, so try to remove a requested number of
6576 # spaces from leading whitespace. We are only allowed to remove
6577 # from whitespace items created on this batch, since others have
6578 # already been used and cannot be undone.
6579 my @candidates = ();
6582 # loop over all whitespace items created for the current batch
6583 for ( $i = 0 ; $i <= $max_gnu_item_index ; $i++ ) {
6584 my $item = $gnu_item_list[$i];
6586 # item must still be open to be a candidate (otherwise it
6587 # cannot influence the current token)
6588 next if ( $item->get_CLOSED() >= 0 );
6590 my $available_spaces = $item->get_AVAILABLE_SPACES();
6592 if ( $available_spaces > 0 ) {
6593 push( @candidates, [ $i, $available_spaces ] );
6597 return unless (@candidates);
6599 # sort by available whitespace so that we can remove whitespace
6600 # from the maximum available first
6601 @candidates = sort { $b->[1] <=> $a->[1] } @candidates;
6603 # keep removing whitespace until we are done or have no more
6605 foreach $candidate (@candidates) {
6606 my ( $i, $available_spaces ) = @{$candidate};
6607 my $deleted_spaces =
6608 ( $available_spaces > $spaces_needed )
6610 : $available_spaces;
6612 # remove the incremental space from this item
6613 $gnu_item_list[$i]->decrease_AVAILABLE_SPACES($deleted_spaces);
6617 # update the leading whitespace of this item and all items
6618 # that came after it
6619 for ( ; $i <= $max_gnu_item_index ; $i++ ) {
6621 my $old_spaces = $gnu_item_list[$i]->get_SPACES();
6622 if ( $old_spaces > $deleted_spaces ) {
6623 $gnu_item_list[$i]->decrease_SPACES($deleted_spaces);
6626 # shouldn't happen except for code bug:
6628 my $level = $gnu_item_list[$i_debug]->get_LEVEL();
6629 my $ci_level = $gnu_item_list[$i_debug]->get_CI_LEVEL();
6630 my $old_level = $gnu_item_list[$i]->get_LEVEL();
6631 my $old_ci_level = $gnu_item_list[$i]->get_CI_LEVEL();
6633 "program bug with -lp: want to delete $deleted_spaces from item $i, but old=$old_spaces deleted: lev=$level ci=$ci_level deleted: level=$old_level ci=$ci_level\n"
6635 report_definite_bug();
6638 $gnu_position_predictor -= $deleted_spaces;
6639 $spaces_needed -= $deleted_spaces;
6640 last unless ( $spaces_needed > 0 );
6644 sub finish_lp_batch {
6646 # This routine is called once after each each output stream batch is
6647 # finished to undo indentation for all incomplete -lp
6648 # indentation levels. It is too risky to leave a level open,
6649 # because then we can't backtrack in case of a long line to follow.
6650 # This means that comments and blank lines will disrupt this
6651 # indentation style. But the vertical aligner may be able to
6652 # get the space back if there are side comments.
6654 # this is only for the 'lp' style
6655 return unless ($rOpts_line_up_parentheses);
6657 # nothing can be done if no stack items defined for this line
6658 return if ( $max_gnu_item_index == UNDEFINED_INDEX );
6660 # loop over all whitespace items created for the current batch
6662 for ( $i = 0 ; $i <= $max_gnu_item_index ; $i++ ) {
6663 my $item = $gnu_item_list[$i];
6665 # only look for open items
6666 next if ( $item->get_CLOSED() >= 0 );
6668 # Tentatively remove all of the available space
6669 # (The vertical aligner will try to get it back later)
6670 my $available_spaces = $item->get_AVAILABLE_SPACES();
6671 if ( $available_spaces > 0 ) {
6673 # delete incremental space for this item
6675 ->tentatively_decrease_AVAILABLE_SPACES($available_spaces);
6677 # Reduce the total indentation space of any nodes that follow
6678 # Note that any such nodes must necessarily be dependents
6680 foreach ( $i + 1 .. $max_gnu_item_index ) {
6681 $gnu_item_list[$_]->decrease_SPACES($available_spaces);
6688 sub reduce_lp_indentation {
6690 # reduce the leading whitespace at token $i if possible by $spaces_needed
6691 # (a large value of $spaces_needed will remove all excess space)
6692 # NOTE: to be called from scan_list only for a sequence of tokens
6693 # contained between opening and closing parens/braces/brackets
6695 my ( $i, $spaces_wanted ) = @_;
6696 my $deleted_spaces = 0;
6698 my $item = $leading_spaces_to_go[$i];
6699 my $available_spaces = $item->get_AVAILABLE_SPACES();
6702 $available_spaces > 0
6703 && ( ( $spaces_wanted <= $available_spaces )
6704 || !$item->get_HAVE_CHILD() )
6708 # we'll remove these spaces, but mark them as recoverable
6710 $item->tentatively_decrease_AVAILABLE_SPACES($spaces_wanted);
6713 return $deleted_spaces;
6716 sub token_sequence_length {
6718 # return length of tokens ($ifirst .. $ilast) including first & last
6719 # returns 0 if $ifirst > $ilast
6722 return 0 if ( $ilast < 0 || $ifirst > $ilast );
6723 return $lengths_to_go[ $ilast + 1 ] if ( $ifirst < 0 );
6724 return $lengths_to_go[ $ilast + 1 ] - $lengths_to_go[$ifirst];
6727 sub total_line_length {
6729 # return length of a line of tokens ($ifirst .. $ilast)
6732 if ( $ifirst < 0 ) { $ifirst = 0 }
6734 return leading_spaces_to_go($ifirst) +
6735 token_sequence_length( $ifirst, $ilast );
6738 sub excess_line_length {
6740 # return number of characters by which a line of tokens ($ifirst..$ilast)
6741 # exceeds the allowable line length.
6744 if ( $ifirst < 0 ) { $ifirst = 0 }
6745 return leading_spaces_to_go($ifirst) +
6746 token_sequence_length( $ifirst, $ilast ) - $rOpts_maximum_line_length;
6749 sub finish_formatting {
6751 # flush buffer and write any informative messages
6755 $file_writer_object->decrement_output_line_number()
6756 ; # fix up line number since it was incremented
6757 we_are_at_the_last_line();
6758 if ( $added_semicolon_count > 0 ) {
6759 my $first = ( $added_semicolon_count > 1 ) ? "First" : "";
6761 ( $added_semicolon_count > 1 ) ? "semicolons were" : "semicolon was";
6762 write_logfile_entry("$added_semicolon_count $what added:\n");
6763 write_logfile_entry(
6764 " $first at input line $first_added_semicolon_at\n");
6766 if ( $added_semicolon_count > 1 ) {
6767 write_logfile_entry(
6768 " Last at input line $last_added_semicolon_at\n");
6770 write_logfile_entry(" (Use -nasc to prevent semicolon addition)\n");
6771 write_logfile_entry("\n");
6774 if ( $deleted_semicolon_count > 0 ) {
6775 my $first = ( $deleted_semicolon_count > 1 ) ? "First" : "";
6777 ( $deleted_semicolon_count > 1 )
6780 write_logfile_entry(
6781 "$deleted_semicolon_count unnecessary $what deleted:\n");
6782 write_logfile_entry(
6783 " $first at input line $first_deleted_semicolon_at\n");
6785 if ( $deleted_semicolon_count > 1 ) {
6786 write_logfile_entry(
6787 " Last at input line $last_deleted_semicolon_at\n");
6789 write_logfile_entry(" (Use -ndsc to prevent semicolon deletion)\n");
6790 write_logfile_entry("\n");
6793 if ( $embedded_tab_count > 0 ) {
6794 my $first = ( $embedded_tab_count > 1 ) ? "First" : "";
6796 ( $embedded_tab_count > 1 )
6797 ? "quotes or patterns"
6798 : "quote or pattern";
6799 write_logfile_entry("$embedded_tab_count $what had embedded tabs:\n");
6800 write_logfile_entry(
6801 "This means the display of this script could vary with device or software\n"
6803 write_logfile_entry(" $first at input line $first_embedded_tab_at\n");
6805 if ( $embedded_tab_count > 1 ) {
6806 write_logfile_entry(
6807 " Last at input line $last_embedded_tab_at\n");
6809 write_logfile_entry("\n");
6812 if ($first_tabbing_disagreement) {
6813 write_logfile_entry(
6814 "First indentation disagreement seen at input line $first_tabbing_disagreement\n"
6818 if ($in_tabbing_disagreement) {
6819 write_logfile_entry(
6820 "Ending with indentation disagreement which started at input line $in_tabbing_disagreement\n"
6825 if ($last_tabbing_disagreement) {
6827 write_logfile_entry(
6828 "Last indentation disagreement seen at input line $last_tabbing_disagreement\n"
6832 write_logfile_entry("No indentation disagreement seen\n");
6835 write_logfile_entry("\n");
6837 $vertical_aligner_object->report_anything_unusual();
6839 $file_writer_object->report_line_length_errors();
6844 # This routine is called to check the Opts hash after it is defined
6847 my ( $tabbing_string, $tab_msg );
6849 make_static_block_comment_pattern();
6850 make_static_side_comment_pattern();
6851 make_closing_side_comment_prefix();
6852 make_closing_side_comment_list_pattern();
6853 $format_skipping_pattern_begin =
6854 make_format_skipping_pattern( 'format-skipping-begin', '#<<<' );
6855 $format_skipping_pattern_end =
6856 make_format_skipping_pattern( 'format-skipping-end', '#>>>' );
6858 # If closing side comments ARE selected, then we can safely
6859 # delete old closing side comments unless closing side comment
6860 # warnings are requested. This is a good idea because it will
6861 # eliminate any old csc's which fall below the line count threshold.
6862 # We cannot do this if warnings are turned on, though, because we
6863 # might delete some text which has been added. So that must
6864 # be handled when comments are created.
6865 if ( $rOpts->{'closing-side-comments'} ) {
6866 if ( !$rOpts->{'closing-side-comment-warnings'} ) {
6867 $rOpts->{'delete-closing-side-comments'} = 1;
6871 # If closing side comments ARE NOT selected, but warnings ARE
6872 # selected and we ARE DELETING csc's, then we will pretend to be
6873 # adding with a huge interval. This will force the comments to be
6874 # generated for comparison with the old comments, but not added.
6875 elsif ( $rOpts->{'closing-side-comment-warnings'} ) {
6876 if ( $rOpts->{'delete-closing-side-comments'} ) {
6877 $rOpts->{'delete-closing-side-comments'} = 0;
6878 $rOpts->{'closing-side-comments'} = 1;
6879 $rOpts->{'closing-side-comment-interval'} = 100000000;
6884 make_block_brace_vertical_tightness_pattern();
6886 if ( $rOpts->{'line-up-parentheses'} ) {
6888 if ( $rOpts->{'indent-only'}
6889 || !$rOpts->{'add-newlines'}
6890 || !$rOpts->{'delete-old-newlines'} )
6893 -----------------------------------------------------------------------
6894 Conflict: -lp conflicts with -io, -fnl, -nanl, or -ndnl; ignoring -lp
6896 The -lp indentation logic requires that perltidy be able to coordinate
6897 arbitrarily large numbers of line breakpoints. This isn't possible
6898 with these flags. Sometimes an acceptable workaround is to use -wocb=3
6899 -----------------------------------------------------------------------
6901 $rOpts->{'line-up-parentheses'} = 0;
6905 # At present, tabs are not compatable with the line-up-parentheses style
6906 # (it would be possible to entab the total leading whitespace
6907 # just prior to writing the line, if desired).
6908 if ( $rOpts->{'line-up-parentheses'} && $rOpts->{'tabs'} ) {
6910 Conflict: -t (tabs) cannot be used with the -lp option; ignoring -t; see -et.
6912 $rOpts->{'tabs'} = 0;
6915 # Likewise, tabs are not compatable with outdenting..
6916 if ( $rOpts->{'outdent-keywords'} && $rOpts->{'tabs'} ) {
6918 Conflict: -t (tabs) cannot be used with the -okw options; ignoring -t; see -et.
6920 $rOpts->{'tabs'} = 0;
6923 if ( $rOpts->{'outdent-labels'} && $rOpts->{'tabs'} ) {
6925 Conflict: -t (tabs) cannot be used with the -ola option; ignoring -t; see -et.
6927 $rOpts->{'tabs'} = 0;
6930 if ( !$rOpts->{'space-for-semicolon'} ) {
6931 $want_left_space{'f'} = -1;
6934 if ( $rOpts->{'space-terminal-semicolon'} ) {
6935 $want_left_space{';'} = 1;
6938 # implement outdenting preferences for keywords
6939 %outdent_keyword = ();
6940 unless ( @_ = split_words( $rOpts->{'outdent-keyword-okl'} ) ) {
6941 @_ = qw(next last redo goto return); # defaults
6944 # FUTURE: if not a keyword, assume that it is an identifier
6946 if ( $Perl::Tidy::Tokenizer::is_keyword{$_} ) {
6947 $outdent_keyword{$_} = 1;
6950 warn "ignoring '$_' in -okwl list; not a perl keyword";
6954 # implement user whitespace preferences
6955 if ( @_ = split_words( $rOpts->{'want-left-space'} ) ) {
6956 @want_left_space{@_} = (1) x scalar(@_);
6959 if ( @_ = split_words( $rOpts->{'want-right-space'} ) ) {
6960 @want_right_space{@_} = (1) x scalar(@_);
6963 if ( @_ = split_words( $rOpts->{'nowant-left-space'} ) ) {
6964 @want_left_space{@_} = (-1) x scalar(@_);
6967 if ( @_ = split_words( $rOpts->{'nowant-right-space'} ) ) {
6968 @want_right_space{@_} = (-1) x scalar(@_);
6970 if ( $rOpts->{'dump-want-left-space'} ) {
6971 dump_want_left_space(*STDOUT);
6975 if ( $rOpts->{'dump-want-right-space'} ) {
6976 dump_want_right_space(*STDOUT);
6980 # default keywords for which space is introduced before an opening paren
6981 # (at present, including them messes up vertical alignment)
6982 @_ = qw(my local our and or err eq ne if else elsif until
6983 unless while for foreach return switch case given when);
6984 @space_after_keyword{@_} = (1) x scalar(@_);
6986 # allow user to modify these defaults
6987 if ( @_ = split_words( $rOpts->{'space-after-keyword'} ) ) {
6988 @space_after_keyword{@_} = (1) x scalar(@_);
6991 if ( @_ = split_words( $rOpts->{'nospace-after-keyword'} ) ) {
6992 @space_after_keyword{@_} = (0) x scalar(@_);
6995 # implement user break preferences
6996 foreach my $tok ( split_words( $rOpts->{'want-break-after'} ) ) {
6997 if ( $tok eq '?' ) { $tok = ':' } # patch to coordinate ?/:
6998 my $lbs = $left_bond_strength{$tok};
6999 my $rbs = $right_bond_strength{$tok};
7000 if ( defined($lbs) && defined($rbs) && $lbs < $rbs ) {
7001 ( $right_bond_strength{$tok}, $left_bond_strength{$tok} ) =
7006 foreach my $tok ( split_words( $rOpts->{'want-break-before'} ) ) {
7007 my $lbs = $left_bond_strength{$tok};
7008 my $rbs = $right_bond_strength{$tok};
7009 if ( defined($lbs) && defined($rbs) && $rbs < $lbs ) {
7010 ( $right_bond_strength{$tok}, $left_bond_strength{$tok} ) =
7015 # make note if breaks are before certain key types
7016 %want_break_before = ();
7018 '=', '.', ',', ':', '?', '&&', '||', 'and',
7019 'or', 'err', 'xor', '+', '-', '*', '/',
7022 $want_break_before{$tok} =
7023 $left_bond_strength{$tok} < $right_bond_strength{$tok};
7026 # Coordinate ?/: breaks, which must be similar
7027 if ( !$want_break_before{':'} ) {
7028 $want_break_before{'?'} = $want_break_before{':'};
7029 $right_bond_strength{'?'} = $right_bond_strength{':'} + 0.01;
7030 $left_bond_strength{'?'} = NO_BREAK;
7033 # Define here tokens which may follow the closing brace of a do statement
7034 # on the same line, as in:
7035 # } while ( $something);
7036 @_ = qw(until while unless if ; : );
7038 @is_do_follower{@_} = (1) x scalar(@_);
7040 # These tokens may follow the closing brace of an if or elsif block.
7041 # In other words, for cuddled else we want code to look like:
7042 # } elsif ( $something) {
7044 if ( $rOpts->{'cuddled-else'} ) {
7045 @_ = qw(else elsif);
7046 @is_if_brace_follower{@_} = (1) x scalar(@_);
7049 %is_if_brace_follower = ();
7052 # nothing can follow the closing curly of an else { } block:
7053 %is_else_brace_follower = ();
7055 # what can follow a multi-line anonymous sub definition closing curly:
7056 @_ = qw# ; : => or and && || ~~ !~~ ) #;
7058 @is_anon_sub_brace_follower{@_} = (1) x scalar(@_);
7060 # what can follow a one-line anonynomous sub closing curly:
7061 # one-line anonumous subs also have ']' here...
7062 # see tk3.t and PP.pm
7063 @_ = qw# ; : => or and && || ) ] ~~ !~~ #;
7065 @is_anon_sub_1_brace_follower{@_} = (1) x scalar(@_);
7067 # What can follow a closing curly of a block
7068 # which is not an if/elsif/else/do/sort/map/grep/eval/sub
7069 # Testfiles: 'Toolbar.pm', 'Menubar.pm', bless.t, '3rules.pl'
7070 @_ = qw# ; : => or and && || ) #;
7073 # allow cuddled continue if cuddled else is specified
7074 if ( $rOpts->{'cuddled-else'} ) { push @_, 'continue'; }
7076 @is_other_brace_follower{@_} = (1) x scalar(@_);
7078 $right_bond_strength{'{'} = WEAK;
7079 $left_bond_strength{'{'} = VERY_STRONG;
7081 # make -l=0 equal to -l=infinite
7082 if ( !$rOpts->{'maximum-line-length'} ) {
7083 $rOpts->{'maximum-line-length'} = 1000000;
7086 # make -lbl=0 equal to -lbl=infinite
7087 if ( !$rOpts->{'long-block-line-count'} ) {
7088 $rOpts->{'long-block-line-count'} = 1000000;
7091 my $ole = $rOpts->{'output-line-ending'};
7100 unless ( $rOpts->{'output-line-ending'} = $endings{$ole} ) {
7101 my $str = join " ", keys %endings;
7103 Unrecognized line ending '$ole'; expecting one of: $str
7106 if ( $rOpts->{'preserve-line-endings'} ) {
7107 warn "Ignoring -ple; conflicts with -ole\n";
7108 $rOpts->{'preserve-line-endings'} = undef;
7112 # hashes used to simplify setting whitespace
7114 '{' => $rOpts->{'brace-tightness'},
7115 '}' => $rOpts->{'brace-tightness'},
7116 '(' => $rOpts->{'paren-tightness'},
7117 ')' => $rOpts->{'paren-tightness'},
7118 '[' => $rOpts->{'square-bracket-tightness'},
7119 ']' => $rOpts->{'square-bracket-tightness'},
7128 # frequently used parameters
7129 $rOpts_add_newlines = $rOpts->{'add-newlines'};
7130 $rOpts_add_whitespace = $rOpts->{'add-whitespace'};
7131 $rOpts_block_brace_tightness = $rOpts->{'block-brace-tightness'};
7132 $rOpts_block_brace_vertical_tightness =
7133 $rOpts->{'block-brace-vertical-tightness'};
7134 $rOpts_brace_left_and_indent = $rOpts->{'brace-left-and-indent'};
7135 $rOpts_comma_arrow_breakpoints = $rOpts->{'comma-arrow-breakpoints'};
7136 $rOpts_break_at_old_ternary_breakpoints =
7137 $rOpts->{'break-at-old-ternary-breakpoints'};
7138 $rOpts_break_at_old_comma_breakpoints =
7139 $rOpts->{'break-at-old-comma-breakpoints'};
7140 $rOpts_break_at_old_keyword_breakpoints =
7141 $rOpts->{'break-at-old-keyword-breakpoints'};
7142 $rOpts_break_at_old_logical_breakpoints =
7143 $rOpts->{'break-at-old-logical-breakpoints'};
7144 $rOpts_closing_side_comment_else_flag =
7145 $rOpts->{'closing-side-comment-else-flag'};
7146 $rOpts_closing_side_comment_maximum_text =
7147 $rOpts->{'closing-side-comment-maximum-text'};
7148 $rOpts_continuation_indentation = $rOpts->{'continuation-indentation'};
7149 $rOpts_cuddled_else = $rOpts->{'cuddled-else'};
7150 $rOpts_delete_old_whitespace = $rOpts->{'delete-old-whitespace'};
7151 $rOpts_fuzzy_line_length = $rOpts->{'fuzzy-line-length'};
7152 $rOpts_indent_columns = $rOpts->{'indent-columns'};
7153 $rOpts_line_up_parentheses = $rOpts->{'line-up-parentheses'};
7154 $rOpts_maximum_fields_per_table = $rOpts->{'maximum-fields-per-table'};
7155 $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
7156 $rOpts_short_concatenation_item_length =
7157 $rOpts->{'short-concatenation-item-length'};
7158 $rOpts_swallow_optional_blank_lines =
7159 $rOpts->{'swallow-optional-blank-lines'};
7160 $rOpts_ignore_old_breakpoints = $rOpts->{'ignore-old-breakpoints'};
7161 $rOpts_format_skipping = $rOpts->{'format-skipping'};
7162 $rOpts_space_function_paren = $rOpts->{'space-function-paren'};
7163 $rOpts_space_keyword_paren = $rOpts->{'space-keyword-paren'};
7164 $half_maximum_line_length = $rOpts_maximum_line_length / 2;
7166 # Note that both opening and closing tokens can access the opening
7167 # and closing flags of their container types.
7168 %opening_vertical_tightness = (
7169 '(' => $rOpts->{'paren-vertical-tightness'},
7170 '{' => $rOpts->{'brace-vertical-tightness'},
7171 '[' => $rOpts->{'square-bracket-vertical-tightness'},
7172 ')' => $rOpts->{'paren-vertical-tightness'},
7173 '}' => $rOpts->{'brace-vertical-tightness'},
7174 ']' => $rOpts->{'square-bracket-vertical-tightness'},
7177 %closing_vertical_tightness = (
7178 '(' => $rOpts->{'paren-vertical-tightness-closing'},
7179 '{' => $rOpts->{'brace-vertical-tightness-closing'},
7180 '[' => $rOpts->{'square-bracket-vertical-tightness-closing'},
7181 ')' => $rOpts->{'paren-vertical-tightness-closing'},
7182 '}' => $rOpts->{'brace-vertical-tightness-closing'},
7183 ']' => $rOpts->{'square-bracket-vertical-tightness-closing'},
7186 # assume flag for '>' same as ')' for closing qw quotes
7187 %closing_token_indentation = (
7188 ')' => $rOpts->{'closing-paren-indentation'},
7189 '}' => $rOpts->{'closing-brace-indentation'},
7190 ']' => $rOpts->{'closing-square-bracket-indentation'},
7191 '>' => $rOpts->{'closing-paren-indentation'},
7194 %opening_token_right = (
7195 '(' => $rOpts->{'opening-paren-right'},
7196 '{' => $rOpts->{'opening-hash-brace-right'},
7197 '[' => $rOpts->{'opening-square-bracket-right'},
7200 %stack_opening_token = (
7201 '(' => $rOpts->{'stack-opening-paren'},
7202 '{' => $rOpts->{'stack-opening-hash-brace'},
7203 '[' => $rOpts->{'stack-opening-square-bracket'},
7206 %stack_closing_token = (
7207 ')' => $rOpts->{'stack-closing-paren'},
7208 '}' => $rOpts->{'stack-closing-hash-brace'},
7209 ']' => $rOpts->{'stack-closing-square-bracket'},
7213 sub make_static_block_comment_pattern {
7215 # create the pattern used to identify static block comments
7216 $static_block_comment_pattern = '^\s*##';
7218 # allow the user to change it
7219 if ( $rOpts->{'static-block-comment-prefix'} ) {
7220 my $prefix = $rOpts->{'static-block-comment-prefix'};
7221 $prefix =~ s/^\s*//;
7222 my $pattern = $prefix;
7224 # user may give leading caret to force matching left comments only
7225 if ( $prefix !~ /^\^#/ ) {
7226 if ( $prefix !~ /^#/ ) {
7228 "ERROR: the -sbcp prefix is '$prefix' but must begin with '#' or '^#'\n";
7230 $pattern = '^\s*' . $prefix;
7232 eval "'##'=~/$pattern/";
7235 "ERROR: the -sbc prefix '$prefix' causes the invalid regex '$pattern'\n";
7237 $static_block_comment_pattern = $pattern;
7241 sub make_format_skipping_pattern {
7242 my ( $opt_name, $default ) = @_;
7243 my $param = $rOpts->{$opt_name};
7244 unless ($param) { $param = $default }
7246 if ( $param !~ /^#/ ) {
7247 die "ERROR: the $opt_name parameter '$param' must begin with '#'\n";
7249 my $pattern = '^' . $param . '\s';
7250 eval "'#'=~/$pattern/";
7253 "ERROR: the $opt_name parameter '$param' causes the invalid regex '$pattern'\n";
7258 sub make_closing_side_comment_list_pattern {
7260 # turn any input list into a regex for recognizing selected block types
7261 $closing_side_comment_list_pattern = '^\w+';
7262 if ( defined( $rOpts->{'closing-side-comment-list'} )
7263 && $rOpts->{'closing-side-comment-list'} )
7265 $closing_side_comment_list_pattern =
7266 make_block_pattern( '-cscl', $rOpts->{'closing-side-comment-list'} );
7270 sub make_bli_pattern {
7272 if ( defined( $rOpts->{'brace-left-and-indent-list'} )
7273 && $rOpts->{'brace-left-and-indent-list'} )
7275 $bli_list_string = $rOpts->{'brace-left-and-indent-list'};
7278 $bli_pattern = make_block_pattern( '-blil', $bli_list_string );
7281 sub make_block_brace_vertical_tightness_pattern {
7283 # turn any input list into a regex for recognizing selected block types
7284 $block_brace_vertical_tightness_pattern =
7285 '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
7287 if ( defined( $rOpts->{'block-brace-vertical-tightness-list'} )
7288 && $rOpts->{'block-brace-vertical-tightness-list'} )
7290 $block_brace_vertical_tightness_pattern =
7291 make_block_pattern( '-bbvtl',
7292 $rOpts->{'block-brace-vertical-tightness-list'} );
7296 sub make_block_pattern {
7298 # given a string of block-type keywords, return a regex to match them
7299 # The only tricky part is that labels are indicated with a single ':'
7300 # and the 'sub' token text may have additional text after it (name of
7305 # input string: "if else elsif unless while for foreach do : sub";
7306 # pattern: '^((if|else|elsif|unless|while|for|foreach|do|\w+:)$|sub)';
7308 my ( $abbrev, $string ) = @_;
7309 my @list = split_words($string);
7315 if ( $i eq 'sub' ) {
7317 elsif ( $i eq ':' ) {
7318 push @words, '\w+:';
7320 elsif ( $i =~ /^\w/ ) {
7324 warn "unrecognized block type $i after $abbrev, ignoring\n";
7327 my $pattern = '(' . join( '|', @words ) . ')$';
7328 if ( $seen{'sub'} ) {
7329 $pattern = '(' . $pattern . '|sub)';
7331 $pattern = '^' . $pattern;
7335 sub make_static_side_comment_pattern {
7337 # create the pattern used to identify static side comments
7338 $static_side_comment_pattern = '^##';
7340 # allow the user to change it
7341 if ( $rOpts->{'static-side-comment-prefix'} ) {
7342 my $prefix = $rOpts->{'static-side-comment-prefix'};
7343 $prefix =~ s/^\s*//;
7344 my $pattern = '^' . $prefix;
7345 eval "'##'=~/$pattern/";
7348 "ERROR: the -sscp prefix '$prefix' causes the invalid regex '$pattern'\n";
7350 $static_side_comment_pattern = $pattern;
7354 sub make_closing_side_comment_prefix {
7356 # Be sure we have a valid closing side comment prefix
7357 my $csc_prefix = $rOpts->{'closing-side-comment-prefix'};
7358 my $csc_prefix_pattern;
7359 if ( !defined($csc_prefix) ) {
7360 $csc_prefix = '## end';
7361 $csc_prefix_pattern = '^##\s+end';
7364 my $test_csc_prefix = $csc_prefix;
7365 if ( $test_csc_prefix !~ /^#/ ) {
7366 $test_csc_prefix = '#' . $test_csc_prefix;
7369 # make a regex to recognize the prefix
7370 my $test_csc_prefix_pattern = $test_csc_prefix;
7372 # escape any special characters
7373 $test_csc_prefix_pattern =~ s/([^#\s\w])/\\$1/g;
7375 $test_csc_prefix_pattern = '^' . $test_csc_prefix_pattern;
7377 # allow exact number of intermediate spaces to vary
7378 $test_csc_prefix_pattern =~ s/\s+/\\s\+/g;
7380 # make sure we have a good pattern
7381 # if we fail this we probably have an error in escaping
7383 eval "'##'=~/$test_csc_prefix_pattern/";
7386 # shouldn't happen..must have screwed up escaping, above
7387 report_definite_bug();
7389 "Program Error: the -cscp prefix '$csc_prefix' caused the invalid regex '$csc_prefix_pattern'\n";
7391 # just warn and keep going with defaults
7392 warn "Please consider using a simpler -cscp prefix\n";
7393 warn "Using default -cscp instead; please check output\n";
7396 $csc_prefix = $test_csc_prefix;
7397 $csc_prefix_pattern = $test_csc_prefix_pattern;
7400 $rOpts->{'closing-side-comment-prefix'} = $csc_prefix;
7401 $closing_side_comment_prefix_pattern = $csc_prefix_pattern;
7404 sub dump_want_left_space {
7408 These values are the main control of whitespace to the left of a token type;
7409 They may be altered with the -wls parameter.
7410 For a list of token types, use perltidy --dump-token-types (-dtt)
7411 1 means the token wants a space to its left
7412 -1 means the token does not want a space to its left
7413 ------------------------------------------------------------------------
7415 foreach ( sort keys %want_left_space ) {
7416 print $fh "$_\t$want_left_space{$_}\n";
7420 sub dump_want_right_space {
7424 These values are the main control of whitespace to the right of a token type;
7425 They may be altered with the -wrs parameter.
7426 For a list of token types, use perltidy --dump-token-types (-dtt)
7427 1 means the token wants a space to its right
7428 -1 means the token does not want a space to its right
7429 ------------------------------------------------------------------------
7431 foreach ( sort keys %want_right_space ) {
7432 print $fh "$_\t$want_right_space{$_}\n";
7436 { # begin is_essential_whitespace
7438 my %is_sort_grep_map;
7443 @_ = qw(sort grep map);
7444 @is_sort_grep_map{@_} = (1) x scalar(@_);
7446 @_ = qw(for foreach);
7447 @is_for_foreach{@_} = (1) x scalar(@_);
7451 sub is_essential_whitespace {
7453 # Essential whitespace means whitespace which cannot be safely deleted
7454 # without risking the introduction of a syntax error.
7455 # We are given three tokens and their types:
7456 # ($tokenl, $typel) is the token to the left of the space in question
7457 # ($tokenr, $typer) is the token to the right of the space in question
7458 # ($tokenll, $typell) is previous nonblank token to the left of $tokenl
7460 # This is a slow routine but is not needed too often except when -mangle
7463 # Note: This routine should almost never need to be changed. It is
7464 # for avoiding syntax problems rather than for formatting.
7465 my ( $tokenll, $typell, $tokenl, $typel, $tokenr, $typer ) = @_;
7469 # never combine two bare words or numbers
7470 # examples: and ::ok(1)
7472 # for bla::bla:: abc
7473 # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
7474 # $input eq"quit" to make $inputeq"quit"
7475 # my $size=-s::SINK if $file; <==OK but we won't do it
7476 # don't join something like: for bla::bla:: abc
7477 # example is "%overload:: and" in files Dumpvalue.pm or colonbug.pl
7478 ( ( $tokenl =~ /([\'\w]|\:\:)$/ ) && ( $tokenr =~ /^([\'\w]|\:\:)/ ) )
7480 # do not combine a number with a concatination dot
7481 # example: pom.caputo:
7482 # $vt100_compatible ? "\e[0;0H" : ('-' x 78 . "\n");
7483 || ( ( $typel eq 'n' ) && ( $tokenr eq '.' ) )
7484 || ( ( $typer eq 'n' ) && ( $tokenl eq '.' ) )
7486 # do not join a minus with a bare word, because you might form
7487 # a file test operator. Example from Complex.pm:
7488 # if (CORE::abs($z - i) < $eps); "z-i" would be taken as a file test.
7489 || ( ( $tokenl eq '-' ) && ( $tokenr =~ /^[_A-Za-z]$/ ) )
7491 # and something like this could become ambiguous without space
7493 # use constant III=>1;
7497 || ( ( $tokenl eq '-' )
7498 && ( $typer =~ /^[wC]$/ && $tokenr =~ /^[_A-Za-z]/ ) )
7500 # '= -' should not become =- or you will get a warning
7502 # || ($tokenr eq '-')
7504 # keep a space between a quote and a bareword to prevent the
7505 # bareword from becomming a quote modifier.
7506 || ( ( $typel eq 'Q' ) && ( $tokenr =~ /^[a-zA-Z_]/ ) )
7508 # keep a space between a token ending in '$' and any word;
7509 # this caused trouble: "die @$ if $@"
7510 || ( ( $typel eq 'i' && $tokenl =~ /\$$/ )
7511 && ( $tokenr =~ /^[a-zA-Z_]/ ) )
7513 # perl is very fussy about spaces before <<
7514 || ( $tokenr =~ /^\<\</ )
7516 # avoid combining tokens to create new meanings. Example:
7517 # $a+ +$b must not become $a++$b
7518 || ( $is_digraph{ $tokenl . $tokenr } )
7519 || ( $is_trigraph{ $tokenl . $tokenr } )
7521 # another example: do not combine these two &'s:
7522 # allow_options & &OPT_EXECCGI
7523 || ( $is_digraph{ $tokenl . substr( $tokenr, 0, 1 ) } )
7525 # don't combine $$ or $# with any alphanumeric
7526 # (testfile mangle.t with --mangle)
7527 || ( ( $tokenl =~ /^\$[\$\#]$/ ) && ( $tokenr =~ /^\w/ ) )
7529 # retain any space after possible filehandle
7530 # (testfiles prnterr1.t with --extrude and mangle.t with --mangle)
7531 || ( $typel eq 'Z' )
7533 # Perl is sensitive to whitespace after the + here:
7534 # $b = xvals $a + 0.1 * yvals $a;
7535 || ( $typell eq 'Z' && $typel =~ /^[\/\?\+\-\*]$/ )
7537 # keep paren separate in 'use Foo::Bar ()'
7541 && $tokenll eq 'use' )
7543 # keep any space between filehandle and paren:
7544 # file mangle.t with --mangle:
7545 || ( $typel eq 'Y' && $tokenr eq '(' )
7547 # retain any space after here doc operator ( hereerr.t)
7548 || ( $typel eq 'h' )
7550 # FIXME: this needs some further work; extrude.t has test cases
7551 # it is safest to retain any space after start of ? : operator
7552 # because of perl's quirky parser.
7553 # ie, this line will fail if you remove the space after the '?':
7554 # $b=join $comma ? ',' : ':', @_; # ok
7555 # $b=join $comma ?',' : ':', @_; # error!
7557 # $b=join $comma?',' : ':', @_; # not a problem!
7558 ## || ($typel eq '?')
7560 # be careful with a space around ++ and --, to avoid ambiguity as to
7561 # which token it applies
7562 || ( ( $typer =~ /^(pp|mm)$/ ) && ( $tokenl !~ /^[\;\{\(\[]/ ) )
7563 || ( ( $typel =~ /^(\+\+|\-\-)$/ ) && ( $tokenr !~ /^[\;\}\)\]]/ ) )
7565 # need space after foreach my; for example, this will fail in
7566 # older versions of Perl:
7567 # foreach my$ft(@filetypes)...
7572 && $is_for_foreach{$tokenll}
7576 # must have space between grep and left paren; "grep(" will fail
7577 || ( $tokenr eq '(' && $is_sort_grep_map{$tokenl} )
7579 # don't stick numbers next to left parens, as in:
7580 #use Mail::Internet 1.28 (); (see Entity.pm, Head.pm, Test.pm)
7581 || ( ( $typel eq 'n' ) && ( $tokenr eq '(' ) )
7583 ; # the value of this long logic sequence is the result we want
7588 sub set_white_space_flag {
7590 # This routine examines each pair of nonblank tokens and
7591 # sets values for array @white_space_flag.
7593 # $white_space_flag[$j] is a flag indicating whether a white space
7594 # BEFORE token $j is needed, with the following values:
7596 # -1 do not want a space before token $j
7597 # 0 optional space or $j is a whitespace
7598 # 1 want a space before token $j
7601 # The values for the first token will be defined based
7602 # upon the contents of the "to_go" output array.
7604 # Note: retain debug print statements because they are usually
7605 # required after adding new token types.
7609 # initialize these global hashes, which control the use of
7610 # whitespace around tokens:
7615 # %space_after_keyword
7617 # Many token types are identical to the tokens themselves.
7618 # See the tokenizer for a complete list. Here are some special types:
7620 # f = semicolon in for statement
7623 # Note that :: is excluded since it should be contained in an identifier
7624 # Note that '->' is excluded because it never gets space
7625 # parentheses and brackets are excluded since they are handled specially
7626 # curly braces are included but may be overridden by logic, such as
7629 # NEW_TOKENS: create a whitespace rule here. This can be as
7630 # simple as adding your new letter to @spaces_both_sides, for
7634 @is_opening_type{@_} = (1) x scalar(@_);
7637 @is_closing_type{@_} = (1) x scalar(@_);
7639 my @spaces_both_sides = qw"
7640 + - * / % ? = . : x < > | & ^ .. << >> ** && .. || // => += -=
7641 .= %= x= &= |= ^= *= <> <= >= == =~ !~ /= != ... <<= >>= ~~ !~~
7642 &&= ||= //= <=> A k f w F n C Y U G v
7645 my @spaces_left_side = qw"
7646 t ! ~ m p { \ h pp mm Z j
7648 push( @spaces_left_side, '#' ); # avoids warning message
7650 my @spaces_right_side = qw"
7651 ; } ) ] R J ++ -- **=
7653 push( @spaces_right_side, ',' ); # avoids warning message
7654 @want_left_space{@spaces_both_sides} = (1) x scalar(@spaces_both_sides);
7655 @want_right_space{@spaces_both_sides} =
7656 (1) x scalar(@spaces_both_sides);
7657 @want_left_space{@spaces_left_side} = (1) x scalar(@spaces_left_side);
7658 @want_right_space{@spaces_left_side} = (-1) x scalar(@spaces_left_side);
7659 @want_left_space{@spaces_right_side} =
7660 (-1) x scalar(@spaces_right_side);
7661 @want_right_space{@spaces_right_side} =
7662 (1) x scalar(@spaces_right_side);
7663 $want_left_space{'L'} = WS_NO;
7664 $want_left_space{'->'} = WS_NO;
7665 $want_right_space{'->'} = WS_NO;
7666 $want_left_space{'**'} = WS_NO;
7667 $want_right_space{'**'} = WS_NO;
7669 # hash type information must stay tightly bound
7671 $binary_ws_rules{'i'}{'L'} = WS_NO;
7672 $binary_ws_rules{'i'}{'{'} = WS_YES;
7673 $binary_ws_rules{'k'}{'{'} = WS_YES;
7674 $binary_ws_rules{'U'}{'{'} = WS_YES;
7675 $binary_ws_rules{'i'}{'['} = WS_NO;
7676 $binary_ws_rules{'R'}{'L'} = WS_NO;
7677 $binary_ws_rules{'R'}{'{'} = WS_NO;
7678 $binary_ws_rules{'t'}{'L'} = WS_NO;
7679 $binary_ws_rules{'t'}{'{'} = WS_NO;
7680 $binary_ws_rules{'}'}{'L'} = WS_NO;
7681 $binary_ws_rules{'}'}{'{'} = WS_NO;
7682 $binary_ws_rules{'$'}{'L'} = WS_NO;
7683 $binary_ws_rules{'$'}{'{'} = WS_NO;
7684 $binary_ws_rules{'@'}{'L'} = WS_NO;
7685 $binary_ws_rules{'@'}{'{'} = WS_NO;
7686 $binary_ws_rules{'='}{'L'} = WS_YES;
7688 # the following includes ') {'
7689 # as in : if ( xxx ) { yyy }
7690 $binary_ws_rules{']'}{'L'} = WS_NO;
7691 $binary_ws_rules{']'}{'{'} = WS_NO;
7692 $binary_ws_rules{')'}{'{'} = WS_YES;
7693 $binary_ws_rules{')'}{'['} = WS_NO;
7694 $binary_ws_rules{']'}{'['} = WS_NO;
7695 $binary_ws_rules{']'}{'{'} = WS_NO;
7696 $binary_ws_rules{'}'}{'['} = WS_NO;
7697 $binary_ws_rules{'R'}{'['} = WS_NO;
7699 $binary_ws_rules{']'}{'++'} = WS_NO;
7700 $binary_ws_rules{']'}{'--'} = WS_NO;
7701 $binary_ws_rules{')'}{'++'} = WS_NO;
7702 $binary_ws_rules{')'}{'--'} = WS_NO;
7704 $binary_ws_rules{'R'}{'++'} = WS_NO;
7705 $binary_ws_rules{'R'}{'--'} = WS_NO;
7707 ########################################################
7708 # should no longer be necessary (see niek.pl)
7709 ##$binary_ws_rules{'k'}{':'} = WS_NO; # keep colon with label
7710 ##$binary_ws_rules{'w'}{':'} = WS_NO;
7711 ########################################################
7712 $binary_ws_rules{'i'}{'Q'} = WS_YES;
7713 $binary_ws_rules{'n'}{'('} = WS_YES; # occurs in 'use package n ()'
7715 # FIXME: we need to split 'i' into variables and functions
7716 # and have no space for functions but space for variables. For now,
7717 # I have a special patch in the special rules below
7718 $binary_ws_rules{'i'}{'('} = WS_NO;
7720 $binary_ws_rules{'w'}{'('} = WS_NO;
7721 $binary_ws_rules{'w'}{'{'} = WS_YES;
7723 my ( $jmax, $rtokens, $rtoken_type, $rblock_type ) = @_;
7724 my ( $last_token, $last_type, $last_block_type, $token, $type,
7726 my (@white_space_flag);
7727 my $j_tight_closing_paren = -1;
7729 if ( $max_index_to_go >= 0 ) {
7730 $token = $tokens_to_go[$max_index_to_go];
7731 $type = $types_to_go[$max_index_to_go];
7732 $block_type = $block_type_to_go[$max_index_to_go];
7740 # loop over all tokens
7743 for ( $j = 0 ; $j <= $jmax ; $j++ ) {
7745 if ( $$rtoken_type[$j] eq 'b' ) {
7746 $white_space_flag[$j] = WS_OPTIONAL;
7750 # set a default value, to be changed as needed
7752 $last_token = $token;
7754 $last_block_type = $block_type;
7755 $token = $$rtokens[$j];
7756 $type = $$rtoken_type[$j];
7757 $block_type = $$rblock_type[$j];
7759 #---------------------------------------------------------------
7761 # handle space on the inside of opening braces
7762 #---------------------------------------------------------------
7765 if ( $is_opening_type{$last_type} ) {
7767 $j_tight_closing_paren = -1;
7769 # let's keep empty matched braces together: () {} []
7771 if ( $token eq $matching_token{$last_token} ) {
7781 # we're considering the right of an opening brace
7782 # tightness = 0 means always pad inside with space
7783 # tightness = 1 means pad inside if "complex"
7784 # tightness = 2 means never pad inside with space
7787 if ( $last_type eq '{'
7788 && $last_token eq '{'
7789 && $last_block_type )
7791 $tightness = $rOpts_block_brace_tightness;
7793 else { $tightness = $tightness{$last_token} }
7795 if ( $tightness <= 0 ) {
7798 elsif ( $tightness > 1 ) {
7803 # Patch to count '-foo' as single token so that
7804 # each of $a{-foo} and $a{foo} and $a{'foo'} do
7805 # not get spaces with default formatting.
7809 && $last_token eq '{'
7810 && $$rtoken_type[ $j + 1 ] eq 'w' );
7812 # $j_next is where a closing token should be if
7813 # the container has a single token
7815 ( $$rtoken_type[ $j_here + 1 ] eq 'b' )
7818 my $tok_next = $$rtokens[$j_next];
7819 my $type_next = $$rtoken_type[$j_next];
7821 # for tightness = 1, if there is just one token
7822 # within the matching pair, we will keep it tight
7824 $tok_next eq $matching_token{$last_token}
7826 # but watch out for this: [ [ ] (misc.t)
7827 && $last_token ne $token
7831 # remember where to put the space for the closing paren
7832 $j_tight_closing_paren = $j_next;
7840 } # done with opening braces and brackets
7842 if FORMATTER_DEBUG_FLAG_WHITE;
7844 #---------------------------------------------------------------
7846 # handle space on inside of closing brace pairs
7847 #---------------------------------------------------------------
7850 if ( $is_closing_type{$type} ) {
7852 if ( $j == $j_tight_closing_paren ) {
7854 $j_tight_closing_paren = -1;
7859 if ( !defined($ws) ) {
7862 if ( $type eq '}' && $token eq '}' && $block_type ) {
7863 $tightness = $rOpts_block_brace_tightness;
7865 else { $tightness = $tightness{$token} }
7867 $ws = ( $tightness > 1 ) ? WS_NO : WS_YES;
7873 if FORMATTER_DEBUG_FLAG_WHITE;
7875 #---------------------------------------------------------------
7877 # use the binary table
7878 #---------------------------------------------------------------
7879 if ( !defined($ws) ) {
7880 $ws = $binary_ws_rules{$last_type}{$type};
7883 if FORMATTER_DEBUG_FLAG_WHITE;
7885 #---------------------------------------------------------------
7887 # some special cases
7888 #---------------------------------------------------------------
7889 if ( $token eq '(' ) {
7891 # This will have to be tweaked as tokenization changes.
7892 # We usually want a space at '} (', for example:
7893 # map { 1 * $_; } ( $y, $M, $w, $d, $h, $m, $s );
7896 # &{ $_->[1] }( delete $_[$#_]{ $_->[0] } );
7897 # At present, the above & block is marked as type L/R so this case
7898 # won't go through here.
7899 if ( $last_type eq '}' ) { $ws = WS_YES }
7901 # NOTE: some older versions of Perl had occasional problems if
7902 # spaces are introduced between keywords or functions and opening
7903 # parens. So the default is not to do this except is certain
7904 # cases. The current Perl seems to tolerate spaces.
7906 # Space between keyword and '('
7907 elsif ( $last_type eq 'k' ) {
7909 unless ( $rOpts_space_keyword_paren
7910 || $space_after_keyword{$last_token} );
7913 # Space between function and '('
7914 # -----------------------------------------------------
7915 # 'w' and 'i' checks for something like:
7916 # myfun( &myfun( ->myfun(
7917 # -----------------------------------------------------
7918 elsif (( $last_type =~ /^[wU]$/ )
7919 || ( $last_type =~ /^[wi]$/ && $last_token =~ /^(\&|->)/ ) )
7921 $ws = WS_NO unless ($rOpts_space_function_paren);
7924 # space between something like $i and ( in
7925 # for $i ( 0 .. 20 ) {
7926 # FIXME: eventually, type 'i' needs to be split into multiple
7927 # token types so this can be a hardwired rule.
7928 elsif ( $last_type eq 'i' && $last_token =~ /^[\$\%\@]/ ) {
7932 # allow constant function followed by '()' to retain no space
7933 elsif ( $last_type eq 'C' && $$rtokens[ $j + 1 ] eq ')' ) {
7938 # patch for SWITCH/CASE: make space at ']{' optional
7939 # since the '{' might begin a case or when block
7940 elsif ( ( $token eq '{' && $type ne 'L' ) && $last_token eq ']' ) {
7944 # keep space between 'sub' and '{' for anonymous sub definition
7945 if ( $type eq '{' ) {
7946 if ( $last_token eq 'sub' ) {
7950 # this is needed to avoid no space in '){'
7951 if ( $last_token eq ')' && $token eq '{' ) { $ws = WS_YES }
7953 # avoid any space before the brace or bracket in something like
7954 # @opts{'a','b',...}
7955 if ( $last_type eq 'i' && $last_token =~ /^\@/ ) {
7960 elsif ( $type eq 'i' ) {
7962 # never a space before ->
7963 if ( $token =~ /^\-\>/ ) {
7968 # retain any space between '-' and bare word
7969 elsif ( $type eq 'w' || $type eq 'C' ) {
7970 $ws = WS_OPTIONAL if $last_type eq '-';
7972 # never a space before ->
7973 if ( $token =~ /^\-\>/ ) {
7978 # retain any space between '-' and bare word
7979 # example: avoid space between 'USER' and '-' here:
7980 # $myhash{USER-NAME}='steve';
7981 elsif ( $type eq 'm' || $type eq '-' ) {
7982 $ws = WS_OPTIONAL if ( $last_type eq 'w' );
7985 # always space before side comment
7986 elsif ( $type eq '#' ) { $ws = WS_YES if $j > 0 }
7988 # always preserver whatever space was used after a possible
7989 # filehandle (except _) or here doc operator
7992 && ( ( $last_type eq 'Z' && $last_token ne '_' )
7993 || $last_type eq 'h' )
8000 if FORMATTER_DEBUG_FLAG_WHITE;
8002 #---------------------------------------------------------------
8004 # default rules not covered above
8005 #---------------------------------------------------------------
8006 # if we fall through to here,
8007 # look at the pre-defined hash tables for the two tokens, and
8008 # if (they are equal) use the common value
8009 # if (either is zero or undef) use the other
8010 # if (either is -1) use it
8024 if ( !defined($ws) ) {
8025 my $wl = $want_left_space{$type};
8026 my $wr = $want_right_space{$last_type};
8027 if ( !defined($wl) ) { $wl = 0 }
8028 if ( !defined($wr) ) { $wr = 0 }
8029 $ws = ( ( $wl == $wr ) || ( $wl == -1 ) || !$wr ) ? $wl : $wr;
8032 if ( !defined($ws) ) {
8035 "WS flag is undefined for tokens $last_token $token\n");
8038 # Treat newline as a whitespace. Otherwise, we might combine
8039 # 'Send' and '-recipients' here according to the above rules:
8040 # my $msg = new Fax::Send
8041 # -recipients => $to,
8043 if ( $ws == 0 && $j == 0 ) { $ws = 1 }
8048 && ( $last_type !~ /^[Zh]$/ ) )
8051 # If this happens, we have a non-fatal but undesirable
8052 # hole in the above rules which should be patched.
8054 "WS flag is zero for tokens $last_token $token\n");
8056 $white_space_flag[$j] = $ws;
8058 FORMATTER_DEBUG_FLAG_WHITE && do {
8059 my $str = substr( $last_token, 0, 15 );
8060 $str .= ' ' x ( 16 - length($str) );
8061 if ( !defined($ws_1) ) { $ws_1 = "*" }
8062 if ( !defined($ws_2) ) { $ws_2 = "*" }
8063 if ( !defined($ws_3) ) { $ws_3 = "*" }
8064 if ( !defined($ws_4) ) { $ws_4 = "*" }
8066 "WHITE: i=$j $str $last_type $type $ws_1 : $ws_2 : $ws_3 : $ws_4 : $ws \n";
8069 return \@white_space_flag;
8072 { # begin print_line_of_tokens
8079 my $rcontainer_type;
8080 my $rcontainer_environment;
8083 my $rnesting_tokens;
8085 my $rnesting_blocks;
8088 my $python_indentation_level;
8090 # These local token variables are stored by store_token_to_go:
8093 my $container_environment;
8095 my $in_continued_quote;
8098 my $no_internal_newlines;
8104 # routine to pull the jth token from the line of tokens
8107 $token = $$rtokens[$j];
8108 $type = $$rtoken_type[$j];
8109 $block_type = $$rblock_type[$j];
8110 $container_type = $$rcontainer_type[$j];
8111 $container_environment = $$rcontainer_environment[$j];
8112 $type_sequence = $$rtype_sequence[$j];
8113 $level = $$rlevels[$j];
8114 $slevel = $$rslevels[$j];
8115 $nesting_blocks = $$rnesting_blocks[$j];
8116 $ci_level = $$rci_levels[$j];
8122 sub save_current_token {
8125 $block_type, $ci_level,
8126 $container_environment, $container_type,
8127 $in_continued_quote, $level,
8128 $nesting_blocks, $no_internal_newlines,
8130 $type, $type_sequence,
8134 sub restore_current_token {
8136 $block_type, $ci_level,
8137 $container_environment, $container_type,
8138 $in_continued_quote, $level,
8139 $nesting_blocks, $no_internal_newlines,
8141 $type, $type_sequence,
8146 # Routine to place the current token into the output stream.
8147 # Called once per output token.
8148 sub store_token_to_go {
8150 my $flag = $no_internal_newlines;
8151 if ( $_[0] ) { $flag = 1 }
8153 $tokens_to_go[ ++$max_index_to_go ] = $token;
8154 $types_to_go[$max_index_to_go] = $type;
8155 $nobreak_to_go[$max_index_to_go] = $flag;
8156 $old_breakpoint_to_go[$max_index_to_go] = 0;
8157 $forced_breakpoint_to_go[$max_index_to_go] = 0;
8158 $block_type_to_go[$max_index_to_go] = $block_type;
8159 $type_sequence_to_go[$max_index_to_go] = $type_sequence;
8160 $container_environment_to_go[$max_index_to_go] = $container_environment;
8161 $nesting_blocks_to_go[$max_index_to_go] = $nesting_blocks;
8162 $ci_levels_to_go[$max_index_to_go] = $ci_level;
8163 $mate_index_to_go[$max_index_to_go] = -1;
8164 $matching_token_to_go[$max_index_to_go] = '';
8166 # Note: negative levels are currently retained as a diagnostic so that
8167 # the 'final indentation level' is correctly reported for bad scripts.
8168 # But this means that every use of $level as an index must be checked.
8169 # If this becomes too much of a problem, we might give up and just clip
8171 ## $levels_to_go[$max_index_to_go] = ( $level > 0 ) ? $level : 0;
8172 $levels_to_go[$max_index_to_go] = $level;
8173 $nesting_depth_to_go[$max_index_to_go] = ( $slevel >= 0 ) ? $slevel : 0;
8174 $lengths_to_go[ $max_index_to_go + 1 ] =
8175 $lengths_to_go[$max_index_to_go] + length($token);
8177 # Define the indentation that this token would have if it started
8178 # a new line. We have to do this now because we need to know this
8179 # when considering one-line blocks.
8180 set_leading_whitespace( $level, $ci_level, $in_continued_quote );
8182 if ( $type ne 'b' ) {
8183 $last_last_nonblank_index_to_go = $last_nonblank_index_to_go;
8184 $last_last_nonblank_type_to_go = $last_nonblank_type_to_go;
8185 $last_last_nonblank_token_to_go = $last_nonblank_token_to_go;
8186 $last_nonblank_index_to_go = $max_index_to_go;
8187 $last_nonblank_type_to_go = $type;
8188 $last_nonblank_token_to_go = $token;
8189 if ( $type eq ',' ) {
8190 $comma_count_in_batch++;
8194 FORMATTER_DEBUG_FLAG_STORE && do {
8195 my ( $a, $b, $c ) = caller();
8197 "STORE: from $a $c: storing token $token type $type lev=$level slev=$slevel at $max_index_to_go\n";
8201 sub insert_new_token_to_go {
8203 # insert a new token into the output stream. use same level as
8204 # previous token; assumes a character at max_index_to_go.
8205 save_current_token();
8206 ( $token, $type, $slevel, $no_internal_newlines ) = @_;
8208 if ( $max_index_to_go == UNDEFINED_INDEX ) {
8209 warning("code bug: bad call to insert_new_token_to_go\n");
8211 $level = $levels_to_go[$max_index_to_go];
8213 # FIXME: it seems to be necessary to use the next, rather than
8214 # previous, value of this variable when creating a new blank (align.t)
8215 #my $slevel = $nesting_depth_to_go[$max_index_to_go];
8216 $nesting_blocks = $nesting_blocks_to_go[$max_index_to_go];
8217 $ci_level = $ci_levels_to_go[$max_index_to_go];
8218 $container_environment = $container_environment_to_go[$max_index_to_go];
8219 $in_continued_quote = 0;
8221 $type_sequence = "";
8222 store_token_to_go();
8223 restore_current_token();
8227 sub print_line_of_tokens {
8229 my $line_of_tokens = shift;
8231 # This routine is called once per input line to process all of
8232 # the tokens on that line. This is the first stage of
8235 # Full-line comments and blank lines may be processed immediately.
8237 # For normal lines of code, the tokens are stored one-by-one,
8238 # via calls to 'sub store_token_to_go', until a known line break
8239 # point is reached. Then, the batch of collected tokens is
8240 # passed along to 'sub output_line_to_go' for further
8241 # processing. This routine decides if there should be
8242 # whitespace between each pair of non-white tokens, so later
8243 # routines only need to decide on any additional line breaks.
8244 # Any whitespace is initally a single space character. Later,
8245 # the vertical aligner may expand that to be multiple space
8246 # characters if necessary for alignment.
8248 # extract input line number for error messages
8249 $input_line_number = $line_of_tokens->{_line_number};
8251 $rtoken_type = $line_of_tokens->{_rtoken_type};
8252 $rtokens = $line_of_tokens->{_rtokens};
8253 $rlevels = $line_of_tokens->{_rlevels};
8254 $rslevels = $line_of_tokens->{_rslevels};
8255 $rblock_type = $line_of_tokens->{_rblock_type};
8256 $rcontainer_type = $line_of_tokens->{_rcontainer_type};
8257 $rcontainer_environment = $line_of_tokens->{_rcontainer_environment};
8258 $rtype_sequence = $line_of_tokens->{_rtype_sequence};
8259 $input_line = $line_of_tokens->{_line_text};
8260 $rnesting_tokens = $line_of_tokens->{_rnesting_tokens};
8261 $rci_levels = $line_of_tokens->{_rci_levels};
8262 $rnesting_blocks = $line_of_tokens->{_rnesting_blocks};
8264 $in_continued_quote = $starting_in_quote =
8265 $line_of_tokens->{_starting_in_quote};
8266 $in_quote = $line_of_tokens->{_ending_in_quote};
8267 $ending_in_quote = $in_quote;
8268 $python_indentation_level =
8269 $line_of_tokens->{_python_indentation_level};
8274 my $next_nonblank_token;
8275 my $next_nonblank_token_type;
8276 my $rwhite_space_flag;
8278 $jmax = @$rtokens - 1;
8280 $container_type = "";
8281 $container_environment = "";
8282 $type_sequence = "";
8283 $no_internal_newlines = 1 - $rOpts_add_newlines;
8284 $is_static_block_comment = 0;
8286 # Handle a continued quote..
8287 if ($in_continued_quote) {
8289 # A line which is entirely a quote or pattern must go out
8290 # verbatim. Note: the \n is contained in $input_line.
8292 if ( ( $input_line =~ "\t" ) ) {
8293 note_embedded_tab();
8295 write_unindented_line("$input_line");
8296 $last_line_had_side_comment = 0;
8300 # prior to version 20010406, perltidy had a bug which placed
8301 # continuation indentation before the last line of some multiline
8302 # quotes and patterns -- exactly the lines passing this way.
8303 # To help find affected lines in scripts run with these
8304 # versions, run with '-chk', and it will warn of any quotes or
8305 # patterns which might have been modified by these early
8307 if ( $rOpts->{'check-multiline-quotes'} && $input_line =~ /^ / ) {
8309 "-chk: please check this line for extra leading whitespace\n"
8314 # Write line verbatim if we are in a formatting skip section
8315 if ($in_format_skipping_section) {
8316 write_unindented_line("$input_line");
8317 $last_line_had_side_comment = 0;
8319 # Note: extra space appended to comment simplifies pattern matching
8321 && $$rtoken_type[0] eq '#'
8322 && ( $$rtokens[0] . " " ) =~ /$format_skipping_pattern_end/o )
8324 $in_format_skipping_section = 0;
8325 write_logfile_entry("Exiting formatting skip section\n");
8330 # See if we are entering a formatting skip section
8331 if ( $rOpts_format_skipping
8333 && $$rtoken_type[0] eq '#'
8334 && ( $$rtokens[0] . " " ) =~ /$format_skipping_pattern_begin/o )
8337 $in_format_skipping_section = 1;
8338 write_logfile_entry("Entering formatting skip section\n");
8339 write_unindented_line("$input_line");
8340 $last_line_had_side_comment = 0;
8344 # delete trailing blank tokens
8345 if ( $jmax > 0 && $$rtoken_type[$jmax] eq 'b' ) { $jmax-- }
8347 # Handle a blank line..
8350 # For the 'swallow-optional-blank-lines' option, we delete all
8351 # old blank lines and let the blank line rules generate any
8353 if ( !$rOpts_swallow_optional_blank_lines ) {
8355 $file_writer_object->write_blank_code_line();
8356 $last_line_leading_type = 'b';
8358 $last_line_had_side_comment = 0;
8362 # see if this is a static block comment (starts with ## by default)
8363 my $is_static_block_comment_without_leading_space = 0;
8365 && $$rtoken_type[0] eq '#'
8366 && $rOpts->{'static-block-comments'}
8367 && $input_line =~ /$static_block_comment_pattern/o )
8369 $is_static_block_comment = 1;
8370 $is_static_block_comment_without_leading_space =
8371 substr( $input_line, 0, 1 ) eq '#';
8374 # Check for comments which are line directives
8375 # Treat exactly as static block comments without leading space
8376 # reference: perlsyn, near end, section Plain Old Comments (Not!)
8377 # example: '# line 42 "new_filename.plx"'
8380 && $$rtoken_type[0] eq '#'
8381 && $input_line =~ /^\# \s*
8383 (?:\s("?)([^"]+)\2)? \s*
8387 $is_static_block_comment = 1;
8388 $is_static_block_comment_without_leading_space = 1;
8391 # create a hanging side comment if appropriate
8394 && $$rtoken_type[0] eq '#' # only token is a comment
8395 && $last_line_had_side_comment # last line had side comment
8396 && $input_line =~ /^\s/ # there is some leading space
8397 && !$is_static_block_comment # do not make static comment hanging
8398 && $rOpts->{'hanging-side-comments'} # user is allowing this
8402 # We will insert an empty qw string at the start of the token list
8403 # to force this comment to be a side comment. The vertical aligner
8404 # should then line it up with the previous side comment.
8405 unshift @$rtoken_type, 'q';
8406 unshift @$rtokens, '';
8407 unshift @$rlevels, $$rlevels[0];
8408 unshift @$rslevels, $$rslevels[0];
8409 unshift @$rblock_type, '';
8410 unshift @$rcontainer_type, '';
8411 unshift @$rcontainer_environment, '';
8412 unshift @$rtype_sequence, '';
8413 unshift @$rnesting_tokens, $$rnesting_tokens[0];
8414 unshift @$rci_levels, $$rci_levels[0];
8415 unshift @$rnesting_blocks, $$rnesting_blocks[0];
8419 # remember if this line has a side comment
8420 $last_line_had_side_comment =
8421 ( $jmax > 0 && $$rtoken_type[$jmax] eq '#' );
8423 # Handle a block (full-line) comment..
8424 if ( ( $jmax == 0 ) && ( $$rtoken_type[0] eq '#' ) ) {
8426 if ( $rOpts->{'delete-block-comments'} ) { return }
8428 if ( $rOpts->{'tee-block-comments'} ) {
8429 $file_writer_object->tee_on();
8432 destroy_one_line_block();
8433 output_line_to_go();
8435 # output a blank line before block comments
8437 $last_line_leading_type !~ /^[#b]$/
8438 && $rOpts->{'blanks-before-comments'} # only if allowed
8440 $is_static_block_comment # never before static block comments
8443 flush(); # switching to new output stream
8444 $file_writer_object->write_blank_code_line();
8445 $last_line_leading_type = 'b';
8448 # TRIM COMMENTS -- This could be turned off as a option
8449 $$rtokens[0] =~ s/\s*$//; # trim right end
8452 $rOpts->{'indent-block-comments'}
8453 && ( !$rOpts->{'indent-spaced-block-comments'}
8454 || $input_line =~ /^\s+/ )
8455 && !$is_static_block_comment_without_leading_space
8459 store_token_to_go();
8460 output_line_to_go();
8463 flush(); # switching to new output stream
8464 $file_writer_object->write_code_line( $$rtokens[0] . "\n" );
8465 $last_line_leading_type = '#';
8467 if ( $rOpts->{'tee-block-comments'} ) {
8468 $file_writer_object->tee_off();
8473 # compare input/output indentation except for continuation lines
8474 # (because they have an unknown amount of initial blank space)
8475 # and lines which are quotes (because they may have been outdented)
8476 # Note: this test is placed here because we know the continuation flag
8477 # at this point, which allows us to avoid non-meaningful checks.
8478 my $structural_indentation_level = $$rlevels[0];
8479 compare_indentation_levels( $python_indentation_level,
8480 $structural_indentation_level )
8481 unless ( $python_indentation_level < 0
8482 || ( $$rci_levels[0] > 0 )
8483 || ( ( $python_indentation_level == 0 ) && $$rtoken_type[0] eq 'Q' )
8486 # Patch needed for MakeMaker. Do not break a statement
8487 # in which $VERSION may be calculated. See MakeMaker.pm;
8488 # this is based on the coding in it.
8489 # The first line of a file that matches this will be eval'd:
8490 # /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
8492 # *VERSION = \'1.01';
8493 # ( $VERSION ) = '$Revision: 1.64 $ ' =~ /\$Revision:\s+([^\s]+)/;
8494 # We will pass such a line straight through without breaking
8495 # it unless -npvl is used
8497 my $is_VERSION_statement = 0;
8500 !$saw_VERSION_in_this_file
8501 && $input_line =~ /VERSION/ # quick check to reject most lines
8502 && $input_line =~ /([\$*])(([\w\:\']*)\bVERSION)\b.*\=/
8505 $saw_VERSION_in_this_file = 1;
8506 $is_VERSION_statement = 1;
8507 write_logfile_entry("passing VERSION line; -npvl deactivates\n");
8508 $no_internal_newlines = 1;
8511 # take care of indentation-only
8512 # NOTE: In previous versions we sent all qw lines out immediately here.
8513 # No longer doing this: also write a line which is entirely a 'qw' list
8514 # to allow stacking of opening and closing tokens. Note that interior
8515 # qw lines will still go out at the end of this routine.
8516 if ( $rOpts->{'indent-only'} ) {
8521 $token = $input_line;
8524 $container_type = "";
8525 $container_environment = "";
8526 $type_sequence = "";
8527 store_token_to_go();
8528 output_line_to_go();
8532 push( @$rtokens, ' ', ' ' ); # making $j+2 valid simplifies coding
8533 push( @$rtoken_type, 'b', 'b' );
8534 ($rwhite_space_flag) =
8535 set_white_space_flag( $jmax, $rtokens, $rtoken_type, $rblock_type );
8537 # find input tabbing to allow checks for tabbing disagreement
8539 ##$input_line_tabbing = "";
8540 ##if ( $input_line =~ /^(\s*)/ ) { $input_line_tabbing = $1; }
8542 # if the buffer hasn't been flushed, add a leading space if
8543 # necessary to keep essential whitespace. This is really only
8544 # necessary if we are squeezing out all ws.
8545 if ( $max_index_to_go >= 0 ) {
8547 $old_line_count_in_batch++;
8550 is_essential_whitespace(
8551 $last_last_nonblank_token,
8552 $last_last_nonblank_type,
8553 $tokens_to_go[$max_index_to_go],
8554 $types_to_go[$max_index_to_go],
8560 my $slevel = $$rslevels[0];
8561 insert_new_token_to_go( ' ', 'b', $slevel,
8562 $no_internal_newlines );
8566 # If we just saw the end of an elsif block, write nag message
8567 # if we do not see another elseif or an else.
8568 if ($looking_for_else) {
8570 unless ( $$rtokens[0] =~ /^(elsif|else)$/ ) {
8571 write_logfile_entry("(No else block)\n");
8573 $looking_for_else = 0;
8576 # This is a good place to kill incomplete one-line blocks
8577 if ( ( $semicolons_before_block_self_destruct == 0 )
8578 && ( $max_index_to_go >= 0 )
8579 && ( $types_to_go[$max_index_to_go] eq ';' )
8580 && ( $$rtokens[0] ne '}' ) )
8582 destroy_one_line_block();
8583 output_line_to_go();
8586 # loop to process the tokens one-by-one
8590 foreach $j ( 0 .. $jmax ) {
8592 # pull out the local values for this token
8595 if ( $type eq '#' ) {
8597 # trim trailing whitespace
8598 # (there is no option at present to prevent this)
8602 $rOpts->{'delete-side-comments'}
8604 # delete closing side comments if necessary
8605 || ( $rOpts->{'delete-closing-side-comments'}
8606 && $token =~ /$closing_side_comment_prefix_pattern/o
8607 && $last_nonblank_block_type =~
8608 /$closing_side_comment_list_pattern/o )
8611 if ( $types_to_go[$max_index_to_go] eq 'b' ) {
8612 unstore_token_to_go();
8618 # If we are continuing after seeing a right curly brace, flush
8619 # buffer unless we see what we are looking for, as in
8621 if ( $rbrace_follower && $type ne 'b' ) {
8623 unless ( $rbrace_follower->{$token} ) {
8624 output_line_to_go();
8626 $rbrace_follower = undef;
8629 $j_next = ( $$rtoken_type[ $j + 1 ] eq 'b' ) ? $j + 2 : $j + 1;
8630 $next_nonblank_token = $$rtokens[$j_next];
8631 $next_nonblank_token_type = $$rtoken_type[$j_next];
8633 #--------------------------------------------------------
8634 # Start of section to patch token text
8635 #--------------------------------------------------------
8637 # Modify certain tokens here for whitespace
8638 # The following is not yet done, but could be:
8640 if ( $type =~ /^[wit]$/ ) {
8643 # change '$ var' to '$var' etc
8644 # '-> new' to '->new'
8645 if ( $token =~ /^([\$\&\%\*\@]|\-\>)\s/ ) {
8649 if ( $token =~ /^sub/ ) { $token =~ s/\s+/ /g }
8652 # change 'LABEL :' to 'LABEL:'
8653 elsif ( $type eq 'J' ) { $token =~ s/\s+//g }
8655 # patch to add space to something like "x10"
8656 # This avoids having to split this token in the pre-tokenizer
8657 elsif ( $type eq 'n' ) {
8658 if ( $token =~ /^x\d+/ ) { $token =~ s/x/x / }
8661 elsif ( $type eq 'Q' ) {
8662 note_embedded_tab() if ( $token =~ "\t" );
8664 # make note of something like '$var = s/xxx/yyy/;'
8665 # in case it should have been '$var =~ s/xxx/yyy/;'
8667 $token =~ /^(s|tr|y|m|\/)/
8668 && $last_nonblank_token =~ /^(=|==|!=)$/
8670 # precededed by simple scalar
8671 && $last_last_nonblank_type eq 'i'
8672 && $last_last_nonblank_token =~ /^\$/
8674 # followed by some kind of termination
8675 # (but give complaint if we can's see far enough ahead)
8676 && $next_nonblank_token =~ /^[; \)\}]$/
8678 # scalar is not decleared
8680 $types_to_go[0] eq 'k'
8681 && $tokens_to_go[0] =~ /^(my|our|local)$/
8685 my $guess = substr( $last_nonblank_token, 0, 1 ) . '~';
8687 "Note: be sure you want '$last_nonblank_token' instead of '$guess' here\n"
8692 # trim blanks from right of qw quotes
8693 # (To avoid trimming qw quotes use -ntqw; the tokenizer handles this)
8694 elsif ( $type eq 'q' ) {
8696 note_embedded_tab() if ( $token =~ "\t" );
8699 #--------------------------------------------------------
8700 # End of section to patch token text
8701 #--------------------------------------------------------
8703 # insert any needed whitespace
8704 if ( ( $type ne 'b' )
8705 && ( $max_index_to_go >= 0 )
8706 && ( $types_to_go[$max_index_to_go] ne 'b' )
8707 && $rOpts_add_whitespace )
8709 my $ws = $$rwhite_space_flag[$j];
8712 insert_new_token_to_go( ' ', 'b', $slevel,
8713 $no_internal_newlines );
8717 # Do not allow breaks which would promote a side comment to a
8718 # block comment. In order to allow a break before an opening
8719 # or closing BLOCK, followed by a side comment, those sections
8720 # of code will handle this flag separately.
8721 my $side_comment_follows = ( $next_nonblank_token_type eq '#' );
8722 my $is_opening_BLOCK =
8726 && $block_type ne 't' );
8727 my $is_closing_BLOCK =
8731 && $block_type ne 't' );
8733 if ( $side_comment_follows
8734 && !$is_opening_BLOCK
8735 && !$is_closing_BLOCK )
8737 $no_internal_newlines = 1;
8740 # We're only going to handle breaking for code BLOCKS at this
8741 # (top) level. Other indentation breaks will be handled by
8742 # sub scan_list, which is better suited to dealing with them.
8743 if ($is_opening_BLOCK) {
8745 # Tentatively output this token. This is required before
8746 # calling starting_one_line_block. We may have to unstore
8747 # it, though, if we have to break before it.
8748 store_token_to_go($side_comment_follows);
8750 # Look ahead to see if we might form a one-line block
8752 starting_one_line_block( $j, $jmax, $level, $slevel,
8753 $ci_level, $rtokens, $rtoken_type, $rblock_type );
8754 clear_breakpoint_undo_stack();
8756 # to simplify the logic below, set a flag to indicate if
8757 # this opening brace is far from the keyword which introduces it
8758 my $keyword_on_same_line = 1;
8759 if ( ( $max_index_to_go >= 0 )
8760 && ( $last_nonblank_type eq ')' ) )
8762 if ( $block_type =~ /^(if|else|elsif)$/
8763 && ( $tokens_to_go[0] eq '}' )
8764 && $rOpts_cuddled_else )
8766 $keyword_on_same_line = 1;
8768 elsif ( ( $slevel < $nesting_depth_to_go[0] ) || $too_long )
8770 $keyword_on_same_line = 0;
8774 # decide if user requested break before '{'
8777 # use -bl flag if not a sub block of any type
8778 $block_type !~ /^sub/
8779 ? $rOpts->{'opening-brace-on-new-line'}
8781 # use -sbl flag unless this is an anonymous sub block
8782 : $block_type !~ /^sub\W*$/
8783 ? $rOpts->{'opening-sub-brace-on-new-line'}
8785 # do not break for anonymous subs
8788 # Break before an opening '{' ...
8794 # and we were unable to start looking for a block,
8795 && $index_start_one_line_block == UNDEFINED_INDEX
8797 # or if it will not be on same line as its keyword, so that
8798 # it will be outdented (eval.t, overload.t), and the user
8799 # has not insisted on keeping it on the right
8800 || ( !$keyword_on_same_line
8801 && !$rOpts->{'opening-brace-always-on-right'} )
8806 # but only if allowed
8807 unless ($no_internal_newlines) {
8809 # since we already stored this token, we must unstore it
8810 unstore_token_to_go();
8812 # then output the line
8813 output_line_to_go();
8815 # and now store this token at the start of a new line
8816 store_token_to_go($side_comment_follows);
8820 # Now update for side comment
8821 if ($side_comment_follows) { $no_internal_newlines = 1 }
8823 # now output this line
8824 unless ($no_internal_newlines) {
8825 output_line_to_go();
8829 elsif ($is_closing_BLOCK) {
8831 # If there is a pending one-line block ..
8832 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
8834 # we have to terminate it if..
8837 # it is too long (final length may be different from
8838 # initial estimate). note: must allow 1 space for this token
8839 excess_line_length( $index_start_one_line_block,
8840 $max_index_to_go ) >= 0
8842 # or if it has too many semicolons
8843 || ( $semicolons_before_block_self_destruct == 0
8844 && $last_nonblank_type ne ';' )
8847 destroy_one_line_block();
8851 # put a break before this closing curly brace if appropriate
8852 unless ( $no_internal_newlines
8853 || $index_start_one_line_block != UNDEFINED_INDEX )
8856 # add missing semicolon if ...
8857 # there are some tokens
8859 ( $max_index_to_go > 0 )
8861 # and we don't have one
8862 && ( $last_nonblank_type ne ';' )
8864 # patch until some block type issues are fixed:
8865 # Do not add semi-colon for block types '{',
8866 # '}', and ';' because we cannot be sure yet
8867 # that this is a block and not an anonomyous
8868 # hash (blktype.t, blktype1.t)
8869 && ( $block_type !~ /^[\{\};]$/ )
8871 # it seems best not to add semicolons in these
8872 # special block types: sort|map|grep
8873 && ( !$is_sort_map_grep{$block_type} )
8875 # and we are allowed to do so.
8876 && $rOpts->{'add-semicolons'}
8880 save_current_token();
8883 $level = $levels_to_go[$max_index_to_go];
8884 $slevel = $nesting_depth_to_go[$max_index_to_go];
8886 $nesting_blocks_to_go[$max_index_to_go];
8887 $ci_level = $ci_levels_to_go[$max_index_to_go];
8889 $container_type = "";
8890 $container_environment = "";
8891 $type_sequence = "";
8893 # Note - we remove any blank AFTER extracting its
8894 # parameters such as level, etc, above
8895 if ( $types_to_go[$max_index_to_go] eq 'b' ) {
8896 unstore_token_to_go();
8898 store_token_to_go();
8900 note_added_semicolon();
8901 restore_current_token();
8904 # then write out everything before this closing curly brace
8905 output_line_to_go();
8909 # Now update for side comment
8910 if ($side_comment_follows) { $no_internal_newlines = 1 }
8912 # store the closing curly brace
8913 store_token_to_go();
8915 # ok, we just stored a closing curly brace. Often, but
8916 # not always, we want to end the line immediately.
8917 # So now we have to check for special cases.
8919 # if this '}' successfully ends a one-line block..
8920 my $is_one_line_block = 0;
8922 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
8924 # Remember the type of token just before the
8925 # opening brace. It would be more general to use
8926 # a stack, but this will work for one-line blocks.
8927 $is_one_line_block =
8928 $types_to_go[$index_start_one_line_block];
8930 # we have to actually make it by removing tentative
8931 # breaks that were set within it
8932 undo_forced_breakpoint_stack(0);
8933 set_nobreaks( $index_start_one_line_block,
8934 $max_index_to_go - 1 );
8936 # then re-initialize for the next one-line block
8937 destroy_one_line_block();
8939 # then decide if we want to break after the '}' ..
8940 # We will keep going to allow certain brace followers as in:
8941 # do { $ifclosed = 1; last } unless $losing;
8943 # But make a line break if the curly ends a
8944 # significant block:
8946 $is_block_without_semicolon{$block_type}
8948 # if needless semicolon follows we handle it later
8949 && $next_nonblank_token ne ';'
8952 output_line_to_go() unless ($no_internal_newlines);
8956 # set string indicating what we need to look for brace follower
8958 if ( $block_type eq 'do' ) {
8959 $rbrace_follower = \%is_do_follower;
8961 elsif ( $block_type =~ /^(if|elsif|unless)$/ ) {
8962 $rbrace_follower = \%is_if_brace_follower;
8964 elsif ( $block_type eq 'else' ) {
8965 $rbrace_follower = \%is_else_brace_follower;
8968 # added eval for borris.t
8969 elsif ($is_sort_map_grep_eval{$block_type}
8970 || $is_one_line_block eq 'G' )
8972 $rbrace_follower = undef;
8977 elsif ( $block_type =~ /^sub\W*$/ ) {
8979 if ($is_one_line_block) {
8980 $rbrace_follower = \%is_anon_sub_1_brace_follower;
8983 $rbrace_follower = \%is_anon_sub_brace_follower;
8987 # None of the above: specify what can follow a closing
8988 # brace of a block which is not an
8989 # if/elsif/else/do/sort/map/grep/eval
8991 # 'Toolbar.pm', 'Menubar.pm', bless.t, '3rules.pl', 'break1.t
8993 $rbrace_follower = \%is_other_brace_follower;
8996 # See if an elsif block is followed by another elsif or else;
8998 if ( $block_type eq 'elsif' ) {
9000 if ( $next_nonblank_token_type eq 'b' ) { # end of line?
9001 $looking_for_else = 1; # ok, check on next line
9005 unless ( $next_nonblank_token =~ /^(elsif|else)$/ ) {
9006 write_logfile_entry("No else block :(\n");
9011 # keep going after certain block types (map,sort,grep,eval)
9012 # added eval for borris.t
9018 # if no more tokens, postpone decision until re-entring
9019 elsif ( ( $next_nonblank_token_type eq 'b' )
9020 && $rOpts_add_newlines )
9022 unless ($rbrace_follower) {
9023 output_line_to_go() unless ($no_internal_newlines);
9027 elsif ($rbrace_follower) {
9029 unless ( $rbrace_follower->{$next_nonblank_token} ) {
9030 output_line_to_go() unless ($no_internal_newlines);
9032 $rbrace_follower = undef;
9036 output_line_to_go() unless ($no_internal_newlines);
9039 } # end treatment of closing block token
9042 elsif ( $type eq ';' ) {
9044 # kill one-line blocks with too many semicolons
9045 $semicolons_before_block_self_destruct--;
9047 ( $semicolons_before_block_self_destruct < 0 )
9048 || ( $semicolons_before_block_self_destruct == 0
9049 && $next_nonblank_token_type !~ /^[b\}]$/ )
9052 destroy_one_line_block();
9055 # Remove unnecessary semicolons, but not after bare
9056 # blocks, where it could be unsafe if the brace is
9060 $last_nonblank_token eq '}'
9062 $is_block_without_semicolon{
9063 $last_nonblank_block_type}
9064 || $last_nonblank_block_type =~ /^sub\s+\w/
9065 || $last_nonblank_block_type =~ /^\w+:$/ )
9067 || $last_nonblank_type eq ';'
9072 $rOpts->{'delete-semicolons'}
9074 # don't delete ; before a # because it would promote it
9075 # to a block comment
9076 && ( $next_nonblank_token_type ne '#' )
9079 note_deleted_semicolon();
9081 unless ( $no_internal_newlines
9082 || $index_start_one_line_block != UNDEFINED_INDEX );
9086 write_logfile_entry("Extra ';'\n");
9089 store_token_to_go();
9092 unless ( $no_internal_newlines
9093 || ( $next_nonblank_token eq '}' ) );
9097 # handle here_doc target string
9098 elsif ( $type eq 'h' ) {
9099 $no_internal_newlines =
9100 1; # no newlines after seeing here-target
9101 destroy_one_line_block();
9102 store_token_to_go();
9105 # handle all other token types
9108 # if this is a blank...
9109 if ( $type eq 'b' ) {
9111 # make it just one character
9112 $token = ' ' if $rOpts_add_whitespace;
9114 # delete it if unwanted by whitespace rules
9115 # or we are deleting all whitespace
9116 my $ws = $$rwhite_space_flag[ $j + 1 ];
9117 if ( ( defined($ws) && $ws == -1 )
9118 || $rOpts_delete_old_whitespace )
9121 # unless it might make a syntax error
9123 unless is_essential_whitespace(
9124 $last_last_nonblank_token,
9125 $last_last_nonblank_type,
9126 $tokens_to_go[$max_index_to_go],
9127 $types_to_go[$max_index_to_go],
9128 $$rtokens[ $j + 1 ],
9129 $$rtoken_type[ $j + 1 ]
9133 store_token_to_go();
9136 # remember two previous nonblank OUTPUT tokens
9137 if ( $type ne '#' && $type ne 'b' ) {
9138 $last_last_nonblank_token = $last_nonblank_token;
9139 $last_last_nonblank_type = $last_nonblank_type;
9140 $last_nonblank_token = $token;
9141 $last_nonblank_type = $type;
9142 $last_nonblank_block_type = $block_type;
9145 # unset the continued-quote flag since it only applies to the
9146 # first token, and we want to resume normal formatting if
9147 # there are additional tokens on the line
9148 $in_continued_quote = 0;
9150 } # end of loop over all tokens in this 'line_of_tokens'
9152 # we have to flush ..
9155 # if there is a side comment
9156 ( ( $type eq '#' ) && !$rOpts->{'delete-side-comments'} )
9158 # if this line ends in a quote
9159 # NOTE: This is critically important for insuring that quoted lines
9160 # do not get processed by things like -sot and -sct
9163 # if this is a VERSION statement
9164 || $is_VERSION_statement
9166 # to keep a label on one line if that is how it is now
9167 || ( ( $type eq 'J' ) && ( $max_index_to_go == 0 ) )
9169 # if we are instructed to keep all old line breaks
9170 || !$rOpts->{'delete-old-newlines'}
9173 destroy_one_line_block();
9174 output_line_to_go();
9177 # mark old line breakpoints in current output stream
9178 if ( $max_index_to_go >= 0 && !$rOpts_ignore_old_breakpoints ) {
9179 $old_breakpoint_to_go[$max_index_to_go] = 1;
9181 } # end sub print_line_of_tokens
9182 } # end print_line_of_tokens
9184 # sub output_line_to_go sends one logical line of tokens on down the
9185 # pipeline to the VerticalAligner package, breaking the line into continuation
9186 # lines as necessary. The line of tokens is ready to go in the "to_go"
9188 sub output_line_to_go {
9190 # debug stuff; this routine can be called from many points
9191 FORMATTER_DEBUG_FLAG_OUTPUT && do {
9192 my ( $a, $b, $c ) = caller;
9194 "OUTPUT: output_line_to_go called: $a $c $last_nonblank_type $last_nonblank_token, one_line=$index_start_one_line_block, tokens to write=$max_index_to_go\n"
9196 my $output_str = join "", @tokens_to_go[ 0 .. $max_index_to_go ];
9197 write_diagnostics("$output_str\n");
9200 # just set a tentative breakpoint if we might be in a one-line block
9201 if ( $index_start_one_line_block != UNDEFINED_INDEX ) {
9202 set_forced_breakpoint($max_index_to_go);
9206 my $cscw_block_comment;
9207 $cscw_block_comment = add_closing_side_comment()
9208 if ( $rOpts->{'closing-side-comments'} && $max_index_to_go >= 0 );
9210 match_opening_and_closing_tokens();
9212 # tell the -lp option we are outputting a batch so it can close
9213 # any unfinished items in its stack
9216 # If this line ends in a code block brace, set breaks at any
9217 # previous closing code block braces to breakup a chain of code
9218 # blocks on one line. This is very rare but can happen for
9219 # user-defined subs. For example we might be looking at this:
9220 # BOOL { $server_data{uptime} > 0; } NUM { $server_data{load}; } STR {
9221 my $saw_good_break = 0; # flag to force breaks even if short line
9224 # looking for opening or closing block brace
9225 $block_type_to_go[$max_index_to_go]
9227 # but not one of these which are never duplicated on a line:
9228 # until|while|for|if|elsif|else
9229 && !$is_block_without_semicolon{ $block_type_to_go[$max_index_to_go] }
9232 my $lev = $nesting_depth_to_go[$max_index_to_go];
9234 # Walk backwards from the end and
9235 # set break at any closing block braces at the same level.
9236 # But quit if we are not in a chain of blocks.
9237 for ( my $i = $max_index_to_go - 1 ; $i >= 0 ; $i-- ) {
9238 last if ( $levels_to_go[$i] < $lev ); # stop at a lower level
9239 next if ( $levels_to_go[$i] > $lev ); # skip past higher level
9241 if ( $block_type_to_go[$i] ) {
9242 if ( $tokens_to_go[$i] eq '}' ) {
9243 set_forced_breakpoint($i);
9244 $saw_good_break = 1;
9248 # quit if we see anything besides words, function, blanks
9250 elsif ( $types_to_go[$i] !~ /^[\(\)Gwib]$/ ) { last }
9255 my $imax = $max_index_to_go;
9257 # trim any blank tokens
9258 if ( $max_index_to_go >= 0 ) {
9259 if ( $types_to_go[$imin] eq 'b' ) { $imin++ }
9260 if ( $types_to_go[$imax] eq 'b' ) { $imax-- }
9263 # anything left to write?
9264 if ( $imin <= $imax ) {
9266 # add a blank line before certain key types
9267 if ( $last_line_leading_type !~ /^[#b]/ ) {
9269 my $leading_token = $tokens_to_go[$imin];
9270 my $leading_type = $types_to_go[$imin];
9272 # blank lines before subs except declarations and one-liners
9273 # MCONVERSION LOCATION - for sub tokenization change
9274 if ( $leading_token =~ /^(sub\s)/ && $leading_type eq 'i' ) {
9275 $want_blank = ( $rOpts->{'blanks-before-subs'} )
9277 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9278 $imax ) !~ /^[\;\}]$/
9282 # break before all package declarations
9283 # MCONVERSION LOCATION - for tokenizaton change
9284 elsif ($leading_token =~ /^(package\s)/
9285 && $leading_type eq 'i' )
9287 $want_blank = ( $rOpts->{'blanks-before-subs'} );
9290 # break before certain key blocks except one-liners
9291 if ( $leading_token =~ /^(BEGIN|END)$/ && $leading_type eq 'k' ) {
9292 $want_blank = ( $rOpts->{'blanks-before-subs'} )
9294 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9299 # Break before certain block types if we haven't had a
9300 # break at this level for a while. This is the
9301 # difficult decision..
9302 elsif ($leading_token =~ /^(unless|if|while|until|for|foreach)$/
9303 && $leading_type eq 'k' )
9305 my $lc = $nonblank_lines_at_depth[$last_line_leading_level];
9306 if ( !defined($lc) ) { $lc = 0 }
9308 $want_blank = $rOpts->{'blanks-before-blocks'}
9309 && $lc >= $rOpts->{'long-block-line-count'}
9310 && $file_writer_object->get_consecutive_nonblank_lines() >=
9311 $rOpts->{'long-block-line-count'}
9313 terminal_type( \@types_to_go, \@block_type_to_go, $imin,
9320 # future: send blank line down normal path to VerticalAligner
9321 Perl::Tidy::VerticalAligner::flush();
9322 $file_writer_object->write_blank_code_line();
9326 # update blank line variables and count number of consecutive
9327 # non-blank, non-comment lines at this level
9328 $last_last_line_leading_level = $last_line_leading_level;
9329 $last_line_leading_level = $levels_to_go[$imin];
9330 if ( $last_line_leading_level < 0 ) { $last_line_leading_level = 0 }
9331 $last_line_leading_type = $types_to_go[$imin];
9332 if ( $last_line_leading_level == $last_last_line_leading_level
9333 && $last_line_leading_type ne 'b'
9334 && $last_line_leading_type ne '#'
9335 && defined( $nonblank_lines_at_depth[$last_line_leading_level] ) )
9337 $nonblank_lines_at_depth[$last_line_leading_level]++;
9340 $nonblank_lines_at_depth[$last_line_leading_level] = 1;
9343 FORMATTER_DEBUG_FLAG_FLUSH && do {
9344 my ( $package, $file, $line ) = caller;
9346 "FLUSH: flushing from $package $file $line, types= $types_to_go[$imin] to $types_to_go[$imax]\n";
9349 # add a couple of extra terminal blank tokens
9352 # set all forced breakpoints for good list formatting
9353 my $is_long_line = excess_line_length( $imin, $max_index_to_go ) > 0;
9356 $max_index_to_go > 0
9359 || $old_line_count_in_batch > 1
9360 || is_unbalanced_batch()
9362 $comma_count_in_batch
9363 && ( $rOpts_maximum_fields_per_table > 0
9364 || $rOpts_comma_arrow_breakpoints == 0 )
9369 $saw_good_break ||= scan_list();
9372 # let $ri_first and $ri_last be references to lists of
9373 # first and last tokens of line fragments to output..
9374 my ( $ri_first, $ri_last );
9376 # write a single line if..
9379 # we aren't allowed to add any newlines
9380 !$rOpts_add_newlines
9382 # or, we don't already have an interior breakpoint
9383 # and we didn't see a good breakpoint
9385 !$forced_breakpoint_count
9388 # and this line is 'short'
9393 @$ri_first = ($imin);
9394 @$ri_last = ($imax);
9397 # otherwise use multiple lines
9400 ( $ri_first, $ri_last ) = set_continuation_breaks($saw_good_break);
9402 break_all_chain_tokens( $ri_first, $ri_last );
9404 # now we do a correction step to clean this up a bit
9405 # (The only time we would not do this is for debugging)
9406 if ( $rOpts->{'recombine'} ) {
9407 ( $ri_first, $ri_last ) =
9408 recombine_breakpoints( $ri_first, $ri_last );
9412 # do corrector step if -lp option is used
9414 if ($rOpts_line_up_parentheses) {
9415 $do_not_pad = correct_lp_indentation( $ri_first, $ri_last );
9417 send_lines_to_vertical_aligner( $ri_first, $ri_last, $do_not_pad );
9419 prepare_for_new_input_lines();
9421 # output any new -cscw block comment
9422 if ($cscw_block_comment) {
9424 $file_writer_object->write_code_line( $cscw_block_comment . "\n" );
9428 sub note_added_semicolon {
9429 $last_added_semicolon_at = $input_line_number;
9430 if ( $added_semicolon_count == 0 ) {
9431 $first_added_semicolon_at = $last_added_semicolon_at;
9433 $added_semicolon_count++;
9434 write_logfile_entry("Added ';' here\n");
9437 sub note_deleted_semicolon {
9438 $last_deleted_semicolon_at = $input_line_number;
9439 if ( $deleted_semicolon_count == 0 ) {
9440 $first_deleted_semicolon_at = $last_deleted_semicolon_at;
9442 $deleted_semicolon_count++;
9443 write_logfile_entry("Deleted unnecessary ';'\n"); # i hope ;)
9446 sub note_embedded_tab {
9447 $embedded_tab_count++;
9448 $last_embedded_tab_at = $input_line_number;
9449 if ( !$first_embedded_tab_at ) {
9450 $first_embedded_tab_at = $last_embedded_tab_at;
9453 if ( $embedded_tab_count <= MAX_NAG_MESSAGES ) {
9454 write_logfile_entry("Embedded tabs in quote or pattern\n");
9458 sub starting_one_line_block {
9460 # after seeing an opening curly brace, look for the closing brace
9461 # and see if the entire block will fit on a line. This routine is
9462 # not always right because it uses the old whitespace, so a check
9463 # is made later (at the closing brace) to make sure we really
9464 # have a one-line block. We have to do this preliminary check,
9465 # though, because otherwise we would always break at a semicolon
9466 # within a one-line block if the block contains multiple statements.
9468 my ( $j, $jmax, $level, $slevel, $ci_level, $rtokens, $rtoken_type,
9472 # kill any current block - we can only go 1 deep
9473 destroy_one_line_block();
9476 # 1=distance from start of block to opening brace exceeds line length
9481 # shouldn't happen: there must have been a prior call to
9482 # store_token_to_go to put the opening brace in the output stream
9483 if ( $max_index_to_go < 0 ) {
9484 warning("program bug: store_token_to_go called incorrectly\n");
9485 report_definite_bug();
9489 # cannot use one-line blocks with cuddled else else/elsif lines
9490 if ( ( $tokens_to_go[0] eq '}' ) && $rOpts_cuddled_else ) {
9495 my $block_type = $$rblock_type[$j];
9497 # find the starting keyword for this block (such as 'if', 'else', ...)
9499 if ( $block_type =~ /^[\{\}\;\:]$/ ) {
9500 $i_start = $max_index_to_go;
9503 elsif ( $last_last_nonblank_token_to_go eq ')' ) {
9505 # For something like "if (xxx) {", the keyword "if" will be
9506 # just after the most recent break. This will be 0 unless
9507 # we have just killed a one-line block and are starting another.
9509 $i_start = $index_max_forced_break + 1;
9510 if ( $types_to_go[$i_start] eq 'b' ) {
9514 unless ( $tokens_to_go[$i_start] eq $block_type ) {
9519 # the previous nonblank token should start these block types
9521 ( $last_last_nonblank_token_to_go eq $block_type )
9522 || ( $block_type =~ /^sub/
9523 && $last_last_nonblank_token_to_go =~ /^sub/ )
9526 $i_start = $last_last_nonblank_index_to_go;
9529 # patch for SWITCH/CASE to retain one-line case/when blocks
9530 elsif ( $block_type eq 'case' || $block_type eq 'when' ) {
9531 $i_start = $index_max_forced_break + 1;
9532 if ( $types_to_go[$i_start] eq 'b' ) {
9535 unless ( $tokens_to_go[$i_start] eq $block_type ) {
9544 my $pos = total_line_length( $i_start, $max_index_to_go ) - 1;
9548 # see if length is too long to even start
9549 if ( $pos > $rOpts_maximum_line_length ) {
9553 for ( $i = $j + 1 ; $i <= $jmax ; $i++ ) {
9555 # old whitespace could be arbitrarily large, so don't use it
9556 if ( $$rtoken_type[$i] eq 'b' ) { $pos += 1 }
9557 else { $pos += length( $$rtokens[$i] ) }
9559 # Return false result if we exceed the maximum line length,
9560 if ( $pos > $rOpts_maximum_line_length ) {
9564 # or encounter another opening brace before finding the closing brace.
9565 elsif ($$rtokens[$i] eq '{'
9566 && $$rtoken_type[$i] eq '{'
9567 && $$rblock_type[$i] )
9572 # if we find our closing brace..
9573 elsif ($$rtokens[$i] eq '}'
9574 && $$rtoken_type[$i] eq '}'
9575 && $$rblock_type[$i] )
9578 # be sure any trailing comment also fits on the line
9580 ( $$rtoken_type[ $i + 1 ] eq 'b' ) ? $i + 2 : $i + 1;
9582 if ( $$rtoken_type[$i_nonblank] eq '#' ) {
9583 $pos += length( $$rtokens[$i_nonblank] );
9585 if ( $i_nonblank > $i + 1 ) {
9586 $pos += length( $$rtokens[ $i + 1 ] );
9589 if ( $pos > $rOpts_maximum_line_length ) {
9594 # ok, it's a one-line block
9595 create_one_line_block( $i_start, 20 );
9599 # just keep going for other characters
9604 # Allow certain types of new one-line blocks to form by joining
9605 # input lines. These can be safely done, but for other block types,
9606 # we keep old one-line blocks but do not form new ones. It is not
9607 # always a good idea to make as many one-line blocks as possible,
9608 # so other types are not done. The user can always use -mangle.
9609 if ( $is_sort_map_grep_eval{$block_type} ) {
9610 create_one_line_block( $i_start, 1 );
9616 sub unstore_token_to_go {
9618 # remove most recent token from output stream
9619 if ( $max_index_to_go > 0 ) {
9623 $max_index_to_go = UNDEFINED_INDEX;
9628 sub want_blank_line {
9630 $file_writer_object->want_blank_line();
9633 sub write_unindented_line {
9635 $file_writer_object->write_line( $_[0] );
9640 # If there is a single, long parameter within parens, like this:
9642 # $self->command( "/msg "
9644 # . " You said $1, but did you know that it's square was "
9645 # . $1 * $1 . " ?" );
9647 # we can remove the continuation indentation of the 2nd and higher lines
9648 # to achieve this effect, which is more pleasing:
9650 # $self->command("/msg "
9652 # . " You said $1, but did you know that it's square was "
9653 # . $1 * $1 . " ?");
9655 my ( $line_open, $i_start, $closing_index, $ri_first, $ri_last ) = @_;
9656 my $max_line = @$ri_first - 1;
9658 # must be multiple lines
9659 return unless $max_line > $line_open;
9661 my $lev_start = $levels_to_go[$i_start];
9662 my $ci_start_plus = 1 + $ci_levels_to_go[$i_start];
9664 # see if all additional lines in this container have continuation
9667 my $line_1 = 1 + $line_open;
9668 for ( $n = $line_1 ; $n <= $max_line ; ++$n ) {
9669 my $ibeg = $$ri_first[$n];
9670 my $iend = $$ri_last[$n];
9671 if ( $ibeg eq $closing_index ) { $n--; last }
9672 return if ( $lev_start != $levels_to_go[$ibeg] );
9673 return if ( $ci_start_plus != $ci_levels_to_go[$ibeg] );
9674 last if ( $closing_index <= $iend );
9677 # we can reduce the indentation of all continuation lines
9678 my $continuation_line_count = $n - $line_open;
9679 @ci_levels_to_go[ @$ri_first[ $line_1 .. $n ] ] =
9680 (0) x ($continuation_line_count);
9681 @leading_spaces_to_go[ @$ri_first[ $line_1 .. $n ] ] =
9682 @reduced_spaces_to_go[ @$ri_first[ $line_1 .. $n ] ];
9685 sub set_logical_padding {
9687 # Look at a batch of lines and see if extra padding can improve the
9688 # alignment when there are certain leading operators. Here is an
9689 # example, in which some extra space is introduced before
9690 # '( $year' to make it line up with the subsequent lines:
9692 # if ( ( $Year < 1601 )
9693 # || ( $Year > 2899 )
9694 # || ( $EndYear < 1601 )
9695 # || ( $EndYear > 2899 ) )
9697 # &Error_OutOfRange;
9700 my ( $ri_first, $ri_last ) = @_;
9701 my $max_line = @$ri_first - 1;
9703 my ( $ibeg, $ibeg_next, $ibegm, $iend, $iendm, $ipad, $line, $pad_spaces,
9704 $tok_next, $has_leading_op_next, $has_leading_op );
9706 # looking at each line of this batch..
9707 foreach $line ( 0 .. $max_line - 1 ) {
9709 # see if the next line begins with a logical operator
9710 $ibeg = $$ri_first[$line];
9711 $iend = $$ri_last[$line];
9712 $ibeg_next = $$ri_first[ $line + 1 ];
9713 $tok_next = $tokens_to_go[$ibeg_next];
9714 $has_leading_op_next = $is_chain_operator{$tok_next};
9715 next unless ($has_leading_op_next);
9717 # next line must not be at lesser depth
9719 if ( $nesting_depth_to_go[$ibeg] > $nesting_depth_to_go[$ibeg_next] );
9721 # identify the token in this line to be padded on the left
9724 # handle lines at same depth...
9725 if ( $nesting_depth_to_go[$ibeg] == $nesting_depth_to_go[$ibeg_next] ) {
9727 # if this is not first line of the batch ...
9730 # and we have leading operator
9731 next if $has_leading_op;
9734 # 1. the previous line is at lesser depth, or
9735 # 2. the previous line ends in an assignment
9737 # Example 1: previous line at lesser depth
9738 # if ( ( $Year < 1601 ) # <- we are here but
9739 # || ( $Year > 2899 ) # list has not yet
9740 # || ( $EndYear < 1601 ) # collapsed vertically
9741 # || ( $EndYear > 2899 ) )
9744 # Example 2: previous line ending in assignment:
9746 # $year % 4 ? 0 # <- We are here
9752 $is_assignment{ $types_to_go[$iendm] }
9753 || ( $nesting_depth_to_go[$ibegm] <
9754 $nesting_depth_to_go[$ibeg] )
9757 # we will add padding before the first token
9761 # for first line of the batch..
9764 # WARNING: Never indent if first line is starting in a
9765 # continued quote, which would change the quote.
9766 next if $starting_in_quote;
9768 # if this is text after closing '}'
9769 # then look for an interior token to pad
9770 if ( $types_to_go[$ibeg] eq '}' ) {
9774 # otherwise, we might pad if it looks really good
9777 # we might pad token $ibeg, so be sure that it
9778 # is at the same depth as the next line.
9780 if ( $nesting_depth_to_go[$ibeg] !=
9781 $nesting_depth_to_go[$ibeg_next] );
9783 # We can pad on line 1 of a statement if at least 3
9784 # lines will be aligned. Otherwise, it
9785 # can look very confusing.
9787 # We have to be careful not to pad if there are too few
9788 # lines. The current rule is:
9789 # (1) in general we require at least 3 consecutive lines
9790 # with the same leading chain operator token,
9791 # (2) but an exception is that we only require two lines
9792 # with leading colons if there are no more lines. For example,
9793 # the first $i in the following snippet would get padding
9794 # by the second rule:
9796 # $i == 1 ? ( "First", "Color" )
9797 # : $i == 2 ? ( "Then", "Rarity" )
9798 # : ( "Then", "Name" );
9800 if ( $max_line > 1 ) {
9801 my $leading_token = $tokens_to_go[$ibeg_next];
9804 # never indent line 1 of a '.' series because
9805 # previous line is most likely at same level.
9806 # TODO: we should also look at the leasing_spaces
9807 # of the last output line and skip if it is same
9809 next if ( $leading_token eq '.' );
9812 foreach my $l ( 2 .. 3 ) {
9813 last if ( $line + $l > $max_line );
9814 my $ibeg_next_next = $$ri_first[ $line + $l ];
9815 if ( $tokens_to_go[$ibeg_next_next] ne
9823 next if ($tokens_differ);
9824 next if ( $count < 3 && $leading_token ne ':' );
9834 # find interior token to pad if necessary
9835 if ( !defined($ipad) ) {
9837 for ( my $i = $ibeg ; ( $i < $iend ) && !$ipad ; $i++ ) {
9839 # find any unclosed container
9841 unless ( $type_sequence_to_go[$i]
9842 && $mate_index_to_go[$i] > $iend );
9844 # find next nonblank token to pad
9846 if ( $types_to_go[$ipad] eq 'b' ) {
9848 last if ( $ipad > $iend );
9854 # next line must not be at greater depth
9855 my $iend_next = $$ri_last[ $line + 1 ];
9857 if ( $nesting_depth_to_go[ $iend_next + 1 ] >
9858 $nesting_depth_to_go[$ipad] );
9860 # lines must be somewhat similar to be padded..
9861 my $inext_next = $ibeg_next + 1;
9862 if ( $types_to_go[$inext_next] eq 'b' ) {
9865 my $type = $types_to_go[$ipad];
9867 # see if there are multiple continuation lines
9868 my $logical_continuation_lines = 1;
9869 if ( $line + 2 <= $max_line ) {
9870 my $leading_token = $tokens_to_go[$ibeg_next];
9871 my $ibeg_next_next = $$ri_first[ $line + 2 ];
9872 if ( $tokens_to_go[$ibeg_next_next] eq $leading_token
9873 && $nesting_depth_to_go[$ibeg_next] eq
9874 $nesting_depth_to_go[$ibeg_next_next] )
9876 $logical_continuation_lines++;
9881 # either we have multiple continuation lines to follow
9882 # and we are not padding the first token
9883 ( $logical_continuation_lines > 1 && $ipad > 0 )
9889 $types_to_go[$inext_next] eq $type
9891 # and keywords must match if keyword
9894 && $tokens_to_go[$ipad] ne $tokens_to_go[$inext_next]
9900 #----------------------begin special checks--------------
9903 # A check is needed before we can make the pad.
9904 # If we are in a list with some long items, we want each
9905 # item to stand out. So in the following example, the
9906 # first line begining with '$casefold->' would look good
9907 # padded to align with the next line, but then it
9908 # would be indented more than the last line, so we
9912 # $casefold->{code} eq '0041'
9913 # && $casefold->{status} eq 'C'
9914 # && $casefold->{mapping} eq '0061',
9919 # It would be faster, and almost as good, to use a comma
9920 # count, and not pad if comma_count > 1 and the previous
9921 # line did not end with a comma.
9925 my $ibg = $$ri_first[ $line + 1 ];
9926 my $depth = $nesting_depth_to_go[ $ibg + 1 ];
9928 # just use simplified formula for leading spaces to avoid
9929 # needless sub calls
9930 my $lsp = $levels_to_go[$ibg] + $ci_levels_to_go[$ibg];
9932 # look at each line beyond the next ..
9934 foreach $l ( $line + 2 .. $max_line ) {
9935 my $ibg = $$ri_first[$l];
9937 # quit looking at the end of this container
9939 if ( $nesting_depth_to_go[ $ibg + 1 ] < $depth )
9940 || ( $nesting_depth_to_go[$ibg] < $depth );
9942 # cannot do the pad if a later line would be
9944 if ( $levels_to_go[$ibg] + $ci_levels_to_go[$ibg] < $lsp ) {
9950 # don't pad if we end in a broken list
9951 if ( $l == $max_line ) {
9952 my $i2 = $$ri_last[$l];
9953 if ( $types_to_go[$i2] eq '#' ) {
9954 my $i1 = $$ri_first[$l];
9957 terminal_type( \@types_to_go, \@block_type_to_go, $i1,
9964 # a minus may introduce a quoted variable, and we will
9965 # add the pad only if this line begins with a bare word,
9966 # such as for the word 'Button' here:
9968 # Button => "Print letter \"~$_\"",
9969 # -command => [ sub { print "$_[0]\n" }, $_ ],
9970 # -accelerator => "Meta+$_"
9973 # On the other hand, if 'Button' is quoted, it looks best
9976 # 'Button' => "Print letter \"~$_\"",
9977 # -command => [ sub { print "$_[0]\n" }, $_ ],
9978 # -accelerator => "Meta+$_"
9980 if ( $types_to_go[$ibeg_next] eq 'm' ) {
9981 $ok_to_pad = 0 if $types_to_go[$ibeg] eq 'Q';
9984 next unless $ok_to_pad;
9986 #----------------------end special check---------------
9988 my $length_1 = total_line_length( $ibeg, $ipad - 1 );
9989 my $length_2 = total_line_length( $ibeg_next, $inext_next - 1 );
9990 $pad_spaces = $length_2 - $length_1;
9992 # make sure this won't change if -lp is used
9993 my $indentation_1 = $leading_spaces_to_go[$ibeg];
9994 if ( ref($indentation_1) ) {
9995 if ( $indentation_1->get_RECOVERABLE_SPACES() == 0 ) {
9996 my $indentation_2 = $leading_spaces_to_go[$ibeg_next];
9997 unless ( $indentation_2->get_RECOVERABLE_SPACES() == 0 ) {
10003 # we might be able to handle a pad of -1 by removing a blank
10005 if ( $pad_spaces < 0 ) {
10006 if ( $pad_spaces == -1 ) {
10007 if ( $ipad > $ibeg && $types_to_go[ $ipad - 1 ] eq 'b' ) {
10008 $tokens_to_go[ $ipad - 1 ] = '';
10014 # now apply any padding for alignment
10015 if ( $ipad >= 0 && $pad_spaces ) {
10016 my $length_t = total_line_length( $ibeg, $iend );
10017 if ( $pad_spaces + $length_t <= $rOpts_maximum_line_length ) {
10018 $tokens_to_go[$ipad] =
10019 ' ' x $pad_spaces . $tokens_to_go[$ipad];
10027 $has_leading_op = $has_leading_op_next;
10028 } # end of loop over lines
10032 sub correct_lp_indentation {
10034 # When the -lp option is used, we need to make a last pass through
10035 # each line to correct the indentation positions in case they differ
10036 # from the predictions. This is necessary because perltidy uses a
10037 # predictor/corrector method for aligning with opening parens. The
10038 # predictor is usually good, but sometimes stumbles. The corrector
10039 # tries to patch things up once the actual opening paren locations
10041 my ( $ri_first, $ri_last ) = @_;
10042 my $do_not_pad = 0;
10044 # Note on flag '$do_not_pad':
10045 # We want to avoid a situation like this, where the aligner inserts
10046 # whitespace before the '=' to align it with a previous '=', because
10047 # otherwise the parens might become mis-aligned in a situation like
10048 # this, where the '=' has become aligned with the previous line,
10049 # pushing the opening '(' forward beyond where we want it.
10051 # $mkFloor::currentRoom = '';
10052 # $mkFloor::c_entry = $c->Entry(
10054 # -relief => 'sunken',
10058 # We leave it to the aligner to decide how to do this.
10060 # first remove continuation indentation if appropriate
10061 my $max_line = @$ri_first - 1;
10063 # looking at each line of this batch..
10064 my ( $ibeg, $iend );
10066 foreach $line ( 0 .. $max_line ) {
10067 $ibeg = $$ri_first[$line];
10068 $iend = $$ri_last[$line];
10070 # looking at each token in this output line..
10072 foreach $i ( $ibeg .. $iend ) {
10074 # How many space characters to place before this token
10075 # for special alignment. Actual padding is done in the
10078 # looking for next unvisited indentation item
10079 my $indentation = $leading_spaces_to_go[$i];
10080 if ( !$indentation->get_MARKED() ) {
10081 $indentation->set_MARKED(1);
10083 # looking for indentation item for which we are aligning
10084 # with parens, braces, and brackets
10085 next unless ( $indentation->get_ALIGN_PAREN() );
10087 # skip closed container on this line
10088 if ( $i > $ibeg ) {
10090 if ( $types_to_go[$im] eq 'b' && $im > $ibeg ) { $im-- }
10091 if ( $type_sequence_to_go[$im]
10092 && $mate_index_to_go[$im] <= $iend )
10098 if ( $line == 1 && $i == $ibeg ) {
10102 # Ok, let's see what the error is and try to fix it
10104 my $predicted_pos = $indentation->get_SPACES();
10105 if ( $i > $ibeg ) {
10107 # token is mid-line - use length to previous token
10108 $actual_pos = total_line_length( $ibeg, $i - 1 );
10110 # for mid-line token, we must check to see if all
10111 # additional lines have continuation indentation,
10112 # and remove it if so. Otherwise, we do not get
10114 my $closing_index = $indentation->get_CLOSED();
10115 if ( $closing_index > $iend ) {
10116 my $ibeg_next = $$ri_first[ $line + 1 ];
10117 if ( $ci_levels_to_go[$ibeg_next] > 0 ) {
10118 undo_lp_ci( $line, $i, $closing_index, $ri_first,
10123 elsif ( $line > 0 ) {
10125 # handle case where token starts a new line;
10126 # use length of previous line
10127 my $ibegm = $$ri_first[ $line - 1 ];
10128 my $iendm = $$ri_last[ $line - 1 ];
10129 $actual_pos = total_line_length( $ibegm, $iendm );
10133 if ( $types_to_go[ $iendm + 1 ] eq 'b' );
10137 # token is first character of first line of batch
10138 $actual_pos = $predicted_pos;
10141 my $move_right = $actual_pos - $predicted_pos;
10143 # done if no error to correct (gnu2.t)
10144 if ( $move_right == 0 ) {
10145 $indentation->set_RECOVERABLE_SPACES($move_right);
10149 # if we have not seen closure for this indentation in
10150 # this batch, we can only pass on a request to the
10152 my $closing_index = $indentation->get_CLOSED();
10154 if ( $closing_index < 0 ) {
10155 $indentation->set_RECOVERABLE_SPACES($move_right);
10159 # If necessary, look ahead to see if there is really any
10160 # leading whitespace dependent on this whitespace, and
10161 # also find the longest line using this whitespace.
10162 # Since it is always safe to move left if there are no
10163 # dependents, we only need to do this if we may have
10164 # dependent nodes or need to move right.
10166 my $right_margin = 0;
10167 my $have_child = $indentation->get_HAVE_CHILD();
10169 my %saw_indentation;
10170 my $line_count = 1;
10171 $saw_indentation{$indentation} = $indentation;
10173 if ( $have_child || $move_right > 0 ) {
10175 my $max_length = 0;
10176 if ( $i == $ibeg ) {
10177 $max_length = total_line_length( $ibeg, $iend );
10180 # look ahead at the rest of the lines of this batch..
10182 foreach $line_t ( $line + 1 .. $max_line ) {
10183 my $ibeg_t = $$ri_first[$line_t];
10184 my $iend_t = $$ri_last[$line_t];
10185 last if ( $closing_index <= $ibeg_t );
10187 # remember all different indentation objects
10188 my $indentation_t = $leading_spaces_to_go[$ibeg_t];
10189 $saw_indentation{$indentation_t} = $indentation_t;
10192 # remember longest line in the group
10193 my $length_t = total_line_length( $ibeg_t, $iend_t );
10194 if ( $length_t > $max_length ) {
10195 $max_length = $length_t;
10198 $right_margin = $rOpts_maximum_line_length - $max_length;
10199 if ( $right_margin < 0 ) { $right_margin = 0 }
10202 my $first_line_comma_count =
10203 grep { $_ eq ',' } @types_to_go[ $ibeg .. $iend ];
10204 my $comma_count = $indentation->get_COMMA_COUNT();
10205 my $arrow_count = $indentation->get_ARROW_COUNT();
10207 # This is a simple approximate test for vertical alignment:
10208 # if we broke just after an opening paren, brace, bracket,
10209 # and there are 2 or more commas in the first line,
10210 # and there are no '=>'s,
10211 # then we are probably vertically aligned. We could set
10212 # an exact flag in sub scan_list, but this is good
10214 my $indentation_count = keys %saw_indentation;
10215 my $is_vertically_aligned =
10217 && $first_line_comma_count > 1
10218 && $indentation_count == 1
10219 && ( $arrow_count == 0 || $arrow_count == $line_count ) );
10221 # Make the move if possible ..
10224 # we can always move left
10227 # but we should only move right if we are sure it will
10228 # not spoil vertical alignment
10229 || ( $comma_count == 0 )
10230 || ( $comma_count > 0 && !$is_vertically_aligned )
10234 ( $move_right <= $right_margin )
10238 foreach ( keys %saw_indentation ) {
10239 $saw_indentation{$_}
10240 ->permanently_decrease_AVAILABLE_SPACES( -$move );
10244 # Otherwise, record what we want and the vertical aligner
10245 # will try to recover it.
10247 $indentation->set_RECOVERABLE_SPACES($move_right);
10252 return $do_not_pad;
10255 # flush is called to output any tokens in the pipeline, so that
10256 # an alternate source of lines can be written in the correct order
10259 destroy_one_line_block();
10260 output_line_to_go();
10261 Perl::Tidy::VerticalAligner::flush();
10264 sub reset_block_text_accumulator {
10266 # save text after 'if' and 'elsif' to append after 'else'
10267 if ($accumulating_text_for_block) {
10269 if ( $accumulating_text_for_block =~ /^(if|elsif)$/ ) {
10270 push @{$rleading_block_if_elsif_text}, $leading_block_text;
10273 $accumulating_text_for_block = "";
10274 $leading_block_text = "";
10275 $leading_block_text_level = 0;
10276 $leading_block_text_length_exceeded = 0;
10277 $leading_block_text_line_number = 0;
10278 $leading_block_text_line_length = 0;
10281 sub set_block_text_accumulator {
10283 $accumulating_text_for_block = $tokens_to_go[$i];
10284 if ( $accumulating_text_for_block !~ /^els/ ) {
10285 $rleading_block_if_elsif_text = [];
10287 $leading_block_text = "";
10288 $leading_block_text_level = $levels_to_go[$i];
10289 $leading_block_text_line_number =
10290 $vertical_aligner_object->get_output_line_number();
10291 $leading_block_text_length_exceeded = 0;
10293 # this will contain the column number of the last character
10294 # of the closing side comment
10295 $leading_block_text_line_length =
10296 length($accumulating_text_for_block) +
10297 length( $rOpts->{'closing-side-comment-prefix'} ) +
10298 $leading_block_text_level * $rOpts_indent_columns + 3;
10301 sub accumulate_block_text {
10304 # accumulate leading text for -csc, ignoring any side comments
10305 if ( $accumulating_text_for_block
10306 && !$leading_block_text_length_exceeded
10307 && $types_to_go[$i] ne '#' )
10310 my $added_length = length( $tokens_to_go[$i] );
10311 $added_length += 1 if $i == 0;
10312 my $new_line_length = $leading_block_text_line_length + $added_length;
10314 # we can add this text if we don't exceed some limits..
10317 # we must not have already exceeded the text length limit
10318 length($leading_block_text) <
10319 $rOpts_closing_side_comment_maximum_text
10322 # the new total line length must be below the line length limit
10323 # or the new length must be below the text length limit
10324 # (ie, we may allow one token to exceed the text length limit)
10325 && ( $new_line_length < $rOpts_maximum_line_length
10326 || length($leading_block_text) + $added_length <
10327 $rOpts_closing_side_comment_maximum_text )
10329 # UNLESS: we are adding a closing paren before the brace we seek.
10330 # This is an attempt to avoid situations where the ... to be
10331 # added are longer than the omitted right paren, as in:
10333 # foreach my $item (@a_rather_long_variable_name_here) {
10335 # } ## end foreach my $item (@a_rather_long_variable_name_here...
10338 $tokens_to_go[$i] eq ')'
10341 $i + 1 <= $max_index_to_go
10342 && $block_type_to_go[ $i + 1 ] eq
10343 $accumulating_text_for_block
10345 || ( $i + 2 <= $max_index_to_go
10346 && $block_type_to_go[ $i + 2 ] eq
10347 $accumulating_text_for_block )
10353 # add an extra space at each newline
10354 if ( $i == 0 ) { $leading_block_text .= ' ' }
10356 # add the token text
10357 $leading_block_text .= $tokens_to_go[$i];
10358 $leading_block_text_line_length = $new_line_length;
10361 # show that text was truncated if necessary
10362 elsif ( $types_to_go[$i] ne 'b' ) {
10363 $leading_block_text_length_exceeded = 1;
10364 $leading_block_text .= '...';
10370 my %is_if_elsif_else_unless_while_until_for_foreach;
10374 # These block types may have text between the keyword and opening
10375 # curly. Note: 'else' does not, but must be included to allow trailing
10376 # if/elsif text to be appended.
10377 # patch for SWITCH/CASE: added 'case' and 'when'
10378 @_ = qw(if elsif else unless while until for foreach case when);
10379 @is_if_elsif_else_unless_while_until_for_foreach{@_} = (1) x scalar(@_);
10382 sub accumulate_csc_text {
10384 # called once per output buffer when -csc is used. Accumulates
10385 # the text placed after certain closing block braces.
10386 # Defines and returns the following for this buffer:
10388 my $block_leading_text = ""; # the leading text of the last '}'
10389 my $rblock_leading_if_elsif_text;
10390 my $i_block_leading_text =
10391 -1; # index of token owning block_leading_text
10392 my $block_line_count = 100; # how many lines the block spans
10393 my $terminal_type = 'b'; # type of last nonblank token
10394 my $i_terminal = 0; # index of last nonblank token
10395 my $terminal_block_type = "";
10397 for my $i ( 0 .. $max_index_to_go ) {
10398 my $type = $types_to_go[$i];
10399 my $block_type = $block_type_to_go[$i];
10400 my $token = $tokens_to_go[$i];
10402 # remember last nonblank token type
10403 if ( $type ne '#' && $type ne 'b' ) {
10404 $terminal_type = $type;
10405 $terminal_block_type = $block_type;
10409 my $type_sequence = $type_sequence_to_go[$i];
10410 if ( $block_type && $type_sequence ) {
10412 if ( $token eq '}' ) {
10414 # restore any leading text saved when we entered this block
10415 if ( defined( $block_leading_text{$type_sequence} ) ) {
10416 ( $block_leading_text, $rblock_leading_if_elsif_text ) =
10417 @{ $block_leading_text{$type_sequence} };
10418 $i_block_leading_text = $i;
10419 delete $block_leading_text{$type_sequence};
10420 $rleading_block_if_elsif_text =
10421 $rblock_leading_if_elsif_text;
10424 # if we run into a '}' then we probably started accumulating
10425 # at something like a trailing 'if' clause..no harm done.
10426 if ( $accumulating_text_for_block
10427 && $levels_to_go[$i] <= $leading_block_text_level )
10429 my $lev = $levels_to_go[$i];
10430 reset_block_text_accumulator();
10433 if ( defined( $block_opening_line_number{$type_sequence} ) )
10435 my $output_line_number =
10436 $vertical_aligner_object->get_output_line_number();
10437 $block_line_count =
10438 $output_line_number -
10439 $block_opening_line_number{$type_sequence} + 1;
10440 delete $block_opening_line_number{$type_sequence};
10444 # Error: block opening line undefined for this line..
10445 # This shouldn't be possible, but it is not a
10446 # significant problem.
10450 elsif ( $token eq '{' ) {
10453 $vertical_aligner_object->get_output_line_number();
10454 $block_opening_line_number{$type_sequence} = $line_number;
10456 if ( $accumulating_text_for_block
10457 && $levels_to_go[$i] == $leading_block_text_level )
10460 if ( $accumulating_text_for_block eq $block_type ) {
10462 # save any leading text before we enter this block
10463 $block_leading_text{$type_sequence} = [
10464 $leading_block_text,
10465 $rleading_block_if_elsif_text
10467 $block_opening_line_number{$type_sequence} =
10468 $leading_block_text_line_number;
10469 reset_block_text_accumulator();
10473 # shouldn't happen, but not a serious error.
10474 # We were accumulating -csc text for block type
10475 # $accumulating_text_for_block and unexpectedly
10476 # encountered a '{' for block type $block_type.
10483 && $csc_new_statement_ok
10484 && $is_if_elsif_else_unless_while_until_for_foreach{$token}
10485 && $token =~ /$closing_side_comment_list_pattern/o )
10487 set_block_text_accumulator($i);
10491 # note: ignoring type 'q' because of tricks being played
10492 # with 'q' for hanging side comments
10493 if ( $type ne 'b' && $type ne '#' && $type ne 'q' ) {
10494 $csc_new_statement_ok =
10495 ( $block_type || $type eq 'J' || $type eq ';' );
10498 && $accumulating_text_for_block
10499 && $levels_to_go[$i] == $leading_block_text_level )
10501 reset_block_text_accumulator();
10504 accumulate_block_text($i);
10509 # Treat an 'else' block specially by adding preceding 'if' and
10510 # 'elsif' text. Otherwise, the 'end else' is not helpful,
10511 # especially for cuddled-else formatting.
10512 if ( $terminal_block_type =~ /^els/ && $rblock_leading_if_elsif_text ) {
10513 $block_leading_text =
10514 make_else_csc_text( $i_terminal, $terminal_block_type,
10515 $block_leading_text, $rblock_leading_if_elsif_text );
10518 return ( $terminal_type, $i_terminal, $i_block_leading_text,
10519 $block_leading_text, $block_line_count );
10523 sub make_else_csc_text {
10525 # create additional -csc text for an 'else' and optionally 'elsif',
10526 # depending on the value of switch
10527 # $rOpts_closing_side_comment_else_flag:
10529 # = 0 add 'if' text to trailing else
10530 # = 1 same as 0 plus:
10531 # add 'if' to 'elsif's if can fit in line length
10532 # add last 'elsif' to trailing else if can fit in one line
10533 # = 2 same as 1 but do not check if exceed line length
10535 # $rif_elsif_text = a reference to a list of all previous closing
10536 # side comments created for this if block
10538 my ( $i_terminal, $block_type, $block_leading_text, $rif_elsif_text ) = @_;
10539 my $csc_text = $block_leading_text;
10541 if ( $block_type eq 'elsif' && $rOpts_closing_side_comment_else_flag == 0 )
10546 my $count = @{$rif_elsif_text};
10547 return $csc_text unless ($count);
10549 my $if_text = '[ if' . $rif_elsif_text->[0];
10551 # always show the leading 'if' text on 'else'
10552 if ( $block_type eq 'else' ) {
10553 $csc_text .= $if_text;
10556 # see if that's all
10557 if ( $rOpts_closing_side_comment_else_flag == 0 ) {
10561 my $last_elsif_text = "";
10562 if ( $count > 1 ) {
10563 $last_elsif_text = ' [elsif' . $rif_elsif_text->[ $count - 1 ];
10564 if ( $count > 2 ) { $last_elsif_text = ' [...' . $last_elsif_text; }
10567 # tentatively append one more item
10568 my $saved_text = $csc_text;
10569 if ( $block_type eq 'else' ) {
10570 $csc_text .= $last_elsif_text;
10573 $csc_text .= ' ' . $if_text;
10576 # all done if no length checks requested
10577 if ( $rOpts_closing_side_comment_else_flag == 2 ) {
10581 # undo it if line length exceeded
10583 length($csc_text) +
10584 length($block_type) +
10585 length( $rOpts->{'closing-side-comment-prefix'} ) +
10586 $levels_to_go[$i_terminal] * $rOpts_indent_columns + 3;
10587 if ( $length > $rOpts_maximum_line_length ) {
10588 $csc_text = $saved_text;
10593 sub add_closing_side_comment {
10595 # add closing side comments after closing block braces if -csc used
10596 my $cscw_block_comment;
10598 #---------------------------------------------------------------
10599 # Step 1: loop through all tokens of this line to accumulate
10600 # the text needed to create the closing side comments. Also see
10601 # how the line ends.
10602 #---------------------------------------------------------------
10604 my ( $terminal_type, $i_terminal, $i_block_leading_text,
10605 $block_leading_text, $block_line_count )
10606 = accumulate_csc_text();
10608 #---------------------------------------------------------------
10609 # Step 2: make the closing side comment if this ends a block
10610 #---------------------------------------------------------------
10611 my $have_side_comment = $i_terminal != $max_index_to_go;
10613 # if this line might end in a block closure..
10615 $terminal_type eq '}'
10620 # the block is long enough
10621 ( $block_line_count >= $rOpts->{'closing-side-comment-interval'} )
10623 # or there is an existing comment to check
10624 || ( $have_side_comment
10625 && $rOpts->{'closing-side-comment-warnings'} )
10628 # .. and if this is one of the types of interest
10629 && $block_type_to_go[$i_terminal] =~
10630 /$closing_side_comment_list_pattern/o
10632 # .. but not an anonymous sub
10633 # These are not normally of interest, and their closing braces are
10634 # often followed by commas or semicolons anyway. This also avoids
10635 # possible erratic output due to line numbering inconsistencies
10636 # in the cases where their closing braces terminate a line.
10637 && $block_type_to_go[$i_terminal] ne 'sub'
10639 # ..and the corresponding opening brace must is not in this batch
10640 # (because we do not need to tag one-line blocks, although this
10641 # should also be caught with a positive -csci value)
10642 && $mate_index_to_go[$i_terminal] < 0
10647 # this is the last token (line doesnt have a side comment)
10648 !$have_side_comment
10650 # or the old side comment is a closing side comment
10651 || $tokens_to_go[$max_index_to_go] =~
10652 /$closing_side_comment_prefix_pattern/o
10657 # then make the closing side comment text
10659 "$rOpts->{'closing-side-comment-prefix'} $block_type_to_go[$i_terminal]";
10661 # append any extra descriptive text collected above
10662 if ( $i_block_leading_text == $i_terminal ) {
10663 $token .= $block_leading_text;
10665 $token =~ s/\s*$//; # trim any trailing whitespace
10667 # handle case of existing closing side comment
10668 if ($have_side_comment) {
10670 # warn if requested and tokens differ significantly
10671 if ( $rOpts->{'closing-side-comment-warnings'} ) {
10672 my $old_csc = $tokens_to_go[$max_index_to_go];
10673 my $new_csc = $token;
10674 $new_csc =~ s/(\.\.\.)\s*$//; # trim trailing '...'
10675 my $new_trailing_dots = $1;
10676 $old_csc =~ s/\.\.\.\s*$//;
10677 $new_csc =~ s/\s+//g; # trim all whitespace
10678 $old_csc =~ s/\s+//g;
10680 # Patch to handle multiple closing side comments at
10681 # else and elsif's. These have become too complicated
10682 # to check, so if we see an indication of
10683 # '[ if' or '[ # elsif', then assume they were made
10685 if ( $block_type_to_go[$i_terminal] eq 'else' ) {
10686 if ( $old_csc =~ /\[\s*elsif/ ) { $old_csc = $new_csc }
10688 elsif ( $block_type_to_go[$i_terminal] eq 'elsif' ) {
10689 if ( $old_csc =~ /\[\s*if/ ) { $old_csc = $new_csc }
10692 # if old comment is contained in new comment,
10693 # only compare the common part.
10694 if ( length($new_csc) > length($old_csc) ) {
10695 $new_csc = substr( $new_csc, 0, length($old_csc) );
10698 # if the new comment is shorter and has been limited,
10699 # only compare the common part.
10700 if ( length($new_csc) < length($old_csc) && $new_trailing_dots )
10702 $old_csc = substr( $old_csc, 0, length($new_csc) );
10705 # any remaining difference?
10706 if ( $new_csc ne $old_csc ) {
10708 # just leave the old comment if we are below the threshold
10709 # for creating side comments
10710 if ( $block_line_count <
10711 $rOpts->{'closing-side-comment-interval'} )
10716 # otherwise we'll make a note of it
10720 "perltidy -cscw replaced: $tokens_to_go[$max_index_to_go]\n"
10723 # save the old side comment in a new trailing block comment
10724 my ( $day, $month, $year ) = (localtime)[ 3, 4, 5 ];
10727 $cscw_block_comment =
10728 "## perltidy -cscw $year-$month-$day: $tokens_to_go[$max_index_to_go]";
10733 # No differences.. we can safely delete old comment if we
10734 # are below the threshold
10735 if ( $block_line_count <
10736 $rOpts->{'closing-side-comment-interval'} )
10739 unstore_token_to_go()
10740 if ( $types_to_go[$max_index_to_go] eq '#' );
10741 unstore_token_to_go()
10742 if ( $types_to_go[$max_index_to_go] eq 'b' );
10747 # switch to the new csc (unless we deleted it!)
10748 $tokens_to_go[$max_index_to_go] = $token if $token;
10751 # handle case of NO existing closing side comment
10754 # insert the new side comment into the output token stream
10756 my $block_type = '';
10757 my $type_sequence = '';
10758 my $container_environment =
10759 $container_environment_to_go[$max_index_to_go];
10760 my $level = $levels_to_go[$max_index_to_go];
10761 my $slevel = $nesting_depth_to_go[$max_index_to_go];
10762 my $no_internal_newlines = 0;
10764 my $nesting_blocks = $nesting_blocks_to_go[$max_index_to_go];
10765 my $ci_level = $ci_levels_to_go[$max_index_to_go];
10766 my $in_continued_quote = 0;
10768 # first insert a blank token
10769 insert_new_token_to_go( ' ', 'b', $slevel, $no_internal_newlines );
10771 # then the side comment
10772 insert_new_token_to_go( $token, $type, $slevel,
10773 $no_internal_newlines );
10776 return $cscw_block_comment;
10779 sub previous_nonblank_token {
10784 elsif ( $types_to_go[ $i - 1 ] ne 'b' ) {
10785 return $tokens_to_go[ $i - 1 ];
10788 return $tokens_to_go[ $i - 2 ];
10795 sub send_lines_to_vertical_aligner {
10797 my ( $ri_first, $ri_last, $do_not_pad ) = @_;
10799 my $rindentation_list = [0]; # ref to indentations for each line
10801 # define the array @matching_token_to_go for the output tokens
10802 # which will be non-blank for each special token (such as =>)
10803 # for which alignment is required.
10804 set_vertical_alignment_markers( $ri_first, $ri_last );
10806 # flush if necessary to avoid unwanted alignment
10807 my $must_flush = 0;
10808 if ( @$ri_first > 1 ) {
10810 # flush before a long if statement
10811 if ( $types_to_go[0] eq 'k' && $tokens_to_go[0] =~ /^(if|unless)$/ ) {
10816 Perl::Tidy::VerticalAligner::flush();
10819 set_logical_padding( $ri_first, $ri_last );
10821 # loop to prepare each line for shipment
10822 my $n_last_line = @$ri_first - 1;
10824 for my $n ( 0 .. $n_last_line ) {
10825 my $ibeg = $$ri_first[$n];
10826 my $iend = $$ri_last[$n];
10831 my $i_start = $ibeg;
10835 my @container_name = ("");
10836 my @multiple_comma_arrows = (undef);
10838 my $j = 0; # field index
10841 for $i ( $ibeg .. $iend ) {
10843 # Keep track of containers balanced on this line only.
10844 # These are used below to prevent unwanted cross-line alignments.
10845 # Unbalanced containers already avoid aligning across
10846 # container boundaries.
10847 if ( $tokens_to_go[$i] eq '(' ) {
10848 my $i_mate = $mate_index_to_go[$i];
10849 if ( $i_mate > $i && $i_mate <= $iend ) {
10851 my $seqno = $type_sequence_to_go[$i];
10852 my $count = comma_arrow_count($seqno);
10853 $multiple_comma_arrows[$depth] = $count && $count > 1;
10854 my $name = previous_nonblank_token($i);
10856 $container_name[$depth] = "+" . $name;
10859 elsif ( $tokens_to_go[$i] eq ')' ) {
10860 $depth-- if $depth > 0;
10863 # if we find a new synchronization token, we are done with
10865 if ( $i > $i_start && $matching_token_to_go[$i] ne '' ) {
10867 my $tok = my $raw_tok = $matching_token_to_go[$i];
10869 # make separators in different nesting depths unique
10870 # by appending the nesting depth digit.
10871 if ( $raw_tok ne '#' ) {
10872 $tok .= "$nesting_depth_to_go[$i]";
10875 # do any special decorations for commas to avoid unwanted
10876 # cross-line alignments.
10877 if ( $raw_tok eq ',' ) {
10878 if ( $container_name[$depth] ) {
10879 $tok .= $container_name[$depth];
10883 # decorate '=>' with:
10884 # - Nothing if this container is unbalanced on this line.
10885 # - The previous token if it is balanced and multiple '=>'s
10886 # - The container name if it is bananced and no other '=>'s
10887 elsif ( $raw_tok eq '=>' ) {
10888 if ( $container_name[$depth] ) {
10889 if ( $multiple_comma_arrows[$depth] ) {
10890 $tok .= "+" . previous_nonblank_token($i);
10893 $tok .= $container_name[$depth];
10898 # concatenate the text of the consecutive tokens to form
10901 join( '', @tokens_to_go[ $i_start .. $i - 1 ] ) );
10903 # store the alignment token for this field
10904 push( @tokens, $tok );
10906 # get ready for the next batch
10909 $patterns[$j] = "";
10912 # continue accumulating tokens
10913 # handle non-keywords..
10914 if ( $types_to_go[$i] ne 'k' ) {
10915 my $type = $types_to_go[$i];
10917 # Mark most things before arrows as a quote to
10918 # get them to line up. Testfile: mixed.pl.
10919 if ( ( $i < $iend - 1 ) && ( $type =~ /^[wnC]$/ ) ) {
10920 my $next_type = $types_to_go[ $i + 1 ];
10921 my $i_next_nonblank =
10922 ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
10924 if ( $types_to_go[$i_next_nonblank] eq '=>' ) {
10929 # minor patch to make numbers and quotes align
10930 if ( $type eq 'n' ) { $type = 'Q' }
10932 $patterns[$j] .= $type;
10935 # for keywords we have to use the actual text
10938 # map certain keywords to the same 'if' class to align
10939 # long if/elsif sequences. my testfile: elsif.pl
10940 my $tok = $tokens_to_go[$i];
10941 if ( $n == 0 && $tok =~ /^(elsif|else|unless)$/ ) {
10944 $patterns[$j] .= $tok;
10948 # done with this line .. join text of tokens to make the last field
10949 push( @fields, join( '', @tokens_to_go[ $i_start .. $iend ] ) );
10951 my ( $indentation, $lev, $level_end, $terminal_type,
10952 $is_semicolon_terminated, $is_outdented_line )
10953 = set_adjusted_indentation( $ibeg, $iend, \@fields, \@patterns,
10954 $ri_first, $ri_last, $rindentation_list );
10956 # we will allow outdenting of long lines..
10957 my $outdent_long_lines = (
10959 # which are long quotes, if allowed
10960 ( $types_to_go[$ibeg] eq 'Q' && $rOpts->{'outdent-long-quotes'} )
10962 # which are long block comments, if allowed
10964 $types_to_go[$ibeg] eq '#'
10965 && $rOpts->{'outdent-long-comments'}
10967 # but not if this is a static block comment
10968 && !$is_static_block_comment
10973 $nesting_depth_to_go[ $iend + 1 ] - $nesting_depth_to_go[$ibeg];
10975 my $rvertical_tightness_flags =
10976 set_vertical_tightness_flags( $n, $n_last_line, $ibeg, $iend,
10977 $ri_first, $ri_last );
10979 # flush an outdented line to avoid any unwanted vertical alignment
10980 Perl::Tidy::VerticalAligner::flush() if ($is_outdented_line);
10982 my $is_terminal_ternary = 0;
10983 if ( $tokens_to_go[$ibeg] eq ':'
10984 || $n > 0 && $tokens_to_go[ $$ri_last[ $n - 1 ] ] eq ':' )
10986 if ( ( $terminal_type eq ';' && $level_end <= $lev )
10987 || ( $level_end < $lev ) )
10989 $is_terminal_ternary = 1;
10993 # send this new line down the pipe
10994 my $forced_breakpoint = $forced_breakpoint_to_go[$iend];
10995 Perl::Tidy::VerticalAligner::append_line(
11002 $forced_breakpoint_to_go[$iend] || $in_comma_list,
11003 $outdent_long_lines,
11004 $is_terminal_ternary,
11005 $is_semicolon_terminated,
11007 $rvertical_tightness_flags,
11011 $tokens_to_go[$iend] eq ',' && $forced_breakpoint_to_go[$iend];
11013 # flush an outdented line to avoid any unwanted vertical alignment
11014 Perl::Tidy::VerticalAligner::flush() if ($is_outdented_line);
11018 } # end of loop to output each line
11020 # remember indentation of lines containing opening containers for
11021 # later use by sub set_adjusted_indentation
11022 save_opening_indentation( $ri_first, $ri_last, $rindentation_list );
11025 { # begin unmatched_indexes
11027 # closure to keep track of unbalanced containers.
11028 # arrays shared by the routines in this block:
11029 my @unmatched_opening_indexes_in_this_batch;
11030 my @unmatched_closing_indexes_in_this_batch;
11031 my %comma_arrow_count;
11033 sub is_unbalanced_batch {
11034 @unmatched_opening_indexes_in_this_batch +
11035 @unmatched_closing_indexes_in_this_batch;
11038 sub comma_arrow_count {
11040 return $comma_arrow_count{$seqno};
11043 sub match_opening_and_closing_tokens {
11045 # Match up indexes of opening and closing braces, etc, in this batch.
11046 # This has to be done after all tokens are stored because unstoring
11047 # of tokens would otherwise cause trouble.
11049 @unmatched_opening_indexes_in_this_batch = ();
11050 @unmatched_closing_indexes_in_this_batch = ();
11051 %comma_arrow_count = ();
11053 my ( $i, $i_mate, $token );
11054 foreach $i ( 0 .. $max_index_to_go ) {
11055 if ( $type_sequence_to_go[$i] ) {
11056 $token = $tokens_to_go[$i];
11057 if ( $token =~ /^[\(\[\{\?]$/ ) {
11058 push @unmatched_opening_indexes_in_this_batch, $i;
11060 elsif ( $token =~ /^[\)\]\}\:]$/ ) {
11062 $i_mate = pop @unmatched_opening_indexes_in_this_batch;
11063 if ( defined($i_mate) && $i_mate >= 0 ) {
11064 if ( $type_sequence_to_go[$i_mate] ==
11065 $type_sequence_to_go[$i] )
11067 $mate_index_to_go[$i] = $i_mate;
11068 $mate_index_to_go[$i_mate] = $i;
11071 push @unmatched_opening_indexes_in_this_batch,
11073 push @unmatched_closing_indexes_in_this_batch, $i;
11077 push @unmatched_closing_indexes_in_this_batch, $i;
11081 elsif ( $tokens_to_go[$i] eq '=>' ) {
11082 if (@unmatched_opening_indexes_in_this_batch) {
11083 my $j = $unmatched_opening_indexes_in_this_batch[-1];
11084 my $seqno = $type_sequence_to_go[$j];
11085 $comma_arrow_count{$seqno}++;
11091 sub save_opening_indentation {
11093 # This should be called after each batch of tokens is output. It
11094 # saves indentations of lines of all unmatched opening tokens.
11095 # These will be used by sub get_opening_indentation.
11097 my ( $ri_first, $ri_last, $rindentation_list ) = @_;
11099 # we no longer need indentations of any saved indentations which
11100 # are unmatched closing tokens in this batch, because we will
11101 # never encounter them again. So we can delete them to keep
11102 # the hash size down.
11103 foreach (@unmatched_closing_indexes_in_this_batch) {
11104 my $seqno = $type_sequence_to_go[$_];
11105 delete $saved_opening_indentation{$seqno};
11108 # we need to save indentations of any unmatched opening tokens
11109 # in this batch because we may need them in a subsequent batch.
11110 foreach (@unmatched_opening_indexes_in_this_batch) {
11111 my $seqno = $type_sequence_to_go[$_];
11112 $saved_opening_indentation{$seqno} = [
11113 lookup_opening_indentation(
11114 $_, $ri_first, $ri_last, $rindentation_list
11119 } # end unmatched_indexes
11121 sub get_opening_indentation {
11123 # get the indentation of the line which output the opening token
11124 # corresponding to a given closing token in the current output batch.
11127 # $i_closing - index in this line of a closing token ')' '}' or ']'
11129 # $ri_first - reference to list of the first index $i for each output
11130 # line in this batch
11131 # $ri_last - reference to list of the last index $i for each output line
11133 # $rindentation_list - reference to a list containing the indentation
11134 # used for each line.
11137 # -the indentation of the line which contained the opening token
11138 # which matches the token at index $i_opening
11139 # -and its offset (number of columns) from the start of the line
11141 my ( $i_closing, $ri_first, $ri_last, $rindentation_list ) = @_;
11143 # first, see if the opening token is in the current batch
11144 my $i_opening = $mate_index_to_go[$i_closing];
11145 my ( $indent, $offset );
11146 if ( $i_opening >= 0 ) {
11148 # it is..look up the indentation
11149 ( $indent, $offset ) =
11150 lookup_opening_indentation( $i_opening, $ri_first, $ri_last,
11151 $rindentation_list );
11154 # if not, it should have been stored in the hash by a previous batch
11156 my $seqno = $type_sequence_to_go[$i_closing];
11158 if ( $saved_opening_indentation{$seqno} ) {
11159 ( $indent, $offset ) = @{ $saved_opening_indentation{$seqno} };
11162 # some kind of serious error
11163 # (example is badfile.t)
11170 # if no sequence number it must be an unbalanced container
11176 return ( $indent, $offset );
11179 sub lookup_opening_indentation {
11181 # get the indentation of the line in the current output batch
11182 # which output a selected opening token
11185 # $i_opening - index of an opening token in the current output batch
11186 # whose line indentation we need
11187 # $ri_first - reference to list of the first index $i for each output
11188 # line in this batch
11189 # $ri_last - reference to list of the last index $i for each output line
11191 # $rindentation_list - reference to a list containing the indentation
11192 # used for each line. (NOTE: the first slot in
11193 # this list is the last returned line number, and this is
11194 # followed by the list of indentations).
11197 # -the indentation of the line which contained token $i_opening
11198 # -and its offset (number of columns) from the start of the line
11200 my ( $i_opening, $ri_start, $ri_last, $rindentation_list ) = @_;
11202 my $nline = $rindentation_list->[0]; # line number of previous lookup
11204 # reset line location if necessary
11205 $nline = 0 if ( $i_opening < $ri_start->[$nline] );
11207 # find the correct line
11208 unless ( $i_opening > $ri_last->[-1] ) {
11209 while ( $i_opening > $ri_last->[$nline] ) { $nline++; }
11212 # error - token index is out of bounds - shouldn't happen
11215 "non-fatal program bug in lookup_opening_indentation - index out of range\n"
11217 report_definite_bug();
11218 $nline = $#{$ri_last};
11221 $rindentation_list->[0] =
11222 $nline; # save line number to start looking next call
11223 my $ibeg = $ri_start->[$nline];
11224 my $offset = token_sequence_length( $ibeg, $i_opening ) - 1;
11225 return ( $rindentation_list->[ $nline + 1 ], $offset );
11229 my %is_if_elsif_else_unless_while_until_for_foreach;
11233 # These block types may have text between the keyword and opening
11234 # curly. Note: 'else' does not, but must be included to allow trailing
11235 # if/elsif text to be appended.
11236 # patch for SWITCH/CASE: added 'case' and 'when'
11237 @_ = qw(if elsif else unless while until for foreach case when);
11238 @is_if_elsif_else_unless_while_until_for_foreach{@_} = (1) x scalar(@_);
11241 sub set_adjusted_indentation {
11243 # This routine has the final say regarding the actual indentation of
11244 # a line. It starts with the basic indentation which has been
11245 # defined for the leading token, and then takes into account any
11246 # options that the user has set regarding special indenting and
11249 my ( $ibeg, $iend, $rfields, $rpatterns, $ri_first, $ri_last,
11250 $rindentation_list )
11253 # we need to know the last token of this line
11254 my ( $terminal_type, $i_terminal ) =
11255 terminal_type( \@types_to_go, \@block_type_to_go, $ibeg, $iend );
11257 my $is_outdented_line = 0;
11259 my $is_semicolon_terminated = $terminal_type eq ';'
11260 && $nesting_depth_to_go[$iend] < $nesting_depth_to_go[$ibeg];
11262 ##########################################################
11263 # Section 1: set a flag and a default indentation
11265 # Most lines are indented according to the initial token.
11266 # But it is common to outdent to the level just after the
11267 # terminal token in certain cases...
11268 # adjust_indentation flag:
11269 # 0 - do not adjust
11271 # 2 - vertically align with opening token
11273 ##########################################################
11274 my $adjust_indentation = 0;
11275 my $default_adjust_indentation = $adjust_indentation;
11277 my ( $opening_indentation, $opening_offset );
11279 # if we are at a closing token of some type..
11280 if ( $types_to_go[$ibeg] =~ /^[\)\}\]]$/ ) {
11282 # get the indentation of the line containing the corresponding
11284 ( $opening_indentation, $opening_offset ) =
11285 get_opening_indentation( $ibeg, $ri_first, $ri_last,
11286 $rindentation_list );
11288 # First set the default behavior:
11289 # default behavior is to outdent closing lines
11290 # of the form: "); }; ]; )->xxx;"
11292 $is_semicolon_terminated
11294 # and 'cuddled parens' of the form: ")->pack("
11296 $terminal_type eq '('
11297 && $types_to_go[$ibeg] eq ')'
11298 && ( $nesting_depth_to_go[$iend] + 1 ==
11299 $nesting_depth_to_go[$ibeg] )
11303 $adjust_indentation = 1;
11306 # TESTING: outdent something like '),'
11308 $terminal_type eq ','
11310 # allow just one character before the comma
11311 && $i_terminal == $ibeg + 1
11313 # requre LIST environment; otherwise, we may outdent too much --
11314 # this can happen in calls without parentheses (overload.t);
11315 && $container_environment_to_go[$i_terminal] eq 'LIST'
11318 $adjust_indentation = 1;
11321 # undo continuation indentation of a terminal closing token if
11322 # it is the last token before a level decrease. This will allow
11323 # a closing token to line up with its opening counterpart, and
11324 # avoids a indentation jump larger than 1 level.
11325 if ( $types_to_go[$i_terminal] =~ /^[\}\]\)R]$/
11326 && $i_terminal == $ibeg )
11328 my $ci = $ci_levels_to_go[$ibeg];
11329 my $lev = $levels_to_go[$ibeg];
11330 my $next_type = $types_to_go[ $ibeg + 1 ];
11331 my $i_next_nonblank =
11332 ( ( $next_type eq 'b' ) ? $ibeg + 2 : $ibeg + 1 );
11333 if ( $i_next_nonblank <= $max_index_to_go
11334 && $levels_to_go[$i_next_nonblank] < $lev )
11336 $adjust_indentation = 1;
11340 $default_adjust_indentation = $adjust_indentation;
11342 # Now modify default behavior according to user request:
11343 # handle option to indent non-blocks of the form ); }; ];
11344 # But don't do special indentation to something like ')->pack('
11345 if ( !$block_type_to_go[$ibeg] ) {
11346 my $cti = $closing_token_indentation{ $tokens_to_go[$ibeg] };
11348 if ( $i_terminal <= $ibeg + 1
11349 || $is_semicolon_terminated )
11351 $adjust_indentation = 2;
11354 $adjust_indentation = 0;
11357 elsif ( $cti == 2 ) {
11358 if ($is_semicolon_terminated) {
11359 $adjust_indentation = 3;
11362 $adjust_indentation = 0;
11365 elsif ( $cti == 3 ) {
11366 $adjust_indentation = 3;
11370 # handle option to indent blocks
11373 $rOpts->{'indent-closing-brace'}
11375 $i_terminal == $ibeg # isolated terminal '}'
11376 || $is_semicolon_terminated
11380 $adjust_indentation = 3;
11385 # if at ');', '};', '>;', and '];' of a terminal qw quote
11386 elsif ($$rpatterns[0] =~ /^qb*;$/
11387 && $$rfields[0] =~ /^([\)\}\]\>]);$/ )
11389 if ( $closing_token_indentation{$1} == 0 ) {
11390 $adjust_indentation = 1;
11393 $adjust_indentation = 3;
11397 ##########################################################
11398 # Section 2: set indentation according to flag set above
11400 # Select the indentation object to define leading
11401 # whitespace. If we are outdenting something like '} } );'
11402 # then we want to use one level below the last token
11403 # ($i_terminal) in order to get it to fully outdent through
11405 ##########################################################
11408 my $level_end = $levels_to_go[$iend];
11410 if ( $adjust_indentation == 0 ) {
11411 $indentation = $leading_spaces_to_go[$ibeg];
11412 $lev = $levels_to_go[$ibeg];
11414 elsif ( $adjust_indentation == 1 ) {
11415 $indentation = $reduced_spaces_to_go[$i_terminal];
11416 $lev = $levels_to_go[$i_terminal];
11419 # handle indented closing token which aligns with opening token
11420 elsif ( $adjust_indentation == 2 ) {
11422 # handle option to align closing token with opening token
11423 $lev = $levels_to_go[$ibeg];
11425 # calculate spaces needed to align with opening token
11427 get_SPACES($opening_indentation) + $opening_offset;
11429 # Indent less than the previous line.
11431 # Problem: For -lp we don't exactly know what it was if there
11432 # were recoverable spaces sent to the aligner. A good solution
11433 # would be to force a flush of the vertical alignment buffer, so
11434 # that we would know. For now, this rule is used for -lp:
11436 # When the last line did not start with a closing token we will
11437 # be optimistic that the aligner will recover everything wanted.
11439 # This rule will prevent us from breaking a hierarchy of closing
11440 # tokens, and in a worst case will leave a closing paren too far
11441 # indented, but this is better than frequently leaving it not
11443 my $last_spaces = get_SPACES($last_indentation_written);
11444 if ( $last_leading_token !~ /^[\}\]\)]$/ ) {
11446 get_RECOVERABLE_SPACES($last_indentation_written);
11449 # reset the indentation to the new space count if it works
11450 # only options are all or none: nothing in-between looks good
11451 $lev = $levels_to_go[$ibeg];
11452 if ( $space_count < $last_spaces ) {
11453 if ($rOpts_line_up_parentheses) {
11454 my $lev = $levels_to_go[$ibeg];
11456 new_lp_indentation_item( $space_count, $lev, 0, 0, 0 );
11459 $indentation = $space_count;
11463 # revert to default if it doesnt work
11465 $space_count = leading_spaces_to_go($ibeg);
11466 if ( $default_adjust_indentation == 0 ) {
11467 $indentation = $leading_spaces_to_go[$ibeg];
11469 elsif ( $default_adjust_indentation == 1 ) {
11470 $indentation = $reduced_spaces_to_go[$i_terminal];
11471 $lev = $levels_to_go[$i_terminal];
11476 # Full indentaion of closing tokens (-icb and -icp or -cti=2)
11479 # handle -icb (indented closing code block braces)
11480 # Updated method for indented block braces: indent one full level if
11481 # there is no continuation indentation. This will occur for major
11482 # structures such as sub, if, else, but not for things like map
11485 # Note: only code blocks without continuation indentation are
11486 # handled here (if, else, unless, ..). In the following snippet,
11487 # the terminal brace of the sort block will have continuation
11488 # indentation as shown so it will not be handled by the coding
11489 # here. We would have to undo the continuation indentation to do
11490 # this, but it probably looks ok as is. This is a possible future
11491 # update for semicolon terminated lines.
11493 # if ($sortby eq 'date' or $sortby eq 'size') {
11495 # $file_data{$a}{$sortby} <=> $file_data{$b}{$sortby}
11500 if ( $block_type_to_go[$ibeg]
11501 && $ci_levels_to_go[$i_terminal] == 0 )
11503 my $spaces = get_SPACES( $leading_spaces_to_go[$i_terminal] );
11504 $indentation = $spaces + $rOpts_indent_columns;
11506 # NOTE: for -lp we could create a new indentation object, but
11507 # there is probably no need to do it
11510 # handle -icp and any -icb block braces which fall through above
11511 # test such as the 'sort' block mentioned above.
11514 # There are currently two ways to handle -icp...
11515 # One way is to use the indentation of the previous line:
11516 # $indentation = $last_indentation_written;
11518 # The other way is to use the indentation that the previous line
11519 # would have had if it hadn't been adjusted:
11520 $indentation = $last_unadjusted_indentation;
11522 # Current method: use the minimum of the two. This avoids
11523 # inconsistent indentation.
11524 if ( get_SPACES($last_indentation_written) <
11525 get_SPACES($indentation) )
11527 $indentation = $last_indentation_written;
11531 # use previous indentation but use own level
11532 # to cause list to be flushed properly
11533 $lev = $levels_to_go[$ibeg];
11536 # remember indentation except for multi-line quotes, which get
11538 unless ( $ibeg == 0 && $starting_in_quote ) {
11539 $last_indentation_written = $indentation;
11540 $last_unadjusted_indentation = $leading_spaces_to_go[$ibeg];
11541 $last_leading_token = $tokens_to_go[$ibeg];
11544 # be sure lines with leading closing tokens are not outdented more
11545 # than the line which contained the corresponding opening token.
11547 #############################################################
11548 # updated per bug report in alex_bug.pl: we must not
11549 # mess with the indentation of closing logical braces so
11550 # we must treat something like '} else {' as if it were
11551 # an isolated brace my $is_isolated_block_brace = (
11552 # $iend == $ibeg ) && $block_type_to_go[$ibeg];
11553 my $is_isolated_block_brace = $block_type_to_go[$ibeg]
11554 && ( $iend == $ibeg
11555 || $is_if_elsif_else_unless_while_until_for_foreach{
11556 $block_type_to_go[$ibeg] } );
11557 #############################################################
11558 if ( !$is_isolated_block_brace && defined($opening_indentation) ) {
11559 if ( get_SPACES($opening_indentation) > get_SPACES($indentation) ) {
11560 $indentation = $opening_indentation;
11564 # remember the indentation of each line of this batch
11565 push @{$rindentation_list}, $indentation;
11567 # outdent lines with certain leading tokens...
11570 # must be first word of this batch
11576 # certain leading keywords if requested
11578 $rOpts->{'outdent-keywords'}
11579 && $types_to_go[$ibeg] eq 'k'
11580 && $outdent_keyword{ $tokens_to_go[$ibeg] }
11583 # or labels if requested
11584 || ( $rOpts->{'outdent-labels'} && $types_to_go[$ibeg] eq 'J' )
11586 # or static block comments if requested
11587 || ( $types_to_go[$ibeg] eq '#'
11588 && $rOpts->{'outdent-static-block-comments'}
11589 && $is_static_block_comment )
11594 my $space_count = leading_spaces_to_go($ibeg);
11595 if ( $space_count > 0 ) {
11596 $space_count -= $rOpts_continuation_indentation;
11597 $is_outdented_line = 1;
11598 if ( $space_count < 0 ) { $space_count = 0 }
11600 # do not promote a spaced static block comment to non-spaced;
11601 # this is not normally necessary but could be for some
11602 # unusual user inputs (such as -ci = -i)
11603 if ( $types_to_go[$ibeg] eq '#' && $space_count == 0 ) {
11607 if ($rOpts_line_up_parentheses) {
11609 new_lp_indentation_item( $space_count, $lev, 0, 0, 0 );
11612 $indentation = $space_count;
11617 return ( $indentation, $lev, $level_end, $terminal_type,
11618 $is_semicolon_terminated, $is_outdented_line );
11622 sub set_vertical_tightness_flags {
11624 my ( $n, $n_last_line, $ibeg, $iend, $ri_first, $ri_last ) = @_;
11626 # Define vertical tightness controls for the nth line of a batch.
11627 # We create an array of parameters which tell the vertical aligner
11628 # if we should combine this line with the next line to achieve the
11629 # desired vertical tightness. The array of parameters contains:
11631 # [0] type: 1=is opening tok 2=is closing tok 3=is opening block brace
11632 # [1] flag: if opening: 1=no multiple steps, 2=multiple steps ok
11633 # if closing: spaces of padding to use
11634 # [2] sequence number of container
11635 # [3] valid flag: do not append if this flag is false. Will be
11636 # true if appropriate -vt flag is set. Otherwise, Will be
11637 # made true only for 2 line container in parens with -lp
11639 # These flags are used by sub set_leading_whitespace in
11640 # the vertical aligner
11642 my $rvertical_tightness_flags = [ 0, 0, 0, 0, 0, 0 ];
11644 # For non-BLOCK tokens, we will need to examine the next line
11645 # too, so we won't consider the last line.
11646 if ( $n < $n_last_line ) {
11648 # see if last token is an opening token...not a BLOCK...
11649 my $ibeg_next = $$ri_first[ $n + 1 ];
11650 my $token_end = $tokens_to_go[$iend];
11651 my $iend_next = $$ri_last[ $n + 1 ];
11653 $type_sequence_to_go[$iend]
11654 && !$block_type_to_go[$iend]
11655 && $is_opening_token{$token_end}
11657 $opening_vertical_tightness{$token_end} > 0
11659 # allow 2-line method call to be closed up
11660 || ( $rOpts_line_up_parentheses
11661 && $token_end eq '('
11663 && $types_to_go[ $iend - 1 ] ne 'b' )
11668 # avoid multiple jumps in nesting depth in one line if
11670 my $ovt = $opening_vertical_tightness{$token_end};
11671 my $iend_next = $$ri_last[ $n + 1 ];
11674 && ( $nesting_depth_to_go[ $iend_next + 1 ] !=
11675 $nesting_depth_to_go[$ibeg_next] )
11679 # If -vt flag has not been set, mark this as invalid
11680 # and aligner will validate it if it sees the closing paren
11682 my $valid_flag = $ovt;
11683 @{$rvertical_tightness_flags} =
11684 ( 1, $ovt, $type_sequence_to_go[$iend], $valid_flag );
11688 # see if first token of next line is a closing token...
11689 # ..and be sure this line does not have a side comment
11690 my $token_next = $tokens_to_go[$ibeg_next];
11691 if ( $type_sequence_to_go[$ibeg_next]
11692 && !$block_type_to_go[$ibeg_next]
11693 && $is_closing_token{$token_next}
11694 && $types_to_go[$iend] !~ '#' ) # for safety, shouldn't happen!
11696 my $ovt = $opening_vertical_tightness{$token_next};
11697 my $cvt = $closing_vertical_tightness{$token_next};
11700 # never append a trailing line like )->pack(
11701 # because it will throw off later alignment
11703 $nesting_depth_to_go[$ibeg_next] ==
11704 $nesting_depth_to_go[ $iend_next + 1 ] + 1
11709 $container_environment_to_go[$ibeg_next] ne 'LIST'
11713 # allow closing up 2-line method calls
11714 || ( $rOpts_line_up_parentheses
11715 && $token_next eq ')' )
11722 # decide which trailing closing tokens to append..
11724 if ( $cvt == 2 || $iend_next == $ibeg_next ) { $ok = 1 }
11726 my $str = join( '',
11727 @types_to_go[ $ibeg_next + 1 .. $ibeg_next + 2 ] );
11729 # append closing token if followed by comment or ';'
11730 if ( $str =~ /^b?[#;]/ ) { $ok = 1 }
11734 my $valid_flag = $cvt;
11735 @{$rvertical_tightness_flags} = (
11737 $tightness{$token_next} == 2 ? 0 : 1,
11738 $type_sequence_to_go[$ibeg_next], $valid_flag,
11744 # Opening Token Right
11745 # If requested, move an isolated trailing opening token to the end of
11746 # the previous line which ended in a comma. We could do this
11747 # in sub recombine_breakpoints but that would cause problems
11748 # with -lp formatting. The problem is that indentation will
11749 # quickly move far to the right in nested expressions. By
11750 # doing it after indentation has been set, we avoid changes
11751 # to the indentation. Actual movement of the token takes place
11752 # in sub write_leader_and_string.
11754 $opening_token_right{ $tokens_to_go[$ibeg_next] }
11756 # previous line is not opening
11757 # (use -sot to combine with it)
11758 && !$is_opening_token{$token_end}
11760 # previous line ended in one of these
11761 # (add other cases if necessary; '=>' and '.' are not necessary
11762 ##&& ($is_opening_token{$token_end} || $token_end eq ',')
11763 && !$block_type_to_go[$ibeg_next]
11765 # this is a line with just an opening token
11766 && ( $iend_next == $ibeg_next
11767 || $iend_next == $ibeg_next + 2
11768 && $types_to_go[$iend_next] eq '#' )
11770 # looks bad if we align vertically with the wrong container
11771 && $tokens_to_go[$ibeg] ne $tokens_to_go[$ibeg_next]
11774 my $valid_flag = 1;
11775 my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
11776 @{$rvertical_tightness_flags} =
11777 ( 2, $spaces, $type_sequence_to_go[$ibeg_next], $valid_flag, );
11780 # Stacking of opening and closing tokens
11782 my $token_beg_next = $tokens_to_go[$ibeg_next];
11784 # patch to make something like 'qw(' behave like an opening paren
11786 if ( $types_to_go[$ibeg_next] eq 'q' ) {
11787 if ( $token_beg_next =~ /^qw\s*([\[\(\{])$/ ) {
11788 $token_beg_next = $1;
11792 if ( $is_closing_token{$token_end}
11793 && $is_closing_token{$token_beg_next} )
11795 $stackable = $stack_closing_token{$token_beg_next}
11796 unless ( $block_type_to_go[$ibeg_next] )
11797 ; # shouldn't happen; just checking
11799 elsif ($is_opening_token{$token_end}
11800 && $is_opening_token{$token_beg_next} )
11802 $stackable = $stack_opening_token{$token_beg_next}
11803 unless ( $block_type_to_go[$ibeg_next] )
11804 ; # shouldn't happen; just checking
11809 my $is_semicolon_terminated;
11810 if ( $n + 1 == $n_last_line ) {
11811 my ( $terminal_type, $i_terminal ) = terminal_type(
11812 \@types_to_go, \@block_type_to_go,
11813 $ibeg_next, $iend_next
11815 $is_semicolon_terminated = $terminal_type eq ';'
11816 && $nesting_depth_to_go[$iend_next] <
11817 $nesting_depth_to_go[$ibeg_next];
11820 # this must be a line with just an opening token
11821 # or end in a semicolon
11823 $is_semicolon_terminated
11824 || ( $iend_next == $ibeg_next
11825 || $iend_next == $ibeg_next + 2
11826 && $types_to_go[$iend_next] eq '#' )
11829 my $valid_flag = 1;
11830 my $spaces = ( $types_to_go[ $ibeg_next - 1 ] eq 'b' ) ? 1 : 0;
11831 @{$rvertical_tightness_flags} =
11832 ( 2, $spaces, $type_sequence_to_go[$ibeg_next], $valid_flag,
11838 # Check for a last line with isolated opening BLOCK curly
11839 elsif ($rOpts_block_brace_vertical_tightness
11841 && $types_to_go[$iend] eq '{'
11842 && $block_type_to_go[$iend] =~
11843 /$block_brace_vertical_tightness_pattern/o )
11845 @{$rvertical_tightness_flags} =
11846 ( 3, $rOpts_block_brace_vertical_tightness, 0, 1 );
11849 # pack in the sequence numbers of the ends of this line
11850 $rvertical_tightness_flags->[4] = get_seqno($ibeg);
11851 $rvertical_tightness_flags->[5] = get_seqno($iend);
11852 return $rvertical_tightness_flags;
11857 # get opening and closing sequence numbers of a token for the vertical
11858 # aligner. Assign qw quotes a value to allow qw opening and closing tokens
11859 # to be treated somewhat like opening and closing tokens for stacking
11860 # tokens by the vertical aligner.
11862 my $seqno = $type_sequence_to_go[$ii];
11863 if ( $types_to_go[$ii] eq 'q' ) {
11866 $seqno = $SEQ_QW if ( $tokens_to_go[$ii] =~ /^qw\s*[\(\{\[]/ );
11869 if ( !$ending_in_quote ) {
11870 $seqno = $SEQ_QW if ( $tokens_to_go[$ii] =~ /[\)\}\]]$/ );
11878 my %is_vertical_alignment_type;
11879 my %is_vertical_alignment_keyword;
11884 = **= += *= &= <<= &&= -= /= |= >>= ||= //= .= %= ^= x=
11885 { ? : => =~ && || // ~~ !~~
11887 @is_vertical_alignment_type{@_} = (1) x scalar(@_);
11889 @_ = qw(if unless and or err eq ne for foreach while until);
11890 @is_vertical_alignment_keyword{@_} = (1) x scalar(@_);
11893 sub set_vertical_alignment_markers {
11895 # This routine takes the first step toward vertical alignment of the
11896 # lines of output text. It looks for certain tokens which can serve as
11897 # vertical alignment markers (such as an '=').
11899 # Method: We look at each token $i in this output batch and set
11900 # $matching_token_to_go[$i] equal to those tokens at which we would
11901 # accept vertical alignment.
11903 # nothing to do if we aren't allowed to change whitespace
11904 if ( !$rOpts_add_whitespace ) {
11905 for my $i ( 0 .. $max_index_to_go ) {
11906 $matching_token_to_go[$i] = '';
11911 my ( $ri_first, $ri_last ) = @_;
11913 # remember the index of last nonblank token before any sidecomment
11914 my $i_terminal = $max_index_to_go;
11915 if ( $types_to_go[$i_terminal] eq '#' ) {
11916 if ( $i_terminal > 0 && $types_to_go[ --$i_terminal ] eq 'b' ) {
11917 if ( $i_terminal > 0 ) { --$i_terminal }
11921 # look at each line of this batch..
11922 my $last_vertical_alignment_before_index;
11923 my $vert_last_nonblank_type;
11924 my $vert_last_nonblank_token;
11925 my $vert_last_nonblank_block_type;
11926 my $max_line = @$ri_first - 1;
11927 my ( $i, $type, $token, $block_type, $alignment_type );
11928 my ( $ibeg, $iend, $line );
11930 foreach $line ( 0 .. $max_line ) {
11931 $ibeg = $$ri_first[$line];
11932 $iend = $$ri_last[$line];
11933 $last_vertical_alignment_before_index = -1;
11934 $vert_last_nonblank_type = '';
11935 $vert_last_nonblank_token = '';
11936 $vert_last_nonblank_block_type = '';
11938 # look at each token in this output line..
11939 foreach $i ( $ibeg .. $iend ) {
11940 $alignment_type = '';
11941 $type = $types_to_go[$i];
11942 $block_type = $block_type_to_go[$i];
11943 $token = $tokens_to_go[$i];
11945 # check for flag indicating that we should not align
11947 if ( $matching_token_to_go[$i] ) {
11948 $matching_token_to_go[$i] = '';
11952 #--------------------------------------------------------
11953 # First see if we want to align BEFORE this token
11954 #--------------------------------------------------------
11956 # The first possible token that we can align before
11957 # is index 2 because: 1) it doesn't normally make sense to
11958 # align before the first token and 2) the second
11959 # token must be a blank if we are to align before
11961 if ( $i < $ibeg + 2 ) { }
11963 # must follow a blank token
11964 elsif ( $types_to_go[ $i - 1 ] ne 'b' ) { }
11966 # align a side comment --
11967 elsif ( $type eq '#' ) {
11971 # it is a static side comment
11973 $rOpts->{'static-side-comments'}
11974 && $token =~ /$static_side_comment_pattern/o
11977 # or a closing side comment
11978 || ( $vert_last_nonblank_block_type
11980 /$closing_side_comment_prefix_pattern/o )
11983 $alignment_type = $type;
11984 } ## Example of a static side comment
11987 # otherwise, do not align two in a row to create a
11989 elsif ( $last_vertical_alignment_before_index == $i - 2 ) { }
11991 # align before one of these keywords
11992 # (within a line, since $i>1)
11993 elsif ( $type eq 'k' ) {
11995 # /^(if|unless|and|or|eq|ne)$/
11996 if ( $is_vertical_alignment_keyword{$token} ) {
11997 $alignment_type = $token;
12001 # align before one of these types..
12002 # Note: add '.' after new vertical aligner is operational
12003 elsif ( $is_vertical_alignment_type{$type} ) {
12004 $alignment_type = $token;
12006 # Do not align a terminal token. Although it might
12007 # occasionally look ok to do this, it has been found to be
12008 # a good general rule. The main problems are:
12009 # (1) that the terminal token (such as an = or :) might get
12010 # moved far to the right where it is hard to see because
12011 # nothing follows it, and
12012 # (2) doing so may prevent other good alignments.
12013 if ( $i == $iend || $i >= $i_terminal ) {
12014 $alignment_type = "";
12017 # Do not align leading ': (' or '. ('. This would prevent
12018 # alignment in something like the following:
12020 # ( $input_line_number < 10 ) ? " "
12021 # : ( $input_line_number < 100 ) ? " "
12025 # ( $case_matters ? $accessor : " lc($accessor) " )
12026 # . ( $yesno ? " eq " : " ne " )
12027 if ( $i == $ibeg + 2
12028 && $types_to_go[$ibeg] =~ /^[\.\:]$/
12029 && $types_to_go[ $i - 1 ] eq 'b' )
12031 $alignment_type = "";
12034 # For a paren after keyword, only align something like this:
12036 # elsif ( $b ) { &b }
12037 if ( $token eq '(' && $vert_last_nonblank_type eq 'k' ) {
12038 $alignment_type = ""
12039 unless $vert_last_nonblank_token =~
12040 /^(if|unless|elsif)$/;
12043 # be sure the alignment tokens are unique
12044 # This didn't work well: reason not determined
12045 # if ($token ne $type) {$alignment_type .= $type}
12048 # NOTE: This is deactivated because it causes the previous
12049 # if/elsif alignment to fail
12050 #elsif ( $type eq '}' && $token eq '}' && $block_type_to_go[$i])
12051 #{ $alignment_type = $type; }
12053 if ($alignment_type) {
12054 $last_vertical_alignment_before_index = $i;
12057 #--------------------------------------------------------
12058 # Next see if we want to align AFTER the previous nonblank
12059 #--------------------------------------------------------
12061 # We want to line up ',' and interior ';' tokens, with the added
12062 # space AFTER these tokens. (Note: interior ';' is included
12063 # because it may occur in short blocks).
12066 # we haven't already set it
12069 # and its not the first token of the line
12072 # and it follows a blank
12073 && $types_to_go[ $i - 1 ] eq 'b'
12075 # and previous token IS one of these:
12076 && ( $vert_last_nonblank_type =~ /^[\,\;]$/ )
12078 # and it's NOT one of these
12079 && ( $type !~ /^[b\#\)\]\}]$/ )
12081 # then go ahead and align
12085 $alignment_type = $vert_last_nonblank_type;
12088 #--------------------------------------------------------
12089 # then store the value
12090 #--------------------------------------------------------
12091 $matching_token_to_go[$i] = $alignment_type;
12092 if ( $type ne 'b' ) {
12093 $vert_last_nonblank_type = $type;
12094 $vert_last_nonblank_token = $token;
12095 $vert_last_nonblank_block_type = $block_type;
12102 sub terminal_type {
12104 # returns type of last token on this line (terminal token), as follows:
12105 # returns # for a full-line comment
12106 # returns ' ' for a blank line
12107 # otherwise returns final token type
12109 my ( $rtype, $rblock_type, $ibeg, $iend ) = @_;
12111 # check for full-line comment..
12112 if ( $$rtype[$ibeg] eq '#' ) {
12113 return wantarray ? ( $$rtype[$ibeg], $ibeg ) : $$rtype[$ibeg];
12117 # start at end and walk bakwards..
12118 for ( my $i = $iend ; $i >= $ibeg ; $i-- ) {
12120 # skip past any side comment and blanks
12121 next if ( $$rtype[$i] eq 'b' );
12122 next if ( $$rtype[$i] eq '#' );
12124 # found it..make sure it is a BLOCK termination,
12125 # but hide a terminal } after sort/grep/map because it is not
12126 # necessarily the end of the line. (terminal.t)
12127 my $terminal_type = $$rtype[$i];
12129 $terminal_type eq '}'
12130 && ( !$$rblock_type[$i]
12131 || ( $is_sort_map_grep_eval_do{ $$rblock_type[$i] } ) )
12134 $terminal_type = 'b';
12136 return wantarray ? ( $terminal_type, $i ) : $terminal_type;
12140 return wantarray ? ( ' ', $ibeg ) : ' ';
12145 my %is_good_keyword_breakpoint;
12146 my %is_lt_gt_le_ge;
12148 sub set_bond_strengths {
12152 @_ = qw(if unless while until for foreach);
12153 @is_good_keyword_breakpoint{@_} = (1) x scalar(@_);
12155 @_ = qw(lt gt le ge);
12156 @is_lt_gt_le_ge{@_} = (1) x scalar(@_);
12158 ###############################################################
12159 # NOTE: NO_BREAK's set here are HINTS which may not be honored;
12160 # essential NO_BREAKS's must be enforced in section 2, below.
12161 ###############################################################
12163 # adding NEW_TOKENS: add a left and right bond strength by
12164 # mimmicking what is done for an existing token type. You
12165 # can skip this step at first and take the default, then
12166 # tweak later to get desired results.
12168 # The bond strengths should roughly follow precenence order where
12169 # possible. If you make changes, please check the results very
12170 # carefully on a variety of scripts.
12172 # no break around possible filehandle
12173 $left_bond_strength{'Z'} = NO_BREAK;
12174 $right_bond_strength{'Z'} = NO_BREAK;
12176 # never put a bare word on a new line:
12177 # example print (STDERR, "bla"); will fail with break after (
12178 $left_bond_strength{'w'} = NO_BREAK;
12180 # blanks always have infinite strength to force breaks after real tokens
12181 $right_bond_strength{'b'} = NO_BREAK;
12183 # try not to break on exponentation
12184 @_ = qw" ** .. ... <=> ";
12185 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12186 @right_bond_strength{@_} = (STRONG) x scalar(@_);
12188 # The comma-arrow has very low precedence but not a good break point
12189 $left_bond_strength{'=>'} = NO_BREAK;
12190 $right_bond_strength{'=>'} = NOMINAL;
12192 # ok to break after label
12193 $left_bond_strength{'J'} = NO_BREAK;
12194 $right_bond_strength{'J'} = NOMINAL;
12195 $left_bond_strength{'j'} = STRONG;
12196 $right_bond_strength{'j'} = STRONG;
12197 $left_bond_strength{'A'} = STRONG;
12198 $right_bond_strength{'A'} = STRONG;
12200 $left_bond_strength{'->'} = STRONG;
12201 $right_bond_strength{'->'} = VERY_STRONG;
12203 # breaking AFTER modulus operator is ok:
12205 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12206 @right_bond_strength{@_} =
12207 ( 0.1 * NOMINAL + 0.9 * STRONG ) x scalar(@_);
12209 # Break AFTER math operators * and /
12211 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12212 @right_bond_strength{@_} = (NOMINAL) x scalar(@_);
12214 # Break AFTER weakest math operators + and -
12215 # Make them weaker than * but a bit stronger than '.'
12217 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12218 @right_bond_strength{@_} =
12219 ( 0.91 * NOMINAL + 0.09 * WEAK ) x scalar(@_);
12221 # breaking BEFORE these is just ok:
12223 @right_bond_strength{@_} = (STRONG) x scalar(@_);
12224 @left_bond_strength{@_} = (NOMINAL) x scalar(@_);
12226 # breaking before the string concatenation operator seems best
12227 # because it can be hard to see at the end of a line
12228 $right_bond_strength{'.'} = STRONG;
12229 $left_bond_strength{'.'} = 0.9 * NOMINAL + 0.1 * WEAK;
12232 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12233 @right_bond_strength{@_} = (NOMINAL) x scalar(@_);
12235 # make these a little weaker than nominal so that they get
12236 # favored for end-of-line characters
12237 @_ = qw"!= == =~ !~ ~~ !~~";
12238 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12239 @right_bond_strength{@_} =
12240 ( 0.9 * NOMINAL + 0.1 * WEAK ) x scalar(@_);
12242 # break AFTER these
12243 @_ = qw" < > | & >= <=";
12244 @left_bond_strength{@_} = (VERY_STRONG) x scalar(@_);
12245 @right_bond_strength{@_} =
12246 ( 0.8 * NOMINAL + 0.2 * WEAK ) x scalar(@_);
12248 # breaking either before or after a quote is ok
12249 # but bias for breaking before a quote
12250 $left_bond_strength{'Q'} = NOMINAL;
12251 $right_bond_strength{'Q'} = NOMINAL + 0.02;
12252 $left_bond_strength{'q'} = NOMINAL;
12253 $right_bond_strength{'q'} = NOMINAL;
12255 # starting a line with a keyword is usually ok
12256 $left_bond_strength{'k'} = NOMINAL;
12258 # we usually want to bond a keyword strongly to what immediately
12259 # follows, rather than leaving it stranded at the end of a line
12260 $right_bond_strength{'k'} = STRONG;
12262 $left_bond_strength{'G'} = NOMINAL;
12263 $right_bond_strength{'G'} = STRONG;
12265 # it is good to break AFTER various assignment operators
12267 = **= += *= &= <<= &&=
12268 -= /= |= >>= ||= //=
12272 @left_bond_strength{@_} = (STRONG) x scalar(@_);
12273 @right_bond_strength{@_} =
12274 ( 0.4 * WEAK + 0.6 * VERY_WEAK ) x scalar(@_);
12276 # break BEFORE '&&' and '||' and '//'
12277 # set strength of '||' to same as '=' so that chains like
12278 # $a = $b || $c || $d will break before the first '||'
12279 $right_bond_strength{'||'} = NOMINAL;
12280 $left_bond_strength{'||'} = $right_bond_strength{'='};
12282 # same thing for '//'
12283 $right_bond_strength{'//'} = NOMINAL;
12284 $left_bond_strength{'//'} = $right_bond_strength{'='};
12286 # set strength of && a little higher than ||
12287 $right_bond_strength{'&&'} = NOMINAL;
12288 $left_bond_strength{'&&'} = $left_bond_strength{'||'} + 0.1;
12290 $left_bond_strength{';'} = VERY_STRONG;
12291 $right_bond_strength{';'} = VERY_WEAK;
12292 $left_bond_strength{'f'} = VERY_STRONG;
12294 # make right strength of for ';' a little less than '='
12295 # to make for contents break after the ';' to avoid this:
12296 # for ( $j = $number_of_fields - 1 ; $j < $item_count ; $j +=
12297 # $number_of_fields )
12298 # and make it weaker than ',' and 'and' too
12299 $right_bond_strength{'f'} = VERY_WEAK - 0.03;
12301 # The strengths of ?/: should be somewhere between
12302 # an '=' and a quote (NOMINAL),
12303 # make strength of ':' slightly less than '?' to help
12304 # break long chains of ? : after the colons
12305 $left_bond_strength{':'} = 0.4 * WEAK + 0.6 * NOMINAL;
12306 $right_bond_strength{':'} = NO_BREAK;
12307 $left_bond_strength{'?'} = $left_bond_strength{':'} + 0.01;
12308 $right_bond_strength{'?'} = NO_BREAK;
12310 $left_bond_strength{','} = VERY_STRONG;
12311 $right_bond_strength{','} = VERY_WEAK;
12313 # Set bond strengths of certain keywords
12314 # make 'or', 'err', 'and' slightly weaker than a ','
12315 $left_bond_strength{'and'} = VERY_WEAK - 0.01;
12316 $left_bond_strength{'or'} = VERY_WEAK - 0.02;
12317 $left_bond_strength{'err'} = VERY_WEAK - 0.02;
12318 $left_bond_strength{'xor'} = NOMINAL;
12319 $right_bond_strength{'and'} = NOMINAL;
12320 $right_bond_strength{'or'} = NOMINAL;
12321 $right_bond_strength{'err'} = NOMINAL;
12322 $right_bond_strength{'xor'} = STRONG;
12325 # patch-its always ok to break at end of line
12326 $nobreak_to_go[$max_index_to_go] = 0;
12328 # adding a small 'bias' to strengths is a simple way to make a line
12329 # break at the first of a sequence of identical terms. For example,
12330 # to force long string of conditional operators to break with
12331 # each line ending in a ':', we can add a small number to the bond
12332 # strength of each ':'
12333 my $colon_bias = 0;
12340 my $code_bias = -.01;
12344 my $last_nonblank_type = $type;
12345 my $last_nonblank_token = $token;
12346 my $delta_bias = 0.0001;
12347 my $list_str = $left_bond_strength{'?'};
12349 my ( $block_type, $i_next, $i_next_nonblank, $next_nonblank_token,
12350 $next_nonblank_type, $next_token, $next_type, $total_nesting_depth,
12353 # preliminary loop to compute bond strengths
12354 for ( my $i = 0 ; $i <= $max_index_to_go ; $i++ ) {
12355 $last_type = $type;
12356 if ( $type ne 'b' ) {
12357 $last_nonblank_type = $type;
12358 $last_nonblank_token = $token;
12360 $type = $types_to_go[$i];
12362 # strength on both sides of a blank is the same
12363 if ( $type eq 'b' && $last_type ne 'b' ) {
12364 $bond_strength_to_go[$i] = $bond_strength_to_go[ $i - 1 ];
12368 $token = $tokens_to_go[$i];
12369 $block_type = $block_type_to_go[$i];
12371 $next_type = $types_to_go[$i_next];
12372 $next_token = $tokens_to_go[$i_next];
12373 $total_nesting_depth = $nesting_depth_to_go[$i_next];
12374 $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
12375 $next_nonblank_type = $types_to_go[$i_next_nonblank];
12376 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
12378 # Some token chemistry... The decision about where to break a
12379 # line depends upon a "bond strength" between tokens. The LOWER
12380 # the bond strength, the MORE likely a break. The strength
12381 # values are based on trial-and-error, and need to be tweaked
12382 # occasionally to get desired results. Things to keep in mind
12384 # 1. relative strengths are important. small differences
12385 # in strengths can make big formatting differences.
12386 # 2. each indentation level adds one unit of bond strength
12387 # 3. a value of NO_BREAK makes an unbreakable bond
12388 # 4. a value of VERY_WEAK is the strength of a ','
12389 # 5. values below NOMINAL are considered ok break points
12390 # 6. values above NOMINAL are considered poor break points
12391 # We are computing the strength of the bond between the current
12392 # token and the NEXT token.
12393 my $bond_str = VERY_STRONG; # a default, high strength
12395 #---------------------------------------------------------------
12397 # use minimum of left and right bond strengths if defined;
12398 # digraphs and trigraphs like to break on their left
12399 #---------------------------------------------------------------
12400 my $bsr = $right_bond_strength{$type};
12402 if ( !defined($bsr) ) {
12404 if ( $is_digraph{$type} || $is_trigraph{$type} ) {
12408 $bsr = VERY_STRONG;
12412 # define right bond strengths of certain keywords
12413 if ( $type eq 'k' && defined( $right_bond_strength{$token} ) ) {
12414 $bsr = $right_bond_strength{$token};
12416 elsif ( $token eq 'ne' or $token eq 'eq' ) {
12419 my $bsl = $left_bond_strength{$next_nonblank_type};
12421 # set terminal bond strength to the nominal value
12422 # this will cause good preceding breaks to be retained
12423 if ( $i_next_nonblank > $max_index_to_go ) {
12427 if ( !defined($bsl) ) {
12429 if ( $is_digraph{$next_nonblank_type}
12430 || $is_trigraph{$next_nonblank_type} )
12435 $bsl = VERY_STRONG;
12439 # define right bond strengths of certain keywords
12440 if ( $next_nonblank_type eq 'k'
12441 && defined( $left_bond_strength{$next_nonblank_token} ) )
12443 $bsl = $left_bond_strength{$next_nonblank_token};
12445 elsif ($next_nonblank_token eq 'ne'
12446 or $next_nonblank_token eq 'eq' )
12450 elsif ( $is_lt_gt_le_ge{$next_nonblank_token} ) {
12451 $bsl = 0.9 * NOMINAL + 0.1 * STRONG;
12454 # Note: it might seem that we would want to keep a NO_BREAK if
12455 # either token has this value. This didn't work, because in an
12456 # arrow list, it prevents the comma from separating from the
12457 # following bare word (which is probably quoted by its arrow).
12458 # So necessary NO_BREAK's have to be handled as special cases
12459 # in the final section.
12460 $bond_str = ( $bsr < $bsl ) ? $bsr : $bsl;
12461 my $bond_str_1 = $bond_str;
12463 #---------------------------------------------------------------
12466 #---------------------------------------------------------------
12468 # allow long lines before final { in an if statement, as in:
12473 # Otherwise, the line before the { tends to be too short.
12474 if ( $type eq ')' ) {
12475 if ( $next_nonblank_type eq '{' ) {
12476 $bond_str = VERY_WEAK + 0.03;
12480 elsif ( $type eq '(' ) {
12481 if ( $next_nonblank_type eq '{' ) {
12482 $bond_str = NOMINAL;
12486 # break on something like '} (', but keep this stronger than a ','
12487 # example is in 'howe.pl'
12488 elsif ( $type eq 'R' or $type eq '}' ) {
12489 if ( $next_nonblank_type eq '(' ) {
12490 $bond_str = 0.8 * VERY_WEAK + 0.2 * WEAK;
12494 #-----------------------------------------------------------------
12495 # adjust bond strength bias
12496 #-----------------------------------------------------------------
12498 elsif ( $type eq 'f' ) {
12499 $bond_str += $f_bias;
12500 $f_bias += $delta_bias;
12503 # in long ?: conditionals, bias toward just one set per line (colon.t)
12504 elsif ( $type eq ':' ) {
12505 if ( !$want_break_before{$type} ) {
12506 $bond_str += $colon_bias;
12507 $colon_bias += $delta_bias;
12511 if ( $next_nonblank_type eq ':'
12512 && $want_break_before{$next_nonblank_type} )
12514 $bond_str += $colon_bias;
12515 $colon_bias += $delta_bias;
12518 # if leading '.' is used, align all but 'short' quotes;
12519 # the idea is to not place something like "\n" on a single line.
12520 elsif ( $next_nonblank_type eq '.' ) {
12521 if ( $want_break_before{'.'} ) {
12523 $last_nonblank_type eq '.'
12526 $rOpts_short_concatenation_item_length )
12527 && ( $token !~ /^[\)\]\}]$/ )
12530 $dot_bias += $delta_bias;
12532 $bond_str += $dot_bias;
12535 elsif ($next_nonblank_type eq '&&'
12536 && $want_break_before{$next_nonblank_type} )
12538 $bond_str += $amp_bias;
12539 $amp_bias += $delta_bias;
12541 elsif ($next_nonblank_type eq '||'
12542 && $want_break_before{$next_nonblank_type} )
12544 $bond_str += $bar_bias;
12545 $bar_bias += $delta_bias;
12547 elsif ( $next_nonblank_type eq 'k' ) {
12549 if ( $next_nonblank_token eq 'and'
12550 && $want_break_before{$next_nonblank_token} )
12552 $bond_str += $and_bias;
12553 $and_bias += $delta_bias;
12555 elsif ($next_nonblank_token =~ /^(or|err)$/
12556 && $want_break_before{$next_nonblank_token} )
12558 $bond_str += $or_bias;
12559 $or_bias += $delta_bias;
12562 # FIXME: needs more testing
12563 elsif ( $is_keyword_returning_list{$next_nonblank_token} ) {
12564 $bond_str = $list_str if ( $bond_str > $list_str );
12566 elsif ( $token eq 'err'
12567 && !$want_break_before{$token} )
12569 $bond_str += $or_bias;
12570 $or_bias += $delta_bias;
12575 && !$want_break_before{$type} )
12577 $bond_str += $colon_bias;
12578 $colon_bias += $delta_bias;
12580 elsif ( $type eq '&&'
12581 && !$want_break_before{$type} )
12583 $bond_str += $amp_bias;
12584 $amp_bias += $delta_bias;
12586 elsif ( $type eq '||'
12587 && !$want_break_before{$type} )
12589 $bond_str += $bar_bias;
12590 $bar_bias += $delta_bias;
12592 elsif ( $type eq 'k' ) {
12594 if ( $token eq 'and'
12595 && !$want_break_before{$token} )
12597 $bond_str += $and_bias;
12598 $and_bias += $delta_bias;
12600 elsif ( $token eq 'or'
12601 && !$want_break_before{$token} )
12603 $bond_str += $or_bias;
12604 $or_bias += $delta_bias;
12608 # keep matrix and hash indices together
12609 # but make them a little below STRONG to allow breaking open
12610 # something like {'some-word'}{'some-very-long-word'} at the }{
12612 if ( ( $type eq ']' or $type eq 'R' )
12613 && ( $next_nonblank_type eq '[' or $next_nonblank_type eq 'L' )
12616 $bond_str = 0.9 * STRONG + 0.1 * NOMINAL;
12619 if ( $next_nonblank_token =~ /^->/ ) {
12621 # increase strength to the point where a break in the following
12622 # will be after the opening paren rather than at the arrow:
12624 if ( $type eq 'i' ) {
12625 $bond_str = 1.45 * STRONG;
12628 elsif ( $type =~ /^[\)\]\}R]$/ ) {
12629 $bond_str = 0.1 * STRONG + 0.9 * NOMINAL;
12632 # otherwise make strength before an '->' a little over a '+'
12634 if ( $bond_str <= NOMINAL ) {
12635 $bond_str = NOMINAL + 0.01;
12640 if ( $token eq ')' && $next_nonblank_token eq '[' ) {
12641 $bond_str = 0.2 * STRONG + 0.8 * NOMINAL;
12644 # map1.t -- correct for a quirk in perl
12646 && $next_nonblank_type eq 'i'
12647 && $last_nonblank_type eq 'k'
12648 && $is_sort_map_grep{$last_nonblank_token} )
12650 # /^(sort|map|grep)$/ )
12652 $bond_str = NO_BREAK;
12655 # extrude.t: do not break before paren at:
12657 if ( $last_nonblank_type eq 'F' && $next_nonblank_token eq '(' ) {
12658 $bond_str = NO_BREAK;
12661 # good to break after end of code blocks
12662 if ( $type eq '}' && $block_type ) {
12664 $bond_str = 0.5 * WEAK + 0.5 * VERY_WEAK + $code_bias;
12665 $code_bias += $delta_bias;
12668 if ( $type eq 'k' ) {
12670 # allow certain control keywords to stand out
12671 if ( $next_nonblank_type eq 'k'
12672 && $is_last_next_redo_return{$token} )
12674 $bond_str = 0.45 * WEAK + 0.55 * VERY_WEAK;
12677 # Don't break after keyword my. This is a quick fix for a
12678 # rare problem with perl. An example is this line from file
12680 # foreach my $question( Debian::DebConf::ConfigDb::gettree( $this->{'question'} ) )
12682 if ( $token eq 'my' ) {
12683 $bond_str = NO_BREAK;
12688 # good to break before 'if', 'unless', etc
12689 if ( $is_if_brace_follower{$next_nonblank_token} ) {
12690 $bond_str = VERY_WEAK;
12693 if ( $next_nonblank_type eq 'k' ) {
12695 # keywords like 'unless', 'if', etc, within statements
12697 if ( $is_good_keyword_breakpoint{$next_nonblank_token} ) {
12698 $bond_str = VERY_WEAK / 1.05;
12702 # try not to break before a comma-arrow
12703 elsif ( $next_nonblank_type eq '=>' ) {
12704 if ( $bond_str < STRONG ) { $bond_str = STRONG }
12707 #----------------------------------------------------------------------
12708 # only set NO_BREAK's from here on
12709 #----------------------------------------------------------------------
12710 if ( $type eq 'C' or $type eq 'U' ) {
12712 # use strict requires that bare word and => not be separated
12713 if ( $next_nonblank_type eq '=>' ) {
12714 $bond_str = NO_BREAK;
12719 # use strict requires that bare word within braces not start new line
12720 elsif ( $type eq 'L' ) {
12722 if ( $next_nonblank_type eq 'w' ) {
12723 $bond_str = NO_BREAK;
12727 # in older version of perl, use strict can cause problems with
12728 # breaks before bare words following opening parens. For example,
12729 # this will fail under older versions if a break is made between
12732 # open( MAIL, "a long filename or command");
12734 elsif ( $type eq '{' ) {
12736 if ( $token eq '(' && $next_nonblank_type eq 'w' ) {
12738 # but it's fine to break if the word is followed by a '=>'
12739 # or if it is obviously a sub call
12740 my $i_next_next_nonblank = $i_next_nonblank + 1;
12741 my $next_next_type = $types_to_go[$i_next_next_nonblank];
12742 if ( $next_next_type eq 'b'
12743 && $i_next_nonblank < $max_index_to_go )
12745 $i_next_next_nonblank++;
12746 $next_next_type = $types_to_go[$i_next_next_nonblank];
12749 ##if ( $next_next_type ne '=>' ) {
12750 # these are ok: '->xxx', '=>', '('
12752 # We'll check for an old breakpoint and keep a leading
12753 # bareword if it was that way in the input file.
12754 # Presumably it was ok that way. For example, the
12755 # following would remain unchanged:
12758 # January, February, March, April,
12759 # May, June, July, August,
12760 # September, October, November, December,
12763 # This should be sufficient:
12764 if ( !$old_breakpoint_to_go[$i]
12765 && ( $next_next_type eq ',' || $next_next_type eq '}' )
12768 $bond_str = NO_BREAK;
12773 elsif ( $type eq 'w' ) {
12775 if ( $next_nonblank_type eq 'R' ) {
12776 $bond_str = NO_BREAK;
12779 # use strict requires that bare word and => not be separated
12780 if ( $next_nonblank_type eq '=>' ) {
12781 $bond_str = NO_BREAK;
12785 # in fact, use strict hates bare words on any new line. For
12786 # example, a break before the underscore here provokes the
12787 # wrath of use strict:
12788 # if ( -r $fn && ( -s _ || $AllowZeroFilesize)) {
12789 elsif ( $type eq 'F' ) {
12790 $bond_str = NO_BREAK;
12793 # use strict does not allow separating type info from trailing { }
12794 # testfile is readmail.pl
12795 elsif ( $type eq 't' or $type eq 'i' ) {
12797 if ( $next_nonblank_type eq 'L' ) {
12798 $bond_str = NO_BREAK;
12802 # Do not break between a possible filehandle and a ? or / and do
12803 # not introduce a break after it if there is no blank
12805 elsif ( $type eq 'Z' ) {
12810 # if there is no blank and we do not want one. Examples:
12811 # print $x++ # do not break after $x
12812 # print HTML"HELLO" # break ok after HTML
12815 && defined( $want_left_space{$next_type} )
12816 && $want_left_space{$next_type} == WS_NO
12819 # or we might be followed by the start of a quote
12820 || $next_nonblank_type =~ /^[\/\?]$/
12823 $bond_str = NO_BREAK;
12827 # Do not break before a possible file handle
12828 if ( $next_nonblank_type eq 'Z' ) {
12829 $bond_str = NO_BREAK;
12832 # As a defensive measure, do not break between a '(' and a
12833 # filehandle. In some cases, this can cause an error. For
12834 # example, the following program works:
12841 # But this program fails:
12849 # This is normally only a problem with the 'extrude' option
12850 if ( $next_nonblank_type eq 'Y' && $token eq '(' ) {
12851 $bond_str = NO_BREAK;
12854 # patch to put cuddled elses back together when on multiple
12855 # lines, as in: } \n else \n { \n
12856 if ($rOpts_cuddled_else) {
12858 if ( ( $token eq 'else' ) && ( $next_nonblank_type eq '{' )
12859 || ( $type eq '}' ) && ( $next_nonblank_token eq 'else' ) )
12861 $bond_str = NO_BREAK;
12865 # keep '}' together with ';'
12866 if ( ( $token eq '}' ) && ( $next_nonblank_type eq ';' ) ) {
12867 $bond_str = NO_BREAK;
12870 # never break between sub name and opening paren
12871 if ( ( $type eq 'w' ) && ( $next_nonblank_token eq '(' ) ) {
12872 $bond_str = NO_BREAK;
12875 #---------------------------------------------------------------
12877 # now take nesting depth into account
12878 #---------------------------------------------------------------
12879 # final strength incorporates the bond strength and nesting depth
12882 if ( defined($bond_str) && !$nobreak_to_go[$i] ) {
12883 if ( $total_nesting_depth > 0 ) {
12884 $strength = $bond_str + $total_nesting_depth;
12887 $strength = $bond_str;
12891 $strength = NO_BREAK;
12894 # always break after side comment
12895 if ( $type eq '#' ) { $strength = 0 }
12897 $bond_strength_to_go[$i] = $strength;
12899 FORMATTER_DEBUG_FLAG_BOND && do {
12900 my $str = substr( $token, 0, 15 );
12901 $str .= ' ' x ( 16 - length($str) );
12903 "BOND: i=$i $str $type $next_nonblank_type depth=$total_nesting_depth strength=$bond_str_1 -> $bond_str -> $strength \n";
12910 sub pad_array_to_go {
12912 # to simplify coding in scan_list and set_bond_strengths, it helps
12913 # to create some extra blank tokens at the end of the arrays
12914 $tokens_to_go[ $max_index_to_go + 1 ] = '';
12915 $tokens_to_go[ $max_index_to_go + 2 ] = '';
12916 $types_to_go[ $max_index_to_go + 1 ] = 'b';
12917 $types_to_go[ $max_index_to_go + 2 ] = 'b';
12918 $nesting_depth_to_go[ $max_index_to_go + 1 ] =
12919 $nesting_depth_to_go[$max_index_to_go];
12922 if ( $is_closing_type{ $types_to_go[$max_index_to_go] } ) {
12923 if ( $nesting_depth_to_go[$max_index_to_go] <= 0 ) {
12925 # shouldn't happen:
12926 unless ( get_saw_brace_error() ) {
12928 "Program bug in scan_list: hit nesting error which should have been caught\n"
12930 report_definite_bug();
12934 $nesting_depth_to_go[ $max_index_to_go + 1 ] -= 1;
12939 elsif ( $is_opening_type{ $types_to_go[$max_index_to_go] } ) {
12940 $nesting_depth_to_go[ $max_index_to_go + 1 ] += 1;
12944 { # begin scan_list
12947 $block_type, $current_depth,
12949 $i_last_nonblank_token, $last_colon_sequence_number,
12950 $last_nonblank_token, $last_nonblank_type,
12951 $last_old_breakpoint_count, $minimum_depth,
12952 $next_nonblank_block_type, $next_nonblank_token,
12953 $next_nonblank_type, $old_breakpoint_count,
12954 $starting_breakpoint_count, $starting_depth,
12960 @breakpoint_stack, @breakpoint_undo_stack,
12961 @comma_index, @container_type,
12962 @identifier_count_stack, @index_before_arrow,
12963 @interrupted_list, @item_count_stack,
12964 @last_comma_index, @last_dot_index,
12965 @last_nonblank_type, @old_breakpoint_count_stack,
12966 @opening_structure_index_stack, @rfor_semicolon_list,
12967 @has_old_logical_breakpoints, @rand_or_list,
12971 # routine to define essential variables when we go 'up' to
12973 sub check_for_new_minimum_depth {
12975 if ( $depth < $minimum_depth ) {
12977 $minimum_depth = $depth;
12979 # these arrays need not retain values between calls
12980 $breakpoint_stack[$depth] = $starting_breakpoint_count;
12981 $container_type[$depth] = "";
12982 $identifier_count_stack[$depth] = 0;
12983 $index_before_arrow[$depth] = -1;
12984 $interrupted_list[$depth] = 1;
12985 $item_count_stack[$depth] = 0;
12986 $last_nonblank_type[$depth] = "";
12987 $opening_structure_index_stack[$depth] = -1;
12989 $breakpoint_undo_stack[$depth] = undef;
12990 $comma_index[$depth] = undef;
12991 $last_comma_index[$depth] = undef;
12992 $last_dot_index[$depth] = undef;
12993 $old_breakpoint_count_stack[$depth] = undef;
12994 $has_old_logical_breakpoints[$depth] = 0;
12995 $rand_or_list[$depth] = [];
12996 $rfor_semicolon_list[$depth] = [];
12997 $i_equals[$depth] = -1;
12999 # these arrays must retain values between calls
13000 if ( !defined( $has_broken_sublist[$depth] ) ) {
13001 $dont_align[$depth] = 0;
13002 $has_broken_sublist[$depth] = 0;
13003 $want_comma_break[$depth] = 0;
13008 # routine to decide which commas to break at within a container;
13010 # $bp_count = number of comma breakpoints set
13011 # $do_not_break_apart = a flag indicating if container need not
13013 sub set_comma_breakpoints {
13017 my $do_not_break_apart = 0;
13018 if ( $item_count_stack[$dd] && !$dont_align[$dd] ) {
13020 my $fbc = $forced_breakpoint_count;
13022 # always open comma lists not preceded by keywords,
13023 # barewords, identifiers (that is, anything that doesn't
13024 # look like a function call)
13025 my $must_break_open = $last_nonblank_type[$dd] !~ /^[kwiU]$/;
13027 set_comma_breakpoints_do(
13029 $opening_structure_index_stack[$dd],
13031 $item_count_stack[$dd],
13032 $identifier_count_stack[$dd],
13034 $next_nonblank_type,
13035 $container_type[$dd],
13036 $interrupted_list[$dd],
13037 \$do_not_break_apart,
13040 $bp_count = $forced_breakpoint_count - $fbc;
13041 $do_not_break_apart = 0 if $must_break_open;
13043 return ( $bp_count, $do_not_break_apart );
13046 my %is_logical_container;
13049 @_ = qw# if elsif unless while and or err not && | || ? : ! #;
13050 @is_logical_container{@_} = (1) x scalar(@_);
13053 sub set_for_semicolon_breakpoints {
13055 foreach ( @{ $rfor_semicolon_list[$dd] } ) {
13056 set_forced_breakpoint($_);
13060 sub set_logical_breakpoints {
13063 $item_count_stack[$dd] == 0
13064 && $is_logical_container{ $container_type[$dd] }
13067 || $has_old_logical_breakpoints[$dd]
13071 # Look for breaks in this order:
13074 foreach my $i ( 0 .. 3 ) {
13075 if ( $rand_or_list[$dd][$i] ) {
13076 foreach ( @{ $rand_or_list[$dd][$i] } ) {
13077 set_forced_breakpoint($_);
13080 # break at any 'if' and 'unless' too
13081 foreach ( @{ $rand_or_list[$dd][4] } ) {
13082 set_forced_breakpoint($_);
13084 $rand_or_list[$dd] = [];
13091 sub is_unbreakable_container {
13093 # never break a container of one of these types
13094 # because bad things can happen (map1.t)
13096 $is_sort_map_grep{ $container_type[$dd] };
13101 # This routine is responsible for setting line breaks for all lists,
13102 # so that hierarchical structure can be displayed and so that list
13103 # items can be vertically aligned. The output of this routine is
13104 # stored in the array @forced_breakpoint_to_go, which is used to set
13105 # final breakpoints.
13107 $starting_depth = $nesting_depth_to_go[0];
13110 $current_depth = $starting_depth;
13112 $last_colon_sequence_number = -1;
13113 $last_nonblank_token = ';';
13114 $last_nonblank_type = ';';
13115 $last_nonblank_block_type = ' ';
13116 $last_old_breakpoint_count = 0;
13117 $minimum_depth = $current_depth + 1; # forces update in check below
13118 $old_breakpoint_count = 0;
13119 $starting_breakpoint_count = $forced_breakpoint_count;
13122 $type_sequence = '';
13124 check_for_new_minimum_depth($current_depth);
13126 my $is_long_line = excess_line_length( 0, $max_index_to_go ) > 0;
13127 my $want_previous_breakpoint = -1;
13129 my $saw_good_breakpoint;
13130 my $i_line_end = -1;
13131 my $i_line_start = -1;
13133 # loop over all tokens in this batch
13134 while ( ++$i <= $max_index_to_go ) {
13135 if ( $type ne 'b' ) {
13136 $i_last_nonblank_token = $i - 1;
13137 $last_nonblank_type = $type;
13138 $last_nonblank_token = $token;
13139 $last_nonblank_block_type = $block_type;
13141 $type = $types_to_go[$i];
13142 $block_type = $block_type_to_go[$i];
13143 $token = $tokens_to_go[$i];
13144 $type_sequence = $type_sequence_to_go[$i];
13145 my $next_type = $types_to_go[ $i + 1 ];
13146 my $next_token = $tokens_to_go[ $i + 1 ];
13147 my $i_next_nonblank = ( ( $next_type eq 'b' ) ? $i + 2 : $i + 1 );
13148 $next_nonblank_type = $types_to_go[$i_next_nonblank];
13149 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
13150 $next_nonblank_block_type = $block_type_to_go[$i_next_nonblank];
13152 # set break if flag was set
13153 if ( $want_previous_breakpoint >= 0 ) {
13154 set_forced_breakpoint($want_previous_breakpoint);
13155 $want_previous_breakpoint = -1;
13158 $last_old_breakpoint_count = $old_breakpoint_count;
13159 if ( $old_breakpoint_to_go[$i] ) {
13161 $i_line_start = $i_next_nonblank;
13163 $old_breakpoint_count++;
13165 # Break before certain keywords if user broke there and
13166 # this is a 'safe' break point. The idea is to retain
13167 # any preferred breaks for sequential list operations,
13168 # like a schwartzian transform.
13169 if ($rOpts_break_at_old_keyword_breakpoints) {
13171 $next_nonblank_type eq 'k'
13172 && $is_keyword_returning_list{$next_nonblank_token}
13173 && ( $type =~ /^[=\)\]\}Riw]$/
13175 && $is_keyword_returning_list{$token} )
13179 # we actually have to set this break next time through
13180 # the loop because if we are at a closing token (such
13181 # as '}') which forms a one-line block, this break might
13183 $want_previous_breakpoint = $i;
13187 next if ( $type eq 'b' );
13188 $depth = $nesting_depth_to_go[ $i + 1 ];
13190 # safety check - be sure we always break after a comment
13191 # Shouldn't happen .. an error here probably means that the
13192 # nobreak flag did not get turned off correctly during
13194 if ( $type eq '#' ) {
13195 if ( $i != $max_index_to_go ) {
13197 "Non-fatal program bug: backup logic needed to break after a comment\n"
13199 report_definite_bug();
13200 $nobreak_to_go[$i] = 0;
13201 set_forced_breakpoint($i);
13205 # Force breakpoints at certain tokens in long lines.
13206 # Note that such breakpoints will be undone later if these tokens
13207 # are fully contained within parens on a line.
13210 # break before a keyword within a line
13214 # if one of these keywords:
13215 && $token =~ /^(if|unless|while|until|for)$/
13217 # but do not break at something like '1 while'
13218 && ( $last_nonblank_type ne 'n' || $i > 2 )
13220 # and let keywords follow a closing 'do' brace
13221 && $last_nonblank_block_type ne 'do'
13226 # or container is broken (by side-comment, etc)
13227 || ( $next_nonblank_token eq '('
13228 && $mate_index_to_go[$i_next_nonblank] < $i )
13232 set_forced_breakpoint( $i - 1 );
13235 # remember locations of '||' and '&&' for possible breaks if we
13236 # decide this is a long logical expression.
13237 if ( $type eq '||' ) {
13238 push @{ $rand_or_list[$depth][2] }, $i;
13239 ++$has_old_logical_breakpoints[$depth]
13240 if ( ( $i == $i_line_start || $i == $i_line_end )
13241 && $rOpts_break_at_old_logical_breakpoints );
13243 elsif ( $type eq '&&' ) {
13244 push @{ $rand_or_list[$depth][3] }, $i;
13245 ++$has_old_logical_breakpoints[$depth]
13246 if ( ( $i == $i_line_start || $i == $i_line_end )
13247 && $rOpts_break_at_old_logical_breakpoints );
13249 elsif ( $type eq 'f' ) {
13250 push @{ $rfor_semicolon_list[$depth] }, $i;
13252 elsif ( $type eq 'k' ) {
13253 if ( $token eq 'and' ) {
13254 push @{ $rand_or_list[$depth][1] }, $i;
13255 ++$has_old_logical_breakpoints[$depth]
13256 if ( ( $i == $i_line_start || $i == $i_line_end )
13257 && $rOpts_break_at_old_logical_breakpoints );
13260 # break immediately at 'or's which are probably not in a logical
13261 # block -- but we will break in logical breaks below so that
13262 # they do not add to the forced_breakpoint_count
13263 elsif ( $token eq 'or' ) {
13264 push @{ $rand_or_list[$depth][0] }, $i;
13265 ++$has_old_logical_breakpoints[$depth]
13266 if ( ( $i == $i_line_start || $i == $i_line_end )
13267 && $rOpts_break_at_old_logical_breakpoints );
13268 if ( $is_logical_container{ $container_type[$depth] } ) {
13271 if ($is_long_line) { set_forced_breakpoint($i) }
13272 elsif ( ( $i == $i_line_start || $i == $i_line_end )
13273 && $rOpts_break_at_old_logical_breakpoints )
13275 $saw_good_breakpoint = 1;
13279 elsif ( $token eq 'if' || $token eq 'unless' ) {
13280 push @{ $rand_or_list[$depth][4] }, $i;
13281 if ( ( $i == $i_line_start || $i == $i_line_end )
13282 && $rOpts_break_at_old_logical_breakpoints )
13284 set_forced_breakpoint($i);
13288 elsif ( $is_assignment{$type} ) {
13289 $i_equals[$depth] = $i;
13292 if ($type_sequence) {
13294 # handle any postponed closing breakpoints
13295 if ( $token =~ /^[\)\]\}\:]$/ ) {
13296 if ( $type eq ':' ) {
13297 $last_colon_sequence_number = $type_sequence;
13299 # TESTING: retain break at a ':' line break
13300 if ( ( $i == $i_line_start || $i == $i_line_end )
13301 && $rOpts_break_at_old_ternary_breakpoints )
13305 set_forced_breakpoint($i);
13307 # break at previous '='
13308 if ( $i_equals[$depth] > 0 ) {
13309 set_forced_breakpoint( $i_equals[$depth] );
13310 $i_equals[$depth] = -1;
13314 if ( defined( $postponed_breakpoint{$type_sequence} ) ) {
13315 my $inc = ( $type eq ':' ) ? 0 : 1;
13316 set_forced_breakpoint( $i - $inc );
13317 delete $postponed_breakpoint{$type_sequence};
13321 # set breaks at ?/: if they will get separated (and are
13322 # not a ?/: chain), or if the '?' is at the end of the
13324 elsif ( $token eq '?' ) {
13325 my $i_colon = $mate_index_to_go[$i];
13327 $i_colon <= 0 # the ':' is not in this batch
13328 || $i == 0 # this '?' is the first token of the line
13330 $max_index_to_go # or this '?' is the last token
13334 # don't break at a '?' if preceded by ':' on
13335 # this line of previous ?/: pair on this line.
13336 # This is an attempt to preserve a chain of ?/:
13337 # expressions (elsif2.t). And don't break if
13338 # this has a side comment.
13339 set_forced_breakpoint($i)
13341 $type_sequence == (
13342 $last_colon_sequence_number +
13343 TYPE_SEQUENCE_INCREMENT
13345 || $tokens_to_go[$max_index_to_go] eq '#'
13347 set_closing_breakpoint($i);
13352 #print "LISTX sees: i=$i type=$type tok=$token block=$block_type depth=$depth\n";
13354 #------------------------------------------------------------
13355 # Handle Increasing Depth..
13357 # prepare for a new list when depth increases
13358 # token $i is a '(','{', or '['
13359 #------------------------------------------------------------
13360 if ( $depth > $current_depth ) {
13362 $breakpoint_stack[$depth] = $forced_breakpoint_count;
13363 $breakpoint_undo_stack[$depth] = $forced_breakpoint_undo_count;
13364 $has_broken_sublist[$depth] = 0;
13365 $identifier_count_stack[$depth] = 0;
13366 $index_before_arrow[$depth] = -1;
13367 $interrupted_list[$depth] = 0;
13368 $item_count_stack[$depth] = 0;
13369 $last_comma_index[$depth] = undef;
13370 $last_dot_index[$depth] = undef;
13371 $last_nonblank_type[$depth] = $last_nonblank_type;
13372 $old_breakpoint_count_stack[$depth] = $old_breakpoint_count;
13373 $opening_structure_index_stack[$depth] = $i;
13374 $rand_or_list[$depth] = [];
13375 $rfor_semicolon_list[$depth] = [];
13376 $i_equals[$depth] = -1;
13377 $want_comma_break[$depth] = 0;
13378 $container_type[$depth] =
13379 ( $last_nonblank_type =~ /^(k|=>|&&|\|\||\?|\:|\.)$/ )
13380 ? $last_nonblank_token
13382 $has_old_logical_breakpoints[$depth] = 0;
13384 # if line ends here then signal closing token to break
13385 if ( $next_nonblank_type eq 'b' || $next_nonblank_type eq '#' )
13387 set_closing_breakpoint($i);
13390 # Not all lists of values should be vertically aligned..
13391 $dont_align[$depth] =
13393 # code BLOCKS are handled at a higher level
13394 ( $block_type ne "" )
13396 # certain paren lists
13397 || ( $type eq '(' ) && (
13399 # it does not usually look good to align a list of
13400 # identifiers in a parameter list, as in:
13401 # my($var1, $var2, ...)
13402 # (This test should probably be refined, for now I'm just
13403 # testing for any keyword)
13404 ( $last_nonblank_type eq 'k' )
13406 # a trailing '(' usually indicates a non-list
13407 || ( $next_nonblank_type eq '(' )
13410 # patch to outdent opening brace of long if/for/..
13411 # statements (like this one). See similar coding in
13412 # set_continuation breaks. We have also catch it here for
13413 # short line fragments which otherwise will not go through
13414 # set_continuation_breaks.
13418 # if we have the ')' but not its '(' in this batch..
13419 && ( $last_nonblank_token eq ')' )
13420 && $mate_index_to_go[$i_last_nonblank_token] < 0
13422 # and user wants brace to left
13423 && !$rOpts->{'opening-brace-always-on-right'}
13425 && ( $type eq '{' ) # should be true
13426 && ( $token eq '{' ) # should be true
13429 set_forced_breakpoint( $i - 1 );
13433 #------------------------------------------------------------
13434 # Handle Decreasing Depth..
13436 # finish off any old list when depth decreases
13437 # token $i is a ')','}', or ']'
13438 #------------------------------------------------------------
13439 elsif ( $depth < $current_depth ) {
13441 check_for_new_minimum_depth($depth);
13443 # force all outer logical containers to break after we see on
13445 $has_old_logical_breakpoints[$depth] ||=
13446 $has_old_logical_breakpoints[$current_depth];
13448 # Patch to break between ') {' if the paren list is broken.
13449 # There is similar logic in set_continuation_breaks for
13450 # non-broken lists.
13452 && $next_nonblank_block_type
13453 && $interrupted_list[$current_depth]
13454 && $next_nonblank_type eq '{'
13455 && !$rOpts->{'opening-brace-always-on-right'} )
13457 set_forced_breakpoint($i);
13460 #print "LISTY sees: i=$i type=$type tok=$token block=$block_type depth=$depth next=$next_nonblank_type next_block=$next_nonblank_block_type inter=$interrupted_list[$current_depth]\n";
13462 # set breaks at commas if necessary
13463 my ( $bp_count, $do_not_break_apart ) =
13464 set_comma_breakpoints($current_depth);
13466 my $i_opening = $opening_structure_index_stack[$current_depth];
13467 my $saw_opening_structure = ( $i_opening >= 0 );
13469 # this term is long if we had to break at interior commas..
13470 my $is_long_term = $bp_count > 0;
13472 # ..or if the length between opening and closing parens exceeds
13473 # allowed line length
13474 if ( !$is_long_term && $saw_opening_structure ) {
13475 my $i_opening_minus = find_token_starting_list($i_opening);
13477 # Note: we have to allow for one extra space after a
13478 # closing token so that we do not strand a comma or
13479 # semicolon, hence the '>=' here (oneline.t)
13481 excess_line_length( $i_opening_minus, $i ) >= 0;
13484 # We've set breaks after all comma-arrows. Now we have to
13485 # undo them if this can be a one-line block
13486 # (the only breakpoints set will be due to comma-arrows)
13489 # user doesn't require breaking after all comma-arrows
13490 ( $rOpts_comma_arrow_breakpoints != 0 )
13492 # and if the opening structure is in this batch
13493 && $saw_opening_structure
13495 # and either on the same old line
13497 $old_breakpoint_count_stack[$current_depth] ==
13498 $last_old_breakpoint_count
13500 # or user wants to form long blocks with arrows
13501 || $rOpts_comma_arrow_breakpoints == 2
13504 # and we made some breakpoints between the opening and closing
13505 && ( $breakpoint_undo_stack[$current_depth] <
13506 $forced_breakpoint_undo_count )
13508 # and this block is short enough to fit on one line
13509 # Note: use < because need 1 more space for possible comma
13514 undo_forced_breakpoint_stack(
13515 $breakpoint_undo_stack[$current_depth] );
13518 # now see if we have any comma breakpoints left
13519 my $has_comma_breakpoints =
13520 ( $breakpoint_stack[$current_depth] !=
13521 $forced_breakpoint_count );
13523 # update broken-sublist flag of the outer container
13524 $has_broken_sublist[$depth] = $has_broken_sublist[$depth]
13525 || $has_broken_sublist[$current_depth]
13527 || $has_comma_breakpoints;
13529 # Having come to the closing ')', '}', or ']', now we have to decide if we
13530 # should 'open up' the structure by placing breaks at the opening and
13531 # closing containers. This is a tricky decision. Here are some of the
13532 # basic considerations:
13534 # -If this is a BLOCK container, then any breakpoints will have already
13535 # been set (and according to user preferences), so we need do nothing here.
13537 # -If we have a comma-separated list for which we can align the list items,
13538 # then we need to do so because otherwise the vertical aligner cannot
13539 # currently do the alignment.
13541 # -If this container does itself contain a container which has been broken
13542 # open, then it should be broken open to properly show the structure.
13544 # -If there is nothing to align, and no other reason to break apart,
13545 # then do not do it.
13547 # We will not break open the parens of a long but 'simple' logical expression.
13550 # This is an example of a simple logical expression and its formatting:
13552 # if ( $bigwasteofspace1 && $bigwasteofspace2
13553 # || $bigwasteofspace3 && $bigwasteofspace4 )
13555 # Most people would prefer this than the 'spacey' version:
13558 # $bigwasteofspace1 && $bigwasteofspace2
13559 # || $bigwasteofspace3 && $bigwasteofspace4
13562 # To illustrate the rules for breaking logical expressions, consider:
13566 # and ( exists $ids_excl_uc{$id_uc}
13567 # or grep $id_uc =~ /$_/, @ids_excl_uc ))
13569 # This is on the verge of being difficult to read. The current default is to
13570 # open it up like this:
13575 # and ( exists $ids_excl_uc{$id_uc}
13576 # or grep $id_uc =~ /$_/, @ids_excl_uc )
13579 # This is a compromise which tries to avoid being too dense and to spacey.
13580 # A more spaced version would be:
13586 # exists $ids_excl_uc{$id_uc}
13587 # or grep $id_uc =~ /$_/, @ids_excl_uc
13591 # Some people might prefer the spacey version -- an option could be added. The
13592 # innermost expression contains a long block '( exists $ids_... ')'.
13594 # Here is how the logic goes: We will force a break at the 'or' that the
13595 # innermost expression contains, but we will not break apart its opening and
13596 # closing containers because (1) it contains no multi-line sub-containers itself,
13597 # and (2) there is no alignment to be gained by breaking it open like this
13600 # exists $ids_excl_uc{$id_uc}
13601 # or grep $id_uc =~ /$_/, @ids_excl_uc
13604 # (although this looks perfectly ok and might be good for long expressions). The
13605 # outer 'if' container, though, contains a broken sub-container, so it will be
13606 # broken open to avoid too much density. Also, since it contains no 'or's, there
13607 # will be a forced break at its 'and'.
13609 # set some flags telling something about this container..
13610 my $is_simple_logical_expression = 0;
13611 if ( $item_count_stack[$current_depth] == 0
13612 && $saw_opening_structure
13613 && $tokens_to_go[$i_opening] eq '('
13614 && $is_logical_container{ $container_type[$current_depth] }
13618 # This seems to be a simple logical expression with
13619 # no existing breakpoints. Set a flag to prevent
13621 if ( !$has_comma_breakpoints ) {
13622 $is_simple_logical_expression = 1;
13625 # This seems to be a simple logical expression with
13626 # breakpoints (broken sublists, for example). Break
13627 # at all 'or's and '||'s.
13629 set_logical_breakpoints($current_depth);
13634 && @{ $rfor_semicolon_list[$current_depth] } )
13636 set_for_semicolon_breakpoints($current_depth);
13638 # open up a long 'for' or 'foreach' container to allow
13639 # leading term alignment unless -lp is used.
13640 $has_comma_breakpoints = 1
13641 unless $rOpts_line_up_parentheses;
13646 # breaks for code BLOCKS are handled at a higher level
13649 # we do not need to break at the top level of an 'if'
13651 && !$is_simple_logical_expression
13653 ## modification to keep ': (' containers vertically tight;
13654 ## but probably better to let user set -vt=1 to avoid
13655 ## inconsistency with other paren types
13656 ## && ($container_type[$current_depth] ne ':')
13658 # otherwise, we require one of these reasons for breaking:
13661 # - this term has forced line breaks
13662 $has_comma_breakpoints
13664 # - the opening container is separated from this batch
13665 # for some reason (comment, blank line, code block)
13666 # - this is a non-paren container spanning multiple lines
13667 || !$saw_opening_structure
13669 # - this is a long block contained in another breakable
13672 && $container_environment_to_go[$i_opening] ne
13678 # For -lp option, we must put a breakpoint before
13679 # the token which has been identified as starting
13680 # this indentation level. This is necessary for
13681 # proper alignment.
13682 if ( $rOpts_line_up_parentheses && $saw_opening_structure )
13684 my $item = $leading_spaces_to_go[ $i_opening + 1 ];
13685 if ( $i_opening + 1 < $max_index_to_go
13686 && $types_to_go[ $i_opening + 1 ] eq 'b' )
13688 $item = $leading_spaces_to_go[ $i_opening + 2 ];
13690 if ( defined($item) ) {
13691 my $i_start_2 = $item->get_STARTING_INDEX();
13693 defined($i_start_2)
13695 # we are breaking after an opening brace, paren,
13696 # so don't break before it too
13697 && $i_start_2 ne $i_opening
13701 # Only break for breakpoints at the same
13702 # indentation level as the opening paren
13703 my $test1 = $nesting_depth_to_go[$i_opening];
13704 my $test2 = $nesting_depth_to_go[$i_start_2];
13705 if ( $test2 == $test1 ) {
13706 set_forced_breakpoint( $i_start_2 - 1 );
13712 # break after opening structure.
13713 # note: break before closing structure will be automatic
13714 if ( $minimum_depth <= $current_depth ) {
13716 set_forced_breakpoint($i_opening)
13717 unless ( $do_not_break_apart
13718 || is_unbreakable_container($current_depth) );
13720 # break at '.' of lower depth level before opening token
13721 if ( $last_dot_index[$depth] ) {
13722 set_forced_breakpoint( $last_dot_index[$depth] );
13725 # break before opening structure if preeced by another
13726 # closing structure and a comma. This is normally
13727 # done by the previous closing brace, but not
13728 # if it was a one-line block.
13729 if ( $i_opening > 2 ) {
13731 ( $types_to_go[ $i_opening - 1 ] eq 'b' )
13735 if ( $types_to_go[$i_prev] eq ','
13736 && $types_to_go[ $i_prev - 1 ] =~ /^[\)\}]$/ )
13738 set_forced_breakpoint($i_prev);
13741 # also break before something like ':(' or '?('
13744 $types_to_go[$i_prev] =~ /^([k\:\?]|&&|\|\|)$/ )
13746 my $token_prev = $tokens_to_go[$i_prev];
13747 if ( $want_break_before{$token_prev} ) {
13748 set_forced_breakpoint($i_prev);
13754 # break after comma following closing structure
13755 if ( $next_type eq ',' ) {
13756 set_forced_breakpoint( $i + 1 );
13759 # break before an '=' following closing structure
13761 $is_assignment{$next_nonblank_type}
13762 && ( $breakpoint_stack[$current_depth] !=
13763 $forced_breakpoint_count )
13766 set_forced_breakpoint($i);
13769 # break at any comma before the opening structure Added
13770 # for -lp, but seems to be good in general. It isn't
13771 # obvious how far back to look; the '5' below seems to
13772 # work well and will catch the comma in something like
13773 # push @list, myfunc( $param, $param, ..
13775 my $icomma = $last_comma_index[$depth];
13776 if ( defined($icomma) && ( $i_opening - $icomma ) < 5 ) {
13777 unless ( $forced_breakpoint_to_go[$icomma] ) {
13778 set_forced_breakpoint($icomma);
13781 } # end logic to open up a container
13783 # Break open a logical container open if it was already open
13784 elsif ($is_simple_logical_expression
13785 && $has_old_logical_breakpoints[$current_depth] )
13787 set_logical_breakpoints($current_depth);
13790 # Handle long container which does not get opened up
13791 elsif ($is_long_term) {
13793 # must set fake breakpoint to alert outer containers that
13795 set_fake_breakpoint();
13799 #------------------------------------------------------------
13800 # Handle this token
13801 #------------------------------------------------------------
13803 $current_depth = $depth;
13805 # handle comma-arrow
13806 if ( $type eq '=>' ) {
13807 next if ( $last_nonblank_type eq '=>' );
13808 next if $rOpts_break_at_old_comma_breakpoints;
13809 next if $rOpts_comma_arrow_breakpoints == 3;
13810 $want_comma_break[$depth] = 1;
13811 $index_before_arrow[$depth] = $i_last_nonblank_token;
13815 elsif ( $type eq '.' ) {
13816 $last_dot_index[$depth] = $i;
13819 # Turn off alignment if we are sure that this is not a list
13820 # environment. To be safe, we will do this if we see certain
13821 # non-list tokens, such as ';', and also the environment is
13822 # not a list. Note that '=' could be in any of the = operators
13823 # (lextest.t). We can't just use the reported environment
13824 # because it can be incorrect in some cases.
13825 elsif ( ( $type =~ /^[\;\<\>\~]$/ || $is_assignment{$type} )
13826 && $container_environment_to_go[$i] ne 'LIST' )
13828 $dont_align[$depth] = 1;
13829 $want_comma_break[$depth] = 0;
13830 $index_before_arrow[$depth] = -1;
13833 # now just handle any commas
13834 next unless ( $type eq ',' );
13836 $last_dot_index[$depth] = undef;
13837 $last_comma_index[$depth] = $i;
13839 # break here if this comma follows a '=>'
13840 # but not if there is a side comment after the comma
13841 if ( $want_comma_break[$depth] ) {
13843 if ( $next_nonblank_type =~ /^[\)\}\]R]$/ ) {
13844 $want_comma_break[$depth] = 0;
13845 $index_before_arrow[$depth] = -1;
13849 set_forced_breakpoint($i) unless ( $next_nonblank_type eq '#' );
13851 # break before the previous token if it looks safe
13852 # Example of something that we will not try to break before:
13853 # DBI::SQL_SMALLINT() => $ado_consts->{adSmallInt},
13854 # Also we don't want to break at a binary operator (like +):
13858 # $y - $R, -fill => 'black',
13860 my $ibreak = $index_before_arrow[$depth] - 1;
13862 && $tokens_to_go[ $ibreak + 1 ] !~ /^[\)\}\]]$/ )
13864 if ( $tokens_to_go[$ibreak] eq '-' ) { $ibreak-- }
13865 if ( $types_to_go[$ibreak] eq 'b' ) { $ibreak-- }
13866 if ( $types_to_go[$ibreak] =~ /^[,wiZCUG\(\{\[]$/ ) {
13868 # don't break pointer calls, such as the following:
13869 # File::Spec->curdir => 1,
13870 # (This is tokenized as adjacent 'w' tokens)
13871 if ( $tokens_to_go[ $ibreak + 1 ] !~ /^->/ ) {
13872 set_forced_breakpoint($ibreak);
13877 $want_comma_break[$depth] = 0;
13878 $index_before_arrow[$depth] = -1;
13880 # handle list which mixes '=>'s and ','s:
13881 # treat any list items so far as an interrupted list
13882 $interrupted_list[$depth] = 1;
13886 # skip past these commas if we are not supposed to format them
13887 next if ( $dont_align[$depth] );
13889 # break after all commas above starting depth
13890 if ( $depth < $starting_depth ) {
13891 set_forced_breakpoint($i) unless ( $next_nonblank_type eq '#' );
13895 # add this comma to the list..
13896 my $item_count = $item_count_stack[$depth];
13897 if ( $item_count == 0 ) {
13899 # but do not form a list with no opening structure
13902 # open INFILE_COPY, ">$input_file_copy"
13903 # or die ("very long message");
13905 if ( ( $opening_structure_index_stack[$depth] < 0 )
13906 && $container_environment_to_go[$i] eq 'BLOCK' )
13908 $dont_align[$depth] = 1;
13913 $comma_index[$depth][$item_count] = $i;
13914 ++$item_count_stack[$depth];
13915 if ( $last_nonblank_type =~ /^[iR\]]$/ ) {
13916 $identifier_count_stack[$depth]++;
13920 #-------------------------------------------
13921 # end of loop over all tokens in this batch
13922 #-------------------------------------------
13924 # set breaks for any unfinished lists ..
13925 for ( my $dd = $current_depth ; $dd >= $minimum_depth ; $dd-- ) {
13927 $interrupted_list[$dd] = 1;
13928 $has_broken_sublist[$dd] = 1 if ( $dd < $current_depth );
13929 set_comma_breakpoints($dd);
13930 set_logical_breakpoints($dd)
13931 if ( $has_old_logical_breakpoints[$dd] );
13932 set_for_semicolon_breakpoints($dd);
13934 # break open container...
13935 my $i_opening = $opening_structure_index_stack[$dd];
13936 set_forced_breakpoint($i_opening)
13938 is_unbreakable_container($dd)
13940 # Avoid a break which would place an isolated ' or "
13943 && $i_opening >= $max_index_to_go - 2
13944 && $token =~ /^['"]$/ )
13948 # Return a flag indicating if the input file had some good breakpoints.
13949 # This flag will be used to force a break in a line shorter than the
13950 # allowed line length.
13951 if ( $has_old_logical_breakpoints[$current_depth] ) {
13952 $saw_good_breakpoint = 1;
13954 return $saw_good_breakpoint;
13958 sub find_token_starting_list {
13960 # When testing to see if a block will fit on one line, some
13961 # previous token(s) may also need to be on the line; particularly
13962 # if this is a sub call. So we will look back at least one
13963 # token. NOTE: This isn't perfect, but not critical, because
13964 # if we mis-identify a block, it will be wrapped and therefore
13965 # fixed the next time it is formatted.
13966 my $i_opening_paren = shift;
13967 my $i_opening_minus = $i_opening_paren;
13968 my $im1 = $i_opening_paren - 1;
13969 my $im2 = $i_opening_paren - 2;
13970 my $im3 = $i_opening_paren - 3;
13971 my $typem1 = $types_to_go[$im1];
13972 my $typem2 = $im2 >= 0 ? $types_to_go[$im2] : 'b';
13973 if ( $typem1 eq ',' || ( $typem1 eq 'b' && $typem2 eq ',' ) ) {
13974 $i_opening_minus = $i_opening_paren;
13976 elsif ( $tokens_to_go[$i_opening_paren] eq '(' ) {
13977 $i_opening_minus = $im1 if $im1 >= 0;
13979 # walk back to improve length estimate
13980 for ( my $j = $im1 ; $j >= 0 ; $j-- ) {
13981 last if ( $types_to_go[$j] =~ /^[\(\[\{L\}\]\)Rb,]$/ );
13982 $i_opening_minus = $j;
13984 if ( $types_to_go[$i_opening_minus] eq 'b' ) { $i_opening_minus++ }
13986 elsif ( $typem1 eq 'k' ) { $i_opening_minus = $im1 }
13987 elsif ( $typem1 eq 'b' && $im2 >= 0 && $types_to_go[$im2] eq 'k' ) {
13988 $i_opening_minus = $im2;
13990 return $i_opening_minus;
13993 { # begin set_comma_breakpoints_do
13995 my %is_keyword_with_special_leading_term;
13999 # These keywords have prototypes which allow a special leading item
14000 # followed by a list
14002 qw(formline grep kill map printf sprintf push chmod join pack unshift);
14003 @is_keyword_with_special_leading_term{@_} = (1) x scalar(@_);
14006 sub set_comma_breakpoints_do {
14008 # Given a list with some commas, set breakpoints at some of the
14009 # commas, if necessary, to make it easy to read. This list is
14012 $depth, $i_opening_paren, $i_closing_paren,
14013 $item_count, $identifier_count, $rcomma_index,
14014 $next_nonblank_type, $list_type, $interrupted,
14015 $rdo_not_break_apart, $must_break_open,
14018 # nothing to do if no commas seen
14019 return if ( $item_count < 1 );
14020 my $i_first_comma = $$rcomma_index[0];
14021 my $i_true_last_comma = $$rcomma_index[ $item_count - 1 ];
14022 my $i_last_comma = $i_true_last_comma;
14023 if ( $i_last_comma >= $max_index_to_go ) {
14024 $i_last_comma = $$rcomma_index[ --$item_count - 1 ];
14025 return if ( $item_count < 1 );
14028 #---------------------------------------------------------------
14029 # find lengths of all items in the list to calculate page layout
14030 #---------------------------------------------------------------
14031 my $comma_count = $item_count;
14037 my @max_length = ( 0, 0 );
14038 my $first_term_length;
14039 my $i = $i_opening_paren;
14042 for ( my $j = 0 ; $j < $comma_count ; $j++ ) {
14043 $is_odd = 1 - $is_odd;
14044 $i_prev_plus = $i + 1;
14045 $i = $$rcomma_index[$j];
14048 ( $types_to_go[ $i - 1 ] eq 'b' ) ? $i - 2 : $i - 1;
14050 ( $types_to_go[$i_prev_plus] eq 'b' )
14053 push @i_term_begin, $i_term_begin;
14054 push @i_term_end, $i_term_end;
14055 push @i_term_comma, $i;
14057 # note: currently adding 2 to all lengths (for comma and space)
14059 2 + token_sequence_length( $i_term_begin, $i_term_end );
14060 push @item_lengths, $length;
14063 $first_term_length = $length;
14067 if ( $length > $max_length[$is_odd] ) {
14068 $max_length[$is_odd] = $length;
14073 # now we have to make a distinction between the comma count and item
14074 # count, because the item count will be one greater than the comma
14075 # count if the last item is not terminated with a comma
14077 ( $types_to_go[ $i_last_comma + 1 ] eq 'b' )
14078 ? $i_last_comma + 1
14081 ( $types_to_go[ $i_closing_paren - 1 ] eq 'b' )
14082 ? $i_closing_paren - 2
14083 : $i_closing_paren - 1;
14084 my $i_effective_last_comma = $i_last_comma;
14086 my $last_item_length = token_sequence_length( $i_b + 1, $i_e );
14088 if ( $last_item_length > 0 ) {
14090 # add 2 to length because other lengths include a comma and a blank
14091 $last_item_length += 2;
14092 push @item_lengths, $last_item_length;
14093 push @i_term_begin, $i_b + 1;
14094 push @i_term_end, $i_e;
14095 push @i_term_comma, undef;
14097 my $i_odd = $item_count % 2;
14099 if ( $last_item_length > $max_length[$i_odd] ) {
14100 $max_length[$i_odd] = $last_item_length;
14104 $i_effective_last_comma = $i_e + 1;
14106 if ( $types_to_go[ $i_b + 1 ] =~ /^[iR\]]$/ ) {
14107 $identifier_count++;
14111 #---------------------------------------------------------------
14112 # End of length calculations
14113 #---------------------------------------------------------------
14115 #---------------------------------------------------------------
14116 # Compound List Rule 1:
14117 # Break at (almost) every comma for a list containing a broken
14118 # sublist. This has higher priority than the Interrupted List
14120 #---------------------------------------------------------------
14121 if ( $has_broken_sublist[$depth] ) {
14123 # Break at every comma except for a comma between two
14124 # simple, small terms. This prevents long vertical
14125 # columns of, say, just 0's.
14126 my $small_length = 10; # 2 + actual maximum length wanted
14128 # We'll insert a break in long runs of small terms to
14129 # allow alignment in uniform tables.
14130 my $skipped_count = 0;
14131 my $columns = table_columns_available($i_first_comma);
14132 my $fields = int( $columns / $small_length );
14133 if ( $rOpts_maximum_fields_per_table
14134 && $fields > $rOpts_maximum_fields_per_table )
14136 $fields = $rOpts_maximum_fields_per_table;
14138 my $max_skipped_count = $fields - 1;
14140 my $is_simple_last_term = 0;
14141 my $is_simple_next_term = 0;
14142 foreach my $j ( 0 .. $item_count ) {
14143 $is_simple_last_term = $is_simple_next_term;
14144 $is_simple_next_term = 0;
14145 if ( $j < $item_count
14146 && $i_term_end[$j] == $i_term_begin[$j]
14147 && $item_lengths[$j] <= $small_length )
14149 $is_simple_next_term = 1;
14152 if ( $is_simple_last_term
14153 && $is_simple_next_term
14154 && $skipped_count < $max_skipped_count )
14159 $skipped_count = 0;
14160 my $i = $i_term_comma[ $j - 1 ];
14161 last unless defined $i;
14162 set_forced_breakpoint($i);
14166 # always break at the last comma if this list is
14167 # interrupted; we wouldn't want to leave a terminal '{', for
14169 if ($interrupted) { set_forced_breakpoint($i_true_last_comma) }
14173 #my ( $a, $b, $c ) = caller();
14174 #print "LISTX: in set_list $a $c interupt=$interrupted count=$item_count
14175 #i_first = $i_first_comma i_last=$i_last_comma max=$max_index_to_go\n";
14176 #print "depth=$depth has_broken=$has_broken_sublist[$depth] is_multi=$is_multiline opening_paren=($i_opening_paren) \n";
14178 #---------------------------------------------------------------
14179 # Interrupted List Rule:
14180 # A list is is forced to use old breakpoints if it was interrupted
14181 # by side comments or blank lines, or requested by user.
14182 #---------------------------------------------------------------
14183 if ( $rOpts_break_at_old_comma_breakpoints
14185 || $i_opening_paren < 0 )
14187 copy_old_breakpoints( $i_first_comma, $i_true_last_comma );
14191 #---------------------------------------------------------------
14192 # Looks like a list of items. We have to look at it and size it up.
14193 #---------------------------------------------------------------
14195 my $opening_token = $tokens_to_go[$i_opening_paren];
14196 my $opening_environment =
14197 $container_environment_to_go[$i_opening_paren];
14199 #-------------------------------------------------------------------
14200 # Return if this will fit on one line
14201 #-------------------------------------------------------------------
14203 my $i_opening_minus = find_token_starting_list($i_opening_paren);
14205 unless excess_line_length( $i_opening_minus, $i_closing_paren ) > 0;
14207 #-------------------------------------------------------------------
14208 # Now we know that this block spans multiple lines; we have to set
14209 # at least one breakpoint -- real or fake -- as a signal to break
14210 # open any outer containers.
14211 #-------------------------------------------------------------------
14212 set_fake_breakpoint();
14214 # be sure we do not extend beyond the current list length
14215 if ( $i_effective_last_comma >= $max_index_to_go ) {
14216 $i_effective_last_comma = $max_index_to_go - 1;
14219 # Set a flag indicating if we need to break open to keep -lp
14220 # items aligned. This is necessary if any of the list terms
14221 # exceeds the available space after the '('.
14222 my $need_lp_break_open = $must_break_open;
14223 if ( $rOpts_line_up_parentheses && !$must_break_open ) {
14224 my $columns_if_unbroken = $rOpts_maximum_line_length -
14225 total_line_length( $i_opening_minus, $i_opening_paren );
14226 $need_lp_break_open = ( $max_length[0] > $columns_if_unbroken )
14227 || ( $max_length[1] > $columns_if_unbroken )
14228 || ( $first_term_length > $columns_if_unbroken );
14231 # Specify if the list must have an even number of fields or not.
14232 # It is generally safest to assume an even number, because the
14233 # list items might be a hash list. But if we can be sure that
14234 # it is not a hash, then we can allow an odd number for more
14236 my $odd_or_even = 2; # 1 = odd field count ok, 2 = want even count
14238 if ( $identifier_count >= $item_count - 1
14239 || $is_assignment{$next_nonblank_type}
14240 || ( $list_type && $list_type ne '=>' && $list_type !~ /^[\:\?]$/ )
14246 # do we have a long first term which should be
14247 # left on a line by itself?
14248 my $use_separate_first_term = (
14249 $odd_or_even == 1 # only if we can use 1 field/line
14250 && $item_count > 3 # need several items
14251 && $first_term_length >
14252 2 * $max_length[0] - 2 # need long first term
14253 && $first_term_length >
14254 2 * $max_length[1] - 2 # need long first term
14257 # or do we know from the type of list that the first term should
14259 if ( !$use_separate_first_term ) {
14260 if ( $is_keyword_with_special_leading_term{$list_type} ) {
14261 $use_separate_first_term = 1;
14263 # should the container be broken open?
14264 if ( $item_count < 3 ) {
14265 if ( $i_first_comma - $i_opening_paren < 4 ) {
14266 $$rdo_not_break_apart = 1;
14269 elsif ($first_term_length < 20
14270 && $i_first_comma - $i_opening_paren < 4 )
14272 my $columns = table_columns_available($i_first_comma);
14273 if ( $first_term_length < $columns ) {
14274 $$rdo_not_break_apart = 1;
14281 if ($use_separate_first_term) {
14283 # ..set a break and update starting values
14284 $use_separate_first_term = 1;
14285 set_forced_breakpoint($i_first_comma);
14286 $i_opening_paren = $i_first_comma;
14287 $i_first_comma = $$rcomma_index[1];
14289 return if $comma_count == 1;
14290 shift @item_lengths;
14291 shift @i_term_begin;
14293 shift @i_term_comma;
14296 # if not, update the metrics to include the first term
14298 if ( $first_term_length > $max_length[0] ) {
14299 $max_length[0] = $first_term_length;
14303 # Field width parameters
14304 my $pair_width = ( $max_length[0] + $max_length[1] );
14306 ( $max_length[0] > $max_length[1] ) ? $max_length[0] : $max_length[1];
14308 # Number of free columns across the page width for laying out tables
14309 my $columns = table_columns_available($i_first_comma);
14311 # Estimated maximum number of fields which fit this space
14312 # This will be our first guess
14313 my $number_of_fields_max =
14314 maximum_number_of_fields( $columns, $odd_or_even, $max_width,
14316 my $number_of_fields = $number_of_fields_max;
14318 # Find the best-looking number of fields
14319 # and make this our second guess if possible
14320 my ( $number_of_fields_best, $ri_ragged_break_list,
14321 $new_identifier_count )
14322 = study_list_complexity( \@i_term_begin, \@i_term_end, \@item_lengths,
14325 if ( $number_of_fields_best != 0
14326 && $number_of_fields_best < $number_of_fields_max )
14328 $number_of_fields = $number_of_fields_best;
14331 # ----------------------------------------------------------------------
14332 # If we are crowded and the -lp option is being used, try to
14333 # undo some indentation
14334 # ----------------------------------------------------------------------
14336 $rOpts_line_up_parentheses
14338 $number_of_fields == 0
14339 || ( $number_of_fields == 1
14340 && $number_of_fields != $number_of_fields_best )
14344 my $available_spaces = get_AVAILABLE_SPACES_to_go($i_first_comma);
14345 if ( $available_spaces > 0 ) {
14347 my $spaces_wanted = $max_width - $columns; # for 1 field
14349 if ( $number_of_fields_best == 0 ) {
14350 $number_of_fields_best =
14351 get_maximum_fields_wanted( \@item_lengths );
14354 if ( $number_of_fields_best != 1 ) {
14355 my $spaces_wanted_2 =
14356 1 + $pair_width - $columns; # for 2 fields
14357 if ( $available_spaces > $spaces_wanted_2 ) {
14358 $spaces_wanted = $spaces_wanted_2;
14362 if ( $spaces_wanted > 0 ) {
14363 my $deleted_spaces =
14364 reduce_lp_indentation( $i_first_comma, $spaces_wanted );
14367 if ( $deleted_spaces > 0 ) {
14368 $columns = table_columns_available($i_first_comma);
14369 $number_of_fields_max =
14370 maximum_number_of_fields( $columns, $odd_or_even,
14371 $max_width, $pair_width );
14372 $number_of_fields = $number_of_fields_max;
14374 if ( $number_of_fields_best == 1
14375 && $number_of_fields >= 1 )
14377 $number_of_fields = $number_of_fields_best;
14384 # try for one column if two won't work
14385 if ( $number_of_fields <= 0 ) {
14386 $number_of_fields = int( $columns / $max_width );
14389 # The user can place an upper bound on the number of fields,
14390 # which can be useful for doing maintenance on tables
14391 if ( $rOpts_maximum_fields_per_table
14392 && $number_of_fields > $rOpts_maximum_fields_per_table )
14394 $number_of_fields = $rOpts_maximum_fields_per_table;
14397 # How many columns (characters) and lines would this container take
14398 # if no additional whitespace were added?
14399 my $packed_columns = token_sequence_length( $i_opening_paren + 1,
14400 $i_effective_last_comma + 1 );
14401 if ( $columns <= 0 ) { $columns = 1 } # avoid divide by zero
14402 my $packed_lines = 1 + int( $packed_columns / $columns );
14404 # are we an item contained in an outer list?
14405 my $in_hierarchical_list = $next_nonblank_type =~ /^[\}\,]$/;
14407 if ( $number_of_fields <= 0 ) {
14409 # #---------------------------------------------------------------
14410 # # We're in trouble. We can't find a single field width that works.
14411 # # There is no simple answer here; we may have a single long list
14413 # #---------------------------------------------------------------
14415 # In many cases, it may be best to not force a break if there is just one
14416 # comma, because the standard continuation break logic will do a better
14419 # In the common case that all but one of the terms can fit
14420 # on a single line, it may look better not to break open the
14421 # containing parens. Consider, for example
14425 # sort { $color_value{$::a} <=> $color_value{$::b}; }
14428 # which will look like this with the container broken:
14432 # sort { $color_value{$::a} <=> $color_value{$::b}; } keys %colors
14435 # Here is an example of this rule for a long last term:
14437 # log_message( 0, 256, 128,
14438 # "Number of routes in adj-RIB-in to be considered: $peercount" );
14440 # And here is an example with a long first term:
14443 # "%2d wallclock secs (%$f usr %$f sys + %$f cusr %$f csys = %$f CPU)",
14444 # $r, $pu, $ps, $cu, $cs, $tt
14446 # if $style eq 'all';
14448 my $i_last_comma = $$rcomma_index[ $comma_count - 1 ];
14449 my $long_last_term = excess_line_length( 0, $i_last_comma ) <= 0;
14450 my $long_first_term =
14451 excess_line_length( $i_first_comma + 1, $max_index_to_go ) <= 0;
14453 # break at every comma ...
14456 # if requested by user or is best looking
14457 $number_of_fields_best == 1
14459 # or if this is a sublist of a larger list
14460 || $in_hierarchical_list
14462 # or if multiple commas and we dont have a long first or last
14464 || ( $comma_count > 1
14465 && !( $long_last_term || $long_first_term ) )
14468 foreach ( 0 .. $comma_count - 1 ) {
14469 set_forced_breakpoint( $$rcomma_index[$_] );
14472 elsif ($long_last_term) {
14474 set_forced_breakpoint($i_last_comma);
14475 $$rdo_not_break_apart = 1 unless $must_break_open;
14477 elsif ($long_first_term) {
14479 set_forced_breakpoint($i_first_comma);
14483 # let breaks be defined by default bond strength logic
14488 # --------------------------------------------------------
14489 # We have a tentative field count that seems to work.
14490 # How many lines will this require?
14491 # --------------------------------------------------------
14492 my $formatted_lines = $item_count / ($number_of_fields);
14493 if ( $formatted_lines != int $formatted_lines ) {
14494 $formatted_lines = 1 + int $formatted_lines;
14497 # So far we've been trying to fill out to the right margin. But
14498 # compact tables are easier to read, so let's see if we can use fewer
14499 # fields without increasing the number of lines.
14500 $number_of_fields =
14501 compactify_table( $item_count, $number_of_fields, $formatted_lines,
14504 # How many spaces across the page will we fill?
14505 my $columns_per_line =
14506 ( int $number_of_fields / 2 ) * $pair_width +
14507 ( $number_of_fields % 2 ) * $max_width;
14509 my $formatted_columns;
14511 if ( $number_of_fields > 1 ) {
14512 $formatted_columns =
14513 ( $pair_width * ( int( $item_count / 2 ) ) +
14514 ( $item_count % 2 ) * $max_width );
14517 $formatted_columns = $max_width * $item_count;
14519 if ( $formatted_columns < $packed_columns ) {
14520 $formatted_columns = $packed_columns;
14523 my $unused_columns = $formatted_columns - $packed_columns;
14525 # set some empirical parameters to help decide if we should try to
14526 # align; high sparsity does not look good, especially with few lines
14527 my $sparsity = ($unused_columns) / ($formatted_columns);
14528 my $max_allowed_sparsity =
14529 ( $item_count < 3 ) ? 0.1
14530 : ( $packed_lines == 1 ) ? 0.15
14531 : ( $packed_lines == 2 ) ? 0.4
14534 # Begin check for shortcut methods, which avoid treating a list
14535 # as a table for relatively small parenthesized lists. These
14536 # are usually easier to read if not formatted as tables.
14538 $packed_lines <= 2 # probably can fit in 2 lines
14539 && $item_count < 9 # doesn't have too many items
14540 && $opening_environment eq 'BLOCK' # not a sub-container
14541 && $opening_token eq '(' # is paren list
14545 # Shortcut method 1: for -lp and just one comma:
14546 # This is a no-brainer, just break at the comma.
14548 $rOpts_line_up_parentheses # -lp
14549 && $item_count == 2 # two items, one comma
14550 && !$must_break_open
14553 my $i_break = $$rcomma_index[0];
14554 set_forced_breakpoint($i_break);
14555 $$rdo_not_break_apart = 1;
14556 set_non_alignment_flags( $comma_count, $rcomma_index );
14561 # method 2 is for most small ragged lists which might look
14562 # best if not displayed as a table.
14564 ( $number_of_fields == 2 && $item_count == 3 )
14566 $new_identifier_count > 0 # isn't all quotes
14567 && $sparsity > 0.15
14568 ) # would be fairly spaced gaps if aligned
14572 my $break_count = set_ragged_breakpoints( \@i_term_comma,
14573 $ri_ragged_break_list );
14574 ++$break_count if ($use_separate_first_term);
14576 # NOTE: we should really use the true break count here,
14577 # which can be greater if there are large terms and
14578 # little space, but usually this will work well enough.
14579 unless ($must_break_open) {
14581 if ( $break_count <= 1 ) {
14582 $$rdo_not_break_apart = 1;
14584 elsif ( $rOpts_line_up_parentheses && !$need_lp_break_open )
14586 $$rdo_not_break_apart = 1;
14589 set_non_alignment_flags( $comma_count, $rcomma_index );
14593 } # end shortcut methods
14597 FORMATTER_DEBUG_FLAG_SPARSE && do {
14599 "SPARSE:cols=$columns commas=$comma_count items:$item_count ids=$identifier_count pairwidth=$pair_width fields=$number_of_fields lines packed: $packed_lines packed_cols=$packed_columns fmtd:$formatted_lines cols /line:$columns_per_line unused:$unused_columns fmtd:$formatted_columns sparsity=$sparsity allow=$max_allowed_sparsity\n";
14603 #---------------------------------------------------------------
14604 # Compound List Rule 2:
14605 # If this list is too long for one line, and it is an item of a
14606 # larger list, then we must format it, regardless of sparsity
14607 # (ian.t). One reason that we have to do this is to trigger
14608 # Compound List Rule 1, above, which causes breaks at all commas of
14609 # all outer lists. In this way, the structure will be properly
14611 #---------------------------------------------------------------
14613 # Decide if this list is too long for one line unless broken
14614 my $total_columns = table_columns_available($i_opening_paren);
14615 my $too_long = $packed_columns > $total_columns;
14617 # For a paren list, include the length of the token just before the
14618 # '(' because this is likely a sub call, and we would have to
14619 # include the sub name on the same line as the list. This is still
14620 # imprecise, but not too bad. (steve.t)
14621 if ( !$too_long && $i_opening_paren > 0 && $opening_token eq '(' ) {
14623 $too_long = excess_line_length( $i_opening_minus,
14624 $i_effective_last_comma + 1 ) > 0;
14627 # FIXME: For an item after a '=>', try to include the length of the
14628 # thing before the '=>'. This is crude and should be improved by
14629 # actually looking back token by token.
14630 if ( !$too_long && $i_opening_paren > 0 && $list_type eq '=>' ) {
14631 my $i_opening_minus = $i_opening_paren - 4;
14632 if ( $i_opening_minus >= 0 ) {
14633 $too_long = excess_line_length( $i_opening_minus,
14634 $i_effective_last_comma + 1 ) > 0;
14638 # Always break lists contained in '[' and '{' if too long for 1 line,
14639 # and always break lists which are too long and part of a more complex
14641 my $must_break_open_container = $must_break_open
14643 && ( $in_hierarchical_list || $opening_token ne '(' ) );
14645 #print "LISTX: next=$next_nonblank_type avail cols=$columns packed=$packed_columns must format = $must_break_open_container too-long=$too_long opening=$opening_token list_type=$list_type formatted_lines=$formatted_lines packed=$packed_lines max_sparsity= $max_allowed_sparsity sparsity=$sparsity \n";
14647 #---------------------------------------------------------------
14648 # The main decision:
14649 # Now decide if we will align the data into aligned columns. Do not
14650 # attempt to align columns if this is a tiny table or it would be
14651 # too spaced. It seems that the more packed lines we have, the
14652 # sparser the list that can be allowed and still look ok.
14653 #---------------------------------------------------------------
14655 if ( ( $formatted_lines < 3 && $packed_lines < $formatted_lines )
14656 || ( $formatted_lines < 2 )
14657 || ( $unused_columns > $max_allowed_sparsity * $formatted_columns )
14661 #---------------------------------------------------------------
14662 # too sparse: would look ugly if aligned in a table;
14663 #---------------------------------------------------------------
14665 # use old breakpoints if this is a 'big' list
14666 # FIXME: goal is to improve set_ragged_breakpoints so that
14667 # this is not necessary.
14668 if ( $packed_lines > 2 && $item_count > 10 ) {
14669 write_logfile_entry("List sparse: using old breakpoints\n");
14670 copy_old_breakpoints( $i_first_comma, $i_last_comma );
14673 # let the continuation logic handle it if 2 lines
14676 my $break_count = set_ragged_breakpoints( \@i_term_comma,
14677 $ri_ragged_break_list );
14678 ++$break_count if ($use_separate_first_term);
14680 unless ($must_break_open_container) {
14681 if ( $break_count <= 1 ) {
14682 $$rdo_not_break_apart = 1;
14684 elsif ( $rOpts_line_up_parentheses && !$need_lp_break_open )
14686 $$rdo_not_break_apart = 1;
14689 set_non_alignment_flags( $comma_count, $rcomma_index );
14694 #---------------------------------------------------------------
14695 # go ahead and format as a table
14696 #---------------------------------------------------------------
14697 write_logfile_entry(
14698 "List: auto formatting with $number_of_fields fields/row\n");
14700 my $j_first_break =
14701 $use_separate_first_term ? $number_of_fields : $number_of_fields - 1;
14704 my $j = $j_first_break ;
14705 $j < $comma_count ;
14706 $j += $number_of_fields
14709 my $i = $$rcomma_index[$j];
14710 set_forced_breakpoint($i);
14716 sub set_non_alignment_flags {
14718 # set flag which indicates that these commas should not be
14720 my ( $comma_count, $rcomma_index ) = @_;
14721 foreach ( 0 .. $comma_count - 1 ) {
14722 $matching_token_to_go[ $$rcomma_index[$_] ] = 1;
14726 sub study_list_complexity {
14728 # Look for complex tables which should be formatted with one term per line.
14729 # Returns the following:
14731 # \@i_ragged_break_list = list of good breakpoints to avoid lines
14732 # which are hard to read
14733 # $number_of_fields_best = suggested number of fields based on
14734 # complexity; = 0 if any number may be used.
14736 my ( $ri_term_begin, $ri_term_end, $ritem_lengths, $max_width ) = @_;
14737 my $item_count = @{$ri_term_begin};
14738 my $complex_item_count = 0;
14739 my $number_of_fields_best = $rOpts_maximum_fields_per_table;
14740 my $i_max = @{$ritem_lengths} - 1;
14741 ##my @item_complexity;
14743 my $i_last_last_break = -3;
14744 my $i_last_break = -2;
14745 my @i_ragged_break_list;
14747 my $definitely_complex = 30;
14748 my $definitely_simple = 12;
14749 my $quote_count = 0;
14751 for my $i ( 0 .. $i_max ) {
14752 my $ib = $ri_term_begin->[$i];
14753 my $ie = $ri_term_end->[$i];
14755 # define complexity: start with the actual term length
14756 my $weighted_length = ( $ritem_lengths->[$i] - 2 );
14758 ##TBD: join types here and check for variations
14759 ##my $str=join "", @tokens_to_go[$ib..$ie];
14762 if ( $types_to_go[$ib] =~ /^[qQ]$/ ) {
14766 elsif ( $types_to_go[$ib] =~ /^[w\-]$/ ) {
14770 if ( $ib eq $ie ) {
14771 if ( $is_quote && $tokens_to_go[$ib] =~ /\s/ ) {
14772 $complex_item_count++;
14773 $weighted_length *= 2;
14779 if ( grep { $_ eq 'b' } @types_to_go[ $ib .. $ie ] ) {
14780 $complex_item_count++;
14781 $weighted_length *= 2;
14783 if ( grep { $_ eq '..' } @types_to_go[ $ib .. $ie ] ) {
14784 $weighted_length += 4;
14788 # add weight for extra tokens.
14789 $weighted_length += 2 * ( $ie - $ib );
14791 ## my $BUB = join '', @tokens_to_go[$ib..$ie];
14792 ## print "# COMPLEXITY:$weighted_length $BUB\n";
14794 ##push @item_complexity, $weighted_length;
14796 # now mark a ragged break after this item it if it is 'long and
14798 if ( $weighted_length >= $definitely_complex ) {
14800 # if we broke after the previous term
14801 # then break before it too
14802 if ( $i_last_break == $i - 1
14804 && $i_last_last_break != $i - 2 )
14807 ## FIXME: don't strand a small term
14808 pop @i_ragged_break_list;
14809 push @i_ragged_break_list, $i - 2;
14810 push @i_ragged_break_list, $i - 1;
14813 push @i_ragged_break_list, $i;
14814 $i_last_last_break = $i_last_break;
14815 $i_last_break = $i;
14818 # don't break before a small last term -- it will
14819 # not look good on a line by itself.
14820 elsif ($i == $i_max
14821 && $i_last_break == $i - 1
14822 && $weighted_length <= $definitely_simple )
14824 pop @i_ragged_break_list;
14828 my $identifier_count = $i_max + 1 - $quote_count;
14830 # Need more tuning here..
14831 if ( $max_width > 12
14832 && $complex_item_count > $item_count / 2
14833 && $number_of_fields_best != 2 )
14835 $number_of_fields_best = 1;
14838 return ( $number_of_fields_best, \@i_ragged_break_list, $identifier_count );
14841 sub get_maximum_fields_wanted {
14843 # Not all tables look good with more than one field of items.
14844 # This routine looks at a table and decides if it should be
14845 # formatted with just one field or not.
14846 # This coding is still under development.
14847 my ($ritem_lengths) = @_;
14849 my $number_of_fields_best = 0;
14851 # For just a few items, we tentatively assume just 1 field.
14852 my $item_count = @{$ritem_lengths};
14853 if ( $item_count <= 5 ) {
14854 $number_of_fields_best = 1;
14857 # For larger tables, look at it both ways and see what looks best
14861 my @max_length = ( 0, 0 );
14862 my @last_length_2 = ( undef, undef );
14863 my @first_length_2 = ( undef, undef );
14864 my $last_length = undef;
14865 my $total_variation_1 = 0;
14866 my $total_variation_2 = 0;
14867 my @total_variation_2 = ( 0, 0 );
14868 for ( my $j = 0 ; $j < $item_count ; $j++ ) {
14870 $is_odd = 1 - $is_odd;
14871 my $length = $ritem_lengths->[$j];
14872 if ( $length > $max_length[$is_odd] ) {
14873 $max_length[$is_odd] = $length;
14876 if ( defined($last_length) ) {
14877 my $dl = abs( $length - $last_length );
14878 $total_variation_1 += $dl;
14880 $last_length = $length;
14882 my $ll = $last_length_2[$is_odd];
14883 if ( defined($ll) ) {
14884 my $dl = abs( $length - $ll );
14885 $total_variation_2[$is_odd] += $dl;
14888 $first_length_2[$is_odd] = $length;
14890 $last_length_2[$is_odd] = $length;
14892 $total_variation_2 = $total_variation_2[0] + $total_variation_2[1];
14894 my $factor = ( $item_count > 10 ) ? 1 : ( $item_count > 5 ) ? 0.75 : 0;
14895 unless ( $total_variation_2 < $factor * $total_variation_1 ) {
14896 $number_of_fields_best = 1;
14899 return ($number_of_fields_best);
14902 sub table_columns_available {
14903 my $i_first_comma = shift;
14905 $rOpts_maximum_line_length - leading_spaces_to_go($i_first_comma);
14907 # Patch: the vertical formatter does not line up lines whose lengths
14908 # exactly equal the available line length because of allowances
14909 # that must be made for side comments. Therefore, the number of
14910 # available columns is reduced by 1 character.
14915 sub maximum_number_of_fields {
14917 # how many fields will fit in the available space?
14918 my ( $columns, $odd_or_even, $max_width, $pair_width ) = @_;
14919 my $max_pairs = int( $columns / $pair_width );
14920 my $number_of_fields = $max_pairs * 2;
14921 if ( $odd_or_even == 1
14922 && $max_pairs * $pair_width + $max_width <= $columns )
14924 $number_of_fields++;
14926 return $number_of_fields;
14929 sub compactify_table {
14931 # given a table with a certain number of fields and a certain number
14932 # of lines, see if reducing the number of fields will make it look
14934 my ( $item_count, $number_of_fields, $formatted_lines, $odd_or_even ) = @_;
14935 if ( $number_of_fields >= $odd_or_even * 2 && $formatted_lines > 0 ) {
14939 $min_fields = $number_of_fields ;
14940 $min_fields >= $odd_or_even
14941 && $min_fields * $formatted_lines >= $item_count ;
14942 $min_fields -= $odd_or_even
14945 $number_of_fields = $min_fields;
14948 return $number_of_fields;
14951 sub set_ragged_breakpoints {
14953 # Set breakpoints in a list that cannot be formatted nicely as a
14955 my ( $ri_term_comma, $ri_ragged_break_list ) = @_;
14957 my $break_count = 0;
14958 foreach (@$ri_ragged_break_list) {
14959 my $j = $ri_term_comma->[$_];
14961 set_forced_breakpoint($j);
14965 return $break_count;
14968 sub copy_old_breakpoints {
14969 my ( $i_first_comma, $i_last_comma ) = @_;
14970 for my $i ( $i_first_comma .. $i_last_comma ) {
14971 if ( $old_breakpoint_to_go[$i] ) {
14972 set_forced_breakpoint($i);
14978 my ( $i, $j ) = @_;
14979 if ( $i >= 0 && $i <= $j && $j <= $max_index_to_go ) {
14981 FORMATTER_DEBUG_FLAG_NOBREAK && do {
14982 my ( $a, $b, $c ) = caller();
14984 "NOBREAK: forced_breakpoint $forced_breakpoint_count from $a $c with i=$i max=$max_index_to_go type=$types_to_go[$i]\n"
14988 @nobreak_to_go[ $i .. $j ] = (1) x ( $j - $i + 1 );
14991 # shouldn't happen; non-critical error
14993 FORMATTER_DEBUG_FLAG_NOBREAK && do {
14994 my ( $a, $b, $c ) = caller();
14996 "NOBREAK ERROR: from $a $c with i=$i j=$j max=$max_index_to_go\n"
15002 sub set_fake_breakpoint {
15004 # Just bump up the breakpoint count as a signal that there are breaks.
15005 # This is useful if we have breaks but may want to postpone deciding where
15007 $forced_breakpoint_count++;
15010 sub set_forced_breakpoint {
15013 return unless defined $i && $i >= 0;
15015 # when called with certain tokens, use bond strengths to decide
15016 # if we break before or after it
15017 my $token = $tokens_to_go[$i];
15019 if ( $token =~ /^([\=\.\,\:\?]|and|or|xor|&&|\|\|)$/ ) {
15020 if ( $want_break_before{$token} && $i >= 0 ) { $i-- }
15023 # breaks are forced before 'if' and 'unless'
15024 elsif ( $is_if_unless{$token} ) { $i-- }
15026 if ( $i >= 0 && $i <= $max_index_to_go ) {
15027 my $i_nonblank = ( $types_to_go[$i] ne 'b' ) ? $i : $i - 1;
15029 FORMATTER_DEBUG_FLAG_FORCE && do {
15030 my ( $a, $b, $c ) = caller();
15032 "FORCE forced_breakpoint $forced_breakpoint_count from $a $c with i=$i_nonblank max=$max_index_to_go tok=$tokens_to_go[$i_nonblank] type=$types_to_go[$i_nonblank] nobr=$nobreak_to_go[$i_nonblank]\n";
15035 if ( $i_nonblank >= 0 && $nobreak_to_go[$i_nonblank] == 0 ) {
15036 $forced_breakpoint_to_go[$i_nonblank] = 1;
15038 if ( $i_nonblank > $index_max_forced_break ) {
15039 $index_max_forced_break = $i_nonblank;
15041 $forced_breakpoint_count++;
15042 $forced_breakpoint_undo_stack[ $forced_breakpoint_undo_count++ ] =
15045 # if we break at an opening container..break at the closing
15046 if ( $tokens_to_go[$i_nonblank] =~ /^[\{\[\(\?]$/ ) {
15047 set_closing_breakpoint($i_nonblank);
15053 sub clear_breakpoint_undo_stack {
15054 $forced_breakpoint_undo_count = 0;
15057 sub undo_forced_breakpoint_stack {
15059 my $i_start = shift;
15060 if ( $i_start < 0 ) {
15062 my ( $a, $b, $c ) = caller();
15064 "Program Bug: undo_forced_breakpoint_stack from $a $c has i=$i_start "
15068 while ( $forced_breakpoint_undo_count > $i_start ) {
15070 $forced_breakpoint_undo_stack[ --$forced_breakpoint_undo_count ];
15071 if ( $i >= 0 && $i <= $max_index_to_go ) {
15072 $forced_breakpoint_to_go[$i] = 0;
15073 $forced_breakpoint_count--;
15075 FORMATTER_DEBUG_FLAG_UNDOBP && do {
15076 my ( $a, $b, $c ) = caller();
15078 "UNDOBP: undo forced_breakpoint i=$i $forced_breakpoint_undo_count from $a $c max=$max_index_to_go\n"
15083 # shouldn't happen, but not a critical error
15085 FORMATTER_DEBUG_FLAG_UNDOBP && do {
15086 my ( $a, $b, $c ) = caller();
15088 "Program Bug: undo_forced_breakpoint from $a $c has i=$i but max=$max_index_to_go"
15095 sub recombine_breakpoints {
15097 # sub set_continuation_breaks is very liberal in setting line breaks
15098 # for long lines, always setting breaks at good breakpoints, even
15099 # when that creates small lines. Occasionally small line fragments
15100 # are produced which would look better if they were combined.
15101 # That's the task of this routine, recombine_breakpoints.
15102 my ( $ri_first, $ri_last ) = @_;
15103 my $more_to_do = 1;
15105 # We keep looping over all of the lines of this batch
15106 # until there are no more possible recombinations
15107 my $nmax_last = @$ri_last;
15108 while ($more_to_do) {
15112 my $nmax = @$ri_last - 1;
15114 # safety check for infinite loop
15115 unless ( $nmax < $nmax_last ) {
15117 # shouldn't happen because splice below decreases nmax on each pass:
15118 # but i get paranoid sometimes
15119 die "Program bug-infinite loop in recombine breakpoints\n";
15121 $nmax_last = $nmax;
15123 my $previous_outdentable_closing_paren;
15124 my $leading_amp_count = 0;
15125 my $this_line_is_semicolon_terminated;
15127 # loop over all remaining lines in this batch
15128 for $n ( 1 .. $nmax ) {
15130 #----------------------------------------------------------
15131 # If we join the current pair of lines,
15132 # line $n-1 will become the left part of the joined line
15133 # line $n will become the right part of the joined line
15135 # Here are Indexes of the endpoint tokens of the two lines:
15137 # ---left---- | ---right---
15138 # $if $imid | $imidr $il
15140 # We want to decide if we should join tokens $imid to $imidr
15142 # We will apply a number of ad-hoc tests to see if joining
15143 # here will look ok. The code will just issue a 'next'
15144 # command if the join doesn't look good. If we get through
15145 # the gauntlet of tests, the lines will be recombined.
15146 #----------------------------------------------------------
15147 my $if = $$ri_first[ $n - 1 ];
15148 my $il = $$ri_last[$n];
15149 my $imid = $$ri_last[ $n - 1 ];
15150 my $imidr = $$ri_first[$n];
15152 #my $depth_increase=( $nesting_depth_to_go[$imidr] -
15153 # $nesting_depth_to_go[$if] );
15155 ##print "RECOMBINE: n=$n imid=$imid if=$if type=$types_to_go[$if] =$tokens_to_go[$if] next_type=$types_to_go[$imidr] next_tok=$tokens_to_go[$imidr]\n";
15157 # If line $n is the last line, we set some flags and
15158 # do any special checks for it
15159 if ( $n == $nmax ) {
15161 # a terminal '{' should stay where it is
15162 next if $types_to_go[$imidr] eq '{';
15164 # set flag if statement $n ends in ';'
15165 $this_line_is_semicolon_terminated = $types_to_go[$il] eq ';'
15167 # with possible side comment
15168 || ( $types_to_go[$il] eq '#'
15169 && $il - $imidr >= 2
15170 && $types_to_go[ $il - 2 ] eq ';'
15171 && $types_to_go[ $il - 1 ] eq 'b' );
15174 #----------------------------------------------------------
15175 # Section 1: examine token at $imid (right end of first line
15177 #----------------------------------------------------------
15179 # an isolated '}' may join with a ';' terminated segment
15180 if ( $types_to_go[$imid] eq '}' ) {
15182 # Check for cases where combining a semicolon terminated
15183 # statement with a previous isolated closing paren will
15184 # allow the combined line to be outdented. This is
15185 # generally a good move. For example, we can join up
15186 # the last two lines here:
15188 # $dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
15189 # $size, $atime, $mtime, $ctime, $blksize, $blocks
15195 # $dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
15196 # $size, $atime, $mtime, $ctime, $blksize, $blocks
15199 # which makes the parens line up.
15201 # Another example, from Joe Matarazzo, probably looks best
15202 # with the 'or' clause appended to the trailing paren:
15203 # $self->some_method(
15206 # ) or die "Some_method didn't work";
15208 $previous_outdentable_closing_paren =
15209 $this_line_is_semicolon_terminated # ends in ';'
15210 && $if == $imid # only one token on last line
15211 && $tokens_to_go[$imid] eq ')' # must be structural paren
15213 # only &&, ||, and : if no others seen
15214 # (but note: our count made below could be wrong
15215 # due to intervening comments)
15216 && ( $leading_amp_count == 0
15217 || $types_to_go[$imidr] !~ /^(:|\&\&|\|\|)$/ )
15219 # but leading colons probably line up with with a
15220 # previous colon or question (count could be wrong).
15221 && $types_to_go[$imidr] ne ':'
15223 # only one step in depth allowed. this line must not
15224 # begin with a ')' itself.
15225 && ( $nesting_depth_to_go[$imid] ==
15226 $nesting_depth_to_go[$il] + 1 );
15230 $previous_outdentable_closing_paren
15232 # handle '.' and '?' specially below
15233 || ( $types_to_go[$imidr] =~ /^[\.\?]$/ )
15237 # do not recombine lines with ending &&, ||, or :
15238 elsif ( $types_to_go[$imid] =~ /^(|:|\&\&|\|\|)$/ ) {
15239 next unless $want_break_before{ $types_to_go[$imid] };
15242 # for lines ending in a comma...
15243 elsif ( $types_to_go[$imid] eq ',' ) {
15245 # an isolated '},' may join with an identifier + ';'
15246 # this is useful for the class of a 'bless' statement (bless.t)
15247 if ( $types_to_go[$if] eq '}'
15248 && $types_to_go[$imidr] eq 'i' )
15251 unless ( ( $if == ( $imid - 1 ) )
15252 && ( $il == ( $imidr + 1 ) )
15253 && $this_line_is_semicolon_terminated );
15255 # override breakpoint
15256 $forced_breakpoint_to_go[$imid] = 0;
15259 # but otherwise, do not recombine unless this will leave
15262 next unless ( $n + 1 >= $nmax );
15267 elsif ( $types_to_go[$imid] eq '(' ) {
15269 # No longer doing this
15272 elsif ( $types_to_go[$imid] eq ')' ) {
15274 # No longer doing this
15277 # keep a terminal colon
15278 elsif ( $types_to_go[$imid] eq ':' ) {
15282 # keep a terminal for-semicolon
15283 elsif ( $types_to_go[$imid] eq 'f' ) {
15287 # if '=' at end of line ...
15288 elsif ( $is_assignment{ $types_to_go[$imid] } ) {
15290 my $is_short_quote =
15291 ( $types_to_go[$imidr] eq 'Q'
15293 && length( $tokens_to_go[$imidr] ) <
15294 $rOpts_short_concatenation_item_length );
15295 my $ifnmax = $$ri_first[$nmax];
15296 my $ifnp = ( $nmax > $n ) ? $$ri_first[ $n + 1 ] : $ifnmax;
15298 ( $types_to_go[$if] eq '?' && $types_to_go[$ifnp] eq ':' );
15300 # always join an isolated '=', a short quote, or if this
15301 # will put ?/: at start of adjacent lines
15303 && !$is_short_quote
15310 # unless we can reduce this to two lines
15313 # or three lines, the last with a leading semicolon
15314 || ( $nmax == $n + 2
15315 && $types_to_go[$ifnmax] eq ';' )
15317 # or the next line ends with a here doc
15318 || $types_to_go[$il] eq 'h'
15321 # do not recombine if the two lines might align well
15322 # this is a very approximate test for this
15323 && $types_to_go[$imidr] ne $types_to_go[$ifnp]
15326 # -lp users often prefer this:
15327 # my $title = function($env, $env, $sysarea,
15328 # "bubba Borrower Entry");
15329 # so we will recombine if -lp is used we have ending comma
15330 if ( !$rOpts_line_up_parentheses
15331 || $types_to_go[$il] ne ',' )
15334 # otherwise, scan the rhs line up to last token for
15335 # complexity. Note that we are not counting the last
15336 # token in case it is an opening paren.
15338 my $depth = $nesting_depth_to_go[$imidr];
15339 for ( my $i = $imidr + 1 ; $i < $il ; $i++ ) {
15340 if ( $nesting_depth_to_go[$i] != $depth ) {
15342 last if ( $tv > 1 );
15344 $depth = $nesting_depth_to_go[$i];
15347 # ok to recombine if no level changes before last token
15350 # otherwise, do not recombine if more than two
15352 next if ( $tv > 1 );
15354 # check total complexity of the two adjacent lines
15355 # that will occur if we do this join
15357 ( $n < $nmax ) ? $$ri_last[ $n + 1 ] : $il;
15358 for ( my $i = $il ; $i <= $istop ; $i++ ) {
15359 if ( $nesting_depth_to_go[$i] != $depth ) {
15361 last if ( $tv > 2 );
15363 $depth = $nesting_depth_to_go[$i];
15366 # do not recombine if total is more than 2 level changes
15367 next if ( $tv > 2 );
15372 unless ( $tokens_to_go[$imidr] =~ /^[\{\(\[]$/ ) {
15373 $forced_breakpoint_to_go[$imid] = 0;
15378 elsif ( $types_to_go[$imid] eq 'k' ) {
15380 # make major control keywords stand out
15385 #/^(last|next|redo|return)$/
15386 $is_last_next_redo_return{ $tokens_to_go[$imid] }
15388 # but only if followed by multiple lines
15392 if ( $is_and_or{ $tokens_to_go[$imid] } ) {
15393 next unless $want_break_before{ $tokens_to_go[$imid] };
15397 # handle trailing + - * /
15398 elsif ( $types_to_go[$imid] =~ /^[\+\-\*\/]$/ ) {
15399 my $i_next_nonblank = $imidr;
15400 my $i_next_next = $i_next_nonblank + 1;
15401 $i_next_next++ if ( $types_to_go[$i_next_next] eq 'b' );
15403 # do not strand numbers
15406 $types_to_go[$i_next_nonblank] eq 'n'
15408 $i_next_nonblank == $il
15409 || ( $i_next_next == $il
15410 && $types_to_go[$i_next_next] =~ /^[\+\-\*\/]$/ )
15411 || $types_to_go[$i_next_next] eq ';'
15416 #----------------------------------------------------------
15417 # Section 2: Now examine token at $imidr (left end of second
15419 #----------------------------------------------------------
15421 # join lines identified above as capable of
15422 # causing an outdented line with leading closing paren
15423 if ($previous_outdentable_closing_paren) {
15424 $forced_breakpoint_to_go[$imid] = 0;
15427 # do not recombine lines with leading &&, ||, or :
15428 elsif ( $types_to_go[$imidr] =~ /^(:|\&\&|\|\|)$/ ) {
15429 $leading_amp_count++;
15430 next if $want_break_before{ $types_to_go[$imidr] };
15433 # Identify and recombine a broken ?/: chain
15434 elsif ( $types_to_go[$imidr] eq '?' ) {
15436 # indexes of line first tokens --
15437 # mm - line before previous line
15438 # f - previous line
15441 # fff - line after next
15442 my $iff = $n < $nmax ? $$ri_first[ $n + 1 ] : -1;
15443 my $ifff = $n + 2 <= $nmax ? $$ri_first[ $n + 2 ] : -1;
15444 my $imm = $n > 1 ? $$ri_first[ $n - 2 ] : -1;
15445 my $seqno = $type_sequence_to_go[$imidr];
15447 ( $types_to_go[$if] eq ':'
15448 && $type_sequence_to_go[$if] ==
15449 $seqno - TYPE_SEQUENCE_INCREMENT );
15452 && $types_to_go[$imm] eq ':'
15453 && $type_sequence_to_go[$imm] ==
15454 $seqno - 2 * TYPE_SEQUENCE_INCREMENT );
15458 && $types_to_go[$iff] eq ':'
15459 && $type_sequence_to_go[$iff] == $seqno );
15462 && $types_to_go[$ifff] eq ':'
15463 && $type_sequence_to_go[$ifff] ==
15464 $seqno + TYPE_SEQUENCE_INCREMENT );
15466 # we require that this '?' be part of a correct sequence
15467 # of 3 in a row or else no recombination is done.
15469 unless ( ( $ff_ok || $mm_ok ) && ( $f_ok || $fff_ok ) );
15470 $forced_breakpoint_to_go[$imid] = 0;
15473 # do not recombine lines with leading '.'
15474 elsif ( $types_to_go[$imidr] =~ /^(\.)$/ ) {
15475 my $i_next_nonblank = $imidr + 1;
15476 if ( $types_to_go[$i_next_nonblank] eq 'b' ) {
15477 $i_next_nonblank++;
15483 # ... unless there is just one and we can reduce
15484 # this to two lines if we do. For example, this
15488 # '($dummy, $pat) = &get_next_tex_cmd;' . '$args .= $pat;'
15490 # looks better than this:
15491 # $bodyA .= '($dummy, $pat) = &get_next_tex_cmd;'
15492 # . '$args .= $pat;'
15497 && $types_to_go[$if] ne $types_to_go[$imidr]
15500 # ... or this would strand a short quote , like this
15501 # . "some long qoute"
15504 || ( $types_to_go[$i_next_nonblank] eq 'Q'
15505 && $i_next_nonblank >= $il - 1
15506 && length( $tokens_to_go[$i_next_nonblank] ) <
15507 $rOpts_short_concatenation_item_length )
15511 # handle leading keyword..
15512 elsif ( $types_to_go[$imidr] eq 'k' ) {
15514 # handle leading "and" and "or"
15515 if ( $is_and_or{ $tokens_to_go[$imidr] } ) {
15517 # Decide if we will combine a single terminal 'and' and
15518 # 'or' after an 'if' or 'unless'. We should consider the
15519 # possible vertical alignment, and visual clutter.
15521 # This looks best with the 'and' on the same
15522 # line as the 'if':
15525 # if $seconds and $nu < 2;
15527 # But this looks better as shown:
15530 # if !$this->{Parents}{$_}
15531 # or $this->{Parents}{$_} eq $_;
15533 # Eventually, it would be nice to look for
15534 # similarities (such as 'this' or 'Parents'), but
15535 # for now I'm using a simple rule that says that
15536 # the resulting line length must not be more than
15537 # half the maximum line length (making it 80/2 =
15538 # 40 characters by default).
15541 $this_line_is_semicolon_terminated
15544 # following 'if' or 'unless'
15545 $types_to_go[$if] eq 'k'
15546 && $is_if_unless{ $tokens_to_go[$if] }
15552 # handle leading "if" and "unless"
15553 elsif ( $is_if_unless{ $tokens_to_go[$imidr] } ) {
15555 # FIXME: This is still experimental..may not be too useful
15558 $this_line_is_semicolon_terminated
15560 # previous line begins with 'and' or 'or'
15561 && $types_to_go[$if] eq 'k'
15562 && $is_and_or{ $tokens_to_go[$if] }
15567 # handle all other leading keywords
15570 # keywords look best at start of lines,
15571 # but combine things like "1 while"
15572 unless ( $is_assignment{ $types_to_go[$imid] } ) {
15574 if ( ( $types_to_go[$imid] ne 'k' )
15575 && ( $tokens_to_go[$imidr] ne 'while' ) );
15580 # similar treatment of && and || as above for 'and' and 'or':
15581 # NOTE: This block of code is currently bypassed because
15582 # of a previous block but is retained for possible future use.
15583 elsif ( $types_to_go[$imidr] =~ /^(&&|\|\|)$/ ) {
15585 # maybe looking at something like:
15586 # unless $TEXTONLY || $item =~ m%</?(hr>|p>|a|img)%i;
15590 $this_line_is_semicolon_terminated
15592 # previous line begins with an 'if' or 'unless' keyword
15593 && $types_to_go[$if] eq 'k'
15594 && $is_if_unless{ $tokens_to_go[$if] }
15599 # handle leading + - * /
15600 elsif ( $types_to_go[$imidr] =~ /^[\+\-\*\/]$/ ) {
15601 my $i_next_nonblank = $imidr + 1;
15602 if ( $types_to_go[$i_next_nonblank] eq 'b' ) {
15603 $i_next_nonblank++;
15606 my $i_next_next = $i_next_nonblank + 1;
15607 $i_next_next++ if ( $types_to_go[$i_next_next] eq 'b' );
15612 # unless there is just one and we can reduce
15613 # this to two lines if we do. For example, this
15617 && $types_to_go[$if] ne $types_to_go[$imidr]
15620 # do not strand numbers
15622 $types_to_go[$i_next_nonblank] eq 'n'
15623 && ( $i_next_nonblank >= $il - 1
15624 || $types_to_go[$i_next_next] eq ';' )
15629 # handle line with leading = or similar
15630 elsif ( $is_assignment{ $types_to_go[$imidr] } ) {
15631 next unless $n == 1;
15632 my $ifnmax = $$ri_first[$nmax];
15636 # unless we can reduce this to two lines
15639 # or three lines, the last with a leading semicolon
15640 || ( $nmax == 3 && $types_to_go[$ifnmax] eq ';' )
15642 # or the next line ends with a here doc
15643 || $types_to_go[$il] eq 'h'
15647 #----------------------------------------------------------
15649 # Combine the lines if we arrive here and it is possible
15650 #----------------------------------------------------------
15652 # honor hard breakpoints
15653 next if ( $forced_breakpoint_to_go[$imid] > 0 );
15655 my $bs = $bond_strength_to_go[$imid];
15657 # combined line cannot be too long
15659 if excess_line_length( $if, $il ) > 0;
15661 # do not recombine if we would skip in indentation levels
15662 if ( $n < $nmax ) {
15663 my $if_next = $$ri_first[ $n + 1 ];
15666 $levels_to_go[$if] < $levels_to_go[$imidr]
15667 && $levels_to_go[$imidr] < $levels_to_go[$if_next]
15669 # but an isolated 'if (' is undesirable
15672 && $imid - $if <= 2
15673 && $types_to_go[$if] eq 'k'
15674 && $tokens_to_go[$if] eq 'if'
15675 && $tokens_to_go[$imid] ne '('
15681 next if ( $bs == NO_BREAK );
15683 # remember the pair with the greatest bond strength
15690 if ( $bs > $bs_best ) {
15695 # we have 2 or more candidates, so need another pass
15700 # recombine the pair with the greatest bond strength
15702 splice @$ri_first, $n_best, 1;
15703 splice @$ri_last, $n_best - 1, 1;
15706 return ( $ri_first, $ri_last );
15709 sub break_all_chain_tokens {
15711 # scan the current breakpoints looking for breaks at certain "chain
15712 # operators" (. : && || + etc) which often occur repeatedly in a long
15713 # statement. If we see a break at any one, break at all similar tokens
15714 # within the same container.
15717 # does not handle nested ?: operators correctly
15718 # coordinate better with ?: logic in set_continuation_breaks
15719 my ( $ri_left, $ri_right ) = @_;
15721 my %saw_chain_type;
15722 my %left_chain_type;
15723 my %right_chain_type;
15724 my %interior_chain_type;
15725 my $nmax = @$ri_right - 1;
15727 # scan the left and right end tokens of all lines
15729 for my $n ( 0 .. $nmax ) {
15730 my $il = $$ri_left[$n];
15731 my $ir = $$ri_right[$n];
15732 my $typel = $types_to_go[$il];
15733 my $typer = $types_to_go[$ir];
15734 $typel = '+' if ( $typel eq '-' ); # treat + and - the same
15735 $typer = '+' if ( $typer eq '-' );
15736 $typel = '*' if ( $typel eq '/' ); # treat * and / the same
15737 $typer = '*' if ( $typer eq '/' );
15738 my $tokenl = $tokens_to_go[$il];
15739 my $tokenr = $tokens_to_go[$ir];
15741 if ( $is_chain_operator{$tokenl} && $want_break_before{$typel} ) {
15742 next if ( $typel eq '?' );
15743 push @{ $left_chain_type{$typel} }, $il;
15744 $saw_chain_type{$typel} = 1;
15747 if ( $is_chain_operator{$tokenr} && !$want_break_before{$typer} ) {
15748 next if ( $typer eq '?' );
15749 push @{ $right_chain_type{$typer} }, $ir;
15750 $saw_chain_type{$typer} = 1;
15754 return unless $count;
15756 # now look for any interior tokens of the same types
15758 for my $n ( 0 .. $nmax ) {
15759 my $il = $$ri_left[$n];
15760 my $ir = $$ri_right[$n];
15761 for ( my $i = $il + 1 ; $i < $ir ; $i++ ) {
15762 my $type = $types_to_go[$i];
15763 $type = '+' if ( $type eq '-' );
15764 $type = '*' if ( $type eq '/' );
15765 if ( $saw_chain_type{$type} ) {
15766 push @{ $interior_chain_type{$type} }, $i;
15771 return unless $count;
15773 # now make a list of all new break points
15776 # loop over all chain types
15777 foreach my $type ( keys %saw_chain_type ) {
15779 # quit if just ONE continuation line with leading . For example--
15780 # print LATEXFILE '\framebox{\parbox[c][' . $h . '][t]{' . $w . '}{'
15782 last if ( $nmax == 1 && $type =~ /^[\.\+]$/ );
15784 # loop over all interior chain tokens
15785 foreach my $itest ( @{ $interior_chain_type{$type} } ) {
15787 # loop over all left end tokens of same type
15788 if ( $left_chain_type{$type} ) {
15789 next if $nobreak_to_go[ $itest - 1 ];
15790 foreach my $i ( @{ $left_chain_type{$type} } ) {
15791 next unless in_same_container( $i, $itest );
15792 push @insert_list, $itest - 1;
15797 # loop over all right end tokens of same type
15798 if ( $right_chain_type{$type} ) {
15799 next if $nobreak_to_go[$itest];
15800 foreach my $i ( @{ $right_chain_type{$type} } ) {
15801 next unless in_same_container( $i, $itest );
15802 push @insert_list, $itest;
15809 # insert any new break points
15810 if (@insert_list) {
15811 insert_additional_breaks( \@insert_list, $ri_left, $ri_right );
15815 sub in_same_container {
15817 # check to see if tokens at i1 and i2 are in the
15818 # same container, and not separated by a comma, ? or :
15819 my ( $i1, $i2 ) = @_;
15820 my $type = $types_to_go[$i1];
15821 my $depth = $nesting_depth_to_go[$i1];
15822 return unless ( $nesting_depth_to_go[$i2] == $depth );
15823 if ( $i2 < $i1 ) { ( $i1, $i2 ) = ( $i2, $i1 ) }
15824 for ( my $i = $i1 + 1 ; $i < $i2 ; $i++ ) {
15825 next if ( $nesting_depth_to_go[$i] > $depth );
15826 return if ( $nesting_depth_to_go[$i] < $depth );
15828 my $tok = $tokens_to_go[$i];
15829 $tok = ',' if $tok eq '=>'; # treat => same as ,
15831 # Example: we would not want to break at any of these .'s
15832 # : "<A HREF=\"#item_" . htmlify( 0, $s2 ) . "\">$str</A>"
15833 if ( $type ne ':' ) {
15834 return if ( $tok =~ /^[\,\:\?]$/ ) || $tok eq '||' || $tok eq 'or';
15837 return if ( $tok =~ /^[\,]$/ );
15843 sub set_continuation_breaks {
15845 # Define an array of indexes for inserting newline characters to
15846 # keep the line lengths below the maximum desired length. There is
15847 # an implied break after the last token, so it need not be included.
15850 # This routine is part of series of routines which adjust line
15851 # lengths. It is only called if a statement is longer than the
15852 # maximum line length, or if a preliminary scanning located
15853 # desirable break points. Sub scan_list has already looked at
15854 # these tokens and set breakpoints (in array
15855 # $forced_breakpoint_to_go[$i]) where it wants breaks (for example
15856 # after commas, after opening parens, and before closing parens).
15857 # This routine will honor these breakpoints and also add additional
15858 # breakpoints as necessary to keep the line length below the maximum
15859 # requested. It bases its decision on where the 'bond strength' is
15862 # Output: returns references to the arrays:
15865 # which contain the indexes $i of the first and last tokens on each
15868 # In addition, the array:
15869 # $forced_breakpoint_to_go[$i]
15870 # may be updated to be =1 for any index $i after which there must be
15871 # a break. This signals later routines not to undo the breakpoint.
15873 my $saw_good_break = shift;
15874 my @i_first = (); # the first index to output
15875 my @i_last = (); # the last index to output
15876 my @i_colon_breaks = (); # needed to decide if we have to break at ?'s
15877 if ( $types_to_go[0] eq ':' ) { push @i_colon_breaks, 0 }
15879 set_bond_strengths();
15882 my $imax = $max_index_to_go;
15883 if ( $types_to_go[$imin] eq 'b' ) { $imin++ }
15884 if ( $types_to_go[$imax] eq 'b' ) { $imax-- }
15885 my $i_begin = $imin; # index for starting next iteration
15887 my $leading_spaces = leading_spaces_to_go($imin);
15888 my $line_count = 0;
15889 my $last_break_strength = NO_BREAK;
15890 my $i_last_break = -1;
15891 my $max_bias = 0.001;
15892 my $tiny_bias = 0.0001;
15893 my $leading_alignment_token = "";
15894 my $leading_alignment_type = "";
15896 # see if any ?/:'s are in order
15897 my $colons_in_order = 1;
15899 my @colon_list = grep /^[\?\:]$/, @tokens_to_go[ 0 .. $max_index_to_go ];
15900 foreach (@colon_list) {
15901 if ( $_ eq $last_tok ) { $colons_in_order = 0; last }
15905 # This is a sufficient but not necessary condition for colon chain
15906 my $is_colon_chain = ( $colons_in_order && @colon_list > 2 );
15908 #-------------------------------------------------------
15909 # BEGINNING of main loop to set continuation breakpoints
15910 # Keep iterating until we reach the end
15911 #-------------------------------------------------------
15912 while ( $i_begin <= $imax ) {
15913 my $lowest_strength = NO_BREAK;
15914 my $starting_sum = $lengths_to_go[$i_begin];
15917 my $lowest_next_token = '';
15918 my $lowest_next_type = 'b';
15919 my $i_lowest_next_nonblank = -1;
15921 #-------------------------------------------------------
15922 # BEGINNING of inner loop to find the best next breakpoint
15923 #-------------------------------------------------------
15924 for ( $i_test = $i_begin ; $i_test <= $imax ; $i_test++ ) {
15925 my $type = $types_to_go[$i_test];
15926 my $token = $tokens_to_go[$i_test];
15927 my $next_type = $types_to_go[ $i_test + 1 ];
15928 my $next_token = $tokens_to_go[ $i_test + 1 ];
15929 my $i_next_nonblank =
15930 ( ( $next_type eq 'b' ) ? $i_test + 2 : $i_test + 1 );
15931 my $next_nonblank_type = $types_to_go[$i_next_nonblank];
15932 my $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
15933 my $next_nonblank_block_type = $block_type_to_go[$i_next_nonblank];
15934 my $strength = $bond_strength_to_go[$i_test];
15935 my $must_break = 0;
15937 # FIXME: TESTING: Might want to be able to break after these
15938 # force an immediate break at certain operators
15939 # with lower level than the start of the line
15942 $next_nonblank_type =~ /^(\.|\&\&|\|\|)$/
15943 || ( $next_nonblank_type eq 'k'
15944 && $next_nonblank_token =~ /^(and|or)$/ )
15946 && ( $nesting_depth_to_go[$i_begin] >
15947 $nesting_depth_to_go[$i_next_nonblank] )
15950 set_forced_breakpoint($i_next_nonblank);
15955 # Try to put a break where requested by scan_list
15956 $forced_breakpoint_to_go[$i_test]
15958 # break between ) { in a continued line so that the '{' can
15960 # See similar logic in scan_list which catches instances
15961 # where a line is just something like ') {'
15963 && ( $token eq ')' )
15964 && ( $next_nonblank_type eq '{' )
15965 && ($next_nonblank_block_type)
15966 && !$rOpts->{'opening-brace-always-on-right'} )
15968 # There is an implied forced break at a terminal opening brace
15969 || ( ( $type eq '{' ) && ( $i_test == $imax ) )
15973 # Forced breakpoints must sometimes be overridden, for example
15974 # because of a side comment causing a NO_BREAK. It is easier
15975 # to catch this here than when they are set.
15976 if ( $strength < NO_BREAK ) {
15977 $strength = $lowest_strength - $tiny_bias;
15982 # quit if a break here would put a good terminal token on
15983 # the next line and we already have a possible break
15986 && ( $next_nonblank_type =~ /^[\;\,]$/ )
15990 $lengths_to_go[ $i_next_nonblank + 1 ] -
15992 ) > $rOpts_maximum_line_length
15996 last if ( $i_lowest >= 0 );
15999 # Avoid a break which would strand a single punctuation
16000 # token. For example, we do not want to strand a leading
16001 # '.' which is followed by a long quoted string.
16004 && ( $i_test == $i_begin )
16005 && ( $i_test < $imax )
16006 && ( $token eq $type )
16010 $lengths_to_go[ $i_test + 1 ] -
16012 ) <= $rOpts_maximum_line_length
16018 if ( ( $i_test < $imax ) && ( $next_type eq 'b' ) ) {
16024 if ( ( $strength <= $lowest_strength ) && ( $strength < NO_BREAK ) )
16027 # break at previous best break if it would have produced
16028 # a leading alignment of certain common tokens, and it
16029 # is different from the latest candidate break
16031 if ($leading_alignment_type);
16033 # Force at least one breakpoint if old code had good
16034 # break It is only called if a breakpoint is required or
16035 # desired. This will probably need some adjustments
16036 # over time. A goal is to try to be sure that, if a new
16037 # side comment is introduced into formated text, then
16038 # the same breakpoints will occur. scbreak.t
16041 $i_test == $imax # we are at the end
16042 && !$forced_breakpoint_count #
16043 && $saw_good_break # old line had good break
16044 && $type =~ /^[#;\{]$/ # and this line ends in
16045 # ';' or side comment
16046 && $i_last_break < 0 # and we haven't made a break
16047 && $i_lowest > 0 # and we saw a possible break
16048 && $i_lowest < $imax - 1 # (but not just before this ;)
16049 && $strength - $lowest_strength < 0.5 * WEAK # and it's good
16052 $lowest_strength = $strength;
16053 $i_lowest = $i_test;
16054 $lowest_next_token = $next_nonblank_token;
16055 $lowest_next_type = $next_nonblank_type;
16056 $i_lowest_next_nonblank = $i_next_nonblank;
16057 last if $must_break;
16059 # set flags to remember if a break here will produce a
16060 # leading alignment of certain common tokens
16061 if ( $line_count > 0
16063 && ( $lowest_strength - $last_break_strength <= $max_bias )
16066 my $i_last_end = $i_begin - 1;
16067 if ( $types_to_go[$i_last_end] eq 'b' ) { $i_last_end -= 1 }
16068 my $tok_beg = $tokens_to_go[$i_begin];
16069 my $type_beg = $types_to_go[$i_begin];
16072 # check for leading alignment of certain tokens
16074 $tok_beg eq $next_nonblank_token
16075 && $is_chain_operator{$tok_beg}
16076 && ( $type_beg eq 'k'
16077 || $type_beg eq $tok_beg )
16078 && $nesting_depth_to_go[$i_begin] >=
16079 $nesting_depth_to_go[$i_next_nonblank]
16082 || ( $tokens_to_go[$i_last_end] eq $token
16083 && $is_chain_operator{$token}
16084 && ( $type eq 'k' || $type eq $token )
16085 && $nesting_depth_to_go[$i_last_end] >=
16086 $nesting_depth_to_go[$i_test] )
16089 $leading_alignment_token = $next_nonblank_token;
16090 $leading_alignment_type = $next_nonblank_type;
16096 ( $i_test >= $imax )
16101 $lengths_to_go[ $i_test + 2 ] -
16103 ) > $rOpts_maximum_line_length
16106 FORMATTER_DEBUG_FLAG_BREAK
16108 "BREAK: testing i = $i_test imax=$imax $types_to_go[$i_test] $next_nonblank_type leading sp=($leading_spaces) next length = $lengths_to_go[$i_test+2] too_long=$too_long str=$strength\n";
16110 # allow one extra terminal token after exceeding line length
16111 # if it would strand this token.
16112 if ( $rOpts_fuzzy_line_length
16114 && ( $i_lowest == $i_test )
16115 && ( length($token) > 1 )
16116 && ( $next_nonblank_type =~ /^[\;\,]$/ ) )
16123 ( $i_test == $imax ) # we're done if no more tokens,
16125 ( $i_lowest >= 0 ) # or no more space and we have a break
16131 #-------------------------------------------------------
16132 # END of inner loop to find the best next breakpoint
16133 # Now decide exactly where to put the breakpoint
16134 #-------------------------------------------------------
16136 # it's always ok to break at imax if no other break was found
16137 if ( $i_lowest < 0 ) { $i_lowest = $imax }
16139 # semi-final index calculation
16140 my $i_next_nonblank = (
16141 ( $types_to_go[ $i_lowest + 1 ] eq 'b' )
16145 my $next_nonblank_type = $types_to_go[$i_next_nonblank];
16146 my $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
16148 #-------------------------------------------------------
16149 # ?/: rule 1 : if a break here will separate a '?' on this
16150 # line from its closing ':', then break at the '?' instead.
16151 #-------------------------------------------------------
16153 foreach $i ( $i_begin + 1 .. $i_lowest - 1 ) {
16154 next unless ( $tokens_to_go[$i] eq '?' );
16156 # do not break if probable sequence of ?/: statements
16157 next if ($is_colon_chain);
16159 # do not break if statement is broken by side comment
16162 $tokens_to_go[$max_index_to_go] eq '#'
16163 && terminal_type( \@types_to_go, \@block_type_to_go, 0,
16164 $max_index_to_go ) !~ /^[\;\}]$/
16167 # no break needed if matching : is also on the line
16169 if ( $mate_index_to_go[$i] >= 0
16170 && $mate_index_to_go[$i] <= $i_next_nonblank );
16173 if ( $want_break_before{'?'} ) { $i_lowest-- }
16177 #-------------------------------------------------------
16178 # END of inner loop to find the best next breakpoint:
16179 # Break the line after the token with index i=$i_lowest
16180 #-------------------------------------------------------
16182 # final index calculation
16183 $i_next_nonblank = (
16184 ( $types_to_go[ $i_lowest + 1 ] eq 'b' )
16188 $next_nonblank_type = $types_to_go[$i_next_nonblank];
16189 $next_nonblank_token = $tokens_to_go[$i_next_nonblank];
16191 FORMATTER_DEBUG_FLAG_BREAK
16192 && print "BREAK: best is i = $i_lowest strength = $lowest_strength\n";
16194 #-------------------------------------------------------
16195 # ?/: rule 2 : if we break at a '?', then break at its ':'
16197 # Note: this rule is also in sub scan_list to handle a break
16198 # at the start and end of a line (in case breaks are dictated
16199 # by side comments).
16200 #-------------------------------------------------------
16201 if ( $next_nonblank_type eq '?' ) {
16202 set_closing_breakpoint($i_next_nonblank);
16204 elsif ( $types_to_go[$i_lowest] eq '?' ) {
16205 set_closing_breakpoint($i_lowest);
16208 #-------------------------------------------------------
16209 # ?/: rule 3 : if we break at a ':' then we save
16210 # its location for further work below. We may need to go
16211 # back and break at its '?'.
16212 #-------------------------------------------------------
16213 if ( $next_nonblank_type eq ':' ) {
16214 push @i_colon_breaks, $i_next_nonblank;
16216 elsif ( $types_to_go[$i_lowest] eq ':' ) {
16217 push @i_colon_breaks, $i_lowest;
16220 # here we should set breaks for all '?'/':' pairs which are
16221 # separated by this line
16225 # save this line segment, after trimming blanks at the ends
16227 ( $types_to_go[$i_begin] eq 'b' ) ? $i_begin + 1 : $i_begin );
16229 ( $types_to_go[$i_lowest] eq 'b' ) ? $i_lowest - 1 : $i_lowest );
16231 # set a forced breakpoint at a container opening, if necessary, to
16232 # signal a break at a closing container. Excepting '(' for now.
16233 if ( $tokens_to_go[$i_lowest] =~ /^[\{\[]$/
16234 && !$forced_breakpoint_to_go[$i_lowest] )
16236 set_closing_breakpoint($i_lowest);
16239 # get ready to go again
16240 $i_begin = $i_lowest + 1;
16241 $last_break_strength = $lowest_strength;
16242 $i_last_break = $i_lowest;
16243 $leading_alignment_token = "";
16244 $leading_alignment_type = "";
16245 $lowest_next_token = '';
16246 $lowest_next_type = 'b';
16248 if ( ( $i_begin <= $imax ) && ( $types_to_go[$i_begin] eq 'b' ) ) {
16252 # update indentation size
16253 if ( $i_begin <= $imax ) {
16254 $leading_spaces = leading_spaces_to_go($i_begin);
16258 #-------------------------------------------------------
16259 # END of main loop to set continuation breakpoints
16260 # Now go back and make any necessary corrections
16261 #-------------------------------------------------------
16263 #-------------------------------------------------------
16264 # ?/: rule 4 -- if we broke at a ':', then break at
16265 # corresponding '?' unless this is a chain of ?: expressions
16266 #-------------------------------------------------------
16267 if (@i_colon_breaks) {
16269 # using a simple method for deciding if we are in a ?/: chain --
16270 # this is a chain if it has multiple ?/: pairs all in order;
16272 # Note that if line starts in a ':' we count that above as a break
16273 my $is_chain = ( $colons_in_order && @i_colon_breaks > 1 );
16275 unless ($is_chain) {
16276 my @insert_list = ();
16277 foreach (@i_colon_breaks) {
16278 my $i_question = $mate_index_to_go[$_];
16279 if ( $i_question >= 0 ) {
16280 if ( $want_break_before{'?'} ) {
16282 if ( $i_question > 0
16283 && $types_to_go[$i_question] eq 'b' )
16289 if ( $i_question >= 0 ) {
16290 push @insert_list, $i_question;
16293 insert_additional_breaks( \@insert_list, \@i_first, \@i_last );
16297 return \@i_first, \@i_last;
16300 sub insert_additional_breaks {
16302 # this routine will add line breaks at requested locations after
16303 # sub set_continuation_breaks has made preliminary breaks.
16305 my ( $ri_break_list, $ri_first, $ri_last ) = @_;
16308 my $line_number = 0;
16310 foreach $i_break_left ( sort { $a <=> $b } @$ri_break_list ) {
16312 $i_f = $$ri_first[$line_number];
16313 $i_l = $$ri_last[$line_number];
16314 while ( $i_break_left >= $i_l ) {
16317 # shouldn't happen unless caller passes bad indexes
16318 if ( $line_number >= @$ri_last ) {
16320 "Non-fatal program bug: couldn't set break at $i_break_left\n"
16322 report_definite_bug();
16325 $i_f = $$ri_first[$line_number];
16326 $i_l = $$ri_last[$line_number];
16329 my $i_break_right = $i_break_left + 1;
16330 if ( $types_to_go[$i_break_right] eq 'b' ) { $i_break_right++ }
16332 if ( $i_break_left >= $i_f
16333 && $i_break_left < $i_l
16334 && $i_break_right > $i_f
16335 && $i_break_right <= $i_l )
16337 splice( @$ri_first, $line_number, 1, ( $i_f, $i_break_right ) );
16338 splice( @$ri_last, $line_number, 1, ( $i_break_left, $i_l ) );
16343 sub set_closing_breakpoint {
16345 # set a breakpoint at a matching closing token
16346 # at present, this is only used to break at a ':' which matches a '?'
16347 my $i_break = shift;
16349 if ( $mate_index_to_go[$i_break] >= 0 ) {
16351 # CAUTION: infinite recursion possible here:
16352 # set_closing_breakpoint calls set_forced_breakpoint, and
16353 # set_forced_breakpoint call set_closing_breakpoint
16354 # ( test files attrib.t, BasicLyx.pm.html).
16355 # Don't reduce the '2' in the statement below
16356 if ( $mate_index_to_go[$i_break] > $i_break + 2 ) {
16358 # break before } ] and ), but sub set_forced_breakpoint will decide
16359 # to break before or after a ? and :
16360 my $inc = ( $tokens_to_go[$i_break] eq '?' ) ? 0 : 1;
16361 set_forced_breakpoint( $mate_index_to_go[$i_break] - $inc );
16365 my $type_sequence = $type_sequence_to_go[$i_break];
16366 if ($type_sequence) {
16367 my $closing_token = $matching_token{ $tokens_to_go[$i_break] };
16368 $postponed_breakpoint{$type_sequence} = 1;
16373 # check to see if output line tabbing agrees with input line
16374 # this can be very useful for debugging a script which has an extra
16376 sub compare_indentation_levels {
16378 my ( $python_indentation_level, $structural_indentation_level ) = @_;
16379 if ( ( $python_indentation_level ne $structural_indentation_level ) ) {
16380 $last_tabbing_disagreement = $input_line_number;
16382 if ($in_tabbing_disagreement) {
16385 $tabbing_disagreement_count++;
16387 if ( $tabbing_disagreement_count <= MAX_NAG_MESSAGES ) {
16388 write_logfile_entry(
16389 "Start indentation disagreement: input=$python_indentation_level; output=$structural_indentation_level\n"
16392 $in_tabbing_disagreement = $input_line_number;
16393 $first_tabbing_disagreement = $in_tabbing_disagreement
16394 unless ($first_tabbing_disagreement);
16399 if ($in_tabbing_disagreement) {
16401 if ( $tabbing_disagreement_count <= MAX_NAG_MESSAGES ) {
16402 write_logfile_entry(
16403 "End indentation disagreement from input line $in_tabbing_disagreement\n"
16406 if ( $tabbing_disagreement_count == MAX_NAG_MESSAGES ) {
16407 write_logfile_entry(
16408 "No further tabbing disagreements will be noted\n");
16411 $in_tabbing_disagreement = 0;
16416 #####################################################################
16418 # the Perl::Tidy::IndentationItem class supplies items which contain
16419 # how much whitespace should be used at the start of a line
16421 #####################################################################
16423 package Perl::Tidy::IndentationItem;
16425 # Indexes for indentation items
16426 use constant SPACES => 0; # total leading white spaces
16427 use constant LEVEL => 1; # the indentation 'level'
16428 use constant CI_LEVEL => 2; # the 'continuation level'
16429 use constant AVAILABLE_SPACES => 3; # how many left spaces available
16431 use constant CLOSED => 4; # index where we saw closing '}'
16432 use constant COMMA_COUNT => 5; # how many commas at this level?
16433 use constant SEQUENCE_NUMBER => 6; # output batch number
16434 use constant INDEX => 7; # index in output batch list
16435 use constant HAVE_CHILD => 8; # any dependents?
16436 use constant RECOVERABLE_SPACES => 9; # how many spaces to the right
16437 # we would like to move to get
16438 # alignment (negative if left)
16439 use constant ALIGN_PAREN => 10; # do we want to try to align
16440 # with an opening structure?
16441 use constant MARKED => 11; # if visited by corrector logic
16442 use constant STACK_DEPTH => 12; # indentation nesting depth
16443 use constant STARTING_INDEX => 13; # first token index of this level
16444 use constant ARROW_COUNT => 14; # how many =>'s
16448 # Create an 'indentation_item' which describes one level of leading
16449 # whitespace when the '-lp' indentation is used. We return
16450 # a reference to an anonymous array of associated variables.
16451 # See above constants for storage scheme.
16453 $class, $spaces, $level,
16454 $ci_level, $available_spaces, $index,
16455 $gnu_sequence_number, $align_paren, $stack_depth,
16459 my $arrow_count = 0;
16460 my $comma_count = 0;
16461 my $have_child = 0;
16462 my $want_right_spaces = 0;
16465 $spaces, $level, $ci_level,
16466 $available_spaces, $closed, $comma_count,
16467 $gnu_sequence_number, $index, $have_child,
16468 $want_right_spaces, $align_paren, $marked,
16469 $stack_depth, $starting_index, $arrow_count,
16473 sub permanently_decrease_AVAILABLE_SPACES {
16475 # make a permanent reduction in the available indentation spaces
16476 # at one indentation item. NOTE: if there are child nodes, their
16477 # total SPACES must be reduced by the caller.
16479 my ( $item, $spaces_needed ) = @_;
16480 my $available_spaces = $item->get_AVAILABLE_SPACES();
16481 my $deleted_spaces =
16482 ( $available_spaces > $spaces_needed )
16484 : $available_spaces;
16485 $item->decrease_AVAILABLE_SPACES($deleted_spaces);
16486 $item->decrease_SPACES($deleted_spaces);
16487 $item->set_RECOVERABLE_SPACES(0);
16489 return $deleted_spaces;
16492 sub tentatively_decrease_AVAILABLE_SPACES {
16494 # We are asked to tentatively delete $spaces_needed of indentation
16495 # for a indentation item. We may want to undo this later. NOTE: if
16496 # there are child nodes, their total SPACES must be reduced by the
16498 my ( $item, $spaces_needed ) = @_;
16499 my $available_spaces = $item->get_AVAILABLE_SPACES();
16500 my $deleted_spaces =
16501 ( $available_spaces > $spaces_needed )
16503 : $available_spaces;
16504 $item->decrease_AVAILABLE_SPACES($deleted_spaces);
16505 $item->decrease_SPACES($deleted_spaces);
16506 $item->increase_RECOVERABLE_SPACES($deleted_spaces);
16507 return $deleted_spaces;
16510 sub get_STACK_DEPTH {
16512 return $self->[STACK_DEPTH];
16517 return $self->[SPACES];
16522 return $self->[MARKED];
16526 my ( $self, $value ) = @_;
16527 if ( defined($value) ) {
16528 $self->[MARKED] = $value;
16530 return $self->[MARKED];
16533 sub get_AVAILABLE_SPACES {
16535 return $self->[AVAILABLE_SPACES];
16538 sub decrease_SPACES {
16539 my ( $self, $value ) = @_;
16540 if ( defined($value) ) {
16541 $self->[SPACES] -= $value;
16543 return $self->[SPACES];
16546 sub decrease_AVAILABLE_SPACES {
16547 my ( $self, $value ) = @_;
16548 if ( defined($value) ) {
16549 $self->[AVAILABLE_SPACES] -= $value;
16551 return $self->[AVAILABLE_SPACES];
16554 sub get_ALIGN_PAREN {
16556 return $self->[ALIGN_PAREN];
16559 sub get_RECOVERABLE_SPACES {
16561 return $self->[RECOVERABLE_SPACES];
16564 sub set_RECOVERABLE_SPACES {
16565 my ( $self, $value ) = @_;
16566 if ( defined($value) ) {
16567 $self->[RECOVERABLE_SPACES] = $value;
16569 return $self->[RECOVERABLE_SPACES];
16572 sub increase_RECOVERABLE_SPACES {
16573 my ( $self, $value ) = @_;
16574 if ( defined($value) ) {
16575 $self->[RECOVERABLE_SPACES] += $value;
16577 return $self->[RECOVERABLE_SPACES];
16582 return $self->[CI_LEVEL];
16587 return $self->[LEVEL];
16590 sub get_SEQUENCE_NUMBER {
16592 return $self->[SEQUENCE_NUMBER];
16597 return $self->[INDEX];
16600 sub get_STARTING_INDEX {
16602 return $self->[STARTING_INDEX];
16605 sub set_HAVE_CHILD {
16606 my ( $self, $value ) = @_;
16607 if ( defined($value) ) {
16608 $self->[HAVE_CHILD] = $value;
16610 return $self->[HAVE_CHILD];
16613 sub get_HAVE_CHILD {
16615 return $self->[HAVE_CHILD];
16618 sub set_ARROW_COUNT {
16619 my ( $self, $value ) = @_;
16620 if ( defined($value) ) {
16621 $self->[ARROW_COUNT] = $value;
16623 return $self->[ARROW_COUNT];
16626 sub get_ARROW_COUNT {
16628 return $self->[ARROW_COUNT];
16631 sub set_COMMA_COUNT {
16632 my ( $self, $value ) = @_;
16633 if ( defined($value) ) {
16634 $self->[COMMA_COUNT] = $value;
16636 return $self->[COMMA_COUNT];
16639 sub get_COMMA_COUNT {
16641 return $self->[COMMA_COUNT];
16645 my ( $self, $value ) = @_;
16646 if ( defined($value) ) {
16647 $self->[CLOSED] = $value;
16649 return $self->[CLOSED];
16654 return $self->[CLOSED];
16657 #####################################################################
16659 # the Perl::Tidy::VerticalAligner::Line class supplies an object to
16660 # contain a single output line
16662 #####################################################################
16664 package Perl::Tidy::VerticalAligner::Line;
16671 use constant JMAX => 0;
16672 use constant JMAX_ORIGINAL_LINE => 1;
16673 use constant RTOKENS => 2;
16674 use constant RFIELDS => 3;
16675 use constant RPATTERNS => 4;
16676 use constant INDENTATION => 5;
16677 use constant LEADING_SPACE_COUNT => 6;
16678 use constant OUTDENT_LONG_LINES => 7;
16679 use constant LIST_TYPE => 8;
16680 use constant IS_HANGING_SIDE_COMMENT => 9;
16681 use constant RALIGNMENTS => 10;
16682 use constant MAXIMUM_LINE_LENGTH => 11;
16683 use constant RVERTICAL_TIGHTNESS_FLAGS => 12;
16686 $_index_map{jmax} = JMAX;
16687 $_index_map{jmax_original_line} = JMAX_ORIGINAL_LINE;
16688 $_index_map{rtokens} = RTOKENS;
16689 $_index_map{rfields} = RFIELDS;
16690 $_index_map{rpatterns} = RPATTERNS;
16691 $_index_map{indentation} = INDENTATION;
16692 $_index_map{leading_space_count} = LEADING_SPACE_COUNT;
16693 $_index_map{outdent_long_lines} = OUTDENT_LONG_LINES;
16694 $_index_map{list_type} = LIST_TYPE;
16695 $_index_map{is_hanging_side_comment} = IS_HANGING_SIDE_COMMENT;
16696 $_index_map{ralignments} = RALIGNMENTS;
16697 $_index_map{maximum_line_length} = MAXIMUM_LINE_LENGTH;
16698 $_index_map{rvertical_tightness_flags} = RVERTICAL_TIGHTNESS_FLAGS;
16700 my @_default_data = ();
16701 $_default_data[JMAX] = undef;
16702 $_default_data[JMAX_ORIGINAL_LINE] = undef;
16703 $_default_data[RTOKENS] = undef;
16704 $_default_data[RFIELDS] = undef;
16705 $_default_data[RPATTERNS] = undef;
16706 $_default_data[INDENTATION] = undef;
16707 $_default_data[LEADING_SPACE_COUNT] = undef;
16708 $_default_data[OUTDENT_LONG_LINES] = undef;
16709 $_default_data[LIST_TYPE] = undef;
16710 $_default_data[IS_HANGING_SIDE_COMMENT] = undef;
16711 $_default_data[RALIGNMENTS] = [];
16712 $_default_data[MAXIMUM_LINE_LENGTH] = undef;
16713 $_default_data[RVERTICAL_TIGHTNESS_FLAGS] = undef;
16717 # methods to count object population
16719 sub get_count { $_count; }
16720 sub _increment_count { ++$_count }
16721 sub _decrement_count { --$_count }
16724 # Constructor may be called as a class method
16726 my ( $caller, %arg ) = @_;
16727 my $caller_is_obj = ref($caller);
16728 my $class = $caller_is_obj || $caller;
16730 my $self = bless [], $class;
16732 $self->[RALIGNMENTS] = [];
16735 foreach ( keys %_index_map ) {
16736 $index = $_index_map{$_};
16737 if ( exists $arg{$_} ) { $self->[$index] = $arg{$_} }
16738 elsif ($caller_is_obj) { $self->[$index] = $caller->[$index] }
16739 else { $self->[$index] = $_default_data[$index] }
16742 $self->_increment_count();
16747 $_[0]->_decrement_count();
16750 sub get_jmax { $_[0]->[JMAX] }
16751 sub get_jmax_original_line { $_[0]->[JMAX_ORIGINAL_LINE] }
16752 sub get_rtokens { $_[0]->[RTOKENS] }
16753 sub get_rfields { $_[0]->[RFIELDS] }
16754 sub get_rpatterns { $_[0]->[RPATTERNS] }
16755 sub get_indentation { $_[0]->[INDENTATION] }
16756 sub get_leading_space_count { $_[0]->[LEADING_SPACE_COUNT] }
16757 sub get_outdent_long_lines { $_[0]->[OUTDENT_LONG_LINES] }
16758 sub get_list_type { $_[0]->[LIST_TYPE] }
16759 sub get_is_hanging_side_comment { $_[0]->[IS_HANGING_SIDE_COMMENT] }
16760 sub get_rvertical_tightness_flags { $_[0]->[RVERTICAL_TIGHTNESS_FLAGS] }
16762 sub set_column { $_[0]->[RALIGNMENTS]->[ $_[1] ]->set_column( $_[2] ) }
16763 sub get_alignment { $_[0]->[RALIGNMENTS]->[ $_[1] ] }
16764 sub get_alignments { @{ $_[0]->[RALIGNMENTS] } }
16765 sub get_column { $_[0]->[RALIGNMENTS]->[ $_[1] ]->get_column() }
16767 sub get_starting_column {
16768 $_[0]->[RALIGNMENTS]->[ $_[1] ]->get_starting_column();
16771 sub increment_column {
16772 $_[0]->[RALIGNMENTS]->[ $_[1] ]->increment_column( $_[2] );
16774 sub set_alignments { my $self = shift; @{ $self->[RALIGNMENTS] } = @_; }
16776 sub current_field_width {
16780 return $self->get_column($j);
16783 return $self->get_column($j) - $self->get_column( $j - 1 );
16787 sub field_width_growth {
16790 return $self->get_column($j) - $self->get_starting_column($j);
16793 sub starting_field_width {
16797 return $self->get_starting_column($j);
16800 return $self->get_starting_column($j) -
16801 $self->get_starting_column( $j - 1 );
16805 sub increase_field_width {
16808 my ( $j, $pad ) = @_;
16809 my $jmax = $self->get_jmax();
16810 for my $k ( $j .. $jmax ) {
16811 $self->increment_column( $k, $pad );
16815 sub get_available_space_on_right {
16817 my $jmax = $self->get_jmax();
16818 return $self->[MAXIMUM_LINE_LENGTH] - $self->get_column($jmax);
16821 sub set_jmax { $_[0]->[JMAX] = $_[1] }
16822 sub set_jmax_original_line { $_[0]->[JMAX_ORIGINAL_LINE] = $_[1] }
16823 sub set_rtokens { $_[0]->[RTOKENS] = $_[1] }
16824 sub set_rfields { $_[0]->[RFIELDS] = $_[1] }
16825 sub set_rpatterns { $_[0]->[RPATTERNS] = $_[1] }
16826 sub set_indentation { $_[0]->[INDENTATION] = $_[1] }
16827 sub set_leading_space_count { $_[0]->[LEADING_SPACE_COUNT] = $_[1] }
16828 sub set_outdent_long_lines { $_[0]->[OUTDENT_LONG_LINES] = $_[1] }
16829 sub set_list_type { $_[0]->[LIST_TYPE] = $_[1] }
16830 sub set_is_hanging_side_comment { $_[0]->[IS_HANGING_SIDE_COMMENT] = $_[1] }
16831 sub set_alignment { $_[0]->[RALIGNMENTS]->[ $_[1] ] = $_[2] }
16835 #####################################################################
16837 # the Perl::Tidy::VerticalAligner::Alignment class holds information
16838 # on a single column being aligned
16840 #####################################################################
16841 package Perl::Tidy::VerticalAligner::Alignment;
16849 # Symbolic array indexes
16850 use constant COLUMN => 0; # the current column number
16851 use constant STARTING_COLUMN => 1; # column number when created
16852 use constant MATCHING_TOKEN => 2; # what token we are matching
16853 use constant STARTING_LINE => 3; # the line index of creation
16854 use constant ENDING_LINE => 4; # the most recent line to use it
16855 use constant SAVED_COLUMN => 5; # the most recent line to use it
16856 use constant SERIAL_NUMBER => 6; # unique number for this alignment
16857 # (just its index in an array)
16859 # Correspondence between variables and array indexes
16861 $_index_map{column} = COLUMN;
16862 $_index_map{starting_column} = STARTING_COLUMN;
16863 $_index_map{matching_token} = MATCHING_TOKEN;
16864 $_index_map{starting_line} = STARTING_LINE;
16865 $_index_map{ending_line} = ENDING_LINE;
16866 $_index_map{saved_column} = SAVED_COLUMN;
16867 $_index_map{serial_number} = SERIAL_NUMBER;
16869 my @_default_data = ();
16870 $_default_data[COLUMN] = undef;
16871 $_default_data[STARTING_COLUMN] = undef;
16872 $_default_data[MATCHING_TOKEN] = undef;
16873 $_default_data[STARTING_LINE] = undef;
16874 $_default_data[ENDING_LINE] = undef;
16875 $_default_data[SAVED_COLUMN] = undef;
16876 $_default_data[SERIAL_NUMBER] = undef;
16878 # class population count
16881 sub get_count { $_count; }
16882 sub _increment_count { ++$_count }
16883 sub _decrement_count { --$_count }
16888 my ( $caller, %arg ) = @_;
16889 my $caller_is_obj = ref($caller);
16890 my $class = $caller_is_obj || $caller;
16892 my $self = bless [], $class;
16894 foreach ( keys %_index_map ) {
16895 my $index = $_index_map{$_};
16896 if ( exists $arg{$_} ) { $self->[$index] = $arg{$_} }
16897 elsif ($caller_is_obj) { $self->[$index] = $caller->[$index] }
16898 else { $self->[$index] = $_default_data[$index] }
16900 $self->_increment_count();
16905 $_[0]->_decrement_count();
16908 sub get_column { return $_[0]->[COLUMN] }
16909 sub get_starting_column { return $_[0]->[STARTING_COLUMN] }
16910 sub get_matching_token { return $_[0]->[MATCHING_TOKEN] }
16911 sub get_starting_line { return $_[0]->[STARTING_LINE] }
16912 sub get_ending_line { return $_[0]->[ENDING_LINE] }
16913 sub get_serial_number { return $_[0]->[SERIAL_NUMBER] }
16915 sub set_column { $_[0]->[COLUMN] = $_[1] }
16916 sub set_starting_column { $_[0]->[STARTING_COLUMN] = $_[1] }
16917 sub set_matching_token { $_[0]->[MATCHING_TOKEN] = $_[1] }
16918 sub set_starting_line { $_[0]->[STARTING_LINE] = $_[1] }
16919 sub set_ending_line { $_[0]->[ENDING_LINE] = $_[1] }
16920 sub increment_column { $_[0]->[COLUMN] += $_[1] }
16922 sub save_column { $_[0]->[SAVED_COLUMN] = $_[0]->[COLUMN] }
16923 sub restore_column { $_[0]->[COLUMN] = $_[0]->[SAVED_COLUMN] }
16927 package Perl::Tidy::VerticalAligner;
16929 # The Perl::Tidy::VerticalAligner package collects output lines and
16930 # attempts to line up certain common tokens, such as => and #, which are
16931 # identified by the calling routine.
16933 # There are two main routines: append_line and flush. Append acts as a
16934 # storage buffer, collecting lines into a group which can be vertically
16935 # aligned. When alignment is no longer possible or desirable, it dumps
16936 # the group to flush.
16938 # append_line -----> flush
16946 # Caution: these debug flags produce a lot of output
16947 # They should all be 0 except when debugging small scripts
16949 use constant VALIGN_DEBUG_FLAG_APPEND => 0;
16950 use constant VALIGN_DEBUG_FLAG_APPEND0 => 0;
16951 use constant VALIGN_DEBUG_FLAG_TERNARY => 0;
16953 my $debug_warning = sub {
16954 print "VALIGN_DEBUGGING with key $_[0]\n";
16957 VALIGN_DEBUG_FLAG_APPEND && $debug_warning->('APPEND');
16958 VALIGN_DEBUG_FLAG_APPEND0 && $debug_warning->('APPEND0');
16963 $vertical_aligner_self
16965 $maximum_alignment_index
16969 $previous_minimum_jmax_seen
16970 $previous_maximum_jmax_seen
16971 $maximum_line_index
16976 $last_group_level_written
16977 $last_leading_space_count
16981 $last_comment_column
16982 $last_side_comment_line_number
16983 $last_side_comment_length
16984 $last_side_comment_level
16985 $outdented_line_count
16986 $first_outdented_line_at
16987 $last_outdented_line_at
16988 $diagnostics_object
16990 $file_writer_object
16991 @side_comment_history
16992 $comment_leading_space_count
16993 $is_matching_terminal_line
17000 $cached_line_leading_space_count
17001 $cached_seqno_string
17004 $last_nonblank_seqno_string
17008 $rOpts_maximum_line_length
17009 $rOpts_continuation_indentation
17010 $rOpts_indent_columns
17012 $rOpts_entab_leading_whitespace
17015 $rOpts_minimum_space_to_comment
17023 ( $class, $rOpts, $file_writer_object, $logger_object, $diagnostics_object )
17026 # variables describing the entire space group:
17027 $ralignment_list = [];
17029 $last_group_level_written = -1;
17030 $extra_indent_ok = 0; # can we move all lines to the right?
17031 $last_side_comment_length = 0;
17032 $maximum_jmax_seen = 0;
17033 $minimum_jmax_seen = 0;
17034 $previous_minimum_jmax_seen = 0;
17035 $previous_maximum_jmax_seen = 0;
17037 # variables describing each line of the group
17038 @group_lines = (); # list of all lines in group
17040 $outdented_line_count = 0;
17041 $first_outdented_line_at = 0;
17042 $last_outdented_line_at = 0;
17043 $last_side_comment_line_number = 0;
17044 $last_side_comment_level = -1;
17045 $is_matching_terminal_line = 0;
17047 # most recent 3 side comments; [ line number, column ]
17048 $side_comment_history[0] = [ -300, 0 ];
17049 $side_comment_history[1] = [ -200, 0 ];
17050 $side_comment_history[2] = [ -100, 0 ];
17052 # write_leader_and_string cache:
17053 $cached_line_text = "";
17054 $cached_line_type = 0;
17055 $cached_line_flag = 0;
17057 $cached_line_valid = 0;
17058 $cached_line_leading_space_count = 0;
17059 $cached_seqno_string = "";
17061 # string of sequence numbers joined together
17062 $seqno_string = "";
17063 $last_nonblank_seqno_string = "";
17065 # frequently used parameters
17066 $rOpts_indent_columns = $rOpts->{'indent-columns'};
17067 $rOpts_tabs = $rOpts->{'tabs'};
17068 $rOpts_entab_leading_whitespace = $rOpts->{'entab-leading-whitespace'};
17069 $rOpts_minimum_space_to_comment = $rOpts->{'minimum-space-to-comment'};
17070 $rOpts_maximum_line_length = $rOpts->{'maximum-line-length'};
17071 $rOpts_valign = $rOpts->{'valign'};
17073 forget_side_comment();
17075 initialize_for_new_group();
17077 $vertical_aligner_self = {};
17078 bless $vertical_aligner_self, $class;
17079 return $vertical_aligner_self;
17082 sub initialize_for_new_group {
17083 $maximum_line_index = -1; # lines in the current group
17084 $maximum_alignment_index = -1; # alignments in current group
17085 $zero_count = 0; # count consecutive lines without tokens
17086 $current_line = undef; # line being matched for alignment
17087 $group_maximum_gap = 0; # largest gap introduced
17089 $marginal_match = 0;
17090 $comment_leading_space_count = 0;
17091 $last_leading_space_count = 0;
17094 # interface to Perl::Tidy::Diagnostics routines
17095 sub write_diagnostics {
17096 if ($diagnostics_object) {
17097 $diagnostics_object->write_diagnostics(@_);
17101 # interface to Perl::Tidy::Logger routines
17103 if ($logger_object) {
17104 $logger_object->warning(@_);
17108 sub write_logfile_entry {
17109 if ($logger_object) {
17110 $logger_object->write_logfile_entry(@_);
17114 sub report_definite_bug {
17115 if ($logger_object) {
17116 $logger_object->report_definite_bug();
17122 # return the number of leading spaces associated with an indentation
17123 # variable $indentation is either a constant number of spaces or an
17124 # object with a get_SPACES method.
17125 my $indentation = shift;
17126 return ref($indentation) ? $indentation->get_SPACES() : $indentation;
17129 sub get_RECOVERABLE_SPACES {
17131 # return the number of spaces (+ means shift right, - means shift left)
17132 # that we would like to shift a group of lines with the same indentation
17133 # to get them to line up with their opening parens
17134 my $indentation = shift;
17135 return ref($indentation) ? $indentation->get_RECOVERABLE_SPACES() : 0;
17138 sub get_STACK_DEPTH {
17140 my $indentation = shift;
17141 return ref($indentation) ? $indentation->get_STACK_DEPTH() : 0;
17144 sub make_alignment {
17145 my ( $col, $token ) = @_;
17147 # make one new alignment at column $col which aligns token $token
17148 ++$maximum_alignment_index;
17149 my $alignment = new Perl::Tidy::VerticalAligner::Alignment(
17151 starting_column => $col,
17152 matching_token => $token,
17153 starting_line => $maximum_line_index,
17154 ending_line => $maximum_line_index,
17155 serial_number => $maximum_alignment_index,
17157 $ralignment_list->[$maximum_alignment_index] = $alignment;
17161 sub dump_alignments {
17163 "Current Alignments:\ni\ttoken\tstarting_column\tcolumn\tstarting_line\tending_line\n";
17164 for my $i ( 0 .. $maximum_alignment_index ) {
17165 my $column = $ralignment_list->[$i]->get_column();
17166 my $starting_column = $ralignment_list->[$i]->get_starting_column();
17167 my $matching_token = $ralignment_list->[$i]->get_matching_token();
17168 my $starting_line = $ralignment_list->[$i]->get_starting_line();
17169 my $ending_line = $ralignment_list->[$i]->get_ending_line();
17171 "$i\t$matching_token\t$starting_column\t$column\t$starting_line\t$ending_line\n";
17175 sub save_alignment_columns {
17176 for my $i ( 0 .. $maximum_alignment_index ) {
17177 $ralignment_list->[$i]->save_column();
17181 sub restore_alignment_columns {
17182 for my $i ( 0 .. $maximum_alignment_index ) {
17183 $ralignment_list->[$i]->restore_column();
17187 sub forget_side_comment {
17188 $last_comment_column = 0;
17193 # sub append is called to place one line in the current vertical group.
17195 # The input parameters are:
17196 # $level = indentation level of this line
17197 # $rfields = reference to array of fields
17198 # $rpatterns = reference to array of patterns, one per field
17199 # $rtokens = reference to array of tokens starting fields 1,2,..
17201 # Here is an example of what this package does. In this example,
17202 # we are trying to line up both the '=>' and the '#'.
17204 # '18' => 'grave', # \`
17205 # '19' => 'acute', # `'
17206 # '20' => 'caron', # \v
17207 # <-tabs-><f1-><--field 2 ---><-f3->
17210 # col1 col2 col3 col4
17212 # The calling routine has already broken the entire line into 3 fields as
17213 # indicated. (So the work of identifying promising common tokens has
17214 # already been done).
17216 # In this example, there will be 2 tokens being matched: '=>' and '#'.
17217 # They are the leading parts of fields 2 and 3, but we do need to know
17218 # what they are so that we can dump a group of lines when these tokens
17221 # The fields contain the actual characters of each field. The patterns
17222 # are like the fields, but they contain mainly token types instead
17223 # of tokens, so they have fewer characters. They are used to be
17224 # sure we are matching fields of similar type.
17226 # In this example, there will be 4 column indexes being adjusted. The
17227 # first one is always at zero. The interior columns are at the start of
17228 # the matching tokens, and the last one tracks the maximum line length.
17230 # Basically, each time a new line comes in, it joins the current vertical
17231 # group if possible. Otherwise it causes the current group to be dumped
17232 # and a new group is started.
17234 # For each new group member, the column locations are increased, as
17235 # necessary, to make room for the new fields. When the group is finally
17236 # output, these column numbers are used to compute the amount of spaces of
17237 # padding needed for each field.
17239 # Programming note: the fields are assumed not to have any tab characters.
17240 # Tabs have been previously removed except for tabs in quoted strings and
17241 # side comments. Tabs in these fields can mess up the column counting.
17242 # The log file warns the user if there are any such tabs.
17245 $level, $level_end,
17246 $indentation, $rfields,
17247 $rtokens, $rpatterns,
17248 $is_forced_break, $outdent_long_lines,
17249 $is_terminal_ternary, $is_terminal_statement,
17250 $do_not_pad, $rvertical_tightness_flags,
17254 # number of fields is $jmax
17255 # number of tokens between fields is $jmax-1
17256 my $jmax = $#{$rfields};
17258 my $leading_space_count = get_SPACES($indentation);
17260 # set outdented flag to be sure we either align within statements or
17261 # across statement boundaries, but not both.
17262 my $is_outdented = $last_leading_space_count > $leading_space_count;
17263 $last_leading_space_count = $leading_space_count;
17265 # Patch: undo for hanging side comment
17266 my $is_hanging_side_comment =
17267 ( $jmax == 1 && $rtokens->[0] eq '#' && $rfields->[0] =~ /^\s*$/ );
17268 $is_outdented = 0 if $is_hanging_side_comment;
17270 VALIGN_DEBUG_FLAG_APPEND0 && do {
17272 "APPEND0: entering lines=$maximum_line_index new #fields= $jmax, leading_count=$leading_space_count last_cmt=$last_comment_column force=$is_forced_break\n";
17275 # Validate cached line if necessary: If we can produce a container
17276 # with just 2 lines total by combining an existing cached opening
17277 # token with the closing token to follow, then we will mark both
17278 # cached flags as valid.
17279 if ($rvertical_tightness_flags) {
17280 if ( $maximum_line_index <= 0
17281 && $cached_line_type
17283 && $rvertical_tightness_flags->[2]
17284 && $rvertical_tightness_flags->[2] == $cached_seqno )
17286 $rvertical_tightness_flags->[3] ||= 1;
17287 $cached_line_valid ||= 1;
17291 # do not join an opening block brace with an unbalanced line
17292 # unless requested with a flag value of 2
17293 if ( $cached_line_type == 3
17294 && $maximum_line_index < 0
17295 && $cached_line_flag < 2
17296 && $level_jump != 0 )
17298 $cached_line_valid = 0;
17301 # patch until new aligner is finished
17302 if ($do_not_pad) { my_flush() }
17304 # shouldn't happen:
17305 if ( $level < 0 ) { $level = 0 }
17307 # do not align code across indentation level changes
17308 # or if vertical alignment is turned off for debugging
17309 if ( $level != $group_level || $is_outdented || !$rOpts_valign ) {
17311 # we are allowed to shift a group of lines to the right if its
17312 # level is greater than the previous and next group
17314 ( $level < $group_level && $last_group_level_written < $group_level );
17318 # If we know that this line will get flushed out by itself because
17319 # of level changes, we can leave the extra_indent_ok flag set.
17320 # That way, if we get an external flush call, we will still be
17321 # able to do some -lp alignment if necessary.
17322 $extra_indent_ok = ( $is_terminal_statement && $level > $group_level );
17324 $group_level = $level;
17326 # wait until after the above flush to get the leading space
17327 # count because it may have been changed if the -icp flag is in
17329 $leading_space_count = get_SPACES($indentation);
17333 # --------------------------------------------------------------------
17334 # Patch to collect outdentable block COMMENTS
17335 # --------------------------------------------------------------------
17336 my $is_blank_line = "";
17337 my $is_block_comment = ( $jmax == 0 && $rfields->[0] =~ /^#/ );
17338 if ( $group_type eq 'COMMENT' ) {
17342 && $outdent_long_lines
17343 && $leading_space_count == $comment_leading_space_count
17348 $group_lines[ ++$maximum_line_index ] = $rfields->[0];
17356 # --------------------------------------------------------------------
17357 # add dummy fields for terminal ternary
17358 # --------------------------------------------------------------------
17359 my $j_terminal_match;
17360 if ( $is_terminal_ternary && $current_line ) {
17361 $j_terminal_match =
17362 fix_terminal_ternary( $rfields, $rtokens, $rpatterns );
17363 $jmax = @{$rfields} - 1;
17366 # --------------------------------------------------------------------
17367 # add dummy fields for else statement
17368 # --------------------------------------------------------------------
17369 if ( $rfields->[0] =~ /^else\s*$/
17371 && $level_jump == 0 )
17373 $j_terminal_match = fix_terminal_else( $rfields, $rtokens, $rpatterns );
17374 $jmax = @{$rfields} - 1;
17377 # --------------------------------------------------------------------
17378 # Step 1. Handle simple line of code with no fields to match.
17379 # --------------------------------------------------------------------
17380 if ( $jmax <= 0 ) {
17383 if ( $maximum_line_index >= 0
17384 && !get_RECOVERABLE_SPACES( $group_lines[0]->get_indentation() ) )
17387 # flush the current group if it has some aligned columns..
17388 if ( $group_lines[0]->get_jmax() > 1 ) { my_flush() }
17390 # flush current group if we are just collecting side comments..
17393 # ...and we haven't seen a comment lately
17394 ( $zero_count > 3 )
17396 # ..or if this new line doesn't fit to the left of the comments
17397 || ( ( $leading_space_count + length( $$rfields[0] ) ) >
17398 $group_lines[0]->get_column(0) )
17405 # patch to start new COMMENT group if this comment may be outdented
17406 if ( $is_block_comment
17407 && $outdent_long_lines
17408 && $maximum_line_index < 0 )
17410 $group_type = 'COMMENT';
17411 $comment_leading_space_count = $leading_space_count;
17412 $group_lines[ ++$maximum_line_index ] = $rfields->[0];
17416 # just write this line directly if no current group, no side comment,
17417 # and no space recovery is needed.
17418 if ( $maximum_line_index < 0 && !get_RECOVERABLE_SPACES($indentation) )
17420 write_leader_and_string( $leading_space_count, $$rfields[0], 0,
17421 $outdent_long_lines, $rvertical_tightness_flags );
17429 # programming check: (shouldn't happen)
17430 # an error here implies an incorrect call was made
17431 if ( $jmax > 0 && ( $#{$rtokens} != ( $jmax - 1 ) ) ) {
17433 "Program bug in Perl::Tidy::VerticalAligner - number of tokens = $#{$rtokens} should be one less than number of fields: $#{$rfields})\n"
17435 report_definite_bug();
17438 # --------------------------------------------------------------------
17439 # create an object to hold this line
17440 # --------------------------------------------------------------------
17441 my $new_line = new Perl::Tidy::VerticalAligner::Line(
17443 jmax_original_line => $jmax,
17444 rtokens => $rtokens,
17445 rfields => $rfields,
17446 rpatterns => $rpatterns,
17447 indentation => $indentation,
17448 leading_space_count => $leading_space_count,
17449 outdent_long_lines => $outdent_long_lines,
17451 is_hanging_side_comment => $is_hanging_side_comment,
17452 maximum_line_length => $rOpts->{'maximum-line-length'},
17453 rvertical_tightness_flags => $rvertical_tightness_flags,
17456 # Initialize a global flag saying if the last line of the group should
17457 # match end of group and also terminate the group. There should be no
17458 # returns between here and where the flag is handled at the bottom.
17459 my $col_matching_terminal = 0;
17460 if ( defined($j_terminal_match) ) {
17462 # remember the column of the terminal ? or { to match with
17463 $col_matching_terminal = $current_line->get_column($j_terminal_match);
17465 # set global flag for sub decide_if_aligned
17466 $is_matching_terminal_line = 1;
17469 # --------------------------------------------------------------------
17470 # It simplifies things to create a zero length side comment
17472 # --------------------------------------------------------------------
17473 make_side_comment( $new_line, $level_end );
17475 # --------------------------------------------------------------------
17476 # Decide if this is a simple list of items.
17477 # There are 3 list types: none, comma, comma-arrow.
17478 # We use this below to be less restrictive in deciding what to align.
17479 # --------------------------------------------------------------------
17480 if ($is_forced_break) {
17481 decide_if_list($new_line);
17484 if ($current_line) {
17486 # --------------------------------------------------------------------
17487 # Allow hanging side comment to join current group, if any
17488 # This will help keep side comments aligned, because otherwise we
17489 # will have to start a new group, making alignment less likely.
17490 # --------------------------------------------------------------------
17491 join_hanging_comment( $new_line, $current_line )
17492 if $is_hanging_side_comment;
17494 # --------------------------------------------------------------------
17495 # If there is just one previous line, and it has more fields
17496 # than the new line, try to join fields together to get a match with
17497 # the new line. At the present time, only a single leading '=' is
17498 # allowed to be compressed out. This is useful in rare cases where
17499 # a table is forced to use old breakpoints because of side comments,
17500 # and the table starts out something like this:
17501 # my %MonthChars = ('0', 'Jan', # side comment
17504 # Eliminating the '=' field will allow the remaining fields to line up.
17505 # This situation does not occur if there are no side comments
17506 # because scan_list would put a break after the opening '('.
17507 # --------------------------------------------------------------------
17508 eliminate_old_fields( $new_line, $current_line );
17510 # --------------------------------------------------------------------
17511 # If the new line has more fields than the current group,
17512 # see if we can match the first fields and combine the remaining
17513 # fields of the new line.
17514 # --------------------------------------------------------------------
17515 eliminate_new_fields( $new_line, $current_line );
17517 # --------------------------------------------------------------------
17518 # Flush previous group unless all common tokens and patterns match..
17519 # --------------------------------------------------------------------
17520 check_match( $new_line, $current_line );
17522 # --------------------------------------------------------------------
17523 # See if there is space for this line in the current group (if any)
17524 # --------------------------------------------------------------------
17525 if ($current_line) {
17526 check_fit( $new_line, $current_line );
17530 # --------------------------------------------------------------------
17531 # Append this line to the current group (or start new group)
17532 # --------------------------------------------------------------------
17533 accept_line($new_line);
17535 # Future update to allow this to vary:
17536 $current_line = $new_line if ( $maximum_line_index == 0 );
17538 # output this group if it ends in a terminal else or ternary line
17539 if ( defined($j_terminal_match) ) {
17541 # if there is only one line in the group (maybe due to failure to match
17542 # perfectly with previous lines), then align the ? or { of this
17543 # terminal line with the previous one unless that would make the line
17545 if ( $maximum_line_index == 0 ) {
17546 my $col_now = $current_line->get_column($j_terminal_match);
17547 my $pad = $col_matching_terminal - $col_now;
17548 my $padding_available =
17549 $current_line->get_available_space_on_right();
17550 if ( $pad > 0 && $pad <= $padding_available ) {
17551 $current_line->increase_field_width( $j_terminal_match, $pad );
17555 $is_matching_terminal_line = 0;
17558 # --------------------------------------------------------------------
17559 # Step 8. Some old debugging stuff
17560 # --------------------------------------------------------------------
17561 VALIGN_DEBUG_FLAG_APPEND && do {
17562 print "APPEND fields:";
17563 dump_array(@$rfields);
17564 print "APPEND tokens:";
17565 dump_array(@$rtokens);
17566 print "APPEND patterns:";
17567 dump_array(@$rpatterns);
17574 sub join_hanging_comment {
17577 my $jmax = $line->get_jmax();
17578 return 0 unless $jmax == 1; # must be 2 fields
17579 my $rtokens = $line->get_rtokens();
17580 return 0 unless $$rtokens[0] eq '#'; # the second field is a comment..
17581 my $rfields = $line->get_rfields();
17582 return 0 unless $$rfields[0] =~ /^\s*$/; # the first field is empty...
17583 my $old_line = shift;
17584 my $maximum_field_index = $old_line->get_jmax();
17586 unless $maximum_field_index > $jmax; # the current line has more fields
17587 my $rpatterns = $line->get_rpatterns();
17589 $line->set_is_hanging_side_comment(1);
17590 $jmax = $maximum_field_index;
17591 $line->set_jmax($jmax);
17592 $$rfields[$jmax] = $$rfields[1];
17593 $$rtokens[ $jmax - 1 ] = $$rtokens[0];
17594 $$rpatterns[ $jmax - 1 ] = $$rpatterns[0];
17595 for ( my $j = 1 ; $j < $jmax ; $j++ ) {
17596 $$rfields[$j] = " "; # NOTE: caused glitch unless 1 blank, why?
17597 $$rtokens[ $j - 1 ] = "";
17598 $$rpatterns[ $j - 1 ] = "";
17603 sub eliminate_old_fields {
17605 my $new_line = shift;
17606 my $jmax = $new_line->get_jmax();
17607 if ( $jmax > $maximum_jmax_seen ) { $maximum_jmax_seen = $jmax }
17608 if ( $jmax < $minimum_jmax_seen ) { $minimum_jmax_seen = $jmax }
17610 # there must be one previous line
17611 return unless ( $maximum_line_index == 0 );
17613 my $old_line = shift;
17614 my $maximum_field_index = $old_line->get_jmax();
17616 # this line must have fewer fields
17617 return unless $maximum_field_index > $jmax;
17619 # Identify specific cases where field elimination is allowed:
17620 # case=1: both lines have comma-separated lists, and the first
17621 # line has an equals
17622 # case=2: both lines have leading equals
17624 # case 1 is the default
17627 # See if case 2: both lines have leading '='
17628 # We'll require smiliar leading patterns in this case
17629 my $old_rtokens = $old_line->get_rtokens();
17630 my $rtokens = $new_line->get_rtokens();
17631 my $rpatterns = $new_line->get_rpatterns();
17632 my $old_rpatterns = $old_line->get_rpatterns();
17633 if ( $rtokens->[0] =~ /^=\d*$/
17634 && $old_rtokens->[0] eq $rtokens->[0]
17635 && $old_rpatterns->[0] eq $rpatterns->[0] )
17640 # not too many fewer fields in new line for case 1
17641 return unless ( $case != 1 || $maximum_field_index - 2 <= $jmax );
17643 # case 1 must have side comment
17644 my $old_rfields = $old_line->get_rfields();
17647 && length( $$old_rfields[$maximum_field_index] ) == 0 );
17649 my $rfields = $new_line->get_rfields();
17651 my $hid_equals = 0;
17653 my @new_alignments = ();
17654 my @new_fields = ();
17655 my @new_matching_patterns = ();
17656 my @new_matching_tokens = ();
17660 my $current_field = '';
17661 my $current_pattern = '';
17663 # loop over all old tokens
17665 for ( $k = 0 ; $k < $maximum_field_index ; $k++ ) {
17666 $current_field .= $$old_rfields[$k];
17667 $current_pattern .= $$old_rpatterns[$k];
17668 last if ( $j > $jmax - 1 );
17670 if ( $$old_rtokens[$k] eq $$rtokens[$j] ) {
17672 $new_fields[$j] = $current_field;
17673 $new_matching_patterns[$j] = $current_pattern;
17674 $current_field = '';
17675 $current_pattern = '';
17676 $new_matching_tokens[$j] = $$old_rtokens[$k];
17677 $new_alignments[$j] = $old_line->get_alignment($k);
17682 if ( $$old_rtokens[$k] =~ /^\=\d*$/ ) {
17683 last if ( $case == 2 ); # avoid problems with stuff
17684 # like: $a=$b=$c=$d;
17688 if ( $in_match && $case == 1 )
17689 ; # disallow gaps in matching field types in case 1
17693 # Modify the current state if we are successful.
17694 # We must exactly reach the ends of both lists for success.
17695 if ( ( $j == $jmax )
17696 && ( $current_field eq '' )
17697 && ( $case != 1 || $hid_equals ) )
17699 $k = $maximum_field_index;
17700 $current_field .= $$old_rfields[$k];
17701 $current_pattern .= $$old_rpatterns[$k];
17702 $new_fields[$j] = $current_field;
17703 $new_matching_patterns[$j] = $current_pattern;
17705 $new_alignments[$j] = $old_line->get_alignment($k);
17706 $maximum_field_index = $j;
17708 $old_line->set_alignments(@new_alignments);
17709 $old_line->set_jmax($jmax);
17710 $old_line->set_rtokens( \@new_matching_tokens );
17711 $old_line->set_rfields( \@new_fields );
17712 $old_line->set_rpatterns( \@$rpatterns );
17716 # create an empty side comment if none exists
17717 sub make_side_comment {
17718 my $new_line = shift;
17719 my $level_end = shift;
17720 my $jmax = $new_line->get_jmax();
17721 my $rtokens = $new_line->get_rtokens();
17723 # if line does not have a side comment...
17724 if ( ( $jmax == 0 ) || ( $$rtokens[ $jmax - 1 ] ne '#' ) ) {
17725 my $rfields = $new_line->get_rfields();
17726 my $rpatterns = $new_line->get_rpatterns();
17727 $$rtokens[$jmax] = '#';
17728 $$rfields[ ++$jmax ] = '';
17729 $$rpatterns[$jmax] = '#';
17730 $new_line->set_jmax($jmax);
17731 $new_line->set_jmax_original_line($jmax);
17734 # line has a side comment..
17737 # don't remember old side comment location for very long
17738 my $line_number = $vertical_aligner_self->get_output_line_number();
17739 my $rfields = $new_line->get_rfields();
17741 $line_number - $last_side_comment_line_number > 12
17743 # and don't remember comment location across block level changes
17744 || ( $level_end < $last_side_comment_level && $$rfields[0] =~ /^}/ )
17747 forget_side_comment();
17749 $last_side_comment_line_number = $line_number;
17750 $last_side_comment_level = $level_end;
17754 sub decide_if_list {
17758 # A list will be taken to be a line with a forced break in which all
17759 # of the field separators are commas or comma-arrows (except for the
17762 # List separator tokens are things like ',3' or '=>2',
17763 # where the trailing digit is the nesting depth. Allow braces
17764 # to allow nested list items.
17765 my $rtokens = $line->get_rtokens();
17766 my $test_token = $$rtokens[0];
17767 if ( $test_token =~ /^(\,|=>)/ ) {
17768 my $list_type = $test_token;
17769 my $jmax = $line->get_jmax();
17771 foreach ( 1 .. $jmax - 2 ) {
17772 if ( $$rtokens[$_] !~ /^(\,|=>|\{)/ ) {
17777 $line->set_list_type($list_type);
17781 sub eliminate_new_fields {
17783 return unless ( $maximum_line_index >= 0 );
17784 my ( $new_line, $old_line ) = @_;
17785 my $jmax = $new_line->get_jmax();
17787 my $old_rtokens = $old_line->get_rtokens();
17788 my $rtokens = $new_line->get_rtokens();
17789 my $is_assignment =
17790 ( $rtokens->[0] =~ /^=\d*$/ && ( $old_rtokens->[0] eq $rtokens->[0] ) );
17792 # must be monotonic variation
17793 return unless ( $is_assignment || $previous_maximum_jmax_seen <= $jmax );
17795 # must be more fields in the new line
17796 my $maximum_field_index = $old_line->get_jmax();
17797 return unless ( $maximum_field_index < $jmax );
17799 unless ($is_assignment) {
17801 unless ( $old_line->get_jmax_original_line() == $minimum_jmax_seen )
17802 ; # only if monotonic
17804 # never combine fields of a comma list
17806 unless ( $maximum_field_index > 1 )
17807 && ( $new_line->get_list_type() !~ /^,/ );
17810 my $rfields = $new_line->get_rfields();
17811 my $rpatterns = $new_line->get_rpatterns();
17812 my $old_rpatterns = $old_line->get_rpatterns();
17814 # loop over all OLD tokens except comment and check match
17817 for ( $k = 0 ; $k < $maximum_field_index - 1 ; $k++ ) {
17818 if ( ( $$old_rtokens[$k] ne $$rtokens[$k] )
17819 || ( $$old_rpatterns[$k] ne $$rpatterns[$k] ) )
17826 # first tokens agree, so combine extra new tokens
17828 for $k ( $maximum_field_index .. $jmax - 1 ) {
17830 $$rfields[ $maximum_field_index - 1 ] .= $$rfields[$k];
17831 $$rfields[$k] = "";
17832 $$rpatterns[ $maximum_field_index - 1 ] .= $$rpatterns[$k];
17833 $$rpatterns[$k] = "";
17836 $$rtokens[ $maximum_field_index - 1 ] = '#';
17837 $$rfields[$maximum_field_index] = $$rfields[$jmax];
17838 $$rpatterns[$maximum_field_index] = $$rpatterns[$jmax];
17839 $jmax = $maximum_field_index;
17841 $new_line->set_jmax($jmax);
17844 sub fix_terminal_ternary {
17846 # Add empty fields as necessary to align a ternary term
17851 # : $year % 100 ? 1
17852 # : $year % 400 ? 0
17855 # returns 1 if the terminal item should be indented
17857 my ( $rfields, $rtokens, $rpatterns ) = @_;
17859 my $jmax = @{$rfields} - 1;
17860 my $old_line = $group_lines[$maximum_line_index];
17861 my $rfields_old = $old_line->get_rfields();
17863 my $rpatterns_old = $old_line->get_rpatterns();
17864 my $rtokens_old = $old_line->get_rtokens();
17865 my $maximum_field_index = $old_line->get_jmax();
17867 # look for the question mark after the :
17869 my $depth_question;
17871 for ( my $j = 0 ; $j < $maximum_field_index ; $j++ ) {
17872 my $tok = $rtokens_old->[$j];
17873 if ( $tok =~ /^\?(\d+)$/ ) {
17874 $depth_question = $1;
17876 # depth must be correct
17877 next unless ( $depth_question eq $group_level );
17880 if ( $rfields_old->[ $j + 1 ] =~ /^(\?\s*)/ ) {
17881 $pad = " " x length($1);
17884 return; # shouldn't happen
17889 return unless ( defined($jquestion) ); # shouldn't happen
17891 # Now splice the tokens and patterns of the previous line
17892 # into the else line to insure a match. Add empty fields
17894 my $jadd = $jquestion;
17896 # Work on copies of the actual arrays in case we have
17897 # to return due to an error
17898 my @fields = @{$rfields};
17899 my @patterns = @{$rpatterns};
17900 my @tokens = @{$rtokens};
17902 VALIGN_DEBUG_FLAG_TERNARY && do {
17904 print "CURRENT FIELDS=<@{$rfields_old}>\n";
17905 print "CURRENT TOKENS=<@{$rtokens_old}>\n";
17906 print "CURRENT PATTERNS=<@{$rpatterns_old}>\n";
17907 print "UNMODIFIED FIELDS=<@{$rfields}>\n";
17908 print "UNMODIFIED TOKENS=<@{$rtokens}>\n";
17909 print "UNMODIFIED PATTERNS=<@{$rpatterns}>\n";
17912 # handle cases of leading colon on this line
17913 if ( $fields[0] =~ /^(:\s*)(.*)$/ ) {
17915 my ( $colon, $therest ) = ( $1, $2 );
17917 # Handle sub-case of first field with leading colon plus additional code
17918 # This is the usual situation as at the '1' below:
17920 # : $year % 400 ? 0
17924 # Split the first field after the leading colon and insert padding.
17925 # Note that this padding will remain even if the terminal value goes
17926 # out on a separate line. This does not seem to look to bad, so no
17927 # mechanism has been included to undo it.
17928 my $field1 = shift @fields;
17929 unshift @fields, ( $colon, $pad . $therest );
17931 # change the leading pattern from : to ?
17932 return unless ( $patterns[0] =~ s/^\:/?/ );
17934 # install leading tokens and patterns of existing line
17935 unshift( @tokens, @{$rtokens_old}[ 0 .. $jquestion ] );
17936 unshift( @patterns, @{$rpatterns_old}[ 0 .. $jquestion ] );
17938 # insert appropriate number of empty fields
17939 splice( @fields, 1, 0, ('') x $jadd ) if $jadd;
17942 # handle sub-case of first field just equal to leading colon.
17943 # This can happen for example in the example below where
17944 # the leading '(' would create a new alignment token
17945 # : ( $name =~ /[]}]$/ ) ? ( $mname = $name )
17946 # : ( $mname = $name . '->' );
17949 return unless ( $jmax > 0 && $tokens[0] ne '#' ); # shouldn't happen
17951 # prepend a leading ? onto the second pattern
17952 $patterns[1] = "?b" . $patterns[1];
17954 # pad the second field
17955 $fields[1] = $pad . $fields[1];
17957 # install leading tokens and patterns of existing line, replacing
17958 # leading token and inserting appropriate number of empty fields
17959 splice( @tokens, 0, 1, @{$rtokens_old}[ 0 .. $jquestion ] );
17960 splice( @patterns, 1, 0, @{$rpatterns_old}[ 1 .. $jquestion ] );
17961 splice( @fields, 1, 0, ('') x $jadd ) if $jadd;
17965 # Handle case of no leading colon on this line. This will
17966 # be the case when -wba=':' is used. For example,
17967 # $year % 400 ? 0 :
17971 # install leading tokens and patterns of existing line
17972 $patterns[0] = '?' . 'b' . $patterns[0];
17973 unshift( @tokens, @{$rtokens_old}[ 0 .. $jquestion ] );
17974 unshift( @patterns, @{$rpatterns_old}[ 0 .. $jquestion ] );
17976 # insert appropriate number of empty fields
17977 $jadd = $jquestion + 1;
17978 $fields[0] = $pad . $fields[0];
17979 splice( @fields, 0, 0, ('') x $jadd ) if $jadd;
17982 VALIGN_DEBUG_FLAG_TERNARY && do {
17984 print "MODIFIED TOKENS=<@tokens>\n";
17985 print "MODIFIED PATTERNS=<@patterns>\n";
17986 print "MODIFIED FIELDS=<@fields>\n";
17989 # all ok .. update the arrays
17990 @{$rfields} = @fields;
17991 @{$rtokens} = @tokens;
17992 @{$rpatterns} = @patterns;
17994 # force a flush after this line
17998 sub fix_terminal_else {
18000 # Add empty fields as necessary to align a balanced terminal
18001 # else block to a previous if/elsif/unless block,
18004 # if ( 1 || $x ) { print "ok 13\n"; }
18005 # else { print "not ok 13\n"; }
18007 # returns 1 if the else block should be indented
18009 my ( $rfields, $rtokens, $rpatterns ) = @_;
18010 my $jmax = @{$rfields} - 1;
18011 return unless ( $jmax > 0 );
18013 # check for balanced else block following if/elsif/unless
18014 my $rfields_old = $current_line->get_rfields();
18016 # TBD: add handling for 'case'
18017 return unless ( $rfields_old->[0] =~ /^(if|elsif|unless)\s*$/ );
18019 # look for the opening brace after the else, and extrace the depth
18020 my $tok_brace = $rtokens->[0];
18022 if ( $tok_brace =~ /^\{(\d+)$/ ) { $depth_brace = $1; }
18024 # probably: "else # side_comment"
18027 my $rpatterns_old = $current_line->get_rpatterns();
18028 my $rtokens_old = $current_line->get_rtokens();
18029 my $maximum_field_index = $current_line->get_jmax();
18031 # be sure the previous if/elsif is followed by an opening paren
18033 my $tok_paren = '(' . $depth_brace;
18034 my $tok_test = $rtokens_old->[$jparen];
18035 return unless ( $tok_test eq $tok_paren ); # shouldn't happen
18037 # Now find the opening block brace
18039 for ( my $j = 1 ; $j < $maximum_field_index ; $j++ ) {
18040 my $tok = $rtokens_old->[$j];
18041 if ( $tok eq $tok_brace ) {
18046 return unless ( defined($jbrace) ); # shouldn't happen
18048 # Now splice the tokens and patterns of the previous line
18049 # into the else line to insure a match. Add empty fields
18051 my $jadd = $jbrace - $jparen;
18052 splice( @{$rtokens}, 0, 0, @{$rtokens_old}[ $jparen .. $jbrace - 1 ] );
18053 splice( @{$rpatterns}, 1, 0, @{$rpatterns_old}[ $jparen + 1 .. $jbrace ] );
18054 splice( @{$rfields}, 1, 0, ('') x $jadd );
18056 # force a flush after this line if it does not follow a case
18058 unless ( $rfields_old->[0] =~ /^case\s*$/ );
18063 my $new_line = shift;
18064 my $old_line = shift;
18066 # uses global variables:
18067 # $previous_minimum_jmax_seen
18068 # $maximum_jmax_seen
18069 # $maximum_line_index
18071 my $jmax = $new_line->get_jmax();
18072 my $maximum_field_index = $old_line->get_jmax();
18074 # flush if this line has too many fields
18075 if ( $jmax > $maximum_field_index ) { my_flush(); return }
18077 # flush if adding this line would make a non-monotonic field count
18079 ( $maximum_field_index > $jmax ) # this has too few fields
18081 ( $previous_minimum_jmax_seen < $jmax ) # and wouldn't be monotonic
18082 || ( $old_line->get_jmax_original_line() != $maximum_jmax_seen )
18090 # otherwise append this line if everything matches
18091 my $jmax_original_line = $new_line->get_jmax_original_line();
18092 my $is_hanging_side_comment = $new_line->get_is_hanging_side_comment();
18093 my $rtokens = $new_line->get_rtokens();
18094 my $rfields = $new_line->get_rfields();
18095 my $rpatterns = $new_line->get_rpatterns();
18096 my $list_type = $new_line->get_list_type();
18098 my $group_list_type = $old_line->get_list_type();
18099 my $old_rpatterns = $old_line->get_rpatterns();
18100 my $old_rtokens = $old_line->get_rtokens();
18102 my $jlimit = $jmax - 1;
18103 if ( $maximum_field_index > $jmax ) {
18104 $jlimit = $jmax_original_line;
18105 --$jlimit unless ( length( $new_line->get_rfields()->[$jmax] ) );
18108 my $everything_matches = 1;
18110 # common list types always match
18111 unless ( ( $group_list_type && ( $list_type eq $group_list_type ) )
18112 || $is_hanging_side_comment )
18115 my $leading_space_count = $new_line->get_leading_space_count();
18116 my $saw_equals = 0;
18117 for my $j ( 0 .. $jlimit ) {
18120 my $old_tok = $$old_rtokens[$j];
18121 my $new_tok = $$rtokens[$j];
18123 # Dumb down the match AFTER an equals and
18124 # also dumb down after seeing a ? ternary operator ...
18125 # Everything after a + is the token which preceded the previous
18126 # opening paren (container name). We won't require them to match.
18127 if ( $saw_equals && $new_tok =~ /(.*)\+/ ) {
18129 $old_tok =~ s/\+.*$//;
18132 if ( $new_tok =~ /^[\?=]\d*$/ ) { $saw_equals = 1 }
18134 # we never match if the matching tokens differ
18136 && $old_tok ne $new_tok )
18141 # otherwise, if patterns match, we always have a match.
18142 # However, if patterns don't match, we have to be careful...
18143 elsif ( $$old_rpatterns[$j] ne $$rpatterns[$j] ) {
18145 # We have to be very careful about aligning commas when the
18146 # pattern's don't match, because it can be worse to create an
18147 # alignment where none is needed than to omit one. The current
18148 # rule: if we are within a matching sub call (indicated by '+'
18149 # in the matching token), we'll allow a marginal match, but
18152 # Here's an example where we'd like to align the '='
18153 # my $cfile = File::Spec->catfile( 't', 'callext.c' );
18154 # my $inc = File::Spec->catdir( 'Basic', 'Core' );
18155 # because the function names differ.
18156 # Future alignment logic should make this unnecessary.
18158 # Here's an example where the ','s are not contained in a call.
18159 # The first line below should probably not match the next two:
18160 # ( $a, $b ) = ( $b, $r );
18161 # ( $x1, $x2 ) = ( $x2 - $q * $x1, $x1 );
18162 # ( $y1, $y2 ) = ( $y2 - $q * $y1, $y1 );
18163 if ( $new_tok =~ /^,/ ) {
18164 if ( $$rtokens[$j] =~ /[A-Za-z]/ ) {
18165 $marginal_match = 1;
18172 # parens don't align well unless patterns match
18173 elsif ( $new_tok =~ /^\(/ ) {
18177 # Handle an '=' alignment with different patterns to
18179 elsif ( $new_tok =~ /^=\d*$/ ) {
18183 # It is best to be a little restrictive when
18184 # aligning '=' tokens. Here is an example of
18185 # two lines that we will not align:
18188 # The problem is that one is a 'my' declaration,
18189 # and the other isn't, so they're not very similar.
18190 # We will filter these out by comparing the first
18191 # letter of the pattern. This is crude, but works
18194 substr( $$old_rpatterns[$j], 0, 1 ) ne
18195 substr( $$rpatterns[$j], 0, 1 ) )
18200 # If we pass that test, we'll call it a marginal match.
18201 # Here is an example of a marginal match:
18203 # $op = compile_bblock($op);
18204 # The left tokens are both identifiers, but
18205 # one accesses a hash and the other doesn't.
18206 # We'll let this be a tentative match and undo
18207 # it later if we don't find more than 2 lines
18209 elsif ( $maximum_line_index == 0 ) {
18210 $marginal_match = 1;
18215 # Don't let line with fewer fields increase column widths
18217 if ( $maximum_field_index > $jmax ) {
18219 length( $$rfields[$j] ) - $old_line->current_field_width($j);
18222 $pad += $leading_space_count;
18225 # TESTING: suspend this rule to allow last lines to join
18226 if ( $pad > 0 ) { $match = 0; }
18230 $everything_matches = 0;
18236 if ( $maximum_field_index > $jmax ) {
18238 if ($everything_matches) {
18240 my $comment = $$rfields[$jmax];
18241 for $jmax ( $jlimit .. $maximum_field_index ) {
18242 $$rtokens[$jmax] = $$old_rtokens[$jmax];
18243 $$rfields[ ++$jmax ] = '';
18244 $$rpatterns[$jmax] = $$old_rpatterns[$jmax];
18246 $$rfields[$jmax] = $comment;
18247 $new_line->set_jmax($jmax);
18251 my_flush() unless ($everything_matches);
18256 return unless ( $maximum_line_index >= 0 );
18257 my $new_line = shift;
18258 my $old_line = shift;
18260 my $jmax = $new_line->get_jmax();
18261 my $leading_space_count = $new_line->get_leading_space_count();
18262 my $is_hanging_side_comment = $new_line->get_is_hanging_side_comment();
18263 my $rtokens = $new_line->get_rtokens();
18264 my $rfields = $new_line->get_rfields();
18265 my $rpatterns = $new_line->get_rpatterns();
18267 my $group_list_type = $group_lines[0]->get_list_type();
18269 my $padding_so_far = 0;
18270 my $padding_available = $old_line->get_available_space_on_right();
18272 # save current columns in case this doesn't work
18273 save_alignment_columns();
18275 my ( $j, $pad, $eight );
18276 my $maximum_field_index = $old_line->get_jmax();
18277 for $j ( 0 .. $jmax ) {
18279 $pad = length( $$rfields[$j] ) - $old_line->current_field_width($j);
18282 $pad += $leading_space_count;
18285 # remember largest gap of the group, excluding gap to side comment
18287 && $group_maximum_gap < -$pad
18289 && $j < $jmax - 1 )
18291 $group_maximum_gap = -$pad;
18296 ## This patch helps sometimes, but it doesn't check to see if
18297 ## the line is too long even without the side comment. It needs
18299 ##don't let a long token with no trailing side comment push
18300 ##side comments out, or end a group. (sidecmt1.t)
18301 ##next if ($j==$jmax-1 && length($$rfields[$jmax])==0);
18303 # This line will need space; lets see if we want to accept it..
18306 # not if this won't fit
18307 ( $pad > $padding_available )
18309 # previously, there were upper bounds placed on padding here
18310 # (maximum_whitespace_columns), but they were not really helpful
18315 # revert to starting state then flush; things didn't work out
18316 restore_alignment_columns();
18321 # patch to avoid excessive gaps in previous lines,
18322 # due to a line of fewer fields.
18323 # return join( ".",
18324 # $self->{"dfi"}, $self->{"aa"}, $self->rsvd, $self->{"rd"},
18325 # $self->{"area"}, $self->{"id"}, $self->{"sel"} );
18326 next if ( $jmax < $maximum_field_index && $j == $jmax - 1 );
18328 # looks ok, squeeze this field in
18329 $old_line->increase_field_width( $j, $pad );
18330 $padding_available -= $pad;
18332 # remember largest gap of the group, excluding gap to side comment
18333 if ( $pad > $group_maximum_gap && $j > 0 && $j < $jmax - 1 ) {
18334 $group_maximum_gap = $pad;
18341 # The current line either starts a new alignment group or is
18342 # accepted into the current alignment group.
18343 my $new_line = shift;
18344 $group_lines[ ++$maximum_line_index ] = $new_line;
18346 # initialize field lengths if starting new group
18347 if ( $maximum_line_index == 0 ) {
18349 my $jmax = $new_line->get_jmax();
18350 my $rfields = $new_line->get_rfields();
18351 my $rtokens = $new_line->get_rtokens();
18353 my $col = $new_line->get_leading_space_count();
18355 for $j ( 0 .. $jmax ) {
18356 $col += length( $$rfields[$j] );
18358 # create initial alignments for the new group
18360 if ( $j < $jmax ) { $token = $$rtokens[$j] }
18361 my $alignment = make_alignment( $col, $token );
18362 $new_line->set_alignment( $j, $alignment );
18365 $maximum_jmax_seen = $jmax;
18366 $minimum_jmax_seen = $jmax;
18369 # use previous alignments otherwise
18371 my @new_alignments =
18372 $group_lines[ $maximum_line_index - 1 ]->get_alignments();
18373 $new_line->set_alignments(@new_alignments);
18376 # remember group jmax extremes for next call to append_line
18377 $previous_minimum_jmax_seen = $minimum_jmax_seen;
18378 $previous_maximum_jmax_seen = $maximum_jmax_seen;
18383 # debug routine to dump array contents
18388 # flush() sends the current Perl::Tidy::VerticalAligner group down the
18389 # pipeline to Perl::Tidy::FileWriter.
18391 # This is the external flush, which also empties the cache
18394 if ( $maximum_line_index < 0 ) {
18395 if ($cached_line_type) {
18396 $seqno_string = $cached_seqno_string;
18397 entab_and_output( $cached_line_text,
18398 $cached_line_leading_space_count,
18399 $last_group_level_written );
18400 $cached_line_type = 0;
18401 $cached_line_text = "";
18402 $cached_seqno_string = "";
18410 # This is the internal flush, which leaves the cache intact
18413 return if ( $maximum_line_index < 0 );
18415 # handle a group of comment lines
18416 if ( $group_type eq 'COMMENT' ) {
18418 VALIGN_DEBUG_FLAG_APPEND0 && do {
18419 my ( $a, $b, $c ) = caller();
18421 "APPEND0: Flush called from $a $b $c for COMMENT group: lines=$maximum_line_index \n";
18424 my $leading_space_count = $comment_leading_space_count;
18425 my $leading_string = get_leading_string($leading_space_count);
18427 # zero leading space count if any lines are too long
18428 my $max_excess = 0;
18429 for my $i ( 0 .. $maximum_line_index ) {
18430 my $str = $group_lines[$i];
18432 length($str) + $leading_space_count - $rOpts_maximum_line_length;
18433 if ( $excess > $max_excess ) {
18434 $max_excess = $excess;
18438 if ( $max_excess > 0 ) {
18439 $leading_space_count -= $max_excess;
18440 if ( $leading_space_count < 0 ) { $leading_space_count = 0 }
18441 $last_outdented_line_at =
18442 $file_writer_object->get_output_line_number();
18443 unless ($outdented_line_count) {
18444 $first_outdented_line_at = $last_outdented_line_at;
18446 $outdented_line_count += ( $maximum_line_index + 1 );
18449 # write the group of lines
18450 my $outdent_long_lines = 0;
18451 for my $i ( 0 .. $maximum_line_index ) {
18452 write_leader_and_string( $leading_space_count, $group_lines[$i], 0,
18453 $outdent_long_lines, "" );
18457 # handle a group of code lines
18460 VALIGN_DEBUG_FLAG_APPEND0 && do {
18461 my $group_list_type = $group_lines[0]->get_list_type();
18462 my ( $a, $b, $c ) = caller();
18463 my $maximum_field_index = $group_lines[0]->get_jmax();
18465 "APPEND0: Flush called from $a $b $c fields=$maximum_field_index list=$group_list_type lines=$maximum_line_index extra=$extra_indent_ok\n";
18469 # some small groups are best left unaligned
18470 my $do_not_align = decide_if_aligned();
18472 # optimize side comment location
18473 $do_not_align = adjust_side_comment($do_not_align);
18475 # recover spaces for -lp option if possible
18476 my $extra_leading_spaces = get_extra_leading_spaces();
18478 # all lines of this group have the same basic leading spacing
18479 my $group_leader_length = $group_lines[0]->get_leading_space_count();
18481 # add extra leading spaces if helpful
18482 my $min_ci_gap = improve_continuation_indentation( $do_not_align,
18483 $group_leader_length );
18485 # loop to output all lines
18486 for my $i ( 0 .. $maximum_line_index ) {
18487 my $line = $group_lines[$i];
18488 write_vertically_aligned_line( $line, $min_ci_gap, $do_not_align,
18489 $group_leader_length, $extra_leading_spaces );
18492 initialize_for_new_group();
18495 sub decide_if_aligned {
18497 # Do not try to align two lines which are not really similar
18498 return unless $maximum_line_index == 1;
18499 return if ($is_matching_terminal_line);
18501 my $group_list_type = $group_lines[0]->get_list_type();
18503 my $do_not_align = (
18505 # always align lists
18510 # don't align if it was just a marginal match
18513 # don't align two lines with big gap
18514 || $group_maximum_gap > 12
18516 # or lines with differing number of alignment tokens
18517 # TODO: this could be improved. It occasionally rejects
18519 || $previous_maximum_jmax_seen != $previous_minimum_jmax_seen
18523 # But try to convert them into a simple comment group if the first line
18524 # a has side comment
18525 my $rfields = $group_lines[0]->get_rfields();
18526 my $maximum_field_index = $group_lines[0]->get_jmax();
18528 && ( $maximum_line_index > 0 )
18529 && ( length( $$rfields[$maximum_field_index] ) > 0 ) )
18534 return $do_not_align;
18537 sub adjust_side_comment {
18539 my $do_not_align = shift;
18541 # let's see if we can move the side comment field out a little
18542 # to improve readability (the last field is always a side comment field)
18543 my $have_side_comment = 0;
18544 my $first_side_comment_line = -1;
18545 my $maximum_field_index = $group_lines[0]->get_jmax();
18546 for my $i ( 0 .. $maximum_line_index ) {
18547 my $line = $group_lines[$i];
18549 if ( length( $line->get_rfields()->[$maximum_field_index] ) ) {
18550 $have_side_comment = 1;
18551 $first_side_comment_line = $i;
18556 my $kmax = $maximum_field_index + 1;
18558 if ($have_side_comment) {
18560 my $line = $group_lines[0];
18562 # the maximum space without exceeding the line length:
18563 my $avail = $line->get_available_space_on_right();
18565 # try to use the previous comment column
18566 my $side_comment_column = $line->get_column( $kmax - 2 );
18567 my $move = $last_comment_column - $side_comment_column;
18569 ## my $sc_line0 = $side_comment_history[0]->[0];
18570 ## my $sc_col0 = $side_comment_history[0]->[1];
18571 ## my $sc_line1 = $side_comment_history[1]->[0];
18572 ## my $sc_col1 = $side_comment_history[1]->[1];
18573 ## my $sc_line2 = $side_comment_history[2]->[0];
18574 ## my $sc_col2 = $side_comment_history[2]->[1];
18576 ## # FUTURE UPDATES:
18577 ## # Be sure to ignore 'do not align' and '} # end comments'
18578 ## # Find first $move > 0 and $move <= $avail as follows:
18579 ## # 1. try sc_col1 if sc_col1 == sc_col0 && (line-sc_line0) < 12
18580 ## # 2. try sc_col2 if (line-sc_line2) < 12
18581 ## # 3. try min possible space, plus up to 8,
18582 ## # 4. try min possible space
18584 if ( $kmax > 0 && !$do_not_align ) {
18586 # but if this doesn't work, give up and use the minimum space
18587 if ( $move > $avail ) {
18588 $move = $rOpts_minimum_space_to_comment - 1;
18591 # but we want some minimum space to the comment
18592 my $min_move = $rOpts_minimum_space_to_comment - 1;
18594 && $last_side_comment_length > 0
18595 && ( $first_side_comment_line == 0 )
18596 && $group_level == $last_group_level_written )
18601 if ( $move < $min_move ) {
18605 # prevously, an upper bound was placed on $move here,
18606 # (maximum_space_to_comment), but it was not helpful
18608 # don't exceed the available space
18609 if ( $move > $avail ) { $move = $avail }
18611 # we can only increase space, never decrease
18613 $line->increase_field_width( $maximum_field_index - 1, $move );
18616 # remember this column for the next group
18617 $last_comment_column = $line->get_column( $kmax - 2 );
18621 # try to at least line up the existing side comment location
18622 if ( $kmax > 0 && $move > 0 && $move < $avail ) {
18623 $line->increase_field_width( $maximum_field_index - 1, $move );
18627 # reset side comment column if we can't align
18629 forget_side_comment();
18633 return $do_not_align;
18636 sub improve_continuation_indentation {
18637 my ( $do_not_align, $group_leader_length ) = @_;
18639 # See if we can increase the continuation indentation
18640 # to move all continuation lines closer to the next field
18641 # (unless it is a comment).
18643 # '$min_ci_gap'is the extra indentation that we may need to introduce.
18644 # We will only introduce this to fields which already have some ci.
18645 # Without this variable, we would occasionally get something like this
18648 # use overload '+' => \&plus,
18650 # '*' => \&multiply,
18653 # 'atan2' => \&atan2,
18655 # Whereas with this variable, we can shift variables over to get this:
18657 # use overload '+' => \&plus,
18659 # '*' => \&multiply,
18662 # 'atan2' => \&atan2,
18664 ## BUB: Deactivated####################
18665 # The trouble with this patch is that it may, for example,
18666 # move in some 'or's or ':'s, and leave some out, so that the
18667 # left edge alignment suffers.
18669 ###########################################
18671 my $maximum_field_index = $group_lines[0]->get_jmax();
18673 my $min_ci_gap = $rOpts_maximum_line_length;
18674 if ( $maximum_field_index > 1 && !$do_not_align ) {
18676 for my $i ( 0 .. $maximum_line_index ) {
18677 my $line = $group_lines[$i];
18678 my $leading_space_count = $line->get_leading_space_count();
18679 my $rfields = $line->get_rfields();
18682 $line->get_column(0) -
18683 $leading_space_count -
18684 length( $$rfields[0] );
18686 if ( $leading_space_count > $group_leader_length ) {
18687 if ( $gap < $min_ci_gap ) { $min_ci_gap = $gap }
18691 if ( $min_ci_gap >= $rOpts_maximum_line_length ) {
18698 return $min_ci_gap;
18701 sub write_vertically_aligned_line {
18703 my ( $line, $min_ci_gap, $do_not_align, $group_leader_length,
18704 $extra_leading_spaces )
18706 my $rfields = $line->get_rfields();
18707 my $leading_space_count = $line->get_leading_space_count();
18708 my $outdent_long_lines = $line->get_outdent_long_lines();
18709 my $maximum_field_index = $line->get_jmax();
18710 my $rvertical_tightness_flags = $line->get_rvertical_tightness_flags();
18712 # add any extra spaces
18713 if ( $leading_space_count > $group_leader_length ) {
18714 $leading_space_count += $min_ci_gap;
18717 my $str = $$rfields[0];
18719 # loop to concatenate all fields of this line and needed padding
18720 my $total_pad_count = 0;
18722 for $j ( 1 .. $maximum_field_index ) {
18724 # skip zero-length side comments
18726 if ( ( $j == $maximum_field_index )
18727 && ( !defined( $$rfields[$j] ) || ( length( $$rfields[$j] ) == 0 ) )
18730 # compute spaces of padding before this field
18731 my $col = $line->get_column( $j - 1 );
18732 $pad = $col - ( length($str) + $leading_space_count );
18734 if ($do_not_align) {
18736 ( $j < $maximum_field_index )
18738 : $rOpts_minimum_space_to_comment - 1;
18741 # accumulate the padding
18742 if ( $pad > 0 ) { $total_pad_count += $pad; }
18745 if ( !defined $$rfields[$j] ) {
18746 write_diagnostics("UNDEFined field at j=$j\n");
18749 # only add padding when we have a finite field;
18750 # this avoids extra terminal spaces if we have empty fields
18751 if ( length( $$rfields[$j] ) > 0 ) {
18752 $str .= ' ' x $total_pad_count;
18753 $total_pad_count = 0;
18754 $str .= $$rfields[$j];
18757 $total_pad_count = 0;
18760 # update side comment history buffer
18761 if ( $j == $maximum_field_index ) {
18762 my $lineno = $file_writer_object->get_output_line_number();
18763 shift @side_comment_history;
18764 push @side_comment_history, [ $lineno, $col ];
18768 my $side_comment_length = ( length( $$rfields[$maximum_field_index] ) );
18770 # ship this line off
18771 write_leader_and_string( $leading_space_count + $extra_leading_spaces,
18772 $str, $side_comment_length, $outdent_long_lines,
18773 $rvertical_tightness_flags );
18776 sub get_extra_leading_spaces {
18778 #----------------------------------------------------------
18779 # Define any extra indentation space (for the -lp option).
18781 # If a list has side comments, sub scan_list must dump the
18782 # list before it sees everything. When this happens, it sets
18783 # the indentation to the standard scheme, but notes how
18784 # many spaces it would have liked to use. We may be able
18785 # to recover that space here in the event that that all of the
18786 # lines of a list are back together again.
18787 #----------------------------------------------------------
18789 my $extra_leading_spaces = 0;
18790 if ($extra_indent_ok) {
18791 my $object = $group_lines[0]->get_indentation();
18792 if ( ref($object) ) {
18793 my $extra_indentation_spaces_wanted =
18794 get_RECOVERABLE_SPACES($object);
18796 # all indentation objects must be the same
18798 for $i ( 1 .. $maximum_line_index ) {
18799 if ( $object != $group_lines[$i]->get_indentation() ) {
18800 $extra_indentation_spaces_wanted = 0;
18805 if ($extra_indentation_spaces_wanted) {
18807 # the maximum space without exceeding the line length:
18808 my $avail = $group_lines[0]->get_available_space_on_right();
18809 $extra_leading_spaces =
18810 ( $avail > $extra_indentation_spaces_wanted )
18811 ? $extra_indentation_spaces_wanted
18814 # update the indentation object because with -icp the terminal
18815 # ');' will use the same adjustment.
18816 $object->permanently_decrease_AVAILABLE_SPACES(
18817 -$extra_leading_spaces );
18821 return $extra_leading_spaces;
18824 sub combine_fields {
18826 # combine all fields except for the comment field ( sidecmt.t )
18827 # Uses global variables:
18829 # $maximum_line_index
18831 my $maximum_field_index = $group_lines[0]->get_jmax();
18832 for ( $j = 0 ; $j <= $maximum_line_index ; $j++ ) {
18833 my $line = $group_lines[$j];
18834 my $rfields = $line->get_rfields();
18835 foreach ( 1 .. $maximum_field_index - 1 ) {
18836 $$rfields[0] .= $$rfields[$_];
18838 $$rfields[1] = $$rfields[$maximum_field_index];
18840 $line->set_jmax(1);
18841 $line->set_column( 0, 0 );
18842 $line->set_column( 1, 0 );
18845 $maximum_field_index = 1;
18847 for $j ( 0 .. $maximum_line_index ) {
18848 my $line = $group_lines[$j];
18849 my $rfields = $line->get_rfields();
18850 for $k ( 0 .. $maximum_field_index ) {
18851 my $pad = length( $$rfields[$k] ) - $line->current_field_width($k);
18853 $pad += $group_lines[$j]->get_leading_space_count();
18856 if ( $pad > 0 ) { $line->increase_field_width( $k, $pad ) }
18862 sub get_output_line_number {
18864 # the output line number reported to a caller is the number of items
18865 # written plus the number of items in the buffer
18867 1 + $maximum_line_index + $file_writer_object->get_output_line_number();
18870 sub write_leader_and_string {
18872 my ( $leading_space_count, $str, $side_comment_length, $outdent_long_lines,
18873 $rvertical_tightness_flags )
18876 # handle outdenting of long lines:
18877 if ($outdent_long_lines) {
18880 $side_comment_length +
18881 $leading_space_count -
18882 $rOpts_maximum_line_length;
18883 if ( $excess > 0 ) {
18884 $leading_space_count = 0;
18885 $last_outdented_line_at =
18886 $file_writer_object->get_output_line_number();
18888 unless ($outdented_line_count) {
18889 $first_outdented_line_at = $last_outdented_line_at;
18891 $outdented_line_count++;
18895 # Make preliminary leading whitespace. It could get changed
18896 # later by entabbing, so we have to keep track of any changes
18897 # to the leading_space_count from here on.
18898 my $leading_string =
18899 $leading_space_count > 0 ? ( ' ' x $leading_space_count ) : "";
18901 # Unpack any recombination data; it was packed by
18902 # sub send_lines_to_vertical_aligner. Contents:
18904 # [0] type: 1=opening 2=closing 3=opening block brace
18905 # [1] flag: if opening: 1=no multiple steps, 2=multiple steps ok
18906 # if closing: spaces of padding to use
18907 # [2] sequence number of container
18908 # [3] valid flag: do not append if this flag is false
18910 my ( $open_or_close, $tightness_flag, $seqno, $valid, $seqno_beg,
18912 if ($rvertical_tightness_flags) {
18914 $open_or_close, $tightness_flag, $seqno, $valid, $seqno_beg,
18916 ) = @{$rvertical_tightness_flags};
18919 $seqno_string = $seqno_end;
18921 # handle any cached line ..
18922 # either append this line to it or write it out
18923 if ( length($cached_line_text) ) {
18925 if ( !$cached_line_valid ) {
18926 entab_and_output( $cached_line_text,
18927 $cached_line_leading_space_count,
18928 $last_group_level_written );
18931 # handle cached line with opening container token
18932 elsif ( $cached_line_type == 1 || $cached_line_type == 3 ) {
18934 my $gap = $leading_space_count - length($cached_line_text);
18936 # handle option of just one tight opening per line:
18937 if ( $cached_line_flag == 1 ) {
18938 if ( defined($open_or_close) && $open_or_close == 1 ) {
18944 $leading_string = $cached_line_text . ' ' x $gap;
18945 $leading_space_count = $cached_line_leading_space_count;
18946 $seqno_string = $cached_seqno_string . ':' . $seqno_beg;
18949 entab_and_output( $cached_line_text,
18950 $cached_line_leading_space_count,
18951 $last_group_level_written );
18955 # handle cached line to place before this closing container token
18957 my $test_line = $cached_line_text . ' ' x $cached_line_flag . $str;
18959 if ( length($test_line) <= $rOpts_maximum_line_length ) {
18961 $seqno_string = $cached_seqno_string . ':' . $seqno_beg;
18963 # Patch to outdent closing tokens ending # in ');'
18964 # If we are joining a line like ');' to a previous stacked
18965 # set of closing tokens, then decide if we may outdent the
18966 # combined stack to the indentation of the ');'. Since we
18967 # should not normally outdent any of the other tokens more than
18968 # the indentation of the lines that contained them, we will
18969 # only do this if all of the corresponding opening
18970 # tokens were on the same line. This can happen with
18971 # -sot and -sct. For example, it is ok here:
18972 # __PACKAGE__->load_components( qw(
18977 # But, for example, we do not outdent in this example because
18978 # that would put the closing sub brace out farther than the
18979 # opening sub brace:
18981 # perltidy -sot -sct
18983 # '<Control-f>' => sub {
18985 # my $e = $c->XEvent;
18986 # itemsUnderArea $c;
18989 if ( $str =~ /^\);/ && $cached_line_text =~ /^[\)\}\]\s]*$/ ) {
18991 # The way to tell this is if the stacked sequence numbers
18992 # of this output line are the reverse of the stacked
18993 # sequence numbers of the previous non-blank line of
18994 # sequence numbers. So we can join if the previous
18995 # nonblank string of tokens is the mirror image. For
18996 # example if stack )}] is 13:8:6 then we are looking for a
18997 # leading stack like [{( which is 6:8:13 We only need to
18998 # check the two ends, because the intermediate tokens must
18999 # fall in order. Note on speed: having to split on colons
19000 # and eliminate multiple colons might appear to be slow,
19001 # but it's not an issue because we almost never come
19002 # through here. In a typical file we don't.
19003 $seqno_string =~ s/^:+//;
19004 $last_nonblank_seqno_string =~ s/^:+//;
19005 $seqno_string =~ s/:+/:/g;
19006 $last_nonblank_seqno_string =~ s/:+/:/g;
19008 # how many spaces can we outdent?
19010 $cached_line_leading_space_count - $leading_space_count;
19012 && length($seqno_string)
19013 && length($last_nonblank_seqno_string) ==
19014 length($seqno_string) )
19017 ( split ':', $last_nonblank_seqno_string );
19018 my @seqno_now = ( split ':', $seqno_string );
19019 if ( $seqno_now[-1] == $seqno_last[0]
19020 && $seqno_now[0] == $seqno_last[-1] )
19024 # for absolute safety, be sure we only remove
19026 my $ws = substr( $test_line, 0, $diff );
19027 if ( ( length($ws) == $diff ) && $ws =~ /^\s+$/ ) {
19029 $test_line = substr( $test_line, $diff );
19030 $cached_line_leading_space_count -= $diff;
19033 # shouldn't happen, but not critical:
19035 ## ERROR transferring indentation here
19042 $leading_string = "";
19043 $leading_space_count = $cached_line_leading_space_count;
19046 entab_and_output( $cached_line_text,
19047 $cached_line_leading_space_count,
19048 $last_group_level_written );
19052 $cached_line_type = 0;
19053 $cached_line_text = "";
19055 # make the line to be written
19056 my $line = $leading_string . $str;
19058 # write or cache this line
19059 if ( !$open_or_close || $side_comment_length > 0 ) {
19060 entab_and_output( $line, $leading_space_count, $group_level );
19063 $cached_line_text = $line;
19064 $cached_line_type = $open_or_close;
19065 $cached_line_flag = $tightness_flag;
19066 $cached_seqno = $seqno;
19067 $cached_line_valid = $valid;
19068 $cached_line_leading_space_count = $leading_space_count;
19069 $cached_seqno_string = $seqno_string;
19072 $last_group_level_written = $group_level;
19073 $last_side_comment_length = $side_comment_length;
19074 $extra_indent_ok = 0;
19077 sub entab_and_output {
19078 my ( $line, $leading_space_count, $level ) = @_;
19080 # The line is currently correct if there is no tabbing (recommended!)
19081 # We may have to lop off some leading spaces and replace with tabs.
19082 if ( $leading_space_count > 0 ) {
19084 # Nothing to do if no tabs
19085 if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
19086 || $rOpts_indent_columns <= 0 )
19092 # Handle entab option
19093 elsif ($rOpts_entab_leading_whitespace) {
19095 $leading_space_count % $rOpts_entab_leading_whitespace;
19097 int( $leading_space_count / $rOpts_entab_leading_whitespace );
19098 my $leading_string = "\t" x $tab_count . ' ' x $space_count;
19099 if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
19100 substr( $line, 0, $leading_space_count ) = $leading_string;
19104 # REMOVE AFTER TESTING
19105 # shouldn't happen - program error counting whitespace
19106 # we'll skip entabbing
19108 "Error entabbing in entab_and_output: expected count=$leading_space_count\n"
19113 # Handle option of one tab per level
19115 my $leading_string = ( "\t" x $level );
19117 $leading_space_count - $level * $rOpts_indent_columns;
19119 # shouldn't happen:
19120 if ( $space_count < 0 ) {
19122 "Error entabbing in append_line: for level=$group_level count=$leading_space_count\n"
19124 $leading_string = ( ' ' x $leading_space_count );
19127 $leading_string .= ( ' ' x $space_count );
19129 if ( $line =~ /^\s{$leading_space_count,$leading_space_count}/ ) {
19130 substr( $line, 0, $leading_space_count ) = $leading_string;
19134 # REMOVE AFTER TESTING
19135 # shouldn't happen - program error counting whitespace
19136 # we'll skip entabbing
19138 "Error entabbing in entab_and_output: expected count=$leading_space_count\n"
19143 $file_writer_object->write_code_line( $line . "\n" );
19144 if ($seqno_string) {
19145 $last_nonblank_seqno_string = $seqno_string;
19149 { # begin get_leading_string
19151 my @leading_string_cache;
19153 sub get_leading_string {
19155 # define the leading whitespace string for this line..
19156 my $leading_whitespace_count = shift;
19158 # Handle case of zero whitespace, which includes multi-line quotes
19159 # (which may have a finite level; this prevents tab problems)
19160 if ( $leading_whitespace_count <= 0 ) {
19164 # look for previous result
19165 elsif ( $leading_string_cache[$leading_whitespace_count] ) {
19166 return $leading_string_cache[$leading_whitespace_count];
19169 # must compute a string for this number of spaces
19170 my $leading_string;
19172 # Handle simple case of no tabs
19173 if ( !( $rOpts_tabs || $rOpts_entab_leading_whitespace )
19174 || $rOpts_indent_columns <= 0 )
19176 $leading_string = ( ' ' x $leading_whitespace_count );
19179 # Handle entab option
19180 elsif ($rOpts_entab_leading_whitespace) {
19182 $leading_whitespace_count % $rOpts_entab_leading_whitespace;
19183 my $tab_count = int(
19184 $leading_whitespace_count / $rOpts_entab_leading_whitespace );
19185 $leading_string = "\t" x $tab_count . ' ' x $space_count;
19188 # Handle option of one tab per level
19190 $leading_string = ( "\t" x $group_level );
19192 $leading_whitespace_count - $group_level * $rOpts_indent_columns;
19194 # shouldn't happen:
19195 if ( $space_count < 0 ) {
19197 "Error in append_line: for level=$group_level count=$leading_whitespace_count\n"
19199 $leading_string = ( ' ' x $leading_whitespace_count );
19202 $leading_string .= ( ' ' x $space_count );
19205 $leading_string_cache[$leading_whitespace_count] = $leading_string;
19206 return $leading_string;
19208 } # end get_leading_string
19210 sub report_anything_unusual {
19212 if ( $outdented_line_count > 0 ) {
19213 write_logfile_entry(
19214 "$outdented_line_count long lines were outdented:\n");
19215 write_logfile_entry(
19216 " First at output line $first_outdented_line_at\n");
19218 if ( $outdented_line_count > 1 ) {
19219 write_logfile_entry(
19220 " Last at output line $last_outdented_line_at\n");
19222 write_logfile_entry(
19223 " use -noll to prevent outdenting, -l=n to increase line length\n"
19225 write_logfile_entry("\n");
19229 #####################################################################
19231 # the Perl::Tidy::FileWriter class writes the output file
19233 #####################################################################
19235 package Perl::Tidy::FileWriter;
19237 # Maximum number of little messages; probably need not be changed.
19238 use constant MAX_NAG_MESSAGES => 6;
19240 sub write_logfile_entry {
19242 my $logger_object = $self->{_logger_object};
19243 if ($logger_object) {
19244 $logger_object->write_logfile_entry(@_);
19250 my ( $line_sink_object, $rOpts, $logger_object ) = @_;
19253 _line_sink_object => $line_sink_object,
19254 _logger_object => $logger_object,
19256 _output_line_number => 1,
19257 _consecutive_blank_lines => 0,
19258 _consecutive_nonblank_lines => 0,
19259 _first_line_length_error => 0,
19260 _max_line_length_error => 0,
19261 _last_line_length_error => 0,
19262 _first_line_length_error_at => 0,
19263 _max_line_length_error_at => 0,
19264 _last_line_length_error_at => 0,
19265 _line_length_error_count => 0,
19266 _max_output_line_length => 0,
19267 _max_output_line_length_at => 0,
19273 $self->{_line_sink_object}->tee_on();
19278 $self->{_line_sink_object}->tee_off();
19281 sub get_output_line_number {
19283 return $self->{_output_line_number};
19286 sub decrement_output_line_number {
19288 $self->{_output_line_number}--;
19291 sub get_consecutive_nonblank_lines {
19293 return $self->{_consecutive_nonblank_lines};
19296 sub reset_consecutive_blank_lines {
19298 $self->{_consecutive_blank_lines} = 0;
19301 sub want_blank_line {
19303 unless ( $self->{_consecutive_blank_lines} ) {
19304 $self->write_blank_code_line();
19308 sub write_blank_code_line {
19310 my $rOpts = $self->{_rOpts};
19312 if ( $self->{_consecutive_blank_lines} >=
19313 $rOpts->{'maximum-consecutive-blank-lines'} );
19314 $self->{_consecutive_blank_lines}++;
19315 $self->{_consecutive_nonblank_lines} = 0;
19316 $self->write_line("\n");
19319 sub write_code_line {
19323 if ( $a =~ /^\s*$/ ) {
19324 my $rOpts = $self->{_rOpts};
19326 if ( $self->{_consecutive_blank_lines} >=
19327 $rOpts->{'maximum-consecutive-blank-lines'} );
19328 $self->{_consecutive_blank_lines}++;
19329 $self->{_consecutive_nonblank_lines} = 0;
19332 $self->{_consecutive_blank_lines} = 0;
19333 $self->{_consecutive_nonblank_lines}++;
19335 $self->write_line($a);
19342 # TODO: go through and see if the test is necessary here
19343 if ( $a =~ /\n$/ ) { $self->{_output_line_number}++; }
19345 $self->{_line_sink_object}->write_line($a);
19347 # This calculation of excess line length ignores any internal tabs
19348 my $rOpts = $self->{_rOpts};
19349 my $exceed = length($a) - $rOpts->{'maximum-line-length'} - 1;
19350 if ( $a =~ /^\t+/g ) {
19351 $exceed += pos($a) * ( $rOpts->{'indent-columns'} - 1 );
19354 # Note that we just incremented output line number to future value
19355 # so we must subtract 1 for current line number
19356 if ( length($a) > 1 + $self->{_max_output_line_length} ) {
19357 $self->{_max_output_line_length} = length($a) - 1;
19358 $self->{_max_output_line_length_at} = $self->{_output_line_number} - 1;
19361 if ( $exceed > 0 ) {
19362 my $output_line_number = $self->{_output_line_number};
19363 $self->{_last_line_length_error} = $exceed;
19364 $self->{_last_line_length_error_at} = $output_line_number - 1;
19365 if ( $self->{_line_length_error_count} == 0 ) {
19366 $self->{_first_line_length_error} = $exceed;
19367 $self->{_first_line_length_error_at} = $output_line_number - 1;
19371 $self->{_last_line_length_error} > $self->{_max_line_length_error} )
19373 $self->{_max_line_length_error} = $exceed;
19374 $self->{_max_line_length_error_at} = $output_line_number - 1;
19377 if ( $self->{_line_length_error_count} < MAX_NAG_MESSAGES ) {
19378 $self->write_logfile_entry(
19379 "Line length exceeded by $exceed characters\n");
19381 $self->{_line_length_error_count}++;
19386 sub report_line_length_errors {
19388 my $rOpts = $self->{_rOpts};
19389 my $line_length_error_count = $self->{_line_length_error_count};
19390 if ( $line_length_error_count == 0 ) {
19391 $self->write_logfile_entry(
19392 "No lines exceeded $rOpts->{'maximum-line-length'} characters\n");
19393 my $max_output_line_length = $self->{_max_output_line_length};
19394 my $max_output_line_length_at = $self->{_max_output_line_length_at};
19395 $self->write_logfile_entry(
19396 " Maximum output line length was $max_output_line_length at line $max_output_line_length_at\n"
19402 my $word = ( $line_length_error_count > 1 ) ? "s" : "";
19403 $self->write_logfile_entry(
19404 "$line_length_error_count output line$word exceeded $rOpts->{'maximum-line-length'} characters:\n"
19407 $word = ( $line_length_error_count > 1 ) ? "First" : "";
19408 my $first_line_length_error = $self->{_first_line_length_error};
19409 my $first_line_length_error_at = $self->{_first_line_length_error_at};
19410 $self->write_logfile_entry(
19411 " $word at line $first_line_length_error_at by $first_line_length_error characters\n"
19414 if ( $line_length_error_count > 1 ) {
19415 my $max_line_length_error = $self->{_max_line_length_error};
19416 my $max_line_length_error_at = $self->{_max_line_length_error_at};
19417 my $last_line_length_error = $self->{_last_line_length_error};
19418 my $last_line_length_error_at = $self->{_last_line_length_error_at};
19419 $self->write_logfile_entry(
19420 " Maximum at line $max_line_length_error_at by $max_line_length_error characters\n"
19422 $self->write_logfile_entry(
19423 " Last at line $last_line_length_error_at by $last_line_length_error characters\n"
19429 #####################################################################
19431 # The Perl::Tidy::Debugger class shows line tokenization
19433 #####################################################################
19435 package Perl::Tidy::Debugger;
19439 my ( $class, $filename ) = @_;
19442 _debug_file => $filename,
19443 _debug_file_opened => 0,
19448 sub really_open_debug_file {
19451 my $debug_file = $self->{_debug_file};
19453 unless ( $fh = IO::File->new("> $debug_file") ) {
19454 warn("can't open $debug_file: $!\n");
19456 $self->{_debug_file_opened} = 1;
19457 $self->{_fh} = $fh;
19459 "Use -dump-token-types (-dtt) to get a list of token type codes\n";
19462 sub close_debug_file {
19465 my $fh = $self->{_fh};
19466 if ( $self->{_debug_file_opened} ) {
19468 eval { $self->{_fh}->close() };
19472 sub write_debug_entry {
19474 # This is a debug dump routine which may be modified as necessary
19475 # to dump tokens on a line-by-line basis. The output will be written
19476 # to the .DEBUG file when the -D flag is entered.
19478 my $line_of_tokens = shift;
19480 my $input_line = $line_of_tokens->{_line_text};
19481 my $rtoken_type = $line_of_tokens->{_rtoken_type};
19482 my $rtokens = $line_of_tokens->{_rtokens};
19483 my $rlevels = $line_of_tokens->{_rlevels};
19484 my $rslevels = $line_of_tokens->{_rslevels};
19485 my $rblock_type = $line_of_tokens->{_rblock_type};
19486 my $input_line_number = $line_of_tokens->{_line_number};
19487 my $line_type = $line_of_tokens->{_line_type};
19491 my $token_str = "$input_line_number: ";
19492 my $reconstructed_original = "$input_line_number: ";
19493 my $block_str = "$input_line_number: ";
19495 #$token_str .= "$line_type: ";
19496 #$reconstructed_original .= "$line_type: ";
19499 my @next_char = ( '"', '"' );
19501 unless ( $self->{_debug_file_opened} ) { $self->really_open_debug_file() }
19502 my $fh = $self->{_fh};
19504 for ( $j = 0 ; $j < @$rtoken_type ; $j++ ) {
19507 if ( $$rtoken_type[$j] eq 'k' ) {
19508 $pattern .= $$rtokens[$j];
19511 $pattern .= $$rtoken_type[$j];
19513 $reconstructed_original .= $$rtokens[$j];
19514 $block_str .= "($$rblock_type[$j])";
19515 $num = length( $$rtokens[$j] );
19516 my $type_str = $$rtoken_type[$j];
19518 # be sure there are no blank tokens (shouldn't happen)
19519 # This can only happen if a programming error has been made
19520 # because all valid tokens are non-blank
19521 if ( $type_str eq ' ' ) {
19522 print $fh "BLANK TOKEN on the next line\n";
19523 $type_str = $next_char[$i_next];
19524 $i_next = 1 - $i_next;
19527 if ( length($type_str) == 1 ) {
19528 $type_str = $type_str x $num;
19530 $token_str .= $type_str;
19533 # Write what you want here ...
19534 # print $fh "$input_line\n";
19535 # print $fh "$pattern\n";
19536 print $fh "$reconstructed_original\n";
19537 print $fh "$token_str\n";
19539 #print $fh "$block_str\n";
19542 #####################################################################
19544 # The Perl::Tidy::LineBuffer class supplies a 'get_line()'
19545 # method for returning the next line to be parsed, as well as a
19546 # 'peek_ahead()' method
19548 # The input parameter is an object with a 'get_line()' method
19549 # which returns the next line to be parsed
19551 #####################################################################
19553 package Perl::Tidy::LineBuffer;
19558 my $line_source_object = shift;
19561 _line_source_object => $line_source_object,
19562 _rlookahead_buffer => [],
19568 my $buffer_index = shift;
19570 my $line_source_object = $self->{_line_source_object};
19571 my $rlookahead_buffer = $self->{_rlookahead_buffer};
19572 if ( $buffer_index < scalar(@$rlookahead_buffer) ) {
19573 $line = $$rlookahead_buffer[$buffer_index];
19576 $line = $line_source_object->get_line();
19577 push( @$rlookahead_buffer, $line );
19585 my $line_source_object = $self->{_line_source_object};
19586 my $rlookahead_buffer = $self->{_rlookahead_buffer};
19588 if ( scalar(@$rlookahead_buffer) ) {
19589 $line = shift @$rlookahead_buffer;
19592 $line = $line_source_object->get_line();
19597 ########################################################################
19599 # the Perl::Tidy::Tokenizer package is essentially a filter which
19600 # reads lines of perl source code from a source object and provides
19601 # corresponding tokenized lines through its get_line() method. Lines
19602 # flow from the source_object to the caller like this:
19604 # source_object --> LineBuffer_object --> Tokenizer --> calling routine
19605 # get_line() get_line() get_line() line_of_tokens
19607 # The source object can be any object with a get_line() method which
19608 # supplies one line (a character string) perl call.
19609 # The LineBuffer object is created by the Tokenizer.
19610 # The Tokenizer returns a reference to a data structure 'line_of_tokens'
19611 # containing one tokenized line for each call to its get_line() method.
19613 # WARNING: This is not a real class yet. Only one tokenizer my be used.
19615 ########################################################################
19617 package Perl::Tidy::Tokenizer;
19621 # Caution: these debug flags produce a lot of output
19622 # They should all be 0 except when debugging small scripts
19624 use constant TOKENIZER_DEBUG_FLAG_EXPECT => 0;
19625 use constant TOKENIZER_DEBUG_FLAG_NSCAN => 0;
19626 use constant TOKENIZER_DEBUG_FLAG_QUOTE => 0;
19627 use constant TOKENIZER_DEBUG_FLAG_SCAN_ID => 0;
19628 use constant TOKENIZER_DEBUG_FLAG_TOKENIZE => 0;
19630 my $debug_warning = sub {
19631 print "TOKENIZER_DEBUGGING with key $_[0]\n";
19634 TOKENIZER_DEBUG_FLAG_EXPECT && $debug_warning->('EXPECT');
19635 TOKENIZER_DEBUG_FLAG_NSCAN && $debug_warning->('NSCAN');
19636 TOKENIZER_DEBUG_FLAG_QUOTE && $debug_warning->('QUOTE');
19637 TOKENIZER_DEBUG_FLAG_SCAN_ID && $debug_warning->('SCAN_ID');
19638 TOKENIZER_DEBUG_FLAG_TOKENIZE && $debug_warning->('TOKENIZE');
19644 # PACKAGE VARIABLES for for processing an entire FILE.
19648 $last_nonblank_token
19649 $last_nonblank_type
19650 $last_nonblank_block_type
19658 %user_function_prototype
19660 %is_block_list_function
19661 %saw_function_definition
19665 $square_bracket_depth
19668 @nesting_sequence_number
19669 @current_sequence_number
19671 @paren_semicolon_count
19672 @paren_structural_type
19674 @brace_structural_type
19675 @brace_statement_type
19678 @square_bracket_type
19679 @square_bracket_structural_type
19681 @starting_line_of_current_depth
19684 # GLOBAL CONSTANTS for routines in this package
19686 %is_indirect_object_taker
19688 %expecting_operator_token
19689 %expecting_operator_types
19690 %expecting_term_types
19691 %expecting_term_token
19693 %is_file_test_operator
19695 %is_valid_token_type
19697 %is_code_block_token
19699 @opening_brace_names
19700 @closing_brace_names
19701 %is_keyword_taking_list
19702 %is_q_qq_qw_qx_qr_s_y_tr_m
19705 # possible values of operator_expected()
19706 use constant TERM => -1;
19707 use constant UNKNOWN => 0;
19708 use constant OPERATOR => 1;
19710 # possible values of context
19711 use constant SCALAR_CONTEXT => -1;
19712 use constant UNKNOWN_CONTEXT => 0;
19713 use constant LIST_CONTEXT => 1;
19715 # Maximum number of little messages; probably need not be changed.
19716 use constant MAX_NAG_MESSAGES => 6;
19720 # methods to count instances
19722 sub get_count { $_count; }
19723 sub _increment_count { ++$_count }
19724 sub _decrement_count { --$_count }
19728 $_[0]->_decrement_count();
19735 # Note: 'tabs' and 'indent_columns' are temporary and should be
19738 source_object => undef,
19739 debugger_object => undef,
19740 diagnostics_object => undef,
19741 logger_object => undef,
19742 starting_level => undef,
19743 indent_columns => 4,
19745 look_for_hash_bang => 0,
19747 look_for_autoloader => 1,
19748 look_for_selfloader => 1,
19749 starting_line_number => 1,
19751 my %args = ( %defaults, @_ );
19753 # we are given an object with a get_line() method to supply source lines
19754 my $source_object = $args{source_object};
19756 # we create another object with a get_line() and peek_ahead() method
19757 my $line_buffer_object = Perl::Tidy::LineBuffer->new($source_object);
19759 # Tokenizer state data is as follows:
19760 # _rhere_target_list reference to list of here-doc targets
19761 # _here_doc_target the target string for a here document
19762 # _here_quote_character the type of here-doc quoting (" ' ` or none)
19763 # to determine if interpolation is done
19764 # _quote_target character we seek if chasing a quote
19765 # _line_start_quote line where we started looking for a long quote
19766 # _in_here_doc flag indicating if we are in a here-doc
19767 # _in_pod flag set if we are in pod documentation
19768 # _in_error flag set if we saw severe error (binary in script)
19769 # _in_data flag set if we are in __DATA__ section
19770 # _in_end flag set if we are in __END__ section
19771 # _in_format flag set if we are in a format description
19772 # _in_attribute_list flag telling if we are looking for attributes
19773 # _in_quote flag telling if we are chasing a quote
19774 # _starting_level indentation level of first line
19775 # _input_tabstr string denoting one indentation level of input file
19776 # _know_input_tabstr flag indicating if we know _input_tabstr
19777 # _line_buffer_object object with get_line() method to supply source code
19778 # _diagnostics_object place to write debugging information
19779 # _unexpected_error_count error count used to limit output
19780 # _lower_case_labels_at line numbers where lower case labels seen
19781 $tokenizer_self = {
19782 _rhere_target_list => [],
19784 _here_doc_target => "",
19785 _here_quote_character => "",
19791 _in_attribute_list => 0,
19793 _quote_target => "",
19794 _line_start_quote => -1,
19795 _starting_level => $args{starting_level},
19796 _know_starting_level => defined( $args{starting_level} ),
19797 _tabs => $args{tabs},
19798 _indent_columns => $args{indent_columns},
19799 _look_for_hash_bang => $args{look_for_hash_bang},
19800 _trim_qw => $args{trim_qw},
19801 _input_tabstr => "",
19802 _know_input_tabstr => -1,
19803 _last_line_number => $args{starting_line_number} - 1,
19804 _saw_perl_dash_P => 0,
19805 _saw_perl_dash_w => 0,
19806 _saw_use_strict => 0,
19807 _saw_v_string => 0,
19808 _look_for_autoloader => $args{look_for_autoloader},
19809 _look_for_selfloader => $args{look_for_selfloader},
19810 _saw_autoloader => 0,
19811 _saw_selfloader => 0,
19812 _saw_hash_bang => 0,
19815 _saw_negative_indentation => 0,
19816 _started_tokenizing => 0,
19817 _line_buffer_object => $line_buffer_object,
19818 _debugger_object => $args{debugger_object},
19819 _diagnostics_object => $args{diagnostics_object},
19820 _logger_object => $args{logger_object},
19821 _unexpected_error_count => 0,
19822 _started_looking_for_here_target_at => 0,
19823 _nearly_matched_here_target_at => undef,
19825 _rlower_case_labels_at => undef,
19828 prepare_for_a_new_file();
19829 find_starting_indentation_level();
19831 bless $tokenizer_self, $class;
19833 # This is not a full class yet, so die if an attempt is made to
19834 # create more than one object.
19836 if ( _increment_count() > 1 ) {
19838 "Attempt to create more than 1 object in $class, which is not a true class yet\n";
19841 return $tokenizer_self;
19845 # interface to Perl::Tidy::Logger routines
19847 my $logger_object = $tokenizer_self->{_logger_object};
19848 if ($logger_object) {
19849 $logger_object->warning(@_);
19854 my $logger_object = $tokenizer_self->{_logger_object};
19855 if ($logger_object) {
19856 $logger_object->complain(@_);
19860 sub write_logfile_entry {
19861 my $logger_object = $tokenizer_self->{_logger_object};
19862 if ($logger_object) {
19863 $logger_object->write_logfile_entry(@_);
19867 sub interrupt_logfile {
19868 my $logger_object = $tokenizer_self->{_logger_object};
19869 if ($logger_object) {
19870 $logger_object->interrupt_logfile();
19874 sub resume_logfile {
19875 my $logger_object = $tokenizer_self->{_logger_object};
19876 if ($logger_object) {
19877 $logger_object->resume_logfile();
19881 sub increment_brace_error {
19882 my $logger_object = $tokenizer_self->{_logger_object};
19883 if ($logger_object) {
19884 $logger_object->increment_brace_error();
19888 sub report_definite_bug {
19889 my $logger_object = $tokenizer_self->{_logger_object};
19890 if ($logger_object) {
19891 $logger_object->report_definite_bug();
19895 sub brace_warning {
19896 my $logger_object = $tokenizer_self->{_logger_object};
19897 if ($logger_object) {
19898 $logger_object->brace_warning(@_);
19902 sub get_saw_brace_error {
19903 my $logger_object = $tokenizer_self->{_logger_object};
19904 if ($logger_object) {
19905 $logger_object->get_saw_brace_error();
19912 # interface to Perl::Tidy::Diagnostics routines
19913 sub write_diagnostics {
19914 if ( $tokenizer_self->{_diagnostics_object} ) {
19915 $tokenizer_self->{_diagnostics_object}->write_diagnostics(@_);
19919 sub report_tokenization_errors {
19923 my $level = get_indentation_level();
19924 if ( $level != $tokenizer_self->{_starting_level} ) {
19925 warning("final indentation level: $level\n");
19928 check_final_nesting_depths();
19930 if ( $tokenizer_self->{_look_for_hash_bang}
19931 && !$tokenizer_self->{_saw_hash_bang} )
19934 "hit EOF without seeing hash-bang line; maybe don't need -x?\n");
19937 if ( $tokenizer_self->{_in_format} ) {
19938 warning("hit EOF while in format description\n");
19941 if ( $tokenizer_self->{_in_pod} ) {
19943 # Just write log entry if this is after __END__ or __DATA__
19944 # because this happens to often, and it is not likely to be
19946 if ( $tokenizer_self->{_saw_data} || $tokenizer_self->{_saw_end} ) {
19947 write_logfile_entry(
19948 "hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
19954 "hit eof while in pod documentation (no =cut seen)\n\tthis can cause trouble with some pod utilities\n"
19960 if ( $tokenizer_self->{_in_here_doc} ) {
19961 my $here_doc_target = $tokenizer_self->{_here_doc_target};
19962 my $started_looking_for_here_target_at =
19963 $tokenizer_self->{_started_looking_for_here_target_at};
19964 if ($here_doc_target) {
19966 "hit EOF in here document starting at line $started_looking_for_here_target_at with target: $here_doc_target\n"
19971 "hit EOF in here document starting at line $started_looking_for_here_target_at with empty target string\n"
19974 my $nearly_matched_here_target_at =
19975 $tokenizer_self->{_nearly_matched_here_target_at};
19976 if ($nearly_matched_here_target_at) {
19978 "NOTE: almost matched at input line $nearly_matched_here_target_at except for whitespace\n"
19983 if ( $tokenizer_self->{_in_quote} ) {
19984 my $line_start_quote = $tokenizer_self->{_line_start_quote};
19985 my $quote_target = $tokenizer_self->{_quote_target};
19987 ( $tokenizer_self->{_in_attribute_list} )
19991 "hit EOF seeking end of $what starting at line $line_start_quote ending in $quote_target\n"
19995 unless ( $tokenizer_self->{_saw_perl_dash_w} ) {
19996 if ( $] < 5.006 ) {
19997 write_logfile_entry("Suggest including '-w parameter'\n");
20000 write_logfile_entry("Suggest including 'use warnings;'\n");
20004 if ( $tokenizer_self->{_saw_perl_dash_P} ) {
20005 write_logfile_entry("Use of -P parameter for defines is discouraged\n");
20008 unless ( $tokenizer_self->{_saw_use_strict} ) {
20009 write_logfile_entry("Suggest including 'use strict;'\n");
20012 # it is suggested that lables have at least one upper case character
20013 # for legibility and to avoid code breakage as new keywords are introduced
20014 if ( $tokenizer_self->{_rlower_case_labels_at} ) {
20015 my @lower_case_labels_at =
20016 @{ $tokenizer_self->{_rlower_case_labels_at} };
20017 write_logfile_entry(
20018 "Suggest using upper case characters in label(s)\n");
20020 write_logfile_entry(" defined at line(s): (@lower_case_labels_at)\n");
20024 sub report_v_string {
20026 # warn if this version can't handle v-strings
20028 unless ( $tokenizer_self->{_saw_v_string} ) {
20029 $tokenizer_self->{_saw_v_string} = $tokenizer_self->{_last_line_number};
20031 if ( $] < 5.006 ) {
20033 "Found v-string '$tok' but v-strings are not implemented in your version of perl; see Camel 3 book ch 2\n"
20038 sub get_input_line_number {
20039 return $tokenizer_self->{_last_line_number};
20042 # returns the next tokenized line
20047 # USES GLOBAL VARIABLES: $tokenizer_self, $brace_depth,
20048 # $square_bracket_depth, $paren_depth
20050 my $input_line = $tokenizer_self->{_line_buffer_object}->get_line();
20051 $tokenizer_self->{_line_text} = $input_line;
20053 return undef unless ($input_line);
20055 my $input_line_number = ++$tokenizer_self->{_last_line_number};
20057 # Find and remove what characters terminate this line, including any
20059 my $input_line_separator = "";
20060 if ( chomp($input_line) ) { $input_line_separator = $/ }
20062 # TODO: what other characters should be included here?
20063 if ( $input_line =~ s/((\r|\035|\032)+)$// ) {
20064 $input_line_separator = $2 . $input_line_separator;
20067 # for backwards compatability we keep the line text terminated with
20068 # a newline character
20069 $input_line .= "\n";
20070 $tokenizer_self->{_line_text} = $input_line; # update
20072 # create a data structure describing this line which will be
20073 # returned to the caller.
20075 # _line_type codes are:
20076 # SYSTEM - system-specific code before hash-bang line
20077 # CODE - line of perl code (including comments)
20078 # POD_START - line starting pod, such as '=head'
20079 # POD - pod documentation text
20080 # POD_END - last line of pod section, '=cut'
20081 # HERE - text of here-document
20082 # HERE_END - last line of here-doc (target word)
20083 # FORMAT - format section
20084 # FORMAT_END - last line of format section, '.'
20085 # DATA_START - __DATA__ line
20086 # DATA - unidentified text following __DATA__
20087 # END_START - __END__ line
20088 # END - unidentified text following __END__
20089 # ERROR - we are in big trouble, probably not a perl script
20092 # _curly_brace_depth - depth of curly braces at start of line
20093 # _square_bracket_depth - depth of square brackets at start of line
20094 # _paren_depth - depth of parens at start of line
20095 # _starting_in_quote - this line continues a multi-line quote
20096 # (so don't trim leading blanks!)
20097 # _ending_in_quote - this line ends in a multi-line quote
20098 # (so don't trim trailing blanks!)
20099 my $line_of_tokens = {
20100 _line_type => 'EOF',
20101 _line_text => $input_line,
20102 _line_number => $input_line_number,
20103 _rtoken_type => undef,
20106 _rslevels => undef,
20107 _rblock_type => undef,
20108 _rcontainer_type => undef,
20109 _rcontainer_environment => undef,
20110 _rtype_sequence => undef,
20111 _rnesting_tokens => undef,
20112 _rci_levels => undef,
20113 _rnesting_blocks => undef,
20114 _python_indentation_level => -1, ## 0,
20115 _starting_in_quote => 0, # to be set by subroutine
20116 _ending_in_quote => 0,
20117 _curly_brace_depth => $brace_depth,
20118 _square_bracket_depth => $square_bracket_depth,
20119 _paren_depth => $paren_depth,
20120 _quote_character => '',
20123 # must print line unchanged if we are in a here document
20124 if ( $tokenizer_self->{_in_here_doc} ) {
20126 $line_of_tokens->{_line_type} = 'HERE';
20127 my $here_doc_target = $tokenizer_self->{_here_doc_target};
20128 my $here_quote_character = $tokenizer_self->{_here_quote_character};
20129 my $candidate_target = $input_line;
20130 chomp $candidate_target;
20131 if ( $candidate_target eq $here_doc_target ) {
20132 $tokenizer_self->{_nearly_matched_here_target_at} = undef;
20133 $line_of_tokens->{_line_type} = 'HERE_END';
20134 write_logfile_entry("Exiting HERE document $here_doc_target\n");
20136 my $rhere_target_list = $tokenizer_self->{_rhere_target_list};
20137 if (@$rhere_target_list) { # there can be multiple here targets
20138 ( $here_doc_target, $here_quote_character ) =
20139 @{ shift @$rhere_target_list };
20140 $tokenizer_self->{_here_doc_target} = $here_doc_target;
20141 $tokenizer_self->{_here_quote_character} =
20142 $here_quote_character;
20143 write_logfile_entry(
20144 "Entering HERE document $here_doc_target\n");
20145 $tokenizer_self->{_nearly_matched_here_target_at} = undef;
20146 $tokenizer_self->{_started_looking_for_here_target_at} =
20147 $input_line_number;
20150 $tokenizer_self->{_in_here_doc} = 0;
20151 $tokenizer_self->{_here_doc_target} = "";
20152 $tokenizer_self->{_here_quote_character} = "";
20156 # check for error of extra whitespace
20157 # note for PERL6: leading whitespace is allowed
20159 $candidate_target =~ s/\s*$//;
20160 $candidate_target =~ s/^\s*//;
20161 if ( $candidate_target eq $here_doc_target ) {
20162 $tokenizer_self->{_nearly_matched_here_target_at} =
20163 $input_line_number;
20166 return $line_of_tokens;
20169 # must print line unchanged if we are in a format section
20170 elsif ( $tokenizer_self->{_in_format} ) {
20172 if ( $input_line =~ /^\.[\s#]*$/ ) {
20173 write_logfile_entry("Exiting format section\n");
20174 $tokenizer_self->{_in_format} = 0;
20175 $line_of_tokens->{_line_type} = 'FORMAT_END';
20178 $line_of_tokens->{_line_type} = 'FORMAT';
20180 return $line_of_tokens;
20183 # must print line unchanged if we are in pod documentation
20184 elsif ( $tokenizer_self->{_in_pod} ) {
20186 $line_of_tokens->{_line_type} = 'POD';
20187 if ( $input_line =~ /^=cut/ ) {
20188 $line_of_tokens->{_line_type} = 'POD_END';
20189 write_logfile_entry("Exiting POD section\n");
20190 $tokenizer_self->{_in_pod} = 0;
20192 if ( $input_line =~ /^\#\!.*perl\b/ ) {
20194 "Hash-bang in pod can cause older versions of perl to fail! \n"
20198 return $line_of_tokens;
20201 # must print line unchanged if we have seen a severe error (i.e., we
20202 # are seeing illegal tokens and connot continue. Syntax errors do
20203 # not pass this route). Calling routine can decide what to do, but
20204 # the default can be to just pass all lines as if they were after __END__
20205 elsif ( $tokenizer_self->{_in_error} ) {
20206 $line_of_tokens->{_line_type} = 'ERROR';
20207 return $line_of_tokens;
20210 # print line unchanged if we are __DATA__ section
20211 elsif ( $tokenizer_self->{_in_data} ) {
20213 # ...but look for POD
20214 # Note that the _in_data and _in_end flags remain set
20215 # so that we return to that state after seeing the
20216 # end of a pod section
20217 if ( $input_line =~ /^=(?!cut)/ ) {
20218 $line_of_tokens->{_line_type} = 'POD_START';
20219 write_logfile_entry("Entering POD section\n");
20220 $tokenizer_self->{_in_pod} = 1;
20221 return $line_of_tokens;
20224 $line_of_tokens->{_line_type} = 'DATA';
20225 return $line_of_tokens;
20229 # print line unchanged if we are in __END__ section
20230 elsif ( $tokenizer_self->{_in_end} ) {
20232 # ...but look for POD
20233 # Note that the _in_data and _in_end flags remain set
20234 # so that we return to that state after seeing the
20235 # end of a pod section
20236 if ( $input_line =~ /^=(?!cut)/ ) {
20237 $line_of_tokens->{_line_type} = 'POD_START';
20238 write_logfile_entry("Entering POD section\n");
20239 $tokenizer_self->{_in_pod} = 1;
20240 return $line_of_tokens;
20243 $line_of_tokens->{_line_type} = 'END';
20244 return $line_of_tokens;
20248 # check for a hash-bang line if we haven't seen one
20249 if ( !$tokenizer_self->{_saw_hash_bang} ) {
20250 if ( $input_line =~ /^\#\!.*perl\b/ ) {
20251 $tokenizer_self->{_saw_hash_bang} = $input_line_number;
20253 # check for -w and -P flags
20254 if ( $input_line =~ /^\#\!.*perl\s.*-.*P/ ) {
20255 $tokenizer_self->{_saw_perl_dash_P} = 1;
20258 if ( $input_line =~ /^\#\!.*perl\s.*-.*w/ ) {
20259 $tokenizer_self->{_saw_perl_dash_w} = 1;
20262 if ( ( $input_line_number > 1 )
20263 && ( !$tokenizer_self->{_look_for_hash_bang} ) )
20266 # this is helpful for VMS systems; we may have accidentally
20267 # tokenized some DCL commands
20268 if ( $tokenizer_self->{_started_tokenizing} ) {
20270 "There seems to be a hash-bang after line 1; do you need to run with -x ?\n"
20274 complain("Useless hash-bang after line 1\n");
20278 # Report the leading hash-bang as a system line
20279 # This will prevent -dac from deleting it
20281 $line_of_tokens->{_line_type} = 'SYSTEM';
20282 return $line_of_tokens;
20287 # wait for a hash-bang before parsing if the user invoked us with -x
20288 if ( $tokenizer_self->{_look_for_hash_bang}
20289 && !$tokenizer_self->{_saw_hash_bang} )
20291 $line_of_tokens->{_line_type} = 'SYSTEM';
20292 return $line_of_tokens;
20295 # a first line of the form ': #' will be marked as SYSTEM
20296 # since lines of this form may be used by tcsh
20297 if ( $input_line_number == 1 && $input_line =~ /^\s*\:\s*\#/ ) {
20298 $line_of_tokens->{_line_type} = 'SYSTEM';
20299 return $line_of_tokens;
20302 # now we know that it is ok to tokenize the line...
20303 # the line tokenizer will modify any of these private variables:
20304 # _rhere_target_list
20311 my $ending_in_quote_last = $tokenizer_self->{_in_quote};
20312 tokenize_this_line($line_of_tokens);
20314 # Now finish defining the return structure and return it
20315 $line_of_tokens->{_ending_in_quote} = $tokenizer_self->{_in_quote};
20317 # handle severe error (binary data in script)
20318 if ( $tokenizer_self->{_in_error} ) {
20319 $tokenizer_self->{_in_quote} = 0; # to avoid any more messages
20320 warning("Giving up after error\n");
20321 $line_of_tokens->{_line_type} = 'ERROR';
20322 reset_indentation_level(0); # avoid error messages
20323 return $line_of_tokens;
20326 # handle start of pod documentation
20327 if ( $tokenizer_self->{_in_pod} ) {
20329 # This gets tricky..above a __DATA__ or __END__ section, perl
20330 # accepts '=cut' as the start of pod section. But afterwards,
20331 # only pod utilities see it and they may ignore an =cut without
20332 # leading =head. In any case, this isn't good.
20333 if ( $input_line =~ /^=cut\b/ ) {
20334 if ( $tokenizer_self->{_saw_data} || $tokenizer_self->{_saw_end} ) {
20335 complain("=cut while not in pod ignored\n");
20336 $tokenizer_self->{_in_pod} = 0;
20337 $line_of_tokens->{_line_type} = 'POD_END';
20340 $line_of_tokens->{_line_type} = 'POD_START';
20342 "=cut starts a pod section .. this can fool pod utilities.\n"
20344 write_logfile_entry("Entering POD section\n");
20349 $line_of_tokens->{_line_type} = 'POD_START';
20350 write_logfile_entry("Entering POD section\n");
20353 return $line_of_tokens;
20356 # update indentation levels for log messages
20357 if ( $input_line !~ /^\s*$/ ) {
20358 my $rlevels = $line_of_tokens->{_rlevels};
20359 my $structural_indentation_level = $$rlevels[0];
20360 my ( $python_indentation_level, $msg ) =
20361 find_indentation_level( $input_line, $structural_indentation_level );
20362 if ($msg) { write_logfile_entry("$msg") }
20363 if ( $tokenizer_self->{_know_input_tabstr} == 1 ) {
20364 $line_of_tokens->{_python_indentation_level} =
20365 $python_indentation_level;
20369 # see if this line contains here doc targets
20370 my $rhere_target_list = $tokenizer_self->{_rhere_target_list};
20371 if (@$rhere_target_list) {
20373 my ( $here_doc_target, $here_quote_character ) =
20374 @{ shift @$rhere_target_list };
20375 $tokenizer_self->{_in_here_doc} = 1;
20376 $tokenizer_self->{_here_doc_target} = $here_doc_target;
20377 $tokenizer_self->{_here_quote_character} = $here_quote_character;
20378 write_logfile_entry("Entering HERE document $here_doc_target\n");
20379 $tokenizer_self->{_started_looking_for_here_target_at} =
20380 $input_line_number;
20383 # NOTE: __END__ and __DATA__ statements are written unformatted
20384 # because they can theoretically contain additional characters
20385 # which are not tokenized (and cannot be read with <DATA> either!).
20386 if ( $tokenizer_self->{_in_data} ) {
20387 $line_of_tokens->{_line_type} = 'DATA_START';
20388 write_logfile_entry("Starting __DATA__ section\n");
20389 $tokenizer_self->{_saw_data} = 1;
20391 # keep parsing after __DATA__ if use SelfLoader was seen
20392 if ( $tokenizer_self->{_saw_selfloader} ) {
20393 $tokenizer_self->{_in_data} = 0;
20394 write_logfile_entry(
20395 "SelfLoader seen, continuing; -nlsl deactivates\n");
20398 return $line_of_tokens;
20401 elsif ( $tokenizer_self->{_in_end} ) {
20402 $line_of_tokens->{_line_type} = 'END_START';
20403 write_logfile_entry("Starting __END__ section\n");
20404 $tokenizer_self->{_saw_end} = 1;
20406 # keep parsing after __END__ if use AutoLoader was seen
20407 if ( $tokenizer_self->{_saw_autoloader} ) {
20408 $tokenizer_self->{_in_end} = 0;
20409 write_logfile_entry(
20410 "AutoLoader seen, continuing; -nlal deactivates\n");
20412 return $line_of_tokens;
20415 # now, finally, we know that this line is type 'CODE'
20416 $line_of_tokens->{_line_type} = 'CODE';
20418 # remember if we have seen any real code
20419 if ( !$tokenizer_self->{_started_tokenizing}
20420 && $input_line !~ /^\s*$/
20421 && $input_line !~ /^\s*#/ )
20423 $tokenizer_self->{_started_tokenizing} = 1;
20426 if ( $tokenizer_self->{_debugger_object} ) {
20427 $tokenizer_self->{_debugger_object}->write_debug_entry($line_of_tokens);
20430 # Note: if keyword 'format' occurs in this line code, it is still CODE
20431 # (keyword 'format' need not start a line)
20432 if ( $tokenizer_self->{_in_format} ) {
20433 write_logfile_entry("Entering format section\n");
20436 if ( $tokenizer_self->{_in_quote}
20437 and ( $tokenizer_self->{_line_start_quote} < 0 ) )
20440 #if ( ( my $quote_target = get_quote_target() ) !~ /^\s*$/ ) {
20442 ( my $quote_target = $tokenizer_self->{_quote_target} ) !~ /^\s*$/ )
20444 $tokenizer_self->{_line_start_quote} = $input_line_number;
20445 write_logfile_entry(
20446 "Start multi-line quote or pattern ending in $quote_target\n");
20449 elsif ( ( $tokenizer_self->{_line_start_quote} >= 0 )
20450 and !$tokenizer_self->{_in_quote} )
20452 $tokenizer_self->{_line_start_quote} = -1;
20453 write_logfile_entry("End of multi-line quote or pattern\n");
20456 # we are returning a line of CODE
20457 return $line_of_tokens;
20460 sub find_starting_indentation_level {
20462 # USES GLOBAL VARIABLES: $tokenizer_self
20463 my $starting_level = 0;
20464 my $know_input_tabstr = -1; # flag for find_indentation_level
20466 # use value if given as parameter
20467 if ( $tokenizer_self->{_know_starting_level} ) {
20468 $starting_level = $tokenizer_self->{_starting_level};
20471 # if we know there is a hash_bang line, the level must be zero
20472 elsif ( $tokenizer_self->{_look_for_hash_bang} ) {
20473 $tokenizer_self->{_know_starting_level} = 1;
20476 # otherwise figure it out from the input file
20480 my $structural_indentation_level = -1; # flag for find_indentation_level
20484 $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
20487 # if first line is #! then assume starting level is zero
20488 if ( $i == 1 && $line =~ /^\#\!/ ) {
20489 $starting_level = 0;
20492 next if ( $line =~ /^\s*#/ ); # must not be comment
20493 next if ( $line =~ /^\s*$/ ); # must not be blank
20494 ( $starting_level, $msg ) =
20495 find_indentation_level( $line, $structural_indentation_level );
20496 if ($msg) { write_logfile_entry("$msg") }
20499 $msg = "Line $i implies starting-indentation-level = $starting_level\n";
20501 if ( $starting_level > 0 ) {
20503 my $input_tabstr = $tokenizer_self->{_input_tabstr};
20504 if ( $input_tabstr eq "\t" ) {
20505 $msg .= "by guessing input tabbing uses 1 tab per level\n";
20508 my $cols = length($input_tabstr);
20510 "by guessing input tabbing uses $cols blanks per level\n";
20513 write_logfile_entry("$msg");
20515 $tokenizer_self->{_starting_level} = $starting_level;
20516 reset_indentation_level($starting_level);
20519 # Find indentation level given a input line. At the same time, try to
20520 # figure out the input tabbing scheme.
20522 # There are two types of calls:
20524 # Type 1: $structural_indentation_level < 0
20525 # In this case we have to guess $input_tabstr to figure out the level.
20527 # Type 2: $structural_indentation_level >= 0
20528 # In this case the level of this line is known, and this routine can
20529 # update the tabbing string, if still unknown, to make the level correct.
20531 sub find_indentation_level {
20532 my ( $line, $structural_indentation_level ) = @_;
20534 # USES GLOBAL VARIABLES: $tokenizer_self
20538 my $know_input_tabstr = $tokenizer_self->{_know_input_tabstr};
20539 my $input_tabstr = $tokenizer_self->{_input_tabstr};
20541 # find leading whitespace
20542 my $leading_whitespace = ( $line =~ /^(\s*)/ ) ? $1 : "";
20544 # make first guess at input tabbing scheme if necessary
20545 if ( $know_input_tabstr < 0 ) {
20547 $know_input_tabstr = 0;
20549 if ( $tokenizer_self->{_tabs} ) {
20550 $input_tabstr = "\t";
20551 if ( length($leading_whitespace) > 0 ) {
20552 if ( $leading_whitespace !~ /\t/ ) {
20554 my $cols = $tokenizer_self->{_indent_columns};
20556 if ( length($leading_whitespace) < $cols ) {
20557 $cols = length($leading_whitespace);
20559 $input_tabstr = " " x $cols;
20564 $input_tabstr = " " x $tokenizer_self->{_indent_columns};
20566 if ( length($leading_whitespace) > 0 ) {
20567 if ( $leading_whitespace =~ /^\t/ ) {
20568 $input_tabstr = "\t";
20572 $tokenizer_self->{_know_input_tabstr} = $know_input_tabstr;
20573 $tokenizer_self->{_input_tabstr} = $input_tabstr;
20576 # determine the input tabbing scheme if possible
20577 if ( ( $know_input_tabstr == 0 )
20578 && ( length($leading_whitespace) > 0 )
20579 && ( $structural_indentation_level > 0 ) )
20581 my $saved_input_tabstr = $input_tabstr;
20583 # check for common case of one tab per indentation level
20584 if ( $leading_whitespace eq "\t" x $structural_indentation_level ) {
20585 if ( $leading_whitespace eq "\t" x $structural_indentation_level ) {
20586 $input_tabstr = "\t";
20587 $msg = "Guessing old indentation was tab character\n";
20593 # detab any tabs based on 8 blanks per tab
20595 if ( $leading_whitespace =~ s/^\t+/ /g ) {
20596 $entabbed = "entabbed";
20599 # now compute tabbing from number of spaces
20601 length($leading_whitespace) / $structural_indentation_level;
20602 if ( $columns == int $columns ) {
20604 "Guessing old indentation was $columns $entabbed spaces\n";
20607 $columns = int $columns;
20609 "old indentation is unclear, using $columns $entabbed spaces\n";
20611 $input_tabstr = " " x $columns;
20613 $know_input_tabstr = 1;
20614 $tokenizer_self->{_know_input_tabstr} = $know_input_tabstr;
20615 $tokenizer_self->{_input_tabstr} = $input_tabstr;
20617 # see if mistakes were made
20618 if ( ( $tokenizer_self->{_starting_level} > 0 )
20619 && !$tokenizer_self->{_know_starting_level} )
20622 if ( $input_tabstr ne $saved_input_tabstr ) {
20624 "I made a bad starting level guess; rerun with a value for -sil \n"
20630 # use current guess at input tabbing to get input indentation level
20632 # Patch to handle a common case of entabbed leading whitespace
20633 # If the leading whitespace equals 4 spaces and we also have
20634 # tabs, detab the input whitespace assuming 8 spaces per tab.
20635 if ( length($input_tabstr) == 4 ) {
20636 $leading_whitespace =~ s/^\t+/ /g;
20639 if ( ( my $len_tab = length($input_tabstr) ) > 0 ) {
20642 while ( substr( $leading_whitespace, $pos, $len_tab ) eq $input_tabstr )
20648 return ( $level, $msg );
20651 # This is a currently unused debug routine
20652 sub dump_functions {
20656 foreach $pkg ( keys %is_user_function ) {
20657 print $fh "\nnon-constant subs in package $pkg\n";
20659 foreach $sub ( keys %{ $is_user_function{$pkg} } ) {
20661 if ( $is_block_list_function{$pkg}{$sub} ) {
20662 $msg = 'block_list';
20665 if ( $is_block_function{$pkg}{$sub} ) {
20668 print $fh "$sub $msg\n";
20672 foreach $pkg ( keys %is_constant ) {
20673 print $fh "\nconstants and constant subs in package $pkg\n";
20675 foreach $sub ( keys %{ $is_constant{$pkg} } ) {
20676 print $fh "$sub\n";
20683 # count number of 1's in a string of 1's and 0's
20684 # example: ones_count("010101010101") gives 6
20685 return ( my $cis = $_[0] ) =~ tr/1/0/;
20688 sub prepare_for_a_new_file {
20690 # previous tokens needed to determine what to expect next
20691 $last_nonblank_token = ';'; # the only possible starting state which
20692 $last_nonblank_type = ';'; # will make a leading brace a code block
20693 $last_nonblank_block_type = '';
20695 # scalars for remembering statement types across multiple lines
20696 $statement_type = ''; # '' or 'use' or 'sub..' or 'case..'
20697 $in_attribute_list = 0;
20699 # scalars for remembering where we are in the file
20700 $current_package = "main";
20701 $context = UNKNOWN_CONTEXT;
20703 # hashes used to remember function information
20704 %is_constant = (); # user-defined constants
20705 %is_user_function = (); # user-defined functions
20706 %user_function_prototype = (); # their prototypes
20707 %is_block_function = ();
20708 %is_block_list_function = ();
20709 %saw_function_definition = ();
20711 # variables used to track depths of various containers
20712 # and report nesting errors
20715 $square_bracket_depth = 0;
20716 @current_depth[ 0 .. $#closing_brace_names ] =
20717 (0) x scalar @closing_brace_names;
20718 @nesting_sequence_number[ 0 .. $#closing_brace_names ] =
20719 ( 0 .. $#closing_brace_names );
20720 @current_sequence_number = ();
20721 $paren_type[$paren_depth] = '';
20722 $paren_semicolon_count[$paren_depth] = 0;
20723 $paren_structural_type[$brace_depth] = '';
20724 $brace_type[$brace_depth] = ';'; # identify opening brace as code block
20725 $brace_structural_type[$brace_depth] = '';
20726 $brace_statement_type[$brace_depth] = "";
20727 $brace_context[$brace_depth] = UNKNOWN_CONTEXT;
20728 $brace_package[$paren_depth] = $current_package;
20729 $square_bracket_type[$square_bracket_depth] = '';
20730 $square_bracket_structural_type[$square_bracket_depth] = '';
20732 initialize_tokenizer_state();
20735 { # begin tokenize_this_line
20737 use constant BRACE => 0;
20738 use constant SQUARE_BRACKET => 1;
20739 use constant PAREN => 2;
20740 use constant QUESTION_COLON => 3;
20742 # TV1: scalars for processing one LINE.
20743 # Re-initialized on each entry to sub tokenize_this_line.
20745 $block_type, $container_type, $expecting,
20746 $i, $i_tok, $input_line,
20747 $input_line_number, $last_nonblank_i, $max_token_index,
20748 $next_tok, $next_type, $peeked_ahead,
20749 $prototype, $rhere_target_list, $rtoken_map,
20750 $rtoken_type, $rtokens, $tok,
20751 $type, $type_sequence,
20754 # TV2: refs to ARRAYS for processing one LINE
20755 # Re-initialized on each call.
20756 my $routput_token_list = []; # stack of output token indexes
20757 my $routput_token_type = []; # token types
20758 my $routput_block_type = []; # types of code block
20759 my $routput_container_type = []; # paren types, such as if, elsif, ..
20760 my $routput_type_sequence = []; # nesting sequential number
20762 # TV3: SCALARS for quote variables. These are initialized with a
20763 # subroutine call and continually updated as lines are processed.
20764 my ( $in_quote, $quote_type, $quote_character, $quote_pos, $quote_depth,
20765 $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers, );
20767 # TV4: SCALARS for multi-line identifiers and
20768 # statements. These are initialized with a subroutine call
20769 # and continually updated as lines are processed.
20770 my ( $id_scan_state, $identifier, $want_paren, );
20772 # TV5: SCALARS for tracking indentation level.
20773 # Initialized once and continually updated as lines are
20776 $nesting_token_string, $nesting_type_string,
20777 $nesting_block_string, $nesting_block_flag,
20778 $nesting_list_string, $nesting_list_flag,
20779 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20780 $in_statement_continuation, $level_in_tokenizer,
20781 $slevel_in_tokenizer, $rslevel_stack,
20784 # TV6: SCALARS for remembering several previous
20785 # tokens. Initialized once and continually updated as
20786 # lines are processed.
20788 $last_nonblank_container_type, $last_nonblank_type_sequence,
20789 $last_last_nonblank_token, $last_last_nonblank_type,
20790 $last_last_nonblank_block_type, $last_last_nonblank_container_type,
20791 $last_last_nonblank_type_sequence, $last_nonblank_prototype,
20794 # ----------------------------------------------------------------
20795 # beginning of tokenizer variable access and manipulation routines
20796 # ----------------------------------------------------------------
20798 sub initialize_tokenizer_state {
20800 # TV1: initialized on each call
20801 # TV2: initialized on each call
20805 $quote_character = "";
20808 $quoted_string_1 = "";
20809 $quoted_string_2 = "";
20810 $allowed_quote_modifiers = "";
20813 $id_scan_state = '';
20818 $nesting_token_string = "";
20819 $nesting_type_string = "";
20820 $nesting_block_string = '1'; # initially in a block
20821 $nesting_block_flag = 1;
20822 $nesting_list_string = '0'; # initially not in a list
20823 $nesting_list_flag = 0; # initially not in a list
20824 $ci_string_in_tokenizer = "";
20825 $continuation_string_in_tokenizer = "0";
20826 $in_statement_continuation = 0;
20827 $level_in_tokenizer = 0;
20828 $slevel_in_tokenizer = 0;
20829 $rslevel_stack = [];
20832 $last_nonblank_container_type = '';
20833 $last_nonblank_type_sequence = '';
20834 $last_last_nonblank_token = ';';
20835 $last_last_nonblank_type = ';';
20836 $last_last_nonblank_block_type = '';
20837 $last_last_nonblank_container_type = '';
20838 $last_last_nonblank_type_sequence = '';
20839 $last_nonblank_prototype = "";
20842 sub save_tokenizer_state {
20845 $block_type, $container_type, $expecting,
20846 $i, $i_tok, $input_line,
20847 $input_line_number, $last_nonblank_i, $max_token_index,
20848 $next_tok, $next_type, $peeked_ahead,
20849 $prototype, $rhere_target_list, $rtoken_map,
20850 $rtoken_type, $rtokens, $tok,
20851 $type, $type_sequence,
20855 $routput_token_list, $routput_token_type,
20856 $routput_block_type, $routput_container_type,
20857 $routput_type_sequence,
20861 $in_quote, $quote_type,
20862 $quote_character, $quote_pos,
20863 $quote_depth, $quoted_string_1,
20864 $quoted_string_2, $allowed_quote_modifiers,
20867 my $rTV4 = [ $id_scan_state, $identifier, $want_paren, ];
20870 $nesting_token_string, $nesting_type_string,
20871 $nesting_block_string, $nesting_block_flag,
20872 $nesting_list_string, $nesting_list_flag,
20873 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20874 $in_statement_continuation, $level_in_tokenizer,
20875 $slevel_in_tokenizer, $rslevel_stack,
20879 $last_nonblank_container_type,
20880 $last_nonblank_type_sequence,
20881 $last_last_nonblank_token,
20882 $last_last_nonblank_type,
20883 $last_last_nonblank_block_type,
20884 $last_last_nonblank_container_type,
20885 $last_last_nonblank_type_sequence,
20886 $last_nonblank_prototype,
20888 return [ $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ];
20891 sub restore_tokenizer_state {
20893 my ( $rTV1, $rTV2, $rTV3, $rTV4, $rTV5, $rTV6 ) = @{$rstate};
20895 $block_type, $container_type, $expecting,
20896 $i, $i_tok, $input_line,
20897 $input_line_number, $last_nonblank_i, $max_token_index,
20898 $next_tok, $next_type, $peeked_ahead,
20899 $prototype, $rhere_target_list, $rtoken_map,
20900 $rtoken_type, $rtokens, $tok,
20901 $type, $type_sequence,
20905 $routput_token_list, $routput_token_type,
20906 $routput_block_type, $routput_container_type,
20907 $routput_type_sequence,
20911 $in_quote, $quote_type, $quote_character, $quote_pos, $quote_depth,
20912 $quoted_string_1, $quoted_string_2, $allowed_quote_modifiers,
20915 ( $id_scan_state, $identifier, $want_paren, ) = @{$rTV4};
20918 $nesting_token_string, $nesting_type_string,
20919 $nesting_block_string, $nesting_block_flag,
20920 $nesting_list_string, $nesting_list_flag,
20921 $ci_string_in_tokenizer, $continuation_string_in_tokenizer,
20922 $in_statement_continuation, $level_in_tokenizer,
20923 $slevel_in_tokenizer, $rslevel_stack,
20927 $last_nonblank_container_type,
20928 $last_nonblank_type_sequence,
20929 $last_last_nonblank_token,
20930 $last_last_nonblank_type,
20931 $last_last_nonblank_block_type,
20932 $last_last_nonblank_container_type,
20933 $last_last_nonblank_type_sequence,
20934 $last_nonblank_prototype,
20938 sub get_indentation_level {
20939 return $level_in_tokenizer;
20942 sub reset_indentation_level {
20943 $level_in_tokenizer = $_[0];
20944 $slevel_in_tokenizer = $_[0];
20945 push @{$rslevel_stack}, $slevel_in_tokenizer;
20949 $peeked_ahead = defined( $_[0] ) ? $_[0] : $peeked_ahead;
20952 # ------------------------------------------------------------
20953 # end of tokenizer variable access and manipulation routines
20954 # ------------------------------------------------------------
20956 # ------------------------------------------------------------
20957 # beginning of various scanner interface routines
20958 # ------------------------------------------------------------
20959 sub scan_replacement_text {
20961 # check for here-docs in replacement text invoked by
20962 # a substitution operator with executable modifier 'e'.
20965 # $replacement_text
20967 # $rht = reference to any here-doc targets
20968 my ($replacement_text) = @_;
20971 return undef unless ( $replacement_text =~ /<</ );
20973 write_logfile_entry("scanning replacement text for here-doc targets\n");
20975 # save the logger object for error messages
20976 my $logger_object = $tokenizer_self->{_logger_object};
20978 # localize all package variables
20980 $tokenizer_self, $last_nonblank_token,
20981 $last_nonblank_type, $last_nonblank_block_type,
20982 $statement_type, $in_attribute_list,
20983 $current_package, $context,
20984 %is_constant, %is_user_function,
20985 %user_function_prototype, %is_block_function,
20986 %is_block_list_function, %saw_function_definition,
20987 $brace_depth, $paren_depth,
20988 $square_bracket_depth, @current_depth,
20989 @nesting_sequence_number, @current_sequence_number,
20990 @paren_type, @paren_semicolon_count,
20991 @paren_structural_type, @brace_type,
20992 @brace_structural_type, @brace_statement_type,
20993 @brace_context, @brace_package,
20994 @square_bracket_type, @square_bracket_structural_type,
20995 @depth_array, @starting_line_of_current_depth,
20998 # save all lexical variables
20999 my $rstate = save_tokenizer_state();
21000 _decrement_count(); # avoid error check for multiple tokenizers
21002 # make a new tokenizer
21004 my $rpending_logfile_message;
21005 my $source_object =
21006 Perl::Tidy::LineSource->new( \$replacement_text, $rOpts,
21007 $rpending_logfile_message );
21008 my $tokenizer = Perl::Tidy::Tokenizer->new(
21009 source_object => $source_object,
21010 logger_object => $logger_object,
21011 starting_line_number => $input_line_number,
21014 # scan the replacement text
21015 1 while ( $tokenizer->get_line() );
21017 # remove any here doc targets
21019 if ( $tokenizer_self->{_in_here_doc} ) {
21023 $tokenizer_self->{_here_doc_target},
21024 $tokenizer_self->{_here_quote_character}
21026 if ( $tokenizer_self->{_rhere_target_list} ) {
21027 push @{$rht}, @{ $tokenizer_self->{_rhere_target_list} };
21028 $tokenizer_self->{_rhere_target_list} = undef;
21030 $tokenizer_self->{_in_here_doc} = undef;
21033 # now its safe to report errors
21034 $tokenizer->report_tokenization_errors();
21036 # restore all tokenizer lexical variables
21037 restore_tokenizer_state($rstate);
21039 # return the here doc targets
21043 sub scan_bare_identifier {
21044 ( $i, $tok, $type, $prototype ) =
21045 scan_bare_identifier_do( $input_line, $i, $tok, $type, $prototype,
21046 $rtoken_map, $max_token_index );
21049 sub scan_identifier {
21050 ( $i, $tok, $type, $id_scan_state, $identifier ) =
21051 scan_identifier_do( $i, $id_scan_state, $identifier, $rtokens,
21052 $max_token_index );
21056 ( $i, $tok, $type, $id_scan_state ) =
21057 scan_id_do( $input_line, $i, $tok, $rtokens, $rtoken_map,
21058 $id_scan_state, $max_token_index );
21063 ( $i, $type, $number ) =
21064 scan_number_do( $input_line, $i, $rtoken_map, $type,
21065 $max_token_index );
21069 # a sub to warn if token found where term expected
21070 sub error_if_expecting_TERM {
21071 if ( $expecting == TERM ) {
21072 if ( $really_want_term{$last_nonblank_type} ) {
21073 unexpected( $tok, "term", $i_tok, $last_nonblank_i, $rtoken_map,
21074 $rtoken_type, $input_line );
21080 # a sub to warn if token found where operator expected
21081 sub error_if_expecting_OPERATOR {
21082 if ( $expecting == OPERATOR ) {
21083 my $thing = defined $_[0] ? $_[0] : $tok;
21084 unexpected( $thing, "operator", $i_tok, $last_nonblank_i,
21085 $rtoken_map, $rtoken_type, $input_line );
21086 if ( $i_tok == 0 ) {
21087 interrupt_logfile();
21088 warning("Missing ';' above?\n");
21095 # ------------------------------------------------------------
21096 # end scanner interfaces
21097 # ------------------------------------------------------------
21099 my %is_for_foreach;
21100 @_ = qw(for foreach);
21101 @is_for_foreach{@_} = (1) x scalar(@_);
21105 @is_my_our{@_} = (1) x scalar(@_);
21107 # These keywords may introduce blocks after parenthesized expressions,
21109 # keyword ( .... ) { BLOCK }
21110 # patch for SWITCH/CASE: added 'switch' 'case' 'given' 'when'
21111 my %is_blocktype_with_paren;
21112 @_ = qw(if elsif unless while until for foreach switch case given when);
21113 @is_blocktype_with_paren{@_} = (1) x scalar(@_);
21115 # ------------------------------------------------------------
21116 # begin hash of code for handling most token types
21117 # ------------------------------------------------------------
21118 my $tokenization_code = {
21120 # no special code for these types yet, but syntax checks
21155 error_if_expecting_TERM()
21156 if ( $expecting == TERM );
21159 error_if_expecting_TERM()
21160 if ( $expecting == TERM );
21164 # start looking for a scalar
21165 error_if_expecting_OPERATOR("Scalar")
21166 if ( $expecting == OPERATOR );
21169 if ( $identifier eq '$^W' ) {
21170 $tokenizer_self->{_saw_perl_dash_w} = 1;
21173 # Check for indentifier in indirect object slot
21174 # (vorboard.pl, sort.t). Something like:
21175 # /^(print|printf|sort|exec|system)$/
21177 $is_indirect_object_taker{$last_nonblank_token}
21179 || ( ( $last_nonblank_token eq '(' )
21180 && $is_indirect_object_taker{ $paren_type[$paren_depth] } )
21181 || ( $last_nonblank_type =~ /^[Uw]$/ ) # possible object
21190 $paren_semicolon_count[$paren_depth] = 0;
21192 $container_type = $want_paren;
21196 $container_type = $last_nonblank_token;
21198 # We can check for a syntax error here of unexpected '(',
21199 # but this is going to get messy...
21201 $expecting == OPERATOR
21203 # be sure this is not a method call of the form
21204 # &method(...), $method->(..), &{method}(...),
21205 # $ref[2](list) is ok & short for $ref[2]->(list)
21206 # NOTE: at present, braces in something like &{ xxx }
21207 # are not marked as a block, we might have a method call
21208 && $last_nonblank_token !~ /^([\]\}\&]|\-\>)/
21213 # ref: camel 3 p 703.
21214 if ( $last_last_nonblank_token eq 'do' ) {
21216 "do SUBROUTINE is deprecated; consider & or -> notation\n"
21221 # if this is an empty list, (), then it is not an
21222 # error; for example, we might have a constant pi and
21223 # invoke it with pi() or just pi;
21224 my ( $next_nonblank_token, $i_next ) =
21225 find_next_nonblank_token( $i, $rtokens,
21226 $max_token_index );
21227 if ( $next_nonblank_token ne ')' ) {
21229 error_if_expecting_OPERATOR('(');
21231 if ( $last_nonblank_type eq 'C' ) {
21233 "$last_nonblank_token has a void prototype\n";
21235 elsif ( $last_nonblank_type eq 'i' ) {
21237 && $last_nonblank_token =~ /^\$/ )
21240 "Do you mean '$last_nonblank_token->(' ?\n";
21244 interrupt_logfile();
21248 } ## end if ( $next_nonblank_token...
21249 } ## end else [ if ( $last_last_nonblank_token...
21250 } ## end if ( $expecting == OPERATOR...
21252 $paren_type[$paren_depth] = $container_type;
21254 increase_nesting_depth( PAREN, $$rtoken_map[$i_tok] );
21256 # propagate types down through nested parens
21257 # for example: the second paren in 'if ((' would be structural
21258 # since the first is.
21260 if ( $last_nonblank_token eq '(' ) {
21261 $type = $last_nonblank_type;
21264 # We exclude parens as structural after a ',' because it
21265 # causes subtle problems with continuation indentation for
21266 # something like this, where the first 'or' will not get
21271 # ( not defined $check )
21273 # or $check eq "new"
21274 # or $check eq "old",
21277 # Likewise, we exclude parens where a statement can start
21278 # because of problems with continuation indentation, like
21281 # ($firstline =~ /^#\!.*perl/)
21282 # and (print $File::Find::name, "\n")
21285 # (ref($usage_fref) =~ /CODE/)
21287 # : (&blast_usage, &blast_params, &blast_general_params);
21293 if ( $last_nonblank_type eq ')' ) {
21295 "Syntax error? found token '$last_nonblank_type' then '('\n"
21298 $paren_structural_type[$paren_depth] = $type;
21303 decrease_nesting_depth( PAREN, $$rtoken_map[$i_tok] );
21305 if ( $paren_structural_type[$paren_depth] eq '{' ) {
21309 $container_type = $paren_type[$paren_depth];
21311 # /^(for|foreach)$/
21312 if ( $is_for_foreach{ $paren_type[$paren_depth] } ) {
21313 my $num_sc = $paren_semicolon_count[$paren_depth];
21314 if ( $num_sc > 0 && $num_sc != 2 ) {
21315 warning("Expected 2 ';' in 'for(;;)' but saw $num_sc\n");
21319 if ( $paren_depth > 0 ) { $paren_depth-- }
21322 if ( $last_nonblank_type eq ',' ) {
21323 complain("Repeated ','s \n");
21326 # patch for operator_expected: note if we are in the list (use.t)
21327 if ( $statement_type eq 'use' ) { $statement_type = '_use' }
21328 ## FIXME: need to move this elsewhere, perhaps check after a '('
21329 ## elsif ($last_nonblank_token eq '(') {
21330 ## warning("Leading ','s illegal in some versions of perl\n");
21334 $context = UNKNOWN_CONTEXT;
21335 $statement_type = '';
21337 # /^(for|foreach)$/
21338 if ( $is_for_foreach{ $paren_type[$paren_depth] } )
21339 { # mark ; in for loop
21341 # Be careful: we do not want a semicolon such as the
21342 # following to be included:
21344 # for (sort {strcoll($a,$b);} keys %investments) {
21346 if ( $brace_depth == $depth_array[PAREN][BRACE][$paren_depth]
21347 && $square_bracket_depth ==
21348 $depth_array[PAREN][SQUARE_BRACKET][$paren_depth] )
21352 $paren_semicolon_count[$paren_depth]++;
21358 error_if_expecting_OPERATOR("String")
21359 if ( $expecting == OPERATOR );
21362 $allowed_quote_modifiers = "";
21365 error_if_expecting_OPERATOR("String")
21366 if ( $expecting == OPERATOR );
21369 $allowed_quote_modifiers = "";
21372 error_if_expecting_OPERATOR("String")
21373 if ( $expecting == OPERATOR );
21376 $allowed_quote_modifiers = "";
21381 if ( $expecting == UNKNOWN ) { # indeterminte, must guess..
21383 ( $is_pattern, $msg ) =
21384 guess_if_pattern_or_division( $i, $rtokens, $rtoken_map,
21385 $max_token_index );
21388 write_diagnostics("DIVIDE:$msg\n");
21389 write_logfile_entry($msg);
21392 else { $is_pattern = ( $expecting == TERM ) }
21397 $allowed_quote_modifiers = '[cgimosx]';
21399 else { # not a pattern; check for a /= token
21401 if ( $$rtokens[ $i + 1 ] eq '=' ) { # form token /=
21407 #DEBUG - collecting info on what tokens follow a divide
21408 # for development of guessing algorithm
21409 #if ( numerator_expected( $i, $rtokens, $max_token_index ) < 0 ) {
21410 # #write_diagnostics( "DIVIDE? $input_line\n" );
21416 # if we just saw a ')', we will label this block with
21417 # its type. We need to do this to allow sub
21418 # code_block_type to determine if this brace starts a
21419 # code block or anonymous hash. (The type of a paren
21420 # pair is the preceding token, such as 'if', 'else',
21422 $container_type = "";
21424 # ATTRS: for a '{' following an attribute list, reset
21425 # things to look like we just saw the sub name
21426 if ( $statement_type =~ /^sub/ ) {
21427 $last_nonblank_token = $statement_type;
21428 $last_nonblank_type = 'i';
21429 $statement_type = "";
21432 # patch for SWITCH/CASE: hide these keywords from an immediately
21433 # following opening brace
21434 elsif ( ( $statement_type eq 'case' || $statement_type eq 'when' )
21435 && $statement_type eq $last_nonblank_token )
21437 $last_nonblank_token = ";";
21440 elsif ( $last_nonblank_token eq ')' ) {
21441 $last_nonblank_token = $paren_type[ $paren_depth + 1 ];
21443 # defensive move in case of a nesting error (pbug.t)
21444 # in which this ')' had no previous '('
21445 # this nesting error will have been caught
21446 if ( !defined($last_nonblank_token) ) {
21447 $last_nonblank_token = 'if';
21450 # check for syntax error here;
21451 unless ( $is_blocktype_with_paren{$last_nonblank_token} ) {
21452 my $list = join( ' ', sort keys %is_blocktype_with_paren );
21454 "syntax error at ') {', didn't see one of: $list\n");
21458 # patch for paren-less for/foreach glitch, part 2.
21459 # see note below under 'qw'
21460 elsif ($last_nonblank_token eq 'qw'
21461 && $is_for_foreach{$want_paren} )
21463 $last_nonblank_token = $want_paren;
21464 if ( $last_last_nonblank_token eq $want_paren ) {
21466 "syntax error at '$want_paren .. {' -- missing \$ loop variable\n"
21473 # now identify which of the three possible types of
21474 # curly braces we have: hash index container, anonymous
21475 # hash reference, or code block.
21477 # non-structural (hash index) curly brace pair
21478 # get marked 'L' and 'R'
21479 if ( is_non_structural_brace() ) {
21482 # patch for SWITCH/CASE:
21483 # allow paren-less identifier after 'when'
21484 # if the brace is preceded by a space
21485 if ( $statement_type eq 'when'
21486 && $last_nonblank_type eq 'i'
21487 && $last_last_nonblank_type eq 'k'
21488 && ( $i_tok == 0 || $rtoken_type->[ $i_tok - 1 ] eq 'b' ) )
21491 $block_type = $statement_type;
21495 # code and anonymous hash have the same type, '{', but are
21496 # distinguished by 'block_type',
21497 # which will be blank for an anonymous hash
21500 $block_type = code_block_type( $i_tok, $rtokens, $rtoken_type,
21501 $max_token_index );
21503 # patch to promote bareword type to function taking block
21505 && $last_nonblank_type eq 'w'
21506 && $last_nonblank_i >= 0 )
21508 if ( $routput_token_type->[$last_nonblank_i] eq 'w' ) {
21509 $routput_token_type->[$last_nonblank_i] = 'G';
21513 # patch for SWITCH/CASE: if we find a stray opening block brace
21514 # where we might accept a 'case' or 'when' block, then take it
21515 if ( $statement_type eq 'case'
21516 || $statement_type eq 'when' )
21518 if ( !$block_type || $block_type eq '}' ) {
21519 $block_type = $statement_type;
21523 $brace_type[ ++$brace_depth ] = $block_type;
21524 $brace_package[$brace_depth] = $current_package;
21526 increase_nesting_depth( BRACE, $$rtoken_map[$i_tok] );
21527 $brace_structural_type[$brace_depth] = $type;
21528 $brace_context[$brace_depth] = $context;
21529 $brace_statement_type[$brace_depth] = $statement_type;
21532 $block_type = $brace_type[$brace_depth];
21533 if ($block_type) { $statement_type = '' }
21534 if ( defined( $brace_package[$brace_depth] ) ) {
21535 $current_package = $brace_package[$brace_depth];
21538 # can happen on brace error (caught elsewhere)
21542 decrease_nesting_depth( BRACE, $$rtoken_map[$i_tok] );
21544 if ( $brace_structural_type[$brace_depth] eq 'L' ) {
21548 # propagate type information for 'do' and 'eval' blocks.
21549 # This is necessary to enable us to know if an operator
21550 # or term is expected next
21551 if ( $is_block_operator{ $brace_type[$brace_depth] } ) {
21552 $tok = $brace_type[$brace_depth];
21555 $context = $brace_context[$brace_depth];
21556 $statement_type = $brace_statement_type[$brace_depth];
21557 if ( $brace_depth > 0 ) { $brace_depth--; }
21559 '&' => sub { # maybe sub call? start looking
21561 # We have to check for sub call unless we are sure we
21562 # are expecting an operator. This example from s2p
21563 # got mistaken as a q operator in an early version:
21564 # print BODY &q(<<'EOT');
21565 if ( $expecting != OPERATOR ) {
21571 '<' => sub { # angle operator or less than?
21573 if ( $expecting != OPERATOR ) {
21575 find_angle_operator_termination( $input_line, $i, $rtoken_map,
21576 $expecting, $max_token_index );
21582 '?' => sub { # ?: conditional or starting pattern?
21586 if ( $expecting == UNKNOWN ) {
21589 ( $is_pattern, $msg ) =
21590 guess_if_pattern_or_conditional( $i, $rtokens, $rtoken_map,
21591 $max_token_index );
21593 if ($msg) { write_logfile_entry($msg) }
21595 else { $is_pattern = ( $expecting == TERM ) }
21600 $allowed_quote_modifiers = '[cgimosx]'; # TBD:check this
21604 increase_nesting_depth( QUESTION_COLON,
21605 $$rtoken_map[$i_tok] );
21608 '*' => sub { # typeglob, or multiply?
21610 if ( $expecting == TERM ) {
21615 if ( $$rtokens[ $i + 1 ] eq '=' ) {
21620 elsif ( $$rtokens[ $i + 1 ] eq '*' ) {
21624 if ( $$rtokens[ $i + 1 ] eq '=' ) {
21632 '.' => sub { # what kind of . ?
21634 if ( $expecting != OPERATOR ) {
21636 if ( $type eq '.' ) {
21637 error_if_expecting_TERM()
21638 if ( $expecting == TERM );
21646 # if this is the first nonblank character, call it a label
21647 # since perl seems to just swallow it
21648 if ( $input_line_number == 1 && $last_nonblank_i == -1 ) {
21652 # ATTRS: check for a ':' which introduces an attribute list
21653 # (this might eventually get its own token type)
21654 elsif ( $statement_type =~ /^sub/ ) {
21656 $in_attribute_list = 1;
21659 # check for scalar attribute, such as
21660 # my $foo : shared = 1;
21661 elsif ($is_my_our{$statement_type}
21662 && $current_depth[QUESTION_COLON] == 0 )
21665 $in_attribute_list = 1;
21668 # otherwise, it should be part of a ?/: operator
21671 decrease_nesting_depth( QUESTION_COLON,
21672 $$rtoken_map[$i_tok] );
21673 if ( $last_nonblank_token eq '?' ) {
21674 warning("Syntax error near ? :\n");
21678 '+' => sub { # what kind of plus?
21680 if ( $expecting == TERM ) {
21681 my $number = scan_number();
21683 # unary plus is safest assumption if not a number
21684 if ( !defined($number) ) { $type = 'p'; }
21686 elsif ( $expecting == OPERATOR ) {
21689 if ( $next_type eq 'w' ) { $type = 'p' }
21694 error_if_expecting_OPERATOR("Array")
21695 if ( $expecting == OPERATOR );
21698 '%' => sub { # hash or modulo?
21700 # first guess is hash if no following blank
21701 if ( $expecting == UNKNOWN ) {
21702 if ( $next_type ne 'b' ) { $expecting = TERM }
21704 if ( $expecting == TERM ) {
21709 $square_bracket_type[ ++$square_bracket_depth ] =
21710 $last_nonblank_token;
21712 increase_nesting_depth( SQUARE_BRACKET, $$rtoken_map[$i_tok] );
21714 # It may seem odd, but structural square brackets have
21715 # type '{' and '}'. This simplifies the indentation logic.
21716 if ( !is_non_structural_brace() ) {
21719 $square_bracket_structural_type[$square_bracket_depth] = $type;
21723 decrease_nesting_depth( SQUARE_BRACKET, $$rtoken_map[$i_tok] );
21725 if ( $square_bracket_structural_type[$square_bracket_depth] eq '{' )
21729 if ( $square_bracket_depth > 0 ) { $square_bracket_depth--; }
21731 '-' => sub { # what kind of minus?
21733 if ( ( $expecting != OPERATOR )
21734 && $is_file_test_operator{$next_tok} )
21740 elsif ( $expecting == TERM ) {
21741 my $number = scan_number();
21743 # maybe part of bareword token? unary is safest
21744 if ( !defined($number) ) { $type = 'm'; }
21747 elsif ( $expecting == OPERATOR ) {
21751 if ( $next_type eq 'w' ) {
21759 # check for special variables like ${^WARNING_BITS}
21760 if ( $expecting == TERM ) {
21762 # FIXME: this should work but will not catch errors
21763 # because we also have to be sure that previous token is
21764 # a type character ($,@,%).
21765 if ( $last_nonblank_token eq '{'
21766 && ( $next_tok =~ /^[A-Za-z_]/ ) )
21769 if ( $next_tok eq 'W' ) {
21770 $tokenizer_self->{_saw_perl_dash_w} = 1;
21772 $tok = $tok . $next_tok;
21778 unless ( error_if_expecting_TERM() ) {
21780 # Something like this is valid but strange:
21782 complain("The '^' seems unusual here\n");
21788 '::' => sub { # probably a sub call
21789 scan_bare_identifier();
21791 '<<' => sub { # maybe a here-doc?
21793 unless ( $i < $max_token_index )
21794 ; # here-doc not possible if end of line
21796 if ( $expecting != OPERATOR ) {
21797 my ( $found_target, $here_doc_target, $here_quote_character,
21800 $found_target, $here_doc_target, $here_quote_character, $i,
21803 = find_here_doc( $expecting, $i, $rtokens, $rtoken_map,
21804 $max_token_index );
21806 if ($found_target) {
21807 push @{$rhere_target_list},
21808 [ $here_doc_target, $here_quote_character ];
21810 if ( length($here_doc_target) > 80 ) {
21811 my $truncated = substr( $here_doc_target, 0, 80 );
21812 complain("Long here-target: '$truncated' ...\n");
21814 elsif ( $here_doc_target !~ /^[A-Z_]\w+$/ ) {
21816 "Unconventional here-target: '$here_doc_target'\n"
21820 elsif ( $expecting == TERM ) {
21821 unless ($saw_error) {
21823 # shouldn't happen..
21824 warning("Program bug; didn't find here doc target\n");
21825 report_definite_bug();
21834 # if -> points to a bare word, we must scan for an identifier,
21835 # otherwise something like ->y would look like the y operator
21839 # type = 'pp' for pre-increment, '++' for post-increment
21841 if ( $expecting == TERM ) { $type = 'pp' }
21842 elsif ( $expecting == UNKNOWN ) {
21843 my ( $next_nonblank_token, $i_next ) =
21844 find_next_nonblank_token( $i, $rtokens, $max_token_index );
21845 if ( $next_nonblank_token eq '$' ) { $type = 'pp' }
21850 if ( $last_nonblank_type eq $tok ) {
21851 complain("Repeated '=>'s \n");
21854 # patch for operator_expected: note if we are in the list (use.t)
21855 # TODO: make version numbers a new token type
21856 if ( $statement_type eq 'use' ) { $statement_type = '_use' }
21859 # type = 'mm' for pre-decrement, '--' for post-decrement
21862 if ( $expecting == TERM ) { $type = 'mm' }
21863 elsif ( $expecting == UNKNOWN ) {
21864 my ( $next_nonblank_token, $i_next ) =
21865 find_next_nonblank_token( $i, $rtokens, $max_token_index );
21866 if ( $next_nonblank_token eq '$' ) { $type = 'mm' }
21871 error_if_expecting_TERM()
21872 if ( $expecting == TERM );
21876 error_if_expecting_TERM()
21877 if ( $expecting == TERM );
21881 error_if_expecting_TERM()
21882 if ( $expecting == TERM );
21886 # ------------------------------------------------------------
21887 # end hash of code for handling individual token types
21888 # ------------------------------------------------------------
21890 my %matching_start_token = ( '}' => '{', ']' => '[', ')' => '(' );
21892 # These block types terminate statements and do not need a trailing
21894 # patched for SWITCH/CASE:
21895 my %is_zero_continuation_block_type;
21896 @_ = qw( } { BEGIN END CHECK INIT AUTOLOAD DESTROY continue ;
21897 if elsif else unless while until for foreach switch case given when);
21898 @is_zero_continuation_block_type{@_} = (1) x scalar(@_);
21900 my %is_not_zero_continuation_block_type;
21901 @_ = qw(sort grep map do eval);
21902 @is_not_zero_continuation_block_type{@_} = (1) x scalar(@_);
21904 my %is_logical_container;
21905 @_ = qw(if elsif unless while and or err not && ! || for foreach);
21906 @is_logical_container{@_} = (1) x scalar(@_);
21908 my %is_binary_type;
21910 @is_binary_type{@_} = (1) x scalar(@_);
21912 my %is_binary_keyword;
21913 @_ = qw(and or err eq ne cmp);
21914 @is_binary_keyword{@_} = (1) x scalar(@_);
21916 # 'L' is token for opening { at hash key
21917 my %is_opening_type;
21918 @_ = qw" L { ( [ ";
21919 @is_opening_type{@_} = (1) x scalar(@_);
21921 # 'R' is token for closing } at hash key
21922 my %is_closing_type;
21923 @_ = qw" R } ) ] ";
21924 @is_closing_type{@_} = (1) x scalar(@_);
21926 my %is_redo_last_next_goto;
21927 @_ = qw(redo last next goto);
21928 @is_redo_last_next_goto{@_} = (1) x scalar(@_);
21930 my %is_use_require;
21931 @_ = qw(use require);
21932 @is_use_require{@_} = (1) x scalar(@_);
21934 my %is_sub_package;
21935 @_ = qw(sub package);
21936 @is_sub_package{@_} = (1) x scalar(@_);
21938 # This hash holds the hash key in $tokenizer_self for these keywords:
21939 my %is_format_END_DATA = (
21940 'format' => '_in_format',
21941 '__END__' => '_in_end',
21942 '__DATA__' => '_in_data',
21945 # ref: camel 3 p 147,
21946 # but perl may accept undocumented flags
21947 my %quote_modifiers = (
21948 's' => '[cegimosx]',
21951 'm' => '[cgimosx]',
21959 # table showing how many quoted things to look for after quote operator..
21960 # s, y, tr have 2 (pattern and replacement)
21961 # others have 1 (pattern only)
21962 my %quote_items = (
21974 sub tokenize_this_line {
21976 # This routine breaks a line of perl code into tokens which are of use in
21977 # indentation and reformatting. One of my goals has been to define tokens
21978 # such that a newline may be inserted between any pair of tokens without
21979 # changing or invalidating the program. This version comes close to this,
21980 # although there are necessarily a few exceptions which must be caught by
21981 # the formatter. Many of these involve the treatment of bare words.
21983 # The tokens and their types are returned in arrays. See previous
21984 # routine for their names.
21986 # See also the array "valid_token_types" in the BEGIN section for an
21989 # To simplify things, token types are either a single character, or they
21990 # are identical to the tokens themselves.
21992 # As a debugging aid, the -D flag creates a file containing a side-by-side
21993 # comparison of the input string and its tokenization for each line of a file.
21994 # This is an invaluable debugging aid.
21996 # In addition to tokens, and some associated quantities, the tokenizer
21997 # also returns flags indication any special line types. These include
21998 # quotes, here_docs, formats.
22000 # -----------------------------------------------------------------------
22002 # How to add NEW_TOKENS:
22004 # New token types will undoubtedly be needed in the future both to keep up
22005 # with changes in perl and to help adapt the tokenizer to other applications.
22007 # Here are some notes on the minimal steps. I wrote these notes while
22008 # adding the 'v' token type for v-strings, which are things like version
22009 # numbers 5.6.0, and ip addresses, and will use that as an example. ( You
22010 # can use your editor to search for the string "NEW_TOKENS" to find the
22011 # appropriate sections to change):
22013 # *. Try to talk somebody else into doing it! If not, ..
22015 # *. Make a backup of your current version in case things don't work out!
22017 # *. Think of a new, unused character for the token type, and add to
22018 # the array @valid_token_types in the BEGIN section of this package.
22019 # For example, I used 'v' for v-strings.
22021 # *. Implement coding to recognize the $type of the token in this routine.
22022 # This is the hardest part, and is best done by immitating or modifying
22023 # some of the existing coding. For example, to recognize v-strings, I
22024 # patched 'sub scan_bare_identifier' to recognize v-strings beginning with
22025 # 'v' and 'sub scan_number' to recognize v-strings without the leading 'v'.
22027 # *. Update sub operator_expected. This update is critically important but
22028 # the coding is trivial. Look at the comments in that routine for help.
22029 # For v-strings, which should behave like numbers, I just added 'v' to the
22030 # regex used to handle numbers and strings (types 'n' and 'Q').
22032 # *. Implement a 'bond strength' rule in sub set_bond_strengths in
22033 # Perl::Tidy::Formatter for breaking lines around this token type. You can
22034 # skip this step and take the default at first, then adjust later to get
22035 # desired results. For adding type 'v', I looked at sub bond_strength and
22036 # saw that number type 'n' was using default strengths, so I didn't do
22037 # anything. I may tune it up someday if I don't like the way line
22038 # breaks with v-strings look.
22040 # *. Implement a 'whitespace' rule in sub set_white_space_flag in
22041 # Perl::Tidy::Formatter. For adding type 'v', I looked at this routine
22042 # and saw that type 'n' used spaces on both sides, so I just added 'v'
22043 # to the array @spaces_both_sides.
22045 # *. Update HtmlWriter package so that users can colorize the token as
22046 # desired. This is quite easy; see comments identified by 'NEW_TOKENS' in
22047 # that package. For v-strings, I initially chose to use a default color
22048 # equal to the default for numbers, but it might be nice to change that
22051 # *. Update comments in Perl::Tidy::Tokenizer::dump_token_types.
22053 # *. Run lots and lots of debug tests. Start with special files designed
22054 # to test the new token type. Run with the -D flag to create a .DEBUG
22055 # file which shows the tokenization. When these work ok, test as many old
22056 # scripts as possible. Start with all of the '.t' files in the 'test'
22057 # directory of the distribution file. Compare .tdy output with previous
22058 # version and updated version to see the differences. Then include as
22059 # many more files as possible. My own technique has been to collect a huge
22060 # number of perl scripts (thousands!) into one directory and run perltidy
22061 # *, then run diff between the output of the previous version and the
22064 # *. For another example, search for the smartmatch operator '~~'
22065 # with your editor to see where updates were made for it.
22067 # -----------------------------------------------------------------------
22069 my $line_of_tokens = shift;
22070 my ($untrimmed_input_line) = $line_of_tokens->{_line_text};
22072 # patch while coding change is underway
22073 # make callers private data to allow access
22074 # $tokenizer_self = $caller_tokenizer_self;
22076 # extract line number for use in error messages
22077 $input_line_number = $line_of_tokens->{_line_number};
22079 # reinitialize for multi-line quote
22080 $line_of_tokens->{_starting_in_quote} = $in_quote && $quote_type eq 'Q';
22082 # check for pod documentation
22083 if ( ( $untrimmed_input_line =~ /^=[A-Za-z_]/ ) ) {
22085 # must not be in multi-line quote
22086 # and must not be in an eqn
22087 if ( !$in_quote and ( operator_expected( 'b', '=', 'b' ) == TERM ) )
22089 $tokenizer_self->{_in_pod} = 1;
22094 $input_line = $untrimmed_input_line;
22098 # trim start of this line unless we are continuing a quoted line
22099 # do not trim end because we might end in a quote (test: deken4.pl)
22100 # Perl::Tidy::Formatter will delete needless trailing blanks
22101 unless ( $in_quote && ( $quote_type eq 'Q' ) ) {
22102 $input_line =~ s/^\s*//; # trim left end
22105 # update the copy of the line for use in error messages
22106 # This must be exactly what we give the pre_tokenizer
22107 $tokenizer_self->{_line_text} = $input_line;
22109 # re-initialize for the main loop
22110 $routput_token_list = []; # stack of output token indexes
22111 $routput_token_type = []; # token types
22112 $routput_block_type = []; # types of code block
22113 $routput_container_type = []; # paren types, such as if, elsif, ..
22114 $routput_type_sequence = []; # nesting sequential number
22116 $rhere_target_list = [];
22118 $tok = $last_nonblank_token;
22119 $type = $last_nonblank_type;
22120 $prototype = $last_nonblank_prototype;
22121 $last_nonblank_i = -1;
22122 $block_type = $last_nonblank_block_type;
22123 $container_type = $last_nonblank_container_type;
22124 $type_sequence = $last_nonblank_type_sequence;
22127 # tokenization is done in two stages..
22128 # stage 1 is a very simple pre-tokenization
22129 my $max_tokens_wanted = 0; # this signals pre_tokenize to get all tokens
22131 # a little optimization for a full-line comment
22132 if ( !$in_quote && ( $input_line =~ /^#/ ) ) {
22133 $max_tokens_wanted = 1 # no use tokenizing a comment
22136 # start by breaking the line into pre-tokens
22137 ( $rtokens, $rtoken_map, $rtoken_type ) =
22138 pre_tokenize( $input_line, $max_tokens_wanted );
22140 $max_token_index = scalar(@$rtokens) - 1;
22141 push( @$rtokens, ' ', ' ', ' ' ); # extra whitespace simplifies logic
22142 push( @$rtoken_map, 0, 0, 0 ); # shouldn't be referenced
22143 push( @$rtoken_type, 'b', 'b', 'b' );
22145 # initialize for main loop
22146 for $i ( 0 .. $max_token_index + 3 ) {
22147 $routput_token_type->[$i] = "";
22148 $routput_block_type->[$i] = "";
22149 $routput_container_type->[$i] = "";
22150 $routput_type_sequence->[$i] = "";
22155 # ------------------------------------------------------------
22156 # begin main tokenization loop
22157 # ------------------------------------------------------------
22159 # we are looking at each pre-token of one line and combining them
22161 while ( ++$i <= $max_token_index ) {
22163 if ($in_quote) { # continue looking for end of a quote
22164 $type = $quote_type;
22166 unless ( @{$routput_token_list} )
22167 { # initialize if continuation line
22168 push( @{$routput_token_list}, $i );
22169 $routput_token_type->[$i] = $type;
22172 $tok = $quote_character unless ( $quote_character =~ /^\s*$/ );
22174 # scan for the end of the quote or pattern
22176 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
22177 $quoted_string_1, $quoted_string_2
22180 $i, $in_quote, $quote_character,
22181 $quote_pos, $quote_depth, $quoted_string_1,
22182 $quoted_string_2, $rtokens, $rtoken_map,
22186 # all done if we didn't find it
22187 last if ($in_quote);
22189 # save pattern and replacement text for rescanning
22190 my $qs1 = $quoted_string_1;
22191 my $qs2 = $quoted_string_2;
22193 # re-initialize for next search
22194 $quote_character = '';
22197 $quoted_string_1 = "";
22198 $quoted_string_2 = "";
22199 last if ( ++$i > $max_token_index );
22201 # look for any modifiers
22202 if ($allowed_quote_modifiers) {
22204 # check for exact quote modifiers
22205 if ( $$rtokens[$i] =~ /^[A-Za-z_]/ ) {
22206 my $str = $$rtokens[$i];
22207 my $saw_modifier_e;
22208 while ( $str =~ /\G$allowed_quote_modifiers/gc ) {
22209 my $pos = pos($str);
22210 my $char = substr( $str, $pos - 1, 1 );
22211 $saw_modifier_e ||= ( $char eq 'e' );
22214 # For an 'e' quote modifier we must scan the replacement
22215 # text for here-doc targets.
22216 if ($saw_modifier_e) {
22218 my $rht = scan_replacement_text($qs1);
22220 # Change type from 'Q' to 'h' for quotes with
22221 # here-doc targets so that the formatter (see sub
22222 # print_line_of_tokens) will not make any line
22223 # breaks after this point.
22225 push @{$rhere_target_list}, @{$rht};
22227 if ( $i_tok < 0 ) {
22228 my $ilast = $routput_token_list->[-1];
22229 $routput_token_type->[$ilast] = $type;
22234 if ( defined( pos($str) ) ) {
22237 if ( pos($str) == length($str) ) {
22238 last if ( ++$i > $max_token_index );
22241 # Looks like a joined quote modifier
22242 # and keyword, maybe something like
22243 # s/xxx/yyy/gefor @k=...
22244 # Example is "galgen.pl". Would have to split
22245 # the word and insert a new token in the
22246 # pre-token list. This is so rare that I haven't
22247 # done it. Will just issue a warning citation.
22249 # This error might also be triggered if my quote
22250 # modifier characters are incomplete
22254 Partial match to quote modifier $allowed_quote_modifiers at word: '$str'
22255 Please put a space between quote modifiers and trailing keywords.
22258 # print "token $$rtokens[$i]\n";
22259 # my $num = length($str) - pos($str);
22260 # $$rtokens[$i]=substr($$rtokens[$i],pos($str),$num);
22261 # print "continuing with new token $$rtokens[$i]\n";
22263 # skipping past this token does least damage
22264 last if ( ++$i > $max_token_index );
22269 # example file: rokicki4.pl
22270 # This error might also be triggered if my quote
22271 # modifier characters are incomplete
22272 write_logfile_entry(
22273 "Note: found word $str at quote modifier location\n"
22279 $allowed_quote_modifiers = "";
22283 unless ( $tok =~ /^\s*$/ ) {
22285 # try to catch some common errors
22286 if ( ( $type eq 'n' ) && ( $tok ne '0' ) ) {
22288 if ( $last_nonblank_token eq 'eq' ) {
22289 complain("Should 'eq' be '==' here ?\n");
22291 elsif ( $last_nonblank_token eq 'ne' ) {
22292 complain("Should 'ne' be '!=' here ?\n");
22296 $last_last_nonblank_token = $last_nonblank_token;
22297 $last_last_nonblank_type = $last_nonblank_type;
22298 $last_last_nonblank_block_type = $last_nonblank_block_type;
22299 $last_last_nonblank_container_type =
22300 $last_nonblank_container_type;
22301 $last_last_nonblank_type_sequence =
22302 $last_nonblank_type_sequence;
22303 $last_nonblank_token = $tok;
22304 $last_nonblank_type = $type;
22305 $last_nonblank_prototype = $prototype;
22306 $last_nonblank_block_type = $block_type;
22307 $last_nonblank_container_type = $container_type;
22308 $last_nonblank_type_sequence = $type_sequence;
22309 $last_nonblank_i = $i_tok;
22312 # store previous token type
22313 if ( $i_tok >= 0 ) {
22314 $routput_token_type->[$i_tok] = $type;
22315 $routput_block_type->[$i_tok] = $block_type;
22316 $routput_container_type->[$i_tok] = $container_type;
22317 $routput_type_sequence->[$i_tok] = $type_sequence;
22319 my $pre_tok = $$rtokens[$i]; # get the next pre-token
22320 my $pre_type = $$rtoken_type[$i]; # and type
22322 $type = $pre_type; # to be modified as necessary
22323 $block_type = ""; # blank for all tokens except code block braces
22324 $container_type = ""; # blank for all tokens except some parens
22325 $type_sequence = ""; # blank for all tokens except ?/:
22326 $prototype = ""; # blank for all tokens except user defined subs
22329 # this pre-token will start an output token
22330 push( @{$routput_token_list}, $i_tok );
22332 # continue gathering identifier if necessary
22333 # but do not start on blanks and comments
22334 if ( $id_scan_state && $pre_type !~ /[b#]/ ) {
22336 if ( $id_scan_state =~ /^(sub|package)/ ) {
22343 last if ($id_scan_state);
22344 next if ( ( $i > 0 ) || $type );
22346 # didn't find any token; start over
22351 # handle whitespace tokens..
22352 next if ( $type eq 'b' );
22353 my $prev_tok = $i > 0 ? $$rtokens[ $i - 1 ] : ' ';
22354 my $prev_type = $i > 0 ? $$rtoken_type[ $i - 1 ] : 'b';
22356 # Build larger tokens where possible, since we are not in a quote.
22358 # First try to assemble digraphs. The following tokens are
22359 # excluded and handled specially:
22360 # '/=' is excluded because the / might start a pattern.
22361 # 'x=' is excluded since it might be $x=, with $ on previous line
22362 # '**' and *= might be typeglobs of punctuation variables
22363 # I have allowed tokens starting with <, such as <=,
22364 # because I don't think these could be valid angle operators.
22365 # test file: storrs4.pl
22366 my $test_tok = $tok . $$rtokens[ $i + 1 ];
22367 my $combine_ok = $is_digraph{$test_tok};
22369 # check for special cases which cannot be combined
22372 # '//' must be defined_or operator if an operator is expected.
22373 # TODO: Code for other ambiguous digraphs (/=, x=, **, *=)
22374 # could be migrated here for clarity
22375 if ( $test_tok eq '//' ) {
22376 my $next_type = $$rtokens[ $i + 1 ];
22378 operator_expected( $prev_type, $tok, $next_type );
22379 $combine_ok = 0 unless ( $expecting == OPERATOR );
22385 && ( $test_tok ne '/=' ) # might be pattern
22386 && ( $test_tok ne 'x=' ) # might be $x
22387 && ( $test_tok ne '**' ) # typeglob?
22388 && ( $test_tok ne '*=' ) # typeglob?
22394 # Now try to assemble trigraphs. Note that all possible
22395 # perl trigraphs can be constructed by appending a character
22397 $test_tok = $tok . $$rtokens[ $i + 1 ];
22399 if ( $is_trigraph{$test_tok} ) {
22406 $next_tok = $$rtokens[ $i + 1 ];
22407 $next_type = $$rtoken_type[ $i + 1 ];
22409 TOKENIZER_DEBUG_FLAG_TOKENIZE && do {
22412 $last_nonblank_token, $tok,
22413 $next_tok, $brace_depth,
22414 $brace_type[$brace_depth], $paren_depth,
22415 $paren_type[$paren_depth]
22417 print "TOKENIZE:(@debug_list)\n";
22420 # turn off attribute list on first non-blank, non-bareword
22421 if ( $pre_type ne 'w' ) { $in_attribute_list = 0 }
22423 ###############################################################
22424 # We have the next token, $tok.
22425 # Now we have to examine this token and decide what it is
22426 # and define its $type
22428 # section 1: bare words
22429 ###############################################################
22431 if ( $pre_type eq 'w' ) {
22432 $expecting = operator_expected( $prev_type, $tok, $next_type );
22433 my ( $next_nonblank_token, $i_next ) =
22434 find_next_nonblank_token( $i, $rtokens, $max_token_index );
22436 # ATTRS: handle sub and variable attributes
22437 if ($in_attribute_list) {
22439 # treat bare word followed by open paren like qw(
22440 if ( $next_nonblank_token eq '(' ) {
22441 $in_quote = $quote_items{'q'};
22442 $allowed_quote_modifiers = $quote_modifiers{'q'};
22448 # handle bareword not followed by open paren
22455 # quote a word followed by => operator
22456 if ( $next_nonblank_token eq '=' ) {
22458 if ( $$rtokens[ $i_next + 1 ] eq '>' ) {
22459 if ( $is_constant{$current_package}{$tok} ) {
22462 elsif ( $is_user_function{$current_package}{$tok} ) {
22465 $user_function_prototype{$current_package}{$tok};
22467 elsif ( $tok =~ /^v\d+$/ ) {
22469 report_v_string($tok);
22471 else { $type = 'w' }
22477 # quote a bare word within braces..like xxx->{s}; note that we
22478 # must be sure this is not a structural brace, to avoid
22479 # mistaking {s} in the following for a quoted bare word:
22480 # for(@[){s}bla}BLA}
22481 if ( ( $last_nonblank_type eq 'L' )
22482 && ( $next_nonblank_token eq '}' ) )
22488 # a bare word immediately followed by :: is not a keyword;
22489 # use $tok_kw when testing for keywords to avoid a mistake
22491 if ( $$rtokens[ $i + 1 ] eq ':' && $$rtokens[ $i + 2 ] eq ':' )
22496 # handle operator x (now we know it isn't $x=)
22497 if ( ( $tok =~ /^x\d*$/ ) && ( $expecting == OPERATOR ) ) {
22498 if ( $tok eq 'x' ) {
22500 if ( $$rtokens[ $i + 1 ] eq '=' ) { # x=
22510 # FIXME: Patch: mark something like x4 as an integer for now
22511 # It gets fixed downstream. This is easier than
22512 # splitting the pretoken.
22518 elsif ( ( $tok eq 'strict' )
22519 and ( $last_nonblank_token eq 'use' ) )
22521 $tokenizer_self->{_saw_use_strict} = 1;
22522 scan_bare_identifier();
22525 elsif ( ( $tok eq 'warnings' )
22526 and ( $last_nonblank_token eq 'use' ) )
22528 $tokenizer_self->{_saw_perl_dash_w} = 1;
22530 # scan as identifier, so that we pick up something like:
22531 # use warnings::register
22532 scan_bare_identifier();
22536 $tok eq 'AutoLoader'
22537 && $tokenizer_self->{_look_for_autoloader}
22539 $last_nonblank_token eq 'use'
22541 # these regexes are from AutoSplit.pm, which we want
22543 || $input_line =~ /^\s*(use|require)\s+AutoLoader\b/
22544 || $input_line =~ /\bISA\s*=.*\bAutoLoader\b/
22548 write_logfile_entry("AutoLoader seen, -nlal deactivates\n");
22549 $tokenizer_self->{_saw_autoloader} = 1;
22550 $tokenizer_self->{_look_for_autoloader} = 0;
22551 scan_bare_identifier();
22555 $tok eq 'SelfLoader'
22556 && $tokenizer_self->{_look_for_selfloader}
22557 && ( $last_nonblank_token eq 'use'
22558 || $input_line =~ /^\s*(use|require)\s+SelfLoader\b/
22559 || $input_line =~ /\bISA\s*=.*\bSelfLoader\b/ )
22562 write_logfile_entry("SelfLoader seen, -nlsl deactivates\n");
22563 $tokenizer_self->{_saw_selfloader} = 1;
22564 $tokenizer_self->{_look_for_selfloader} = 0;
22565 scan_bare_identifier();
22568 elsif ( ( $tok eq 'constant' )
22569 and ( $last_nonblank_token eq 'use' ) )
22571 scan_bare_identifier();
22572 my ( $next_nonblank_token, $i_next ) =
22573 find_next_nonblank_token( $i, $rtokens,
22574 $max_token_index );
22576 if ($next_nonblank_token) {
22578 if ( $is_keyword{$next_nonblank_token} ) {
22580 "Attempting to define constant '$next_nonblank_token' which is a perl keyword\n"
22584 # FIXME: could check for error in which next token is
22585 # not a word (number, punctuation, ..)
22587 $is_constant{$current_package}
22588 {$next_nonblank_token} = 1;
22593 # various quote operators
22594 elsif ( $is_q_qq_qw_qx_qr_s_y_tr_m{$tok} ) {
22595 if ( $expecting == OPERATOR ) {
22597 # patch for paren-less for/foreach glitch, part 1
22598 # perl will accept this construct as valid:
22600 # foreach my $key qw\Uno Due Tres Quadro\ {
22601 # print "Set $key\n";
22603 unless ( $tok eq 'qw' && $is_for_foreach{$want_paren} )
22605 error_if_expecting_OPERATOR();
22608 $in_quote = $quote_items{$tok};
22609 $allowed_quote_modifiers = $quote_modifiers{$tok};
22611 # All quote types are 'Q' except possibly qw quotes.
22612 # qw quotes are special in that they may generally be trimmed
22613 # of leading and trailing whitespace. So they are given a
22614 # separate type, 'q', unless requested otherwise.
22616 ( $tok eq 'qw' && $tokenizer_self->{_trim_qw} )
22619 $quote_type = $type;
22622 # check for a statement label
22624 ( $next_nonblank_token eq ':' )
22625 && ( $$rtokens[ $i_next + 1 ] ne ':' )
22626 && ( $i_next <= $max_token_index ) # colon on same line
22630 if ( $tok !~ /A-Z/ ) {
22631 push @{ $tokenizer_self->{_rlower_case_labels_at} },
22632 $input_line_number;
22640 # 'sub' || 'package'
22641 elsif ( $is_sub_package{$tok_kw} ) {
22642 error_if_expecting_OPERATOR()
22643 if ( $expecting == OPERATOR );
22647 # Note on token types for format, __DATA__, __END__:
22648 # It simplifies things to give these type ';', so that when we
22649 # start rescanning we will be expecting a token of type TERM.
22650 # We will switch to type 'k' before outputting the tokens.
22651 elsif ( $is_format_END_DATA{$tok_kw} ) {
22652 $type = ';'; # make tokenizer look for TERM next
22653 $tokenizer_self->{ $is_format_END_DATA{$tok_kw} } = 1;
22657 elsif ( $is_keyword{$tok_kw} ) {
22660 # Since for and foreach may not be followed immediately
22661 # by an opening paren, we have to remember which keyword
22662 # is associated with the next '('
22663 if ( $is_for_foreach{$tok} ) {
22664 if ( new_statement_ok() ) {
22665 $want_paren = $tok;
22669 # recognize 'use' statements, which are special
22670 elsif ( $is_use_require{$tok} ) {
22671 $statement_type = $tok;
22672 error_if_expecting_OPERATOR()
22673 if ( $expecting == OPERATOR );
22676 # remember my and our to check for trailing ": shared"
22677 elsif ( $is_my_our{$tok} ) {
22678 $statement_type = $tok;
22681 # Check for misplaced 'elsif' and 'else', but allow isolated
22682 # else or elsif blocks to be formatted. This is indicated
22683 # by a last noblank token of ';'
22684 elsif ( $tok eq 'elsif' ) {
22685 if ( $last_nonblank_token ne ';'
22686 && $last_nonblank_block_type !~
22687 /^(if|elsif|unless)$/ )
22690 "expecting '$tok' to follow one of 'if|elsif|unless'\n"
22694 elsif ( $tok eq 'else' ) {
22696 # patched for SWITCH/CASE
22697 if ( $last_nonblank_token ne ';'
22698 && $last_nonblank_block_type !~
22699 /^(if|elsif|unless|case|when)$/ )
22702 "expecting '$tok' to follow one of 'if|elsif|unless|case|when'\n"
22706 elsif ( $tok eq 'continue' ) {
22707 if ( $last_nonblank_token ne ';'
22708 && $last_nonblank_block_type !~
22709 /(^(\{|\}|;|while|until|for|foreach)|:$)/ )
22712 # note: ';' '{' and '}' in list above
22713 # because continues can follow bare blocks;
22714 # ':' is labeled block
22715 warning("'$tok' should follow a block\n");
22719 # patch for SWITCH/CASE if 'case' and 'when are
22720 # treated as keywords.
22721 elsif ( $tok eq 'when' || $tok eq 'case' ) {
22722 $statement_type = $tok; # next '{' is block
22726 # check for inline label following
22727 # /^(redo|last|next|goto)$/
22728 elsif (( $last_nonblank_type eq 'k' )
22729 && ( $is_redo_last_next_goto{$last_nonblank_token} ) )
22735 # something else --
22738 scan_bare_identifier();
22739 if ( $type eq 'w' ) {
22741 if ( $expecting == OPERATOR ) {
22743 # don't complain about possible indirect object
22747 # sub new($) { ... }
22748 # $b = new A::; # calls A::new
22749 # $c = new A; # same thing but suspicious
22750 # This will call A::new but we have a 'new' in
22751 # main:: which looks like a constant.
22753 if ( $last_nonblank_type eq 'C' ) {
22754 if ( $tok !~ /::$/ ) {
22756 Expecting operator after '$last_nonblank_token' but found bare word '$tok'
22757 Maybe indirectet object notation?
22762 error_if_expecting_OPERATOR("bareword");
22766 # mark bare words immediately followed by a paren as
22768 $next_tok = $$rtokens[ $i + 1 ];
22769 if ( $next_tok eq '(' ) {
22773 # underscore after file test operator is file handle
22774 if ( $tok eq '_' && $last_nonblank_type eq 'F' ) {
22778 # patch for SWITCH/CASE if 'case' and 'when are
22779 # not treated as keywords:
22783 && $brace_type[$brace_depth] eq 'switch'
22785 || ( $tok eq 'when'
22786 && $brace_type[$brace_depth] eq 'given' )
22789 $statement_type = $tok; # next '{' is block
22790 $type = 'k'; # for keyword syntax coloring
22793 # patch for SWITCH/CASE if switch and given not keywords
22794 # Switch is not a perl 5 keyword, but we will gamble
22795 # and mark switch followed by paren as a keyword. This
22796 # is only necessary to get html syntax coloring nice,
22797 # and does not commit this as being a switch/case.
22798 if ( $next_nonblank_token eq '('
22799 && ( $tok eq 'switch' || $tok eq 'given' ) )
22801 $type = 'k'; # for keyword syntax coloring
22807 ###############################################################
22808 # section 2: strings of digits
22809 ###############################################################
22810 elsif ( $pre_type eq 'd' ) {
22811 $expecting = operator_expected( $prev_type, $tok, $next_type );
22812 error_if_expecting_OPERATOR("Number")
22813 if ( $expecting == OPERATOR );
22814 my $number = scan_number();
22815 if ( !defined($number) ) {
22817 # shouldn't happen - we should always get a number
22818 warning("non-number beginning with digit--program bug\n");
22819 report_definite_bug();
22823 ###############################################################
22824 # section 3: all other tokens
22825 ###############################################################
22828 last if ( $tok eq '#' );
22829 my $code = $tokenization_code->{$tok};
22832 operator_expected( $prev_type, $tok, $next_type );
22839 # -----------------------------
22840 # end of main tokenization loop
22841 # -----------------------------
22843 if ( $i_tok >= 0 ) {
22844 $routput_token_type->[$i_tok] = $type;
22845 $routput_block_type->[$i_tok] = $block_type;
22846 $routput_container_type->[$i_tok] = $container_type;
22847 $routput_type_sequence->[$i_tok] = $type_sequence;
22850 unless ( ( $type eq 'b' ) || ( $type eq '#' ) ) {
22851 $last_last_nonblank_token = $last_nonblank_token;
22852 $last_last_nonblank_type = $last_nonblank_type;
22853 $last_last_nonblank_block_type = $last_nonblank_block_type;
22854 $last_last_nonblank_container_type = $last_nonblank_container_type;
22855 $last_last_nonblank_type_sequence = $last_nonblank_type_sequence;
22856 $last_nonblank_token = $tok;
22857 $last_nonblank_type = $type;
22858 $last_nonblank_block_type = $block_type;
22859 $last_nonblank_container_type = $container_type;
22860 $last_nonblank_type_sequence = $type_sequence;
22861 $last_nonblank_prototype = $prototype;
22864 # reset indentation level if necessary at a sub or package
22865 # in an attempt to recover from a nesting error
22866 if ( $level_in_tokenizer < 0 ) {
22867 if ( $input_line =~ /^\s*(sub|package)\s+(\w+)/ ) {
22868 reset_indentation_level(0);
22869 brace_warning("resetting level to 0 at $1 $2\n");
22873 # all done tokenizing this line ...
22874 # now prepare the final list of tokens and types
22876 my @token_type = (); # stack of output token types
22877 my @block_type = (); # stack of output code block types
22878 my @container_type = (); # stack of output code container types
22879 my @type_sequence = (); # stack of output type sequence numbers
22880 my @tokens = (); # output tokens
22881 my @levels = (); # structural brace levels of output tokens
22882 my @slevels = (); # secondary nesting levels of output tokens
22883 my @nesting_tokens = (); # string of tokens leading to this depth
22884 my @nesting_types = (); # string of token types leading to this depth
22885 my @nesting_blocks = (); # string of block types leading to this depth
22886 my @nesting_lists = (); # string of list types leading to this depth
22887 my @ci_string = (); # string needed to compute continuation indentation
22888 my @container_environment = (); # BLOCK or LIST
22889 my $container_environment = '';
22890 my $im = -1; # previous $i value
22892 my $ci_string_sum = ones_count($ci_string_in_tokenizer);
22894 # Computing Token Indentation
22896 # The final section of the tokenizer forms tokens and also computes
22897 # parameters needed to find indentation. It is much easier to do it
22898 # in the tokenizer than elsewhere. Here is a brief description of how
22899 # indentation is computed. Perl::Tidy computes indentation as the sum
22902 # (1) structural indentation, such as if/else/elsif blocks
22903 # (2) continuation indentation, such as long parameter call lists.
22905 # These are occasionally called primary and secondary indentation.
22907 # Structural indentation is introduced by tokens of type '{', although
22908 # the actual tokens might be '{', '(', or '['. Structural indentation
22909 # is of two types: BLOCK and non-BLOCK. Default structural indentation
22910 # is 4 characters if the standard indentation scheme is used.
22912 # Continuation indentation is introduced whenever a line at BLOCK level
22913 # is broken before its termination. Default continuation indentation
22914 # is 2 characters in the standard indentation scheme.
22916 # Both types of indentation may be nested arbitrarily deep and
22917 # interlaced. The distinction between the two is somewhat arbitrary.
22919 # For each token, we will define two variables which would apply if
22920 # the current statement were broken just before that token, so that
22921 # that token started a new line:
22923 # $level = the structural indentation level,
22924 # $ci_level = the continuation indentation level
22926 # The total indentation will be $level * (4 spaces) + $ci_level * (2 spaces),
22927 # assuming defaults. However, in some special cases it is customary
22928 # to modify $ci_level from this strict value.
22930 # The total structural indentation is easy to compute by adding and
22931 # subtracting 1 from a saved value as types '{' and '}' are seen. The
22932 # running value of this variable is $level_in_tokenizer.
22934 # The total continuation is much more difficult to compute, and requires
22935 # several variables. These veriables are:
22937 # $ci_string_in_tokenizer = a string of 1's and 0's indicating, for
22938 # each indentation level, if there are intervening open secondary
22939 # structures just prior to that level.
22940 # $continuation_string_in_tokenizer = a string of 1's and 0's indicating
22941 # if the last token at that level is "continued", meaning that it
22942 # is not the first token of an expression.
22943 # $nesting_block_string = a string of 1's and 0's indicating, for each
22944 # indentation level, if the level is of type BLOCK or not.
22945 # $nesting_block_flag = the most recent 1 or 0 of $nesting_block_string
22946 # $nesting_list_string = a string of 1's and 0's indicating, for each
22947 # indentation level, if it is is appropriate for list formatting.
22948 # If so, continuation indentation is used to indent long list items.
22949 # $nesting_list_flag = the most recent 1 or 0 of $nesting_list_string
22950 # @{$rslevel_stack} = a stack of total nesting depths at each
22951 # structural indentation level, where "total nesting depth" means
22952 # the nesting depth that would occur if every nesting token -- '{', '[',
22953 # and '(' -- , regardless of context, is used to compute a nesting
22956 #my $nesting_block_flag = ($nesting_block_string =~ /1$/);
22957 #my $nesting_list_flag = ($nesting_list_string =~ /1$/);
22959 my ( $ci_string_i, $level_i, $nesting_block_string_i,
22960 $nesting_list_string_i, $nesting_token_string_i,
22961 $nesting_type_string_i, );
22963 foreach $i ( @{$routput_token_list} )
22964 { # scan the list of pre-tokens indexes
22966 # self-checking for valid token types
22967 my $type = $routput_token_type->[$i];
22968 my $tok = $$rtokens[$i]; # the token, but ONLY if same as pretoken
22969 $level_i = $level_in_tokenizer;
22971 # This can happen by running perltidy on non-scripts
22972 # although it could also be bug introduced by programming change.
22973 # Perl silently accepts a 032 (^Z) and takes it as the end
22974 if ( !$is_valid_token_type{$type} ) {
22975 my $val = ord($type);
22977 "unexpected character decimal $val ($type) in script\n");
22978 $tokenizer_self->{_in_error} = 1;
22981 # ----------------------------------------------------------------
22982 # TOKEN TYPE PATCHES
22983 # output __END__, __DATA__, and format as type 'k' instead of ';'
22984 # to make html colors correct, etc.
22985 my $fix_type = $type;
22986 if ( $type eq ';' && $tok =~ /\w/ ) { $fix_type = 'k' }
22988 # output anonymous 'sub' as keyword
22989 if ( $type eq 't' && $tok eq 'sub' ) { $fix_type = 'k' }
22991 # -----------------------------------------------------------------
22993 $nesting_token_string_i = $nesting_token_string;
22994 $nesting_type_string_i = $nesting_type_string;
22995 $nesting_block_string_i = $nesting_block_string;
22996 $nesting_list_string_i = $nesting_list_string;
22998 # set primary indentation levels based on structural braces
22999 # Note: these are set so that the leading braces have a HIGHER
23000 # level than their CONTENTS, which is convenient for indentation
23001 # Also, define continuation indentation for each token.
23002 if ( $type eq '{' || $type eq 'L' ) {
23004 # use environment before updating
23005 $container_environment =
23006 $nesting_block_flag ? 'BLOCK'
23007 : $nesting_list_flag ? 'LIST'
23010 # if the difference between total nesting levels is not 1,
23011 # there are intervening non-structural nesting types between
23012 # this '{' and the previous unclosed '{'
23013 my $intervening_secondary_structure = 0;
23014 if ( @{$rslevel_stack} ) {
23015 $intervening_secondary_structure =
23016 $slevel_in_tokenizer - $rslevel_stack->[-1];
23019 # Continuation Indentation
23021 # Having tried setting continuation indentation both in the formatter and
23022 # in the tokenizer, I can say that setting it in the tokenizer is much,
23023 # much easier. The formatter already has too much to do, and can't
23024 # make decisions on line breaks without knowing what 'ci' will be at
23025 # arbitrary locations.
23027 # But a problem with setting the continuation indentation (ci) here
23028 # in the tokenizer is that we do not know where line breaks will actually
23029 # be. As a result, we don't know if we should propagate continuation
23030 # indentation to higher levels of structure.
23032 # For nesting of only structural indentation, we never need to do this.
23033 # For example, in a long if statement, like this
23035 # if ( !$output_block_type[$i]
23036 # && ($in_statement_continuation) )
23041 # the second line has ci but we do normally give the lines within the BLOCK
23042 # any ci. This would be true if we had blocks nested arbitrarily deeply.
23044 # But consider something like this, where we have created a break after
23045 # an opening paren on line 1, and the paren is not (currently) a
23046 # structural indentation token:
23048 # my $file = $menubar->Menubutton(
23049 # qw/-text File -underline 0 -menuitems/ => [
23051 # Cascade => '~View',
23055 # The second line has ci, so it would seem reasonable to propagate it
23056 # down, giving the third line 1 ci + 1 indentation. This suggests the
23057 # following rule, which is currently used to propagating ci down: if there
23058 # are any non-structural opening parens (or brackets, or braces), before
23059 # an opening structural brace, then ci is propagated down, and otherwise
23060 # not. The variable $intervening_secondary_structure contains this
23061 # information for the current token, and the string
23062 # "$ci_string_in_tokenizer" is a stack of previous values of this
23065 # save the current states
23066 push( @{$rslevel_stack}, 1 + $slevel_in_tokenizer );
23067 $level_in_tokenizer++;
23069 if ( $routput_block_type->[$i] ) {
23070 $nesting_block_flag = 1;
23071 $nesting_block_string .= '1';
23074 $nesting_block_flag = 0;
23075 $nesting_block_string .= '0';
23078 # we will use continuation indentation within containers
23079 # which are not blocks and not logical expressions
23081 if ( !$routput_block_type->[$i] ) {
23083 # propagate flag down at nested open parens
23084 if ( $routput_container_type->[$i] eq '(' ) {
23085 $bit = 1 if $nesting_list_flag;
23088 # use list continuation if not a logical grouping
23089 # /^(if|elsif|unless|while|and|or|not|&&|!|\|\||for|foreach)$/
23093 $is_logical_container{ $routput_container_type->[$i]
23097 $nesting_list_string .= $bit;
23098 $nesting_list_flag = $bit;
23100 $ci_string_in_tokenizer .=
23101 ( $intervening_secondary_structure != 0 ) ? '1' : '0';
23102 $ci_string_sum = ones_count($ci_string_in_tokenizer);
23103 $continuation_string_in_tokenizer .=
23104 ( $in_statement_continuation > 0 ) ? '1' : '0';
23106 # Sometimes we want to give an opening brace continuation indentation,
23107 # and sometimes not. For code blocks, we don't do it, so that the leading
23108 # '{' gets outdented, like this:
23110 # if ( !$output_block_type[$i]
23111 # && ($in_statement_continuation) )
23114 # For other types, we will give them continuation indentation. For example,
23115 # here is how a list looks with the opening paren indented:
23118 # ( [ "fred", "barney" ], [ "george", "jane", "elroy" ],
23119 # [ "homer", "marge", "bart" ], );
23121 # This looks best when 'ci' is one-half of the indentation (i.e., 2 and 4)
23123 my $total_ci = $ci_string_sum;
23125 !$routput_block_type->[$i] # patch: skip for BLOCK
23126 && ($in_statement_continuation)
23129 $total_ci += $in_statement_continuation
23130 unless ( $ci_string_in_tokenizer =~ /1$/ );
23133 $ci_string_i = $total_ci;
23134 $in_statement_continuation = 0;
23137 elsif ( $type eq '}' || $type eq 'R' ) {
23139 # only a nesting error in the script would prevent popping here
23140 if ( @{$rslevel_stack} > 1 ) { pop( @{$rslevel_stack} ); }
23142 $level_i = --$level_in_tokenizer;
23144 # restore previous level values
23145 if ( length($nesting_block_string) > 1 )
23146 { # true for valid script
23147 chop $nesting_block_string;
23148 $nesting_block_flag = ( $nesting_block_string =~ /1$/ );
23149 chop $nesting_list_string;
23150 $nesting_list_flag = ( $nesting_list_string =~ /1$/ );
23152 chop $ci_string_in_tokenizer;
23153 $ci_string_sum = ones_count($ci_string_in_tokenizer);
23155 $in_statement_continuation =
23156 chop $continuation_string_in_tokenizer;
23158 # zero continuation flag at terminal BLOCK '}' which
23159 # ends a statement.
23160 if ( $routput_block_type->[$i] ) {
23162 # ...These include non-anonymous subs
23163 # note: could be sub ::abc { or sub 'abc
23164 if ( $routput_block_type->[$i] =~ m/^sub\s*/gc ) {
23166 # note: older versions of perl require the /gc modifier
23167 # here or else the \G does not work.
23168 if ( $routput_block_type->[$i] =~ /\G('|::|\w)/gc )
23170 $in_statement_continuation = 0;
23174 # ...and include all block types except user subs with
23175 # block prototypes and these: (sort|grep|map|do|eval)
23176 # /^(\}|\{|BEGIN|END|CHECK|INIT|AUTOLOAD|DESTROY|continue|;|if|elsif|else|unless|while|until|for|foreach)$/
23178 $is_zero_continuation_block_type{
23179 $routput_block_type->[$i] } )
23181 $in_statement_continuation = 0;
23184 # ..but these are not terminal types:
23185 # /^(sort|grep|map|do|eval)$/ )
23187 $is_not_zero_continuation_block_type{
23188 $routput_block_type->[$i] } )
23192 # ..and a block introduced by a label
23193 # /^\w+\s*:$/gc ) {
23194 elsif ( $routput_block_type->[$i] =~ /:$/ ) {
23195 $in_statement_continuation = 0;
23198 # user function with block prototype
23200 $in_statement_continuation = 0;
23204 # If we are in a list, then
23205 # we must set continuatoin indentation at the closing
23206 # paren of something like this (paren after $check):
23209 # ( not defined $check )
23211 # or $check eq "new"
23212 # or $check eq "old",
23214 elsif ( $tok eq ')' ) {
23215 $in_statement_continuation = 1
23216 if $routput_container_type->[$i] =~ /^[;,\{\}]$/;
23220 # use environment after updating
23221 $container_environment =
23222 $nesting_block_flag ? 'BLOCK'
23223 : $nesting_list_flag ? 'LIST'
23225 $ci_string_i = $ci_string_sum + $in_statement_continuation;
23226 $nesting_block_string_i = $nesting_block_string;
23227 $nesting_list_string_i = $nesting_list_string;
23230 # not a structural indentation type..
23233 $container_environment =
23234 $nesting_block_flag ? 'BLOCK'
23235 : $nesting_list_flag ? 'LIST'
23238 # zero the continuation indentation at certain tokens so
23239 # that they will be at the same level as its container. For
23240 # commas, this simplifies the -lp indentation logic, which
23241 # counts commas. For ?: it makes them stand out.
23242 if ($nesting_list_flag) {
23243 if ( $type =~ /^[,\?\:]$/ ) {
23244 $in_statement_continuation = 0;
23248 # be sure binary operators get continuation indentation
23250 $container_environment
23251 && ( $type eq 'k' && $is_binary_keyword{$tok}
23252 || $is_binary_type{$type} )
23255 $in_statement_continuation = 1;
23258 # continuation indentation is sum of any open ci from previous
23259 # levels plus the current level
23260 $ci_string_i = $ci_string_sum + $in_statement_continuation;
23262 # update continuation flag ...
23263 # if this isn't a blank or comment..
23264 if ( $type ne 'b' && $type ne '#' ) {
23266 # and we are in a BLOCK
23267 if ($nesting_block_flag) {
23269 # the next token after a ';' and label starts a new stmt
23270 if ( $type eq ';' || $type eq 'J' ) {
23271 $in_statement_continuation = 0;
23274 # otherwise, we are continuing the current statement
23276 $in_statement_continuation = 1;
23280 # if we are not in a BLOCK..
23283 # do not use continuation indentation if not list
23284 # environment (could be within if/elsif clause)
23285 if ( !$nesting_list_flag ) {
23286 $in_statement_continuation = 0;
23289 # otherwise, the next token after a ',' starts a new term
23290 elsif ( $type eq ',' ) {
23291 $in_statement_continuation = 0;
23294 # otherwise, we are continuing the current term
23296 $in_statement_continuation = 1;
23302 if ( $level_in_tokenizer < 0 ) {
23303 unless ( $tokenizer_self->{_saw_negative_indentation} ) {
23304 $tokenizer_self->{_saw_negative_indentation} = 1;
23305 warning("Starting negative indentation\n");
23309 # set secondary nesting levels based on all continment token types
23310 # Note: these are set so that the nesting depth is the depth
23311 # of the PREVIOUS TOKEN, which is convenient for setting
23312 # the stength of token bonds
23313 my $slevel_i = $slevel_in_tokenizer;
23316 if ( $is_opening_type{$type} ) {
23317 $slevel_in_tokenizer++;
23318 $nesting_token_string .= $tok;
23319 $nesting_type_string .= $type;
23323 elsif ( $is_closing_type{$type} ) {
23324 $slevel_in_tokenizer--;
23325 my $char = chop $nesting_token_string;
23327 if ( $char ne $matching_start_token{$tok} ) {
23328 $nesting_token_string .= $char . $tok;
23329 $nesting_type_string .= $type;
23332 chop $nesting_type_string;
23336 push( @block_type, $routput_block_type->[$i] );
23337 push( @ci_string, $ci_string_i );
23338 push( @container_environment, $container_environment );
23339 push( @container_type, $routput_container_type->[$i] );
23340 push( @levels, $level_i );
23341 push( @nesting_tokens, $nesting_token_string_i );
23342 push( @nesting_types, $nesting_type_string_i );
23343 push( @slevels, $slevel_i );
23344 push( @token_type, $fix_type );
23345 push( @type_sequence, $routput_type_sequence->[$i] );
23346 push( @nesting_blocks, $nesting_block_string );
23347 push( @nesting_lists, $nesting_list_string );
23349 # now form the previous token
23352 $$rtoken_map[$i] - $$rtoken_map[$im]; # how many characters
23356 substr( $input_line, $$rtoken_map[$im], $num ) );
23362 $num = length($input_line) - $$rtoken_map[$im]; # make the last token
23364 push( @tokens, substr( $input_line, $$rtoken_map[$im], $num ) );
23367 $tokenizer_self->{_in_attribute_list} = $in_attribute_list;
23368 $tokenizer_self->{_in_quote} = $in_quote;
23369 $tokenizer_self->{_quote_target} =
23370 $in_quote ? matching_end_token($quote_character) : "";
23371 $tokenizer_self->{_rhere_target_list} = $rhere_target_list;
23373 $line_of_tokens->{_rtoken_type} = \@token_type;
23374 $line_of_tokens->{_rtokens} = \@tokens;
23375 $line_of_tokens->{_rblock_type} = \@block_type;
23376 $line_of_tokens->{_rcontainer_type} = \@container_type;
23377 $line_of_tokens->{_rcontainer_environment} = \@container_environment;
23378 $line_of_tokens->{_rtype_sequence} = \@type_sequence;
23379 $line_of_tokens->{_rlevels} = \@levels;
23380 $line_of_tokens->{_rslevels} = \@slevels;
23381 $line_of_tokens->{_rnesting_tokens} = \@nesting_tokens;
23382 $line_of_tokens->{_rci_levels} = \@ci_string;
23383 $line_of_tokens->{_rnesting_blocks} = \@nesting_blocks;
23387 } # end tokenize_this_line
23389 #########i#############################################################
23390 # Tokenizer routines which assist in identifying token types
23391 #######################################################################
23393 sub operator_expected {
23395 # Many perl symbols have two or more meanings. For example, '<<'
23396 # can be a shift operator or a here-doc operator. The
23397 # interpretation of these symbols depends on the current state of
23398 # the tokenizer, which may either be expecting a term or an
23399 # operator. For this example, a << would be a shift if an operator
23400 # is expected, and a here-doc if a term is expected. This routine
23401 # is called to make this decision for any current token. It returns
23402 # one of three possible values:
23404 # OPERATOR - operator expected (or at least, not a term)
23405 # UNKNOWN - can't tell
23406 # TERM - a term is expected (or at least, not an operator)
23408 # The decision is based on what has been seen so far. This
23409 # information is stored in the "$last_nonblank_type" and
23410 # "$last_nonblank_token" variables. For example, if the
23411 # $last_nonblank_type is '=~', then we are expecting a TERM, whereas
23412 # if $last_nonblank_type is 'n' (numeric), we are expecting an
23415 # If a UNKNOWN is returned, the calling routine must guess. A major
23416 # goal of this tokenizer is to minimize the possiblity of returning
23417 # UNKNOWN, because a wrong guess can spoil the formatting of a
23420 # adding NEW_TOKENS: it is critically important that this routine be
23421 # updated to allow it to determine if an operator or term is to be
23422 # expected after the new token. Doing this simply involves adding
23423 # the new token character to one of the regexes in this routine or
23424 # to one of the hash lists
23425 # that it uses, which are initialized in the BEGIN section.
23426 # USES GLOBAL VARIABLES: $last_nonblank_type, $last_nonblank_token,
23429 my ( $prev_type, $tok, $next_type ) = @_;
23431 my $op_expected = UNKNOWN;
23433 #print "tok=$tok last type=$last_nonblank_type last tok=$last_nonblank_token\n";
23435 # Note: function prototype is available for token type 'U' for future
23436 # program development. It contains the leading and trailing parens,
23437 # and no blanks. It might be used to eliminate token type 'C', for
23438 # example (prototype = '()'). Thus:
23439 # if ($last_nonblank_type eq 'U') {
23440 # print "previous token=$last_nonblank_token type=$last_nonblank_type prototype=$last_nonblank_prototype\n";
23443 # A possible filehandle (or object) requires some care...
23444 if ( $last_nonblank_type eq 'Z' ) {
23447 if ( $last_nonblank_token =~ /^[A-Za-z_]/ ) {
23448 $op_expected = UNKNOWN;
23451 # For possible file handle like "$a", Perl uses weird parsing rules.
23453 # print $a/2,"/hi"; - division
23454 # print $a / 2,"/hi"; - division
23455 # print $a/ 2,"/hi"; - division
23456 # print $a /2,"/hi"; - pattern (and error)!
23457 elsif ( ( $prev_type eq 'b' ) && ( $next_type ne 'b' ) ) {
23458 $op_expected = TERM;
23461 # Note when an operation is being done where a
23462 # filehandle might be expected, since a change in whitespace
23463 # could change the interpretation of the statement.
23465 if ( $tok =~ /^([x\/\+\-\*\%\&\.\?\<]|\>\>)$/ ) {
23466 complain("operator in print statement not recommended\n");
23467 $op_expected = OPERATOR;
23472 # handle something after 'do' and 'eval'
23473 elsif ( $is_block_operator{$last_nonblank_token} ) {
23475 # something like $a = eval "expression";
23477 if ( $last_nonblank_type eq 'k' ) {
23478 $op_expected = TERM; # expression or list mode following keyword
23481 # something like $a = do { BLOCK } / 2;
23484 $op_expected = OPERATOR; # block mode following }
23488 # handle bare word..
23489 elsif ( $last_nonblank_type eq 'w' ) {
23491 # unfortunately, we can't tell what type of token to expect next
23492 # after most bare words
23493 $op_expected = UNKNOWN;
23496 # operator, but not term possible after these types
23497 # Note: moved ')' from type to token because parens in list context
23498 # get marked as '{' '}' now. This is a minor glitch in the following:
23499 # my %opts = (ref $_[0] eq 'HASH') ? %{shift()} : ();
23501 elsif (( $last_nonblank_type =~ /^[\]RnviQh]$/ )
23502 || ( $last_nonblank_token =~ /^(\)|\$|\-\>)/ ) )
23504 $op_expected = OPERATOR;
23506 # in a 'use' statement, numbers and v-strings are not true
23507 # numbers, so to avoid incorrect error messages, we will
23508 # mark them as unknown for now (use.t)
23509 # TODO: it would be much nicer to create a new token V for VERSION
23510 # number in a use statement. Then this could be a check on type V
23511 # and related patches which change $statement_type for '=>'
23512 # and ',' could be removed. Further, it would clean things up to
23513 # scan the 'use' statement with a separate subroutine.
23514 if ( ( $statement_type eq 'use' )
23515 && ( $last_nonblank_type =~ /^[nv]$/ ) )
23517 $op_expected = UNKNOWN;
23521 # no operator after many keywords, such as "die", "warn", etc
23522 elsif ( $expecting_term_token{$last_nonblank_token} ) {
23524 # patch for dor.t (defined or).
23525 # perl functions which may be unary operators
23526 # TODO: This list is incomplete, and these should be put
23529 && $next_type eq '/'
23530 && $last_nonblank_type eq 'k'
23531 && $last_nonblank_token =~ /^eof|undef|shift|pop$/ )
23533 $op_expected = OPERATOR;
23536 $op_expected = TERM;
23540 # no operator after things like + - ** (i.e., other operators)
23541 elsif ( $expecting_term_types{$last_nonblank_type} ) {
23542 $op_expected = TERM;
23545 # a few operators, like "time", have an empty prototype () and so
23546 # take no parameters but produce a value to operate on
23547 elsif ( $expecting_operator_token{$last_nonblank_token} ) {
23548 $op_expected = OPERATOR;
23551 # post-increment and decrement produce values to be operated on
23552 elsif ( $expecting_operator_types{$last_nonblank_type} ) {
23553 $op_expected = OPERATOR;
23556 # no value to operate on after sub block
23557 elsif ( $last_nonblank_token =~ /^sub\s/ ) { $op_expected = TERM; }
23559 # a right brace here indicates the end of a simple block.
23560 # all non-structural right braces have type 'R'
23561 # all braces associated with block operator keywords have been given those
23562 # keywords as "last_nonblank_token" and caught above.
23563 # (This statement is order dependent, and must come after checking
23564 # $last_nonblank_token).
23565 elsif ( $last_nonblank_type eq '}' ) {
23567 # patch for dor.t (defined or).
23569 && $next_type eq '/'
23570 && $last_nonblank_token eq ']' )
23572 $op_expected = OPERATOR;
23575 $op_expected = TERM;
23579 # something else..what did I forget?
23582 # collecting diagnostics on unknown operator types..see what was missed
23583 $op_expected = UNKNOWN;
23585 "OP: unknown after type=$last_nonblank_type token=$last_nonblank_token\n"
23589 TOKENIZER_DEBUG_FLAG_EXPECT && do {
23591 "EXPECT: returns $op_expected for last type $last_nonblank_type token $last_nonblank_token\n";
23593 return $op_expected;
23596 sub new_statement_ok {
23598 # return true if the current token can start a new statement
23599 # USES GLOBAL VARIABLES: $last_nonblank_type
23601 return label_ok() # a label would be ok here
23603 || $last_nonblank_type eq 'J'; # or we follow a label
23609 # Decide if a bare word followed by a colon here is a label
23610 # USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
23611 # $brace_depth, @brace_type
23613 # if it follows an opening or closing code block curly brace..
23614 if ( ( $last_nonblank_token eq '{' || $last_nonblank_token eq '}' )
23615 && $last_nonblank_type eq $last_nonblank_token )
23618 # it is a label if and only if the curly encloses a code block
23619 return $brace_type[$brace_depth];
23622 # otherwise, it is a label if and only if it follows a ';'
23625 return ( $last_nonblank_type eq ';' );
23629 sub code_block_type {
23631 # Decide if this is a block of code, and its type.
23632 # Must be called only when $type = $token = '{'
23633 # The problem is to distinguish between the start of a block of code
23634 # and the start of an anonymous hash reference
23635 # Returns "" if not code block, otherwise returns 'last_nonblank_token'
23636 # to indicate the type of code block. (For example, 'last_nonblank_token'
23637 # might be 'if' for an if block, 'else' for an else block, etc).
23638 # USES GLOBAL VARIABLES: $last_nonblank_token, $last_nonblank_type,
23639 # $last_nonblank_block_type, $brace_depth, @brace_type
23641 # handle case of multiple '{'s
23643 # print "BLOCK_TYPE EXAMINING: type=$last_nonblank_type tok=$last_nonblank_token\n";
23645 my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
23646 if ( $last_nonblank_token eq '{'
23647 && $last_nonblank_type eq $last_nonblank_token )
23650 # opening brace where a statement may appear is probably
23651 # a code block but might be and anonymous hash reference
23652 if ( $brace_type[$brace_depth] ) {
23653 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23654 $max_token_index );
23657 # cannot start a code block within an anonymous hash
23663 elsif ( $last_nonblank_token eq ';' ) {
23665 # an opening brace where a statement may appear is probably
23666 # a code block but might be and anonymous hash reference
23667 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23668 $max_token_index );
23671 # handle case of '}{'
23672 elsif ($last_nonblank_token eq '}'
23673 && $last_nonblank_type eq $last_nonblank_token )
23676 # a } { situation ...
23677 # could be hash reference after code block..(blktype1.t)
23678 if ($last_nonblank_block_type) {
23679 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23680 $max_token_index );
23683 # must be a block if it follows a closing hash reference
23685 return $last_nonblank_token;
23689 # NOTE: braces after type characters start code blocks, but for
23690 # simplicity these are not identified as such. See also
23691 # sub is_non_structural_brace.
23692 # elsif ( $last_nonblank_type eq 't' ) {
23693 # return $last_nonblank_token;
23696 # brace after label:
23697 elsif ( $last_nonblank_type eq 'J' ) {
23698 return $last_nonblank_token;
23701 # otherwise, look at previous token. This must be a code block if
23702 # it follows any of these:
23703 # /^(BEGIN|END|CHECK|INIT|AUTOLOAD|DESTROY|continue|if|elsif|else|unless|do|while|until|eval|for|foreach|map|grep|sort)$/
23704 elsif ( $is_code_block_token{$last_nonblank_token} ) {
23705 return $last_nonblank_token;
23708 # or a sub definition
23709 elsif ( ( $last_nonblank_type eq 'i' || $last_nonblank_type eq 't' )
23710 && $last_nonblank_token =~ /^sub\b/ )
23712 return $last_nonblank_token;
23715 # user-defined subs with block parameters (like grep/map/eval)
23716 elsif ( $last_nonblank_type eq 'G' ) {
23717 return $last_nonblank_token;
23721 elsif ( $last_nonblank_type eq 'w' ) {
23722 return decide_if_code_block( $i, $rtokens, $rtoken_type,
23723 $max_token_index );
23726 # anything else must be anonymous hash reference
23732 sub decide_if_code_block {
23734 # USES GLOBAL VARIABLES: $last_nonblank_token
23735 my ( $i, $rtokens, $rtoken_type, $max_token_index ) = @_;
23736 my ( $next_nonblank_token, $i_next ) =
23737 find_next_nonblank_token( $i, $rtokens, $max_token_index );
23739 # we are at a '{' where a statement may appear.
23740 # We must decide if this brace starts an anonymous hash or a code
23742 # return "" if anonymous hash, and $last_nonblank_token otherwise
23744 # initialize to be code BLOCK
23745 my $code_block_type = $last_nonblank_token;
23747 # Check for the common case of an empty anonymous hash reference:
23748 # Maybe something like sub { { } }
23749 if ( $next_nonblank_token eq '}' ) {
23750 $code_block_type = "";
23755 # To guess if this '{' is an anonymous hash reference, look ahead
23756 # and test as follows:
23758 # it is a hash reference if next come:
23759 # - a string or digit followed by a comma or =>
23760 # - bareword followed by =>
23761 # otherwise it is a code block
23763 # Examples of anonymous hash ref:
23767 # Examples of code blocks:
23768 # {1; print "hello\n", 1;}
23771 # We are only going to look ahead one more (nonblank/comment) line.
23772 # Strange formatting could cause a bad guess, but that's unlikely.
23773 my @pre_types = @$rtoken_type[ $i + 1 .. $max_token_index ];
23774 my @pre_tokens = @$rtokens[ $i + 1 .. $max_token_index ];
23775 my ( $rpre_tokens, $rpre_types ) =
23776 peek_ahead_for_n_nonblank_pre_tokens(20); # 20 is arbitrary but
23777 # generous, and prevents
23779 # time in mangled files
23780 if ( defined($rpre_types) && @$rpre_types ) {
23781 push @pre_types, @$rpre_types;
23782 push @pre_tokens, @$rpre_tokens;
23785 # put a sentinal token to simplify stopping the search
23786 push @pre_types, '}';
23789 $jbeg = 1 if $pre_types[0] eq 'b';
23791 # first look for one of these
23793 # - bareword with leading -
23797 if ( $pre_types[$j] =~ /^[\'\"]/ ) {
23799 # find the closing quote; don't worry about escapes
23800 my $quote_mark = $pre_types[$j];
23801 for ( my $k = $j + 1 ; $k < $#pre_types ; $k++ ) {
23802 if ( $pre_types[$k] eq $quote_mark ) {
23804 my $next = $pre_types[$j];
23809 elsif ( $pre_types[$j] eq 'd' ) {
23812 elsif ( $pre_types[$j] eq 'w' ) {
23813 unless ( $is_keyword{ $pre_tokens[$j] } ) {
23817 elsif ( $pre_types[$j] eq '-' && $pre_types[ ++$j ] eq 'w' ) {
23820 if ( $j > $jbeg ) {
23822 $j++ if $pre_types[$j] eq 'b';
23824 # it's a hash ref if a comma or => follow next
23825 if ( $pre_types[$j] eq ','
23826 || ( $pre_types[$j] eq '=' && $pre_types[ ++$j ] eq '>' ) )
23828 $code_block_type = "";
23833 return $code_block_type;
23838 # report unexpected token type and show where it is
23839 # USES GLOBAL VARIABLES: $tokenizer_self
23840 my ( $found, $expecting, $i_tok, $last_nonblank_i, $rpretoken_map,
23841 $rpretoken_type, $input_line )
23844 if ( ++$tokenizer_self->{_unexpected_error_count} <= MAX_NAG_MESSAGES ) {
23845 my $msg = "found $found where $expecting expected";
23846 my $pos = $$rpretoken_map[$i_tok];
23847 interrupt_logfile();
23848 my $input_line_number = $tokenizer_self->{_last_line_number};
23849 my ( $offset, $numbered_line, $underline ) =
23850 make_numbered_line( $input_line_number, $input_line, $pos );
23851 $underline = write_on_underline( $underline, $pos - $offset, '^' );
23854 if ( ( $i_tok > 0 ) && ( $last_nonblank_i >= 0 ) ) {
23855 my $pos_prev = $$rpretoken_map[$last_nonblank_i];
23857 if ( $$rpretoken_type[ $i_tok - 1 ] eq 'b' ) {
23858 $num = $$rpretoken_map[ $i_tok - 1 ] - $pos_prev;
23861 $num = $pos - $pos_prev;
23863 if ( $num > 40 ) { $num = 40; $pos_prev = $pos - 40; }
23866 write_on_underline( $underline, $pos_prev - $offset, '-' x $num );
23867 $trailer = " (previous token underlined)";
23869 warning( $numbered_line . "\n" );
23870 warning( $underline . "\n" );
23871 warning( $msg . $trailer . "\n" );
23876 sub is_non_structural_brace {
23878 # Decide if a brace or bracket is structural or non-structural
23879 # by looking at the previous token and type
23880 # USES GLOBAL VARIABLES: $last_nonblank_type, $last_nonblank_token
23882 # EXPERIMENTAL: Mark slices as structural; idea was to improve formatting.
23883 # Tentatively deactivated because it caused the wrong operator expectation
23885 # $user = @vars[1] / 100;
23886 # Must update sub operator_expected before re-implementing.
23887 # if ( $last_nonblank_type eq 'i' && $last_nonblank_token =~ /^@/ ) {
23891 # NOTE: braces after type characters start code blocks, but for
23892 # simplicity these are not identified as such. See also
23893 # sub code_block_type
23894 # if ($last_nonblank_type eq 't') {return 0}
23896 # otherwise, it is non-structural if it is decorated
23897 # by type information.
23898 # For example, the '{' here is non-structural: ${xxx}
23900 $last_nonblank_token =~ /^([\$\@\*\&\%\)]|->|::)/
23902 # or if we follow a hash or array closing curly brace or bracket
23903 # For example, the second '{' in this is non-structural: $a{'x'}{'y'}
23904 # because the first '}' would have been given type 'R'
23905 || $last_nonblank_type =~ /^([R\]])$/
23909 #########i#############################################################
23910 # Tokenizer routines for tracking container nesting depths
23911 #######################################################################
23913 # The following routines keep track of nesting depths of the nesting
23914 # types, ( [ { and ?. This is necessary for determining the indentation
23915 # level, and also for debugging programs. Not only do they keep track of
23916 # nesting depths of the individual brace types, but they check that each
23917 # of the other brace types is balanced within matching pairs. For
23918 # example, if the program sees this sequence:
23922 # then it can determine that there is an extra left paren somewhere
23923 # between the { and the }. And so on with every other possible
23924 # combination of outer and inner brace types. For another
23929 # which has an extra ] within the parens.
23931 # The brace types have indexes 0 .. 3 which are indexes into
23934 # The pair ? : are treated as just another nesting type, with ? acting
23935 # as the opening brace and : acting as the closing brace.
23939 # $depth_array[$a][$b][ $current_depth[$a] ] = $current_depth[$b];
23941 # saves the nesting depth of brace type $b (where $b is either of the other
23942 # nesting types) when brace type $a enters a new depth. When this depth
23943 # decreases, a check is made that the current depth of brace types $b is
23944 # unchanged, or otherwise there must have been an error. This can
23945 # be very useful for localizing errors, particularly when perl runs to
23946 # the end of a large file (such as this one) and announces that there
23947 # is a problem somewhere.
23949 # A numerical sequence number is maintained for every nesting type,
23950 # so that each matching pair can be uniquely identified in a simple
23953 sub increase_nesting_depth {
23954 my ( $a, $pos ) = @_;
23956 # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
23957 # @current_sequence_number, @depth_array, @starting_line_of_current_depth
23959 $current_depth[$a]++;
23960 my $input_line_number = $tokenizer_self->{_last_line_number};
23961 my $input_line = $tokenizer_self->{_line_text};
23963 # Sequence numbers increment by number of items. This keeps
23964 # a unique set of numbers but still allows the relative location
23965 # of any type to be determined.
23966 $nesting_sequence_number[$a] += scalar(@closing_brace_names);
23967 my $seqno = $nesting_sequence_number[$a];
23968 $current_sequence_number[$a][ $current_depth[$a] ] = $seqno;
23970 $starting_line_of_current_depth[$a][ $current_depth[$a] ] =
23971 [ $input_line_number, $input_line, $pos ];
23973 for $b ( 0 .. $#closing_brace_names ) {
23974 next if ( $b == $a );
23975 $depth_array[$a][$b][ $current_depth[$a] ] = $current_depth[$b];
23980 sub decrease_nesting_depth {
23982 my ( $a, $pos ) = @_;
23984 # USES GLOBAL VARIABLES: $tokenizer_self, @current_depth,
23985 # @current_sequence_number, @depth_array, @starting_line_of_current_depth
23988 my $input_line_number = $tokenizer_self->{_last_line_number};
23989 my $input_line = $tokenizer_self->{_line_text};
23991 if ( $current_depth[$a] > 0 ) {
23993 $seqno = $current_sequence_number[$a][ $current_depth[$a] ];
23995 # check that any brace types $b contained within are balanced
23996 for $b ( 0 .. $#closing_brace_names ) {
23997 next if ( $b == $a );
23999 unless ( $depth_array[$a][$b][ $current_depth[$a] ] ==
24000 $current_depth[$b] )
24003 $current_depth[$b] -
24004 $depth_array[$a][$b][ $current_depth[$a] ];
24006 # don't whine too many times
24007 my $saw_brace_error = get_saw_brace_error();
24009 $saw_brace_error <= MAX_NAG_MESSAGES
24011 # if too many closing types have occured, we probably
24012 # already caught this error
24013 && ( ( $diff > 0 ) || ( $saw_brace_error <= 0 ) )
24016 interrupt_logfile();
24018 $starting_line_of_current_depth[$a][ $current_depth[$a] ];
24020 my $rel = [ $input_line_number, $input_line, $pos ];
24024 if ( $diff == 1 || $diff == -1 ) {
24032 ? $opening_brace_names[$b]
24033 : $closing_brace_names[$b];
24034 write_error_indicator_pair( @$rsl, '^' );
24036 Found $diff extra $bname$ess between $opening_brace_names[$a] on line $sl and $closing_brace_names[$a] on line $el
24041 $starting_line_of_current_depth[$b]
24042 [ $current_depth[$b] ];
24045 " The most recent un-matched $bname is on line $ml\n";
24046 write_error_indicator_pair( @$rml, '^' );
24048 write_error_indicator_pair( @$rel, '^' );
24052 increment_brace_error();
24055 $current_depth[$a]--;
24059 my $saw_brace_error = get_saw_brace_error();
24060 if ( $saw_brace_error <= MAX_NAG_MESSAGES ) {
24062 There is no previous $opening_brace_names[$a] to match a $closing_brace_names[$a] on line $input_line_number
24064 indicate_error( $msg, $input_line_number, $input_line, $pos, '^' );
24066 increment_brace_error();
24071 sub check_final_nesting_depths {
24074 # USES GLOBAL VARIABLES: @current_depth, @starting_line_of_current_depth
24076 for $a ( 0 .. $#closing_brace_names ) {
24078 if ( $current_depth[$a] ) {
24079 my $rsl = $starting_line_of_current_depth[$a][ $current_depth[$a] ];
24082 Final nesting depth of $opening_brace_names[$a]s is $current_depth[$a]
24083 The most recent un-matched $opening_brace_names[$a] is on line $sl
24085 indicate_error( $msg, @$rsl, '^' );
24086 increment_brace_error();
24091 #########i#############################################################
24092 # Tokenizer routines for looking ahead in input stream
24093 #######################################################################
24095 sub peek_ahead_for_n_nonblank_pre_tokens {
24097 # returns next n pretokens if they exist
24098 # returns undef's if hits eof without seeing any pretokens
24099 # USES GLOBAL VARIABLES: $tokenizer_self
24100 my $max_pretokens = shift;
24103 my ( $rpre_tokens, $rmap, $rpre_types );
24105 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
24107 $line =~ s/^\s*//; # trim leading blanks
24108 next if ( length($line) <= 0 ); # skip blank
24109 next if ( $line =~ /^#/ ); # skip comment
24110 ( $rpre_tokens, $rmap, $rpre_types ) =
24111 pre_tokenize( $line, $max_pretokens );
24114 return ( $rpre_tokens, $rpre_types );
24117 # look ahead for next non-blank, non-comment line of code
24118 sub peek_ahead_for_nonblank_token {
24120 # USES GLOBAL VARIABLES: $tokenizer_self
24121 my ( $rtokens, $max_token_index ) = @_;
24125 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $i++ ) )
24127 $line =~ s/^\s*//; # trim leading blanks
24128 next if ( length($line) <= 0 ); # skip blank
24129 next if ( $line =~ /^#/ ); # skip comment
24130 my ( $rtok, $rmap, $rtype ) =
24131 pre_tokenize( $line, 2 ); # only need 2 pre-tokens
24132 my $j = $max_token_index + 1;
24135 foreach $tok (@$rtok) {
24136 last if ( $tok =~ "\n" );
24137 $$rtokens[ ++$j ] = $tok;
24144 #########i#############################################################
24145 # Tokenizer guessing routines for ambiguous situations
24146 #######################################################################
24148 sub guess_if_pattern_or_conditional {
24150 # this routine is called when we have encountered a ? following an
24151 # unknown bareword, and we must decide if it starts a pattern or not
24152 # input parameters:
24153 # $i - token index of the ? starting possible pattern
24154 # output parameters:
24155 # $is_pattern = 0 if probably not pattern, =1 if probably a pattern
24156 # msg = a warning or diagnostic message
24157 # USES GLOBAL VARIABLES: $last_nonblank_token
24158 my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
24159 my $is_pattern = 0;
24160 my $msg = "guessing that ? after $last_nonblank_token starts a ";
24162 if ( $i >= $max_token_index ) {
24163 $msg .= "conditional (no end to pattern found on the line)\n";
24168 my $next_token = $$rtokens[$i]; # first token after ?
24170 # look for a possible ending ? on this line..
24172 my $quote_depth = 0;
24173 my $quote_character = '';
24177 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
24180 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
24181 $quote_pos, $quote_depth, $max_token_index );
24185 # we didn't find an ending ? on this line,
24186 # so we bias towards conditional
24188 $msg .= "conditional (no ending ? on this line)\n";
24190 # we found an ending ?, so we bias towards a pattern
24194 if ( pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
24196 $msg .= "pattern (found ending ? and pattern expected)\n";
24199 $msg .= "pattern (uncertain, but found ending ?)\n";
24203 return ( $is_pattern, $msg );
24206 sub guess_if_pattern_or_division {
24208 # this routine is called when we have encountered a / following an
24209 # unknown bareword, and we must decide if it starts a pattern or is a
24211 # input parameters:
24212 # $i - token index of the / starting possible pattern
24213 # output parameters:
24214 # $is_pattern = 0 if probably division, =1 if probably a pattern
24215 # msg = a warning or diagnostic message
24216 # USES GLOBAL VARIABLES: $last_nonblank_token
24217 my ( $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
24218 my $is_pattern = 0;
24219 my $msg = "guessing that / after $last_nonblank_token starts a ";
24221 if ( $i >= $max_token_index ) {
24222 "division (no end to pattern found on the line)\n";
24226 my $divide_expected =
24227 numerator_expected( $i, $rtokens, $max_token_index );
24229 my $next_token = $$rtokens[$i]; # first token after slash
24231 # look for a possible ending / on this line..
24233 my $quote_depth = 0;
24234 my $quote_character = '';
24238 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
24241 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
24242 $quote_pos, $quote_depth, $max_token_index );
24246 # we didn't find an ending / on this line,
24247 # so we bias towards division
24248 if ( $divide_expected >= 0 ) {
24250 $msg .= "division (no ending / on this line)\n";
24253 $msg = "multi-line pattern (division not possible)\n";
24259 # we found an ending /, so we bias towards a pattern
24262 if ( pattern_expected( $i, $rtokens, $max_token_index ) >= 0 ) {
24264 if ( $divide_expected >= 0 ) {
24266 if ( $i - $ibeg > 60 ) {
24267 $msg .= "division (matching / too distant)\n";
24271 $msg .= "pattern (but division possible too)\n";
24277 $msg .= "pattern (division not possible)\n";
24282 if ( $divide_expected >= 0 ) {
24284 $msg .= "division (pattern not possible)\n";
24289 "pattern (uncertain, but division would not work here)\n";
24294 return ( $is_pattern, $msg );
24297 # try to resolve here-doc vs. shift by looking ahead for
24298 # non-code or the end token (currently only looks for end token)
24299 # returns 1 if it is probably a here doc, 0 if not
24300 sub guess_if_here_doc {
24302 # This is how many lines we will search for a target as part of the
24303 # guessing strategy. It is a constant because there is probably
24304 # little reason to change it.
24305 # USES GLOBAL VARIABLES: $tokenizer_self, $current_package
24307 use constant HERE_DOC_WINDOW => 40;
24309 my $next_token = shift;
24310 my $here_doc_expected = 0;
24313 my $msg = "checking <<";
24315 while ( $line = $tokenizer_self->{_line_buffer_object}->peek_ahead( $k++ ) )
24319 if ( $line =~ /^$next_token$/ ) {
24320 $msg .= " -- found target $next_token ahead $k lines\n";
24321 $here_doc_expected = 1; # got it
24324 last if ( $k >= HERE_DOC_WINDOW );
24327 unless ($here_doc_expected) {
24329 if ( !defined($line) ) {
24330 $here_doc_expected = -1; # hit eof without seeing target
24331 $msg .= " -- must be shift; target $next_token not in file\n";
24334 else { # still unsure..taking a wild guess
24336 if ( !$is_constant{$current_package}{$next_token} ) {
24337 $here_doc_expected = 1;
24339 " -- guessing it's a here-doc ($next_token not a constant)\n";
24343 " -- guessing it's a shift ($next_token is a constant)\n";
24347 write_logfile_entry($msg);
24348 return $here_doc_expected;
24351 #########i#############################################################
24352 # Tokenizer Routines for scanning identifiers and related items
24353 #######################################################################
24355 sub scan_bare_identifier_do {
24357 # this routine is called to scan a token starting with an alphanumeric
24358 # variable or package separator, :: or '.
24359 # USES GLOBAL VARIABLES: $current_package, $last_nonblank_token,
24360 # $last_nonblank_type,@paren_type, $paren_depth
24362 my ( $input_line, $i, $tok, $type, $prototype, $rtoken_map,
24366 my $package = undef;
24370 # we have to back up one pretoken at a :: since each : is one pretoken
24371 if ( $tok eq '::' ) { $i_beg-- }
24372 if ( $tok eq '->' ) { $i_beg-- }
24373 my $pos_beg = $$rtoken_map[$i_beg];
24374 pos($input_line) = $pos_beg;
24381 if ( $input_line =~ m/\G\s*((?:\w*(?:'|::)))*(?:(?:->)?(\w+))?/gc ) {
24383 my $pos = pos($input_line);
24384 my $numc = $pos - $pos_beg;
24385 $tok = substr( $input_line, $pos_beg, $numc );
24387 # type 'w' includes anything without leading type info
24388 # ($,%,@,*) including something like abc::def::ghi
24392 if ( defined($2) ) { $sub_name = $2; }
24393 if ( defined($1) ) {
24396 # patch: don't allow isolated package name which just ends
24397 # in the old style package separator (single quote). Example:
24399 if ( !($sub_name) && substr( $package, -1, 1 ) eq '\'' ) {
24403 $package =~ s/\'/::/g;
24404 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
24405 $package =~ s/::$//;
24408 $package = $current_package;
24410 if ( $is_keyword{$tok} ) {
24415 # if it is a bareword..
24416 if ( $type eq 'w' ) {
24418 # check for v-string with leading 'v' type character
24419 # (This seems to have presidence over filehandle, type 'Y')
24420 if ( $tok =~ /^v\d[_\d]*$/ ) {
24422 # we only have the first part - something like 'v101' -
24424 if ( $input_line =~ m/\G(\.\d[_\d]*)+/gc ) {
24425 $pos = pos($input_line);
24426 $numc = $pos - $pos_beg;
24427 $tok = substr( $input_line, $pos_beg, $numc );
24431 # warn if this version can't handle v-strings
24432 report_v_string($tok);
24435 elsif ( $is_constant{$package}{$sub_name} ) {
24439 # bareword after sort has implied empty prototype; for example:
24440 # @sorted = sort numerically ( 53, 29, 11, 32, 7 );
24441 # This has priority over whatever the user has specified.
24442 elsif ($last_nonblank_token eq 'sort'
24443 && $last_nonblank_type eq 'k' )
24448 # Note: strangely, perl does not seem to really let you create
24449 # functions which act like eval and do, in the sense that eval
24450 # and do may have operators following the final }, but any operators
24451 # that you create with prototype (&) apparently do not allow
24452 # trailing operators, only terms. This seems strange.
24453 # If this ever changes, here is the update
24454 # to make perltidy behave accordingly:
24456 # elsif ( $is_block_function{$package}{$tok} ) {
24457 # $tok='eval'; # patch to do braces like eval - doesn't work
24460 # FIXME: This could become a separate type to allow for different
24462 elsif ( $is_block_function{$package}{$sub_name} ) {
24466 elsif ( $is_block_list_function{$package}{$sub_name} ) {
24469 elsif ( $is_user_function{$package}{$sub_name} ) {
24471 $prototype = $user_function_prototype{$package}{$sub_name};
24474 # check for indirect object
24477 # added 2001-03-27: must not be followed immediately by '('
24479 ( $input_line !~ m/\G\(/gc )
24484 # preceded by keyword like 'print', 'printf' and friends
24485 $is_indirect_object_taker{$last_nonblank_token}
24487 # or preceded by something like 'print(' or 'printf('
24489 ( $last_nonblank_token eq '(' )
24490 && $is_indirect_object_taker{ $paren_type[$paren_depth]
24498 # may not be indirect object unless followed by a space
24499 if ( $input_line =~ m/\G\s+/gc ) {
24503 # Perl's indirect object notation is a very bad
24504 # thing and can cause subtle bugs, especially for
24505 # beginning programmers. And I haven't even been
24506 # able to figure out a sane warning scheme which
24507 # doesn't get in the way of good scripts.
24509 # Complain if a filehandle has any lower case
24510 # letters. This is suggested good practice, but the
24511 # main reason for this warning is that prior to
24512 # release 20010328, perltidy incorrectly parsed a
24513 # function call after a print/printf, with the
24514 # result that a space got added before the opening
24515 # paren, thereby converting the function name to a
24516 # filehandle according to perl's weird rules. This
24517 # will not usually generate a syntax error, so this
24518 # is a potentially serious bug. By warning
24519 # of filehandles with any lower case letters,
24520 # followed by opening parens, we will help the user
24521 # find almost all of these older errors.
24522 # use 'sub_name' because something like
24523 # main::MYHANDLE is ok for filehandle
24524 if ( $sub_name =~ /[a-z]/ ) {
24526 # could be bug caused by older perltidy if
24528 if ( $input_line =~ m/\G\s*\(/gc ) {
24530 "Caution: unknown word '$tok' in indirect object slot\n"
24536 # bareword not followed by a space -- may not be filehandle
24537 # (may be function call defined in a 'use' statement)
24544 # Now we must convert back from character position
24545 # to pre_token index.
24546 # I don't think an error flag can occur here ..but who knows
24549 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
24551 warning("scan_bare_identifier: Possibly invalid tokenization\n");
24555 # no match but line not blank - could be syntax error
24556 # perl will take '::' alone without complaint
24560 # change this warning to log message if it becomes annoying
24561 warning("didn't find identifier after leading ::\n");
24563 return ( $i, $tok, $type, $prototype );
24568 # This is the new scanner and will eventually replace scan_identifier.
24569 # Only type 'sub' and 'package' are implemented.
24570 # Token types $ * % @ & -> are not yet implemented.
24572 # Scan identifier following a type token.
24573 # The type of call depends on $id_scan_state: $id_scan_state = ''
24574 # for starting call, in which case $tok must be the token defining
24577 # If the type token is the last nonblank token on the line, a value
24578 # of $id_scan_state = $tok is returned, indicating that further
24579 # calls must be made to get the identifier. If the type token is
24580 # not the last nonblank token on the line, the identifier is
24581 # scanned and handled and a value of '' is returned.
24582 # USES GLOBAL VARIABLES: $current_package, $last_nonblank_token, $in_attribute_list,
24583 # $statement_type, $tokenizer_self
24585 my ( $input_line, $i, $tok, $rtokens, $rtoken_map, $id_scan_state,
24589 my ( $i_beg, $pos_beg );
24591 #print "NSCAN:entering i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
24592 #my ($a,$b,$c) = caller;
24593 #print "NSCAN: scan_id called with tok=$tok $a $b $c\n";
24595 # on re-entry, start scanning at first token on the line
24596 if ($id_scan_state) {
24601 # on initial entry, start scanning just after type token
24604 $id_scan_state = $tok;
24608 # find $i_beg = index of next nonblank token,
24609 # and handle empty lines
24610 my $blank_line = 0;
24611 my $next_nonblank_token = $$rtokens[$i_beg];
24612 if ( $i_beg > $max_token_index ) {
24617 # only a '#' immediately after a '$' is not a comment
24618 if ( $next_nonblank_token eq '#' ) {
24619 unless ( $tok eq '$' ) {
24624 if ( $next_nonblank_token =~ /^\s/ ) {
24625 ( $next_nonblank_token, $i_beg ) =
24626 find_next_nonblank_token_on_this_line( $i_beg, $rtokens,
24627 $max_token_index );
24628 if ( $next_nonblank_token =~ /(^#|^\s*$)/ ) {
24634 # handle non-blank line; identifier, if any, must follow
24635 unless ($blank_line) {
24637 if ( $id_scan_state eq 'sub' ) {
24638 ( $i, $tok, $type, $id_scan_state ) = do_scan_sub(
24639 $input_line, $i, $i_beg,
24640 $tok, $type, $rtokens,
24641 $rtoken_map, $id_scan_state, $max_token_index
24645 elsif ( $id_scan_state eq 'package' ) {
24646 ( $i, $tok, $type ) =
24647 do_scan_package( $input_line, $i, $i_beg, $tok, $type, $rtokens,
24648 $rtoken_map, $max_token_index );
24649 $id_scan_state = '';
24653 warning("invalid token in scan_id: $tok\n");
24654 $id_scan_state = '';
24658 if ( $id_scan_state && ( !defined($type) || !$type ) ) {
24660 # shouldn't happen:
24662 "Program bug in scan_id: undefined type but scan_state=$id_scan_state\n"
24664 report_definite_bug();
24667 TOKENIZER_DEBUG_FLAG_NSCAN && do {
24669 "NSCAN: returns i=$i, tok=$tok, type=$type, state=$id_scan_state\n";
24671 return ( $i, $tok, $type, $id_scan_state );
24674 sub check_prototype {
24675 my ( $proto, $package, $subname ) = @_;
24676 return unless ( defined($package) && defined($subname) );
24677 if ( defined($proto) ) {
24678 $proto =~ s/^\s*\(\s*//;
24679 $proto =~ s/\s*\)$//;
24681 $is_user_function{$package}{$subname} = 1;
24682 $user_function_prototype{$package}{$subname} = "($proto)";
24684 # prototypes containing '&' must be treated specially..
24685 if ( $proto =~ /\&/ ) {
24687 # right curly braces of prototypes ending in
24688 # '&' may be followed by an operator
24689 if ( $proto =~ /\&$/ ) {
24690 $is_block_function{$package}{$subname} = 1;
24693 # right curly braces of prototypes NOT ending in
24694 # '&' may NOT be followed by an operator
24695 elsif ( $proto !~ /\&$/ ) {
24696 $is_block_list_function{$package}{$subname} = 1;
24701 $is_constant{$package}{$subname} = 1;
24705 $is_user_function{$package}{$subname} = 1;
24709 sub do_scan_package {
24711 # do_scan_package parses a package name
24712 # it is called with $i_beg equal to the index of the first nonblank
24713 # token following a 'package' token.
24714 # USES GLOBAL VARIABLES: $current_package,
24716 my ( $input_line, $i, $i_beg, $tok, $type, $rtokens, $rtoken_map,
24719 my $package = undef;
24720 my $pos_beg = $$rtoken_map[$i_beg];
24721 pos($input_line) = $pos_beg;
24723 # handle non-blank line; package name, if any, must follow
24724 if ( $input_line =~ m/\G\s*((?:\w*(?:'|::))*\w+)/gc ) {
24726 $package = ( defined($1) && $1 ) ? $1 : 'main';
24727 $package =~ s/\'/::/g;
24728 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
24729 $package =~ s/::$//;
24730 my $pos = pos($input_line);
24731 my $numc = $pos - $pos_beg;
24732 $tok = 'package ' . substr( $input_line, $pos_beg, $numc );
24735 # Now we must convert back from character position
24736 # to pre_token index.
24737 # I don't think an error flag can occur here ..but ?
24740 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
24741 if ($error) { warning("Possibly invalid package\n") }
24742 $current_package = $package;
24745 my ( $next_nonblank_token, $i_next ) =
24746 find_next_nonblank_token( $i, $rtokens, $max_token_index );
24747 if ( $next_nonblank_token !~ /^[;\}]$/ ) {
24749 "Unexpected '$next_nonblank_token' after package name '$tok'\n"
24754 # no match but line not blank --
24755 # could be a label with name package, like package: , for example.
24760 return ( $i, $tok, $type );
24763 sub scan_identifier_do {
24765 # This routine assembles tokens into identifiers. It maintains a
24766 # scan state, id_scan_state. It updates id_scan_state based upon
24767 # current id_scan_state and token, and returns an updated
24768 # id_scan_state and the next index after the identifier.
24769 # USES GLOBAL VARIABLES: $context, $last_nonblank_token,
24770 # $last_nonblank_type
24772 my ( $i, $id_scan_state, $identifier, $rtokens, $max_token_index ) = @_;
24775 my $tok_begin = $$rtokens[$i_begin];
24776 if ( $tok_begin eq ':' ) { $tok_begin = '::' }
24777 my $id_scan_state_begin = $id_scan_state;
24778 my $identifier_begin = $identifier;
24779 my $tok = $tok_begin;
24782 # these flags will be used to help figure out the type:
24783 my $saw_alpha = ( $tok =~ /^[A-Za-z_]/ );
24786 # allow old package separator (') except in 'use' statement
24787 my $allow_tick = ( $last_nonblank_token ne 'use' );
24789 # get started by defining a type and a state if necessary
24790 unless ($id_scan_state) {
24791 $context = UNKNOWN_CONTEXT;
24793 # fixup for digraph
24794 if ( $tok eq '>' ) {
24798 $identifier = $tok;
24800 if ( $tok eq '$' || $tok eq '*' ) {
24801 $id_scan_state = '$';
24802 $context = SCALAR_CONTEXT;
24804 elsif ( $tok eq '%' || $tok eq '@' ) {
24805 $id_scan_state = '$';
24806 $context = LIST_CONTEXT;
24808 elsif ( $tok eq '&' ) {
24809 $id_scan_state = '&';
24811 elsif ( $tok eq 'sub' or $tok eq 'package' ) {
24812 $saw_alpha = 0; # 'sub' is considered type info here
24813 $id_scan_state = '$';
24814 $identifier .= ' '; # need a space to separate sub from sub name
24816 elsif ( $tok eq '::' ) {
24817 $id_scan_state = 'A';
24819 elsif ( $tok =~ /^[A-Za-z_]/ ) {
24820 $id_scan_state = ':';
24822 elsif ( $tok eq '->' ) {
24823 $id_scan_state = '$';
24828 my ( $a, $b, $c ) = caller;
24829 warning("Program Bug: scan_identifier given bad token = $tok \n");
24830 warning(" called from sub $a line: $c\n");
24831 report_definite_bug();
24833 $saw_type = !$saw_alpha;
24837 $saw_type = ( $tok =~ /([\$\%\@\*\&])/ );
24840 # now loop to gather the identifier
24843 while ( $i < $max_token_index ) {
24844 $i_save = $i unless ( $tok =~ /^\s*$/ );
24845 $tok = $$rtokens[ ++$i ];
24847 if ( ( $tok eq ':' ) && ( $$rtokens[ $i + 1 ] eq ':' ) ) {
24852 if ( $id_scan_state eq '$' ) { # starting variable name
24854 if ( $tok eq '$' ) {
24856 $identifier .= $tok;
24858 # we've got a punctuation variable if end of line (punct.t)
24859 if ( $i == $max_token_index ) {
24861 $id_scan_state = '';
24865 elsif ( $tok =~ /^[A-Za-z_]/ ) { # alphanumeric ..
24867 $id_scan_state = ':'; # now need ::
24868 $identifier .= $tok;
24870 elsif ( $tok eq "'" && $allow_tick ) { # alphanumeric ..
24872 $id_scan_state = ':'; # now need ::
24873 $identifier .= $tok;
24875 # Perl will accept leading digits in identifiers,
24876 # although they may not always produce useful results.
24877 # Something like $main::0 is ok. But this also works:
24879 # sub howdy::123::bubba{ print "bubba $54321!\n" }
24880 # howdy::123::bubba();
24883 elsif ( $tok =~ /^[0-9]/ ) { # numeric
24885 $id_scan_state = ':'; # now need ::
24886 $identifier .= $tok;
24888 elsif ( $tok eq '::' ) {
24889 $id_scan_state = 'A';
24890 $identifier .= $tok;
24892 elsif ( ( $tok eq '#' ) && ( $identifier eq '$' ) ) { # $#array
24893 $identifier .= $tok; # keep same state, a $ could follow
24895 elsif ( $tok eq '{' ) {
24897 # check for something like ${#} or ${©}
24898 if ( $identifier eq '$'
24899 && $i + 2 <= $max_token_index
24900 && $$rtokens[ $i + 2 ] eq '}'
24901 && $$rtokens[ $i + 1 ] !~ /[\s\w]/ )
24903 my $next2 = $$rtokens[ $i + 2 ];
24904 my $next1 = $$rtokens[ $i + 1 ];
24905 $identifier .= $tok . $next1 . $next2;
24907 $id_scan_state = '';
24911 # skip something like ${xxx} or ->{
24912 $id_scan_state = '';
24914 # if this is the first token of a line, any tokens for this
24915 # identifier have already been accumulated
24916 if ( $identifier eq '$' || $i == 0 ) { $identifier = ''; }
24921 # space ok after leading $ % * & @
24922 elsif ( $tok =~ /^\s*$/ ) {
24924 if ( $identifier =~ /^[\$\%\*\&\@]/ ) {
24926 if ( length($identifier) > 1 ) {
24927 $id_scan_state = '';
24929 $type = 'i'; # probably punctuation variable
24934 # spaces after $'s are common, and space after @
24935 # is harmless, so only complain about space
24936 # after other type characters. Space after $ and
24937 # @ will be removed in formatting. Report space
24938 # after % and * because they might indicate a
24939 # parsing error. In other words '% ' might be a
24940 # modulo operator. Delete this warning if it
24942 if ( $identifier !~ /^[\@\$]$/ ) {
24944 "Space in identifier, following $identifier\n";
24950 # space after '->' is ok
24952 elsif ( $tok eq '^' ) {
24954 # check for some special variables like $^W
24955 if ( $identifier =~ /^[\$\*\@\%]$/ ) {
24956 $identifier .= $tok;
24957 $id_scan_state = 'A';
24959 # Perl accepts '$^]' or '@^]', but
24960 # there must not be a space before the ']'.
24961 my $next1 = $$rtokens[ $i + 1 ];
24962 if ( $next1 eq ']' ) {
24964 $identifier .= $next1;
24965 $id_scan_state = "";
24970 $id_scan_state = '';
24973 else { # something else
24975 # check for various punctuation variables
24976 if ( $identifier =~ /^[\$\*\@\%]$/ ) {
24977 $identifier .= $tok;
24980 elsif ( $identifier eq '$#' ) {
24982 if ( $tok eq '{' ) { $type = 'i'; $i = $i_save }
24984 # perl seems to allow just these: $#: $#- $#+
24985 elsif ( $tok =~ /^[\:\-\+]$/ ) {
24987 $identifier .= $tok;
24991 write_logfile_entry( 'Use of $# is deprecated' . "\n" );
24994 elsif ( $identifier eq '$$' ) {
24996 # perl does not allow references to punctuation
24997 # variables without braces. For example, this
25001 # You would have to use
25005 if ( $tok eq '{' ) { $type = 't' }
25006 else { $type = 'i' }
25008 elsif ( $identifier eq '->' ) {
25013 if ( length($identifier) == 1 ) { $identifier = ''; }
25015 $id_scan_state = '';
25019 elsif ( $id_scan_state eq '&' ) { # starting sub call?
25021 if ( $tok =~ /^[\$A-Za-z_]/ ) { # alphanumeric ..
25022 $id_scan_state = ':'; # now need ::
25024 $identifier .= $tok;
25026 elsif ( $tok eq "'" && $allow_tick ) { # alphanumeric ..
25027 $id_scan_state = ':'; # now need ::
25029 $identifier .= $tok;
25031 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25032 $id_scan_state = ':'; # now need ::
25034 $identifier .= $tok;
25036 elsif ( $tok =~ /^\s*$/ ) { # allow space
25038 elsif ( $tok eq '::' ) { # leading ::
25039 $id_scan_state = 'A'; # accept alpha next
25040 $identifier .= $tok;
25042 elsif ( $tok eq '{' ) {
25043 if ( $identifier eq '&' || $i == 0 ) { $identifier = ''; }
25045 $id_scan_state = '';
25050 # punctuation variable?
25051 # testfile: cunningham4.pl
25052 if ( $identifier eq '&' ) {
25053 $identifier .= $tok;
25060 $id_scan_state = '';
25064 elsif ( $id_scan_state eq 'A' ) { # looking for alpha (after ::)
25066 if ( $tok =~ /^[A-Za-z_]/ ) { # found it
25067 $identifier .= $tok;
25068 $id_scan_state = ':'; # now need ::
25071 elsif ( $tok eq "'" && $allow_tick ) {
25072 $identifier .= $tok;
25073 $id_scan_state = ':'; # now need ::
25076 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25077 $identifier .= $tok;
25078 $id_scan_state = ':'; # now need ::
25081 elsif ( ( $identifier =~ /^sub / ) && ( $tok =~ /^\s*$/ ) ) {
25082 $id_scan_state = '(';
25083 $identifier .= $tok;
25085 elsif ( ( $identifier =~ /^sub / ) && ( $tok eq '(' ) ) {
25086 $id_scan_state = ')';
25087 $identifier .= $tok;
25090 $id_scan_state = '';
25095 elsif ( $id_scan_state eq ':' ) { # looking for :: after alpha
25097 if ( $tok eq '::' ) { # got it
25098 $identifier .= $tok;
25099 $id_scan_state = 'A'; # now require alpha
25101 elsif ( $tok =~ /^[A-Za-z_]/ ) { # more alphanumeric is ok here
25102 $identifier .= $tok;
25103 $id_scan_state = ':'; # now need ::
25106 elsif ( $tok =~ /^[0-9]/ ) { # numeric..see comments above
25107 $identifier .= $tok;
25108 $id_scan_state = ':'; # now need ::
25111 elsif ( $tok eq "'" && $allow_tick ) { # tick
25113 if ( $is_keyword{$identifier} ) {
25114 $id_scan_state = ''; # that's all
25118 $identifier .= $tok;
25121 elsif ( ( $identifier =~ /^sub / ) && ( $tok =~ /^\s*$/ ) ) {
25122 $id_scan_state = '(';
25123 $identifier .= $tok;
25125 elsif ( ( $identifier =~ /^sub / ) && ( $tok eq '(' ) ) {
25126 $id_scan_state = ')';
25127 $identifier .= $tok;
25130 $id_scan_state = ''; # that's all
25135 elsif ( $id_scan_state eq '(' ) { # looking for ( of prototype
25137 if ( $tok eq '(' ) { # got it
25138 $identifier .= $tok;
25139 $id_scan_state = ')'; # now find the end of it
25141 elsif ( $tok =~ /^\s*$/ ) { # blank - keep going
25142 $identifier .= $tok;
25145 $id_scan_state = ''; # that's all - no prototype
25150 elsif ( $id_scan_state eq ')' ) { # looking for ) to end
25152 if ( $tok eq ')' ) { # got it
25153 $identifier .= $tok;
25154 $id_scan_state = ''; # all done
25157 elsif ( $tok =~ /^[\s\$\%\\\*\@\&\;]/ ) {
25158 $identifier .= $tok;
25160 else { # probable error in script, but keep going
25161 warning("Unexpected '$tok' while seeking end of prototype\n");
25162 $identifier .= $tok;
25165 else { # can get here due to error in initialization
25166 $id_scan_state = '';
25172 if ( $id_scan_state eq ')' ) {
25173 warning("Hit end of line while seeking ) to end prototype\n");
25176 # once we enter the actual identifier, it may not extend beyond
25177 # the end of the current line
25178 if ( $id_scan_state =~ /^[A\:\(\)]/ ) {
25179 $id_scan_state = '';
25181 if ( $i < 0 ) { $i = 0 }
25188 if ( $identifier =~ /^->/ && $last_nonblank_type eq 'w' ) {
25191 else { $type = 'i' }
25193 elsif ( $identifier eq '->' ) {
25197 ( length($identifier) > 1 )
25199 # In something like '@$=' we have an identifier '@$'
25200 # In something like '$${' we have type '$$' (and only
25201 # part of an identifier)
25202 && !( $identifier =~ /\$$/ && $tok eq '{' )
25203 && ( $identifier !~ /^(sub |package )$/ )
25208 else { $type = 't' }
25210 elsif ($saw_alpha) {
25212 # type 'w' includes anything without leading type info
25213 # ($,%,@,*) including something like abc::def::ghi
25218 } # this can happen on a restart
25222 $tok = $identifier;
25223 if ($message) { write_logfile_entry($message) }
25230 TOKENIZER_DEBUG_FLAG_SCAN_ID && do {
25231 my ( $a, $b, $c ) = caller;
25233 "SCANID: called from $a $b $c with tok, i, state, identifier =$tok_begin, $i_begin, $id_scan_state_begin, $identifier_begin\n";
25235 "SCANID: returned with tok, i, state, identifier =$tok, $i, $id_scan_state, $identifier\n";
25237 return ( $i, $tok, $type, $id_scan_state, $identifier );
25242 # saved package and subnames in case prototype is on separate line
25243 my ( $package_saved, $subname_saved );
25247 # do_scan_sub parses a sub name and prototype
25248 # it is called with $i_beg equal to the index of the first nonblank
25249 # token following a 'sub' token.
25251 # TODO: add future error checks to be sure we have a valid
25252 # sub name. For example, 'sub &doit' is wrong. Also, be sure
25253 # a name is given if and only if a non-anonymous sub is
25255 # USES GLOBAL VARS: $current_package, $last_nonblank_token,
25256 # $in_attribute_list, %saw_function_definition,
25260 $input_line, $i, $i_beg,
25261 $tok, $type, $rtokens,
25262 $rtoken_map, $id_scan_state, $max_token_index
25264 $id_scan_state = ""; # normally we get everything in one call
25265 my $subname = undef;
25266 my $package = undef;
25271 my $pos_beg = $$rtoken_map[$i_beg];
25272 pos($input_line) = $pos_beg;
25274 # sub NAME PROTO ATTRS
25276 $input_line =~ m/\G\s*
25277 ((?:\w*(?:'|::))*) # package - something that ends in :: or '
25278 (\w+) # NAME - required
25279 (\s*\([^){]*\))? # PROTO - something in parens
25280 (\s*:)? # ATTRS - leading : of attribute list
25289 $package = ( defined($1) && $1 ) ? $1 : $current_package;
25290 $package =~ s/\'/::/g;
25291 if ( $package =~ /^\:/ ) { $package = 'main' . $package }
25292 $package =~ s/::$//;
25293 my $pos = pos($input_line);
25294 my $numc = $pos - $pos_beg;
25295 $tok = 'sub ' . substr( $input_line, $pos_beg, $numc );
25299 # Look for prototype/attributes not preceded on this line by subname;
25300 # This might be an anonymous sub with attributes,
25301 # or a prototype on a separate line from its sub name
25303 $input_line =~ m/\G(\s*\([^){]*\))? # PROTO
25304 (\s*:)? # ATTRS leading ':'
25313 # Handle prototype on separate line from subname
25314 if ($subname_saved) {
25315 $package = $package_saved;
25316 $subname = $subname_saved;
25317 $tok = $last_nonblank_token;
25324 # ATTRS: if there are attributes, back up and let the ':' be
25325 # found later by the scanner.
25326 my $pos = pos($input_line);
25328 $pos -= length($attrs);
25331 my $next_nonblank_token = $tok;
25333 # catch case of line with leading ATTR ':' after anonymous sub
25334 if ( $pos == $pos_beg && $tok eq ':' ) {
25336 $in_attribute_list = 1;
25339 # We must convert back from character position
25340 # to pre_token index.
25343 # I don't think an error flag can occur here ..but ?
25345 ( $i, $error ) = inverse_pretoken_map( $i, $pos, $rtoken_map,
25346 $max_token_index );
25347 if ($error) { warning("Possibly invalid sub\n") }
25349 # check for multiple definitions of a sub
25350 ( $next_nonblank_token, my $i_next ) =
25351 find_next_nonblank_token_on_this_line( $i, $rtokens,
25352 $max_token_index );
25355 if ( $next_nonblank_token =~ /^(\s*|#)$/ )
25356 { # skip blank or side comment
25357 my ( $rpre_tokens, $rpre_types ) =
25358 peek_ahead_for_n_nonblank_pre_tokens(1);
25359 if ( defined($rpre_tokens) && @$rpre_tokens ) {
25360 $next_nonblank_token = $rpre_tokens->[0];
25363 $next_nonblank_token = '}';
25366 $package_saved = "";
25367 $subname_saved = "";
25368 if ( $next_nonblank_token eq '{' ) {
25371 # Check for multiple definitions of a sub, but
25372 # it is ok to have multiple sub BEGIN, etc,
25373 # so we do not complain if name is all caps
25374 if ( $saw_function_definition{$package}{$subname}
25375 && $subname !~ /^[A-Z]+$/ )
25377 my $lno = $saw_function_definition{$package}{$subname};
25379 "already saw definition of 'sub $subname' in package '$package' at line $lno\n"
25382 $saw_function_definition{$package}{$subname} =
25383 $tokenizer_self->{_last_line_number};
25386 elsif ( $next_nonblank_token eq ';' ) {
25388 elsif ( $next_nonblank_token eq '}' ) {
25391 # ATTRS - if an attribute list follows, remember the name
25392 # of the sub so the next opening brace can be labeled.
25393 # Setting 'statement_type' causes any ':'s to introduce
25395 elsif ( $next_nonblank_token eq ':' ) {
25396 $statement_type = $tok;
25399 # see if PROTO follows on another line:
25400 elsif ( $next_nonblank_token eq '(' ) {
25401 if ( $attrs || $proto ) {
25403 "unexpected '(' after definition or declaration of sub '$subname'\n"
25407 $id_scan_state = 'sub'; # we must come back to get proto
25408 $statement_type = $tok;
25409 $package_saved = $package;
25410 $subname_saved = $subname;
25413 elsif ($next_nonblank_token) { # EOF technically ok
25415 "expecting ':' or ';' or '{' after definition or declaration of sub '$subname' but saw '$next_nonblank_token'\n"
25418 check_prototype( $proto, $package, $subname );
25421 # no match but line not blank
25424 return ( $i, $tok, $type, $id_scan_state );
25428 #########i###############################################################
25429 # Tokenizer utility routines which may use CONSTANTS but no other GLOBALS
25430 #########################################################################
25432 sub find_next_nonblank_token {
25433 my ( $i, $rtokens, $max_token_index ) = @_;
25435 if ( $i >= $max_token_index ) {
25436 if ( !peeked_ahead() ) {
25439 peek_ahead_for_nonblank_token( $rtokens, $max_token_index );
25442 my $next_nonblank_token = $$rtokens[ ++$i ];
25444 if ( $next_nonblank_token =~ /^\s*$/ ) {
25445 $next_nonblank_token = $$rtokens[ ++$i ];
25447 return ( $next_nonblank_token, $i );
25450 sub numerator_expected {
25452 # this is a filter for a possible numerator, in support of guessing
25453 # for the / pattern delimiter token.
25458 # Note: I am using the convention that variables ending in
25459 # _expected have these 3 possible values.
25460 my ( $i, $rtokens, $max_token_index ) = @_;
25461 my $next_token = $$rtokens[ $i + 1 ];
25462 if ( $next_token eq '=' ) { $i++; } # handle /=
25463 my ( $next_nonblank_token, $i_next ) =
25464 find_next_nonblank_token( $i, $rtokens, $max_token_index );
25466 if ( $next_nonblank_token =~ /(\(|\$|\w|\.|\@)/ ) {
25471 if ( $next_nonblank_token =~ /^\s*$/ ) {
25480 sub pattern_expected {
25482 # This is the start of a filter for a possible pattern.
25483 # It looks at the token after a possbible pattern and tries to
25484 # determine if that token could end a pattern.
25489 my ( $i, $rtokens, $max_token_index ) = @_;
25490 my $next_token = $$rtokens[ $i + 1 ];
25491 if ( $next_token =~ /^[cgimosx]/ ) { $i++; } # skip possible modifier
25492 my ( $next_nonblank_token, $i_next ) =
25493 find_next_nonblank_token( $i, $rtokens, $max_token_index );
25495 # list of tokens which may follow a pattern
25496 # (can probably be expanded)
25497 if ( $next_nonblank_token =~ /(\)|\}|\;|\&\&|\|\||and|or|while|if|unless)/ )
25503 if ( $next_nonblank_token =~ /^\s*$/ ) {
25512 sub find_next_nonblank_token_on_this_line {
25513 my ( $i, $rtokens, $max_token_index ) = @_;
25514 my $next_nonblank_token;
25516 if ( $i < $max_token_index ) {
25517 $next_nonblank_token = $$rtokens[ ++$i ];
25519 if ( $next_nonblank_token =~ /^\s*$/ ) {
25521 if ( $i < $max_token_index ) {
25522 $next_nonblank_token = $$rtokens[ ++$i ];
25527 $next_nonblank_token = "";
25529 return ( $next_nonblank_token, $i );
25532 sub find_angle_operator_termination {
25534 # We are looking at a '<' and want to know if it is an angle operator.
25535 # We are to return:
25536 # $i = pretoken index of ending '>' if found, current $i otherwise
25537 # $type = 'Q' if found, '>' otherwise
25538 my ( $input_line, $i_beg, $rtoken_map, $expecting, $max_token_index ) = @_;
25541 pos($input_line) = 1 + $$rtoken_map[$i];
25545 # we just have to find the next '>' if a term is expected
25546 if ( $expecting == TERM ) { $filter = '[\>]' }
25548 # we have to guess if we don't know what is expected
25549 elsif ( $expecting == UNKNOWN ) { $filter = '[\>\;\=\#\|\<]' }
25551 # shouldn't happen - we shouldn't be here if operator is expected
25552 else { warning("Program Bug in find_angle_operator_termination\n") }
25554 # To illustrate what we might be looking at, in case we are
25555 # guessing, here are some examples of valid angle operators
25562 # <jskdfjskdfj* op/* jskdjfjkosvk*> ( glob.t)
25563 # <${PREFIX}*img*.$IMAGE_TYPE>
25564 # <img*.$IMAGE_TYPE>
25565 # <Timg*.$IMAGE_TYPE>
25566 # <$LATEX2HTMLVERSIONS${dd}html[1-9].[0-9].pl>
25568 # Here are some examples of lines which do not have angle operators:
25569 # return undef unless $self->[2]++ < $#{$self->[1]};
25572 # the following line from dlister.pl caused trouble:
25573 # print'~'x79,"\n",$D<1024?"0.$D":$D>>10,"K, $C files\n\n\n";
25575 # If the '<' starts an angle operator, it must end on this line and
25576 # it must not have certain characters like ';' and '=' in it. I use
25577 # this to limit the testing. This filter should be improved if
25580 if ( $input_line =~ /($filter)/g ) {
25584 # We MAY have found an angle operator termination if we get
25585 # here, but we need to do more to be sure we haven't been
25587 my $pos = pos($input_line);
25589 my $pos_beg = $$rtoken_map[$i];
25590 my $str = substr( $input_line, $pos_beg, ( $pos - $pos_beg ) );
25592 # Reject if the closing '>' follows a '-' as in:
25593 # if ( VERSION < 5.009 && $op-> name eq 'aassign' ) { }
25594 if ( $expecting eq UNKNOWN ) {
25595 my $check = substr( $input_line, $pos - 2, 1 );
25596 if ( $check eq '-' ) {
25597 return ( $i, $type );
25601 ######################################debug#####
25602 #write_diagnostics( "ANGLE? :$str\n");
25603 #print "ANGLE: found $1 at pos=$pos str=$str check=$check\n";
25604 ######################################debug#####
25608 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
25610 # It may be possible that a quote ends midway in a pretoken.
25611 # If this happens, it may be necessary to split the pretoken.
25614 "Possible tokinization error..please check this line\n");
25615 report_possible_bug();
25618 # Now let's see where we stand....
25619 # OK if math op not possible
25620 if ( $expecting == TERM ) {
25623 # OK if there are no more than 2 pre-tokens inside
25624 # (not possible to write 2 token math between < and >)
25625 # This catches most common cases
25626 elsif ( $i <= $i_beg + 3 ) {
25627 write_diagnostics("ANGLE(1 or 2 tokens): $str\n");
25633 # Let's try a Brace Test: any braces inside must balance
25635 while ( $str =~ /\{/g ) { $br++ }
25636 while ( $str =~ /\}/g ) { $br-- }
25638 while ( $str =~ /\[/g ) { $sb++ }
25639 while ( $str =~ /\]/g ) { $sb-- }
25641 while ( $str =~ /\(/g ) { $pr++ }
25642 while ( $str =~ /\)/g ) { $pr-- }
25644 # if braces do not balance - not angle operator
25645 if ( $br || $sb || $pr ) {
25649 "NOT ANGLE (BRACE={$br ($pr [$sb ):$str\n");
25652 # we should keep doing more checks here...to be continued
25653 # Tentatively accepting this as a valid angle operator.
25654 # There are lots more things that can be checked.
25657 "ANGLE-Guessing yes: $str expecting=$expecting\n");
25658 write_logfile_entry("Guessing angle operator here: $str\n");
25663 # didn't find ending >
25665 if ( $expecting == TERM ) {
25666 warning("No ending > for angle operator\n");
25670 return ( $i, $type );
25673 sub scan_number_do {
25675 # scan a number in any of the formats that Perl accepts
25676 # Underbars (_) are allowed in decimal numbers.
25677 # input parameters -
25678 # $input_line - the string to scan
25679 # $i - pre_token index to start scanning
25680 # $rtoken_map - reference to the pre_token map giving starting
25681 # character position in $input_line of token $i
25682 # output parameters -
25683 # $i - last pre_token index of the number just scanned
25684 # number - the number (characters); or undef if not a number
25686 my ( $input_line, $i, $rtoken_map, $input_type, $max_token_index ) = @_;
25687 my $pos_beg = $$rtoken_map[$i];
25690 my $number = undef;
25691 my $type = $input_type;
25693 my $first_char = substr( $input_line, $pos_beg, 1 );
25695 # Look for bad starting characters; Shouldn't happen..
25696 if ( $first_char !~ /[\d\.\+\-Ee]/ ) {
25697 warning("Program bug - scan_number given character $first_char\n");
25698 report_definite_bug();
25699 return ( $i, $type, $number );
25702 # handle v-string without leading 'v' character ('Two Dot' rule)
25704 # TODO: v-strings may contain underscores
25705 pos($input_line) = $pos_beg;
25706 if ( $input_line =~ /\G((\d+)?\.\d+(\.\d+)+)/g ) {
25707 $pos = pos($input_line);
25708 my $numc = $pos - $pos_beg;
25709 $number = substr( $input_line, $pos_beg, $numc );
25711 report_v_string($number);
25714 # handle octal, hex, binary
25715 if ( !defined($number) ) {
25716 pos($input_line) = $pos_beg;
25717 if ( $input_line =~ /\G[+-]?0((x[0-9a-fA-F_]+)|([0-7_]+)|(b[01_]+))/g )
25719 $pos = pos($input_line);
25720 my $numc = $pos - $pos_beg;
25721 $number = substr( $input_line, $pos_beg, $numc );
25727 if ( !defined($number) ) {
25728 pos($input_line) = $pos_beg;
25730 if ( $input_line =~ /\G([+-]?[\d_]*(\.[\d_]*)?([Ee][+-]?(\d+))?)/g ) {
25731 $pos = pos($input_line);
25733 # watch out for things like 0..40 which would give 0. by this;
25734 if ( ( substr( $input_line, $pos - 1, 1 ) eq '.' )
25735 && ( substr( $input_line, $pos, 1 ) eq '.' ) )
25739 my $numc = $pos - $pos_beg;
25740 $number = substr( $input_line, $pos_beg, $numc );
25745 # filter out non-numbers like e + - . e2 .e3 +e6
25746 # the rule: at least one digit, and any 'e' must be preceded by a digit
25748 $number !~ /\d/ # no digits
25749 || ( $number =~ /^(.*)[eE]/
25750 && $1 !~ /\d/ ) # or no digits before the 'e'
25754 $type = $input_type;
25755 return ( $i, $type, $number );
25758 # Found a number; now we must convert back from character position
25759 # to pre_token index. An error here implies user syntax error.
25760 # An example would be an invalid octal number like '009'.
25763 inverse_pretoken_map( $i, $pos, $rtoken_map, $max_token_index );
25764 if ($error) { warning("Possibly invalid number\n") }
25766 return ( $i, $type, $number );
25769 sub inverse_pretoken_map {
25771 # Starting with the current pre_token index $i, scan forward until
25772 # finding the index of the next pre_token whose position is $pos.
25773 my ( $i, $pos, $rtoken_map, $max_token_index ) = @_;
25776 while ( ++$i <= $max_token_index ) {
25778 if ( $pos <= $$rtoken_map[$i] ) {
25780 # Let the calling routine handle errors in which we do not
25781 # land on a pre-token boundary. It can happen by running
25782 # perltidy on some non-perl scripts, for example.
25783 if ( $pos < $$rtoken_map[$i] ) { $error = 1 }
25788 return ( $i, $error );
25791 sub find_here_doc {
25793 # find the target of a here document, if any
25794 # input parameters:
25795 # $i - token index of the second < of <<
25796 # ($i must be less than the last token index if this is called)
25797 # output parameters:
25798 # $found_target = 0 didn't find target; =1 found target
25799 # HERE_TARGET - the target string (may be empty string)
25800 # $i - unchanged if not here doc,
25801 # or index of the last token of the here target
25802 # $saw_error - flag noting unbalanced quote on here target
25803 my ( $expecting, $i, $rtokens, $rtoken_map, $max_token_index ) = @_;
25805 my $found_target = 0;
25806 my $here_doc_target = '';
25807 my $here_quote_character = '';
25809 my ( $next_nonblank_token, $i_next_nonblank, $next_token );
25810 $next_token = $$rtokens[ $i + 1 ];
25812 # perl allows a backslash before the target string (heredoc.t)
25814 if ( $next_token eq '\\' ) {
25816 $next_token = $$rtokens[ $i + 2 ];
25819 ( $next_nonblank_token, $i_next_nonblank ) =
25820 find_next_nonblank_token_on_this_line( $i, $rtokens, $max_token_index );
25822 if ( $next_nonblank_token =~ /[\'\"\`]/ ) {
25825 my $quote_depth = 0;
25830 $i, $in_quote, $here_quote_character, $quote_pos, $quote_depth,
25833 = follow_quoted_string( $i_next_nonblank, $in_quote, $rtokens,
25834 $here_quote_character, $quote_pos, $quote_depth, $max_token_index );
25836 if ($in_quote) { # didn't find end of quote, so no target found
25838 if ( $expecting == TERM ) {
25840 "Did not find here-doc string terminator ($here_quote_character) before end of line \n"
25845 else { # found ending quote
25850 for ( $j = $i_next_nonblank + 1 ; $j < $i ; $j++ ) {
25851 $tokj = $$rtokens[$j];
25853 # we have to remove any backslash before the quote character
25854 # so that the here-doc-target exactly matches this string
25858 && $$rtokens[ $j + 1 ] eq $here_quote_character );
25859 $here_doc_target .= $tokj;
25864 elsif ( ( $next_token =~ /^\s*$/ ) and ( $expecting == TERM ) ) {
25866 write_logfile_entry(
25867 "found blank here-target after <<; suggest using \"\"\n");
25870 elsif ( $next_token =~ /^\w/ ) { # simple bareword or integer after <<
25872 my $here_doc_expected;
25873 if ( $expecting == UNKNOWN ) {
25874 $here_doc_expected = guess_if_here_doc($next_token);
25877 $here_doc_expected = 1;
25880 if ($here_doc_expected) {
25882 $here_doc_target = $next_token;
25889 if ( $expecting == TERM ) {
25891 write_logfile_entry("Note: bare here-doc operator <<\n");
25898 # patch to neglect any prepended backslash
25899 if ( $found_target && $backslash ) { $i++ }
25901 return ( $found_target, $here_doc_target, $here_quote_character, $i,
25907 # follow (or continue following) quoted string(s)
25908 # $in_quote return code:
25909 # 0 - ok, found end
25910 # 1 - still must find end of quote whose target is $quote_character
25911 # 2 - still looking for end of first of two quotes
25913 # Returns updated strings:
25914 # $quoted_string_1 = quoted string seen while in_quote=1
25915 # $quoted_string_2 = quoted string seen while in_quote=2
25917 $i, $in_quote, $quote_character,
25918 $quote_pos, $quote_depth, $quoted_string_1,
25919 $quoted_string_2, $rtokens, $rtoken_map,
25923 my $in_quote_starting = $in_quote;
25926 if ( $in_quote == 2 ) { # two quotes/quoted_string_1s to follow
25929 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25932 = follow_quoted_string( $i, $in_quote, $rtokens, $quote_character,
25933 $quote_pos, $quote_depth, $max_token_index );
25934 $quoted_string_2 .= $quoted_string;
25935 if ( $in_quote == 1 ) {
25936 if ( $quote_character =~ /[\{\[\<\(]/ ) { $i++; }
25937 $quote_character = '';
25940 $quoted_string_2 .= "\n";
25944 if ( $in_quote == 1 ) { # one (more) quote to follow
25947 $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25950 = follow_quoted_string( $ibeg, $in_quote, $rtokens, $quote_character,
25951 $quote_pos, $quote_depth, $max_token_index );
25952 $quoted_string_1 .= $quoted_string;
25953 if ( $in_quote == 1 ) {
25954 $quoted_string_1 .= "\n";
25957 return ( $i, $in_quote, $quote_character, $quote_pos, $quote_depth,
25958 $quoted_string_1, $quoted_string_2 );
25961 sub follow_quoted_string {
25963 # scan for a specific token, skipping escaped characters
25964 # if the quote character is blank, use the first non-blank character
25965 # input parameters:
25966 # $rtokens = reference to the array of tokens
25967 # $i = the token index of the first character to search
25968 # $in_quote = number of quoted strings being followed
25969 # $beginning_tok = the starting quote character
25970 # $quote_pos = index to check next for alphanumeric delimiter
25971 # output parameters:
25972 # $i = the token index of the ending quote character
25973 # $in_quote = decremented if found end, unchanged if not
25974 # $beginning_tok = the starting quote character
25975 # $quote_pos = index to check next for alphanumeric delimiter
25976 # $quote_depth = nesting depth, since delimiters '{ ( [ <' can be nested.
25977 # $quoted_string = the text of the quote (without quotation tokens)
25978 my ( $i_beg, $in_quote, $rtokens, $beginning_tok, $quote_pos, $quote_depth,
25981 my ( $tok, $end_tok );
25982 my $i = $i_beg - 1;
25983 my $quoted_string = "";
25985 TOKENIZER_DEBUG_FLAG_QUOTE && do {
25987 "QUOTE entering with quote_pos = $quote_pos i=$i beginning_tok =$beginning_tok\n";
25990 # get the corresponding end token
25991 if ( $beginning_tok !~ /^\s*$/ ) {
25992 $end_tok = matching_end_token($beginning_tok);
25995 # a blank token means we must find and use the first non-blank one
25997 my $allow_quote_comments = ( $i < 0 ) ? 1 : 0; # i<0 means we saw a <cr>
25999 while ( $i < $max_token_index ) {
26000 $tok = $$rtokens[ ++$i ];
26002 if ( $tok !~ /^\s*$/ ) {
26004 if ( ( $tok eq '#' ) && ($allow_quote_comments) ) {
26005 $i = $max_token_index;
26009 if ( length($tok) > 1 ) {
26010 if ( $quote_pos <= 0 ) { $quote_pos = 1 }
26011 $beginning_tok = substr( $tok, $quote_pos - 1, 1 );
26014 $beginning_tok = $tok;
26017 $end_tok = matching_end_token($beginning_tok);
26023 $allow_quote_comments = 1;
26028 # There are two different loops which search for the ending quote
26029 # character. In the rare case of an alphanumeric quote delimiter, we
26030 # have to look through alphanumeric tokens character-by-character, since
26031 # the pre-tokenization process combines multiple alphanumeric
26032 # characters, whereas for a non-alphanumeric delimiter, only tokens of
26033 # length 1 can match.
26035 ###################################################################
26036 # Case 1 (rare): loop for case of alphanumeric quote delimiter..
26037 # "quote_pos" is the position the current word to begin searching
26038 ###################################################################
26039 if ( $beginning_tok =~ /\w/ ) {
26041 # Note this because it is not recommended practice except
26042 # for obfuscated perl contests
26043 if ( $in_quote == 1 ) {
26044 write_logfile_entry(
26045 "Note: alphanumeric quote delimiter ($beginning_tok) \n");
26048 while ( $i < $max_token_index ) {
26050 if ( $quote_pos == 0 || ( $i < 0 ) ) {
26051 $tok = $$rtokens[ ++$i ];
26053 if ( $tok eq '\\' ) {
26055 # retain backslash unless it hides the end token
26056 $quoted_string .= $tok
26057 unless $$rtokens[ $i + 1 ] eq $end_tok;
26059 last if ( $i >= $max_token_index );
26060 $tok = $$rtokens[ ++$i ];
26063 my $old_pos = $quote_pos;
26065 unless ( defined($tok) && defined($end_tok) && defined($quote_pos) )
26069 $quote_pos = 1 + index( $tok, $end_tok, $quote_pos );
26071 if ( $quote_pos > 0 ) {
26074 substr( $tok, $old_pos, $quote_pos - $old_pos - 1 );
26078 if ( $quote_depth == 0 ) {
26084 $quoted_string .= substr( $tok, $old_pos );
26089 ########################################################################
26090 # Case 2 (normal): loop for case of a non-alphanumeric quote delimiter..
26091 ########################################################################
26094 while ( $i < $max_token_index ) {
26095 $tok = $$rtokens[ ++$i ];
26097 if ( $tok eq $end_tok ) {
26100 if ( $quote_depth == 0 ) {
26105 elsif ( $tok eq $beginning_tok ) {
26108 elsif ( $tok eq '\\' ) {
26110 # retain backslash unless it hides the beginning or end token
26111 $tok = $$rtokens[ ++$i ];
26112 $quoted_string .= '\\'
26113 unless ( $tok eq $end_tok || $tok eq $beginning_tok );
26115 $quoted_string .= $tok;
26118 if ( $i > $max_token_index ) { $i = $max_token_index }
26119 return ( $i, $in_quote, $beginning_tok, $quote_pos, $quote_depth,
26123 sub indicate_error {
26124 my ( $msg, $line_number, $input_line, $pos, $carrat ) = @_;
26125 interrupt_logfile();
26127 write_error_indicator_pair( $line_number, $input_line, $pos, $carrat );
26131 sub write_error_indicator_pair {
26132 my ( $line_number, $input_line, $pos, $carrat ) = @_;
26133 my ( $offset, $numbered_line, $underline ) =
26134 make_numbered_line( $line_number, $input_line, $pos );
26135 $underline = write_on_underline( $underline, $pos - $offset, $carrat );
26136 warning( $numbered_line . "\n" );
26137 $underline =~ s/\s*$//;
26138 warning( $underline . "\n" );
26141 sub make_numbered_line {
26143 # Given an input line, its line number, and a character position of
26144 # interest, create a string not longer than 80 characters of the form
26145 # $lineno: sub_string
26146 # such that the sub_string of $str contains the position of interest
26148 # Here is an example of what we want, in this case we add trailing
26149 # '...' because the line is long.
26151 # 2: (One of QAML 2.0's authors is a member of the World Wide Web Con ...
26153 # Here is another example, this time in which we used leading '...'
26154 # because of excessive length:
26156 # 2: ... er of the World Wide Web Consortium's
26158 # input parameters are:
26159 # $lineno = line number
26160 # $str = the text of the line
26161 # $pos = position of interest (the error) : 0 = first character
26164 # - $offset = an offset which corrects the position in case we only
26165 # display part of a line, such that $pos-$offset is the effective
26166 # position from the start of the displayed line.
26167 # - $numbered_line = the numbered line as above,
26168 # - $underline = a blank 'underline' which is all spaces with the same
26169 # number of characters as the numbered line.
26171 my ( $lineno, $str, $pos ) = @_;
26172 my $offset = ( $pos < 60 ) ? 0 : $pos - 40;
26173 my $excess = length($str) - $offset - 68;
26174 my $numc = ( $excess > 0 ) ? 68 : undef;
26176 if ( defined($numc) ) {
26177 if ( $offset == 0 ) {
26178 $str = substr( $str, $offset, $numc - 4 ) . " ...";
26181 $str = "... " . substr( $str, $offset + 4, $numc - 4 ) . " ...";
26186 if ( $offset == 0 ) {
26189 $str = "... " . substr( $str, $offset + 4 );
26193 my $numbered_line = sprintf( "%d: ", $lineno );
26194 $offset -= length($numbered_line);
26195 $numbered_line .= $str;
26196 my $underline = " " x length($numbered_line);
26197 return ( $offset, $numbered_line, $underline );
26200 sub write_on_underline {
26202 # The "underline" is a string that shows where an error is; it starts
26203 # out as a string of blanks with the same length as the numbered line of
26204 # code above it, and we have to add marking to show where an error is.
26205 # In the example below, we want to write the string '--^' just below
26206 # the line of bad code:
26208 # 2: (One of QAML 2.0's authors is a member of the World Wide Web Con ...
26210 # We are given the current underline string, plus a position and a
26211 # string to write on it.
26213 # In the above example, there will be 2 calls to do this:
26214 # First call: $pos=19, pos_chr=^
26215 # Second call: $pos=16, pos_chr=---
26217 # This is a trivial thing to do with substr, but there is some
26220 my ( $underline, $pos, $pos_chr ) = @_;
26222 # check for error..shouldn't happen
26223 unless ( ( $pos >= 0 ) && ( $pos <= length($underline) ) ) {
26226 my $excess = length($pos_chr) + $pos - length($underline);
26227 if ( $excess > 0 ) {
26228 $pos_chr = substr( $pos_chr, 0, length($pos_chr) - $excess );
26230 substr( $underline, $pos, length($pos_chr) ) = $pos_chr;
26231 return ($underline);
26236 # Break a string, $str, into a sequence of preliminary tokens. We
26237 # are interested in these types of tokens:
26238 # words (type='w'), example: 'max_tokens_wanted'
26239 # digits (type = 'd'), example: '0755'
26240 # whitespace (type = 'b'), example: ' '
26241 # any other single character (i.e. punct; type = the character itself).
26242 # We cannot do better than this yet because we might be in a quoted
26243 # string or pattern. Caller sets $max_tokens_wanted to 0 to get all
26245 my ( $str, $max_tokens_wanted ) = @_;
26247 # we return references to these 3 arrays:
26248 my @tokens = (); # array of the tokens themselves
26249 my @token_map = (0); # string position of start of each token
26250 my @type = (); # 'b'=whitespace, 'd'=digits, 'w'=alpha, or punct
26255 if ( $str =~ /\G(\s+)/gc ) { push @type, 'b'; }
26258 # note that this must come before words!
26259 elsif ( $str =~ /\G(\d+)/gc ) { push @type, 'd'; }
26262 elsif ( $str =~ /\G(\w+)/gc ) { push @type, 'w'; }
26264 # single-character punctuation
26265 elsif ( $str =~ /\G(\W)/gc ) { push @type, $1; }
26269 return ( \@tokens, \@token_map, \@type );
26273 push @token_map, pos($str);
26275 } while ( --$max_tokens_wanted != 0 );
26277 return ( \@tokens, \@token_map, \@type );
26282 # this is an old debug routine
26283 my ( $rtokens, $rtoken_map ) = @_;
26284 my $num = scalar(@$rtokens);
26287 for ( $i = 0 ; $i < $num ; $i++ ) {
26288 my $len = length( $$rtokens[$i] );
26289 print "$i:$len:$$rtoken_map[$i]:$$rtokens[$i]:\n";
26293 sub matching_end_token {
26295 # find closing character for a pattern
26296 my $beginning_token = shift;
26298 if ( $beginning_token eq '{' ) {
26301 elsif ( $beginning_token eq '[' ) {
26304 elsif ( $beginning_token eq '<' ) {
26307 elsif ( $beginning_token eq '(' ) {
26315 sub dump_token_types {
26319 # This should be the latest list of token types in use
26320 # adding NEW_TOKENS: add a comment here
26321 print $fh <<'END_OF_LIST';
26323 Here is a list of the token types currently used for lines of type 'CODE'.
26324 For the following tokens, the "type" of a token is just the token itself.
26326 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
26327 ( ) <= >= == =~ !~ != ++ -- /= x=
26328 ... **= <<= >>= &&= ||= //= <=>
26329 , + - / * | % ! x ~ = \ ? : . < > ^ &
26331 The following additional token types are defined:
26334 b blank (white space)
26335 { indent: opening structural curly brace or square bracket or paren
26336 (code block, anonymous hash reference, or anonymous array reference)
26337 } outdent: right structural curly brace or square bracket or paren
26338 [ left non-structural square bracket (enclosing an array index)
26339 ] right non-structural square bracket
26340 ( left non-structural paren (all but a list right of an =)
26341 ) right non-structural parena
26342 L left non-structural curly brace (enclosing a key)
26343 R right non-structural curly brace
26344 ; terminal semicolon
26345 f indicates a semicolon in a "for" statement
26346 h here_doc operator <<
26348 Q indicates a quote or pattern
26349 q indicates a qw quote block
26351 C user-defined constant or constant function (with void prototype = ())
26352 U user-defined function taking parameters
26353 G user-defined function taking block parameter (like grep/map/eval)
26354 M (unused, but reserved for subroutine definition name)
26355 P (unused, but -html uses it to label pod text)
26356 t type indicater such as %,$,@,*,&,sub
26357 w bare word (perhaps a subroutine call)
26358 i identifier of some type (with leading %, $, @, *, &, sub, -> )
26361 F a file test operator (like -e)
26363 Z identifier in indirect object slot: may be file handle, object
26364 J LABEL: code block label
26365 j LABEL after next, last, redo, goto
26368 pp pre-increment operator ++
26369 mm pre-decrement operator --
26370 A : used as attribute separator
26372 Here are the '_line_type' codes used internally:
26373 SYSTEM - system-specific code before hash-bang line
26374 CODE - line of perl code (including comments)
26375 POD_START - line starting pod, such as '=head'
26376 POD - pod documentation text
26377 POD_END - last line of pod section, '=cut'
26378 HERE - text of here-document
26379 HERE_END - last line of here-doc (target word)
26380 FORMAT - format section
26381 FORMAT_END - last line of format section, '.'
26382 DATA_START - __DATA__ line
26383 DATA - unidentified text following __DATA__
26384 END_START - __END__ line
26385 END - unidentified text following __END__
26386 ERROR - we are in big trouble, probably not a perl script
26392 # These names are used in error messages
26393 @opening_brace_names = qw# '{' '[' '(' '?' #;
26394 @closing_brace_names = qw# '}' ']' ')' ':' #;
26397 .. :: << >> ** && .. || // -> => += -= .= %= &= |= ^= *= <>
26398 <= >= == =~ !~ != ++ -- /= x= ~~
26400 @is_digraph{@digraphs} = (1) x scalar(@digraphs);
26402 my @trigraphs = qw( ... **= <<= >>= &&= ||= //= <=> !~~ );
26403 @is_trigraph{@trigraphs} = (1) x scalar(@trigraphs);
26405 # make a hash of all valid token types for self-checking the tokenizer
26406 # (adding NEW_TOKENS : select a new character and add to this list)
26407 my @valid_token_types = qw#
26408 A b C G L R f h Q k t w i q n p m F pp mm U j J Y Z v
26409 { } ( ) [ ] ; + - / * | % ! x ~ = \ ? : . < > ^ &
26411 push( @valid_token_types, @digraphs );
26412 push( @valid_token_types, @trigraphs );
26413 push( @valid_token_types, '#' );
26414 push( @valid_token_types, ',' );
26415 @is_valid_token_type{@valid_token_types} = (1) x scalar(@valid_token_types);
26417 # a list of file test letters, as in -e (Table 3-4 of 'camel 3')
26418 my @file_test_operators =
26419 qw( A B C M O R S T W X b c d e f g k l o p r s t u w x z);
26420 @is_file_test_operator{@file_test_operators} =
26421 (1) x scalar(@file_test_operators);
26423 # these functions have prototypes of the form (&), so when they are
26424 # followed by a block, that block MAY BE followed by an operator.
26425 @_ = qw( do eval );
26426 @is_block_operator{@_} = (1) x scalar(@_);
26428 # these functions allow an identifier in the indirect object slot
26429 @_ = qw( print printf sort exec system say);
26430 @is_indirect_object_taker{@_} = (1) x scalar(@_);
26432 # These tokens may precede a code block
26433 # patched for SWITCH/CASE
26434 @_ = qw( BEGIN END CHECK INIT AUTOLOAD DESTROY continue if elsif else
26435 unless do while until eval for foreach map grep sort
26436 switch case given when);
26437 @is_code_block_token{@_} = (1) x scalar(@_);
26439 # I'll build the list of keywords incrementally
26442 # keywords and tokens after which a value or pattern is expected,
26443 # but not an operator. In other words, these should consume terms
26444 # to their right, or at least they are not expected to be followed
26445 # immediately by operators.
26446 my @value_requestor = qw(
26665 # patched above for SWITCH/CASE given/when err say
26666 # 'err' is a fairly safe addition.
26667 # TODO: 'default' still needed if appropriate
26668 # 'use feature' seen, but perltidy works ok without it.
26669 # Concerned that 'default' could break code.
26670 push( @Keywords, @value_requestor );
26672 # These are treated the same but are not keywords:
26677 push( @value_requestor, @extra_vr );
26679 @expecting_term_token{@value_requestor} = (1) x scalar(@value_requestor);
26681 # this list contains keywords which do not look for arguments,
26682 # so that they might be followed by an operator, or at least
26684 my @operator_requestor = qw(
26708 push( @Keywords, @operator_requestor );
26710 # These are treated the same but are not considered keywords:
26717 push( @operator_requestor, @extra_or );
26719 @expecting_operator_token{@operator_requestor} =
26720 (1) x scalar(@operator_requestor);
26722 # these token TYPES expect trailing operator but not a term
26723 # note: ++ and -- are post-increment and decrement, 'C' = constant
26724 my @operator_requestor_types = qw( ++ -- C <> q );
26725 @expecting_operator_types{@operator_requestor_types} =
26726 (1) x scalar(@operator_requestor_types);
26728 # these token TYPES consume values (terms)
26729 # note: pp and mm are pre-increment and decrement
26730 # f=semicolon in for, F=file test operator
26731 my @value_requestor_type = qw#
26732 L { ( [ ~ !~ =~ ; . .. ... A : && ! || // = + - x
26733 **= += -= .= /= *= %= x= &= |= ^= <<= >>= &&= ||= //=
26734 <= >= == != => \ > < % * / ? & | ** <=> ~~ !~~
26735 f F pp mm Y p m U J G j >> << ^ t
26737 push( @value_requestor_type, ',' )
26738 ; # (perl doesn't like a ',' in a qw block)
26739 @expecting_term_types{@value_requestor_type} =
26740 (1) x scalar(@value_requestor_type);
26742 # Note: the following valid token types are not assigned here to
26743 # hashes requesting to be followed by values or terms, but are
26744 # instead currently hard-coded into sub operator_expected:
26745 # ) -> :: Q R Z ] b h i k n v w } #
26747 # For simple syntax checking, it is nice to have a list of operators which
26748 # will really be unhappy if not followed by a term. This includes most
26750 %really_want_term = %expecting_term_types;
26752 # with these exceptions...
26753 delete $really_want_term{'U'}; # user sub, depends on prototype
26754 delete $really_want_term{'F'}; # file test works on $_ if no following term
26755 delete $really_want_term{'Y'}; # indirect object, too risky to check syntax;
26758 @_ = qw(q qq qw qx qr s y tr m);
26759 @is_q_qq_qw_qx_qr_s_y_tr_m{@_} = (1) x scalar(@_);
26761 # These keywords are handled specially in the tokenizer code:
26762 my @special_keywords = qw(
26778 push( @Keywords, @special_keywords );
26780 # Keywords after which list formatting may be used
26781 # WARNING: do not include |map|grep|eval or perl may die on
26782 # syntax errors (map1.t).
26783 my @keyword_taking_list = qw(
26855 @is_keyword_taking_list{@keyword_taking_list} =
26856 (1) x scalar(@keyword_taking_list);
26858 # These are not used in any way yet
26859 # my @unused_keywords = qw(
26866 # The list of keywords was extracted from function 'keyword' in
26867 # perl file toke.c version 5.005.03, using this utility, plus a
26868 # little editing: (file getkwd.pl):
26869 # while (<>) { while (/\"(.*)\"/g) { print "$1\n"; } }
26870 # Add 'get' prefix where necessary, then split into the above lists.
26871 # This list should be updated as necessary.
26872 # The list should not contain these special variables:
26873 # ARGV DATA ENV SIG STDERR STDIN STDOUT
26876 @is_keyword{@Keywords} = (1) x scalar(@Keywords);
26883 Perl::Tidy - Parses and beautifies perl source
26889 Perl::Tidy::perltidy(
26891 destination => $destination,
26894 perltidyrc => $perltidyrc,
26895 logfile => $logfile,
26896 errorfile => $errorfile,
26897 formatter => $formatter, # callback object (see below)
26898 dump_options => $dump_options,
26899 dump_options_type => $dump_options_type,
26904 This module makes the functionality of the perltidy utility available to perl
26905 scripts. Any or all of the input parameters may be omitted, in which case the
26906 @ARGV array will be used to provide input parameters as described
26907 in the perltidy(1) man page.
26909 For example, the perltidy script is basically just this:
26912 Perl::Tidy::perltidy();
26914 The module accepts input and output streams by a variety of methods.
26915 The following list of parameters may be any of a the following: a
26916 filename, an ARRAY reference, a SCALAR reference, or an object with
26917 either a B<getline> or B<print> method, as appropriate.
26919 source - the source of the script to be formatted
26920 destination - the destination of the formatted output
26921 stderr - standard error output
26922 perltidyrc - the .perltidyrc file
26923 logfile - the .LOG file stream, if any
26924 errorfile - the .ERR file stream, if any
26925 dump_options - ref to a hash to receive parameters (see below),
26926 dump_options_type - controls contents of dump_options
26927 dump_getopt_flags - ref to a hash to receive Getopt flags
26928 dump_options_category - ref to a hash giving category of options
26929 dump_abbreviations - ref to a hash giving all abbreviations
26931 The following chart illustrates the logic used to decide how to
26934 ref($param) $param is assumed to be:
26935 ----------- ---------------------
26937 SCALAR ref to string
26939 (other) object with getline (if source) or print method
26941 If the parameter is an object, and the object has a B<close> method, that
26942 close method will be called at the end of the stream.
26948 If the B<source> parameter is given, it defines the source of the
26953 If the B<destination> parameter is given, it will be used to define the
26954 file or memory location to receive output of perltidy.
26958 The B<stderr> parameter allows the calling program to capture the output
26959 to what would otherwise go to the standard error output device.
26963 If the B<perltidyrc> file is given, it will be used instead of any
26964 F<.perltidyrc> configuration file that would otherwise be used.
26968 If the B<argv> parameter is given, it will be used instead of the
26969 B<@ARGV> array. The B<argv> parameter may be a string, a reference to a
26970 string, or a reference to an array. If it is a string or reference to a
26971 string, it will be parsed into an array of items just as if it were a
26972 command line string.
26976 If the B<dump_options> parameter is given, it must be the reference to a hash.
26977 In this case, the parameters contained in any perltidyrc configuration file
26978 will be placed in this hash and perltidy will return immediately. This is
26979 equivalent to running perltidy with --dump-options, except that the perameters
26980 are returned in a hash rather than dumped to standard output. Also, by default
26981 only the parameters in the perltidyrc file are returned, but this can be
26982 changed (see the next parameter). This parameter provides a convenient method
26983 for external programs to read a perltidyrc file. An example program using
26984 this feature, F<perltidyrc_dump.pl>, is included in the distribution.
26986 Any combination of the B<dump_> parameters may be used together.
26988 =item dump_options_type
26990 This parameter is a string which can be used to control the parameters placed
26991 in the hash reference supplied by B<dump_options>. The possible values are
26992 'perltidyrc' (default) and 'full'. The 'full' parameter causes both the
26993 default options plus any options found in a perltidyrc file to be returned.
26995 =item dump_getopt_flags
26997 If the B<dump_getopt_flags> parameter is given, it must be the reference to a
26998 hash. This hash will receive all of the parameters that perltidy understands
26999 and flags that are passed to Getopt::Long. This parameter may be
27000 used alone or with the B<dump_options> flag. Perltidy will
27001 exit immediately after filling this hash. See the demo program
27002 F<perltidyrc_dump.pl> for example usage.
27004 =item dump_options_category
27006 If the B<dump_options_category> parameter is given, it must be the reference to a
27007 hash. This hash will receive a hash with keys equal to all long parameter names
27008 and values equal to the title of the corresponding section of the perltidy manual.
27009 See the demo program F<perltidyrc_dump.pl> for example usage.
27011 =item dump_abbreviations
27013 If the B<dump_abbreviations> parameter is given, it must be the reference to a
27014 hash. This hash will receive all abbreviations used by Perl::Tidy. See the
27015 demo program F<perltidyrc_dump.pl> for example usage.
27021 The following example passes perltidy a snippet as a reference
27022 to a string and receives the result back in a reference to
27027 # some messy source code to format
27028 my $source = <<'EOM';
27030 my @editors=('Emacs', 'Vi '); my $rand = rand();
27031 print "A poll of 10 random programmers gave these results:\n";
27033 my $i=int ($rand+rand());
27034 print " $editors[$i] users are from Venus" . ", " .
27035 "$editors[1-$i] users are from Mars" .
27040 # We'll pass it as ref to SCALAR and receive it in a ref to ARRAY
27042 perltidy( source => \$source, destination => \@dest );
27043 foreach (@dest) {print}
27045 =head1 Using the B<formatter> Callback Object
27047 The B<formatter> parameter is an optional callback object which allows
27048 the calling program to receive tokenized lines directly from perltidy for
27049 further specialized processing. When this parameter is used, the two
27050 formatting options which are built into perltidy (beautification or
27051 html) are ignored. The following diagram illustrates the logical flow:
27053 |-- (normal route) -> code beautification
27054 caller->perltidy->|-- (-html flag ) -> create html
27055 |-- (formatter given)-> callback to write_line
27057 This can be useful for processing perl scripts in some way. The
27058 parameter C<$formatter> in the perltidy call,
27060 formatter => $formatter,
27062 is an object created by the caller with a C<write_line> method which
27063 will accept and process tokenized lines, one line per call. Here is
27064 a simple example of a C<write_line> which merely prints the line number,
27065 the line type (as determined by perltidy), and the text of the line:
27069 # This is called from perltidy line-by-line
27071 my $line_of_tokens = shift;
27072 my $line_type = $line_of_tokens->{_line_type};
27073 my $input_line_number = $line_of_tokens->{_line_number};
27074 my $input_line = $line_of_tokens->{_line_text};
27075 print "$input_line_number:$line_type:$input_line";
27078 The complete program, B<perllinetype>, is contained in the examples section of
27079 the source distribution. As this example shows, the callback method
27080 receives a parameter B<$line_of_tokens>, which is a reference to a hash
27081 of other useful information. This example uses these hash entries:
27083 $line_of_tokens->{_line_number} - the line number (1,2,...)
27084 $line_of_tokens->{_line_text} - the text of the line
27085 $line_of_tokens->{_line_type} - the type of the line, one of:
27087 SYSTEM - system-specific code before hash-bang line
27088 CODE - line of perl code (including comments)
27089 POD_START - line starting pod, such as '=head'
27090 POD - pod documentation text
27091 POD_END - last line of pod section, '=cut'
27092 HERE - text of here-document
27093 HERE_END - last line of here-doc (target word)
27094 FORMAT - format section
27095 FORMAT_END - last line of format section, '.'
27096 DATA_START - __DATA__ line
27097 DATA - unidentified text following __DATA__
27098 END_START - __END__ line
27099 END - unidentified text following __END__
27100 ERROR - we are in big trouble, probably not a perl script
27102 Most applications will be only interested in lines of type B<CODE>. For
27103 another example, let's write a program which checks for one of the
27104 so-called I<naughty matching variables> C<&`>, C<$&>, and C<$'>, which
27105 can slow down processing. Here is a B<write_line>, from the example
27106 program B<find_naughty.pl>, which does that:
27110 # This is called back from perltidy line-by-line
27111 # We're looking for $`, $&, and $'
27112 my ( $self, $line_of_tokens ) = @_;
27114 # pull out some stuff we might need
27115 my $line_type = $line_of_tokens->{_line_type};
27116 my $input_line_number = $line_of_tokens->{_line_number};
27117 my $input_line = $line_of_tokens->{_line_text};
27118 my $rtoken_type = $line_of_tokens->{_rtoken_type};
27119 my $rtokens = $line_of_tokens->{_rtokens};
27122 # skip comments, pod, etc
27123 return if ( $line_type ne 'CODE' );
27125 # loop over tokens looking for $`, $&, and $'
27126 for ( my $j = 0 ; $j < @$rtoken_type ; $j++ ) {
27128 # we only want to examine token types 'i' (identifier)
27129 next unless $$rtoken_type[$j] eq 'i';
27131 # pull out the actual token text
27132 my $token = $$rtokens[$j];
27135 if ( $token =~ /^\$[\`\&\']$/ ) {
27137 "$input_line_number: $token\n";
27142 This example pulls out these tokenization variables from the $line_of_tokens
27145 $rtoken_type = $line_of_tokens->{_rtoken_type};
27146 $rtokens = $line_of_tokens->{_rtokens};
27148 The variable C<$rtoken_type> is a reference to an array of token type codes,
27149 and C<$rtokens> is a reference to a corresponding array of token text.
27150 These are obviously only defined for lines of type B<CODE>.
27151 Perltidy classifies tokens into types, and has a brief code for each type.
27152 You can get a complete list at any time by running perltidy from the
27155 perltidy --dump-token-types
27157 In the present example, we are only looking for tokens of type B<i>
27158 (identifiers), so the for loop skips past all other types. When an
27159 identifier is found, its actual text is checked to see if it is one
27160 being sought. If so, the above write_line prints the token and its
27163 The B<formatter> feature is relatively new in perltidy, and further
27164 documentation needs to be written to complete its description. However,
27165 several example programs have been written and can be found in the
27166 B<examples> section of the source distribution. Probably the best way
27167 to get started is to find one of the examples which most closely matches
27168 your application and start modifying it.
27170 For help with perltidy's pecular way of breaking lines into tokens, you
27171 might run, from the command line,
27173 perltidy -D filename
27175 where F<filename> is a short script of interest. This will produce
27176 F<filename.DEBUG> with interleaved lines of text and their token types.
27177 The B<-D> flag has been in perltidy from the beginning for this purpose.
27178 If you want to see the code which creates this file, it is
27179 C<write_debug_entry> in Tidy.pm.
27187 Thanks to Hugh Myers who developed the initial modular interface
27192 This man page documents Perl::Tidy version 20070508.
27197 perltidy at users.sourceforge.net
27201 The perltidy(1) man page describes all of the features of perltidy. It
27202 can be found at http://perltidy.sourceforge.net.